Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for char filters in the analyze API #5148

Closed
wants to merge 7 commits into from

Conversation

brusic
Copy link
Contributor

@brusic brusic commented Feb 18, 2014

Allow char filters to be used in the analyze API. Potentially breaks AnalyzeRequest serialization. The REST action contains the now ambiguous 'filter' parameter, which will denote a 'token_filter', not a 'char_filter'.

One additional item I noticed is the exception message for invalid token filters. To me it appears like an overzealous copy and paste, but should the exception contain the token filter name, not the tokenizer? I can fix this item as well.

Example:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java?source=cc#L173

@@ -142,6 +153,13 @@ public void readFrom(StreamInput in) throws IOException {
tokenFilters[i] = in.readString();
}
}
size = in.readVInt();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can make this backwards compatible by checking the version of the node we talk to. Assuming that this feature will be released with 1.1.0, you can do the following in readFrom:

if (in.getVersion().onOrAfter(Version.1_1_0)) {
    //read the newly added bits
}

and the following in writeTo:

if (out.getVersion().onOrAfter(Version.1_1_0)) {
    //write the newly added bits
}

@javanna javanna self-assigned this Feb 18, 2014
@javanna
Copy link
Member

javanna commented Feb 18, 2014

Thanks for your PR @brusic ! Looks good, I left a few comments, if you can fix those we can get this in soon ;)

@brusic
Copy link
Contributor Author

brusic commented Feb 18, 2014

Made the changes suggested. I went ahead and changed the exception messages that I referenced above. I do believe they were incorrect.

@@ -142,6 +154,15 @@ public void readFrom(StreamInput in) throws IOException {
tokenFilters[i] = in.readString();
}
}
if (in.getVersion().onOrAfter(Version.V_1_1_0)) {
size = in.readVInt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we sue StreamInput#readStringArray() here?

@brusic
Copy link
Contributor Author

brusic commented Feb 19, 2014

All great comments. I was adhering to the standards that were defined in each file and not the global Elasticsearch guidelines. Great way to learn them. :) Will make the appropriate changes after work.

Should I contain using Version.V_1_1_0 as the serialization check? Somewhat of a chicken and the egg problem, especially when you do not commit/release directly. I did notice there are issues tagged v1.0.1, but there is no corresponding Version in the master branch (it does exist in 1.0). No rush for a release on my behalf (easy workaround by using a custom analyzer), looking more for guidelines.

@s1monw
Copy link
Contributor

s1monw commented Feb 19, 2014

hey @brusic no worries about the guidelines - we do reviews everytime so we carry over knowledge! The change looks good though mostly cosmetics! I think you should keep the Version.V_1_1_0 since this is not a bugfix so it won't go to 1.0.1 - not sure if we will ever release that version.

@@ -102,6 +107,7 @@ public String tokenizer() {
}

public AnalyzeRequest tokenFilters(String... tokenFilters) {
if (tokenFilters == null) throw new ElasticsearchIllegalArgumentException("token filters must not be null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add brackets to this statement and move the throw on a new line?

@javanna
Copy link
Member

javanna commented Feb 26, 2014

Hey @brusic sorry it took a while, I left a few comments, if you can address those this is ready to be pushed. Thanks!

@brusic
Copy link
Contributor Author

brusic commented Feb 26, 2014

I thought I had a line note reply to Simon's comment, but apparently I do not.

As of now, the setters are inconsistent. Some check for a null value, others do not. Also, the setters prevalidate the input despite the existence of an explicit validate method. I prefer consistency over avoiding null checks. :)

Either way, the ultimate endgoal of validating the input is achieved. If you want, I can extend the removal of null checks to TransportAnalyzeAction. In this case, one level of nested ifs is removed and the factory array does not need to have two assignments.

EDIT: the comment is in fact still there , just hidden: #5148 (comment)

@javanna
Copy link
Member

javanna commented Feb 28, 2014

Thanks @brusic I read the hidden comments around consistency...agreed... IMO we can keep the checks only in the validate as this is how we do things in most of the cases.

If you can do that I think this is ready.

@javanna javanna closed this in 95274c1 Mar 6, 2014
@javanna
Copy link
Member

javanna commented Mar 6, 2014

Thanks @brusic !

javanna pushed a commit that referenced this pull request Mar 6, 2014
@clintongormley clintongormley added the :Search/Analysis How text is split into tokens label Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants