Add support for char filters in the analyze API #5148
Conversation
@@ -142,6 +153,13 @@ public void readFrom(StreamInput in) throws IOException {
            tokenFilters[i] = in.readString();
        }
    }
    size = in.readVInt();
We can make this backwards compatible by checking the version of the node we talk to. Assuming that this feature will be released with 1.1.0, you can do the following in readFrom:

    if (in.getVersion().onOrAfter(Version.V_1_1_0)) {
        // read the newly added bits
    }

and the following in writeTo:

    if (out.getVersion().onOrAfter(Version.V_1_1_0)) {
        // write the newly added bits
    }
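The pattern above can be sketched end to end. This is a minimal, self-contained illustration, not the actual Elasticsearch code: `Version` is a tiny stand-in for `org.elasticsearch.Version`, and plain `DataInput`/`DataOutput` streams stand in for `StreamInput`/`StreamOutput` (Elasticsearch uses variable-length ints where this sketch uses `writeInt`). The `charFilters` field name is hypothetical.

```java
import java.io.*;

public class VersionGateSketch {
    // minimal stand-in for org.elasticsearch.Version
    static final class Version {
        final int id;
        Version(int id) { this.id = id; }
        boolean onOrAfter(Version other) { return id >= other.id; }
    }
    static final Version V_1_0_0 = new Version(1000000);
    static final Version V_1_1_0 = new Version(1010000);

    static void writeTo(DataOutputStream out, Version remoteVersion, String[] charFilters) throws IOException {
        // ... fields understood by all versions would be written first ...
        if (remoteVersion.onOrAfter(V_1_1_0)) {
            // only 1.1.0+ nodes understand the newly added bits
            out.writeInt(charFilters.length); // Elasticsearch would use a vint
            for (String f : charFilters) {
                out.writeUTF(f);
            }
        }
    }

    static String[] readFrom(DataInputStream in, Version remoteVersion) throws IOException {
        // ... fields understood by all versions would be read first ...
        if (remoteVersion.onOrAfter(V_1_1_0)) {
            String[] charFilters = new String[in.readInt()];
            for (int i = 0; i < charFilters.length; i++) {
                charFilters[i] = in.readUTF();
            }
            return charFilters;
        }
        // talking to an older node: the field was never written
        return new String[0];
    }
}
```

The key point is that both sides gate on the same version constant, so a 1.1.0 node never writes bytes that a 1.0.x node would not expect to read.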
Thanks for your PR @brusic! Looks good. I left a few comments; if you can fix those we can get this in soon ;)
Made the changes suggested. I went ahead and changed the exception messages that I referenced above. I do believe they were incorrect.
@@ -142,6 +154,15 @@ public void readFrom(StreamInput in) throws IOException {
            tokenFilters[i] = in.readString();
        }
    }
    if (in.getVersion().onOrAfter(Version.V_1_1_0)) {
        size = in.readVInt();
Can't we use StreamInput#readStringArray() here?
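The suggestion is to replace the manual length-plus-loop deserialization with the existing helper. The sketch below shows, with plain `DataInput`/`DataOutput` stand-ins rather than the real `StreamInput`/`StreamOutput`, the length-prefixed pattern that such a helper encapsulates (Elasticsearch encodes the length as a vint; this sketch uses a plain int):

```java
import java.io.*;

public class StringArraySketch {
    // roughly what a writeStringArray helper does: a length prefix, then each string
    static void writeStringArray(DataOutputStream out, String[] values) throws IOException {
        out.writeInt(values.length); // Elasticsearch would use a vint here
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // roughly what StreamInput#readStringArray does, replacing the manual loop
    static String[] readStringArray(DataInputStream in) throws IOException {
        String[] values = new String[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readUTF();
        }
        return values;
    }
}
```

Using the helper keeps readFrom/writeTo symmetric and avoids repeating the size/loop boilerplate for every string-array field.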
All great comments. I was adhering to the standards defined in each file rather than the global Elasticsearch guidelines. Great way to learn them. :) Will make the appropriate changes after work. Should I continue using Version.V_1_1_0 as the serialization check? Somewhat of a chicken-and-egg problem, especially when you do not commit/release directly. I did notice there are issues tagged v1.0.1, but there is no corresponding Version in the master branch (it does exist in 1.0). No rush for a release on my behalf (there is an easy workaround using a custom analyzer); I am looking more for guidelines.
Hey @brusic, no worries about the guidelines - we do reviews every time so we carry over knowledge! The change looks good; my remaining comments are mostly cosmetic. I think you should keep the Version.V_1_1_0 check.
@@ -102,6 +107,7 @@ public String tokenizer() {
    }

    public AnalyzeRequest tokenFilters(String... tokenFilters) {
        if (tokenFilters == null) throw new ElasticsearchIllegalArgumentException("token filters must not be null");
Can you please add brackets to this statement and move the throw to a new line?
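Applied to the diff hunk above, the requested style change would look like the following. This is a simplified stand-in: the class is stripped down and `IllegalArgumentException` replaces `ElasticsearchIllegalArgumentException`, which is not available here.

```java
// Corrected setter style per the review comment: braces added and the
// throw moved to its own line. Simplified stand-in for AnalyzeRequest.
public class AnalyzeRequestSketch {
    private String[] tokenFilters;

    public AnalyzeRequestSketch tokenFilters(String... tokenFilters) {
        if (tokenFilters == null) {
            throw new IllegalArgumentException("token filters must not be null");
        }
        this.tokenFilters = tokenFilters;
        return this; // builder-style setters return this for chaining
    }

    public String[] tokenFilters() {
        return tokenFilters;
    }
}
```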
Hey @brusic, sorry it took a while. I left a few comments; if you can address those, this is ready to be pushed. Thanks!
I thought I had a line-note reply to Simon's comment, but apparently I do not. As of now, the setters are inconsistent: some check for a null value, others do not. Also, the setters pre-validate the input despite the existence of an explicit validate method. I prefer consistency over avoiding null checks. :) Either way, the ultimate end goal of validating the input is achieved. If you want, I can extend the removal of null checks to TransportAnalyzeAction. In that case, one level of nested ifs is removed and the factory array does not need two assignments. EDIT: the comment is in fact still there, just hidden: #5148 (comment)
Thanks @brusic, I read the hidden comments around consistency... agreed. IMO we can keep the checks only in the If you can do that, I think this is ready.
Thanks @brusic!
Allow char filters to be used in the analyze API. This potentially breaks AnalyzeRequest serialization. The REST action contains the now-ambiguous 'filter' parameter, which will denote a 'token_filter', not a 'char_filter'.
One additional item I noticed is the exception message for invalid token filters. It looks like an overzealous copy and paste: shouldn't the exception contain the token filter name, not the tokenizer? I can fix this item as well.
Example:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java?source=cc#L173
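The fix being described could look like the sketch below. The class and method names are hypothetical, not the actual TransportAnalyzeAction code; the point is simply that the message for a missing token filter should interpolate the token filter name rather than the tokenizer name.

```java
// Hypothetical sketch of the exception-message fix discussed above.
public class AnalyzeErrorSketch {
    // message for a token filter that cannot be found: it should name
    // the token filter, not the tokenizer (the copy/paste bug)
    static String missingTokenFilterMessage(String tokenFilterName) {
        return "failed to find token filter under name [" + tokenFilterName + "]";
    }
}
```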