Add support for char filters in the analyze API #5148

brusic · 2014-02-18T04:31:33Z

Allow char filters to be used in the analyze API. Potentially breaks AnalyzeRequest serialization. The REST action contains the now ambiguous 'filter' parameter, which will denote a 'token_filter', not a 'char_filter'.

One additional item I noticed is the exception message for invalid token filters. It looks like an overzealous copy and paste: shouldn't the exception contain the token filter name rather than the tokenizer name? I can fix this item as well.

Example:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java?source=cc#L173

javanna · 2014-02-18T09:48:13Z

src/main/java/org/elasticsearch/action/admin/indices/analyze/AnalyzeRequest.java

@@ -142,6 +153,13 @@ public void readFrom(StreamInput in) throws IOException {
                tokenFilters[i] = in.readString();
            }
        }
+        size = in.readVInt();


We can make this backwards compatible by checking the version of the node we talk to. Assuming that this feature will be released with 1.1.0, you can do the following in readFrom:

if (in.getVersion().onOrAfter(Version.V_1_1_0)) { /* read the newly added bits */ }

and the following in writeTo:

if (out.getVersion().onOrAfter(Version.V_1_1_0)) { /* write the newly added bits */ }
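A minimal, self-contained sketch of the version-gating idea described above. It does not use the Elasticsearch classes: `VersionGatedRequest`, the integer version constants, and `roundTrip` are hypothetical stand-ins, and plain `java.io` data streams replace `StreamInput`/`StreamOutput`. The point is only that both sides skip the new field when the peer is older than the version that introduced it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionGatedRequest {
    // Hypothetical numeric wire versions, standing in for Version.V_1_0_0 and Version.V_1_1_0.
    static final int V_1_0_0 = 1_000_000;
    static final int V_1_1_0 = 1_010_000;

    String[] tokenFilters = new String[0];
    String[] charFilters = new String[0]; // the newly added field

    void writeTo(DataOutputStream out, int peerVersion) throws IOException {
        out.writeInt(tokenFilters.length);
        for (String filter : tokenFilters) {
            out.writeUTF(filter);
        }
        // Only send the new field to peers that know how to read it.
        if (peerVersion >= V_1_1_0) {
            out.writeInt(charFilters.length);
            for (String filter : charFilters) {
                out.writeUTF(filter);
            }
        }
    }

    void readFrom(DataInputStream in, int peerVersion) throws IOException {
        tokenFilters = new String[in.readInt()];
        for (int i = 0; i < tokenFilters.length; i++) {
            tokenFilters[i] = in.readUTF();
        }
        // Older peers never wrote the new field, so fall back to an empty array.
        if (peerVersion >= V_1_1_0) {
            charFilters = new String[in.readInt()];
            for (int i = 0; i < charFilters.length; i++) {
                charFilters[i] = in.readUTF();
            }
        } else {
            charFilters = new String[0];
        }
    }

    // Serializes and deserializes a request as if talking to a peer at the given version.
    static VersionGatedRequest roundTrip(VersionGatedRequest request, int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            request.writeTo(new DataOutputStream(bytes), peerVersion);
            VersionGatedRequest copy = new VersionGatedRequest();
            copy.readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), peerVersion);
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a round trip against an older peer silently drops the char filters, which is the trade-off of this kind of gating: the request stays readable, but the new feature is unavailable until both nodes are upgraded.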

javanna · 2014-02-18T10:18:51Z

Thanks for your PR @brusic ! Looks good, I left a few comments, if you can fix those we can get this in soon ;)

brusic · 2014-02-18T15:41:07Z

Made the changes suggested. I went ahead and changed the exception messages that I referenced above. I do believe they were incorrect.

s1monw · 2014-02-19T16:26:27Z

src/main/java/org/elasticsearch/action/admin/indices/analyze/AnalyzeRequest.java

@@ -142,6 +154,15 @@ public void readFrom(StreamInput in) throws IOException {
                tokenFilters[i] = in.readString();
            }
        }
+        if (in.getVersion().onOrAfter(Version.V_1_1_0)) {
+            size = in.readVInt();


Can't we use StreamInput#readStringArray() here?
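For context, a rough sketch of what a string-array helper pair replaces: the hand-written "read a size, then loop over readString" code collapses into one call on each side. This is not the Elasticsearch implementation; `StringArrayStreams` and its methods are hypothetical and built on plain `java.io` interfaces, with a fixed-width `int` length prefix assumed for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class StringArrayStreams {
    private StringArrayStreams() {
    }

    // Writes the element count followed by each string.
    static void writeStringArray(DataOutput out, String[] values) throws IOException {
        out.writeInt(values.length);
        for (String value : values) {
            out.writeUTF(value);
        }
    }

    // Reads the element count, then exactly that many strings.
    static String[] readStringArray(DataInput in) throws IOException {
        int size = in.readInt();
        String[] values = new String[size];
        for (int i = 0; i < size; i++) {
            values[i] = in.readUTF();
        }
        return values;
    }

    // Convenience for trying the pair out against an in-memory buffer.
    static String[] roundTrip(String[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeStringArray(new DataOutputStream(bytes), values);
            return readStringArray(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty array round-trips as an empty array here, which fits the later commit in this PR that opts for empty arrays over null ones.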

brusic · 2014-02-19T17:13:06Z

All great comments. I was adhering to the standards that were defined in each file and not the global Elasticsearch guidelines. Great way to learn them. :) Will make the appropriate changes after work.

Should I continue using Version.V_1_1_0 as the serialization check? It is somewhat of a chicken-and-egg problem, especially when you do not commit/release directly. I did notice there are issues tagged v1.0.1, but there is no corresponding Version in the master branch (it does exist in 1.0). No rush for a release on my behalf (there is an easy workaround using a custom analyzer); I am looking more for guidelines.

s1monw · 2014-02-19T17:21:58Z

Hey @brusic, no worries about the guidelines - we do reviews every time so we carry over knowledge! The change looks good, though; my comments are mostly cosmetic. I think you should keep Version.V_1_1_0, since this is not a bugfix and so won't go into 1.0.1 - not sure if we will ever release that version.

javanna · 2014-02-26T12:32:00Z

src/main/java/org/elasticsearch/action/admin/indices/analyze/AnalyzeRequest.java

@@ -102,6 +107,7 @@ public String tokenizer() {
    }

    public AnalyzeRequest tokenFilters(String... tokenFilters) {
+        if (tokenFilters == null) throw new ElasticsearchIllegalArgumentException("token filters must not be null");


Can you please add braces to this statement and move the throw onto a new line?

javanna · 2014-02-26T12:44:14Z

Hey @brusic sorry it took a while, I left a few comments, if you can address those this is ready to be pushed. Thanks!

brusic · 2014-02-26T15:47:39Z

I thought I had a line note reply to Simon's comment, but apparently I do not.

As of now, the setters are inconsistent. Some check for a null value, others do not. Also, the setters prevalidate the input despite the existence of an explicit validate method. I prefer consistency over avoiding null checks. :)

Either way, the ultimate end goal of validating the input is achieved. If you want, I can extend the removal of null checks to TransportAnalyzeAction. In this case, one level of nested ifs is removed and the factory array does not need to have two assignments.

EDIT: the comment is in fact still there, just hidden: #5148 (comment)

javanna · 2014-02-28T12:30:10Z

Thanks @brusic I read the hidden comments around consistency...agreed... IMO we can keep the checks only in the validate as this is how we do things in most of the cases.

If you can do that I think this is ready.
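A small sketch of the approach agreed on here: setters only assign, and a single validate method reports every problem. `AnalyzeRequestSketch` is a hypothetical class, not the actual AnalyzeRequest, and it returns a plain list of messages as an assumed simplification of however the real request reports validation errors.

```java
import java.util.ArrayList;
import java.util.List;

class AnalyzeRequestSketch {
    private String text;
    private String[] tokenFilters = new String[0];
    private String[] charFilters = new String[0];

    // Setters only assign; they do not pre-validate their input.
    AnalyzeRequestSketch text(String text) {
        this.text = text;
        return this;
    }

    AnalyzeRequestSketch tokenFilters(String... tokenFilters) {
        this.tokenFilters = tokenFilters;
        return this;
    }

    AnalyzeRequestSketch charFilters(String... charFilters) {
        this.charFilters = charFilters;
        return this;
    }

    // All checks live in one place, so callers get every problem at once.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (text == null) {
            errors.add("text is missing");
        }
        if (tokenFilters == null) {
            errors.add("token filters must not be null");
        }
        if (charFilters == null) {
            errors.add("char filters must not be null");
        }
        return errors;
    }
}
```

With this shape, `new AnalyzeRequestSketch().text("some text").charFilters((String[]) null).validate()` returns a single error about char filters instead of throwing from the setter.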

javanna · 2014-03-06T11:24:27Z

Thanks @brusic !

Closes #5148

Added support for char filters in the analyze API

554a08a

javanna reviewed Feb 18, 2014

javanna self-assigned this Feb 18, 2014

Made change for elastic#5148 suggested by @javanna

e236b33

s1monw reviewed Feb 19, 2014

brusic added 3 commits February 21, 2014 06:59

Added chained token filter and char filter tests.

187846c

Opt for empty arrays over null ones. Use Stream string array methods.

d0dc1e3

Check for nulls in setters.

36235ac

javanna reviewed Feb 26, 2014

Enforce null checks with setters only

1fe04bf

Move null checks back into the validate method

8eb4170

javanna added enhancement labels Mar 6, 2014

javanna closed this in 95274c1 Mar 6, 2014

javanna pushed a commit that referenced this pull request Mar 6, 2014

Added support for char filters in the analyze API

593d94d

Closes #5148

clintongormley added the :Search/Analysis How text is split into tokens label Jun 7, 2015
