Support terms filtering #9561

alexksikes · 2015-02-04T14:38:36Z

This adds a new feature to the Term Vectors API that allows terms to be
filtered based on their tf-idf scores. With the `dfs` option on, this can be useful
for finding a good characteristic vector of a document or a set of documents.
The parameters are similar to the ones used in the MLT Query.
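
To make the idea concrete, here is a minimal sketch of tf-idf based term selection. It is not the code from this PR: the class `TfIdfFilterSketch`, the `TermStats` holder, and the textbook formula `termFreq * ln(1 + numDocs / docFreq)` are all assumptions chosen for illustration, and the actual implementation may score terms differently. Only the notion of capping the result at a maximum number of terms comes from the discussion (`max_num_terms`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TfIdfFilterSketch {

    /** Hypothetical holder for the statistics of one term. */
    static final class TermStats {
        final String term;
        final int termFreq; // occurrences of the term in the document
        final int docFreq;  // number of documents containing the term

        TermStats(String term, int termFreq, int docFreq) {
            this.term = term;
            this.termFreq = termFreq;
            this.docFreq = docFreq;
        }
    }

    /** Textbook tf-idf (an assumption, not necessarily the PR's formula). */
    static double score(TermStats s, long numDocs) {
        return s.termFreq * Math.log(1.0 + (double) numDocs / s.docFreq);
    }

    /** Returns at most maxNumTerms terms, highest tf-idf score first. */
    static List<String> topTerms(List<TermStats> stats, long numDocs, int maxNumTerms) {
        List<TermStats> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingDouble((TermStats s) -> score(s, numDocs)).reversed());
        List<String> result = new ArrayList<>();
        for (TermStats s : sorted.subList(0, Math.min(maxNumTerms, sorted.size()))) {
            result.add(s.term);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TermStats> stats = Arrays.asList(
                new TermStats("the", 4, 1000),
                new TermStats("elasticsearch", 3, 10),
                new TermStats("vector", 2, 50),
                new TermStats("of", 5, 990));
        // Frequent-everywhere terms such as "the" and "of" score low and are dropped.
        System.out.println(topTerms(stats, 1000, 2));
    }
}
```

The document frequencies here stand in for index-wide statistics, which is why the description pairs the feature with the `dfs` option: scores computed from a single shard's statistics may rank terms differently.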


jpountz · 2015-02-04T15:35:18Z

src/main/java/org/elasticsearch/action/termvectors/TermVectorsFilter.java

+            topLevelTermsEnum = topLevelTerms.iterator(topLevelTermsEnum);
+            while (termsEnum.next() != null) {
+                BytesRef termBytesRef = termsEnum.term();
+                topLevelTermsEnum.seekExact(termBytesRef);


Is it ok to ignore the return value?

Can you just assert that it returns true?
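
A minimal sketch of what the two comments above ask for. Lucene's `TermsEnum.seekExact(BytesRef)` returns a boolean saying whether the term was found; the sketch below uses a hypothetical stand-in (`TermsCursor`, backed by a `TreeSet`) that mimics only that contract so the example runs without the Lucene library. The names `SeekExactAssertSketch`, `TermsCursor` and `seekAll` are illustrative and do not appear in the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class SeekExactAssertSketch {

    /** Hypothetical stand-in; mimics only the boolean contract of seekExact. */
    static final class TermsCursor {
        private final TreeSet<String> terms;
        private String current;

        TermsCursor(List<String> terms) {
            this.terms = new TreeSet<>(terms);
        }

        /** Returns true and positions the cursor only if the term exists. */
        boolean seekExact(String term) {
            boolean found = terms.contains(term);
            current = found ? term : null;
            return found;
        }

        /** The term the cursor is positioned on, or null after a failed seek. */
        String term() {
            return current;
        }
    }

    /** Seeks every document term in the top-level cursor; returns the number of terms visited. */
    static int seekAll(List<String> docTerms, TermsCursor topLevel) {
        int visited = 0;
        for (String term : docTerms) {
            boolean found = topLevel.seekExact(term);
            // Every term of the document should also exist at the top level;
            // a failed seek would leave the cursor unpositioned.
            assert found : "term missing from top-level terms: " + term;
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        TermsCursor topLevel = new TermsCursor(Arrays.asList("alpha", "beta", "gamma"));
        System.out.println(seekAll(Arrays.asList("alpha", "gamma"), topLevel));
    }
}
```

Java evaluates `assert` statements only when assertions are enabled (for example with `-ea`), so the check costs nothing in production while still catching a broken invariant in tests.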

jpountz · 2015-02-04T15:40:16Z

Looks good overall, can you add some tests?

clintongormley · 2015-02-23T12:12:22Z

docs/reference/docs/termvectors.asciidoc

@@ -86,6 +86,40 @@ or the field statistics of the entire index, and not just at the shard. Use it
 with caution as distributed frequencies can have a serious performance impact.

 [float]
+==== Terms Filtering coming[2.0]
+
+With the parameter `filter`, the terms returned could also be filtered based


I'm nervous about using the word `filter` here, as we try to reserve that purely for query DSL filters (although we do use it for fielddata filters as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#field-data-filtering). I'm struggling to come up with a better name, though.

I'm wondering if we need this sub-level at all, or if we can just move the `max_num_terms` etc. parameters up to the top level? There doesn't seem to be any name clash (although I do like the fact that having them as sub-params indicates their action more clearly).

alexksikes · 2015-02-23

One nice thing about using `filter` as a sub-level parameter is that `filter : {}` with no parameters will perform filtering using the default sub-parameters. Another reason is to hide the complexity of a feature that might be used only in some very specific cases. It also lets us expand on this sub-feature more cleanly, I think, for example to allow filtering based on a script. But I do agree: I am not sure `filter` is the proper name for this feature.

s1monw · 2015-03-20T20:46:16Z

@alexksikes are you picking this up again or can we close it?

alexksikes · 2015-03-20T21:30:47Z

Just need to add the tests for this. We will also need this for Item Query, for which I wanted to write a dev issue explaining the plan.

jpountz · 2015-04-14T13:48:38Z

LGTM, just left a minor comment that does not need further review. However, before pushing, can you make sure to settle on the naming of this feature with @clintongormley?

clintongormley · 2015-04-14T14:13:02Z

I can't think of a better name for this parameter, and it fits with fielddata filtering, so I think we should just stick to `filter` here (especially in light of the fact that query DSL filters will kinda go away in the future).

alexksikes added :Term Vectors v2.0.0-beta1 >feature labels Feb 4, 2015

jpountz reviewed Feb 4, 2015

drewr force-pushed the master branch from dcc3da0 to 7c20a8a February 20, 2015 16:48

clintongormley reviewed Feb 23, 2015

comments + tests

fe02356

jpountz self-assigned this Apr 13, 2015

addressed comments

5287159

alexksikes removed the review label Apr 14, 2015

alexksikes closed this in d339ee4 Apr 14, 2015

alexksikes deleted the feature/tvs-terms-filtering branch April 14, 2015 17:18

clintongormley changed the title ~~Term Vectors: terms filtering~~ Support terms filtering Jun 6, 2015

alexksikes mentioned this pull request Jun 22, 2015

Item Query #11814

Closed

clintongormley added the :Search/Search label and removed the :Term Vectors label Feb 14, 2018
