Return term vectors as part of the search response #10729

alexksikes · 2015-04-22T15:27:01Z

Adds a new parameter to the search API called term_vectors which takes as
input true, false or an object of parameters. The parameters are exactly
the same as the ones specified in the Term Vectors API, with the exception of
_index, _type, _id, doc, _routing, _version and _version_type.

Relates to #10823

Adds a new parameter to the search API called `term_vectors` which takes as input `true`, `false` or an `object` of parameters. The parameters are exactly the same as the ones specified in the Term Vectors API, with the exception of `_index`, `_type`, `_id`, `doc`, `_routing`, `_version` and `_version_type`.

brwe · 2015-04-22T16:57:56Z

Test suite does not seem to pass. Can you fix that before review?

brwe · 2015-04-23T12:22:29Z

docs/reference/search/request/termvectors.asciidoc

+The parameters are the same as for the <<docs-termvectors,Term Vectors API>>.
+Use `"term_vectors": true` with no parameters, to only return the term vectors
+stored for each document hit. This will ensure that if the term vectors are
+not stored, they will not be computed on the fly.


so, when I do not want term vectors to be generated on the fly if they are not there, then I cannot configure any options?

you'd do "term_vectors" : true, which will return the stored term vectors only.

jpountz · 2015-04-28T16:33:08Z

The code looks good to me but I have concerns about the API: the ability to get term vectors for every search hit sounds a bit esoteric to me and we have been leaning towards removing esoteric features recently. I will let other chime in but at the minimum I think this should be marked experimental.

alexksikes · 2015-04-28T16:57:08Z

Thanks for the review. I think this will be useful especially for NLP tasks. Note that term vectors are never returned by default.

clintongormley · 2015-04-29T10:10:00Z

I agree with @jpountz here - I think that the existing term vectors APIs, plus this integration, plus vectorizers (#10823) should be moved into a plugin to reduce the complexity in core. See this issue about reducing complexity in core: #10368

alexksikes · 2015-05-04T15:38:49Z

I agree that the vectorize API should be moved to a plugin. However, I think that this would also be useful for MLT to find documents similar to a group of documents provided by a given query. So I'm not sure anymore whether this PR should be moved to the vectorize plugin. Any thoughts?

alexksikes · 2015-06-29T16:01:46Z

What is the status quo concerning this PR. We can't remove the TVs API as it is being used by MLT. We agreed that the Vectorize API should be a plugin. However to its support this particular integration is useful:

for the Vectorize plugin
in MLT search by query scenario
in NLP tasks it could be useful to return the result of analysis for a set of documents.

Also this integration does not add much complexity and its usage is purely optional. On the other hand, it is not too difficult to leave this operation to the application client. It would just mean that the user would have to perform a lot of TVs requests for each document returned.

@brwe @jpountz @clintongormley WDYT?

jpountz · 2015-07-01T09:25:15Z

Also this integration does not add much complexity and its usage is purely optional.

I think any new parameter adds complexity by increasing the surface area of an API.

it is not too difficult to leave this operation to the application client

+1 on this option

It would just mean that the user would have to perform a lot of TVs requests for each document returned

You could still avoid the round trips with a multi-term-vectors request?

jpountz · 2015-07-01T13:26:12Z

I just talked with @brwe and one option I had not considered was to add this option to the search API as a plugin. If I'm not mistaken, you can't plug custom phases in the search API today, so this is infrastructure we would need to add. But then I'm worried about exposing even more internals of elasticsearch than today, since it would make it harder to perform internal refactorings without impacting plugins and would potentially allow plugins to "poison" internal workings of elasticsearch.

clintongormley · 2015-07-01T13:28:14Z

Another option we explored with @brwe was including a native script in a plugin, then the script can be called (with parameters) as a script_field in the search API.

brwe · 2015-07-01T13:35:13Z

just for reference, here is my workaround script for the vectorizer right now, term_vectors would look similar: https://github.com/brwe/es-token-plugin/blob/master/src/main/java/org/elasticsearch/script/SparseVectorizerScript.java
just don't look too closely, just a hack...

alexksikes · 2015-07-01T15:11:00Z

If understand correctly in order to have TVs returned as part of a scan and scroll request, I would have to:

either build the infrastructure to allow for fetch phase extensions
or perform a mTVs request on the client side
use a script_field
adds complexity for so far only one use case
is not doable on thousands of documents. This would be a huge request without the session mechanism of scan / scroll, and there may also be version conflicts.
seems to be more of hack

I'm sorry to insist but I think that support for this option is a natural one. We just ask for more information than just the source and it would make implementing certain features (vectorizer for example as a plugin) a lot easier.

brwe · 2015-07-22T16:10:58Z

Since one of the arguments against pluggable sub-phases was that it might make it harder to maintain the code I took a closer look at what would need to be done. I made a pr with a prof of concept here to show what would happen: #12400
I now actually think we would remove a little complexity by this. Let me know what you think!

kevinkluge added the in progress label Apr 22, 2015

alexksikes added >feature v2.0.0-beta1 review :Search/Search Search-related issues that do not fall into other categories v1.6.0 labels Apr 22, 2015

alexksikes added 2 commits April 22, 2015 19:44

make test suite pass

6d53959

some minor changes in tests

bcec6e5

brwe reviewed Apr 23, 2015
View reviewed changes

alexksikes closed this Apr 28, 2015

kevinkluge removed in progress labels Apr 28, 2015

alexksikes reopened this Apr 28, 2015

alexksikes added the stalled label May 20, 2015

clintongormley removed the v1.6.0 label May 29, 2015

clintongormley changed the title ~~Search API: returns term vectors as part of the search response~~ Return term vectors as part of the search response Jun 6, 2015

clintongormley removed the :Search/Search Search-related issues that do not fall into other categories label Jun 6, 2015

jpountz closed this Jul 1, 2015

brwe mentioned this pull request Jul 22, 2015

Make fetch sub phases pluggable #12400

Merged

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Term Vectors labels Feb 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return term vectors as part of the search response #10729

Return term vectors as part of the search response #10729

alexksikes commented Apr 22, 2015

brwe commented Apr 22, 2015

brwe Apr 23, 2015

alexksikes Apr 23, 2015

jpountz commented Apr 28, 2015

alexksikes commented Apr 28, 2015

clintongormley commented Apr 29, 2015

alexksikes commented May 4, 2015

alexksikes commented Jun 29, 2015

jpountz commented Jul 1, 2015

jpountz commented Jul 1, 2015

clintongormley commented Jul 1, 2015

brwe commented Jul 1, 2015

alexksikes commented Jul 1, 2015

brwe commented Jul 22, 2015

Return term vectors as part of the search response #10729

Return term vectors as part of the search response #10729

Conversation

alexksikes commented Apr 22, 2015

brwe commented Apr 22, 2015

brwe Apr 23, 2015

Choose a reason for hiding this comment

alexksikes Apr 23, 2015

Choose a reason for hiding this comment

jpountz commented Apr 28, 2015

alexksikes commented Apr 28, 2015

clintongormley commented Apr 29, 2015

alexksikes commented May 4, 2015

alexksikes commented Jun 29, 2015

jpountz commented Jul 1, 2015

jpountz commented Jul 1, 2015

clintongormley commented Jul 1, 2015

brwe commented Jul 1, 2015

alexksikes commented Jul 1, 2015

brwe commented Jul 22, 2015