New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to using the multi-termvectors API #7014
Switch to using the multi-termvectors API #7014
Conversation
private static Fields generateFields(String[] fieldNames, String text) throws IOException { | ||
MemoryIndex index = new MemoryIndex(); | ||
for (String fieldName : fieldNames) { | ||
index.addField(fieldName, text, new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_4_9)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use Lucene.VERSION
instead of hardcoding 4.9?
Left one minor comment but other than that it looks good to me! |
Thank you. @s1monw would mind having a look at it as well? |
LGTM |
@@ -52,6 +53,14 @@ public MultiTermVectorsRequest add(String index, @Nullable String type, String i | |||
return this; | |||
} | |||
|
|||
public MultiTermVectorsRequest add(MultiGetRequest.Item item) { | |||
TermVectorRequest request = new TermVectorRequest(item.index(), item.type(), item.id()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This internal request creation loses the headers and the context of the original request that generated it (the multi_get one), can we make sure we pass in the original request, add a new constructor to TermVectorRequest
and make sure we call super there, which will finally copy headers and context from the original request?
2238b39
to
9999917
Compare
public List<LikeText> fetch(List<MultiGetRequest.Item> items) throws IOException { | ||
MultiGetRequest request = new MultiGetRequest(); | ||
public Fields[] fetch(List<MultiGetRequest.Item> items) throws IOException { | ||
MultiTermVectorsRequest request = new MultiTermVectorsRequest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the misleading comment, here is where we lose headers and request context, the items don't need them, only the main request. This one should be created passing in the original request, but that would be MoreLikeThisRequest
not multi_get. Actually I'm confused on why we still have multi_get items but we don't use multi_get internally anymore
The term vector API can now generate term vectors on the fly, if the terms are not already stored in the index. This commit exploits this new functionality for the MLT query. Now the terms are directly retrieved using multi- termvectors API, instead of generating them from the texts retrieved using the multi-get API. Closes elastic#7014
b2d2349
to
f1a6b4e
Compare
The term vector API can now generate term vectors on the fly, if the terms are not already stored in the index. This commit exploits this new functionality for the MLT query. Now the terms are directly retrieved using multi- termvectors API, instead of generating them from the texts retrieved using the multi-get API. Closes #7014
The term vector API can now generate term vectors on the fly, if the terms are not already stored in the index. This commit exploits this new functionality for the MLT query. Now the terms are directly retrieved using multi- termvectors API, instead of generating them from the texts retrieved using the multi-get API. Closes #7014
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of retrieving the texts using the multi-get API. The
terms are then directly passed to Lucene MLT.