Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to using the multi-termvectors API #7014

Merged
merged 1 commit into from Aug 21, 2014

Conversation

alexksikes
Copy link
Contributor

The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of retrieving the texts using the multi-get API. The
terms are then directly passed to Lucene MLT.

private static Fields generateFields(String[] fieldNames, String text) throws IOException {
MemoryIndex index = new MemoryIndex();
for (String fieldName : fieldNames) {
index.addField(fieldName, text, new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_4_9));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use Lucene.VERSION instead of hardcoding 4.9?

@jpountz
Copy link
Contributor

jpountz commented Jul 30, 2014

Left one minor comment but other than that it looks good to me!

@jpountz jpountz removed the review label Jul 30, 2014
@alexksikes
Copy link
Contributor Author

Thank you. @s1monw would mind having a look at it as well?

@jpountz
Copy link
Contributor

jpountz commented Aug 21, 2014

LGTM

@jpountz jpountz removed the review label Aug 21, 2014
@@ -52,6 +53,14 @@ public MultiTermVectorsRequest add(String index, @Nullable String type, String i
return this;
}

public MultiTermVectorsRequest add(MultiGetRequest.Item item) {
TermVectorRequest request = new TermVectorRequest(item.index(), item.type(), item.id())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This internal request creation loses the headers and the context of the original request that generated it (the multi_get one), can we make sure we pass in the original request, add a new constructor to TermVectorRequest and make sure we call super there, which will finally copy headers and context from the original request?

public List<LikeText> fetch(List<MultiGetRequest.Item> items) throws IOException {
MultiGetRequest request = new MultiGetRequest();
public Fields[] fetch(List<MultiGetRequest.Item> items) throws IOException {
MultiTermVectorsRequest request = new MultiTermVectorsRequest();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the misleading comment, here is where we lose headers and request context, the items don't need them, only the main request. This one should be created passing in the original request, but that would be MoreLikeThisRequest not multi_get. Actually I'm confused on why we still have multi_get items but we don't use multi_get internally anymore

The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes elastic#7014
@alexksikes alexksikes merged commit f1a6b4e into elastic:master Aug 21, 2014
alexksikes added a commit that referenced this pull request Aug 21, 2014
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes #7014
@alexksikes alexksikes deleted the feature/mlt-use-termvectors branch August 27, 2014 17:17
alexksikes added a commit that referenced this pull request Sep 8, 2014
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes #7014
@clintongormley clintongormley changed the title More Like This Query: Switch to using the multi-termvectors API Switch to using the multi-termvectors API Jun 7, 2015
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :More Like This labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants