Support for artificial documents #7530
Closed
8 commits
e9921b4 Term Vectors: Support for artificial documents (alexksikes)
d033e59 flyweight instead of specifying an id (alexksikes)
9556a27 fix for when shard has no doc indexed (alexksikes)
4ec2acc allow for term stats and mention to use routing (alexksikes)
13640dd fix and test for field not present in mapping (alexksikes)
35cd69a addressed comments (alexksikes)
688f9b5 addressed comments and settled for dynamic mapping (alexksikes)
33c6364 onOrAfter and note on dynamic mapping (alexksikes)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,10 +3,11 @@ | |
|
||
added[1.0.0.Beta1] | ||
|
||
Returns information and statistics on terms in the fields of a | ||
particular document as stored in the index. Note that this is a | ||
near realtime API as the term vectors are not available until the | ||
next refresh. | ||
Returns information and statistics on terms in the fields of a particular | ||
document. The document could be stored in the index or artificially provided | ||
by the user coming[1.4.0]. Note that for documents stored in the index, this | ||
is a near realtime API as the term vectors are not available until the next | ||
refresh. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
|
@@ -41,10 +42,10 @@ statistics are returned for all fields but no term statistics. | |
* term payloads (`payloads` : true), as base64 encoded bytes | ||
|
||
If the requested information wasn't stored in the index, it will be | ||
computed on the fly if possible. See <<mapping-types,type mapping>> | ||
for how to configure your index to store term vectors. | ||
computed on the fly if possible. Additionally, term vectors could be computed | ||
for documents not even existing in the index, but instead provided by the user. | ||
|
||
coming[1.4.0,The ability to compute term vectors on the fly is only available from 1.4.0 onwards (see below)] | ||
coming[1.4.0,The ability to compute term vectors on the fly, as well as support for artificial documents, is only available from 1.4.0 onwards (see examples 2 and 3 below, respectively)] | ||
|
||
[WARNING] | ||
====== | ||
|
@@ -86,7 +87,9 @@ The term and field statistics are not accurate. Deleted documents | |
are not taken into account. The information is only retrieved for the | ||
shard the requested document resides in. The term and field statistics | ||
are therefore only useful as relative measures whereas the absolute | ||
numbers have no meaning in this context. | ||
numbers have no meaning in this context. By default, when requesting | ||
term vectors of artificial documents, a shard to get the statistics from | ||
is randomly selected. Use `routing` only to hit a particular shard. | ||
|
||
[float] | ||
=== Example 1 | ||
|
@@ -231,7 +234,7 @@ Response: | |
[float] | ||
=== Example 2 coming[1.4.0] | ||
|
||
Additionally, term vectors which are not explicitly stored in the index are automatically | ||
Term vectors which are not explicitly stored in the index are automatically | ||
computed on the fly. The following request returns all information and statistics for the | ||
fields in document `1`, even though the terms haven't been explicitly stored in the index. | ||
Note that for the field `text`, the terms are not re-generated. | ||
|
@@ -246,3 +249,29 @@ curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' -d '{ | |
"field_statistics" : true | ||
}' | ||
-------------------------------------------------- | ||
|
||
[float] | ||
=== Example 3 coming[1.4.0] | ||
|
||
Term vectors can also be generated for artificial documents, | ||
that is for documents not present in the index. The syntax is similar to the | ||
<<search-percolate,percolator>> API. For example, the following request would | ||
return the same results as in example 1. The mapping used is determined by the | ||
`index` and `type`. | ||
Review comment: Can you leave a note about the fact that it can introduce new mappings? |
||
|
||
[WARNING] | ||
====== | ||
If dynamic mapping is turned on (default), the document fields not in the original | ||
mapping will be dynamically created. | ||
====== | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
curl -XGET 'http://localhost:9200/twitter/tweet/_termvector' -d '{ | ||
"doc" : { | ||
"fullname" : "John Doe", | ||
"text" : "twitter test test test" | ||
} | ||
}' | ||
-------------------------------------------------- | ||
|
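As a reading aid for Example 3, here is a minimal Python sketch of how a client could assemble the request for an artificial document. The helper name `build_termvector_request` and its parameters are hypothetical and not part of this pull request. The host, index, type and document are taken from the curl example above. Passing `routing` as a query-string parameter is an assumption. The sketch only builds the URL and JSON body and does not contact a cluster.

```python
import json
from urllib.parse import quote, urlencode


def build_termvector_request(index, doc_type, doc, routing=None,
                             host="http://localhost:9200"):
    """Return (url, body) for a term vector request on an artificial document.

    Hypothetical helper for illustration only.
    """
    # No document id appears in the path: the document travels in the body.
    url = "{}/{}/{}/_termvector".format(
        host, quote(index, safe=""), quote(doc_type, safe=""))
    if routing is not None:
        # Assumption: routing is accepted as a query-string parameter.
        url += "?" + urlencode({"routing": routing})
    # The artificial document is wrapped in a top-level "doc" key.
    body = json.dumps({"doc": doc}, sort_keys=True)
    return url, body


url, body = build_termvector_request(
    "twitter", "tweet",
    {"fullname": "John Doe", "text": "twitter test test test"},
)
print(url)
print(body)
```

Leaving `routing` unset corresponds to the documented default, where the shard used for the statistics is picked at random.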
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ | |
import org.elasticsearch.action.ActionListener; | ||
import org.elasticsearch.action.ActionRequestBuilder; | ||
import org.elasticsearch.client.Client; | ||
import org.elasticsearch.common.xcontent.XContentBuilder; | ||
|
||
/** | ||
*/ | ||
|
@@ -35,6 +36,38 @@ public TermVectorRequestBuilder(Client client, String index, String type, String | |
super(client, new TermVectorRequest(index, type, id)); | ||
} | ||
|
||
/** | ||
* Sets the index where the document is located. | ||
*/ | ||
public TermVectorRequestBuilder setIndex(String index) { | ||
request.index(index); | ||
return this; | ||
} | ||
|
||
/** | ||
* Sets the type of the document. | ||
*/ | ||
public TermVectorRequestBuilder setType(String type) { | ||
request.type(type); | ||
return this; | ||
} | ||
|
||
/** | ||
* Sets the id of the document. | ||
*/ | ||
public TermVectorRequestBuilder setId(String id) { | ||
request.id(id); | ||
return this; | ||
} | ||
|
||
/** | ||
* Sets the artificial document from which to generate term vectors. | ||
*/ | ||
public TermVectorRequestBuilder setDoc(XContentBuilder xContent) { | ||
request.doc(xContent); | ||
return this; | ||
} | ||
|
||
Review comment: Can you add javadocs since there are user-facing APIs? |
||
/** | ||
* Sets the routing. Required if routing isn't id based. | ||
*/ | ||
|
Doesn't the term vectors API currently return statistics that are aggregated across all shards? Documentation suggests so?
Nope it does not.
"The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in."
ok
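To illustrate the point settled in this thread, the following toy Python sketch shows why statistics gathered from a single shard are only meaningful as relative measures. It does not use Elasticsearch at all: the shard names, the sample documents and the `doc_freq` helper are hypothetical, and whitespace splitting stands in for a real analyzer.

```python
from collections import Counter


def doc_freq(shard_docs):
    """Count, for each term, how many documents in one shard contain it."""
    counts = Counter()
    for text in shard_docs:
        # A term counts once per document, however often it occurs in it.
        counts.update(set(text.split()))
    return counts


# Hypothetical toy data: one index split across two shards.
shards = {
    "shard_0": ["twitter test test test", "another twitter test"],
    "shard_1": ["twitter only"],
}

per_shard = {name: doc_freq(docs) for name, docs in shards.items()}
# Summing the per-shard counters gives the index-wide document frequency.
index_wide = sum(per_shard.values(), Counter())

for name in sorted(per_shard):
    print(name, per_shard[name]["twitter"])
print("index-wide", index_wide["twitter"])
```

In this toy data the document frequency of `twitter` differs between the two shards, and neither value equals the index-wide count, so the number a request reports depends on which shard serves it.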