Support for artificial documents #7530

Closed
wants to merge 8 commits into from

Conversation

alexksikes
Contributor

This adds the ability to the Term Vector API to generate term vectors for
artificial documents, that is, for documents not present in the index. Following
a syntax similar to the Percolator API, a new 'doc' parameter is used instead
of '_id' to specify the document of interest. The parameters '_index' and
'_type' determine the mapping, and therefore the analyzers to apply to each
field value.
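
For illustration, here is a minimal sketch of what such a request could look like, written in plain Java without the Elasticsearch client. Only the 'doc' parameter replacing '_id' comes from the description above. The `_termvector` endpoint path, the `twitter` index, the `tweet` type, and the field names are assumptions chosen for the example, and `ArtificialDocRequest` is a hypothetical helper, not part of this pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the path and JSON body of a term vector
// request for an artificial document (one that is not in the index).
final class ArtificialDocRequest {

    // The index and type select the mapping, and therefore the analyzers.
    // The "_termvector" suffix is an assumption about the endpoint name.
    static String path(String index, String type) {
        return "/" + index + "/" + type + "/_termvector";
    }

    // Wraps flat string fields in {"doc": {...}}; no "_id" is sent.
    static String body(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\"doc\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }

    // Escapes backslashes and double quotes only; control characters
    // are not handled in this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("fullname", "John Doe");
        fields.put("text", "twitter test test test");
        System.out.println("GET " + path("twitter", "tweet"));
        System.out.println(body(fields));
    }
}
```

A real client would use a JSON library rather than string concatenation; the sketch only shows where 'doc' sits in the request.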

numbers have no meaning in this context. By default, when requesting
term vectors of artificial documents, a shard to get the statistics from
is randomly selected. Use `routing` only to hit a particular shard.
Contributor

Doesn't the term vectors API currently return statistics that are aggregated across all shards? The documentation suggests so.

Contributor Author

Nope it does not.

"The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in."

Contributor

ok

private String routing;

protected String preference;

private static AtomicInteger randomInt = new AtomicInteger(0);
Contributor

Can you make it final?
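
The requested change could be sketched as below, with the counter declared `final`. How the pull request actually uses `randomInt` is not shown in this excerpt, so the `ShardPicker` class, the `nextShard` method, and the round-robin selection (used here in place of true randomness) are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a shared counter used to spread requests over shards.
final class ShardPicker {

    // 'final' guarantees the shared counter reference is never reassigned;
    // the AtomicInteger itself stays mutable and thread-safe.
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    // Returns a shard number in [0, numberOfShards), advancing round-robin.
    static int nextShard(int numberOfShards) {
        if (numberOfShards <= 0) {
            throw new IllegalArgumentException("numberOfShards must be positive");
        }
        // floorMod keeps the result non-negative after the counter overflows.
        return Math.floorMod(COUNTER.getAndIncrement(), numberOfShards);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(nextShard(3));
        }
    }
}
```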

Contributor

jpountz commented Sep 3, 2014

@alexksikes I left some comments

@jpountz jpountz removed the review label Sep 3, 2014
@alexksikes
Contributor Author

@jpountz Thanks for the comments. We should decide whether to allow dynamic mappings, and if not, what would be the easiest way to implement that? I'd be in favor of disabling dynamic mapping and only returning the term vectors for the fields found in the original mapping, because there is just too much room for mistakes and unintended behavior. Maybe @clintongormley has some ideas?

that is for documents not present in the index. The syntax is similar to the
<<search-percolate,percolator>> API. For example, the following request would
return the same results as in example 1. The mapping used is determined by the
`index` and `type`.
Contributor

Can you leave a note about the fact that it can introduce new mappings?

Contributor

jpountz commented Sep 4, 2014

I left some comments but I think it is close

Contributor

jpountz commented Sep 4, 2014

LGTM

@jpountz jpountz removed the review label Sep 4, 2014
@alexksikes alexksikes closed this in 07d741c Sep 5, 2014
@alexksikes alexksikes deleted the feature/termvector-docs branch September 5, 2014 05:54
alexksikes added a commit that referenced this pull request Sep 5, 2014
Closes #7530
alexksikes added a commit that referenced this pull request Sep 8, 2014
Closes #7530
Mpdreamz added a commit that referenced this pull request Dec 12, 2014
@clintongormley clintongormley changed the title from "Term Vectors: Support for artificial documents" to "Support for artificial documents" Jun 6, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
@clintongormley clintongormley added the :Search/Search label (search-related issues that do not fall into other categories) and removed the :Term Vectors label Feb 14, 2018
Labels
>enhancement, :Search/Search (search-related issues that do not fall into other categories), v1.4.0.Beta1, v2.0.0-beta1

3 participants