Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to specify the analyzer used for each Field #6329

Closed

Conversation

alexksikes
Copy link
Contributor

When using multiple items, the user may want to specify which analyzer to use
for each field. Previously, either the analyzer specified by 'analyzer' would
be used for all the fields, or if not set, the analyzer associated with the
field would be chosen. This commit provides the ability to fine grain which
analyzer should be used for each field by providing a new 'fields_analyzer'
parameter to the More Like This Query.

…or each field.

When using multiple items, the user may want to specify which analyzer to use
for each field. Previously, either the analyzer specified by 'analyzer' would
be used for all the fields, or if not set, the analyzer associated with the
field would be chosen. This commit provides the ability to fine grain which
analyzer should be used for each field by providing a new 'fields_analyzer'
parameter to the More Like This Query.
fieldsAnalyzer = Maps.newHashMap();
while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
String field = parseContext.indexName(parser.text());
parser.nextToken();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we assert here that the the next token is actually a value?

MoreLikeThisQuery mlt = new MoreLikeThisQuery();
mlt.setMoreLikeFields(new String[] {fieldName});
mlt.setLikeText(likeText);
mlt.setAnalyzer(mltQuery.getAnalyzer());
if (fieldAnalyzers != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I only have a style problem here. IMO this is harder than it needs to be though... IMO we should require that fieldAnalyzers is non-null ie. use Collections.emptyMap() as a default. then we can just do this:

Analyzer analyzer = fieldAnalyzer.get(fieldName);
if (analyzer == null) {
  analyzer = defaultAnalyzer;
}
mlt.setAnalyzer(analyzer);

I think the mlt.setAnalyzer(mltQuery.getAnalyzer()); is obsolete then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the analyzer of the fields for each item always default to the one associated with the field, regardless of the value of analyzer. Essentially this makes analyzer the analyzer of the like_text only. Previously we would be using analyzer for all fields if specified. So what should be the desired behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, today we can specify an analyzer that overrides everything. If it is set it can only be beaten by a fieldAnalyzer associated with a field. If the analyzer is not set we use the default analyzer. IMO what I suggested is equivalent with what you had? do I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we pass the var analyzer to addMoreLikeThis instead of DefaultAnalyzer then it is almost right, only that I wanted to have that if field_analyzers is specified (even if empty) then the default analyzer is always overridden by the one associated with the field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it sorry :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the user specifies analyzer and some fields in field_analyzers. What should be the default analyzer associated to the unspecified fields in field_analyzers? Should it be the default for the field or the one specified by analyzer? Maybe @clintongormley has some ideas.

@s1monw s1monw removed the review label Jun 12, 2014
@s1monw
Copy link
Contributor

s1monw commented Jun 12, 2014

left a small comment - it's close

@s1monw
Copy link
Contributor

s1monw commented Jun 18, 2014

LGTM

@alexksikes
Copy link
Contributor Author

This should be integrated to the term vector APIs? Closing for now.

@alexksikes alexksikes closed this Aug 23, 2014
@clintongormley clintongormley changed the title More Like This Query: Adds the ability to specify the analyzer used for each Field Add the ability to specify the analyzer used for each Field Jun 6, 2015
@clintongormley clintongormley added the :Search/Search Search-related issues that do not fall into other categories label Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants