New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the ability to specify the analyzer used for each Field #6329
Conversation
…or each field. When using multiple items, the user may want to specify which analyzer to use for each field. Previously, either the analyzer specified by 'analyzer' would be used for all the fields, or if not set, the analyzer associated with the field would be chosen. This commit provides the ability to fine grain which analyzer should be used for each field by providing a new 'fields_analyzer' parameter to the More Like This Query.
fieldsAnalyzer = Maps.newHashMap(); | ||
while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) { | ||
String field = parseContext.indexName(parser.text()); | ||
parser.nextToken(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we assert here that the the next token is actually a value?
MoreLikeThisQuery mlt = new MoreLikeThisQuery(); | ||
mlt.setMoreLikeFields(new String[] {fieldName}); | ||
mlt.setLikeText(likeText); | ||
mlt.setAnalyzer(mltQuery.getAnalyzer()); | ||
if (fieldAnalyzers != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I only have a style problem here. IMO this is harder than it needs to be though... IMO we should require that fieldAnalyzers
is non-null ie. use Collections.emptyMap()
as a default. then we can just do this:
Analyzer analyzer = fieldAnalyzer.get(fieldName);
if (analyzer == null) {
analyzer = defaultAnalyzer;
}
mlt.setAnalyzer(analyzer);
I think the mlt.setAnalyzer(mltQuery.getAnalyzer());
is obsolete then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the analyzer of the fields for each item always default to the one associated with the field, regardless of the value of analyzer
. Essentially this makes analyzer
the analyzer of the like_text
only. Previously we would be using analyzer
for all fields if specified. So what should be the desired behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, today we can specify an analyzer that overrides everything. If it is set it can only be beaten by a fieldAnalyzer associated with a field. If the analyzer is not set we use the default analyzer. IMO what I suggested is equivalent with what you had? do I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we pass the var analyzer
to addMoreLikeThis
instead of DefaultAnalyzer
then it is almost right, only that I wanted to have that if field_analyzers
is specified (even if empty) then the default analyzer is always overridden by the one associated with the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it sorry :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose the user specifies analyzer
and some fields in field_analyzers
. What should be the default analyzer associated to the unspecified fields in field_analyzers
? Should it be the default for the field or the one specified by analyzer
? Maybe @clintongormley has some ideas.
left a small comment - it's close |
LGTM |
This should be integrated to the term vector APIs? Closing for now. |
When using multiple items, the user may want to specify which analyzer to use
for each field. Previously, either the analyzer specified by 'analyzer' would
be used for all the fields, or if not set, the analyzer associated with the
field would be chosen. This commit provides the ability to fine grain which
analyzer should be used for each field by providing a new 'fields_analyzer'
parameter to the More Like This Query.