Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlighting on a wildcard field name uses the same highlighter for all fields that match #5175

Closed
ccw-morris opened this issue Feb 19, 2014 · 2 comments · Fixed by #5223
Closed

Comments

@ccw-morris
Copy link

HighlightPhase.hitExecute(SearchContext context, HitContext hitContext)

Assume that the field object represents a wildcard field (i.e. field.field() returns "*")
Assume that the index contains two fields:
field_1 with "index_options":"freq"
field_2 with "index_options":"offsets"
Assume that a highlighter has not been explicitly specified (i.e. field.highlighterType()==null)

The object fieldNamesToHighlight will contain both of the above field names.
The first pass through the for (String fieldName : fieldNamesToHighlight) loop will evaluate the statement: if (field.highlighterType() == null){} to true.
The block will evaluate the index_options of the first field (from a HashSet, so essentially random choice) to determine the appropriate highlighter. The highlighter will then be set using field.highlighterType("postings").

Subsequent executions of the fieldNamesToHighlight for loop will evaluate the statement: if (field.highlighterType() == null){} to false, based on the settings from the first execution.

Result: The highlighter chosen for the (arbitrary) first field will be used for all subsequent fields. If the selected highlighter is not the plain highlighter then there is the potential for the code to throw an IllegalArgumentException due to a mismatch between the highlighter and the indexing options of the field.

The solution is to treat the user provided configuration (the field object) as immutable. For each iteration of the fieldNamesToHighlight loop, extract field.highlighterType() to a local variable. Test that local variable for null - and set to a selected highlighter as appropriate.

@ccw-morris
Copy link
Author

For completeness, I can reproduce it with the following commands:

curl -XPUT 'http://localhost:9202/test' -d '{ "mappings": { "document": { "properties": { "field1": { "type":"string" }, "field2": { "type":"string", "index_options":"offsets" } } } } }'
curl -XPUT 'http://localhost:9202/test/document/1' -d '{ "field2":"This is the first field", "field1":"This is the second field" }'
curl -XPOST 'http://localhost:9202/_search' -d '{ "query": { "term": { "field2": "field" } }, "highlight": { "fields": { "field*": {} } } }'

However, please note that the bug relies on a specific ordering of a HashSet of String objects (field names). To reproduce it you may have to modify which field name has postings. Also, you may have to modify the regular expression that selects the fields: "field_" and "_" seems to produce different results. On production I can get the error with "*", but not with the test scripts above.

@javanna javanna self-assigned this Feb 19, 2014
@javanna
Copy link
Member

javanna commented Feb 20, 2014

Great catch @ccw-morris , will fix this!

@javanna javanna added the bug label Feb 20, 2014
javanna added a commit that referenced this issue Feb 24, 2014
…oneously updating it, as it doesn't necessarily map to a single field

A Field instance can map to multiple actual fields when using wildcard expressions. Each actual field should use the proper highlighter depending on the available data structure (e.g. term_vectors), while we currently select the highlighter for the first field and we keep using the same for all the fields that match the wildcard expression.

Modified also how the PercolateContext sets the forceSource option, in a global manner now rather than per field.

Closes #5175
javanna added a commit to javanna/elasticsearch that referenced this issue Feb 24, 2014
…oneously updating it, as it doesn't necessarily map to a single field

A Field instance can map to multiple actual fields when using wildcard expressions. Each actual field should use the proper highlighter depending on the available data structure (e.g. term_vectors), while we currently select the highlighter for the first field and we keep using the same for all the fields that match the wildcard expression.

Modified also how the PercolateContext sets the forceSource option, in a global manner now rather than per field.

Closes elastic#5175
javanna added a commit that referenced this issue Feb 24, 2014
…oneously updating it, as it doesn't necessarily map to a single field

A Field instance can map to multiple actual fields when using wildcard expressions. Each actual field should use the proper highlighter depending on the available data structure (e.g. term_vectors), while we currently select the highlighter for the first field and we keep using the same for all the fields that match the wildcard expression.

Modified also how the PercolateContext sets the forceSource option, in a global manner now rather than per field.

Closes #5175
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…oneously updating it, as it doesn't necessarily map to a single field

A Field instance can map to multiple actual fields when using wildcard expressions. Each actual field should use the proper highlighter depending on the available data structure (e.g. term_vectors), while we currently select the highlighter for the first field and we keep using the same for all the fields that match the wildcard expression.

Modified also how the PercolateContext sets the forceSource option, in a global manner now rather than per field.

Closes elastic#5175
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…oneously updating it, as it doesn't necessarily map to a single field

A Field instance can map to multiple actual fields when using wildcard expressions. Each actual field should use the proper highlighter depending on the available data structure (e.g. term_vectors), while we currently select the highlighter for the first field and we keep using the same for all the fields that match the wildcard expression.

Modified also how the PercolateContext sets the forceSource option, in a global manner now rather than per field.

Closes elastic#5175
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants