Remove the lowercase_expanded_terms and locale options from (simple_)query_string. #19057

@jpountz jpountz commented Jun 24, 2016

This pull request uses the MultiTermAwareComponent interface to figure out
how to deal with queries that match partial strings. This provides a
better out-of-the-box experience and allows us to remove the
lowercase_expanded_terms and locale (which was only used for lowercasing)
options.

Things are expected to work well for custom analyzers. However, built-in
analyzers make it challenging to know which components should be kept for
multi-term analysis. The way it is implemented today is that there is a default
implementation that returns a lowercasing analyzer, which should be fine for
most language analyzers for European languages. I did not want to go crazy
with configuring the correct multi-term analyzer for those until we have a way
to test that we are in sync with what happens in Lucene, like we do for testing
which factories need to implement MultiTermAwareComponent.
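
To illustrate the idea, here is a minimal Python sketch of multi-term analysis. It is not the Elasticsearch or Lucene implementation: the `Component` class, the `multi_term_aware` flag (standing in for MultiTermAwareComponent), the toy stemmer, and the function names are all hypothetical. It only shows why a partial string such as a prefix should go through the lowercasing step but skip steps like stemming.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str
    apply: Callable[[str], str]
    # Hypothetical flag standing in for MultiTermAwareComponent.
    multi_term_aware: bool


def naive_stem(token: str) -> str:
    # Toy stemmer that strips a trailing "s"; not a real stemming algorithm.
    return token[:-1] if token.endswith("s") else token


# A made-up analysis chain: lowercasing is safe for partial strings,
# stemming is not.
CHAIN: List[Component] = [
    Component("lowercase", str.lower, True),
    Component("stem", naive_stem, False),
]


def analyze(text: str, chain: List[Component]) -> str:
    # Full analysis: every component is applied in order.
    for component in chain:
        text = component.apply(text)
    return text


def analyze_multi_term(text: str, chain: List[Component]) -> str:
    # Multi-term analysis: only components flagged as safe are applied.
    for component in chain:
        if component.multi_term_aware:
            text = component.apply(text)
    return text
```

With this chain, the prefix `Cats` becomes `cats` under multi-term analysis, whereas full analysis would turn it into `cat` and make the prefix match unrelated terms such as `catalog`.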

In the future we could consider removing analyze_wildcards as well, but the
query parser currently has the ability to tokenize the wildcard text and
generate a term query for the first n-1 tokens and a wildcard query on the last
token. I suspect some users are relying on this behaviour, so I think this
should be explored in a separate change.
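
The tokenizing behaviour described above can be sketched roughly as follows. This is an assumption-laden illustration, not the query parser's code: `split_wildcard_clause` is a hypothetical name, lowercasing plus whitespace splitting stands in for a real analyzer, and how the resulting clauses are combined into one query is left out because it is not stated here.

```python
from typing import List, Tuple


def split_wildcard_clause(text: str) -> List[Tuple[str, str]]:
    """Return (query_type, text) pairs for a piece of wildcard query text."""
    # Stand-in for a real analyzer: lowercase, then split on whitespace.
    tokens = text.lower().split()
    if not tokens:
        return []
    # The first n-1 tokens become plain term queries.
    clauses = [("term", token) for token in tokens[:-1]]
    # The last token keeps its wildcard and becomes a wildcard query.
    clauses.append(("wildcard", tokens[-1]))
    return clauses
```

For example, `split_wildcard_clause("Quick Brown fo*")` yields term clauses for `quick` and `brown` followed by a wildcard clause for `fo*`.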

Closes #9978

jpountz commented Jun 30, 2016

I have been working on https://issues.apache.org/jira/browse/LUCENE-7355 on the Lucene side, which would help simplify this PR considerably. So I am stalling this PR until LUCENE-7355 is resolved.

jpountz commented Aug 29, 2016

Superseded by #20208

@jpountz jpountz closed this Aug 29, 2016
@jpountz jpountz deleted the feature/remove_lowercase_expanded_terms branch August 29, 2016 13:08
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories stalled