Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify analyzer settings #9371

Closed
rjernst opened this issue Jan 21, 2015 · 7 comments · Fixed by #9451
Closed

Simplify analyzer settings #9371

rjernst opened this issue Jan 21, 2015 · 7 comments · Fixed by #9451
Labels
>breaking :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta1

Comments

@rjernst
Copy link
Member

rjernst commented Jan 21, 2015

We currently have the following settings on text fields and some other types: analyzer, index_analyzer, search_analyzer, and search_quote_analyzer. I think we should simplify this by removing index_analyzer, and then add validation so that if you set one of these, you must also set the ones before it in the list. Otherwise you get the current trappy behavior of being able to specify search_analyzer and not index_analyzer, which leads to using the standard analyzer at index time. It also leaves the "normal" use case as setting just analyzer, and removes the weird case where you can set analyzer, index_analyzer and search_analyzer.

As an aside, I think we also need a mechanism to distinguish when parsing mappings between "this is an api call" and "we are reading old mappings from cluster state". With this flag, we can then modify the mappings on parse, so that the old settings disappear when upgrading. AFAIK right now, we only have the indexVersionCreated() which only says on what version the index was created, but updates could still happen for that index and the version would not change.

@rjernst rjernst added :Search/Mapping Index mappings, including merging and defining field types discuss v2.0.0-beta1 >breaking and removed >breaking labels Jan 21, 2015
@clintongormley
Copy link

Previously, setting just search_analyzer and leaving index_analyzer blank, would have caused the index analyzer to use the "default" at index time, where the default could have been specified in the _analyzer field.

Given that we're planning on removing that in #9279, I'm +1 on this change.

@colings86
Copy link
Contributor

index_analyzer is a useful name since it is explicit about what it applies to. I think we should keep the settings as they are but throw an exception if the analyzer setting is used with any of the other settings (index_analyzer, search_analyzer, search_quote_analyzer). This removes the trappy behaviours since if you are doing something complex you have to set each option explicitly but allows the simple use cases where you want to set an analyzer for all situations in one go. It also means that we don't negate the point of having a default analyzer in the mappings so you can still set e.g. the index_analyzer to be blank and it will use the default analyzer. Having the three names explicit means that its clear which analyzer applies to which situation (index, search or lucene phrase)

@rmuir
Copy link
Contributor

rmuir commented Jan 23, 2015

@colings86 That does not remove all the trappy behaviors. There are two traps: one is the "defaults" if you leave something blank (see clinton's example), the other is using 'analyzer' in combination with index/search analyzer.

+1 to ryan's proposal. I'm -1 to any solution that doesnt solve both the traps.

@colings86
Copy link
Contributor

@rmuir sorry, I think what I wrote didn't explain completely what my thinking is. What I propose is that we leave the names of the options as they are (because the names are currently clear on what they affect) but we add validation that says:

  1. If you specify analyzer you can't specify index_analyzer, search_analyzer or search_quote_analyzer as well. Doing so will throw a parse exception
  2. You must specify index_analyzer and search_analyzer together or we throw a parse exception
  3. If you specify search_quote_analyzer you must specify search_analyzer (and because of 2. you must also specify index_analyzer) or we throw a parse exception

Thinking about it, I don't think it's a good idea to let people set either search_analyzer or index_analyzer to the default analyzer since you need to be careful about these settings and don't want an unexpected change to the default analyzer to screw up these settings. But I think the above rules still apply

@rmuir
Copy link
Contributor

rmuir commented Jan 23, 2015

I agree, with those rules it can also solve the problems.

However, i think its a more complicated set of rules to enforce, just to support exotic cases. Either way, as long as the 'analyzer' case is easy, i am fine.

@rjernst
Copy link
Member Author

rjernst commented Jan 23, 2015

@colings86 Yes, those rules will "work" for validation, but I agree with Robert that it is just added complication. It also makes the system more difficult on the user. Imagine a typical case where you have set your analyzer, and now you want to add some query time synonyms. You go to set your search_analyzer, and now you have to remove analyzer, add index_analyzer and add search_analyzer. But why? You won't be changing the analyzer for indexing, you would only be moving it from one setting name to another. Now instead, with what i proposed, you would only add search_analyzer and you are done. Anyone who is advanced enough to understand they need to set a search_analyzer can understand that the generically named analyzer is the general analyzer used, and setting search_analyzer is an override of the analyzer at search time, and likewise, setting search_quote_analyzer is an override of the search analyzer at search time. This build up pattern is easy to understand, and easy on the user to use.

@clintongormley
Copy link

I like the clarity of the name index_analyzer, but I do find @rjernst's logic compelling and easier to explain. It also solves the issue we have currently that you can set the analyzer, but if you retrieve the mapping you get back the index_analyzer and the search_analyzer.

I think I'm opting for analyzer < search_analyzer < search_quote_analyzer.

@rjernst rjernst removed the discuss label Jan 28, 2015
rjernst added a commit to rjernst/elasticsearch that referenced this issue Jan 28, 2015
The `analyzer` setting is now the base setting, and `search_analyzer`
is simply an override of the search time analyzer.  When setting
`search_analyzer`, `analyzer` must be set.

closes elastic#9371
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants