Remove unsupported postings_format / doc_values_format #7604
+1 |
So how does this work on the read side if somebody used one of these postings formats before? We removed the codecs JAR entirely; will we still be able to read old indices created with Lucene < 4.8? I think that can be very tricky though - I am not sure whether all the old postings formats and DV formats are in core. |
The codecs jar only contained experimental codecs for which Lucene doesn't maintain backward compatibility, so I think it's fine? Users who happened to use one of these non-default codecs would need to either reindex or switch their indices to the default codec and trigger a merge before upgrading. All the old postings/dv formats are in core today (but will move to a module in 4.11 that we can add a dependency on when we upgrade to Lucene 4.11: https://issues.apache.org/jira/browse/LUCENE-5858). |
Right, users who use the default (back compat supported) codecs will be fine: those are in core (moving to a separate JAR in 5.0). But non-back-compat codecs (e.g. bloom_pulsing, pulsing) won't be recognized anymore, which I think is OK? (Better than the "false corruption" we saw on #7238.)
Hmm do we allow changing the postings_format / doc_values_format in the mapping for a field after it's created? Or is that "write once"? |
you can change it via the update mapping API |
It can be updated, see |
OK that's great, so there is a migration path. |
++ just double checking... |
LGTM |
Lucene's experimental codecs (from the codecs module) do not provide backwards compatibility and are free to change from release to release. When they do change, they typically cannot read older indices, and the resulting exceptions look like index corruption. So, we are removing built-in support for them to prevent applications from choosing one and then seeing strange exceptions on upgrade. Closes #7566 Closes #7604
Today ES allows you to pick e.g. "pulsing", but this is very dangerous because that format, and all other postings/doc values formats from the Lucene codecs module, has no backwards compatibility support in Lucene. So on upgrade you can easily hit strange exceptions that make your index unusable / look like index corruption.
So I removed lucene-codecs JAR entirely from ES, which e.g. removes direct, simple text, memory PF, Lucene's BloomFilteringPF, and disk/memory DVF.
I haven't verified, but I think users can still put the Lucene codecs JAR onto ES's CLASSPATH (e.g. with a plugin) and then use these formats in their own apps (at their own risk). I think this extra step is better than the ease with which users can select these Lucene-unsupported formats today.
See #7566 and #7238
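The migration path discussed in the conversation (switch affected fields back to the default format via the update mapping API, then trigger a merge before upgrading) might look roughly like the sketch below. It only builds the requests and sends nothing. The helper name `build_migration_requests`, the index/type/field names, and the choice of `_optimize?max_num_segments=1` as the way to force a merge are illustrative assumptions, not something this PR prescribes; check the endpoints against the ES version in use.

```python
import json


def build_migration_requests(index, doc_type, fields):
    """Build (method, path, body) tuples for a hypothetical migration.

    `fields` maps field name -> field type, e.g. {"title": "string"}.
    Assumption: each listed field currently uses a non-default
    postings_format / doc_values_format and should go back to "default".
    """
    properties = {}
    for name, field_type in fields.items():
        properties[name] = {
            "type": field_type,
            # Switch both settings back to the back-compat-supported default.
            "postings_format": "default",
            "doc_values_format": "default",
        }

    mapping_body = json.dumps({doc_type: {"properties": properties}}, sort_keys=True)

    return [
        # Update mapping API: new segments are written with the default format.
        ("PUT", "/%s/_mapping/%s" % (index, doc_type), mapping_body),
        # Assumed way to force a merge so old segments get rewritten.
        ("POST", "/%s/_optimize?max_num_segments=1" % index, ""),
    ]


if __name__ == "__main__":
    for method, path, body in build_migration_requests("myindex", "doc", {"title": "string"}):
        print(method, path, body)
```

Whether a single forced merge rewrites every existing segment depends on the state of the index, so reindexing remains the conservative option.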