
Remove unsupported postings_format / doc_values_format #7604

Closed
wants to merge 1 commit into from

Conversation

mikemccand
Contributor

Today ES allows you to pick, e.g., "pulsing", but this is very dangerous because that format, like all other postings/doc values formats from the Lucene codecs module, has no backwards-compatibility support in Lucene. So on upgrade you can easily hit strange exceptions that make your index unusable or look like index corruption.

So I removed the lucene-codecs JAR from ES entirely, which removes, e.g., the direct, simple text, and memory postings formats (PF), Lucene's BloomFilteringPF, and the disk/memory doc values formats (DVF).

I haven't verified this, but I think users can still put the Lucene codecs JAR onto ES's CLASSPATH (e.g. via a plugin) and then use these formats in their own apps (at their own risk). I think this extra step is better than how easily users can select these formats today, even though Lucene doesn't support them.

See #7566 and #7238
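This PR takes the blunt route of removing the JAR, so these format names simply stop resolving. The earlier title ("Don't allow selecting unsupported postings_format / doc_values_format") points at a complementary fail-fast idea: reject such names when the mapping is parsed, not at upgrade time. A minimal sketch of that idea follows. The class `FormatAllowList` is hypothetical, and the allow-list containing only `default` is an assumption. The exact set of formats ES keeps is not spelled out here.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical fail-fast check (not Elasticsearch code): reject a field mapping
 * whose postings_format or doc_values_format is not on an allow-list of
 * formats that Lucene keeps backwards compatible.
 */
public final class FormatAllowList {

    // Assumption: only "default" is treated as back-compat supported here.
    static final Set<String> SUPPORTED = Set.of("default");

    static void validate(String field, Map<String, String> fieldMapping) {
        for (String key : new String[] {"postings_format", "doc_values_format"}) {
            String format = fieldMapping.get(key);
            // Absent settings fall back to the default codec, so only explicit values are checked.
            if (format != null && !SUPPORTED.contains(format)) {
                throw new IllegalArgumentException("field [" + field + "]: " + key
                        + " [" + format + "] has no backwards-compatibility guarantee");
            }
        }
    }

    public static void main(String[] args) {
        validate("title", Map.of("postings_format", "default")); // accepted silently
        try {
            validate("body", Map.of("postings_format", "pulsing"));
        } catch (IllegalArgumentException e) {
            // Fails at mapping time instead of surfacing later as "corruption" on upgrade.
            System.out.println(e.getMessage());
        }
    }
}
```

The design choice is to fail when the user makes the choice, which is cheap to fix, rather than when a newer Lucene cannot read the segments.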

@jpountz
Contributor

jpountz commented Sep 4, 2014

+1

@jpountz jpountz removed the review label Sep 4, 2014
@s1monw
Contributor

s1monw commented Sep 5, 2014

So how does this work on the read side if somebody used one of these postings formats before? Since we removed the codecs JAR entirely, will we still be able to read old indices created with Lucene < 4.8? I think that can be very tricky, though; I am not sure whether all the old postings formats and DV formats are in core.

@jpountz
Contributor

jpountz commented Sep 5, 2014

The codecs jar only contained experimental codecs for which Lucene doesn't maintain backward compatibility, so I think it's fine? Users who happened to use one of these non-default codecs would need to either reindex or switch their indices to the default codec and trigger a merge before upgrading. All the old postings/DV formats are in core today (but they will move to a module in 4.11, which we can add a dependency on when we upgrade to Lucene 4.11: https://issues.apache.org/jira/browse/LUCENE-5858).

@mikemccand
Contributor Author

Right, users who use the default (back-compat supported) codecs will be fine: those are in core (moving to a separate JAR in 5.0). But non-back-compat codecs (e.g. bloom_pulsing, pulsing) won't be recognized anymore, which I think is OK? (Better than the "false corruption" we saw in #7238.)

> either reindex or switch their indices to the default codec and trigger a merge before upgrading.

Hmm do we allow changing the postings_format / doc_values_format in the mapping for a field after it's created? Or is that "write once"?

@s1monw
Contributor

s1monw commented Sep 5, 2014

> Hmm do we allow changing the postings_format / doc_values_format in the mapping for a field after it's created? Or is that "write once"?

you can change it via the update mapping API

@jpountz
Contributor

jpountz commented Sep 5, 2014

> Hmm do we allow changing the postings_format / doc_values_format in the mapping for a field after it's created? Or is that "write once"?

It can be updated, see AbstractFieldMapper.merge.

@mikemccand
Copy link
Contributor Author

> you can change it via the update mapping API

> It can be updated, see AbstractFieldMapper.merge.

OK that's great, so there is a migration path.
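As a rough sketch of that migration path, assume an Elasticsearch 1.x cluster and the REST endpoints of that era (`PUT /{index}/_mapping/{type}` and `POST /{index}/_optimize`). The hypothetical helper below only prints the two requests: first switch the field back to the `default` postings format, then force a merge so segments get rewritten with it. The index, type, and field names are made up. A field with a non-default `doc_values_format` would need the same treatment for that setting. A force merge may leave segments that are already fully merged untouched, so reindexing remains the more certain route.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Elasticsearch): builds the two REST calls of
 * the migration path discussed above, assuming an Elasticsearch 1.x cluster.
 */
public final class CodecMigrationPlan {

    static List<String> plan(String index, String type, String field, String fieldType) {
        // Step 1: update the mapping so the field uses the back-compat "default" postings format.
        String mappingBody = String.format(
                "{\"%s\":{\"properties\":{\"%s\":{\"type\":\"%s\",\"postings_format\":\"default\"}}}}",
                type, field, fieldType);
        // Step 2: force a merge so existing segments are rewritten with the new format.
        return List.of(
                "PUT /" + index + "/_mapping/" + type + " " + mappingBody,
                "POST /" + index + "/_optimize?max_num_segments=1");
    }

    public static void main(String[] args) {
        // Made-up index/type/field names for illustration only.
        for (String request : plan("logs", "event", "message", "string")) {
            System.out.println(request);
        }
    }
}
```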

@s1monw
Copy link
Contributor

s1monw commented Sep 5, 2014

++ just double checking...

@s1monw
Copy link
Contributor

s1monw commented Sep 5, 2014

LGTM

@mikemccand mikemccand closed this in 130fdef Sep 8, 2014
mikemccand added a commit that referenced this pull request Sep 8, 2014
Lucene's experimental codecs (from the codecs module) do not provide
backwards compatibility and are free to change from release to
release.  When they do change, they typically cannot read older
indices, and the resulting exceptions look like index corruption.
So, we are removing built-in support for them to prevent applications
from choosing one and then seeing strange exceptions on upgrade.

Closes #7566

Closes #7604
@clintongormley clintongormley changed the title Don't allow selecting unsupported postings_format / doc_values_format Mapping: Remove unsupported postings_format / doc_values_format Sep 8, 2014
@clintongormley clintongormley changed the title Mapping: Remove unsupported postings_format / doc_values_format Remove unsupported postings_format / doc_values_format Jun 6, 2015
@clintongormley clintongormley added the :Search/Mapping Index mappings, including merging and defining field types label Jun 6, 2015
Labels
>breaking >bug :Search/Mapping Index mappings, including merging and defining field types v1.3.3 v1.4.0.Beta1 v2.0.0-beta1

4 participants