Upgrade to Lucene 4.8.0 #5932
Lucene 4.8 included several bugfixes and API improvements. The most relevant for Elasticsearch are:

* LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
* LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as well as a boolean that indicates if a new merge was found in the caller thread before the scheduler was called.
* LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method) from normal scoring (Weight.scorer) for those queries that can do bulk scoring more efficiently, e.g. BooleanQuery in some cases. This also simplified the Weight.scorer API by removing the two confusing booleans.
* LUCENE-5497: HunspellStemFilter properly handles escaped terms and affixes without conditions.
* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads all known openoffice dictionaries without error, and supports an additional longestOnly option for a less aggressive approach.
* LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries and handles varying types of whitespace in SET/FLAG commands.
* LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with large amounts of aliases etc. before the encoding declaration.
* LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order.
* LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile for NIOFSDirectory and MMapDirectory. This allows open files to be deleted on Windows if NIOFSDirectory is used; mmapped files are still locked.

The changes to WordDelimiterFilter cause a behavior change in the next Elasticsearch release. The previous version of this token filter will still be available for indices that were created in a previous version or with the corresponding Lucene version.
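To illustrate the LUCENE-4848 item, here is a minimal sketch of reading a file through the Java 7 NIO2 `FileChannel.open` API rather than through `RandomAccessFile`. It is not Lucene's actual NIOFSDirectory code; the class name `Nio2ReadSketch`, the helper `readAt`, and the temp file contents are made up for illustration, and only JDK classes are used.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Lucene or Elasticsearch.
public class Nio2ReadSketch {

    // Reads up to `length` bytes starting at `position` with positional reads.
    // The channel is opened via the NIO2 factory method instead of being
    // obtained from a RandomAccessFile.
    public static byte[] readAt(Path file, long position, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long pos = position;
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, pos); // does not change the channel's own position
                if (n < 0) {
                    break; // reached end of file before filling the buffer
                }
                pos += n;
            }
            buffer.flip();
            byte[] out = new byte[buffer.remaining()];
            buffer.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio2-sketch", ".bin");
        try {
            Files.write(file, "hello lucene".getBytes(StandardCharsets.UTF_8));
            byte[] slice = readAt(file, 6, 6);
            System.out.println(new String(slice, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The sketch only shows the API shape. Whether an open file can be deleted still depends on the platform and on the directory implementation, as noted in the list above.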
@s1monw is there any estimate of when we can expect an official Elasticsearch release based on Lucene 4.8.0 that supports all the Hunspell improvements? ATM this PR is assigned to v1.2.0 and above.
@lukas-vlcek it will be released with ES We can't backport this 4.8 dependency to
@s1monw yes it does, thanks. (Though it does not address the question of a time estimate: days, weeks?)
We don't have a fixed release date, but I am pretty sure it will happen in the next 4 weeks.