Upgrade to Lucene 4.8.0 #5932

s1monw · 2014-04-24T13:37:56Z

Lucene 4.8 included several bugfixes and API Improvements. The most relevant for Elasticsearch are:

* LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
* LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as well as a boolean that indicates if a new merge was found in the caller thread before the scheduler was called.
* LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method) from normal scoring (Weight.scorer) for those queries that can do bulk scoring more efficiently, e.g. BooleanQuery in some cases. This also simplified the Weight.scorer API by removing the two confusing booleans.
* LUCENE-5497: HunspellStemFilter properly handles escaped terms and affixes without conditions.
* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads all known openoffice dictionaries without error, and supports an additional longestOnly option for a less aggressive approach.
* LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries and handles varying types of whitespace in SET/FLAG commands.
* LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with large amounts of aliases etc. before the encoding declaration.
* LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order.
* LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile for NIOFSDirectory and MMapDirectory. This allows deleting open files on Windows if NIOFSDirectory is used; mmapped files are still locked.
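
To illustrate the API shift behind LUCENE-4848, here is a minimal sketch of a positional read through a Java 7 NIO2 `FileChannel`. It is not Lucene's implementation: the class name `Nio2ReadSketch` and the helper `readAt` are hypothetical, and the example only shows the standard JDK calls that take the place of a `RandomAccessFile`-based read.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene or Elasticsearch.
public class Nio2ReadSketch {

    /** Reads exactly len bytes starting at pos, using positional reads on a FileChannel. */
    static byte[] readAt(Path file, long pos, int len) throws IOException {
        // FileChannel.open is the NIO2 entry point; no RandomAccessFile is involved.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long offset = pos;
            while (buf.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int n = channel.read(buf, offset);
                if (n < 0) {
                    throw new EOFException("reached end of file at offset " + offset);
                }
                offset += n;
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio2-sketch", ".bin");
        try {
            Files.write(tmp, "hello lucene".getBytes(StandardCharsets.UTF_8));
            byte[] slice = readAt(tmp, 6, 6);
            System.out.println(new String(slice, StandardCharsets.UTF_8)); // prints "lucene"
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Positional reads of this kind do not depend on a shared file pointer, so several threads can read from one channel without seeking.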

The changes to WordDelimiterFilter cause a behavior change in the next Elasticsearch release. The previous version of this token filter will still be available for indices created with a previous version or with the corresponding Lucene version.

lukas-vlcek · 2014-04-25T06:33:35Z

@s1monw is there an estimate of when we can expect an official Elasticsearch release based on Lucene 4.8.0 that supports all the Hunspell improvements? At the moment this PR is assigned to v1.2.0 and above.

s1monw · 2014-04-25T06:59:48Z

@lukas-vlcek it will be released with ES 1.2.0. This will likely happen in the near future, but since we are moving with Lucene to Java 1.7 as the minimum requirement, we are stretching the release cycle a little longer than in the past. The Lucene 4.8 release is more about robustness, since it now has checksumming on the FS level (#5924), which we need / want to integrate before the release is due.

We can't backport this 4.8 dependency to 1.1.2, since it requires Java 1.7 whereas Java 1.6 is the minimum requirement on 1.1.1. I hope this answers your question.
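
As a rough illustration of what file-level checksumming means in general, the sketch below computes a CRC32 over a byte array and verifies it later. This is an assumption-laden example, not Lucene's on-disk format: the class `ChecksumSketch` and its methods are hypothetical, and it uses only the JDK's `java.util.zip.CRC32`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration class; it does not reproduce Lucene's file format.
public class ChecksumSketch {

    /** Computes a CRC32 checksum over the given bytes. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Returns true if the data still matches the checksum recorded at write time. */
    static boolean verify(byte[] data, long expected) {
        return checksum(data) == expected;
    }

    public static void main(String[] args) {
        byte[] original = "123456789".getBytes(StandardCharsets.US_ASCII);
        long stored = checksum(original); // recorded alongside the data when writing

        // Simulate corruption by flipping a single bit in a copy.
        byte[] corrupted = original.clone();
        corrupted[0] ^= 0x01;

        System.out.println(verify(original, stored));  // true
        System.out.println(verify(corrupted, stored)); // false
    }
}
```

Recording a checksum at write time and recomputing it at read time is what lets corruption be detected instead of silently propagating.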

lukas-vlcek · 2014-04-25T07:26:23Z

@s1monw yes it does, thanks. (Though it does not address the question of a time estimate: days, weeks?)

s1monw · 2014-04-25T07:33:51Z

We don't have a fixed release date, but I am pretty sure it will happen in the next 4 weeks.

Closes #5932

s1monw added 2 commits April 24, 2014 15:28

nocommit: use RC repository until Lucene 4.8.0 is released

b25e7c1

s1monw added v2.0.0 labels Apr 24, 2014

use this filter from lucene instead now

4bc5498

s1monw added the blocker label Apr 24, 2014

mikemccand and others added 2 commits April 24, 2014 12:25

remove dead code

d6c123d

add test to alert us to new analyzers in lucene

f836c32

s1monw self-assigned this Apr 24, 2014

Lucene 4.8 RC2

6abe464

rmuir added 2 commits April 28, 2014 06:32

pull 4.8 from maven central instead

70c503e

merge

c2b6a0a

rmuir closed this in 8e0a479 Apr 28, 2014

rmuir added a commit that referenced this pull request Apr 28, 2014

Upgrade to Lucene 4.8

18c9643

Closes #5932

ppat mentioned this pull request May 6, 2014

Upgrade to Java 7 as Lucene/ES has moved to Java 7 Philippus/elastic4s#146

Closed

s1monw deleted the enhancement/lucene_4_8_upgrade branch April 21, 2015 20:22

clintongormley added the >upgrade label Jun 7, 2015

clintongormley added :Core/Infra/Core Core issues without another label and removed :Core/Infra/Core Core issues without another label >enhancement labels Aug 25, 2015
