Upgrade to Lucene 4.8.0 #5932

Closed
wants to merge 8 commits into from

s1monw
Contributor

@s1monw s1monw commented Apr 24, 2014

Lucene 4.8 includes several bug fixes and API improvements. The most relevant for Elasticsearch are:

  • LUCENE-4747, LUCENE-5514: Move to Java 7 as minimum Java version.
  • LUCENE-5516: MergeScheduler#merge() now accepts a MergeTrigger as
    well as a boolean that indicates if a new merge was found in the
    caller thread before the scheduler was called.
  • LUCENE-5487: Separated bulk scorer (new Weight.bulkScorer method)
    from normal scoring (Weight.scorer) for those queries that can do
    bulk scoring more efficiently, e.g. BooleanQuery in some cases.
    This also simplified the Weight.scorer API by removing the two
    confusing booleans.
  • LUCENE-5497: HunspellStemFilter properly handles escaped terms and
    affixes without conditions.
  • LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also
    loads all known openoffice dictionaries without error, and supports
    an additional longestOnly option for a less aggressive approach.
  • LUCENE-5505: HunspellStemFilter ignores BOM markers in dictionaries
    and handles varying types of whitespace in SET/FLAG commands.
  • LUCENE-5507: Fix HunspellStemFilter loading of dictionaries with
    large amounts of aliases etc before the encoding declaration.
  • LUCENE-5111: Fix WordDelimiterFilter to return offsets in correct order.
  • LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile
    for NIOFSDirectory and MMapDirectory. This allows open files to be
    deleted on Windows if NIOFSDirectory is used; mmapped files are still
    locked.
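
To illustrate the LUCENE-4848 item above, here is a minimal sketch of reading a file through a Java 7 NIO.2 `FileChannel` rather than `RandomAccessFile`. It is not Lucene's code: the class name `ChannelReadSketch` and the helper `readAt` are hypothetical, and the sketch only shows the stdlib API (`FileChannel.open` with a positional `read`) that the changelog entry refers to.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; not taken from Lucene or Elasticsearch.
final class ChannelReadSketch {

    /** Reads up to len bytes starting at offset, using positional reads. */
    static byte[] readAt(Path file, long offset, int len) throws IOException {
        // FileChannel.open is the NIO.2 way to obtain a channel (Java 7+).
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long pos = offset;
            while (buf.hasRemaining()) {
                // Positional read: does not change the channel's own position.
                int n = channel.read(buf, pos);
                if (n < 0) {
                    break; // end of file reached before len bytes were read
                }
                pos += n;
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel-sketch", ".bin");
        try {
            Files.write(tmp, new byte[] {10, 20, 30, 40, 50});
            System.out.println(Arrays.toString(readAt(tmp, 1, 3))); // prints [20, 30, 40]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The positional `read(ByteBuffer, long)` overload is used here because it leaves the channel's position untouched, which suits random-access reads.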

The changes to WordDelimiterFilter cause a behavior change in the next Elasticsearch release. The previous version of this token filter will still be available for indices that were created in a previous version or with the corresponding Lucene version.
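
The version-dependent choice described above could look roughly like the following sketch. Everything in it is a hypothetical stand-in: `MatchVersion`, `FilterBehavior`, and `select` are invented for illustration and are not the real Lucene or Elasticsearch classes; the sketch only shows the idea of keeping the old behavior for indices created before the upgrade.

```java
// Hypothetical illustration only; these types are NOT the real Lucene/Elasticsearch API.
final class WordDelimiterSelectionSketch {

    /** Stand-in for the Lucene version an index was created with (assumed names). */
    enum MatchVersion {
        LUCENE_47, LUCENE_48;

        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    /** Stand-in labels for the two token filter behaviors (assumed names). */
    enum FilterBehavior {
        LEGACY_OFFSETS,    // behavior before the LUCENE-5111 fix
        CORRECTED_OFFSETS  // behavior with offsets returned in correct order
    }

    /** Picks the behavior from the version the index was created with. */
    static FilterBehavior select(MatchVersion indexCreatedWith) {
        return indexCreatedWith.onOrAfter(MatchVersion.LUCENE_48)
                ? FilterBehavior.CORRECTED_OFFSETS
                : FilterBehavior.LEGACY_OFFSETS;
    }

    public static void main(String[] args) {
        System.out.println(select(MatchVersion.LUCENE_47)); // prints LEGACY_OFFSETS
        System.out.println(select(MatchVersion.LUCENE_48)); // prints CORRECTED_OFFSETS
    }
}
```

Keying the decision on the version the index was created with means existing indices keep analyzing text the way they did when they were built.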

@s1monw s1monw added the blocker label Apr 24, 2014
@s1monw s1monw self-assigned this Apr 24, 2014
@lukas-vlcek
Contributor

@s1monw is there any estimate of when we can expect an official Elasticsearch release based on Lucene 4.8.0 that supports all the Hunspell improvements? ATM this PR is assigned to v1.2.0 and above.

@s1monw
Contributor Author

s1monw commented Apr 25, 2014

@lukas-vlcek it will be released with ES 1.2.0. This will likely happen in the near future, but since we are moving with Lucene to Java 1.7 as the minimum requirement, we are stretching the release cycle a little longer than in the past. The Lucene 4.8 release is more about robustness, since it now has checksumming at the FS level (#5924), which we need / want to integrate before the release is due.

We can't backport this 4.8 dependency to 1.1.2 since it requires Java 1.7, whereas Java 1.6 is the minimum requirement for 1.1.1. I hope this answers your question.

@lukas-vlcek
Contributor

@s1monw yes it does, thanks. (Though it does not address the question of a time estimate: days, weeks?)

@s1monw
Contributor Author

s1monw commented Apr 25, 2014

We don't have a fixed release date, but I am pretty sure it will happen in the next 4 weeks.

@rmuir rmuir closed this in 8e0a479 Apr 28, 2014
rmuir added a commit that referenced this pull request Apr 28, 2014
@s1monw s1monw deleted the enhancement/lucene_4_8_upgrade branch April 21, 2015 20:22
@clintongormley clintongormley added :Core/Infra/Core Core issues without another label and removed :Core/Infra/Core Core issues without another label >enhancement labels Aug 25, 2015