Lucene Change Log
For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions
======================= Lucene 8.0.0 =======================
API Changes
* LUCENE-8469: Deprecated StringHelper.compare has been removed. (Dawid Weiss)
* LUCENE-8039: Introduce a "delta distance" method set to GeoDistance. This
allows distance calculations, especially for paths, to take into account an
"excursion" to include the specified point.
* LUCENE-8007: Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are
now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq()
and Terms.getSumTotalTermFreq() are now required: if frequencies are not
stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(),
respectively, because all freq() values equal 1. (Adrien Grand, Robert Muir)
* LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed (Alan
Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been removed (Alan Woodward)
* LUCENE-7996: Queries are now required to produce positive scores.
(Adrien Grand)
* LUCENE-8099: CustomScoreQuery, BoostedQuery and BoostingQuery have been
removed (Alan Woodward)
* LUCENE-8012: Explanation now takes Number rather than float (Alan Woodward,
Robert Muir)
* LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document
scoring factors. (Adrien Grand)
* LUCENE-8113: TermContext has been renamed to TermStates, and can now be
constructed lazily if term statistics are not required (Alan Woodward)
* LUCENE-8242: Deprecated method IndexSearcher#createNormalizedWeight() has
been removed (Alan Woodward)
* LUCENE-8267: Memory codecs removed from the codebase (MemoryPostings,
MemoryDocValues). (Dawid Weiss)
* LUCENE-8144: Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
(Nhat Nguyen via Adrien Grand)
* LUCENE-8356: StandardFilter and StandardFilterFactory have been removed
(Alan Woodward)
* LUCENE-8373: StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
(Alan Woodward)
* LUCENE-8388: Unused PostingsEnum#attributes() method has been removed
(Alan Woodward)
* LUCENE-8405: TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector
no longer have an option to compute the maximum score when sorting by field.
(Adrien Grand)
* LUCENE-8411: TopFieldCollector no longer takes a fillFields option, it now
always fills fields. (Adrien Grand)
* LUCENE-8412: TopFieldCollector no longer takes a trackDocScores option. Scores
need to be set on top hits via TopFieldCollector#populateScores instead.
(Adrien Grand)
* LUCENE-6228: A new Scorable abstract class has been added, containing only those
methods from Scorer that should be called from Collectors. LeafCollector.setScorer()
now takes a Scorable rather than a Scorer. (Alan Woodward, Adrien Grand)
* LUCENE-8475: Deprecated constants have been removed from RamUsageEstimator.
(Dimitrios Athanasiou)
* LUCENE-8483: Scorers may no longer take null as a Weight (Alan Woodward)
* LUCENE-8352: TokenStreamComponents is now final, and can take a Consumer<Reader>
in its constructor (Mark Harwood, Alan Woodward, Adrien Grand)
* LUCENE-8498: LowerCaseTokenizer has been removed, and CharTokenizer no longer
takes a normalizer function. (Alan Woodward)
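To illustrate LUCENE-8007 above: when a field does not store term frequencies, every freq() value counts as 1, so a term's total term frequency collapses to its document frequency. The sketch below is plain Java and does not use Lucene; the class and method names (TermStatsSketch, docFreq, totalTermFreq) are hypothetical and only mirror the arithmetic described in the entry.

```java
// Standalone illustration only: this is not Lucene code, and all names are hypothetical.
final class TermStatsSketch {

    /** Number of documents containing the term: one posting per document. */
    static long docFreq(int[] freqsPerDoc) {
        return freqsPerDoc.length;
    }

    /**
     * Sum of within-document frequencies for the term. When frequencies are
     * not stored, each posting counts as freq() == 1, so the sum equals docFreq.
     */
    static long totalTermFreq(int[] freqsPerDoc, boolean freqsStored) {
        long sum = 0;
        for (int freq : freqsPerDoc) {
            sum += freqsStored ? freq : 1;
        }
        return sum;
    }
}
```

With postings {3, 1, 4}, the stored-frequency total is 8, while the omitted-frequency total is 3, the same as docFreq.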
Changes in Runtime Behavior
* LUCENE-8333: Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of
numDocs. (Robert Muir, Dawid Weiss).
* LUCENE-7837: Indices that were created before the previous major version
will now fail to open even if they have been merged with the previous major
version. (Adrien Grand)
* LUCENE-8020: Similarities are no longer passed terms that don't exist by
queries such as SpanOrQuery, so scoring formulas no longer require
divide-by-zero hacks. IndexSearcher.termStatistics/collectionStatistics return null
instead of returning bogus values for a non-existent term or field. (Robert Muir)
* LUCENE-7996: FunctionQuery and FunctionScoreQuery now return a score of 0
when the function produces a negative value. (Adrien Grand)
* LUCENE-8116: Similarities now score fields that omit norms as if the norm was
1. This might change score values on fields that omit norms. (Adrien Grand)
* LUCENE-8134: Index options are no longer automatically downgraded.
(Adrien Grand)
* LUCENE-8031: Length normalization correctly reflects omission of term frequencies.
(Robert Muir, Adrien Grand)
* LUCENE-7444: StandardAnalyzer no longer defaults to removing English stopwords
(Alan Woodward)
* LUCENE-8060: IndexSearcher's search and searchAfter methods now only compute
total hit counts accurately up to 1,000 in order to enable top-hits
optimizations such as block-max WAND (LUCENE-8135). (Adrien Grand)
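The LUCENE-7996 behavior above (a score of 0 when the function produces a negative value) can be pictured with a minimal clamp. This is a sketch in plain Java, not the code used by FunctionQuery or FunctionScoreQuery; the helper name toScore is hypothetical, and mapping NaN to 0 is an assumption made here for illustration.

```java
// Standalone illustration only: not Lucene code. toScore is a hypothetical helper.
final class ScoreClampSketch {

    /** Maps a raw function value to a non-negative score. */
    static float toScore(float functionValue) {
        // Assumption: NaN is treated like a negative value and mapped to 0,
        // because the comparison below is false for NaN.
        return functionValue > 0f ? functionValue : 0f;
    }
}
```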
New Features
* LUCENE-8340: LongPoint#newDistanceQuery may be used to boost scores based on
how close the value of a long field is to a configurable origin. This is
typically useful to boost by recency. (Adrien Grand)
* LUCENE-8482: LatLonPoint#newDistanceFeatureQuery may be used to boost scores
based on the haversine distance of a LatLonPoint field to a provided point. This is
typically useful to boost by distance. (Ignacio Vera)
Improvements
* LUCENE-7997: Add BaseSimilarityTestCase to sanity check similarities.
SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues.
Add missing range checks for similarity parameters.
Improve BM25 and ClassicSimilarity's explanations. (Robert Muir)
* LUCENE-8011: Improved similarity explanations.
(Mayya Sharipova via Adrien Grand)
* LUCENE-4198: Codecs now have the ability to index score impacts.
(Adrien Grand)
* LUCENE-8135: Boolean queries now implement the block-max WAND algorithm in
order to speed up selection of top scored documents. (Adrien Grand)
* LUCENE-8279: CheckIndex now cross-checks terms with norms. (Adrien Grand)
Optimizations
* LUCENE-8040: Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms
(David Smiley, Robert Muir)
* LUCENE-4100: Disjunctions now support faster collection of top hits when the
total hit count is not required. (Stefan Pohl, Adrien Grand, Robert Muir)
* LUCENE-7993: Phrase queries are now faster if total hit counts are not
required. (Adrien Grand)
* LUCENE-8109: Boolean queries propagate information about the minimum
competitive score in order to make collection faster if there are disjunctions
or phrase queries as sub queries, which know how to leverage this information
to run faster. (Adrien Grand)
* LUCENE-8439: Disjunction max queries can skip blocks to select the top documents
if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
* LUCENE-8204: Boolean queries with a mix of required and optional clauses are
now faster if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
* LUCENE-8448: Boolean queries now propagate the minimum score to their sub-scorers.
(Jim Ferenczi, Adrien Grand)
======================= Lucene 7.6.0 =======================
Build
* LUCENE-8504: Upgrade forbiddenapis to version 2.6. (Uwe Schindler)
======================= Lucene 7.5.1 =======================
Bug Fixes:
* LUCENE-8454: Fix incorrect vertex indexing and other computation errors in
shape tessellation that would sometimes cause an infinite loop. (Nick Knize)
======================= Lucene 7.5.0 =======================
API Changes:
* LUCENE-8467: RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
(Dawid Weiss)
* LUCENE-8356: StandardFilter is deprecated (Alan Woodward)
* LUCENE-8373: ENGLISH_STOP_WORD_SET on StandardAnalyzer is deprecated. Instead
use EnglishAnalyzer.ENGLISH_STOP_WORD_SET. The default constructor for
StopAnalyzer is also deprecated, and a stop word set should be explicitly
passed to the constructor. (Alan Woodward)
* LUCENE-8378: Add DocIdSetIterator.range static method to return an iterator
matching a range of docids (Mike McCandless)
* LUCENE-8379: Add experimental TermQuery.getTermStates method (Mike McCandless)
* LUCENE-8407: Add experimental SpanTermQuery.getTermStates method (David Smiley)
* LUCENE-8390: MatchesIteratorSupplier replaced by IOSupplier (Alan Woodward,
David Smiley)
* LUCENE-8397: Add DirectoryTaxonomyWriter.getCache (Mike McCandless)
* LUCENE-8387: Add experimental IndexSearcher.getSlices API to see which slices
IndexSearcher is searching concurrently when it's created with an ExecutorService
(Mike McCandless)
* LUCENE-8263: TieredMergePolicy's reclaimDeletesWeight has been replaced with a
new deletesPctAllowed setting to control how aggressively deletes should be
reclaimed. (Erick Erickson, Adrien Grand)
* LUCENE-7314: Graduate LatLonPoint and query classes to core (Nick Knize)
* LUCENE-8428: The way that oal.util.PriorityQueue creates sentinel objects has
been changed from a protected method to a java.util.function.Supplier as a
constructor argument. (Adrien Grand)
* LUCENE-8437: CheckIndex.Status.cantOpenSegments and missingSegmentVersion
have been removed as they were not computed correctly. (Adrien Grand)
* LUCENE-8286: The UnifiedHighlighter has a new HighlightFlag.WEIGHT_MATCHES flag that
will tell this highlighter to use the new MatchesIterator API as the underlying
approach to navigate matching hits for a query. This mode will highlight more
accurately than any other highlighter, and can mark up phrases as one span instead of
word-by-word. The UH's public internal APIs changed a bit in the process.
(David Smiley)
* LUCENE-8471: IndexWriter.getFlushingBytes() returns how many bytes are currently
being flushed to disk. (Alan Woodward)
* LUCENE-8422: Static helper functions for Matches and MatchesIterator implementations
have been moved from Matches to MatchesUtils (Alan Woodward)
* LUCENE-8343: Suggesters now require Long (versus long, previously) from weight() method
while indexing, and provide double (versus long, previously) scores at lookup time
(Alessandro Benedetti)
* LUCENE-8459: SearcherTaxonomyManager now has a constructor taking already opened
IndexReaders, allowing the caller to pass a FilterDirectoryReader, for example.
(Mike McCandless)
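As a rough picture of what an iterator "matching a range of docids" (LUCENE-8378 above) does, here is a standalone sketch in plain Java. It is not Lucene's DocIdSetIterator; the class name RangeIteratorSketch is hypothetical, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption of this sketch.

```java
// Standalone illustration only: not Lucene's DocIdSetIterator. Names are hypothetical.
final class RangeIteratorSketch {
    /** Sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int minDoc; // inclusive (assumption of this sketch)
    private final int maxDoc; // exclusive (assumption of this sketch)
    private int doc = -1;     // not positioned yet

    RangeIteratorSketch(int minDoc, int maxDoc) {
        if (minDoc < 0 || minDoc >= maxDoc) {
            throw new IllegalArgumentException("need 0 <= minDoc < maxDoc");
        }
        this.minDoc = minDoc;
        this.maxDoc = maxDoc;
    }

    int docID() {
        return doc;
    }

    /** Moves to the next document in the range, or NO_MORE_DOCS. */
    int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc; // stay exhausted; also avoids overflow on doc + 1
        }
        return advance(doc + 1);
    }

    /** Moves to the first document in the range that is >= target. */
    int advance(int target) {
        if (target >= maxDoc) {
            doc = NO_MORE_DOCS;
        } else {
            doc = Math.max(target, minDoc);
        }
        return doc;
    }
}
```

For the range [3, 6), successive nextDoc() calls yield 3, 4, 5 and then NO_MORE_DOCS; advance(0) on a fresh iterator lands on 3.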
Bug Fixes:
* LUCENE-8445: Tighten condition when two planes are identical to prevent constructing
bogus tiles when building GeoPolygons. (Ignacio Vera)
* LUCENE-8444: Prevent building functionally identical plane bounds when constructing
DualCrossingEdgeIterator. (Ignacio Vera)
* LUCENE-8380: UTF8TaxonomyWriterCache inconsistency. (Ruslan Torobaev, Dawid Weiss)
* LUCENE-8164: IndexWriter silently accepts broken payload. This has been fixed
via LUCENE-8165 since we are now checking for offset+length going out of bounds.
(Robert Muir, Nhat Nguyen, Simon Willnauer)
* LUCENE-8370: Reproducing
TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields()
failures (Erick Erickson)
* LUCENE-8376, LUCENE-8371: ConditionalTokenFilter.end() would not propagate correctly
if the last token in the stream was subsequently dropped; FixedShingleFilter did
not set position increment in end() (Alan Woodward)
* LUCENE-8395: WordDelimiterGraphFilter would incorrectly insert a hole into a
TokenStream if a token consisting entirely of delimiter characters was
encountered, but preserve_original was set. (Alan Woodward)
* LUCENE-8398: TieredMergePolicy.getMaxMergedSegmentMB has rounding error (Erick Erickson)
* LUCENE-8429: DaciukMihovAutomatonBuilder is no longer prone to stack
overflows by enforcing a maximum term length. (Adrien Grand)
* LUCENE-8441: IndexWriter now checks doc value type for index sort fields
and fails the document if they are not compatible. (Jim Ferenczi, Mike McCandless)
* LUCENE-8458: Adjust initialization condition of PendingSoftDeletes and ensure
it is initialized before accepting deletes (Simon Willnauer, Nhat Nguyen)
* LUCENE-8466: IndexWriter.deleteDocs(Query... query) incorrectly applies deletes on flush
if the index is sorted. (Adrien Grand, Jim Ferenczi, Vish Ramachandran)
* LUCENE-8502: Allow access to delegate in FilterCodecReader. FilterCodecReader didn't
allow access to its delegate like other filter readers. This adds a new #getDelegate method
to access the wrapped reader. (Simon Willnauer)
Changes in Runtime Behavior:
* LUCENE-7976: TieredMergePolicy now respects maxSegmentSizeMB by default when executing
findForcedMerges and findForcedDeletesMerges (Erick Erickson)
* LUCENE-8263: TieredMergePolicy now reclaims deleted documents more
aggressively by default ensuring that no more than ~1/3 of the index size is
used by deleted documents. (Adrien Grand)
* LUCENE-8503: Call #getDelegate instead of direct member access during unwrap.
Filter*Reader instances access the member or the delegate directly instead of
calling getDelegate(). In order to track access of the delegate these methods
should call #getDelegate() (Simon Willnauer)
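The point of LUCENE-8502 and LUCENE-8503 above, that unwrapping should go through getDelegate() so a subclass can observe access to the wrapped reader, can be sketched as follows. This is plain Java with hypothetical types (Reader, FilterReader, TrackingReader); it is not the Lucene Filter*Reader code.

```java
// Standalone illustration only: hypothetical types, not Lucene's Filter*Reader classes.
final class DelegateSketch {

    interface Reader {}

    static final class BaseReader implements Reader {}

    /** A wrapper that exposes the wrapped reader through an overridable accessor. */
    static class FilterReader implements Reader {
        protected final Reader in;

        FilterReader(Reader in) {
            this.in = in;
        }

        Reader getDelegate() {
            return in;
        }
    }

    /** A wrapper that counts how often its delegate is requested. */
    static final class TrackingReader extends FilterReader {
        int accesses = 0;

        TrackingReader(Reader in) {
            super(in);
        }

        @Override
        Reader getDelegate() {
            accesses++;
            return in;
        }
    }

    /**
     * Peels off all wrappers. Calling getDelegate() instead of reading the
     * field directly is what lets TrackingReader see the access.
     */
    static Reader unwrap(Reader reader) {
        while (reader instanceof FilterReader) {
            reader = ((FilterReader) reader).getDelegate();
        }
        return reader;
    }
}
```

Had unwrap read the `in` field directly, TrackingReader's counter would stay at 0 even though its delegate was handed out.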
Improvements
* LUCENE-8468: A ByteBuffer based Directory implementation. (Dawid Weiss)
* LUCENE-8447: Add DISJOINT and WITHIN support to LatLonShape queries. (Nick Knize)
* LUCENE-8440: Add support for indexing and searching Line and Point shapes using LatLonShape encoding (Nick Knize)
* LUCENE-8435: Add new LatLonShapePolygonQuery for querying indexed LatLonShape fields by arbitrary polygons (Nick Knize)
* LUCENE-8367: Make per-dimension drill down optional for each facet dimension (Mike McCandless)
* LUCENE-8396: Add Points Based Shape Indexing and Search that decomposes shapes
into a triangular mesh and indexes individual triangles as a 6 dimension point (Nick Knize)
* LUCENE-8345, GitHub PR #392: Remove instantiation of redundant wrapper classes for primitives;
add wrapper class constructors to forbiddenapis. (Michael Braun via Uwe Schindler)
* LUCENE-8415: Clean up Directory contracts and JavaDoc comments. (Dawid Weiss)
* LUCENE-8414: Make segmentInfos private in IndexWriter (Simon Willnauer, Nhat Nguyen)
* LUCENE-8446: The UnifiedHighlighter's DefaultPassageFormatter now treats overlapping matches in
the passage as merged (as if one larger match). (David Smiley)
* LUCENE-8460: Better argument validation in StoredField. (Namgyu Kim)
* LUCENE-8432: TopFieldComparator stops comparing documents if the index is
sorted, even if hits still need to be visited to compute the hit count.
(Nikolay Khitrin)
* LUCENE-8422: IntervalQuery now returns useful Matches (Alan Woodward)
* LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the
number of dimensions is bigger than 1. It improves performance when there is
correlation between the dimensions, for example ranges. (Ignacio Vera, Adrien Grand)
Build
* LUCENE-5143: Stop publishing KEYS file with each version, use topmost lucene/KEYS file only.
The buildAndPushRelease.py script validates that RM's PGP key is in the KEYS file.
Remove unused 'copy-to-stage' and '-dist-keys' targets from ant build. (janhoy)
Other:
* LUCENE-8485: Update randomizedtesting to version 2.6.4. (Dawid Weiss)
* LUCENE-8366: Upgrade to ICU 62.1. Emoji handling now uses Unicode 11's
Extended_Pictographic property. (Robert Muir)
* LUCENE-8408: original Highlighter: Remove obsolete static AttributeFactory instance
in TokenStreamFromTermVector. (Michael Braun, David Smiley)
* LUCENE-8420: Upgrade OpenNLP to 1.9.0 so OpenNLP tool can read the new model format which 1.8.x
cannot read. 1.9.0 can read the old format. (Koji Sekiguchi)
* LUCENE-8453: Add documentation to analysis factories of Korean (Nori) analyzer
module. (Tomoko Uchida via Uwe Schindler)
* LUCENE-8455: Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml (Erick Erickson)
* LUCENE-8456: Upgrade Apache Commons Compress to v1.18 (Steve Rowe)
* LUCENE-765: Improved org.apache.lucene.index javadocs. (Mike Sokolov)
* LUCENE-8476: Remove redundant null check and switch to optimized List.sort in the
Korean user dictionary. (Namgyu Kim)
======================= Lucene 7.4.1 =======================
Bug Fixes:
* LUCENE-8365: Fix ArrayIndexOutOfBoundsException in UnifiedHighlighter. This fixes
an "off by one" error in the UnifiedHighlighter's code that is only triggered when
two nested SpanNearQueries contain the same term. (Marc-Andre Morissette via Simon Willnauer)
* LUCENE-8381: Fix IndexWriter incorrectly interpreting hard-deletes as soft-deletes
while wrapping reader for merges. (Simon Willnauer, Nhat Nguyen)
* LUCENE-8384: Fix missing advance docValues generation while handling docValues
update in PendingSoftDeletes. (Simon Willnauer, Nhat Nguyen)
* LUCENE-8472: Always rewrite the soft-deletes merge retention query. (Adrien Grand, Nhat Nguyen)
======================= Lucene 7.4.0 =======================
Upgrading
* LUCENE-8344: If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you
explicitly use the preservePositionIncrements=false setting (not the default), then you ought
to rebuild your suggester index. If you don't, queries or indexed data with trailing position
gaps (e.g. stop words) may not work correctly. (David Smiley, Jim Ferenczi)
API Changes
* LUCENE-8242: IndexSearcher.createNormalizedWeight() has been deprecated.
Instead use IndexSearcher.createWeight(), rewriting the query first.
(Alan Woodward)
* LUCENE-8248: MergePolicyWrapper is renamed to FilterMergePolicy and now
also overrides getMaxCFSSegmentSizeMB (Mike Sokolov via Mike McCandless)
* LUCENE-8303: LiveDocsFormat is now only responsible for (de)serialization of
live docs. (Adrien Grand)
Changes in Runtime Behavior
* LUCENE-8309: Live docs are no longer backed by a FixedBitSet. (Adrien Grand)
* LUCENE-8330: Detach IndexWriter from MergePolicy. Instead of requiring
IndexWriter as a hard dependency, MergePolicy now expects a MergeContext, which
IndexWriter implements. (Simon Willnauer, Robert Muir, Dawid Weiss, Mike McCandless)
New Features
* LUCENE-8200: Allow doc-values to be updated atomically together
with a document. Doc-Values updates now can be used as a soft-delete
mechanism to allow keeping several versions of a document or already
deleted documents around for later reuse. See "IW.softUpdateDocument(...)"
for reference. (Simon Willnauer)
* LUCENE-8197: A new FeatureField makes it easy and efficient to integrate
static relevance signals into the final score. (Adrien Grand, Robert Muir)
* LUCENE-8202: Add a FixedShingleFilter (Alan Woodward, Adrien Grand, Jim
Ferenczi)
* LUCENE-8125: ICUTokenizer support for emoji/emoji sequence tokens. (Robert Muir)
* LUCENE-8196, LUCENE-8300: A new IntervalQuery in the sandbox allows efficient proximity
searches based on minimum-interval semantics. (Alan Woodward, Adrien Grand,
Jim Ferenczi, Simon Willnauer, Matt Weber)
* LUCENE-8233: Add support for soft deletes to IndexWriter delete accounting.
Soft deletes are accounted for inside the index writer and therefore also
by merge policies. A SoftDeletesRetentionMergePolicy is added that can
selectively carry soft-deleted documents across merges for retention
policies (Simon Willnauer, Mike McCandless, Robert Muir)
* LUCENE-8237: Add a SoftDeletesDirectoryReaderWrapper that respects
soft deletes if the reader is opened from a directory. (Simon Willnauer,
Mike McCandless, Uwe Schindler, Adrien Grand)
* LUCENE-8229, LUCENE-8270: Add a method Weight.matches(LeafReaderContext, doc)
that returns an iterator over matching positions for a given query and document.
This allows exact hit extraction and will enable implementation of accurate
highlighters. (Alan Woodward, Adrien Grand, David Smiley)
* LUCENE-8249: Implement Matches API for phrase queries (Alan Woodward, Adrien
Grand)
* LUCENE-8246: Allow to customize the number of deletes a merge claims. This
helps merge policies in the soft-delete case to correctly implement retention
policies without triggering unnecessary merges. (Simon Willnauer, Mike McCandless)
* LUCENE-8231: A new analysis module (nori) similar to Kuromoji
but to handle Korean using mecab-ko-dic and morphological analysis.
(Robert Muir, Jim Ferenczi)
* LUCENE-8265: WordDelimiter/GraphFilter now have an option to skip tokens
marked with KeywordAttribute (Mike Sokolov via Mike McCandless)
* LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...). IndexWriter can
update doc values for a specific term but this might affect all documents
containing the term. With tryUpdateDocValues users can update doc-values
fields for individual documents. This allows for instance to soft-delete
individual documents. (Simon Willnauer)
* LUCENE-8298: Allow DocValues updates to reset a value. Passing a DV field with a null
value to IW#updateDocValues or IW#tryUpdateDocValues will now remove the value from the
provided document. This allows to undelete a soft-deleted document unless it's been claimed
by a merge. (Simon Willnauer)
* LUCENE-8273: ConditionalTokenFilter allows analysis chains to skip particular token
filters based on the attributes of the current token. This generalises the keyword
token logic currently used for stemmers and WDF. It is integrated into
CustomAnalyzer by using the `when` and `whenTerm` builder methods, and a new
ProtectedTermFilter is added as an example. (Alan Woodward, Robert Muir,
David Smiley, Steve Rowe, Mike Sokolov)
* LUCENE-8310: Ensure IndexFileDeleter accounts for pending deletes. Today we fail
creating the IndexWriter when the directory has a pending delete. Yet, this
is mainly done to prevent writing still existing files more than once.
IndexFileDeleter already accounts for that for existing files which we can
now use to also take pending deletes into account which ensures that all file
generations per segment always go forward. (Simon Willnauer)
* LUCENE-7960: Add preserveOriginal option to the NGram and EdgeNGram filters.
(Ingomar Wesp, Shawn Heisey via Robert Muir)
* LUCENE-8335: Enforce soft-deletes field up-front. Soft deletes field must be marked
as such once it's introduced and can't be changed after the fact.
(Nhat Nguyen via Simon Willnauer)
* LUCENE-8332: New ConcatenateGraphFilter for concatenating all tokens into one (or more
in the event of a graph input). This is useful for fast analyzed exact-match lookup,
suggesters, and as a component of a named entity recognition system. This was excised
out of CompletionTokenStream in the NRT doc suggester. (David Smiley, Jim Ferenczi)
Bug Fixes
* LUCENE-8221: MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger
indexes.
* LUCENE-8266: Detect bogus tiles when creating a standard polygon and
throw a TileException. (Ignacio Vera)
* LUCENE-8234: Fixed bug in how spatial relationship is computed for
GeoStandardCircle when it covers the whole world. (Ignacio Vera)
* LUCENE-8236: Filter duplicated points when creating GeoPath shapes to
avoid creation of bogus planes. (Ignacio Vera)
* LUCENE-8243: IndexWriter.addIndexes(Directory[]) did not properly preserve
index file names for updated doc values fields (Simon Willnauer,
Michael McCandless, Nhat Nguyen)
* LUCENE-8275: Push up #checkPendingDeletes to Directory to ensure IW fails if
the directory has pending deletes files even if the directory is filtered or
a FileSwitchDirectory (Simon Willnauer, Robert Muir)
* LUCENE-8244: Do not leak open file descriptors in SearcherTaxonomyManager's
refresh on exception (Mike McCandless)
* LUCENE-8305: ComplexPhraseQuery.rewrite now handles an embedded MultiTermQuery
that rewrites to a MatchNoDocsQuery instead of throwing an exception.
(Bjarke Mortensen, Andy Tran via David Smiley)
* LUCENE-8287: Ensure that empty regex completion queries always return no results.
(Julie Tibshirani via Jim Ferenczi)
* LUCENE-8317: Prevent concurrent deletes from being applied during full flush.
Future deletes could potentially be exposed to flushes/commits/refreshes if the
amount of RAM used by deletes is greater than half of the IW RAM buffer. (Simon Willnauer)
* LUCENE-8320: Fix WindowsFS to correctly account for rename and hardlinks.
(Simon Willnauer, Nhat Nguyen)
* LUCENE-8328: Ensure ReadersAndUpdates consistently executes under lock.
(Nhat Nguyen via Simon Willnauer)
* LUCENE-8325: Fixed the smartcn tokenizer to not split UTF-16 surrogate pairs.
(chengpohi via Jim Ferenczi)
* LUCENE-8186: LowerCaseTokenizerFactory now lowercases text in multi-term
queries. (Tim Allison via Adrien Grand)
* LUCENE-8278: Some end-of-input no-scheme domain-only URL tokens are typed as
<ALPHANUM> rather than <URL>. (Junte Zhang, Steve Rowe)
* LUCENE-8355: Prevent IW from opening an already dropped segment while DV updates
are written. (Nhat Nguyen via Simon Willnauer)
* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing
position increment when the preservePositionIncrement setting is false.
(David Smiley, Jim Ferenczi)
* LUCENE-8357: FunctionScoreQuery.boostByQuery() and boostByValue() were
producing truncated Explanations (Markus Jelsma, Alan Woodward)
* LUCENE-8360: NGramTokenFilter and EdgeNGramTokenFilter did not correctly
set position increments in end() (Alan Woodward)
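The int overflow behind LUCENE-8221 above is the ordinary hazard of multiplying a percentage by a document count in 32-bit arithmetic. The sketch below is plain Java; the method names naive and widened are hypothetical, and the exact expression used inside MoreLikeThis is an assumption rather than a quotation of its source.

```java
// Standalone illustration only: not the MoreLikeThis source. Names are hypothetical.
final class MaxDocFreqPctSketch {

    /** 32-bit version: pct * docCount can wrap around before the division. */
    static int naive(int pct, int docCount) {
        return pct * docCount / 100;
    }

    /** Widened version: multiply as long, then narrow back with a range check. */
    static int widened(int pct, int docCount) {
        return Math.toIntExact((long) pct * docCount / 100);
    }
}
```

For pct = 50 and docCount = 100,000,000, the product 5,000,000,000 exceeds Integer.MAX_VALUE, so naive returns 7,050,327 while widened returns the intended 50,000,000.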
Other
* LUCENE-8301: Update randomizedtesting to 2.6.0. (Dawid Weiss)
* LUCENE-8299: Geo3D wrapper uses new polygon method factory that gives better
support for polygons with many points (>100). (Ignacio Vera)
* LUCENE-8261: InterpolatedProperties.interpolate and recursive property
references. (Steve Rowe, Dawid Weiss)
* LUCENE-8228: removed obsolete IndexDeletionPolicy clone() requirements from
the javadoc. (Dawid Weiss)
* LUCENE-8219: Use a realistic estimate of the number of nodes and links in
LevensteinAutomaton.java, to save reallocation of arrays.
(Christian Ziech)
* LUCENE-8214: Improve selection of testPoint for GeoComplexPolygon.
(Ignacio Vera)
* SOLR-10912: Add automatic patch validation. (Mano Kovacs, Steve Rowe)
* LUCENE-8122, LUCENE-8175: Upgrade analysis/icu to ICU 61.1.
(Robert Muir, Adrien Grand, Uwe Schindler)
* LUCENE-8291: Remove QueryTemplateManager utility class from XML queryparser.
This class is just a general XML transforming tool (using property files and
XSLT) and has nothing to do with query parsing. It can easily be implemented
using more sophisticated libraries or using XSL transformers from the JDK.
This change also removes the Lucene demo webapp to prevent XSS issues in
untested/unmaintained code. (Uwe Schindler)
Build
* LUCENE-7935: Publish .sha512 hash files with the release artifacts and stop
publishing .md5 hashes since the algorithm is broken (janhoy)
* LUCENE-8230: Upgrade forbiddenapis to version 2.5. (Uwe Schindler)
Documentation
* LUCENE-8238: Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
(Mike Sokolov via Mike McCandless)
======================= Lucene 7.3.1 =======================
Bug fixes
* LUCENE-8254: LRUQueryCache could cause IndexReader to hang on close, when
shared with another reader with no CacheHelper (Alan Woodward, Simon Willnauer,
Adrien Grand)
======================= Lucene 7.3.0 =======================
API Changes
* LUCENE-8051: LevensteinDistance renamed to LevenshteinDistance.
(Pulak Ghosh via Adrien Grand)
* LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery.
Users should instead use FunctionScoreQuery, possibly combined with
a lucene expression (Alan Woodward)
* LUCENE-8104: Remove facets module compile-time dependency on queries
(Alan Woodward)
* LUCENE-8145: UnifiedHighlighter now uses a unitary OffsetsEnum rather
than a list of enums (Alan Woodward, David Smiley, Jim Ferenczi, Timothy
Rodriguez)
New Features
* LUCENE-2899: Add new module analysis/opennlp, with analysis components
to perform tokenization, part-of-speech tagging, lemmatization and phrase
chunking by invoking the corresponding OpenNLP tools. Named entity
recognition is also provided as a Solr update request processor.
(Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau,
Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)
* LUCENE-8126: Add new spatial prefix tree (SPT) based on Google S2 geometry.
It can only be used currently with Geo3D spatial context and it provides
improvements on indexing time for non-points shapes and on query performance.
(Ignacio Vera, David Smiley).
Improvements
* LUCENE-8081: Allow IndexWriter to opt out of flushing on indexing threads.
Index/Update Threads try to help out flushing pending document buffers to
disk. This change adds an expert setting to opt out of this behavior unless
flushing is falling behind. (Simon Willnauer)
* LUCENE-8086: spatial-extras Geo3dFactory: Use GeoExactCircle with
configurable precision for non-spherical planet models.
(Ignacio Vera via David Smiley)
* LUCENE-8093: TrimFilterFactory implements MultiTermAwareComponent (Alan Woodward)
* LUCENE-8094: TermInSetQuery.toString now returns "field:(A B C)" (Mike McCandless)
* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
position sensitive (e.g. part of a phrase) by having an accurate freq.
(David Smiley)
* LUCENE-8129: A Unicode set filter can now be specified when using ICUFoldingFilter.
(Ere Maijala)
* LUCENE-7966: Build Multi-Release JARs to enable usage of optimized intrinsic methods
from Java 9 for index bounds checking and array comparison/mismatch. This change
introduces Java 8 replacements for those Java 9 methods and patches the compiled
classes to use the optimized variants through the MR-JAR mechanism.
(Uwe Schindler, Robert Muir, Adrien Grand, Mike McCandless)
* LUCENE-8127: Speed up rewriteNoScoring when there are no MUST clauses.
(Michael Braun via Adrien Grand)
* LUCENE-8152: Improve consumption of doc-value iterators. (Horatiu Lazu via
Adrien Grand)
* LUCENE-8033: FieldInfos now always use a dense encoding. (Mayya Sharipova
via Adrien Grand)
* LUCENE-8190: Specialized cell interface to allow any spatial prefix tree to
benefit from the setting setPruneLeafyBranches on RecursivePrefixTreeStrategy.
(Ignacio Vera)
Bug Fixes
* LUCENE-8077: Fixed bug in how CheckIndex verifies doc-value iterators.
(Xiaoshan Sun via Adrien Grand)
* SOLR-11758: Fixed FloatDocValues.boolVal to correctly return true for all values != 0.0F
(Munendra S N via hossman)
* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
SpanNearQueries at positions where it should not have. It's fixed in the UH by
switching to the SpanCollector API. The original Highlighter still has this
problem (LUCENE-2287, LUCENE-5455, LUCENE-6796). Some public but internal parts of
the UH were refactored. (David Smiley, Steve Davids)
* LUCENE-8120: Fix LatLonBoundingBox's toString() method (Martijn van Groningen, Adrien Grand)
* LUCENE-8130: Fix NullPointerException from TermStates.toString() (Mike McCandless)
* LUCENE-8124: Fixed HyphenationCompoundWordTokenFilter to correctly handle
hyphenation patterns with indicator >= 7. (Holger Bruch via Adrien Grand)
* LUCENE-8163: BaseDirectoryTestCase could produce random filenames that fail
on Windows (Alan Woodward)
* LUCENE-8174: Fixed {Float,Double,Int,Long}Range.toString(). (Oliver Kaleske
via Adrien Grand)
* LUCENE-8182: Fixed BoostingQuery to apply the context boost instead of the parent query
boost (Jim Ferenczi)
* LUCENE-8188: Fixed bugs in OpenNLPOpsFactory that were causing InputStreams fetched from the
ResourceLoader to be leaked (hossman)
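To illustrate the SOLR-11758 entry above, the sketch below contrasts a truth test that compares the float directly against 0.0F with one that truncates to int first. The truncating variant is only an assumed example of how such a bug can arise, not a quote of the earlier Lucene code, and the class and method names are hypothetical.

```java
final class BoolValSketch {
    // Hypothetical buggy variant: truncating to int drops fractional values,
    // so 0.5F is treated as false.
    static boolean boolValTruncating(float value) {
        return ((int) value) != 0;
    }

    // Behavior described in the entry: true for every value != 0.0F.
    static boolean boolVal(float value) {
        return value != 0.0F;
    }

    public static void main(String[] args) {
        float v = 0.5F;
        System.out.println("truncating: " + boolValTruncating(v));
        System.out.println("direct:     " + boolVal(v));
    }
}
```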
Other
* LUCENE-8111: IndexOrDocValuesQuery Javadoc references outdated method name.
(Kai Chan via Adrien Grand)
* LUCENE-8106: Add script (reproduceJenkinsFailures.py) to attempt to reproduce
failing tests from a Jenkins log. (Steve Rowe)
* LUCENE-8075: Removed unnecessary null check in IntersectTermsEnum.
(Pulak Ghosh via Adrien Grand)
* LUCENE-8156: Require users to not have ASM on the Ant classpath during build.
This is required by LUCENE-7966. (Adrien Grand, Uwe Schindler)
* LUCENE-8161: spatial-extras: the Spatial4j dependency has been updated from 0.6 to 0.7,
which is drop-in compatible (Lucene doesn't expressly use any of the few API differences).
Spatial4j 0.7 is compatible with JTS 1.15.0 and not any prior version. JTS 1.15.0 is
dual-licensed to include BSD; prior versions were LGPL. (David Smiley)
* LUCENE-8155: Add back support in smoke tester to run against later Java versions.
(Uwe Schindler)
* LUCENE-8169: Migrated build to use OpenClover 4.2.1 for checking code coverage.
(Uwe Schindler)
* LUCENE-8170: Improve OpenClover reports (separate test from production code);
enable coverage reports inside test-frameworks. (Uwe Schindler)
Build
* LUCENE-8168: Moved Groovy scripts in build files to separate files.
Update Groovy to 2.4.13. (Uwe Schindler)
* LUCENE-8176: HttpReplicatorTest awaits more than a minute for stopping Jetty threads
(Mikhail Khludnev)
======================= Lucene 7.2.1 =======================
Bug Fixes
* LUCENE-8117: Fix advanceExact on SortedNumericDocValues produced by Lucene54DocValues. (Jim Ferenczi).
======================= Lucene 7.2.0 =======================
API Changes
* LUCENE-8017, LUCENE-8042: Weight, DoubleValuesSource and related objects
now implement a SegmentCacheable interface, with a single method
isCacheable(LeafReaderContext) determining whether or not the object may
be cached against a LeafReader. (Alan Woodward, Robert Muir)
* LUCENE-8038: Payload factors for scoring in PayloadScoreQuery are now
calculated by a PayloadDecoder, instead of delegating to the Similarity.
(Alan Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been deprecated. (Alan Woodward)
* LUCENE-6278: Scorer.freq() has been removed (Alan Woodward)
* LUCENE-7736: DoubleValuesSource and LongValuesSource now expose a
rewrite(IndexSearcher) function. (Alan Woodward)
* LUCENE-7998: DoubleValuesSource.fromQuery() allows you to use the scores
from a Query as a DoubleValuesSource. (Alan Woodward)
* LUCENE-8049: IndexWriter.getMergingSegments()'s return type was changed from
Collection to Set to more accurately reflect its nature. (David Smiley)
* LUCENE-8059: TopFieldDocCollector can now early terminate collection when
the sort order is compatible with the index order. As a consequence,
EarlyTerminatingSortingCollector is now deprecated. (Adrien Grand)
New Features
* LUCENE-8061: Add convenience factory methods to create BBoxes and XYZSolids
directly from bounds objects.
* LUCENE-7736: IndexReaderFunctions expose various IndexReader statistics as
DoubleValuesSources. (Alan Woodward)
* LUCENE-8068: Allow IndexWriter to write a single DWPT to disk. Adds a
flushNextBuffer method to IndexWriter that allows the caller to
synchronously move the next pending or the biggest non-pending index buffer to
disk. This enables flushing a selected buffer to disk without hijacking an
indexing thread. This is for instance useful if more than one IW (shards) must
be maintained in a single JVM / system. (Simon Willnauer)
Bug Fixes
* LUCENE-8076: Normalize Vincenty distance calculation for planet models that aren't normalized.
(Ignacio Vera)
* LUCENE-8057: Exact circle bounds computation was incorrect.
(Ignacio Vera)
* LUCENE-8056: Exact circle segment bounding suffered from precision errors.
(Karl Wright)
* LUCENE-8054: Fix the exact circle case where relationships fail when the
planet model has c <= ab, because the planes are constructed incorrectly.
(Ignacio Vera)
* LUCENE-7991: KNearestNeighborDocumentClassifier.knnSearch no longer applies
a previous boosted field's factor to subsequent unboosted fields.
(Christine Poerschke)
* LUCENE-7999: Switch from int to long to track the name for the next
segment to write, so that very long lived indices with very frequent
refreshes or commits, and high indexing thread counts, do not
overflow an int (Mykhailo Demianenko via Mike McCandless)
* LUCENE-8025: Use sumTotalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
that omit term frequency information, as it is equivalent in that case.
Previously bogus numbers were used, and many similarities would
completely degrade. (Robert Muir, Adrien Grand)
* LUCENE-8045: ParallelLeafReader did not correctly report FieldInfo.dvGen
(Alan Woodward)
* LUCENE-8034: Use subtraction instead of addition to sidestep int
overflow in SpanNotQuery. (Hari Menon via Mike McCandless)
* LUCENE-8078: The query cache should not cache instances of
MatchNoDocsQuery. (Jon Harper via Adrien Grand)
* LUCENE-8048: Filesystems do not guarantee the order of directory updates
(Nikolay Martynov, Simon Willnauer, Erick Erickson)
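The LUCENE-8034 entry above relies on a general technique: rearranging a comparison so that the arithmetic cannot wrap around. The sketch below shows the idea with hypothetical names (end, post, excludeStart); it is not the SpanNotQuery code, and it assumes all inputs are non-negative.

```java
final class OverflowSafeCompare {
    // Naive form: end + post can wrap around to a negative int when post is huge.
    static boolean farEnoughNaive(int end, int post, int excludeStart) {
        return end + post <= excludeStart;
    }

    // Rearranged form: for non-negative inputs, excludeStart - post cannot overflow.
    static boolean farEnoughSafe(int end, int post, int excludeStart) {
        return end <= excludeStart - post;
    }

    public static void main(String[] args) {
        int end = 5;
        int excludeStart = 10;
        int post = Integer.MAX_VALUE; // e.g. "exclude anything that follows"
        System.out.println("naive: " + farEnoughNaive(end, post, excludeStart));
        System.out.println("safe:  " + farEnoughSafe(end, post, excludeStart));
    }
}
```

With post set to Integer.MAX_VALUE the naive sum wraps to a negative number and the check wrongly succeeds, while the subtraction form stays within int range.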
Optimizations
* LUCENE-8018: Smaller FieldInfos memory footprint by not retaining unnecessary
references to TreeMap entries. (Julian Vassev via Adrien Grand)
* LUCENE-7994: Use int/int scatter map to gather facet counts when the
number of hits is small relative to the number of unique facet labels
(Dawid Weiss, Robert Muir, Mike McCandless)
* LUCENE-8062: GlobalOrdinalsQuery is no longer eligible for caching. (Jim Ferenczi)
* LUCENE-8058: Large instances of TermInSetQuery are no longer eligible for
caching as they could break memory accounting of the query cache.
(Adrien Grand)
* LUCENE-8055: MemoryIndex.MemoryDocValuesIterator returns 2 documents
instead of 1. (Simon Willnauer)
* LUCENE-8043: Fix document accounting in IndexWriter to prevent writing too many
documents. Once this happens, Lucene refuses to open the index and throws a
CorruptIndexException. (Simon Willnauer, Yonik Seeley, Mike McCandless)
Tests
* LUCENE-8035: Run tests with JDK-specific options: --illegal-access=deny
on Java 9+. (Uwe Schindler)
Build
* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)
======================= Lucene 7.1.0 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
New Features
* LUCENE-7970: Add a shape to Geo3D that consists of multiple planes that
approximate a true circle, rather than an ellipse, for non-spherical planet models.
(Karl Wright, Ignacio Vera)
* LUCENE-7955: Add support for the concept of "nearest distance" to Geo3D's
GeoPath abstraction, which is the distance along the path to the point that is
closest to the provided point. (Karl Wright)
* LUCENE-7906: Add spatial relationships between all currently-defined Geo shapes.
(Ignacio Vera)
* LUCENE-7955: Add support for zero-width paths. (Karl Wright)
* LUCENE-7936: Add serialization and deserialization support to Geo3D. (Karl Wright,
Ignacio Vera)
* LUCENE-7942: Distance computations now have the ability to accurately aggregate
distances, rather than just doing sums. (Karl Wright)
* LUCENE-7934: Add a planet model interface. (Karl Wright)
* LUCENE-7918: Revamp the API for composites so that it's generic and can be used
for many kinds of shapes. (Ignacio Vera)
* LUCENE-7621: Add CoveringQuery, a query whose required number of matching
clauses can be defined per document. (Adrien Grand)
* LUCENE-7927: Add LongValueFacetCounts, to compute facet counts for individual
numeric values (Mike McCandless)
* LUCENE-7940: Add BengaliAnalyzer. (Md. Abdulla-Al-Sun via Robert Muir)
* LUCENE-7392: Add point based LatLonBoundingBox as new RangeField Type.
(Nick Knize)
* LUCENE-7951: Spatial-extras has much better Geo3d support by implementing Spatial4j
abstractions: SpatialContextFactory, ShapeFactory, BinaryCodec, DistanceCalculator.
(Ignacio Vera, David Smiley)
* LUCENE-7973: Update dictionary version for Ukrainian analyzer to 3.9.0 (Andriy
Rysin via Dawid Weiss)
* LUCENE-7974: Add FloatPointNearestNeighbor, an N-dimensional FloatPoint
K-nearest-neighbor search implementation. (Steve Rowe)
* LUCENE-7975: Change the default taxonomy facets cache to a faster
byte[] (UTF-8) based cache. (Mike McCandless)
* LUCENE-7972: DirectoryTaxonomyReader, in Lucene's facet module, now
implements Accountable, so you can more easily track how much heap
it's using. (Mike McCandless)
* LUCENE-7982: A new NormsFieldExistsQuery matches documents that have
norms in a specified field (Colin Goodheart-Smithe via Mike McCandless)
Optimizations
* LUCENE-7905: Optimize how OrdinalMap (used by
SortedSetDocValuesFacetCounts and others) builds its map (Robert
Muir, Adrien Grand, Mike McCandless)
* LUCENE-7655: Speed up geo-distance queries in case of dense single-valued
fields when most documents match. (Maciej Zasada via Adrien Grand)
* LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more
than 8x greater than the cost of the lead iterator in order to use doc values.
(Murali Krishna P via Adrien Grand)
* LUCENE-7925: Collapse duplicate SHOULD or MUST clauses by summing up their
boosts. (Adrien Grand)
* LUCENE-7939: MinShouldMatchSumScorer now leverages two-phase iteration in
order to be faster when used in conjunctions. (Adrien Grand)
* LUCENE-7827: AnalyzingInfixSuggester doesn't create "textgrams"
when minPrefixChar=0 (Mikhail Khludnev)
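The clause collapsing described in LUCENE-7925 above can be pictured with plain collections. This is a minimal sketch, assuming clauses are represented as query strings with parallel boost values; the class and method are hypothetical and do not use Lucene's BooleanQuery API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CollapseDuplicateClauses {
    // Merges clauses with identical query text, summing their boosts.
    // Insertion order of the first occurrence is preserved.
    static Map<String, Float> collapse(String[] queries, float[] boosts) {
        Map<String, Float> merged = new LinkedHashMap<>();
        for (int i = 0; i < queries.length; i++) {
            merged.merge(queries[i], boosts[i], Float::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] queries = {"title:lucene", "body:search", "title:lucene"};
        float[] boosts = {1.0f, 2.0f, 3.0f};
        // The two "title:lucene" clauses become one clause with boost 4.0.
        System.out.println(collapse(queries, boosts));
    }
}
```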
Bug Fixes
* LUCENE-8066: It was still possible to construct a concave GeoExactCircle, so use
a sector approach to prevent that. (Ignacio Vera)
* LUCENE-7967: The GeoDegeneratePoint isWithin() method needed allowance for
numerical precision. (Karl Wright)
* LUCENE-7965: GeoBBoxFactory was constructing the wrong shape at the poles
if the longitude span was greater than 180 degrees. (Karl Wright)
* LUCENE-7916: Prevent ArrayIndexOutOfBoundsException if ICUTokenizer is used
with a different ICU JAR version than it is compiled against. Note that this is
not recommended: lucene-analyzers-icu contains binary data structures
specific to the ICU/Unicode versions it is built against. (Chris Koenig, Robert Muir)
* LUCENE-7891: Lucene's taxonomy facets now use a non-buggy LRU cache
by default. (Jan-Willem van den Broek via Mike McCandless)
* LUCENE-7959: Improve NativeFSLockFactory's exception message if it cannot create
write.lock for an empty index due to bad permissions/read-only filesystem/etc.
(Erick Erickson, Shawn Heisey, Robert Muir)
* LUCENE-7968: AnalyzingSuggester would sometimes order suggestions incorrectly
because it did not properly break ties on the surface forms when both the
weights and the analyzed forms were equal. (Robert Muir)
* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
child scorers (Adrien Grand, Mike McCandless)
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
Build
* SOLR-11181: Switch order of maven artifact publishing procedure: deploy first
instead of locally installing first, to work around a double repository push of
*-sources.jar and *-javadoc.jar files. (Lynn Monson via Steve Rowe)
* LUCENE-6673: Maven build fails for target javadoc:jar.
(Ramkumar Aiyengar, Daniel Collins via Steve Rowe)
* LUCENE-7985: Upgrade forbiddenapis to 2.4.1. (Uwe Schindler)
Other
* LUCENE-7948, LUCENE-7937: Upgrade randomizedtesting to 2.5.3 (minor fixes
in test filtering for IDEs). (Mike Sokolov, Dawid Weiss)
* LUCENE-7933: LongBitSet now validates the numBits parameter (Won
Jonghoon, Mike McCandless)
* LUCENE-7978: Add some more documentation about setting up build
environment. (Anton R. Yuste via Uwe Schindler)
* LUCENE-7983: IndexWriter.IndexReaderWarmer is now a functional interface
instead of an abstract class with a single method (Dawid Weiss)
* LUCENE-5753: Update TLDs recognized by UAX29URLEmailTokenizer. (Steve Rowe)
======================= Lucene 7.0.1 =======================
Bug Fixes
* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
child scorers (Adrien Grand, Mike McCandless)
======================= Lucene 7.0.0 =======================
New Features
* LUCENE-7703: SegmentInfos now record the major Lucene version at index
creation time. (Adrien Grand)
* LUCENE-7756: LeafReader.getMetaData now exposes the index created version as
well as the oldest Lucene version that contributed to the segment.
(Adrien Grand)
* LUCENE-7854: The new TermFrequencyAttribute used during analysis
with a custom token stream allows indexing custom term frequencies
(Mike McCandless)
* LUCENE-7866: Add a new DelimitedTermFrequencyTokenFilter that allows marking
tokens with a custom term frequency (LUCENE-7854). It parses a numeric
value after a separator char ('|') at the end of each token and changes
the term frequency to this value. (Uwe Schindler, Robert Muir, Mike
McCandless)
* LUCENE-7868: Multiple threads can now resolve deletes and doc values
updates concurrently, giving sizable speedups in update-heavy
indexing use cases (Simon Willnauer, Mike McCandless)
* LUCENE-7823: Pure query based naive bayes classifier using BM25 scores (Tommaso Teofili)
* LUCENE-7838: Knn classifier based on fuzzified term queries (Tommaso Teofili)
* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory.
(Juan Pedro via Adrien Grand)
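The token format described for LUCENE-7866 above (a numeric value after a '|' separator at the end of each token) can be sketched with plain string handling. The class below is hypothetical and is not the filter's implementation; in particular, returning 1 when no separator is present is an assumption made for this example.

```java
final class DelimitedTermFrequencySketch {
    static final char DELIMITER = '|';

    // Returns the token text with any trailing "|<number>" removed.
    static String term(String token) {
        int idx = token.lastIndexOf(DELIMITER);
        return idx < 0 ? token : token.substring(0, idx);
    }

    // Returns the number after the last delimiter, or 1 when there is none
    // (assumed default for this sketch). Throws NumberFormatException if the
    // suffix is not an integer.
    static int termFrequency(String token) {
        int idx = token.lastIndexOf(DELIMITER);
        return idx < 0 ? 1 : Integer.parseInt(token.substring(idx + 1));
    }

    public static void main(String[] args) {
        for (String token : new String[] {"lucene|5", "search"}) {
            System.out.println(term(token) + " -> " + termFrequency(token));
        }
    }
}
```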
API Changes
* LUCENE-2605: Classic QueryParser no longer splits on whitespace by default.
Use setSplitOnWhitespace(true) to get the old behavior. (Steve Rowe)
* LUCENE-7369: Similarity.coord and BooleanQuery.disableCoord are removed.
(Adrien Grand)
* LUCENE-7368: Removed query normalization. (Adrien Grand)
* LUCENE-7355: AnalyzingQueryParser has been removed as its functionality has
been folded into the classic QueryParser. (Adrien Grand)
* LUCENE-7407: Doc values APIs have been switched from random access
to iterators, enabling future codec compression improvements. (Mike
McCandless)
* LUCENE-7475: Norms now support sparsity, so that you only pay for what is
actually used. (Adrien Grand)
* LUCENE-7494: Points now have a per-field API, like doc values. (Adrien Grand)
* LUCENE-7410: Cache keys and close listeners have been refactored in order
to be less trappy. See IndexReader.getReaderCacheHelper and
LeafReader.getCoreCacheHelper. (Adrien Grand)
* LUCENE-6819: Index-time boosts are not supported anymore. As a replacement,
index-time scoring factors should be indexed into a doc value field and
combined at query time using e.g. FunctionScoreQuery. (Adrien Grand)
* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
(David Smiley)
* LUCENE-7701: Grouping collectors have been refactored, such that groups are
now defined by a GroupSelector implementation. (Alan Woodward)
* LUCENE-7741: DoubleValuesSource now has an explain() method (Alan Woodward,
Adrien Grand)
* LUCENE-7815: Removed the PostingsHighlighter; you should use the UnifiedHighlighter
instead, which derived from the PostingsHighlighter. WholeBreakIterator and
CustomSeparatorBreakIterator were moved to UH's package. (David Smiley)
* LUCENE-7850: Removed support for legacy numerics. (Adrien Grand)
* LUCENE-7500: Removed abstract LeafReader.fields(); instead terms(fieldName)
has been made abstract; formerly it was final. Also, MultiFields.getTerms
was optimized to work directly instead of being implemented on getFields.
(David Smiley)
* LUCENE-7872: TopDocs.totalHits is now a long. (Adrien Grand, hossman)
* LUCENE-7868: IndexWriterConfig.setMaxBufferedDeleteTerms is
removed. (Simon Willnauer, Mike McCandless)
* LUCENE-7877: PrefixAwareTokenStream is replaced with ConcatenatingTokenStream
(Alan Woodward, Uwe Schindler, Adrien Grand)
* LUCENE-7867: The deprecated Token class is now only available in the test
framework (Alan Woodward, Adrien Grand)
* LUCENE-7723: DoubleValuesSource enforces implementation of equals() and
hashCode() (Alan Woodward)
* LUCENE-7737: The spatial-extras module no longer has a dependency on the
queries module. All uses of ValueSource are either replaced with core
DoubleValuesSource extensions, or with the new ShapeValuesSource and
ShapeValuesPredicate classes (Alan Woodward, David Smiley)
* LUCENE-7892: Doc-values query factory methods have been renamed so that their
name contains "slow" in order to clearly indicate that they would usually be a
bad choice. (Adrien Grand)
* LUCENE-7899: FieldValueQuery is renamed to DocValuesFieldExistsQuery
(Adrien Grand, Mike McCandless)
Bug Fixes
* LUCENE-7626: IndexWriter will no longer accept broken token offsets
(Mike McCandless)
* LUCENE-7859: Spatial-extras PackedQuadPrefixTree bug that only revealed itself
with the new pointsOnly optimizations in LUCENE-7845. (David Smiley)
* LUCENE-7871: Fix false positive match in BlockJoinSelector when children have no value, introducing
wrap methods accepting children as DISI. Extracting ToParentDocValues (Mikhail Khludnev)
* LUCENE-7914: Add a maximum recursion level in automaton recursive
functions (Operations.isFinite and Operations.topsortState) to prevent
large automata from overflowing the stack (Robert Muir, Adrien Grand, Jim Ferenczi)
* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even
if possible). (Dawid Weiss)
* LUCENE-7956: Fixed potential stack overflow error in ICUNormalizer2CharFilter.
(Adrien Grand)
* LUCENE-7963: Remove useless getAttribute() in DefaultIndexingChain that
causes performance drop, introduced by LUCENE-7626. (Daniel Mitterdorfer
via Uwe Schindler)
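The idea behind LUCENE-7914 above, bounding recursion depth so that oversized input fails with a clear exception instead of a StackOverflowError, can be sketched as follows. The limit of 1000, the chain-shaped graph, and all names are assumptions chosen for illustration; this is not the code in Operations.

```java
final class BoundedRecursionSketch {
    // Hypothetical limit chosen for this sketch only.
    static final int MAX_RECURSION_LEVEL = 1000;

    // Builds a chain 0 -> 1 -> ... -> n-1, where -1 marks the end.
    static int[] chain(int n) {
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            next[i] = (i == n - 1) ? -1 : i + 1;
        }
        return next;
    }

    // Recursively counts the states reachable from 'state', failing fast
    // once the recursion level exceeds the configured maximum.
    static int length(int[] next, int state, int level) {
        if (level > MAX_RECURSION_LEVEL) {
            throw new IllegalArgumentException(
                "input too complex: recursion level exceeds " + MAX_RECURSION_LEVEL);
        }
        if (next[state] < 0) {
            return 1;
        }
        return 1 + length(next, next[state], level + 1);
    }

    public static void main(String[] args) {
        System.out.println(length(chain(10), 0, 0));
        try {
            length(chain(5000), 0, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```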
Improvements
* LUCENE-7489: Better storage of sparse doc-values fields with the default
codec. (Adrien Grand)
* LUCENE-7730: More accurate encoding of the length normalization factor
thanks to the removal of index-time boosts. (Adrien Grand)
* LUCENE-7901: Original Highlighter now eagerly throws an exception if you
provide components that are null. (Jason Gerlowski, David Smiley)
* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss)
Optimizations
* LUCENE-7416: BooleanQuery optimizes queries that have queries that occur both
in the sets of SHOULD and FILTER clauses, or both in MUST/FILTER and MUST_NOT
clauses. (Spyros Kapnissis via Adrien Grand, Uwe Schindler)
* LUCENE-7506: FastTaxonomyFacetCounts should use CPU in proportion to
the size of the intersected set of hits from the query and documents
that have a facet value, so sparse faceting works as expected
(Adrien Grand via Mike McCandless)
* LUCENE-7519: Add optimized APIs to compute browse-only top level
facets (Mike McCandless)
* LUCENE-7589: Numeric doc values now have the ability to encode blocks of
values using different numbers of bits per value if this proves to save
storage. (Adrien Grand)
* LUCENE-7845: Enhance spatial-extras RecursivePrefixTreeStrategy queries when the
query is a point (for 2D) or is a simple date interval (e.g. 1 month). When
the strategy is marked as pointsOnly, the result is a TermQuery. (David Smiley)
* LUCENE-7874: DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1. (Jim Ferenczi)
* LUCENE-7828: Speed up range queries on range fields by improving how we
compute the relation between the query and inner nodes of the BKD tree.
(Adrien Grand)
Other
* LUCENE-7923: Removed FST.Arc.node field (unused). (Dawid Weiss)
* LUCENE-7328: Remove LegacyNumericEncoding from GeoPointField. (Nick Knize)
* LUCENE-7360: Remove Explanation.toHtml() (Alan Woodward)
* LUCENE-7681: MemoryIndex uses new DocValues API (Alan Woodward)
* LUCENE-7753: Make fields static when possible.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7540: Upgrade ICU to 59.1 (Mike McCandless, Jim Ferenczi)
* LUCENE-7852: Correct copyright year(s) in lucene/LICENSE.txt file.
(Christine Poerschke, Steve Rowe)
* LUCENE-7719: Generalized the UnifiedHighlighter's support for AutomatonQuery
for character & binary automata. Added AutomatonQuery.isBinary. (David Smiley)
* LUCENE-7873: Due to serious problems with context class loaders in several
frameworks (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats,
DocValuesFormats and all analysis factories was changed to only inspect the
current classloader that defined the interface class (lucene-core.jar).
See MIGRATE.txt for more information! (Uwe Schindler, Dawid Weiss)
* LUCENE-7883: Lucene no longer uses the context class loader when resolving
resources in CustomAnalyzer or ClassPathResourceLoader. Resources are only
resolved against Lucene's class loader by default. Please use another builder
method to change to a custom classloader. (Uwe Schindler)
* LUCENE-5822: Convert README to Markdown (Jason Gerlowski via Mike Drob)
* LUCENE-7773: Remove unused/deprecated token types from StandardTokenizer.
(Ahmet Arslan via Steve Rowe)
* LUCENE-7800: Remove code that potentially rethrows checked exceptions
from methods that don't declare them ("sneaky throw" hack). (Robert Muir,
Uwe Schindler, Dawid Weiss)
* LUCENE-7876: Avoid calls to LeafReader.fields() and MultiFields.getFields()
that are trivially replaced by LeafReader.terms() and MultiFields.getTerms()
(David Smiley)
======================= Lucene 6.6.5 =======================
(No Changes)
======================= Lucene 6.6.4 =======================
(No Changes)
======================= Lucene 6.6.3 =======================
Build
* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)
======================= Lucene 6.6.2 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
Bug Fixes
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
======================= Lucene 6.6.1 =======================
Bug Fixes
* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects
that these points are visited in ascending order. The memory index didn't do this, which could cause documents
with multiple points that should match to not match. (Martijn van Groningen)
* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi)
======================= Lucene 6.6.0 =======================
New Features
* LUCENE-7811: Add a concurrent SortedSet facets implementation.
(Mike McCandless)
Bug Fixes
* LUCENE-7777: ByteBlockPool.readBytes sometimes throws
ArrayIndexOutOfBoundsException when byte blocks larger than 32 KB
were added (Mike McCandless)
* LUCENE-7797: The static FSDirectory.listAll(Path) method was always
returning an empty array. (Atkins Chang via Mike McCandless)
* LUCENE-7481: Fixed missing rewrite methods for SpanPayloadCheckQuery
and PayloadScoreQuery. (Erik Hatcher)
* LUCENE-7808: Fixed PayloadScoreQuery and SpanPayloadCheckQuery
.equals and .hashCode methods. (Erik Hatcher)
* LUCENE-7798: Add .equals and .hashCode to ToParentBlockJoinSortField
(Mikhail Khludnev)
* LUCENE-7814: DateRangePrefixTree (in spatial-extras) had edge-case bugs for
years >= 292,000,000. (David Smiley)
* LUCENE-5365, LUCENE-7818: Fix incorrect condition in queryparser's
QueryNodeOperation#logicalAnd(). (Olivier Binda, Amrit Sarkar,
AppChecker via Uwe Schindler)
* LUCENE-7821: The classic and flexible query parsers, as well as Solr's
"lucene"/standard query parser, should require " TO " in range queries,
and accept "TO" as endpoints in range queries. (hossman, Steve Rowe)
* LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (e.g. new york, new york city).
(Jim Ferenczi)
* LUCENE-7817: Pass cached query to onQueryCache instead of null.
(Christoph Kaser via Adrien Grand)
* LUCENE-7831: CodecUtil should not seek to negative offsets. (Adrien Grand)
* LUCENE-7833: ToParentBlockJoinQuery computed the min score instead of the max
score with ScoreMode.MAX. (Adrien Grand)
* LUCENE-7847: Fixed all-docs-match optimization of range queries on range
fields. (Adrien Grand)
* LUCENE-7810: Fix equals() and hashCode() methods of several join queries.
(Hossman, Adrien Grand, Martijn van Groningen)
Improvements
* LUCENE-7782: OfflineSorter now passes the total number of items it
will write to getWriter (Mike McCandless)
* LUCENE-7785: Move dictionary for Ukrainian analyzer to external dependency.
(Andriy Rysin via Steve Rowe, Dawid Weiss)
* LUCENE-7801: SortedSetDocValuesReaderState now implements
Accountable so you can see how much RAM it's using (Robert Muir,
Mike McCandless)
* LUCENE-7792: OfflineSorter can now run concurrently if you pass it
an optional ExecutorService (Dawid Weiss, Mike McCandless)
* LUCENE-7811: Sorted set facets now use sparse storage when
collecting hits, when appropriate. (Mike McCandless)
Optimizations
* LUCENE-7787: spatial-extras HeatmapFacetCounter will now short-circuit its
work when Bits.MatchNoBits is passed. (David Smiley)
Other
* LUCENE-7796: Make IOUtils.reThrow idiom declare Error return type so
callers may use it in a way that compiler knows subsequent code is
unreachable. reThrow is now deprecated in favor of IOUtils.rethrowAlways
with slightly different semantics (see javadoc). (Hossman, Robert Muir,
Dawid Weiss)
* LUCENE-7754: Inner classes should be static whenever possible.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7751: Avoid boxing primitives only to call compareTo.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7743: Never call new String(String).
(Daniel Jelinski via Adrien Grand)
* LUCENE-7761: Fixed comment in ReqExclScorer.
(Pablo Pita Leira via Adrien Grand)
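The idiom described in LUCENE-7796 above, a rethrow helper that declares an Error return type so that callers can write "throw helper(t)", can be sketched as below. This is a simplified stand-in written for illustration, not the IOUtils source; the class name and the parseOrRethrow caller are hypothetical.

```java
import java.io.IOException;

final class RethrowSketch {
    // Always throws. The declared Error return type exists only so that
    // callers can write "throw rethrowAlways(t)" and the compiler knows
    // the code after the call is unreachable.
    static Error rethrowAlways(Throwable th) throws IOException {
        if (th == null) {
            throw new AssertionError("rethrow argument must not be null");
        }
        if (th instanceof IOException) {
            throw (IOException) th;
        }
        if (th instanceof RuntimeException) {
            throw (RuntimeException) th;
        }
        if (th instanceof Error) {
            throw (Error) th;
        }
        throw new RuntimeException(th);
    }

    // Hypothetical caller: no dummy return statement is needed after the catch
    // block because the catch block ends in a throw statement.
    static int parseOrRethrow(String s) throws IOException {
        try {
            return Integer.parseInt(s);
        } catch (Throwable t) {
            throw rethrowAlways(t);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseOrRethrow("42"));
    }
}
```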
======================= Lucene 6.5.1 =======================
Bug Fixes
* LUCENE-7755: Fixed join queries to not reference IndexReaders, as it could
cause leaks if they are cached. (Adrien Grand)
* LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.
(Martin Amirault via Adrien Grand)
* LUCENE-7769: The UnifiedHighlighter wasn't highlighting portions of the query
wrapped in BoostQuery or SpanBoostQuery. (David Smiley, Dmitry Malinin)
Other
* LUCENE-7763: Remove outdated comment in IndexWriterConfig.setIndexSort javadocs.
(马可阳 via Christine Poerschke)
======================= Lucene 6.5.0 =======================
API Changes
* LUCENE-7740: Refactor Range Fields to remove Field suffix (e.g., DoubleRange),
move InetAddressRange and InetAddressPoint from sandbox to misc module, and
refactor all other range fields from sandbox to core. (Nick Knize)
* LUCENE-7624: TermsQuery has been renamed as TermInSetQuery and moved to core.
(Alan Woodward)
* LUCENE-7637: TermInSetQuery requires that all terms come from the same field.
(Adrien Grand)
* LUCENE-7644: FieldComparatorSource.newComparator() and
SortField.getComparator() no longer throw IOException (Alan Woodward)
* LUCENE-7643: Replaced doc-values queries in lucene/sandbox with factory
methods on the *DocValuesField classes. (Adrien Grand)
* LUCENE-7659: Added an IndexWriter#getFieldNames() method (experimental) to return
all field names as visible from the IndexWriter. This would be useful for
IndexWriter#updateDocValues() calls, to prevent calling with non-existent
docValues fields (Ishan Chattopadhyaya, Adrien Grand, Mike McCandless)
* LUCENE-6959: Removed ToParentBlockJoinCollector in favour of
ParentChildrenBlockJoinQuery, that can return the matching children documents per
parent document. This query should be executed for each matching parent document
after the main query has been executed. (Adrien Grand, Martijn van Groningen,
Mike McCandless)
* LUCENE-7628: Scorer.getChildren() now only returns Scorers that are
positioned on the current document, and can throw an IOException.
AssertingScorer checks that getChildren() is not called on an unpositioned
Scorer. (Alan Woodward, Adrien Grand)
* LUCENE-7702: Removed GraphQuery in favour of simple boolean query. (Matt Webber via Jim Ferenczi)
* LUCENE-7707: TopDocs.merge now takes a boolean option telling it
when to use the incoming shard index versus when to assign the shard
index itself, allowing users to merge shard responses incrementally
instead of once all shard responses are present. (Simon Willnauer,
Mike McCandless)
* LUCENE-7700: A cleanup of merge throughput control logic. Refactored all the
code previously scattered throughout the IndexWriter and
ConcurrentMergeScheduler into a more accessible set of public methods (see
MergePolicy.OneMergeProgress, MergeScheduler.wrapForMerge and
OneMerge.mergeInit). (Dawid Weiss, Mike McCandless).
* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
(David Smiley)
New Features
* LUCENE-7738: Add new InetAddressRange for indexing and querying InetAddress
ranges. (Nick Knize)
* LUCENE-7449: Add CROSSES relation support to RangeFieldQuery. (Nick Knize)
* LUCENE-7623: Add FunctionScoreQuery and FunctionMatchQuery (Alan Woodward,
Adrien Grand, David Smiley)
* LUCENE-7619: Add WordDelimiterGraphFilter, just like
WordDelimiterFilter except it produces correct token graphs so that
proximity queries at search time will produce correct results (Mike
McCandless)
* LUCENE-7656: Added the LatLonDocValuesField.new(Box/Distance)Query() factory
methods that are the equivalent of factory methods on LatLonPoint but operate
on doc values. These new methods should be wrapped in an IndexOrDocValuesQuery
for best performance. (Adrien Grand)
* LUCENE-7673: Added MultiValued[Int/Long/Float/Double]FieldSource that given a
SortedNumericSelector.Type can give a ValueSource view of a
SortedNumericDocValues field. (Tomás Fernández Löbbe)
* LUCENE-7465: Add SimplePatternTokenizer and
SimplePatternSplitTokenizer, using Lucene's regexp/automaton
implementation for analysis/tokenization (Clinton Gormley, Mike
McCandless)
* LUCENE-7688: Add OneMergeWrappingMergePolicy class.
(Keith Laban, Christine Poerschke)
* LUCENE-7686: The near-real-time document suggester can now
efficiently filter out duplicate suggestions (Uwe Schindler, Mike
McCandless)
* LUCENE-7712: SimpleQueryParser now supports default fuzziness
syntax, mapping foo~ to a FuzzyQuery with edit distance 2. (Lee
Hinman, David Pilato via Mike McCandless)
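The mapping described in LUCENE-7712 above (a bare foo~ meaning edit distance 2) can be pictured with a small string-level sketch. The class is hypothetical and does not reflect how SimpleQueryParser tokenizes its input; treating an explicit number after the tilde as the edit distance, and a missing tilde as 0, are assumptions of this example.

```java
final class DefaultFuzzinessSketch {
    // Edit distance used when the tilde is not followed by a number.
    static final int DEFAULT_FUZZINESS = 2;

    // Returns the term text without the fuzzy suffix.
    static String term(String token) {
        int tilde = token.lastIndexOf('~');
        return tilde < 0 ? token : token.substring(0, tilde);
    }

    // Returns 0 for a plain term, the explicit number for "foo~N",
    // and the default for a bare "foo~".
    static int fuzziness(String token) {
        int tilde = token.lastIndexOf('~');
        if (tilde < 0) {
            return 0;
        }
        String digits = token.substring(tilde + 1);
        return digits.isEmpty() ? DEFAULT_FUZZINESS : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        for (String token : new String[] {"foo", "foo~", "foo~1"}) {
            System.out.println(term(token) + " -> " + fuzziness(token));
        }
    }
}
```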
Bug Fixes
* LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads
and preserve all attributes. (Nathan Gass via Uwe Schindler)
* LUCENE-7679: MemoryIndex was ignoring omitNorms settings on passed-in
IndexableFields. (Alan Woodward)
* LUCENE-7692: PatternReplaceCharFilterFactory now implements MultiTermAware.
(Adrien Grand)
* LUCENE-7685: ToParentBlockJoinQuery and ToChildBlockJoinQuery now use the
rewritten child query in their equals and hashCode implementations.
(Adrien Grand)
* LUCENE-7698: CommonGramsQueryFilter was producing a disconnected
token graph, messing up phrase queries when it was used during query
parsing (Ere Maijala via Mike McCandless)
* LUCENE-7708: ShingleFilter without unigram was producing a disconnected
token graph, messing up queries when it was used during query
parsing (Jim Ferenczi)
Improvements
* LUCENE-7055: Added Weight#scorerSupplier, which allows estimating the cost
of a Scorer before actually building it, in order to optimize how the query
should be run, e.g. using points or doc values depending on costs of other
parts of the query. (Adrien Grand)
* LUCENE-7643: IndexOrDocValuesQuery allows executing range queries using
either points or doc values depending on which one is more efficient.
(Adrien Grand)
* LUCENE-7662: If index files are missing, throw CorruptIndexException instead
of the less descriptive FileNotFound or NoSuchFileException (Mike Drob via
Mike McCandless, Erick Erickson)
* LUCENE-7680: UsageTrackingQueryCachingPolicy never caches term filters anymore
since they are plenty fast. This also has the side-effect of leaving more
space in the history for costly filters. (Adrien Grand)
* LUCENE-7677: UsageTrackingQueryCachingPolicy now caches compound queries a bit
earlier than regular queries in order to improve cache efficiency.
(Adrien Grand)
* LUCENE-7710: BlockPackedReader throws CorruptIndexException and includes
IndexInput description instead of plain IOException (Mike Drob via
Mike McCandless)
* LUCENE-7695: ComplexPhraseQueryParser to support query time synonyms (Markus Jelsma
via Mikhail Khludnev)
* LUCENE-7747: QueryBuilder now iterates lazily over the possible paths when building a graph query
(Jim Ferenczi)
Optimizations
* LUCENE-7641: Optimized point range queries to compute documents that do not
match the range on single-valued fields when more than half the documents in
the index would match. (Adrien Grand)
* LUCENE-7656: Speed up LatLonPointDistanceQuery by computing distances even
less often. (Adrien Grand)
* LUCENE-7661: Speed up LatLonPointInPolygonQuery by pre-computing the
relation of the polygon with a grid. (Adrien Grand)
* LUCENE-7660: Speed up LatLonPointDistanceQuery by improving the detection of
whether BKD cells are entirely within the distance close to the dateline.
(Adrien Grand)
* LUCENE-7654: ToParentBlockJoinQuery now implements two-phase iteration and
computes scores lazily in order to be faster when used in conjunctions.
(Adrien Grand)
* LUCENE-7667: BKDReader now calls `IntersectVisitor.grow()` in larger
increments. (Adrien Grand)
* LUCENE-7638: Query parsers now analyze the token graph for articulation
points (or cut vertices) in order to create more efficient queries for
multi-token synonyms. (Jim Ferenczi)
* LUCENE-7699: Query parsers now use span queries to produce more efficient
phrase queries for multi-token synonyms. (Matt Weber via Jim Ferenczi)
* LUCENE-7742: Fix places where we were unboxing and then re-boxing
according to FindBugs (Daniel Jelinski via Mike McCandless)
* LUCENE-7739: Fix places where we unnecessarily boxed while parsing
a numeric value according to FindBugs (Daniel Jelinski via Mike
McCandless)
Build
* LUCENE-7653: Update randomizedtesting to version 2.5.0. (Dawid Weiss)
* LUCENE-7665: Remove grouping dependency from the join module.
(Martijn van Groningen)
* SOLR-10023: Add non-recursive 'test-nocompile' target: Only runs unit tests.
Jars are not downloaded; compilation is not updated; and Clover is not enabled.
(Steve Rowe)
* LUCENE-7694: Update forbiddenapis to version 2.3. (Uwe Schindler)
* LUCENE-7693: Replace "org.apache." logic in GetMavenDependenciesTask.
(Daniel Collins, Christine Poerschke)
* LUCENE-7726: Fix HTML entity bugs in Javadocs to be able to build with
Java 9. (Uwe Schindler, Hossman)
* LUCENE-7727: Replace end-of-life Markdown parser "Pegdown" with "Flexmark"
for compatibility with Java 9. (Uwe Schindler)
Other
* LUCENE-7666: Fix typos in lucene-join package info javadoc.
(Tom Saleeba via Christine Poerschke)
* LUCENE-7658: queryparser/xml CoreParser now implements SpanQueryBuilder interface.
(Daniel Collins, Christine Poerschke)
* LUCENE-7715: NearSpansUnordered simplifications.
(Paul Elschot via Adrien Grand)
======================= Lucene 6.4.2 =======================
Bug Fixes
* LUCENE-7676: Fixed FilterCodecReader to override more super-class methods.
Also added TestFilterCodecReader class. (Christine Poerschke)
* LUCENE-7717: The UnifiedHighlighter and PostingsHighlighter were not highlighting
prefix queries with multi-byte characters. TermRangeQuery is affected too.
(Dmitry Malinin, David Smiley)
======================= Lucene 6.4.1 =======================
Build
* LUCENE-7651: Fix Javadocs build for Java 8u121 by injecting "Google Code
Prettify" without adding JavaScript to Javadoc's -bottom parameter.
Also update Prettify to latest version to fix Google Chrome issue.
(Uwe Schindler)
Bug Fixes
* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
with a TermContext is cached. (Adrien Grand)
* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
issues. (Adrien Grand)
* LUCENE-7670: AnalyzingInfixSuggester should not immediately open an
IndexWriter over an already-built index. (Steve Rowe)
======================= Lucene 6.4.0 =======================
API Changes
* LUCENE-7533: Classic query parser no longer allows autoGeneratePhraseQueries
to be set to true when splitOnWhitespace is false (and vice-versa).
* LUCENE-7607: LeafFieldComparator.setScorer and SimpleFieldComparator.setScorer
are declared as throwing IOException (Alan Woodward)
* LUCENE-7617: Collector construction for two-pass grouping queries is
abstracted into a new Grouper class, which can be passed as a constructor
parameter to GroupingSearch. The abstract base classes for the different
grouping Collectors are renamed to remove the Abstract* prefix.
(Alan Woodward, Martijn van Groningen)
* LUCENE-7609: The expressions module now uses the DoubleValuesSource API, and
no longer depends on the queries module. Expression#getValueSource() is
replaced with Expression#getDoubleValuesSource(). (Alan Woodward, Adrien
Grand)
* LUCENE-7610: The facets module now uses the DoubleValuesSource API, and
methods that take ValueSource parameters are deprecated (Alan Woodward)
* LUCENE-7611: DocumentValueSourceDictionary now takes a LongValuesSource
as a parameter, and the ValueSource equivalent is deprecated (Alan Woodward)
New Features
* LUCENE-5867: Added BooleanSimilarity. (Robert Muir, Adrien Grand)
* LUCENE-7466: Added AxiomaticSimilarity. (Peilin Yang via Tommaso Teofili)
* LUCENE-7590: Added DocValuesStatsCollector to compute statistics on DocValues
fields. (Shai Erera)
* LUCENE-7587: The new FacetQuery and MultiFacetQuery helper classes
make it simpler to execute drill down when drill sideways counts are
not needed (Emmanuel Keller via Mike McCandless)
* LUCENE-6664: A new SynonymGraphFilter outputs a correct graph
structure for multi-token synonyms, separating out a
FlattenGraphFilter that is hardwired into the current
SynonymFilter. This finally makes it possible to implement
correct multi-token synonyms at search time. See
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
for details. (Mike McCandless)
* LUCENE-5325: Added LongValuesSource and DoubleValuesSource, intended as
type-safe replacements for ValueSource in the queries module. These
expose per-segment LongValues or DoubleValues iterators. (Alan Woodward, Adrien Grand)
* LUCENE-7603: Graph token streams are now handled accurately by query
parsers, by enumerating all paths and creating the corresponding
query/ies as sub-clauses (Matt Weber via Mike McCandless)
* LUCENE-7588: DrillSideways can now run queries concurrently, and
supports an IndexSearcher using an executor service to run each query
concurrently across all segments in the index (Emmanuel Keller via
Mike McCandless)
* LUCENE-7627: Added .intersect methods to SortedDocValues and
SortedSetDocValues to allow filtering their TermsEnums with a
CompiledAutomaton (Alan Woodward, Mike McCandless)
Bug Fixes
* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
dictionary file it opened (Markus via Mike McCandless)
* LUCENE-7562: CompletionFieldsConsumer sometimes throws
NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
* LUCENE-7533: Classic query parser: disallow autoGeneratePhraseQueries=true
when splitOnWhitespace=false (and vice-versa). (Steve Rowe)
* LUCENE-7536: ASCIIFoldingFilterFactory used to return an illegal multi-term
component when preserveOriginal was set to true. (Adrien Grand)
* LUCENE-7576: Fix Terms.intersect in the default codec to detect when
the incoming automaton is a special case and throw a clearer
exception than NullPointerException (Tom Mortimer via Mike McCandless)
* LUCENE-6989: Fix Exception handling in MMapDirectory's unmap hack
support code to work with Java 9's new InaccessibleObjectException,
which does not extend ReflectiveOperationException.
(Uwe Schindler)
* LUCENE-7581: Lucene now prevents updating a doc values field that is used
in the index sort, since this would lead to corruption. (Jim
Ferenczi via Mike McCandless)
* LUCENE-7570: IndexWriter may deadlock if a commit is running while
there are too many merges running and one of the merges hits a
tragic exception (Joey Echeverria via Mike McCandless)
* LUCENE-7594: Fixed point range queries on floating-point types to recommend
using helpers for exclusive bounds that are consistent with Double.compare.
(Adrien Grand, Dawid Weiss)
* LUCENE-7606: Normalization with CustomAnalyzer would only apply the last
token filter. (Adrien Grand)
* LUCENE-7612: Removed an unused dependency from the suggester to the misc
module. (Alan Woodward)
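For LUCENE-7594 above, the point about exclusive bounds can be illustrated with
a small JDK-only sketch. Double.compare orders -0.0 before +0.0, but
Math.nextUp(-0.0) jumps straight to Double.MIN_VALUE, so a naive exclusive lower
bound of -0.0 would wrongly exclude +0.0. The ExclusiveBounds class and its
method names are hypothetical and used only for illustration; this is not the
code that ships in Lucene.

```java
// Sketch only: ExclusiveBounds is a hypothetical helper, not a Lucene class.
final class ExclusiveBounds {
    private ExclusiveBounds() {}

    /** Next double above d in Double.compare order, so -0.0 steps to +0.0. */
    static double nextUp(double d) {
        if (Double.doubleToLongBits(d) == 0x8000_0000_0000_0000L) { // d is -0.0
            return +0.0d;
        }
        return Math.nextUp(d);
    }

    /** Next double below d in Double.compare order, so +0.0 steps to -0.0. */
    static double nextDown(double d) {
        if (Double.doubleToLongBits(d) == 0L) { // d is +0.0
            return -0.0d;
        }
        return Math.nextDown(d);
    }

    public static void main(String[] args) {
        // Double.compare treats the two zeros as distinct values.
        System.out.println(Double.compare(-0.0d, +0.0d) < 0);
        // The plain JDK call skips +0.0 when starting from -0.0.
        System.out.println(Math.nextUp(-0.0d));
        // The helper visits +0.0 first.
        System.out.println(nextUp(-0.0d));
    }
}
```

An exclusive range (lo, hi) would then be expressed as the inclusive range
[nextUp(lo), nextDown(hi)].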
Improvements
* LUCENE-7532: Add back lost codec file format documentation
(Shinichiro Abe via Mike McCandless)
* LUCENE-6824: TermAutomatonQuery now rewrites to TermQuery,
PhraseQuery or MultiPhraseQuery when the word automaton is simple
(Mike McCandless)
* LUCENE-7431: Allow a certain amount of overlap to be specified between the include
and exclude arguments of SpanNotQuery via negative pre and/or post arguments.
(Marc Morissette via David Smiley)
* LUCENE-7544: UnifiedHighlighter: add extension points for handling custom queries.
(Michael Braun, David Smiley)
* LUCENE-7538: Asking IndexWriter to store a too-massive text field
now throws IllegalArgumentException instead of a cryptic exception
that closes your IndexWriter (Steve Chen via Mike McCandless)
* LUCENE-7524: Added more detailed explanation of how IDF is computed in
ClassicSimilarity and BM25Similarity. (Adrien Grand)
* LUCENE-7564: AnalyzingInfixSuggester should close its IndexWriter by default
at the end of build(). (Steve Rowe)
* LUCENE-7526: Enhanced UnifiedHighlighter's passage relevancy for queries with
wildcards and sometimes just terms. Added shouldPreferPassageRelevancyOverSpeed()
which can be overridden to return false to eke out more speed in some cases.
(Timothy M. Rodriguez, David Smiley)
* LUCENE-7560: QueryBuilder.createFieldQuery is no longer final,
giving custom query parsers subclassing QueryBuilder more freedom to
control how text is analyzed and converted into a query (Matt Weber
via Mike McCandless)
* LUCENE-7537: Index time sorting now supports multi-valued sorts
using selectors (MIN, MAX, etc.) (Jim Ferenczi via Mike McCandless)
* LUCENE-7575: UnifiedHighlighter can now highlight fields with queries that don't
necessarily refer to that field (AKA requireFieldMatch==false). Disabled by default.
See UH get/setFieldMatcher. (Jim Ferenczi via David Smiley)
* LUCENE-7592: If the segments file is truncated, we now throw
CorruptIndexException instead of the more confusing EOFException
(Mike Drob via Mike McCandless)
* LUCENE-6989: Make MMapDirectory's unmap hack work with Java 9 EA (b150+):
Unmapping uses new sun.misc.Unsafe#invokeCleaner(ByteBuffer).
Java 9 now needs the same permissions as Java 8;
RuntimePermission("accessClassInPackage.jdk.internal.ref")
is no longer needed. Support for older Java 9 builds was removed.
(Uwe Schindler)
* LUCENE-7401: Changed the way BKD trees pick the split dimension in order to
ensure all dimensions are indexed. (Adrien Grand)
* LUCENE-7614: Complex Phrase Query parser ignores double quotes around single token
prefix, wildcard, range queries (Mikhail Khludnev)
* LUCENE-7620: Added LengthGoalBreakIterator, a wrapper around another B.I. to skip breaks
that would create Passages that are too short. Only for use with the UnifiedHighlighter
(and probably PostingsHighlighter). (David Smiley)
Optimizations
* LUCENE-7568: Optimize merging when index sorting is used but the
index is already sorted (Jim Ferenczi via Mike McCandless)
* LUCENE-7563: The BKD in-memory index for dimensional points now uses
a compressed format, using substantially less RAM in some cases
(Adrien Grand, Mike McCandless)
* LUCENE-7583: BKD writing now buffers each leaf block in heap before
writing to disk, giving a small speedup in points-heavy use cases.
(Mike McCandless)
* LUCENE-7572: Doc values queries now cache their hash code. (Adrien Grand)
Other
* LUCENE-7546: Fixed references to benchmark wikipedia data and the Jenkins line-docs file
(David Smiley)
* LUCENE-7534: fix smokeTestRelease.py to run on Cygwin (Mikhail Khludnev)
* LUCENE-7559: UnifiedHighlighter: Make Passage and OffsetsEnum more exposed to allow
passage creation to be customized. (David Smiley)
* LUCENE-7599: Simplify TestRandomChains using Java's built-in Predicate and
Function interfaces. (Ahmet Arslan via Adrien Grand)
* LUCENE-7595: Improve RAMUsageTester in test-framework to estimate memory usage of
runtime classes and work with Java 9 EA (b148+). Disable static field heap usage
checker in LuceneTestCase. (Uwe Schindler, Dawid Weiss)
Build
* LUCENE-7387: fix defaultCodec in build.xml to account for the line ending (hossman)
* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
Lucene and Solr DOAP RDF files into the Git source repository under
dev-tools/doap/ and then pulling release dates from those files, rather than
from JIRA. (Mano Kovacs, hossman, Steve Rowe)
* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
build 148+. Also update JGit version for working-copy checks. (Uwe Schindler)
======================= Lucene 6.3.0 =======================
API Changes
New Features
* LUCENE-7438: New "UnifiedHighlighter" derivative of the PostingsHighlighter that
can consume offsets from postings, term vectors, or analysis. It can highlight phrases
as accurately as the standard Highlighter. Light term vectors can be used with offsets
in postings for fast wildcard (MultiTermQuery) highlighting.
(David Smiley, Timothy Rodriguez)
* LUCENE-7490: SimpleQueryParser now parses '*' to MatchAllDocsQuery
(Lee Hinman via Mike McCandless)
Bug Fixes
* LUCENE-7507: Upgrade morfologik-stemming to version 2.1.1 (fixes security
manager issue with Polish dictionary lookup). (Dawid Weiss)
* LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are
neither BooleanQuery nor TermQuery. (Steve Rowe)
* LUCENE-7456: PerFieldPostings/DocValues was failing to delegate the
merge method (Julien MASSENET via Mike McCandless)
* LUCENE-7468: ASCIIFoldingFilter should not emit duplicated tokens when
preserve original is on. (David Causse via Adrien Grand)
* LUCENE-7484: FastVectorHighlighter failed to highlight SynonymQuery
(Jim Ferenczi via Mike McCandless)
* LUCENE-7476: JapaneseNumberFilter should not invoke incrementToken
on its input after it's exhausted (Andy Hind via Mike McCandless)
* LUCENE-7486: DisjunctionMaxQuery does not work correctly with queries that
return negative scores. (Ivan Provalov, Uwe Schindler, Adrien Grand)
* LUCENE-7491: Suddenly turning on dimensional points for some fields
that already exist in an index but didn't previously index
dimensional points could cause unexpected merge exceptions (Hans
Lund, Mike McCandless)
* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
(Hossman)
* LUCENE-7493: FacetCollector.search threw an unexpected exception if
you asked for zero hits but wanted facets (Mahesh via Mike McCandless)
* LUCENE-7505: AnalyzingInfixSuggester returned invalid results when
allTermsRequired is false and context filters are specified (Mike
McCandless)
* LUCENE-7429: AnalyzerWrapper can now modify the normalization chain too and
DelegatingAnalyzerWrapper does the right thing automatically. (Adrien Grand)
* LUCENE-7135: Lucene's check for 32 or 64 bit JVM now works around security
manager blocking access to some properties (Aaron Madlon-Kay via
Mike McCandless)
Improvements
* LUCENE-7439: FuzzyQuery now matches all terms within the specified
edit distance, even if they are short terms (Mike McCandless)
* LUCENE-7496: Better toString for SweetSpotSimilarity (janhoy)
* LUCENE-7520: Highlighter's WeightedSpanTermExtractor shouldn't attempt to expand a MultiTermQuery
when its field doesn't match the field the extraction is scoped to.
(Cao Manh Dat via David Smiley)
Optimizations
* LUCENE-7501: BKDReader should not store the split dimension explicitly in the
1D case. (Adrien Grand)
Other
* LUCENE-7513: Upgrade randomizedtesting to 2.4.0. (Dawid Weiss)
* LUCENE-7452: Block join query exception suggests how to find a doc, which
violates orthogonality requirement. (Mikhail Khludnev)
* LUCENE-7438: Renovate the Benchmark module's support for benchmarking highlighting. All
highlighters are supported via SearchTravRetHighlight. (David Smiley)
Build
* LUCENE-7292: Fix build to use "--release 8" instead of "-release 8" on
Java 9 (this changed with recent EA build b135). (Uwe Schindler)
======================= Lucene 6.2.1 =======================
API Changes
* LUCENE-7436: MinHashFilter's constructor, and some of its default
settings, should be public. (Doug Turnbull via Mike McCandless)
Bug Fixes
* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
term. (Thomas Kappler via David Smiley)
* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
with large skips. (yonik)
* LUCENE-7442: MinHashFilter's ctor should validate its args.
(Cao Manh Dat via Steve Rowe)
* LUCENE-7318: Fix backwards compatibility issues around StandardAnalyzer
and its components, introduced with Lucene 6.2.0. The moved classes
were restored in their original packages: LowerCaseFilter and StopFilter,
as well as several utility classes. (Uwe Schindler, Mike McCandless)
======================= Lucene 6.2.0 =======================
API Changes
* ScoringWrapperSpans was removed since it had no purpose or effect as of Lucene 5.5.
New Features
* LUCENE-7388: Add point based IntRangeField, FloatRangeField, LongRangeField along with
supporting queries and tests (Nick Knize)
* LUCENE-7381: Add point based DoubleRangeField and RangeFieldQuery for
indexing and querying on Ranges up to 4 dimensions (Nick Knize)
* LUCENE-6968: LSH Filter (Tommaso Teofili, Andy Hind, Cao Manh Dat)
* LUCENE-7302: IndexWriter methods that change the index now return a
long "sequence number" indicating the effective equivalent
single-threaded execution order (Mike McCandless)
* LUCENE-7335: IndexWriter's commit data is now late binding,
recording key/values from a provided iterable based on when the
commit actually takes place (Mike McCandless)
* LUCENE-7287: UkrainianMorfologikAnalyzer is a new dictionary-based
analyzer for the Ukrainian language (Andriy Rysin via Mike
McCandless)
* LUCENE-7373: Directory.renameFile, which did both renaming and fsync
of the directory metadata, has been deprecated; use the new separate
methods Directory.rename and Directory.syncMetaData instead (Robert Muir,
Uwe Schindler, Mike McCandless)
* LUCENE-7355: Added Analyzer#normalize(), which only applies normalization to
an input string. (Adrien Grand)
* LUCENE-7380: Add Polygon.fromGeoJSON for more easily creating
Polygon instances from a standard GeoJSON string (Robert Muir, Mike
McCandless)
* LUCENE-7395: PerFieldSimilarityWrapper requires a default similarity
for calculating query norm and coordination factor in Lucene 6.x.
Lucene 7 will no longer have those factors. (Uwe Schindler, Sascha Markus)
* SOLR-9279: Queries module: new ComparisonBoolFunction base class
(Doug Turnbull via David Smiley)
Bug Fixes
* LUCENE-6662: Fixed potential resource leaks. (Rishabh Patel via Adrien Grand)
* LUCENE-7340: MemoryIndex.toString() could throw NPE; fixed. Renamed to toStringDebug().
(Daniel Collins, David Smiley)
* LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the
wrong default AttributeFactory for new Tokenizers.
(Terry Smith, Uwe Schindler)
* LUCENE-7389: Fix FieldType.setDimensions(...) validation for the dimensionNumBytes
parameter. (Martijn van Groningen)
* LUCENE-7391: Fix performance regression in MemoryIndex's fields() introduced
in Lucene 6. (Steve Mason via David Smiley)
* LUCENE-7395, SOLR-9315: Fix PerFieldSimilarityWrapper to also delegate query
norm and coordination factor using a default similarity added as ctor param.
(Uwe Schindler, Sascha Markus)
* SOLR-9413: Fix analysis/kuromoji's CSVUtil.quoteEscape logic, add TestCSVUtil test.
(AppChecker, Christine Poerschke)
* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would lookup
PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
Improvements
* LUCENE-7323: Compound file writing now verifies the incoming
sub-files' checksums and segment IDs, to catch hardware issues or
filesystem bugs earlier (Robert Muir, Mike McCandless)
* LUCENE-6766: Index time sorting has graduated from the misc module
to core, is much simpler to use, via
IndexWriter.setIndexSort, and now works with dimensional points.
(Adrien Grand, Mike McCandless)
* LUCENE-5931: Detect when an application tries to reopen an
IndexReader after (illegally) removing the old index and
reindexing (Vitaly Funstein, Robert Muir, Mike McCandless)
* LUCENE-6171: Lucene now passes the StandardOpenOption.CREATE_NEW
option when writing new files so the filesystem enforces our
write-once architecture, possibly catching externally caused
issues sooner (Robert Muir, Mike McCandless)
* LUCENE-7318: StandardAnalyzer has been moved from the analysis
module into core and is now the default analyzer in
IndexWriterConfig (Robert Muir, Mike McCandless)
* LUCENE-7345: RAMDirectory now enforces write-once files as well
(Robert Muir, Mike McCandless)
* LUCENE-7337: MatchNoDocsQuery now scores with 0 normalization factor
and empty boolean queries now rewrite to MatchNoDocsQuery instead of
vice versa (Jim Ferenczi via Mike McCandless)
* LUCENE-7359: Add equals() and hashCode() to Explanation (Alan Woodward)
* LUCENE-7353: ScandinavianFoldingFilterFactory and
ScandinavianNormalizationFilterFactory now implement MultiTermAwareComponent.
(Adrien Grand)
* LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
control whether to split on whitespace prior to text analysis. Default
behavior remains unchanged: split-on-whitespace=true. (Steve Rowe)
* LUCENE-7276: MatchNoDocsQuery now includes an optional reason for
why it was used (Jim Ferenczi via Mike McCandless)
* LUCENE-7355: AnalyzingQueryParser now only applies the subset of the analysis
chain that is about normalization for range/fuzzy/wildcard queries.
(Adrien Grand)
* LUCENE-7376: Add support for ToParentBlockJoinQuery to fast vector highlighter's
FieldQuery. (Martijn van Groningen)
* LUCENE-7385: Improve/fix assert messages in SpanScorer. (David Smiley)
* LUCENE-7393: Add ICUTokenizer option to parse Myanmar text as syllables instead of words,
because the ICU word-breaking algorithm has some issues. This allows for the previous
tokenization used before Lucene 5. (AM, Robert Muir)
* LUCENE-7409: Changed MMapDirectory's unmapping to work more safely, but still with
no guarantees. This uses a store-store barrier and yields the current thread
before unmapping to allow in-flight requests to finish. The new code no longer
uses WeakIdentityMap as it delegates all ByteBuffer reads through a new
ByteBufferGuard wrapper that is shared between all ByteBufferIndexInput clones.
(Robert Muir, Uwe Schindler)
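The write-once behavior described for LUCENE-6171 above comes from the JDK's
StandardOpenOption.CREATE_NEW, which makes file creation fail if the file
already exists. A minimal sketch using only java.nio follows; the WriteOnceDemo
class, the temp directory and the "_0.si" file name are assumptions chosen for
illustration and do not show how Lucene's Directory implementations are
structured.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: WriteOnceDemo is a hypothetical class, not part of Lucene.
final class WriteOnceDemo {
    private WriteOnceDemo() {}

    /** Creates the file and writes data; fails if the file already exists. */
    static void createOnce(Path file, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(
                file, StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
            out.write(data);
        }
    }

    /** Returns true if a second attempt to create the same file is rejected. */
    static boolean secondWriteRejected() {
        try {
            Path dir = Files.createTempDirectory("write-once-demo");
            Path file = dir.resolve("_0.si"); // hypothetical file name
            try {
                createOnce(file, new byte[] {1, 2, 3});
                try {
                    createOnce(file, new byte[] {4, 5, 6});
                    return false; // the file was silently overwritten
                } catch (FileAlreadyExistsException expected) {
                    return true; // the filesystem enforced write-once
                }
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second write rejected: " + secondWriteRejected());
    }
}
```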
Optimizations
* LUCENE-7330, LUCENE-7339: Speed up conjunction queries. (Adrien Grand)
* LUCENE-7356: SearchGroup tweaks. (Christine Poerschke)
* LUCENE-7351: Doc id compression for points. (Adrien Grand)
* LUCENE-7371: Point values are now better compressed using run-length
encoding. (Adrien Grand)
* LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.
(Adrien Grand)
* LUCENE-7396, LUCENE-7399: Faster flush of points.
(Adrien Grand, Mike McCandless)
* LUCENE-7406: Automaton and PrefixQuery tweaks (fewer object (re)allocations).
(Christine Poerschke)
Other
* LUCENE-4787: Fixed some highlighting javadocs. (Michael Dodsworth via Adrien
Grand)
* LUCENE-7334: Update ASM dependency to 5.1. (Uwe Schindler)
* LUCENE-7346: Update forbiddenapis to version 2.2.
(Uwe Schindler)
* LUCENE-7360: Explanation.toHtml() is deprecated. (Alan Woodward)
* LUCENE-7372: Factor out an org.apache.lucene.search.FilterWeight class.
(Christine Poerschke, Adrien Grand, David Smiley)
* LUCENE-7384: Removed ScoringWrapperSpans. And tweaked SpanWeight.buildSimWeight() to
reuse the existing Similarity instead of creating a new one. (David Smiley)
======================= Lucene 6.1.0 =======================
New Features
* LUCENE-7099: Add LatLonDocValuesField.newDistanceSort to the sandbox.
(Robert Muir)
* LUCENE-7140: Add PlanetModel.bisection to spatial3d (Karl Wright via
Mike McCandless)
* LUCENE-7069: Add LatLonPoint.nearest, to find nearest N points to a
provided query point (Mike McCandless)
* LUCENE-7234: Added InetAddressPoint.nextDown/nextUp to easily generate range
queries with excluded bounds. (Adrien Grand)
* LUCENE-7300: The misc module now has a directory wrapper that uses hard-links if
applicable and supported when copying files from another FSDirectory in
Directory#copyFrom. (Simon Willnauer)
API Changes
* LUCENE-7184: Refactor LatLonPoint encoding methods to new GeoEncodingUtils
helper class in core geo package. Also refactors LatLonPointTests to
TestGeoEncodingUtils (Nick Knize)
* LUCENE-7163: refactor GeoRect, Polygon, and GeoUtils tests to geo
package in core (Nick Knize)
* LUCENE-7152: Refactor GeoUtils from lucene-spatial package to
core (Nick Knize)
* LUCENE-7141: Switch OfflineSorter's ByteSequencesReader to
BytesRefIterator (Mike McCandless)
* LUCENE-7150: Spatial3d gets useful APIs to create common shape
queries, matching LatLonPoint. (Karl Wright via Mike McCandless)
* LUCENE-7243: Removed the LeafReaderContext parameter from
QueryCachingPolicy#shouldCache. (Adrien Grand)
Optimizations
* LUCENE-7071: Reduce bytes copying in OfflineSorter, giving ~10%
speedup on merging 2D LatLonPoint values (Mike McCandless)
* LUCENE-7105, LUCENE-7215: Optimize LatLonPoint's newDistanceQuery.
(Robert Muir)
* LUCENE-7097: IntroSorter now recurses to 2 * log_2(count) quicksort
stack depth before switching to heapsort (Adrien Grand, Mike McCandless)
* LUCENE-7115: Speed up FieldCache.CacheEntry toString by setting initial
StringBuilder capacity (Gregory Chanan)
* LUCENE-7147: Improve disjoint check for geo distance query traversal
(Ryan Ernst, Robert Muir, Mike McCandless)
* LUCENE-7153: GeoPointField and LatLonPoint polygon queries now support
multiple polygons and holes, with memory usage independent of
polygon complexity. (Karl Wright, Mike McCandless, Robert Muir)
* LUCENE-7159: Speed up LatLonPoint polygon performance. (Robert Muir, Ryan Ernst)
* LUCENE-7211: Reduce memory & GC for spatial RPT Intersects when the number of
matching docs is small. (Jeff Wartes, David Smiley)
* LUCENE-7235: LRUQueryCache should not take a lock for segments that it will
not cache on anyway. (Adrien Grand)
* LUCENE-7238: Explicitly disable the query cache in MemoryIndex#createSearcher.
(Adrien Grand)
* LUCENE-7237: LRUQueryCache now prefers returning an uncached Scorer rather than
waiting on a lock. (Adrien Grand)
* LUCENE-7261, LUCENE-7262, LUCENE-7264, LUCENE-7258: Speed up DocIdSetBuilder
(which is used by TermsQuery, multi-term queries and several point queries).
(Adrien Grand, Jeff Wartes, David Smiley)
* LUCENE-7299: Speed up BytesRefHash.sort() using radix sort. (Adrien Grand)
* LUCENE-7306: Speed up points indexing and merging using radix sort.
(Adrien Grand)
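The depth limit mentioned in LUCENE-7097 above can be sketched as a generic
introsort: quicksort recurses until a budget of 2 * log_2(count) levels is used
up, then the remaining slice is finished with heapsort, which bounds the worst
case. The IntroSortSketch class below is a hypothetical int[] illustration under
assumed details (floor of log_2, last-element pivot); it is not Lucene's
IntroSorter.

```java
// Sketch only: IntroSortSketch is hypothetical and is not Lucene's IntroSorter.
final class IntroSortSketch {
    private IntroSortSketch() {}

    static void sort(int[] a) {
        if (a.length < 2) {
            return;
        }
        int log2 = 31 - Integer.numberOfLeadingZeros(a.length); // floor(log_2(count))
        quicksort(a, 0, a.length - 1, 2 * log2);
    }

    private static void quicksort(int[] a, int lo, int hi, int depthLeft) {
        if (lo >= hi) {
            return;
        }
        if (depthLeft == 0) {
            heapSort(a, lo, hi); // recursion budget exhausted: switch algorithms
            return;
        }
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1, depthLeft - 1);
        quicksort(a, p + 1, hi, depthLeft - 1);
    }

    /** Lomuto partition using the last element as pivot; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    /** In-place heapsort of the inclusive slice a[lo..hi]. */
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, lo, i, n);
        }
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);
            siftDown(a, lo, 0, end);
        }
    }

    private static void siftDown(int[] a, int base, int root, int size) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) {
                return;
            }
            if (child + 1 < size && a[base + child + 1] > a[base + child]) {
                child++;
            }
            if (a[base + root] >= a[base + child]) {
                return;
            }
            swap(a, base + root, base + child);
            root = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Already-sorted input is the worst case for this pivot choice, so it is a
convenient way to exercise the heapsort fallback.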
Bug Fixes
* LUCENE-7127: Fix corner case bugs in GeoPointDistanceQuery. (Robert Muir)
* LUCENE-7166: Fix corner case bugs in LatLonPoint/GeoPointField bounding box
queries. (Robert Muir)
* LUCENE-7168: Switch to stable encode for geo3d, remove quantization
test leniency, remove dead code (Mike McCandless)
* LUCENE-7301: Multiple doc values updates to the same document within
one update batch could be applied in the wrong order resulting in
the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
* LUCENE-7312: Fix geo3d's x/y/z double to int encoding to ensure it always
rounds down (Karl Wright, Mike McCandless)
* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
where ranges of documents had only a single clause matching while
other ranges had more than one clause matching (Ahmet Arslan,
hossman, Mike McCandless)
* LUCENE-7286: Added support for highlighting SynonymQuery. (Adrien Grand)
* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
dateline and indexed non-point shapes are much bigger than the heatmap region.
(David Smiley)
* LUCENE-7333: Fix test bug where randomSimpleString() generated a filename
that is a reserved device name on Windows. (Uwe Schindler, Mike McCandless)
Other
* LUCENE-7295: TermAutomatonQuery.hashCode calculates Automaton.toDot().hash,
equivalence relationship replaced with object identity. (Dawid Weiss)
* LUCENE-7277: Make Query.hashCode and Query.equals abstract. (Paul Elschot,
Dawid Weiss)
* LUCENE-7174: Upgrade randomizedtesting to 2.3.4. (Uwe Schindler, Dawid Weiss)
* LUCENE-7205: Remove repeated nl.getLength() calls in
(Boolean|DisjunctionMax|FuzzyLikeThis)QueryBuilder. (Christine Poerschke)
* LUCENE-7210: Make TestCore*Parser's analyzer choice override-able
(Christine Poerschke, Daniel Collins)
* LUCENE-7263: Make queryparser/xml/CoreParser's SpanQueryBuilderFactory
accessible to deriving classes. (Daniel Collins via Christine Poerschke)
* SOLR-9109/SOLR-9121: Allow specification of a custom Ivy settings file via system
property "ivysettings.xml". (Misha Dmitriev, Christine Poerschke, Uwe Schindler, Steve Rowe)
* LUCENE-7206: Improve the ToParentBlockJoinQuery's explain by including the explain
of the best matching child doc. (Ilya Kasnacheev, Jeff Evans via Martijn van Groningen)
* LUCENE-7307: Add getters to the PointInSetQuery and PointRangeQuery queries.
(Martijn van Groningen, Adrien Grand)
Build
* LUCENE-7292: Use '-release' instead of '-source/-target' during
compilation on Java 9+ to ensure real cross-compilation.
(Uwe Schindler)
* LUCENE-7296: Update forbiddenapis to version 2.1.
(Uwe Schindler)
======================= Lucene 6.0.1 =======================
New Features
* LUCENE-7278: Spatial-extras DateRangePrefixTree's Calendar is now configurable, to
e.g. clear the Gregorian Change Date. Also, toString(cal) is now identical to
DateTimeFormatter.ISO_INSTANT. (David Smiley)
Bug Fixes
* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
should delegate to the wrapped weight. (Martijn van Groningen)
* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
* LUCENE-7232: Fixed InetAddressPoint.newPrefixQuery, which was generating an
incorrect query when the prefix length was not a multiple of 8. (Adrien Grand)
* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
on some valid inputs (Mike McCandless)
* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
that led to IllegalStateException being thrown when nothing was wrong.
(David Smiley, yonik)
* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
match the underlying queries' (lower|upper)Term optionality logic.
(Kaneshanathan Srivisagan, Christine Poerschke)
* LUCENE-7257: Fixed PointValues#size(IndexReader, String), docCount,
minPackedValue and maxPackedValue to skip leaves that do not have points
rather than raising an IllegalStateException. (Adrien Grand)
* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
Woodward)
* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
phrase queries. (Eva Popenda, Alan Woodward)
* LUCENE-7293: Don't try to highlight GeoPoint queries (Britta Weber,
Nick Knize, Mike McCandless, Uwe Schindler)
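Note on LUCENE-7232: a prefix whose length is not a multiple of 8 ends in the middle of a byte, so the range bounds have to be built bit by bit rather than byte by byte. The following is a minimal sketch of that idea in plain Java; the class and method names (PrefixRangeSketch, prefixRange) are hypothetical and this is not the InetAddressPoint implementation.

```java
import java.net.InetAddress;

final class PrefixRangeSketch {
  /** Returns {lower, upper} address bytes covered by the first prefixLength bits. */
  static byte[][] prefixRange(byte[] address, int prefixLength) {
    if (prefixLength < 0 || prefixLength > 8 * address.length) {
      throw new IllegalArgumentException("illegal prefixLength: " + prefixLength);
    }
    byte[] lower = address.clone();
    byte[] upper = address.clone();
    // Walk every host bit (all bits after the prefix), one bit at a time.
    for (int bit = prefixLength; bit < 8 * address.length; bit++) {
      int mask = 1 << (7 - (bit & 7)); // position inside its byte, most significant bit first
      lower[bit >> 3] &= ~mask;        // clear the host bit for the lower bound
      upper[bit >> 3] |= mask;         // set the host bit for the upper bound
    }
    return new byte[][] {lower, upper};
  }

  public static void main(String[] args) throws Exception {
    byte[] addr = InetAddress.getByName("192.168.1.77").getAddress();
    byte[][] range = prefixRange(addr, 20); // 20 bits: the prefix ends inside the third byte
    System.out.println(InetAddress.getByAddress(range[0]).getHostAddress());
    System.out.println(InetAddress.getByAddress(range[1]).getHostAddress());
  }
}
```

With a 20-bit prefix the third byte keeps only its top four bits, which is the case a byte-granular computation would get wrong.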
Documentation
* LUCENE-7223: Improve XXXPoint javadocs to make it clear that you
should separately add StoredField if you want to retrieve these
field values at search time (Greg Huber, Robert Muir, Mike McCandless)
======================= Lucene 6.0.0 =======================
System Requirements
* LUCENE-5950: Move to Java 8 as minimum Java version.
(Ryan Ernst, Uwe Schindler)
* LUCENE-6069: Lucene Core now gets compiled with Java 8 "compact1" profile,
all other modules with "compact2". (Robert Muir, Uwe Schindler)
New Features
* LUCENE-6631: Lucene Document classification (Tommaso Teofili, Alessandro Benedetti)
* LUCENE-6747: FingerprintFilter is a TokenFilter that outputs a single
token which is a concatenation of the sorted and de-duplicated set of
input tokens. Useful for normalizing short text in clustering/linking
tasks. (Mark Harwood, Adrien Grand)
* LUCENE-5735: NumberRangePrefixTreeStrategy now includes interval/range faceting
for counting ranges that align with the underlying terms as defined by the
NumberRangePrefixTree (e.g. familiar date units like days). (David Smiley)
* LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field
length computations, to avoid skew from documents that don't have the field.
(Ahmet Arslan via Robert Muir)
* LUCENE-6758: Use docCount+1 for DefaultSimilarity's IDF, so that queries
containing nonexistent fields won't screw up querynorm. (Terry Smith, Robert Muir)
* SOLR-7876: The QueryTimeout interface now has an isTimeoutEnabled method
that can return false to exit from ExitableDirectoryReader wrapping at
the point fields() is called. (yonik)
* LUCENE-6825: Add low-level support for block-KD trees (Mike McCandless)
* LUCENE-6852, LUCENE-6975: Add support for points (dimensionally
indexed values) to index, document and codec APIs, including a
simple text implementation. (Mike McCandless)
* LUCENE-6861: Create Lucene60Codec, supporting points.
(Mike McCandless)
* LUCENE-6879: Allow defining custom CharTokenizer instances without
subclassing, using Java 8 lambdas or method references. (Uwe Schindler)
* LUCENE-6881: Cutover all BKD implementations to points
(Mike McCandless)
* LUCENE-6837: Add N-best output support to JapaneseTokenizer.
(Hiroharu Konno via Christian Moen)
* LUCENE-6962: Add per-dimension min/max to points
(Mike McCandless)
* LUCENE-6975: Add ExactPointQuery, to match a single N-dimensional
point (Robert Muir, Mike McCandless)
* LUCENE-6989: Add preliminary support for MMapDirectory unmapping in Java 9.
(Uwe Schindler, Chris Hegarty, Peter Levart)
* LUCENE-7040: Upgrade morfologik-stemming to version 2.1.0.
(Dawid Weiss)
* LUCENE-7048: Add XXXPoint.newSetQuery, to create a query that
efficiently matches all documents containing any of the specified
point values. This is the analog of TermsQuery, but for points
instead. (Adrien Grand, Robert Muir, Mike McCandless)
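Note on LUCENE-6747: the fingerprint described above (sort, de-duplicate, concatenate) can be sketched in plain Java as follows. This only illustrates the idea on a list of strings; it is not the TokenFilter itself, and the class name, method name and separator parameter are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class FingerprintSketch {
  /** Joins the sorted, de-duplicated tokens into a single string. */
  static String fingerprint(List<String> tokens, char separator) {
    // A TreeSet both sorts (natural String order) and drops duplicates.
    return String.join(String.valueOf(separator), new TreeSet<>(tokens));
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("the", "quick", "the", "brown");
    System.out.println(fingerprint(tokens, ' '));
  }
}
```

Two short texts that contain the same words in a different order, or with repeats, produce the same fingerprint, which is what makes it useful for clustering and linking.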
API Changes
* LUCENE-7094: BBoxStrategy and PointVectorStrategy now support
PointValues (in addition to legacy numeric trie). Their APIs
were changed a little and also made more consistent. PointValues/Trie
is optional, DocValues is optional, stored value is optional.
(Nick Knize, David Smiley)
* LUCENE-6067: Accountable.getChildResources has a default
implementation returning the empty list. (Robert Muir)
* LUCENE-6583: FilteredQuery has been removed. Instead, you can construct a
BooleanQuery with one MUST clause for the query, and one FILTER clause for
the filter. (Adrien Grand)
* LUCENE-6651: AttributeImpl#reflectWith(AttributeReflector) was made
abstract and has no reflection-based default implementation anymore.
(Uwe Schindler)
* LUCENE-6706: PayloadTermQuery and PayloadNearQuery have been removed.
Instead, use PayloadScoreQuery to wrap any SpanQuery. (Alan Woodward)
* LUCENE-6829: OfflineSorter, and the classes that use it (suggesters,
hunspell) now do all temporary file IO via Directory instead of
directly through java's temp dir. Directory.createTempOutput
creates a uniquely named IndexOutput, and the new
IndexOutput.getName returns its name (Dawid Weiss, Robert Muir, Mike
McCandless)
* LUCENE-6917: Deprecate and rename NumericXXX classes to
LegacyNumericXXX in favor of points (Mike McCandless)
* LUCENE-6947: SortField.missingValue is now protected. You can read its
value using the new SortField.getMissingValue getter. (Adrien Grand)
* LUCENE-7028: Remove duplicate method in LegacyNumericUtils.
(Uwe Schindler)
* LUCENE-7052, LUCENE-7053: Remove custom comparators from BytesRef
class and solely use natural byte[] comparator throughout codebase.
This also simplifies API of BytesRefHash. It also replaces the natural
comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-7060: Update Spatial4j to 0.6. The package com.spatial4j.core
is now org.locationtech.spatial4j. (David Smiley)
* LUCENE-7058: Add getters to various Query implementations (Guillaume Smet via
Alan Woodward)
* LUCENE-7064: MultiPhraseQuery is now immutable and should be constructed
with MultiPhraseQuery.Builder. (Luc Vanlerberghe via Adrien Grand)
* LUCENE-7072: Geo3DPoint always uses WGS84 planet model.
(Robert Muir, Mike McCandless)
* LUCENE-7056: Geo3D classes are in different packages now. (David Smiley)
* LUCENE-6952: These classes are now abstract: FilterCodecReader, FilterLeafReader,
FilterCollector, FilterDirectory. And some Filter* classes in
lucene-test-framework too. (David Smiley)
* SOLR-8867: FunctionValues.getRangeScorer now takes a LeafReaderContext instead
of an IndexReader, and avoids matching documents without a value in the field
for numeric fields. (yonik)
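Note on LUCENE-7052: the sketch below assumes that the "natural" byte[] order means lexicographic comparison with each byte treated as unsigned (0..255). The class and method names are hypothetical; the point is only to show why a signed comparison would order bytes differently.

```java
final class UnsignedBytesSketch {
  /** Lexicographic comparison that treats every byte as an unsigned value. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    for (int i = 0; i < len; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF); // mask to 0..255 before subtracting
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length; // a shared prefix: the shorter array sorts first
  }

  public static void main(String[] args) {
    byte[] low = {0x7F};
    byte[] high = {(byte) 0x80}; // -128 as a signed byte, 128 as an unsigned one
    System.out.println(compareUnsigned(low, high) < 0);
  }
}
```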
Optimizations
* LUCENE-6891: Use prefix coding when writing points in
each leaf block in the default codec, to reduce the index
size (Mike McCandless)
* LUCENE-6901: Optimize points indexing: use faster
IntroSorter instead of InPlaceMergeSorter, and specialize 1D
merging to merge sort the already sorted segments instead of
re-indexing (Mike McCandless)
* LUCENE-6793: LegacyNumericRangeQuery.hashCode() is now less subject to hash
collisions. (J.B. Langston via Adrien Grand)
* LUCENE-7050: TermsQuery is now cached more aggressively by the default
query caching policy. (Adrien Grand)
* LUCENE-7066: PointRangeQuery got optimized for the case that all documents
have a value and all points from the segment match. (Adrien Grand)
Changes in Runtime Behavior
* LUCENE-6789: IndexSearcher's default Similarity is changed to BM25Similarity.
Use ClassicSimilarity to get the old vector space DefaultSimilarity. (Robert Muir)
* LUCENE-6886: Reserve the .tmp file name extension for temp files,
and codec components are no longer allowed to use this extension
(Robert Muir, Mike McCandless)
* LUCENE-6835: Directory.listAll now returns entries in sorted order,
to not leak platform-specific behavior, and "retrying file deletion"
is now the responsibility of Directory.deleteFile, not the caller.
(Robert Muir, Mike McCandless)
Tests
* LUCENE-7009: Add expectThrows utility to LuceneTestCase. This uses a lambda
expression to encapsulate a statement that is expected to throw an exception.
(Ryan Ernst)
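Note on LUCENE-7009: a helper of this kind can be sketched in plain Java as below. This is an illustration of the lambda-based pattern only, not the LuceneTestCase code; the class name and the nested ThrowingRunnable interface are assumptions made for the example.

```java
final class ExpectThrowsSketch {
  /** A statement that may throw anything, so it can be written as a lambda. */
  @FunctionalInterface
  interface ThrowingRunnable {
    void run() throws Throwable;
  }

  /** Runs the statement and returns the thrown exception if it has the expected type. */
  static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable runnable) {
    try {
      runnable.run();
    } catch (Throwable t) {
      if (expectedType.isInstance(t)) {
        return expectedType.cast(t); // hand it back so the caller can inspect it
      }
      throw new AssertionError(
          "Unexpected exception type, expected " + expectedType.getSimpleName(), t);
    }
    throw new AssertionError(
        "Expected exception " + expectedType.getSimpleName() + " was not thrown");
  }

  public static void main(String[] args) {
    NumberFormatException e =
        expectThrows(NumberFormatException.class, () -> Integer.parseInt("abc"));
    System.out.println(e.getMessage());
  }
}
```

Returning the exception keeps the test body short: the caller can assert on the message or cause without a try/catch block.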
Bug Fixes
* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
explain would also report non-matching documents as matching.
In addition, with score mode average, the explain would fail with an NPE.
(Martijn van Groningen)
* LUCENE-7101: OfflineSorter had O(N^2) merge cost, and used too many
temporary file descriptors, for large sorts (Mike McCandless)
* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
implementation (Karl Wright via Mike McCandless)
* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
on queries that could not be extracted. (Adrien Grand)
* LUCENE-7126: Remove GeoPointDistanceRangeQuery. This query was implemented
with boolean NOT, and incorrect for multi-valued documents. (Robert Muir)
* LUCENE-7158: Consistently use earth's WGS84 mean radius wherever our
geo search implementations approximate the earth as a sphere (Karl
Wright via Mike McCandless)
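Note on LUCENE-7111: one common way for a long range check to misbehave at Long.MAX_VALUE and Long.MIN_VALUE is overflow when an exclusive bound is turned into an inclusive one. The sketch below shows that failure mode and a guard against it. It is an assumption about the general class of bug, not a description of the actual DocValuesRangeQuery code, and all names are hypothetical.

```java
final class LongRangeSketch {
  /** Naive check: converting exclusive bounds with +1 / -1 can overflow. */
  static boolean naiveMatches(long value, long min, boolean includeMin,
                              long max, boolean includeMax) {
    long lo = includeMin ? min : min + 1; // wraps to Long.MIN_VALUE when min == Long.MAX_VALUE
    long hi = includeMax ? max : max - 1; // wraps to Long.MAX_VALUE when max == Long.MIN_VALUE
    return value >= lo && value <= hi;
  }

  /** Guarded check: an exclusive bound at the extreme value can match nothing. */
  static boolean safeMatches(long value, long min, boolean includeMin,
                             long max, boolean includeMax) {
    if (!includeMin && min == Long.MAX_VALUE) {
      return false; // nothing is greater than Long.MAX_VALUE
    }
    if (!includeMax && max == Long.MIN_VALUE) {
      return false; // nothing is smaller than Long.MIN_VALUE
    }
    long lo = includeMin ? min : min + 1;
    long hi = includeMax ? max : max - 1;
    return value >= lo && value <= hi;
  }

  public static void main(String[] args) {
    // "greater than Long.MAX_VALUE" should match nothing.
    System.out.println(naiveMatches(0L, Long.MAX_VALUE, false, Long.MAX_VALUE, true));
    System.out.println(safeMatches(0L, Long.MAX_VALUE, false, Long.MAX_VALUE, true));
  }
}
```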
Other
* LUCENE-7035: Upgrade icu4j to 56.1/unicode 8. (Robert Muir)
* LUCENE-7087: Let MemoryIndex#fromDocument(...) accept 'Iterable<? extends IndexableField>'
as document instead of 'Document'. (Martijn van Groningen)
* LUCENE-7091: Add doc values support to MemoryIndex
(Martijn van Groningen, David Smiley)
* LUCENE-7093: Add point values support to MemoryIndex
(Martijn van Groningen, Mike McCandless)
* LUCENE-7095: Add point values support to the numeric field query time join.
(Martijn van Groningen, Mike McCandless)
======================= Lucene 5.5.5 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
Bug Fixes
* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
======================= Lucene 5.5.4 =======================
Bug Fixes
* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
term. (Thomas Kappler via David Smiley)
* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
with a TermContext is cached. (Adrien Grand)
* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
issues. (Adrien Grand)
* LUCENE-7562: CompletionFieldsConsumer sometimes throws
NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
dictionary file it opened (Markus via Mike McCandless)
* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
(Hossman)
* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
with large skips. (yonik)
* LUCENE-7570: IndexWriter may deadlock if a commit is running while
there are too many merges running and one of the merges hits a
tragic exception (Joey Echeverria via Mike McCandless)
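Note on LUCENE-6914: supplementary code points occupy two Java chars (a surrogate pair), so digit folding has to iterate by code point rather than by char. The following is a minimal sketch of that iteration in plain Java. It is not the DecimalDigitFilter code, and the class and method names are hypothetical.

```java
final class DigitFoldSketch {
  /** Replaces every non-ASCII Unicode decimal digit with its ASCII equivalent. */
  static String foldDigits(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i); // reads a whole code point, even a surrogate pair
      i += Character.charCount(cp);  // advance by 1 or 2 chars accordingly
      if (cp > 0x7F && Character.isDigit(cp)) {
        out.append((char) ('0' + Character.digit(cp, 10)));
      } else {
        out.appendCodePoint(cp);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // U+1D7D0 (MATHEMATICAL BOLD DIGIT TWO) lies outside the Basic Multilingual Plane.
    String input = "a" + new String(Character.toChars(0x1D7D0)) + "\u0663";
    System.out.println(foldDigits(input));
  }
}
```

A loop that advanced one char at a time would see the two surrogate halves separately and would not recognise either as a digit.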
Other
* LUCENE-6989: Backport MMapDirectory's unmapping code from Lucene 6.4 to use
MethodHandles. This allows it to work with Java 9 (EA build 150 and later).
(Uwe Schindler)
Build
* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
Lucene and Solr DOAP RDF files into the Git source repository under
dev-tools/doap/ and then pulling release dates from those files, rather than
from JIRA. (Mano Kovacs, hossman, Steve Rowe)
* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
build 148+. Also update JGit version for working-copy checks. This does not
fix all issues with Java 9, but allows the distribution to be built.
(Uwe Schindler)
* LUCENE-7651: Backport (Lucene 6.4.1) fix for Java 8u121 to allow documentation
build to inject "Google Code Prettify" without adding Javascript to Javadocs's
-bottom parameter. Unfortunately, this fix disables Prettify if Javadocs are
built with Java 7, as there is no generic way in Java 7 to inject Javascript
without breaking Java 8 (and possibly paid Java 7 security updates). This
fix also updates Prettify to latest version to work around a Google Chrome
issue. (Uwe Schindler)
======================= Lucene 5.5.3 =======================
(No Changes)
======================= Lucene 5.5.2 =======================
Bug Fixes
* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
explain would also report non-matching documents as matching.
In addition, with score mode average, the explain would fail with an NPE.
(Martijn van Groningen)
* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
implementation (Karl Wright via Mike McCandless)
* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
should delegate to the wrapped weight. (Martijn van Groningen)
* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
on some valid inputs (Mike McCandless)
* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
match the underlying queries' (lower|upper)Term optionality logic.
(Kaneshanathan Srivisagan, Christine Poerschke)
* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
Woodward)
* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
phrase queries. (Eva Popenda, Alan Woodward)
* LUCENE-7301: Multiple doc values updates to the same document within
one update batch could be applied in the wrong order resulting in
the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
where ranges of documents had only a single clause matching while
other ranges had more than one clause matching (Ahmet Arslan,
hossman, Mike McCandless)
* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
dateline and indexed non-point shapes are much bigger than the heatmap region.
(David Smiley)
======================= Lucene 5.5.1 =======================
Bug Fixes
* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
on queries that could not be extracted. (Adrien Grand)
* LUCENE-7188: remove incorrect sanity check in NRTCachingDirectory.listAll()
that led to IllegalStateException being thrown when nothing was wrong.
(David Smiley, yonik)
* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
======================= Lucene 5.5.0 =======================
New Features
* LUCENE-5868: JoinUtil.createJoinQuery(..,NumericType,..) query-time join
for LONG and INT fields with NUMERIC and SORTED_NUMERIC doc values.
(Alexey Zelin via Mikhail Khludnev)
* LUCENE-6939: Add exponential reciprocal scoring to
BlendedInfixSuggester, to even more strongly favor suggestions that
match closer to the beginning (Arcadius Ahouansou via Mike McCandless)
* LUCENE-6958: Improved CustomAnalyzer to take class references to factories
as alternative to their SPI name. This enables compile-time safety when
defining analyzer's components. (Uwe Schindler, Shai Erera)
* LUCENE-6818, LUCENE-6986: Add DFISimilarity implementing the divergence
from independence model. (Ahmet Arslan via Robert Muir)
* SOLR-4619: Added removeAllAttributes() to AttributeSource, which removes
all previously added attributes.
* LUCENE-7010: Added MergePolicyWrapper to allow easy wrapping of other policies.
(Shai Erera)
API Changes
* LUCENE-6997: refactor sandboxed GeoPointField and query classes to lucene-spatial
module under new lucene.spatial.geopoint package (Nick Knize)
* LUCENE-6908: GeoUtils static relational methods have been refactored to new
GeoRelationUtils and now correctly handle large irregular rectangles, and
pole crossing distance queries. (Nick Knize)
* LUCENE-6900: Grouping sortWithinGroup variables used to allow null to mean
Sort.RELEVANCE. Null is no longer permitted. (David Smiley)
* LUCENE-6919: The Scorer class has been refactored to expose an iterator
instead of extending DocIdSetIterator. asTwoPhaseIterator() has been renamed
to twoPhaseIterator() for consistency. (Adrien Grand)
* LUCENE-6973: TeeSinkTokenFilter no longer accepts a SinkFilter (the latter
has been removed). If you wish to filter the sinks, you can wrap them with
any other TokenFilter (e.g. a FilteringTokenFilter). Also, you can no longer
add a SinkTokenStream to an existing TeeSinkTokenFilter. If you need to
share multiple streams with a single sink, chain them with multiple
TeeSinkTokenFilters.
DateRecognizerSinkFilter was renamed to DateRecognizerFilter and moved under
analysis/common. TokenTypeSinkFilter was removed (use TypeTokenFilter instead).
TokenRangeSinkFilter was removed. (Shai Erera, Uwe Schindler)
* LUCENE-6980: Default applyAllDeletes to true when opening
near-real-time readers (Mike McCandless)
* LUCENE-6981: SpanQuery.getTermContexts() helper methods are now public, and
SpanScorer has a public getSpans() method. (Alan Woodward)
* LUCENE-6932: IndexInput.seek implementations now throw EOFException
if you seek beyond the end of the file (Adrien Grand, Mike McCandless)
* LUCENE-6988: IndexableField.tokenStream() no longer throws IOException
(Alan Woodward)
* LUCENE-7028: Deprecate a duplicate method in NumericUtils.
(Uwe Schindler)
Optimizations
* LUCENE-6930: Decouple GeoPointField from NumericType by using a custom
and efficient GeoPointTokenStream and TermEnum designed for GeoPoint prefix
terms. (Nick Knize)
* LUCENE-6951: Improve GeoPointInPolygonQuery using a point-orientation-based
line-crossing algorithm, and adding a result for multi-valued docs when at least
1 point satisfies the polygon criteria. (Nick Knize)
* LUCENE-6889: BooleanQuery.rewrite now performs some query optimization, in
particular to rewrite queries that look like: "+*:* #filter" to a
"ConstantScore(filter)". (Adrien Grand)
* LUCENE-6912: Grouping's Collectors now calculate a response to needsScores()
instead of always 'true'. (David Smiley)
* LUCENE-6815: DisjunctionScorer now advances two-phased iterators lazily,
and stops evaluating them as soon as a single one matches. The other iterators
will be confirmed lazily when computing score() or freq(). (Adrien Grand)
* LUCENE-6926: MUST_NOT clauses now use the match cost API to run the slow bits
last whenever possible. (Adrien Grand)
* LUCENE-6944: BooleanWeight no longer creates sub-scorers if BS1 is not
applicable. (Adrien Grand)
* LUCENE-6940: MUST_NOT clauses execute faster, especially when they are sparse.
(Adrien Grand)
* LUCENE-6470: Improve efficiency of TermsQuery constructors. (Robert Muir)
Bug Fixes
* LUCENE-6976: BytesRefTermAttributeImpl.copyTo NPE'ed if BytesRef was null.
Added equals & hashCode, and a new test for these things. (David Smiley)
* LUCENE-6932: RAMDirectory's IndexInput was failing to throw
EOFException in some cases (Stéphane Campinas, Adrien Grand via Mike
McCandless)
* LUCENE-6896: Don't treat the smallest possible norm value as an infinitely
long document in SimilarityBase or BM25Similarity. Add more warnings to sims
that will not work well with extreme tf values. (Ahmet Arslan, Robert Muir)
* LUCENE-6984: SpanMultiTermQueryWrapper no longer modifies its wrapped query.
(Alan Woodward, Adrien Grand)
* LUCENE-6998: Fix a couple places to better detect truncated index files
as corruption. (Robert Muir, Mike McCandless)
* LUCENE-7002: Fixed MultiCollector to not throw a NPE if setScorer is called
after one of the sub collectors is done collecting. (John Wang, Adrien Grand)
* LUCENE-7027: Fixed NumericTermAttribute to not throw IllegalArgumentException
after NumericTokenStream was exhausted. (Uwe Schindler, Lee Hinman,
Mike McCandless)
* LUCENE-7018: Fix GeoPointTermQueryConstantScoreWrapper to add document on
first GeoPointField match. (Nick Knize)
* LUCENE-7019: Add two-phase iteration to GeoPointTermQueryConstantScoreWrapper.
(Robert Muir via Nick Knize)
* LUCENE-6989: Improve MMapDirectory's unmapping checks to catch more non-working
cases. The unmap-hack does not yet work with recent Java 9. Official support
will come with Lucene 6. (Uwe Schindler)
Other
* LUCENE-6924: Upgrade randomizedtesting to 2.3.2. (Dawid Weiss)
* LUCENE-6920: Improve custom function checks in expressions module
to use MethodHandles and work without extra security privileges.
(Uwe Schindler, Robert Muir)
* LUCENE-6921: Fix SPIClassIterator#isParentClassLoader to not
require extra permissions. (Uwe Schindler)
* LUCENE-6923: Fix RamUsageEstimator to access private fields inside
AccessController block for computing size. (Robert Muir)
* LUCENE-6907: make TestParser extendable, rename test/.../xml/
NumericRangeQueryQuery.xml to NumericRangeQuery.xml
(Christine Poerschke)
* LUCENE-6925: add ForceMergePolicy class in test-framework
(Christine Poerschke)
* LUCENE-6945: factor out TestCorePlus(Queries|Extensions)Parser from
TestParser, rename TestParser to TestCoreParser (Christine Poerschke)
* LUCENE-6949: fix (potential) resource leak in SynonymFilterFactory
(https://scan.coverity.com/projects/5620 CID 120656)
(Christine Poerschke, Coverity Scan (via Rishabh Patel))
* LUCENE-6961: Improve Exception handling in AnalysisFactories /
AnalysisSPILoader: Don't wrap exceptions occurring in factory's
ctor inside InvocationTargetException. (Uwe Schindler)
* LUCENE-6965: Expression's JavascriptCompiler now throws ParseException
with bad function names or bad arity instead of IllegalArgumentException.
(Tomás Fernández Löbbe, Uwe Schindler, Ryan Ernst)
* LUCENE-6964: String-based signatures in JavascriptCompiler replaced
with better compile-time-checked MethodType; generated class files
are no longer marked as synthetic. (Uwe Schindler)
* LUCENE-6978: Refactor several code places that look up locales
by string name to use BCP47 locale tag instead. LuceneTestCase
now also prints locales on failing tests this way.
Locale#forLanguageTag() and Locale#toString() were placed on list
of forbidden signatures. (Uwe Schindler, Robert Muir)
* LUCENE-6988: You can now add IndexableFields directly to a MemoryIndex,
and create a MemoryIndex from a lucene Document. (Alan Woodward)
* LUCENE-7005: TieredMergePolicy tweaks (>= vs. >, @see get vs. set)
(Christine Poerschke)
* LUCENE-7006: increase BaseMergePolicyTestCase use (TestNoMergePolicy and
TestSortingMergePolicy now extend it, TestUpgradeIndexMergePolicy added)
(Christine Poerschke)
======================= Lucene 5.4.1 =======================
Bug Fixes
* LUCENE-6910: fix 'if ... > Integer.MAX_VALUE' check in
(Binary|Numeric)DocValuesFieldUpdates.merge
(https://scan.coverity.com/projects/5620 CID 119973 and CID 120081)
(Christine Poerschke, Coverity Scan (via Rishabh Patel))
* LUCENE-6946: SortField.equals now takes the missingValue parameter into
account. (Adrien Grand)
* LUCENE-6918: LRUQueryCache.onDocIdSetEviction is only called when at least
one DocIdSet is being evicted. (Adrien Grand)
* LUCENE-6929: Fix SpanNotQuery rewriting to not drop the pre/post parameters.
(Tim Allison via Adrien Grand)
* LUCENE-6950: Fix FieldInfos handling of UninvertingReader, e.g. do not
hide the true docvalues update generation or other properties.
(Ishan Chattopadhyaya via Robert Muir)
* LUCENE-6948: Fix ArrayIndexOutOfBoundsException in PagedBytes$Reader.fill
by removing an unnecessary long-to-int cast.
(Michael Lawley via Christine Poerschke)
* SOLR-7865: BlendedInfixSuggester was returning too many results
(Arcadius Ahouansou via Mike McCandless)
* LUCENE-6970: Fixed off-by-one error in Lucene54DocValuesProducer that could
potentially corrupt doc values. (Adrien Grand)
* LUCENE-2229: Fix Highlighter's SimpleSpanFragmenter, where multiple adjacent
stop words following a span could make the fragment far too long.
(Elmer Garduno, Lukhnos Liu via David Smiley)
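Note on LUCENE-6948: a long-to-int cast applied too early discards the upper 32 bits before the arithmetic runs, which can turn a large offset into a negative array index. The sketch below shows the general pattern with a hypothetical block-index computation. It is not the PagedBytes code; the class name, method names and the BLOCK_BITS value are assumptions.

```java
final class PagedIndexSketch {
  static final int BLOCK_BITS = 15; // hypothetical block size of 32 KB

  /** Buggy: the cast binds tighter than the shift, so start is truncated first. */
  static int blockIndexBuggy(long start) {
    return (int) start >> BLOCK_BITS;
  }

  /** Fixed: shift the full 64-bit value, then narrow the (small) result. */
  static int blockIndexFixed(long start) {
    return (int) (start >> BLOCK_BITS);
  }

  public static void main(String[] args) {
    long start = 3L << 31; // an offset beyond the int range
    System.out.println(blockIndexBuggy(start)); // negative, unusable as an array index
    System.out.println(blockIndexFixed(start));
  }
}
```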
======================= Lucene 5.4.0 =======================
New Features
* LUCENE-6875: New Serbian Filter. (Nikola Smolenski via Robert Muir,
Dawid Weiss)
* LUCENE-6720: New FunctionRangeQuery wrapper around ValueSourceScorer
(returned from ValueSource/FunctionValues.getRangeScorer()). (David Smiley)
* LUCENE-6724: Add utility APIs to GeoHashUtils to compute neighbor
geohash cells (Nick Knize via Mike McCandless).
* LUCENE-6737: Add DecimalDigitFilter which folds unicode digits to basic latin.
(Robert Muir)
* LUCENE-6699: Add integration of BKD tree and geo3d APIs to give
fast, very accurate query to find all indexed points within an
earth-surface shape (Karl Wright, Mike McCandless)
* LUCENE-6838: Added IndexSearcher#getQueryCache and #getQueryCachingPolicy.
(Adrien Grand)
* LUCENE-6844: PayloadScoreQuery can include or exclude underlying span scores
from its score calculations (Bill Bell, Alan Woodward)
* LUCENE-6778: Add GeoPointDistanceRangeQuery, to search for points
within a "ring" (beyond a minimum distance and below a maximum
distance) (Nick Knize via Mike McCandless)
* LUCENE-6874: Add a new UnicodeWhitespaceTokenizer to analysis/common
that uses Unicode character properties extracted from ICU4J to tokenize
text on whitespace. This tokenizer will split on non-breaking
space (NBSP), too. (David Smiley, Uwe Schindler, Steve Rowe)
API Changes
* LUCENE-6590: Query.setBoost(), Query.getBoost() and Query.clone() are gone.
In order to apply boosts, you now need to wrap queries in a BoostQuery.
(Adrien Grand)
* LUCENE-6716: SpanPayloadCheckQuery now takes a List<BytesRef> rather than
a Collection<byte[]>. (Alan Woodward)
* LUCENE-6489: The various span payload queries have been moved to the queries
submodule, and PayloadSpanUtil is now in sandbox. (Alan Woodward)
* LUCENE-6650: The spatial module no longer uses Filter in any way. All
spatial Filters now subclass Query. The spatial heatmap/facet API
now accepts a Bits parameter to filter counts. (David Smiley, Adrien Grand)
* LUCENE-6803: Deprecate sandbox Regexp Query. (Uwe Schindler)
* LUCENE-6301: org.apache.lucene.search.Filter is now deprecated. You should use
Query objects instead of Filters, and the BooleanClause.Occur.FILTER clause in
order to let Lucene know that a Query should be used for filtering but not
scoring.
* LUCENE-6939: SpanOrQuery.addClause is now deprecated, clauses should all be
provided at construction time. (Paul Elschot via Adrien Grand)
* LUCENE-6855: CachingWrapperQuery is deprecated and will be removed in 6.0.
(Adrien Grand)
* LUCENE-6870: DisjunctionMaxQuery#add is now deprecated, clauses should all be
provided at construction time. (Adrien Grand)
* LUCENE-6884: Analyzer.tokenStream() and Tokenizer.setReader() are no longer
declared as throwing IOException. (Alan Woodward)
* LUCENE-6849: Expose IndexWriter.flush() method, to move all
in-memory segments to disk without opening a near-real-time reader
nor calling fsync (Robert Muir, Simon Willnauer, Mike McCandless)
* LUCENE-6911: Add correct StandardQueryParser.getMultiFields() method,
deprecate no-op StandardQueryParser.getMultiFields(CharSequence[]) method.
(Christine Poerschke, Mikhail Khludnev, Coverity Scan (via Rishabh Patel))
Optimizations
* LUCENE-6708: TopFieldCollector does not compute the score several times on the
same document anymore. (Adrien Grand)
* LUCENE-6720: ValueSourceScorer, returned from
FunctionValues.getRangeScorer(), now uses TwoPhaseIterator. (David Smiley)
* LUCENE-6756: MatchAllDocsQuery now has a dedicated BulkScorer for better
performance when used as a top-level query. (Adrien Grand)
* LUCENE-6746: DisjunctionMaxQuery, BoostingQuery and BoostedQuery now create
sub weights through IndexSearcher so that they can be cached. (Adrien Grand)
* LUCENE-6754: Optimized IndexSearcher.count for the cases when it can use
index statistics instead of collecting all matches. (Adrien Grand)
* LUCENE-6773: Nested conjunctions now iterate over documents as if clauses
were all at the same level. (Adrien Grand)
* LUCENE-6777: Reuse BytesRef when visiting term ranges in
GeoPointTermsEnum to reduce GC pressure (Nick Knize via Mike
McCandless)
* LUCENE-6779: Reduce memory allocated by CompressingStoredFieldsWriter to write
strings larger than 64kb by an amount equal to string's utf8 size.
(Dawid Weiss, Robert Muir, shalin)
* LUCENE-6850: Optimize BooleanScorer for sparse clauses. (Adrien Grand)
* LUCENE-6840: Ordinal indexes for SORTED_SET/SORTED_NUMERIC fields and
addresses for BINARY fields are now stored on disk instead of in memory.
(Adrien Grand)
* LUCENE-6878: Speed up TopDocs.merge. (Daniel Jelinski via Adrien Grand)
* LUCENE-6885: StandardDirectoryReader (initialCapacity) tweaks
(Christine Poerschke)
* LUCENE-6863: Optimized storage requirements of doc values fields when less
than 1% of documents have a value. (Adrien Grand)
* LUCENE-6892: various lucene.index initialCapacity tweaks
(Christine Poerschke)
* LUCENE-6276: Added TwoPhaseIterator.matchCost(), which allows confirming the
least costly TwoPhaseIterators first. (Paul Elschot via Adrien Grand)
* LUCENE-6898: In the default codec, the last stored field value will not
be fully read from disk if the supplied StoredFieldVisitor doesn't want it.
So put your largest text field value last to benefit. (David Smiley)
* LUCENE-6909: Remove unnecessary synchronized from
FacetsConfig.getDimConfig for better concurrency (Sanne Grinovero
via Mike McCandless)
* SOLR-7730: Speed up SlowCompositeReaderWrapper.getSortedSetDocValues() by
avoiding merging FieldInfos just to check doc value type.
(Paul Vasilyev, Yuriy Pakhomov, Mikhail Khludnev, yonik)
Bug Fixes
* LUCENE-6905: Unwrap center longitude for dateline crossing
GeoPointDistanceQuery. (Nick Knize)
* LUCENE-6817: ComplexPhraseQueryParser.ComplexPhraseQuery does not display
slop in toString(). (Ahmet Arslan via Dawid Weiss)
* LUCENE-6730: Hyper-parameter c is ignored in term frequency NormalizationH1.
(Ahmet Arslan via Robert Muir)
* LUCENE-6742: Lovins & Finnish implementation of SnowballFilter was
fixed to behave exactly as specified. A bug in the snowball compiler
caused differences in output of the filter in comparison to the original
test data. In addition, the performance of those filters was improved
significantly. (Uwe Schindler, Robert Muir)
* LUCENE-6783: Removed side effects from FuzzyLikeThisQuery.rewrite.
(Adrien Grand)
* LUCENE-6776: Fix geo3d math to handle randomly squashed planet
models (Karl Wright via Mike McCandless)
* LUCENE-6792: Fix TermsQuery.toString() to work with binary terms.
(Ruslan Muzhikov, Robert Muir)
* LUCENE-5503: When Highlighter's WeightedSpanTermExtractor converts a
PhraseQuery to an equivalent SpanQuery, it would sometimes use a slop that is
too low (no highlight) or determine inOrder wrong.
(Tim Allison via David Smiley)
* LUCENE-6790: Fix IndexWriter thread safety when one thread is
handling a tragic exception but another is still committing (Mike
McCandless)
* LUCENE-6810: Upgrade to Spatial4j 0.5 -- fixes some edge-case bugs in the
spatial module. See https://github.com/locationtech/spatial4j/blob/master/CHANGES.md
(David Smiley)
* LUCENE-6813: OfflineSorter no longer removes its output Path up
front, and instead opens it for write with the
StandardCopyOption.REPLACE_EXISTING to overwrite any prior file, so
that callers can safely use Files.createTempFile for the output.
This change also fixes OfflineSorter's default temp directory when
running tests to use mock filesystems so e.g. we detect file handle
leaks (Dawid Weiss, Robert Muir, Mike McCandless)
* LUCENE-6813: RangeTreeWriter was failing to close all file handles
it opened, leading to intermittent failures on Windows (Dawid Weiss,
Robert Muir, Mike McCandless)
* LUCENE-6826: Fix ClassCastException when merging a field that has no
terms because they were filtered out by e.g. a FilterCodecReader
(Trejkaz via Mike McCandless)
* LUCENE-6823: LocalReplicator should use System.nanoTime as its clock
source for checking for expiration (Ishan Chattopadhyaya via Mike
McCandless)
* LUCENE-6856: The Weight wrapper used by LRUQueryCache now delegates to the
original Weight's BulkScorer when applicable. (Adrien Grand)
* LUCENE-6858: Fix ContextSuggestField to correctly wrap token stream
when using CompletionAnalyzer. (Areek Zillur)
* LUCENE-6872: IndexWriter handles any VirtualMachineError, not just OOM,
as tragic. (Robert Muir)
* LUCENE-6814: PatternTokenizer no longer hangs onto heap sized to the
maximum input string it's ever seen, which can be a large memory
"leak" if you tokenize large strings with many threads across many
indices (Alex Chow via Mike McCandless)
* LUCENE-6888: Explain output of map() function now also prints default value (janhoy)
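The LUCENE-6813 entry above relies on StandardCopyOption.REPLACE_EXISTING so that an output file pre-created by Files.createTempFile can be overwritten rather than deleted up front. A minimal stdlib-only sketch of that idea follows. The class and method names are hypothetical, and this is not OfflineSorter's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration; names are not taken from Lucene.
final class ReplaceExistingSketch {

  /** Writes the given bytes to output, overwriting any file already there. */
  static void writeOutput(Path output, byte[] sortedBytes) {
    try (InputStream in = new ByteArrayInputStream(sortedBytes)) {
      // REPLACE_EXISTING means the caller does not have to delete output first.
      Files.copy(in, output, StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Caller side: the output path already exists because createTempFile made it. */
  static String demo() {
    try {
      Path output = Files.createTempFile("sorted", ".out");
      Files.write(output, "stale".getBytes(StandardCharsets.UTF_8));
      writeOutput(output, "fresh".getBytes(StandardCharsets.UTF_8));
      String result = new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
      Files.delete(output);
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Without REPLACE_EXISTING, Files.copy would fail with FileAlreadyExistsException on the pre-created temp file.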
Other
* LUCENE-6899: Upgrade randomizedtesting to 2.3.1. (Dawid Weiss)
* LUCENE-6478: Test execution can hang with java.security.debug. (Dawid Weiss)
* LUCENE-6862: Upgrade of RandomizedRunner to version 2.2.0. (Dawid Weiss)
* LUCENE-6857: Validate StandardQueryParser with NOT operator
within parentheses. (Jigar Shah via Dawid Weiss)
* LUCENE-6827: Use explicit capacity ArrayList instead of a LinkedList
in MultiFieldQueryNodeProcessor. (Dawid Weiss)
* LUCENE-6812: Upgrade RandomizedTesting to 2.1.17. (Dawid Weiss)
* LUCENE-6174: Improve "ant eclipse" to select right JRE for building.
(Uwe Schindler, Dawid Weiss)
* LUCENE-6417, LUCENE-6830: Upgrade ANTLR used in expressions module
to version 4.5.1-1. (Jack Conradson, Uwe Schindler)
* LUCENE-6729: Upgrade ASM used in expressions module to version 5.0.4.
(Uwe Schindler)
* LUCENE-6738: remove IndexWriterConfig.[gs]etIndexingChain
(Christine Poerschke)
* LUCENE-6755: more tests of ToChildBlockJoinScorer.advance (hossman)
* LUCENE-6571: fix some private access level javadoc errors and warnings
(Cao Manh Dat, Christine Poerschke)
* LUCENE-6768: AbstractFirstPassGroupingCollector.groupSort private member
is not needed. (Christine Poerschke)
* LUCENE-6761: MatchAllDocsQuery's Scorers do not expose approximations
anymore. (Adrien Grand)
* LUCENE-6775, LUCENE-6833: Improved MorfologikFilterFactory to allow
loading of custom dictionaries from ResourceLoader. Upgraded
Morfologik to version 2.0.1. The 'dictionary' attribute has been
reverted and now points at the dictionary resource to be
loaded instead of the default Polish dictionary.
(Uwe Schindler, Dawid Weiss)
* LUCENE-6797: Make GeoCircle an interface and use a factory to create
it, to eventually handle degenerate cases (Karl Wright via Mike
McCandless)
* LUCENE-6800: Use XYZSolidFactory to create XYZSolids (Karl Wright
via Mike McCandless)
* LUCENE-6798: Geo3d now models degenerate (too tiny) circles as a
single point (Karl Wright via Mike McCandless)
* LUCENE-6770: Add javadocs that FSDirectory canonicalizes the path.
(Uwe Schindler, Vladimir Kuzmin)
* LUCENE-6795: Fix various places where code used
AccessibleObject#setAccessible() without a privileged block. Code
without a hard requirement to do reflection was rewritten. This
makes Lucene and Solr ready for Java 9 Jigsaw's module system, where
reflection on Java's runtime classes is very restricted.
(Robert Muir, Uwe Schindler)
* LUCENE-6467: Simplify Query.equals. (Paul Elschot via Adrien Grand)
* LUCENE-6845: SpanScorer is now merged into Spans (Alan Woodward, David Smiley)
* LUCENE-6887: DefaultSimilarity is deprecated; use ClassicSimilarity for equivalent behavior,
or consider switching to BM25Similarity which will become the new default in Lucene 6.0 (hossman)
* LUCENE-6893: factor out CorePlusQueriesParser from CorePlusExtensionsParser
(Christine Poerschke)
* LUCENE-6902: Don't retry to fsync files / directories; fail
immediately. (Daniel Mitterdorfer, Uwe Schindler)
* LUCENE-6801: Clarify JavaDocs of PhraseQuery that it in fact supports terms
at the same position (as does MultiPhraseQuery), treated like a conjunction.
Added test. (David Smiley, Adrien Grand)
Build
* LUCENE-6732: Improve checker for invalid source patterns to also
detect javadoc-style license headers. Use Groovy to implement the
checks instead of plain Ant. (Uwe Schindler)
* LUCENE-6594: Update forbiddenapis to 2.0. (Uwe Schindler)
Tests
* LUCENE-6752: Add Math#random() to forbiddenapis. (Uwe Schindler,
Mikhail Khludnev, Andrei Beliakov)
Changes in Backwards Compatibility Policy
* LUCENE-6742: The Lovins & Finnish implementations of SnowballFilter
were fixed to behave exactly like the original Snowball stemmer.
If you have indexed text using those stemmers you may need to reindex.
(Uwe Schindler, Robert Muir)
Changes in Runtime Behavior
* LUCENE-6772: MultiCollector now catches CollectionTerminatedException and
removes the collector that threw this exception from the list of sub
collectors to collect. (Adrien Grand)
* LUCENE-6784: IndexSearcher's query caching is enabled by default. Run
indexSearcher.setQueryCache(null) to disable. (Adrien Grand)
* LUCENE-6305: BooleanQuery.equals and hashCode do not depend on the order of
clauses anymore. (Adrien Grand)
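To illustrate the LUCENE-6305 entry above, one way to make equals and hashCode independent of clause order is to compare clauses as a multiset (clause mapped to occurrence count). The sketch below is an assumption about the general technique, not BooleanQuery's implementation. The class name is hypothetical and clauses are plain strings for brevity:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a query holding several clauses.
final class UnorderedClauses {
  // Multiset of clauses: insertion order is deliberately not recorded.
  private final Map<String, Integer> counts = new HashMap<>();

  UnorderedClauses(List<String> clauses) {
    for (String clause : clauses) {
      counts.merge(clause, 1, Integer::sum);
    }
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof UnorderedClauses
        && counts.equals(((UnorderedClauses) other).counts);
  }

  @Override
  public int hashCode() {
    // Map.hashCode sums entry hashes, so it does not depend on iteration order.
    return counts.hashCode();
  }
}
```

Counting occurrences keeps a clause list with a duplicated clause distinct from one where the clause appears once.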
======================= Lucene 5.3.2 =======================
Bug Fixes
* SOLR-7865: BlendedInfixSuggester was returning too many results
(Arcadius Ahouansou via Mike McCandless)
======================= Lucene 5.3.1 =======================
Bug Fixes
* LUCENE-6774: Remove classloader hack in MorfologikFilter. (Robert Muir,
Uwe Schindler)
* LUCENE-6748: UsageTrackingQueryCachingPolicy no longer caches trivial queries
like MatchAllDocsQuery. (Adrien Grand)
* LUCENE-6781: Fixed BoostingQuery to rewrite wrapped queries. (Adrien Grand)
Tests
* LUCENE-6760, SOLR-7958: Move TestUtil#randomWhitespace to the only
Solr test that is using it. The method is not useful for Lucene tests
(and easily breaks, e.g., in Java 9 caused by Unicode version updates).
(Uwe Schindler)
======================= Lucene 5.3.0 =======================
New Features
* LUCENE-6485: Add CustomSeparatorBreakIterator to postings
highlighter which splits on any character. For example, it
can be used with getMultiValueSeparator to render whole field
values. (Luca Cavanna via Robert Muir)
* LUCENE-6459: Add common suggest API that mirrors Lucene's
Query/IndexSearcher APIs for Document based suggester.
Adds PrefixCompletionQuery, RegexCompletionQuery,
FuzzyCompletionQuery and ContextQuery.
(Areek Zillur via Mike McCandless)
* LUCENE-6487: Spatial Geo3D API now has a WGS84 ellipsoid world model option.
(Karl Wright via David Smiley)
* LUCENE-6477: Add experimental BKD geospatial tree doc values format
and queries, for fast "bbox/polygon contains lat/lon points" (Mike
McCandless)
* LUCENE-6526: Asserting(Query|Weight|Scorer) now ensure scores are not computed
if they are not needed. (Adrien Grand)
* LUCENE-6481: Add GeoPointField, GeoPointInBBoxQuery,
GeoPointInPolygonQuery for simple "indexed lat/lon point in
bbox/shape" searching. (Nick Knize via Mike McCandless)
* LUCENE-5954: The segments_N commit point now stores the Lucene
version that wrote the commit as well as the Lucene version that
wrote the oldest segment in the index, for faster checking of "too
old" indices (Ryan Ernst, Robert Muir, Mike McCandless)
* LUCENE-6519: BKDPointInPolygonQuery is much faster by avoiding
the per-hit polygon check when a leaf cell is fully contained by the
polygon. (Nick Knize, Mike McCandless)
* LUCENE-6549: Add preload option to MMapDirectory. (Robert Muir)
* LUCENE-6504: Add Lucene53Codec, with norms implemented directly
via the Directory's RandomAccessInput api. (Robert Muir)
* LUCENE-6539: Add new DocValuesNumbersQuery, to match any document
containing one of the specified long values. This change also
moves the existing DocValuesTermsQuery and DocValuesRangeQuery
to Lucene's sandbox module, since in general these queries are
quite slow and are only fast in specific cases. (Adrien Grand,
Robert Muir, Mike McCandless)
* LUCENE-6577: Give earlier and better error message for invalid CRC.
(Robert Muir)
* LUCENE-6544: Geo3D: (1) Regularize path & polygon construction, (2) add
PlanetModel.surfaceDistance() (ellipsoidal calculation), (3) cache lat & lon
in GeoPoint, (4) add thread-safety where missing -- Geo3dShape. (Karl Wright,
David Smiley)
* LUCENE-6606: SegmentInfo.toString now confesses how the documents
were sorted, when SortingMergePolicy was used (Christine Poerschke
via Mike McCandless)
* LUCENE-6524: IndexWriter can now be initialized from an already open
near-real-time or non-NRT reader. (Boaz Leskes, Robert Muir, Mike
McCandless)
* LUCENE-6578: Geo3D can now compute the distance from a point to a shape, both
inner distance and to an outside edge. Multiple distance algorithms are
available. (Karl Wright, David Smiley)
* LUCENE-6632: Geo3D: Compute circle planes more accurately.
(Karl Wright via David Smiley)
* LUCENE-6653: Added general purpose BytesTermAttribute to basic token
attributes package that can be used for TokenStreams that solely produce
binary terms. (Uwe Schindler)
* LUCENE-6365: Add Operations.topoSort, to run topological sort of the
states in an Automaton (Markus Heiden via Mike McCandless)
* LUCENE-6365: Replace Operations.getFiniteStrings with a
more scalable iterator API (FiniteStringsIterator) (Markus Heiden
via Mike McCandless)
* LUCENE-6589: Add a new org.apache.lucene.search.join.CheckJoinIndex class
that can be used to validate that an index has an appropriate structure to
run join queries. (Adrien Grand)
* LUCENE-6659: Remove IndexWriter's unnecessary hard limit on max concurrency
(Robert Muir, Mike McCandless)
* LUCENE-6547: Add GeoPointDistanceQuery, matching all points within
the specified distance from the center point. Fix
GeoPointInBBoxQuery to handle dateline crossing.
* LUCENE-6694: Add LithuanianAnalyzer and LithuanianStemmer.
(Dainius Jocas via Robert Muir)
* LUCENE-6695: Added a new BlendedTermQuery to blend statistics across several
terms. (Simon Willnauer, Adrien Grand)
* LUCENE-6706: Added a new PayloadScoreQuery that generalises the behaviour of
PayloadTermQuery and PayloadNearQuery to all Span queries. (Alan Woodward)
* LUCENE-6697: Add experimental range tree doc values format and
queries, based on a 1D version of the spatial BKD tree, for a faster
and smaller alternative to postings-based numeric and binary term
filtering. Range trees can also handle values larger than 64 bits.
(Adrien Grand, Mike McCandless)
* LUCENE-6647: Add GeoHash string utility APIs (Nick Knize via Mike
McCandless).
* LUCENE-6710: GeoPointField now uses full 64 bits (up from 62) to encode
lat/lon (Nick Knize via Mike McCandless).
* LUCENE-6580: SpanNearQuery now allows defined-width gaps in its subqueries
(Alan Woodward, Adrien Grand).
* LUCENE-6712: Use doc values to post-filter GeoPointField hits that
fall in boundary cells, resulting in smaller index, faster searches
and less heap used for each query (Nick Knize via Mike McCandless).
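The LUCENE-6365 entry above adds a topological sort over automaton states. As a rough illustration of what such a sort produces, here is a stdlib-only sketch using Kahn's algorithm on integer states. The class name, the edge-array representation, and the choice of algorithm are assumptions for this example and do not reflect how Operations.topoSort is implemented:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; not Lucene's Automaton API.
final class TopoSortSketch {

  /**
   * Each edges[i] is {from, to}. Returns the states ordered so that every
   * edge points from an earlier position to a later one.
   */
  static int[] topoSort(int numStates, int[][] edges) {
    int[] inDegree = new int[numStates];
    for (int[] edge : edges) {
      inDegree[edge[1]]++;
    }
    Deque<Integer> ready = new ArrayDeque<>();
    for (int state = 0; state < numStates; state++) {
      if (inDegree[state] == 0) {
        ready.add(state);
      }
    }
    int[] order = new int[numStates];
    int upto = 0;
    while (!ready.isEmpty()) {
      int state = ready.remove();
      order[upto++] = state;
      // Linear scan over edges keeps the sketch short; an adjacency list would be faster.
      for (int[] edge : edges) {
        if (edge[0] == state && --inDegree[edge[1]] == 0) {
          ready.add(edge[1]);
        }
      }
    }
    if (upto != numStates) {
      throw new IllegalArgumentException("graph has a cycle; no topological order exists");
    }
    return order;
  }
}
```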
API Changes
* LUCENE-6508: Simplify Lock api, there is now just
Directory.obtainLock() which returns a Lock that can be
released (or fails with exception). Add lock verification
to IndexWriter. Improve exception messages when locking fails.
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-6371, LUCENE-6490: Payload collection from Spans is moved to a more generic
SpanCollector framework. Spans no longer implements .hasPayload() and
.getPayload() methods, and instead exposes a collect() method that allows
the collection of arbitrary postings information. SpanPayloadCheckQuery and
SpanPayloadNearCheckQuery have moved from the .spans package to the .payloads
package. (Alan Woodward, David Smiley, Paul Elschot, Robert Muir)
* LUCENE-6529: Removed an optimization in UninvertingReader that was causing
incorrect results for Numeric fields using precisionStep
(hossman, Robert Muir)
* LUCENE-6551: Add missing ConcurrentMergeScheduler.getAutoIOThrottle
getter (Simon Willnauer, Mike McCandless)
* LUCENE-6552: Add MergePolicy.OneMerge.getMergeInfo and rename
setInfo to setMergeInfo (Simon Willnauer, Mike McCandless)
* LUCENE-6525: Deprecate IndexWriterConfig's writeLockTimeout.
(Robert Muir)
* LUCENE-6583: FilteredQuery is deprecated and will be removed in 6.0. It should
be replaced with a BooleanQuery which handles the query as a MUST clause and
the filter as a FILTER clause. (Adrien Grand)
* LUCENE-6553: The postings, spans and scorer APIs no longer take an acceptDocs
parameter. Live docs are now always checked on top of these APIs.
(Adrien Grand)
* LUCENE-6634: PKIndexSplitter now takes a Query instead of a Filter to decide
how to split an index. (Adrien Grand)
* LUCENE-6643: GroupingSearch from lucene/grouping was changed to take a Query
object to define groups instead of a Filter. (Adrien Grand)
* LUCENE-6554: ToParentBlockJoinFieldComparator was removed because of a bug
with missing values that could not be fixed. ToParentBlockJoinSortField now
works with string or numeric doc values selectors. Sorting on anything other
than a string or numeric field requires implementing a custom selector.
(Adrien Grand)
* LUCENE-6648: All lucene/facet APIs now take Query objects where they used to
take Filter objects. (Adrien Grand)
* LUCENE-6640: Suggesters now take a BitsProducer object instead of a Filter
object to reduce the scope of doc IDs that may be returned, emphasizing the
fact that these objects need to support random-access. (Adrien Grand)
* LUCENE-6646: Make EarlyTerminatingCollector take a Sort object directly
instead of a SortingMergePolicy. (Christine Poerschke via Adrien Grand)
* LUCENE-6649: BitDocIdSetFilter and BitDocIdSetCachingWrapperFilter are now
deprecated in favour of BitSetProducer and QueryBitSetProducer, which do not
extend oal.search.Filter. (Adrien Grand)
* LUCENE-6607: Factor out geo3d into its own spatial3d module. (Karl
Wright, Nick Knize, David Smiley, Mike McCandless)
* LUCENE-6531: PhraseQuery is now immutable and can be built using the
PhraseQuery.Builder class. (Adrien Grand)
* LUCENE-6570: BooleanQuery is now immutable and can be built using the
BooleanQuery.Builder class. (Adrien Grand)
* LUCENE-6702: NRTSuggester: Add a method to inject context values at index time
in ContextSuggestField. Simplify ContextQuery logic for extracting contexts and
add dedicated method to consider all context values at query time.
(Areek Zillur, Mike McCandless)
* LUCENE-6719: NumericUtils getMinInt, getMaxInt, getMinLong, getMaxLong now
return null if there are no terms for the specified field; previously these
methods returned primitive values and raised an undocumented NullPointerException
if there were no terms for the field. (hossman, Timothy Potter)
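Several entries above (LUCENE-6531, LUCENE-6570, LUCENE-6583) describe immutable queries assembled through a Builder, with a query added as a MUST clause and a filter as a FILTER clause. The stdlib-only sketch below shows the general immutable-builder shape. Every name in it is hypothetical, and it is not the BooleanQuery.Builder API itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable query with typed clauses.
final class ClauseListSketch {
  enum Occur { MUST, FILTER }

  static final class Clause {
    final String query;
    final Occur occur;

    Clause(String query, Occur occur) {
      this.query = query;
      this.occur = occur;
    }
  }

  private final List<Clause> clauses;

  private ClauseListSketch(List<Clause> clauses) {
    // Defensive copy: later changes to the builder cannot leak into this instance.
    this.clauses = Collections.unmodifiableList(new ArrayList<>(clauses));
  }

  List<Clause> clauses() {
    return clauses;
  }

  static final class Builder {
    private final List<Clause> pending = new ArrayList<>();

    Builder add(String query, Occur occur) {
      pending.add(new Clause(query, occur));
      return this;
    }

    ClauseListSketch build() {
      return new ClauseListSketch(pending);
    }
  }
}
```

A caller would chain calls such as new ClauseListSketch.Builder().add("body:lucene", Occur.MUST).add("year:2015", Occur.FILTER).build(), after which the clause list can no longer be modified.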
Bug Fixes
* LUCENE-6500: ParallelCompositeReader did not always call
closed listeners. This was fixed by LUCENE-6501.
(Adrien Grand, Uwe Schindler)
* LUCENE-6520: Geo3D GeoPath.done() would throw an NPE if adjacent path
segments were co-linear. (Karl Wright via David Smiley)
* LUCENE-5805: QueryNodeImpl.removeFromParent was doing nothing in a
costly manner (Christoph Kaser, Cao Manh Dat via Mike McCandless)
* LUCENE-6533: SlowCompositeReaderWrapper no longer caches its live docs
instance since this can prevent future improvements like a
disk-backed live docs (Adrien Grand, Mike McCandless)
* LUCENE-6558: Highlighters now work with CustomScoreQuery (Cao Manh
Dat via Mike McCandless)
* LUCENE-6560: BKDPointInBBoxQuery now handles "dateline crossing"
correctly (Nick Knize, Mike McCandless)
* LUCENE-6564: Change PrintStreamInfoStream to use thread safe Java 8
ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString
as workaround); fix output of tests to use same format. (Uwe Schindler,
Ramkumar Aiyengar)
* LUCENE-6593: Fixed ToChildBlockJoinQuery's scorer to not refuse to advance
to a document that belongs to the parent space. (Adrien Grand)
* LUCENE-6591: Never write a negative vLong (Robert Muir, Ryan Ernst,
Adrien Grand, Mike McCandless)
* LUCENE-6588: Fix how ToChildBlockJoinQuery deals with acceptDocs.
(Christoph Kaser via Adrien Grand)
* LUCENE-6597: Geo3D's GeoCircle now supports a world-globe diameter.
(Karl Wright via David Smiley)
* LUCENE-6608: Fix potential resource leak in BigramDictionary.
(Rishabh Patel via Uwe Schindler)
* LUCENE-6614: Improve partition detection in IOUtils#spins() so it
works with NVMe drives. (Uwe Schindler, Mike McCandless)
* LUCENE-6586: Fix typo in GermanStemmer, causing possible wrong value
for substCount. (Christoph Kaser via Mike McCandless)
* LUCENE-6658: Fix IndexUpgrader to also upgrade indexes without any
segments. (Trejkaz, Uwe Schindler)
* LUCENE-6677: QueryParserBase fails to enforce maxDeterminizedStates when
creating a WildcardQuery (David Causse via Mike McCandless)
* LUCENE-6680: Preserve two suggestions that have same key and weight but
different payloads (Arcadius Ahouansou via Mike McCandless)
* LUCENE-6681: SortingMergePolicy must override MergePolicy.size(...).
(Christine Poerschke via Adrien Grand)
* LUCENE-6682: StandardTokenizer performance bug: scanner buffer is
unnecessarily copied when maxTokenLength doesn't change. Also stop silently
maxing out buffer size (and effectively also max token length) at 1M chars,
but instead throw an exception from setMaxTokenLength() when the given
length is greater than 1M chars. (Piotr Idzikowski, Steve Rowe)
* LUCENE-6696: Fix FilterDirectoryReader.close() to never close the
underlying reader several times. (Adrien Grand)
* LUCENE-6334: FastVectorHighlighter failed to highlight phrases across
more than one value in a multi-valued field. (Chris Earle, Nik Everett
via Mike McCandless)
* LUCENE-6704: GeoPointDistanceQuery was visiting too many term ranges,
consuming too much heap for a large radius (Nick Knize via Mike McCandless)
* SOLR-5882: fix ScoreMode.Min at ToParentBlockJoinQuery (Mikhail Khludnev)
* LUCENE-6718: JoinUtil.createJoinQuery failed to rewrite queries before
creating a Weight. (Adrien Grand)
* LUCENE-6713: TooComplexToDeterminizeException claims to be serializable
but wasn't (Simon Willnauer, Mike McCandless)
* LUCENE-6723: Fix date parsing problems in Java 9 with date formats using
English weekday/month names. (Uwe Schindler)
* LUCENE-6618: Properly set MMapDirectory.UNMAP_SUPPORTED when it is not allowed
by security policy. (Robert Muir)
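For context on the LUCENE-6591 entry above: a variable-length long stores 7 bits per byte and uses the high bit as a continuation flag, so a negative value would need the maximum number of bytes. The sketch below shows one way to reject negatives at write time. It is a stdlib-only illustration with hypothetical names, and whether Lucene enforces the rule by throwing or by other means is not stated in the entry:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of a 7-bits-per-byte variable-length encoding.
final class VLongSketch {

  static byte[] encode(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("cannot write negative vLong (got: " + value + ")");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      // Low 7 bits plus the continuation flag in the high bit.
      out.write((int) ((value & 0x7FL) | 0x80L));
      value >>>= 7;
    }
    out.write((int) value);
    return out.toByteArray();
  }

  static long decode(byte[] bytes) {
    long value = 0;
    int shift = 0;
    for (byte b : bytes) {
      value |= (b & 0x7FL) << shift;
      shift += 7;
    }
    return value;
  }
}
```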
Changes in Runtime Behavior
* LUCENE-6501: The subreader structure in ParallelCompositeReader
was flattened, because the current implementation had too many
hidden bugs regarding refcounting and close listeners.
If you create a new ParallelCompositeReader, it will just take
all leaves of the passed readers and form a flat structure of
ParallelLeafReaders instead of trying to assemble the original
structure of composite and leaf readers. (Adrien Grand,
Uwe Schindler)
* LUCENE-6537: NearSpansOrdered no longer tries to minimize its
Span matches. This means that the matching algorithm is entirely
lazy. All spans returned by the previous implementation are still
reported, but matching documents may now also return additional
spans that were previously discarded in preference to shorter
overlapping ones. (Alan Woodward, Adrien Grand, Paul Elschot)
* LUCENE-6538: Also include java.vm.version and java.runtime.version
in per-segment diagnostics (Robert Muir, Mike McCandless)
* LUCENE-6569: Optimize MultiFunction.anyExists and allExists to eliminate
excessive array creation in common 2 argument usage (Jacob Graves, hossman)
* LUCENE-2880: Span queries now score more consistently with regular queries.
(Robert Muir, Adrien Grand)
* LUCENE-6601: FilteredQuery now always rewrites to a BooleanQuery which handles
the query as a MUST clause and the filter as a FILTER clause.
LEAP_FROG_QUERY_FIRST_STRATEGY and LEAP_FROG_FILTER_FIRST_STRATEGY no longer
guarantee which iterator will be advanced first; it will depend on the
respective costs of the iterators. QUERY_FIRST_FILTER_STRATEGY and
RANDOM_ACCESS_FILTER_STRATEGY still consume the filter using its random-access
API, however the returned bits may be called on different documents compared
to before. (Adrien Grand)
* LUCENE-6542: FSDirectory's ctor now works with security policies or file systems
that restrict write access. (Trejkaz, hossman, Uwe Schindler)
* LUCENE-6651: The default implementation of AttributeImpl#reflectWith(AttributeReflector)
now uses AccessController#doPrivileged() to do the reflection. Please consider
implementing this method in all your custom attributes, because the method will be
made abstract in Lucene 6. (Uwe Schindler)
* LUCENE-6639: LRUQueryCache and CachingWrapperQuery now consider a query as
"used" when the first Scorer is pulled instead of when a Scorer is pulled on
the first segment on an index. (Terry Smith, Adrien Grand)
* LUCENE-6579: IndexWriter now sacrifices (closes) itself to protect the index
when an unexpected, tragic exception strikes while merging. (Robert
Muir, Mike McCandless)
* LUCENE-6691: SortingMergePolicy.isSorted now considers FilterLeafReader instances.
EarlyTerminatingSortingCollector.terminatedEarly accessor added.
TestEarlyTerminatingSortingCollector.testTerminatedEarly test added.
(Christine Poerschke)
* LUCENE-6609: Add getSortField impls to many subclasses of FieldCacheSource which return
the most direct SortField implementation. In many trivial sort by ValueSource usages, this
will result in less RAM, and more precise sorting of extreme values due to no longer
converting to double. (hossman)
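The LUCENE-6609 entry above mentions more precise sorting of extreme values once long values are no longer converted to double. The small sketch below shows why the conversion loses precision: a double has a 53-bit significand, so neighbouring longs near Long.MAX_VALUE collapse to the same double. The class and method names are hypothetical:

```java
// Hypothetical illustration; not part of Lucene.
final class SortPrecisionSketch {

  /** Compares two longs after widening to double, as a double-based sort would. */
  static int compareAsDouble(long a, long b) {
    return Double.compare((double) a, (double) b);
  }

  /** Compares two longs directly, preserving every bit. */
  static int compareAsLong(long a, long b) {
    return Long.compare(a, b);
  }
}
```

For Long.MAX_VALUE and Long.MAX_VALUE - 1, the double-based comparison reports a tie while the long-based comparison orders them correctly.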
Optimizations
* LUCENE-6548: Some optimizations for BlockTree's intersect with very
finite automata (Mike McCandless)
* LUCENE-6585: Flatten conjunctions and conjunction approximations into
parent conjunctions. For example a sloppy phrase query of "foo bar"~5
with a filter of "baz" will internally leapfrog foo,bar,baz as one
conjunction. (Ryan Ernst, Robert Muir, Adrien Grand)
* LUCENE-6325: Reduce RAM usage of FieldInfos, and speed up lookup by
number, by using an array instead of TreeMap except in very sparse
cases (Robert Muir, Mike McCandless)
* LUCENE-6617: Reduce heap usage for small FSTs (Mike McCandless)
* LUCENE-6616: IndexWriter now lists the files in the index directory
only once on init, and IndexFileDeleter no longer suppresses
FileNotFoundException and NoSuchFileException. This also improves
IndexFileDeleter to delete segments_N files last, so that in the
presence of a virus checker, the index is never left in a state
where an expired segments_N references non-existing files (Robert
Muir, Mike McCandless)
* LUCENE-6645: Optimized the way we merge postings lists in multi-term queries
and TermsQuery. This should especially help when there are lots of small
postings lists. (Adrien Grand, Mike McCandless)
* LUCENE-6668: Optimized storage for sorted set and sorted numeric doc values
in the case that there are few unique sets of values.
(Adrien Grand, Robert Muir)
* LUCENE-6690: Sped up MultiTermsEnum.next() on high-cardinality fields.
(Adrien Grand)
* LUCENE-6621: Removed two unused variables in analysis/stempel/src/java/org/
egothor/stemmer/Compile.java
(Rishabh Patel via Christine Poerschke)
Build
* LUCENE-6518: Don't report false thread leaks from IBM J9
ClassCache Reaper in test framework. (Dawid Weiss)
* LUCENE-6567: Simplify payload checking in SpanPayloadCheckQuery (Alan
Woodward)
* LUCENE-6568: Make rat invocation depend on ivy configuration being set up
(Ramkumar Aiyengar)
* LUCENE-6683: ivy-fail goal directs people to non-existent page
(Mike Drob via Steve Rowe)
* LUCENE-6693: Updated Groovy to 2.4.4, Pegdown to 1.5, Svnkit to 1.8.10.
Also fixed some PermGen errors while running full build caused by
these updates: Tasks are now installed from root's build.xml.
(Uwe Schindler)
* LUCENE-6741: Fix jflex files to regenerate the java files correctly.
(Uwe Schindler)
Test Framework
* LUCENE-6637: Fix FSTTester to not violate file permissions
on -Dtests.verbose=true. (Mesbah M. Alam, Uwe Schindler)
* LUCENE-6542: LuceneTestCase now has runWithRestrictedPermissions() to run
an action with reduced permissions. This can be used to simulate special
environments (e.g., read-only dirs). If tests are running without a security
manager, an assume cancels test execution automatically. (Uwe Schindler)
* LUCENE-6652: Removed lots of useless Byte(s)TermAttributes all over test
infrastructure. (Uwe Schindler)
* LUCENE-6563: Improve MockFileSystemTestCase.testURI to check if a path
can be encoded according to local filesystem requirements. Otherwise
stop test execution. (Christine Poerschke via Uwe Schindler)
Changes in Backwards Compatibility Policy
* LUCENE-6553: The iterator returned by the LeafReader.postings method now
always includes deleted docs, so you have to check for deleted documents on
top of the iterator. (Adrien Grand)
* LUCENE-6633: DuplicateFilter has been deprecated and will be removed in 6.0.
DiversifiedTopDocsCollector can be used instead with a maximum number of hits
per key equal to 1. (Adrien Grand)
* LUCENE-6653: The workflow for consuming the TermToBytesRefAttribute was changed:
getBytesRef() now does all work and is called on each token, fillBytesRef()
was removed. The implementation is free to reuse the internal BytesRef
or return a new one on each call. (Uwe Schindler)
* LUCENE-6682: StandardTokenizer.setMaxTokenLength() now throws an exception if
a length greater than 1M chars is given. Previously the effective max token
length (the scanner's buffer) was capped at 1M chars, but getMaxTokenLength()
incorrectly returned the previously requested length, even when it exceeded 1M.
(Piotr Idzikowski, Steve Rowe)
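The LUCENE-6553 entry above means that callers iterating postings must filter out deleted documents themselves. The sketch below models that check with stdlib types only: an int array stands in for the postings iterator and a java.util.BitSet stands in for the live docs, where a null bit set is assumed to mean "no deletions". All names are hypothetical and this is not the LeafReader API:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration of checking live docs on top of a postings iterator.
final class LiveDocsSketch {

  /** Returns the doc IDs from postings that are still live. */
  static List<Integer> liveOnly(int[] postings, BitSet liveDocs) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : postings) {
      // The iterator now includes deleted docs, so skip them here.
      if (liveDocs != null && !liveDocs.get(doc)) {
        continue;
      }
      hits.add(doc);
    }
    return hits;
  }
}
```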
======================= Lucene 5.2.1 =======================
Bug Fixes
* LUCENE-6482: Fix class loading deadlock relating to Codec initialization,
default codec and SPI discovery. (Shikhar Bhushan, Uwe Schindler)
* LUCENE-6523: NRT readers now reflect a new commit even if there is
no change to the commit user data (Mike McCandless)
* LUCENE-6527: Queries now get a dummy Similarity when scores are not needed
in order to not load unnecessary information like norms. (Adrien Grand)
* LUCENE-6559: TimeLimitingCollector now also checks for timeout when a new
leaf reader is pulled, i.e. if we move from one segment to another even without
collecting a hit. (Simon Willnauer)
======================= Lucene 5.2.0 =======================