Lucene Change Log
For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions
======================= Lucene 9.0.0 =======================
System Requirements
* LUCENE-8738: Move to Java 11 as minimum Java version.
(Adrien Grand, Uwe Schindler)
API Changes
* LUCENE-9317: Clean up package name conflicts between core and analyzers-common.
See MIGRATE.md for details. (David Ryan, Tomoko Uchida, Uwe Schindler, Dawid Weiss)
* LUCENE-8474: RAMDirectory and associated deprecated classes have been
removed. (Dawid Weiss)
* LUCENE-3041: The deprecated Weight#extractTerms() method has been
removed (Alan Woodward, Simon Willnauer, David Smiley, Luca Cavanna)
* LUCENE-8805: StoredFieldVisitor#stringField now takes a String rather than a
byte[] that stores the UTF-8 bytes of the stored string.
(Namgyu Kim via Adrien Grand)
* LUCENE-8811: BooleanQuery#setMaxClauseCount() and #getMaxClauseCount() have
moved to IndexSearcher. The checks are now implemented using a QueryVisitor
and apply to all queries, rather than only booleans. (Atri Sharma, Adrien
Grand, Alan Woodward)
* LUCENE-8909: The deprecated IndexWriter#getFieldNames() method has been removed.
(Adrien Grand, Munendra S N)
* LUCENE-8948: Change "name" argument in ICU factories to "form". Here, "form" is
named after "Unicode Normalization Form". (Tomoko Uchida)
* LUCENE-8933: Validate JapaneseTokenizer user dictionary entry. (Tomoko Uchida)
* LUCENE-8905: Better defence against malformed arguments in TopDocsCollector
(Atri Sharma)
* LUCENE-9089: FST Builder renamed FSTCompiler with fluent-style Builder.
(Bruno Roustant)
* LUCENE-9212: Deprecated Intervals.multiterm() methods that take a bare Automaton
have been removed (Alan Woodward)
* LUCENE-9264: SimpleFSDirectory has been removed in favor of NIOFSDirectory.
(Yannick Welsch)
* LUCENE-9281: Use java.util.ServiceLoader to load codec components and analysis
factories to be compatible with the Java Module System. This allows factories to be loaded
without META-INF/services from a Java module exposing the factory in the module
descriptor. This breaks backwards compatibility as custom analysis factories
must now also implement the default constructor (see MIGRATE.md).
(Uwe Schindler, Dawid Weiss)
* LUCENE-9307: BufferedIndexInput#setBufferSize has been removed. (Adrien Grand)
* LUCENE-9340: SimpleBindings#add(SortField) has been removed. (Alan Woodward)
* LUCENE-9462: Fields without positions should still return MatchIterator.
(Alan Woodward, Dawid Weiss)
* LUCENE-9516: Removed the ability to replace the IndexingChain / DocConsumer
in Lucene's IndexWriter. The interface is not sufficient to efficiently
replace the functionality with reasonable efforts. (Simon Willnauer)
Improvements
* LUCENE-9463: Query match region retrieval component, passage scoring and formatting
for building custom highlighters. (Alan Woodward, Dawid Weiss)
* LUCENE-9370: RegExp query is no longer lenient about inappropriate backslashes and
follows the Java Pattern policy for rejecting illegal syntax. (Mark Harwood)
* LUCENE-9336: RegExp query now supports \w \W \d \D \s \S expressions.
This is a break with previous behaviour where these were (mis)interpreted
as literally the characters w W d etc. (Mark Harwood)
* LUCENE-8757: When provided with an ExecutorService to run queries across
multiple threads, IndexSearcher now groups small segments together, up to
250k docs per slice. (Atri Sharma via Adrien Grand)
* LUCENE-8857: Introduce Custom Tiebreakers in TopDocs.merge for tie breaking on
docs on equal scores. Also, remove the ability of TopDocs.merge to set shard
indices (Atri Sharma, Adrien Grand, Simon Willnauer)
* LUCENE-8958: Shared count early termination for relevance sorted indices (Atri Sharma)
* LUCENE-8937: Avoid aggressive stemming on numbers in the FrenchMinimalStemmer.
(Adrien Gallou via Tomoko Uchida)
* LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields (Andy Hind via Anshum Gupta)
* LUCENE-8596: Kuromoji user dictionary now accepts entries containing hash mark (#) that were
previously treated as beginning a line-ending comment (Satoshi Kato and Masaru Hasegawa via
Michael Sokolov)
* LUCENE-9109: Use StackWalker to implement TestSecurityManager's detection
of JVM exit (Uwe Schindler)
* LUCENE-9110: Refactor stack analysis in tests to use generalized LuceneTestCase
methods that use StackWalker (Uwe Schindler)
* LUCENE-9206: IndexMergeTool gets additional options to control the merging.
This tool no longer forceMerge(1)s to a single segment by default. If you
rely upon this behavior, pass -max-segments 1 instead. (Robert Muir)
* LUCENE-9220: Upgrade snowball to 2.0. New snowball stemmers: Hindi, Indonesian,
Nepali, Serbian, and Tamil. New stoplist: Indonesian. Adds gradle 'snowball'
task to regenerate and ease future upgrades. (Robert Muir, Dawid Weiss)
* LUCENE-9354: Improvements to snowball french stopwords list, so that it is less
aggressive. (Philippe Ouellet)
* LUCENE-9114: Improve ValueSourceScorer's Default Cost Implementation (Atri Sharma, David Smiley)
* LUCENE-9074: Introduce Slice Executor For Dynamic Runtime Execution Of Slices (Atri Sharma)
* LUCENE-9280: Add an ability for field comparators to skip non-competitive documents.
Creating a TopFieldCollector with totalHitsThreshold less than Integer.MAX_VALUE
instructs Lucene to skip non-competitive documents whenever possible. For numeric
sort fields the skipping functionality works when the same field is indexed both
with doc values and points. In this case, there is an assumption that the same data is
stored in these points and doc values (Mayya Sharipova, Jim Ferenczi, Adrien Grand)
* LUCENE-9313: Add SerbianAnalyzer based on the snowball stemmer. (Dragan Ivanovic)
* LUCENE-9449: Enhance DocComparator to provide an iterator over competitive
documents when searching with "after". This iterator can quickly position
on the desired "after" document skipping all documents and segments before
"after". Also redesign numeric comparators to provide skipping functionality
by default. (Mayya Sharipova, Jim Ferenczi)
* LUCENE-9527: Upgrade javacc to 7.0.4, regenerate query parsers. (Dawid Weiss)
* LUCENE-9531: Consolidated CharStream and FastCharStream classes: these have been moved
from each query parser package to org.apache.lucene.queryparser.charstream (Dawid Weiss).
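LUCENE-9370 and LUCENE-9336 above refer to the Java Pattern policy for backslashes. The sketch below only exercises java.util.regex.Pattern from the JDK to show that policy; it does not call Lucene's RegExp class, whose syntax differs in other respects. The class name RegexEscapeDemo and the helper compiles() are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexEscapeDemo {
    /** Returns true if java.util.regex.Pattern accepts the given syntax. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \d is a predefined character class: it matches a digit...
        System.out.println(Pattern.matches("\\d", "7")); // true
        // ...and no longer the literal letter 'd'.
        System.out.println(Pattern.matches("\\d", "d")); // false
        // A backslash before a letter with no defined meaning is rejected
        // rather than silently treated as that letter.
        System.out.println(compiles("\\y")); // false
    }
}
```

Code that relied on the old lenient behaviour would need to drop the backslash (or escape it) to match the literal letter.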
Bug fixes
* LUCENE-8663: NRTCachingDirectory.slowFileExists may open a file while
it's inaccessible. (Dawid Weiss)
* LUCENE-9117: RamUsageEstimator hangs with AOT compilation. Removed any attempt to
estimate Long.valueOf cache size. (Cleber Muramoto, Dawid Weiss)
* LUCENE-9290: Don't assume that different XYPoint have different hash code
(Ignacio Vera via Mike Drob)
* LUCENE-9372: Fix paths for cygwin/msys before gradle wrapper jar lookup.
(Peter Barna)
* LUCENE-9365: FuzzyQuery was missing matches when prefix length was equal to the term length
(Mark Harwood, Mike Drob)
Other
* LUCENE-9312: Allow gradle builds against arbitrary JVMs. (Tomoko Uchida, Dawid Weiss)
* LUCENE-9391: Upgrade HPPC to 0.8.2. (Haoyu Zhai)
* LUCENE-8768: Fix Javadocs build in Java 11. (Namgyu Kim)
* LUCENE-9092: upgrade randomizedtesting to 2.7.5 (Dawid Weiss)
* LUCENE-8656: Deprecations in FuzzyQuery and get compiler warnings out of
queryparser code (Alan Woodward, Erick Erickson)
* LUCENE-9344: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)
* LUCENE-9267: Update MatchingQueries documentation to correct
time unit. (Pierre-Luc Perron via Mike Drob)
* LUCENE-9411: Fail compilation on warnings, 9x gradle-only (Erick Erickson, Dawid Weiss)
Deserves mention here as well as Lucene CHANGES.txt since it affects both.
* LUCENE-9433: Remove Ant support from trunk (Erick Erickson, Uwe Schindler et al.)
* LUCENE-9215: Replace checkJavaDocs.py with doclet (Robert Muir, Dawid Weiss, Uwe Schindler)
* LUCENE-9497: Integrate Error Prone, a static analysis tool during compilation (Dawid Weiss, Varun Thacker)
* LUCENE-9544: add regenerate gradle script for nori dictionary (Namgyu Kim)
======================= Lucene 8.7.0 =======================
API Changes
---------------------
* LUCENE-9437: Lucene's facet module's DocValuesOrdinalsReader.decode method
is now public, making it easier for applications to decode facet
ordinals into their corresponding labels (Ankur Goel)
* LUCENE-9515: IndexingChain now accepts individual primitives rather than a
DocumentsWriterPerThread instance in order to create a new DocConsumer.
(Simon Willnauer)
New Features
---------------------
* LUCENE-9386: RegExpQuery added case insensitive matching option. (Mark Harwood)
* LUCENE-8962: Add IndexWriter merge-on-refresh feature to selectively merge
small segments on getReader, subject to a configurable timeout, to improve
search performance by reducing the number of small segments for searching. (Simon Willnauer)
* LUCENE-9484: Allow sorting an index after it was created. With SortingCodecReader, existing
unsorted segments can be wrapped and merged into a fresh index using IndexWriter#addIndices
API. (Simon Willnauer, Adrien Grand)
* LUCENE-9444: Add utility class to retrieve facet labels from the
taxonomy index for a facet field so such fields do not also have to
be redundantly stored (Ankur Goel)
Improvements
---------------------
* LUCENE-8574: Add a new ExpressionValueSource which will enforce only one value per name
per hit in dependencies, ExpressionFunctionValues will no longer
recompute already computed values (Haoyu Zhai)
* LUCENE-9416: Fix CheckIndex to print an invalid non-zero norm as
unsigned long when detecting corruption.
* LUCENE-9440: FieldInfo#checkConsistency called twice from Lucene50(60)FieldInfosFormat#read;
Removed the (redundant?) assert and do these checks for real. (Yauheni Putsykovich)
* LUCENE-9446: In BooleanQuery rewrite, always remove MatchAllDocsQuery filter clauses
when possible. (Julie Tibshirani)
* LUCENE-9501: Improve coverage for Asserting* test classes: make sure to handle singleton doc
values, and sometimes exercise Weight#scorer instead of Weight#bulkScorer for top-level
queries. (Julie Tibshirani)
* LUCENE-9511: Include StoredFieldsWriter in DWPT accounting to ensure that its
heap consumption is taken into account when IndexWriter stalls or should flush
DWPTs. (Simon Willnauer)
* LUCENE-9514: Include TermVectorsWriter in DWPT accounting to ensure that its
heap consumption is taken into account when IndexWriter stalls or should flush
DWPTs. (Simon Willnauer)
* LUCENE-9523: In query shapes over shape fields, skip points while traversing the
BKD tree when the relationship with the document is already known. (Ignacio Vera)
* LUCENE-9539: Use more compact datastructures to represent sorted doc-values in memory when
sorting a segment before flush and in SortingCodecReader. (Simon Willnauer)
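As an aside on LUCENE-9416 above: Java's long is signed, so a value with the top bit set prints as a negative number unless it is formatted as unsigned. The sketch below uses only the JDK's Long.toUnsignedString; it is not the CheckIndex code, and the class and method names are hypothetical.

```java
class UnsignedNormDemo {
    /** Formats a 64-bit value as an unsigned decimal string. */
    static String asUnsigned(long norm) {
        return Long.toUnsignedString(norm);
    }

    public static void main(String[] args) {
        long norm = -1L; // all 64 bits set
        System.out.println(Long.toString(norm)); // -1
        System.out.println(asUnsigned(norm));    // 18446744073709551615
    }
}
```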
Optimizations
---------------------
* LUCENE-9395: ConstantValuesSource now shares a single DoubleValues
instance across all segments (Tony Xu)
* LUCENE-9447, LUCENE-9486: Stored fields now get higher compression ratios on
highly compressible data. (Adrien Grand)
* LUCENE-9373: FunctionMatchQuery now accepts a "matchCost" optimization hint.
(Maxim Glazkov, David Smiley)
* LUCENE-9510: Indexing with an index sort is now faster by not compressing
temporary representations of the data. (Adrien Grand)
Bug Fixes
---------------------
* LUCENE-9427: Fix a regression where the unified highlighter didn't produce
highlights on fuzzy queries that correspond to exact matches. (Julie Tibshirani)
* LUCENE-9467: Fix NRTCachingDirectory to use Directory#fileLength to check if a file
already exists instead of opening an IndexInput on the file which might throw an AccessDeniedException
in some Directory implementations. (Simon Willnauer)
* LUCENE-9501: Fix a bug in IndexSortSortedNumericDocValuesRangeQuery where it could violate the
DocIdSetIterator contract. (Julie Tibshirani)
* LUCENE-9401: Include field in ComplexPhraseQuery's toString() (Thomas Hecker via Munendra S N)
Documentation
---------------------
* LUCENE-9424: Add a performance warning to AttributeSource.captureState javadocs (Haoyu Zhai)
Changes in Runtime Behavior
---------------------
* LUCENE-9539: SortingCodecReader no longer caches doc values fields. Previously, SortingCodecReader
used to cache all doc values fields after they were loaded into memory. This reader should only be used
to sort segments after the fact using IndexWriter#addIndices. (Simon Willnauer)
Other
---------------------
* LUCENE-9292: Refactor BKD point configuration into its own class. (Ignacio Vera)
* LUCENE-9470: Make TestXYMultiPolygonShapeQueries more resilient for CONTAINS queries. (Ignacio Vera)
* LUCENE-9512: Move LockFactory stress test to be a unit/integration
test. (Uwe Schindler, Dawid Weiss, Robert Muir)
Build
* Upgrade forbiddenapis to version 3.1. (Uwe Schindler)
======================= Lucene 8.6.2 =======================
Bug Fixes
---------------------
* LUCENE-9478: Prevent DWPTDeleteQueue from referencing itself and leaking memory. The queue
passed an implicit this reference to the next queue instance on flush which leaked about 500 bytes
of memory on each full flush, commit or getReader call. (Simon Willnauer)
======================= Lucene 8.6.1 =======================
Bug Fixes
---------------------
* LUCENE-9443: The UnifiedHighlighter was closing the underlying reader when there were multiple term-vector fields.
This was a regression in 8.6.0. (David Smiley, Chris Beer)
======================= Lucene 8.6.0 =======================
API Changes
---------------------
* LUCENE-9265: SimpleFSDirectory is deprecated in favor of NIOFSDirectory. (Yannick Welsch)
* LUCENE-9304: Removed ability to set DocumentsWriterPerThreadPool on IndexWriterConfig.
The DocumentsWriterPerThreadPool is a package-private final class, which made it impossible
to customize. (Simon Willnauer)
* LUCENE-9339: MergeScheduler#merge no longer accepts a parameter indicating whether a new merge was found.
(Simon Willnauer)
* LUCENE-9330: SortFields are now responsible for writing themselves into index headers if they
are used as index sorts. (Alan Woodward, Uwe Schindler, Adrien Grand)
* LUCENE-9340: Deprecate SimpleBindings#add(SortField). (Alan Woodward)
* LUCENE-9345: MergeScheduler is now decoupled from IndexWriter. Instead it accepts a MergeSource
interface that offers the basic methods to acquire pending merges, run the merge and do accounting
around it. (Simon Willnauer)
* LUCENE-9349: QueryVisitor.consumeTermsMatching() now takes a
Supplier<ByteRunAutomaton> to enable queries that build large automata to
provide them lazily. TermsInSetQuery switches to using this method
to report matching terms. (Alan Woodward)
* LUCENE-9366: DocValues.emptySortedNumeric() no longer takes a maxDoc parameter
(Alan Woodward)
* LUCENE-7822: CodecUtil#checkFooter(IndexInput, Throwable) now throws a
CorruptIndexException if checksums mismatch or if checksums can't be verified.
(Martin Amirault, Adrien Grand)
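The point of the Supplier in LUCENE-9349 above is that an expensive object is only built if a visitor actually asks for it. A minimal sketch of that idea with java.util.function.Supplier follows; the String stands in for a large automaton, and LazyDemo, memoize() and expensiveBuild() are hypothetical names, not Lucene API.

```java
import java.util.function.Supplier;

class LazyDemo {
    static int builds = 0; // counts how often the expensive build runs

    /** Wraps a supplier so the delegate runs at most once (not thread-safe). */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean computed;

            @Override
            public T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    /** Stand-in for building a large automaton. */
    static String expensiveBuild() {
        builds++;
        return "automaton";
    }

    public static void main(String[] args) {
        Supplier<String> lazy = memoize(LazyDemo::expensiveBuild);
        System.out.println(builds); // nothing has been built yet
        lazy.get();
        lazy.get();
        System.out.println(builds); // built once despite two calls
    }
}
```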
New Features
---------------------
* LUCENE-7889: Grouping by range based on values from DoubleValuesSource and LongValuesSource
(Alan Woodward)
* LUCENE-8962: Add IndexWriter merge-on-commit feature to selectively merge small segments on commit,
subject to a configurable timeout, to improve search performance by reducing the number of small
segments for searching (Michael Froh, Mike Sokolov, Mike Mccandless, Simon Willnauer)
Improvements
---------------------
* LUCENE-9276: Use same code-path for updateDocuments and updateDocument in IndexWriter and
DocumentsWriter. (Simon Willnauer)
* LUCENE-9279: Update dictionary version for Ukrainian analyzer to 4.9.1 (Andriy Rysin via Dawid Weiss)
* LUCENE-8050: PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
(David Smiley, Juan Rodriguez)
* LUCENE-9304: Removed ThreadState abstraction from DocumentsWriter which allows pooling of DWPT directly and
improves the approachability of the IndexWriter code. (Simon Willnauer)
* LUCENE-9324: Add an ID to SegmentCommitInfo in order to compare commits for equality and make
snapshots incremental on generational files. (Simon Willnauer, Mike Mccandless, Adrien Grand)
* LUCENE-9342: TotalHits' relation will be EQUAL_TO when the number of hits is lower than TopDocsCollector's numHits
(Tomás Fernández Löbbe)
* LUCENE-9353: Metadata of the terms dictionary moved to its own file, with the
`.tmd` extension. This allows checksums of metadata to be verified when
opening indices and helps save seeks when opening an index. (Adrien Grand)
* LUCENE-9359: SegmentInfos#readCommit now always returns a
CorruptIndexException if the content of the file is invalid. (Adrien Grand)
* LUCENE-9393: Make FunctionScoreQuery use ScoreMode.COMPLETE for creating the inner query weight when
ScoreMode.TOP_DOCS is requested. (Tomás Fernández Löbbe)
* LUCENE-9392: Make FacetsConfig.DELIM_CHAR publicly accessible (Ankur Goel)
* LUCENE-9397: UniformSplit supports encodable fields metadata. (Bruno Roustant)
* LUCENE-9396: Improved truncation detection for points. (Adrien Grand, Robert Muir)
* LUCENE-9402: Let MultiCollector handle minCompetitiveScore (Tomás Fernández Löbbe, Adrien Grand)
Optimizations
---------------------
* LUCENE-9254: UniformSplit keeps FST off-heap. (Bruno Roustant)
* LUCENE-8103: DoubleValuesSource and QueryValueSource now use a TwoPhaseIterator if one is provided by the Query.
(Michele Palmia, David Smiley)
* LUCENE-9287: UsageTrackingQueryCachingPolicy no longer caches DocValuesFieldExistsQuery. (Ignacio Vera)
* LUCENE-9286: FST.Arc.BitTable reads FST bytes directly. Arc is lightweight again and FSTEnum traversal faster.
(Bruno Roustant)
* LUCENE-7788: fail precommit on unparameterised log messages and examine for wasted work/objects (Erick Erickson)
* LUCENE-9273: Speed up geometry queries by specialising Component2D spatial operations. Instead of using a generic
relate method for all relations, we use specialized methods for each one. In addition, the type of triangle is
computed at deserialization time, therefore we can be more selective when decoding points of a triangle.
(Ignacio Vera)
* LUCENE-9087: Always build trees with full leaves and lower the default value for maxPointsPerLeafNode to 512.
(Ignacio Vera)
* LUCENE-9148: Points now write their index in a separate file. (Adrien Grand)
Bug Fixes
---------------------
* LUCENE-9259: Fix wrong NGramFilterFactory argument name for preserveOriginal option (Paul Pazderski)
* LUCENE-8849: DocValuesRewriteMethod.visit wasn't visiting its embedded query (Michele Palmia, David Smiley)
* LUCENE-9258: DocTermsIndexDocValues assumed it was operating on a SortedDocValues (single valued) field when
it could be multi-valued used with a SortedSetSelector (Michele Palmia)
* LUCENE-9164: Ensure IW processes all internal events before it closes itself on a rollback.
(Simon Willnauer, Nhat Nguyen, Dawid Weiss, Mike Mccandless)
* LUCENE-8908: Return default value from objectVal when doc doesn't match the query in QueryValueSource
(Bill Bell, hossman, Munendra S N, Michele Palmia)
* LUCENE-9133: Fix for potential NPE in TermFilteredPresearcher for empty fields (Marvin Justice via Mike Drob)
* LUCENE-9309: Wait for #addIndexes merges when aborting merges. (Simon Willnauer)
* LUCENE-9337: Ensure CMS updates its thread accounting datastructures consistently.
CMS today releases its lock after finishing a merge before it re-acquires it to update
the thread accounting datastructures. This causes threading issues where concurrently
finishing threads fail to pick up pending merges causing potential thread starvation on
forceMerge calls. (Simon Willnauer)
* LUCENE-9314: Single-document monitor runs were using the less efficient MultiDocumentBatch
implementation. (Pierre-Luc Perron, Alan Woodward)
* LUCENE-9362: Fix equality check in ExpressionValueSource#rewrite. This fixes rewriting of inner value sources.
(Dmitry Emets)
* LUCENE-9405: IndexWriter incorrectly calls closeMergeReaders twice when the merged segment is 100% deleted.
(Michael Froh, Simon Willnauer, Mike Mccandless, Mike Sokolov)
* LUCENE-9400: Tessellator might build illegal polygons when several holes share the same vertex. (Ignacio Vera)
* LUCENE-9417: Tessellator might build illegal polygons when several holes are connected to the same
vertex. (Ignacio Vera)
* LUCENE-9418: Fix ordered intervals over interleaved terms (Alan Woodward)
Other
---------------------
* LUCENE-9257: Always keep FST off-heap. FSTLoadMode, Reader attributes and openedFromWriter removed. (Bruno Roustant)
* LUCENE-9272: Checksums of the terms index are now verified when
LeafReader#checkIntegrity is called rather than when opening the index.
(Adrien Grand)
* LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder. (Namgyu Kim)
* LUCENE-9275: Make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries. (Ignacio Vera)
* LUCENE-9244: Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point
is shared by multiple leaves. (Ignacio Vera)
* LUCENE-9271: ByteBufferIndexInput was refactored to work on top of the
ByteBuffer API. (Adrien Grand)
* LUCENE-9191: Make LineFileDocs's random seeking more efficient, making tests using LineFileDocs faster (Robert Muir,
Mike McCandless)
* LUCENE-9338: Refactors SimpleBindings to improve type safety and cycle detection (Alan Woodward,
Adrien Grand)
* LUCENE-9358: Change the way the multi-dimensional BKD tree builder generates the intermediate tree representation to be
equal to the one dimensional case to avoid unnecessary tree and leaves rotation. (Ignacio Vera)
* LUCENE-9288: poll_mirrors.py release script can handle HTTPS mirrors. (Ignacio Vera)
* LUCENE-9232: Fix or suppress 13 resource leak precommit warnings in lucene/replicator (Andras Salamon via Erick Erickson)
* LUCENE-9398: Always keep BKD index off-heap. BKD reader does not implement Accountable any more. (Ignacio Vera)
Build
* Upgrade forbiddenapis to version 3.0.1. (Uwe Schindler)
* LUCENE-9376: Fix or suppress 20 resource leak precommit warnings in lucene/search
(Andras Salamon via Erick Erickson)
* LUCENE-9380: Fix auxiliary class warnings in Lucene (Erick Erickson)
* LUCENE-9389: Enhance gradle logging calls validation: eliminate getMessage() (Andras Salamon via Erick Erickson)
======================= Lucene 8.5.2 =======================
Optimizations
---------------------
* LUCENE-9350: Partial reversion of LUCENE-9068; holding levenshtein automata on FuzzyQuery can end
up blowing up query caches which use query objects as cache keys, so building the automata is
now delayed to search time again. (Alan Woodward, Mike Drob)
======================= Lucene 8.5.1 =======================
Bug Fixes
---------------------
* LUCENE-9300: Fix corruption of the new gen field infos when doc values updates are applied on a segment created
externally and added to the index with IndexWriter#addIndexes(Directory). (Jim Ferenczi, Adrien Grand)
======================= Lucene 8.5.0 =======================
API Changes
---------------------
* LUCENE-9093: Not an API change but a change in behavior of the UnifiedHighlighter's LengthGoalBreakIterator that will
yield Passages sized a little differently due to the fact that the sizing pivot is now the center of the first match and
not its left edge.
* LUCENE-9116: PostingsWriterBase and PostingsReaderBase no longer support
setting a field's metadata via a `long[]`. (Adrien Grand)
* LUCENE-9116: The FSTOrd postings format has been removed.
(Adrien Grand)
* LUCENE-8369: Remove obsolete spatial module. (Nick Knize, David Smiley)
* LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core. (Nick Knize)
* LUCENE-9218: XY geometries API works in float space. (Ignacio Vera)
* LUCENE-9212: Intervals.multiterm() takes CompiledAutomaton rather than plain Automaton
(Alan Woodward)
* LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d. (Nick Knize)
* LUCENE-9171: QueryBuilder.newTermQuery() and .newSynonymQuery() now take boost parameters.
(Alessandro Benedetti, Alan Woodward)
New Features
---------------------
* LUCENE-8903: Add LatLonShape and XYShape point query. (Ignacio Vera)
* LUCENE-8707: Add LatLonShape and XYShape distance query. (Ignacio Vera)
* LUCENE-9238: New XYPointField field and Queries for indexing, searching and sorting
cartesian points. (Ignacio Vera)
Improvements
---------------------
* LUCENE-9149: Increase data dimension limit in BKD. (Nick Knize)
* LUCENE-9102: Add maxQueryLength option to DirectSpellchecker. (Andy Webb via Bruno Roustant)
* LUCENE-9091: UnifiedHighlighter HTML escaping should only escape essentials (Nándor Mátravölgyi)
* LUCENE-9105: UniformSplit postings format detects corrupted index and better handles IO exceptions. (Bruno Roustant)
* LUCENE-9106: UniformSplit postings format allows extension of block/line serializers. (Bruno Roustant)
* LUCENE-9093: UnifiedHighlighter's LengthGoalBreakIterator has a new fragmentAlignment option to better center the
first match in the passage. Also the sizing point now pivots at the center of the first match term and not its left
edge. This yields Passages that won't be identical to the previous behavior. (Nándor Mátravölgyi, David Smiley)
* LUCENE-9153: Allow WhitespaceAnalyzer to set a maxTokenLength other than the default of 255
(Alan Woodward)
* LUCENE-9152: Improve line intersections with polygons when they are touching from the outside. (Ignacio Vera)
* LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option that controls whether
the tokenizer emits original (compound) tokens when the mode is not NORMAL. (Kazuaki Hiraga via Tomoko Uchida)
* LUCENE-9253: KoreanTokenizer now supports custom dictionaries (system, unknown). (Namgyu Kim)
* LUCENE-9171: QueryBuilder can now use BoostAttributes on input token streams to selectively
boost particular terms or synonyms in parsed queries. (Alessandro Benedetti, Alan Woodward)
* LUCENE-9298: Improve RAM accounting in BufferedUpdates when deleted doc IDs and terms are cleared. (Yu Binglei, Simon Willnauer)
Optimizations
---------------------
* LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
* LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
* LUCENE-9228: Sort dvUpdates in the term order before applying if they all update a
single field to the same value. This optimization can reduce the flush time by around
20% for the docValues update use cases. (Nhat Nguyen, Adrien Grand, Simon Willnauer)
* LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant, Robert Muir)
* LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)
* LUCENE-9260: LeafReader#checkIntegrity verifies checksums of CFS files.
(Adrien Grand)
* LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward, Mike Drob)
* LUCENE-9113: Faster merging of SORTED/SORTED_SET doc values. (Adrien Grand)
* LUCENE-9125: Optimize Automaton.step() with binary search and introduce Automaton.next(). (Bruno Roustant)
* LUCENE-9147: The index of stored fields and term vectors is now off-heap.
(Adrien Grand)
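The binary search mentioned in LUCENE-9125 above can be sketched as follows, assuming a state's transitions are stored as sorted, non-overlapping label ranges. The three parallel arrays and the names TransitionSearchDemo and step() are a hypothetical layout chosen for illustration, not Lucene's internal Automaton representation.

```java
class TransitionSearchDemo {
    /**
     * Finds the destination state for a label among transitions sorted by
     * range. mins[i]..maxs[i] is the inclusive label range of transition i
     * and dests[i] its destination. Returns -1 if no range contains label.
     */
    static int step(int[] mins, int[] maxs, int[] dests, int label) {
        int lo = 0;
        int hi = mins.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (label < mins[mid]) {
                hi = mid - 1;          // label lies before this range
            } else if (label > maxs[mid]) {
                lo = mid + 1;          // label lies after this range
            } else {
                return dests[mid];     // label falls inside this range
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] mins = {'a', 'm', 'x'};
        int[] maxs = {'c', 'p', 'z'};
        int[] dests = {1, 2, 3};
        System.out.println(step(mins, maxs, dests, 'n')); // 2
        System.out.println(step(mins, maxs, dests, 'd')); // -1
    }
}
```

Compared with a linear scan, the lookup cost grows with the logarithm of the number of transitions, which matters for states with many outgoing ranges.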
Bug Fixes
---------------------
* LUCENE-9084: Fix potential deadlock due to circular synchronization in AnalyzingInfixSuggester (Paul Ward)
* LUCENE-9115: NRTCachingDirectory no longer caches files of unknown size.
(Adrien Grand)
* LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer.
(Ignacio Vera)
* LUCENE-9135: Make UniformSplit FieldMetadata counters long. (Bruno Roustant)
* LUCENE-9200: Fix TieredMergePolicy to use double (not float) math to make its merging decisions, fixing
a corner-case bug uncovered by fun randomized tests (Robert Muir, Mike McCandless)
* LUCENE-9099: Unordered and Ordered interval queries now correctly handle
repeated subterms - ordered intervals could supply an 'extra' minimized
interval, resulting in odd matches when combined with eg CONTAINS queries;
and unordered intervals would match duplicate subterms on the same position,
so a query for UNORDERED(foo, foo) would match a document containing 'foo'
only once. (Alan Woodward)
* LUCENE-9250: Add support for Circle2d#intersectsLine around the dateline. (Ignacio Vera)
* LUCENE-9243: Add fudge factor when creating a bounding box of a XYCircle. (Ignacio Vera)
* LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. (Ignacio Vera)
* LUCENE-9251: Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon
were not filtered out properly. (Ignacio Vera)
* LUCENE-9263: Fix wrong transformation of distance in meters to radians in Geo3DPoint. (Ignacio Vera)
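LUCENE-9200 above does not spell out the corner case, but the general hazard of float math on large values is easy to show. In this sketch the two long values are hypothetical segment sizes in bytes, and the class and method names are invented; it only shows that float (24-bit significand) cannot tell apart neighbouring values that double can.

```java
class FloatVsDoubleDemo {
    /** Compares two sizes after converting them to float. */
    static boolean largerAsFloat(long a, long b) {
        return (float) a > (float) b;
    }

    /** Compares two sizes after converting them to double. */
    static boolean largerAsDouble(long a, long b) {
        return (double) a > (double) b;
    }

    public static void main(String[] args) {
        long a = 16_777_217L; // 2^24 + 1, not representable as a float
        long b = 16_777_216L; // 2^24
        System.out.println(largerAsFloat(a, b));  // false: both round to 2^24
        System.out.println(largerAsDouble(a, b)); // true
    }
}
```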
Other
---------------------
* LUCENE-9109: Backport some changes from master (except StackWalker) to improve
TestSecurityManager (Uwe Schindler)
* LUCENE-9110: Backport refactored stack analysis in tests to use generalized
LuceneTestCase methods (Uwe Schindler)
* LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are
executed with input objects that extend such interface. (Ignacio Vera)
* LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry. Queries are
executed with input objects that extend such interface. (Ignacio Vera)
* LUCENE-9096: Simplification of CompressingTermVectorsWriter#flushOffsets.
(kkewwei via Adrien Grand)
* LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection. (Ignacio Vera)
======================= Lucene 8.4.1 =======================
Bug Fixes
---------------------
(No changes)
======================= Lucene 8.4.0 =======================
API Changes
* LUCENE-9029: Deprecate SloppyMath toRadians/toDegrees in favor of Java Math.
(Jack Conradson via Adrien Grand)
New Features
* LUCENE-8620: Add CONTAINS support for LatLonShape and XYShape. (Ignacio Vera)
Improvements
* LUCENE-9002: Skip costly caching clause in LRUQueryCache if it makes the query
many times slower. (Guoqiang Jiang)
* LUCENE-9006: WordDelimiterGraphFilter's catenateAll token is now ordered before any token parts, like WDF did.
(David Smiley)
* LUCENE-9028: introducing Intervals.multiterm() (Mikhail Khludnev)
* LUCENE-9018: ConcatenateGraphFilter now has a configurable separator. (Stanislav Mikulchik, David Smiley)
* LUCENE-9036: ExitableDirectoryReader may interrupt scanning over DocValues (Mikhail Khludnev)
* LUCENE-9062: QueryVisitor now has a consumeTermsMatching() method, allowing queries
that match a class of terms to pass a ByteRunAutomaton matching that class
back to the visitor. (Alan Woodward, David Smiley)
* LUCENE-9073: IntervalQuery now reports its field in toString() and explain() (Mikhail Khludnev)
Optimizations
* LUCENE-8928: When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits
to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS which is set to 4.
(Ignacio Vera, Adrien Grand)
* BaseDirectoryReader no longer sums up the `LeafReader#numDocs` of its leaves
eagerly. This especially helps when creating views of readers that hide
documents, since computing the number of live documents is an expensive
operation. (Adrien Grand)
* LUCENE-8992: TopFieldCollector and TopScoreDocCollector can now share minimum scores across leaves
concurrently. (Adrien Grand, Atri Sharma, Jim Ferenczi)
* LUCENE-8932: BKDReader's index is now stored off-heap when the IndexInput is
an instance of ByteBufferIndexInput. (Jack Conradson via Adrien Grand)
* LUCENE-9024: IntroSelector now falls back to the median of medians algorithm
instead of sorting when the maximum recursion level is exceeded, providing
better worst-case runtime. (Paul Sanwald via Adrien Grand)
* LUCENE-8920: The denser arcs of FST now index labels with a bitset in order
to provide near constant time access. (Bruno Roustant, Mike Sokolov via Adrien Grand)
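The lookup described in LUCENE-8920 can be illustrated with a small sketch. This is not the FST code itself: the class BitsetArcIndexSketch and its methods are hypothetical names for this example, and it assumes the labels of a node are distinct, sorted ints in a narrow range. A presence bitset marks which labels have arcs, and the position of an arc is the number of set bits that precede its label.

```java
final class BitsetArcIndexSketch {
  private final int firstLabel;  // smallest label that has an arc
  private final int numBits;     // size of the covered label range
  private final long[] presence; // bit i set => label (firstLabel + i) has an arc

  BitsetArcIndexSketch(int[] sortedLabels) {
    if (sortedLabels.length == 0) {
      throw new IllegalArgumentException("at least one label is required");
    }
    firstLabel = sortedLabels[0];
    numBits = sortedLabels[sortedLabels.length - 1] - firstLabel + 1;
    presence = new long[(numBits + 63) >>> 6];
    for (int label : sortedLabels) {
      int bit = label - firstLabel;
      // A long shift only uses the low 6 bits of the distance, so this is bit % 64.
      presence[bit >>> 6] |= 1L << bit;
    }
  }

  /** Returns the position of the arc for the label among the present arcs, or -1 if absent. */
  int arcIndex(int label) {
    int bit = label - firstLabel;
    if (bit < 0 || bit >= numBits) {
      return -1;
    }
    int word = bit >>> 6;
    long mask = 1L << bit;
    if ((presence[word] & mask) == 0) {
      return -1;
    }
    // Count the set bits before this one: whole words first, then the partial word.
    int count = 0;
    for (int i = 0; i < word; i++) {
      count += Long.bitCount(presence[i]);
    }
    return count + Long.bitCount(presence[word] & (mask - 1));
  }

  public static void main(String[] args) {
    BitsetArcIndexSketch node = new BitsetArcIndexSketch(new int[] {97, 99, 100, 200});
    System.out.println("index of label 100: " + node.arcIndex(100));
    System.out.println("index of label 98: " + node.arcIndex(98));
  }
}
```

With byte labels the range covers at most 256 bits, so the loop touches at most four words, which is what makes the access close to constant time in this sketch.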
* LUCENE-9027: Use SIMD instructions to decode postings. (Adrien Grand)
* LUCENE-9049: Remove FST cached root arcs now redundant with labels indexed by bitset.
This frees some on-heap FST space. (Jack Conradson via Bruno Roustant)
* LUCENE-9045: Do not use TreeMap/TreeSet in BlockTree and PerFieldPostingsFormat. (Bruno Roustant)
Bug Fixes
* LUCENE-9001: Fix race condition in SetOnce. (Przemko Robakowski)
* LUCENE-9030: Fix WordnetSynonymParser behaviour so it behaves similarly to
SolrSynonymParser. (Christoph Buescher via Alan Woodward)
* LUCENE-9054: Fix reproduceJenkinsFailures.py to not overwrite junit XML files when retrying (hossman)
* LUCENE-9031: UnsupportedOperationException on MatchesIterator.getQuery() (Alan Woodward, Mikhail Khludnev)
* LUCENE-8996: maxScore was sometimes missing from distributed grouped responses.
(Julien Massenet, Diego Ceccarelli, Munendra S N, Christine Poerschke)
* LUCENE-9055: Fix the detection of lines crossing triangles through edge points.
(Ignacio Vera)
* LUCENE-9103: Disjunctions can miss some hits in some rare conditions. (Adrien Grand)
Other
* LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - Part 2 (Koen De Groote)
* LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). (Koen De Groote)
* LUCENE-8746: Refactor EdgeTree - Introduce a Component tree that represents the tree of components (e.g polygons).
Edge tree is now just a tree of edges. (Ignacio Vera)
* LUCENE-9046: Fix wrong example in Javadoc of TermInSetQuery (Namgyu Kim)
* LUCENE-8983: Add sandbox PhraseWildcardQuery to control multi-terms expansions in a phrase. (Bruno Roustant)
* LUCENE-9067: Polygon2D#contains() is now thread safe. (Ignacio Vera)
Build
* Upgrade forbiddenapis to version 2.7; upgrade Groovy to 2.4.17. (Uwe Schindler)
* LUCENE-9041: Upgrade ecj to 3.19.0 to fix sporadic precommit javadoc issues (Kevin Risden)
======================= Lucene 8.3.1 =======================
Bug Fixes
* LUCENE-9050: MultiTermIntervalsSource.visit() was not calling back to its
visitor. (Alan Woodward)
======================= Lucene 8.3.0 =======================
API Changes
* LUCENE-8909: IndexWriter#getFieldNames() method is used to get fields present in index. After LUCENE-8316, this
method is no longer required. Hence, deprecate IndexWriter#getFieldNames() method. (Adrien Grand, Munendra S N)
* LUCENE-8755: SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class. The quad
and packed quad prefix trees are sensitive to this. It's recommended to pass the version, just as you
should for analysis components for tokenized text, or else changes to the encoding in future versions
may be incompatible with older indexes. (Chongchen Chen, David Smiley)
* LUCENE-8956: QueryRescorer now only sorts the first topN hits instead of all
initial hits. (Paul Sanwald via Adrien Grand)
* LUCENE-8921: IndexSearcher.termStatistics() no longer takes a TermStates; it takes the docFreq and totalTermFreq.
Do not call it if docFreq <= 0. The previous implementation survives as deprecated and final, and is removed in 9.0.
(Bruno Roustant, David Smiley, Alan Woodward)
* LUCENE-8990: PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by
the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of
PointValues#estimatePointCount(visitor). (Ignacio Vera, Adrien Grand)
New Features
* LUCENE-8936: Add SpanishMinimalStemFilter (vinod kumar via Tomoko Uchida)
* LUCENE-8764 LUCENE-8945: Add "export all terms and doc freqs" feature to Luke with delimiters. (Leonardo Menezes, Amish Shah via Tomoko Uchida)
* LUCENE-8747: Composite Matches from multiple subqueries now allow access to
their submatches, and a new NamedMatches API allows marking of subqueries
and a simple way to find which subqueries have matched on a given document
(Alan Woodward, Jim Ferenczi)
* LUCENE-8769: Introduce Range Query For Multiple Connected Ranges (Atri Sharma)
* LUCENE-8960: Introduce LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField (Ignacio Vera)
* LUCENE-8753: New UniformSplitPostingsFormat (name "UniformSplit") primarily benefiting in simplicity and
extensibility. New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal
term dictionary across fields. (Bruno Roustant, Juan Rodriguez, David Smiley)
Improvements
* LUCENE-8874: Show SPI names instead of class names in Luke Analysis tab. (Tomoko Uchida)
* LUCENE-8894: Add APIs to find SPI names for Tokenizer/CharFilter/TokenFilter factory classes. (Tomoko Uchida)
* LUCENE-8914: Move the logic for discarding inner nodes in FloatPointNearestNeighbor to the IntersectVisitor
so we take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)
* LUCENE-8955: Move the logic for discarding inner nodes in LatLonPoint NearestNeighbor to the IntersectVisitor
so we take advantage of the change introduced in LUCENE-7862. (Ignacio Vera)
* LUCENE-8918: PhraseQuery throws exceptions at construction time if it is passed
null arguments. (Alan Woodward)
* LUCENE-8916: GraphTokenStreamFiniteStrings preserves all Token attributes
through its finite strings TokenStreams (Alan Woodward)
* LUCENE-8906: Expose Lucene50PostingsFormat.IntBlockTermState as public so that other postings formats can re-use it.
(Bruno Roustant)
* LUCENE-8942: Remove redundant parameters and improve visibility strictness in
LRUQueryCache (Atri Sharma)
* SOLR-13663: Introduce <SpanPositionRange> into XML Query Parser (Alessandro Benedetti via Mikhail Khludnev)
* LUCENE-8952: Use a sort key instead of true distance in NearestNeighbor (Julie Tibshirani).
* LUCENE-8620: Tessellator now labels each edge of the generated triangles with whether it belongs to
the original polygon. This information is added to the triangle encoding. (Ignacio Vera)
* LUCENE-8964: Fix geojson shape parsing on string arrays in properties
(Alexander Reelsen)
* LUCENE-8976: Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor. (Ignacio Vera)
* LUCENE-8966: The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters. (Jim Ferenczi)
Optimizations
* LUCENE-8922: DisjunctionMaxQuery more efficiently leverages impacts to skip
non-competitive hits. (Adrien Grand)
* LUCENE-8935: BooleanQuery with no scoring clause can now early terminate the query when
the total hits is not requested. (Jim Ferenczi)
* LUCENE-8941: Matches on wildcard queries will defer building their full
disjunction until a MatchesIterator is pulled (Alan Woodward)
* LUCENE-8755: spatial-extras quad and packed quad prefix trees now index points faster.
(Chongchen Chen, David Smiley)
* LUCENE-8860: add additional leaf node level optimizations in LatLonShapeBoundingBoxQuery.
(Igor Motov via Ignacio Vera)
* LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries by
doing just one pass whenever possible. (Ignacio Vera)
* LUCENE-8939: Introduce shared count based early termination across multiple slices
(Atri Sharma)
* LUCENE-8980: Blocktree's seekExact now short-circuits false if the term isn't in the min-max range of the segment.
Large perf gain for ID/time like data when populated sequentially. (Guoqiang Jiang)
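The short-circuit in LUCENE-8980 can be sketched outside Lucene as follows. This is an illustration under stated assumptions, not the BlockTree code: the class MinMaxSeekSketch, its fullLookups counter, and the utf8() helper are hypothetical, and a sorted in-memory array stands in for a segment's term dictionary. Terms are compared as unsigned bytes, and a binary search stands in for the expensive lookup that the range check avoids.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

final class MinMaxSeekSketch {
  // Unsigned lexicographic byte order (assumed term order for this sketch).
  private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

  private final byte[][] sortedTerms; // stand-in for one segment's terms
  int fullLookups = 0;                // how often the costly path was taken

  MinMaxSeekSketch(byte[][] sortedTerms) {
    this.sortedTerms = sortedTerms;
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  boolean seekExact(byte[] term) {
    if (sortedTerms.length == 0) {
      return false;
    }
    byte[] min = sortedTerms[0];
    byte[] max = sortedTerms[sortedTerms.length - 1];
    // Cheap check: a term outside [min, max] cannot be in this segment.
    if (UNSIGNED.compare(term, min) < 0 || UNSIGNED.compare(term, max) > 0) {
      return false;
    }
    fullLookups++; // only now pay for the real lookup
    return Arrays.binarySearch(sortedTerms, term, UNSIGNED) >= 0;
  }

  public static void main(String[] args) {
    MinMaxSeekSketch segment =
        new MinMaxSeekSketch(new byte[][] {utf8("id0010"), utf8("id0020"), utf8("id0030")});
    System.out.println(segment.seekExact(utf8("id0040")) + ", lookups=" + segment.fullLookups);
    System.out.println(segment.seekExact(utf8("id0020")) + ", lookups=" + segment.fullLookups);
  }
}
```

When IDs or timestamps are written sequentially, each segment covers a narrow, mostly disjoint range, so most segments can reject a lookup with two comparisons.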
Bug Fixes
* LUCENE-8755: spatial-extras quad and packed quad prefix trees could throw a
NullPointerException for certain cell edge coordinates (Chongchen Chen, David Smiley)
* LUCENE-9005: BooleanQuery.visit() would pull subVisitors from its parent visitor, rather
than from a visitor for its own specific query. This could cause problems when BQ was
nested under another BQ. Instead, we now pull a MUST subvisitor, pass it to any MUST
subclauses, and then pull SHOULD, MUST_NOT and FILTER visitors from it rather than from
the parent. (Alan Woodward)
Other
* LUCENE-8778 LUCENE-8911 LUCENE-8957: Define analyzer SPI names as static final fields and document the names in Javadocs.
(Tomoko Uchida, Uwe Schindler)
* LUCENE-8758: QuadPrefixTree: removed levelS and levelN fields which weren't used. (Amish Shah)
* LUCENE-8975: Code Cleanup: Use entryset for map iteration wherever possible. (Koen De Groote)
* LUCENE-8993, LUCENE-8807: Changed all repository and download references in build files
to HTTPS. (Uwe Schindler)
* LUCENE-8998: Fix OverviewImplTest.testIsOptimized reproducible failure. (Tomoko Uchida)
* LUCENE-8999: LuceneTestCase.expectThrows now propagates assert/assumption failures up to the test
w/o wrapping in a new assertion failure unless the caller has explicitly expected them (hossman)
* LUCENE-8062: GlobalOrdinalsWithScoreQuery is no longer eligible for query caching. (Jim Ferenczi)
======================= Lucene 8.2.0 =======================
API Changes
* LUCENE-8865: IndexSearcher now uses Executor instead of ExecutorService.
This change is fully backwards compatible since ExecutorService directly
implements Executor. (Simon Willnauer)
* LUCENE-8856: Intervals queries have moved from the sandbox to the queries
module. (Alan Woodward)
* LUCENE-8893: Intervals.wildcard() and Intervals.prefix() methods now take
BytesRef rather than String. (Alan Woodward)
New Features
* LUCENE-8632: New XYShape Field and Queries for indexing and searching general cartesian
geometries. (Nick Knize)
* LUCENE-8891: Snowball stemmer/analyzer for the Estonian language.
(Gert Morten Paimla via Tomoko Uchida)
* LUCENE-8815: Provide a DoubleValues implementation for retrieving the value of features without
requiring a separate numeric field. Note that as feature values are stored with only 8 bits of
mantissa the values returned may have a delta from the original values indexed.
(Colin Goodheart-Smithe via Adrien Grand)
* LUCENE-8803: Provide a FeatureSortfield to allow sorting search hits by descending value of a
feature. This is exposed via the factory method FeatureField#newFeatureSort.
(Colin Goodheart-Smithe via Adrien Grand)
* LUCENE-8784: The KoreanTokenizer now preserves punctuations if discardPunctuation is set
to false (defaults to true).
(Namgyu Kim via Jim Ferenczi)
* LUCENE-8812: Add new KoreanNumberFilter that can change Hangul character to number
and process decimal point. It is similar to the JapaneseNumberFilter.
(Namgyu Kim)
* LUCENE-8362: Add doc-value support to range fields. (Atri Sharma via Adrien Grand)
* LUCENE-8766: Add monitor subproject (previously Luwak monitoring library). This
allows a stream of documents to be matched against a set of registered queries
in an efficient manner, for use as a monitoring or classification tool.
(Alan Woodward)
* LUCENE-7714: Add a numeric range query in sandbox that takes advantage of index sorting.
(Julie Tibshirani via Jim Ferenczi)
* LUCENE-8859: The completion suggester's postings format now has an option to
load its internal FST off-heap. (Jim Ferenczi)
Bug Fixes
* LUCENE-8831: Fixed LatLonShapeBoundingBoxQuery .hashCode methods. (Ignacio Vera)
* LUCENE-8775: Improve tessellator to better handle cases where a hole shares a vertex
with the polygon. (Ignacio Vera)
* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
Previously, the missing lock could cause assertion errors and potentially broken field attributes in the
IndexWriter when IndexWriter#deleteAll was called while actively indexing. (Simon Willnauer)
* LUCENE-8804: Forbid calls to putAttribute on frozen FieldType instances.
(Vamshi Vijay Nakkirtha via Adrien Grand)
* LUCENE-8828: Removes the buggy 'disallow overlaps' boolean from Intervals.unordered(),
and replaces it with a new Intervals.unorderedNoOverlaps() method (Alan Woodward)
* LUCENE-8843: Don't ignore exceptions that are thrown when trying to open a
file in IOUtils#fsync. (Jason Tedor via Adrien Grand)
* LUCENE-8835: FileSwitchDirectory now respects the file extension when listing directory
contents to ensure we don't expose pending deletes if both directories point to the same
underlying filesystem directory. (Simon Willnauer)
* LUCENE-8853: FileSwitchDirectory now applies best effort to place tmp files in the same
directory as the target files. (Simon Willnauer)
* LUCENE-8892: Add missing closing parentheses in MultiBoolFunction's description() (Florian Diebold, Munendra S N)
Improvements
* LUCENE-7840: Non-scoring BooleanQuery now removes SHOULD clauses before building the scorer supplier
as opposed to eliminating them during scoring construction. (Atri Sharma via Jim Ferenczi)
* LUCENE-8770: BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid
executing the second phase when scorers don't intersect. (Adrien Grand, Jim Ferenczi)
* LUCENE-8818: Fix smokeTestRelease.py encoding bug (janhoy)
* LUCENE-8845: Allow Intervals.prefix() and Intervals.wildcard() to specify
their maximum allowed expansions (Alan Woodward)
* LUCENE-8875: Introduce a Collector optimized for use cases when large
number of hits are requested (Atri Sharma)
* LUCENE-8848 LUCENE-7757 LUCENE-8492: The UnifiedHighlighter now detects that parts of the query are not understood by
it, and thus it should not make optimizations that result in no highlights or slow highlighting. This generally works
best for WEIGHT_MATCHES mode. Consequently queries produced by ComplexPhraseQueryParser and the surround QueryParser
will now highlight correctly. (David Smiley)
* LUCENE-8793: Luke enhanced UI for CustomAnalyzer: show detailed analysis steps. (Jun Ohtani via Tomoko Uchida)
* LUCENE-8855: Add Accountable to some Query implementations (ab, Adrien Grand)
Optimizations
* LUCENE-8796: Use exponential search instead of binary search in
IntArrayDocIdSet#advance method (Luca Cavanna via Adrien Grand)
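As an illustration of the technique named in LUCENE-8796, here is a minimal sketch of exponential (galloping) search over a sorted int array. It is not the IntArrayDocIdSet code: the class ExponentialSearchSketch and the method advanceIndex are hypothetical names, and the sketch assumes the array holds distinct values in increasing order and that from <= length. The search doubles its step from the current position until it overshoots the target, then runs a binary search over the last step only, so a short advance costs far less than a binary search over the whole remaining array.

```java
import java.util.Arrays;

final class ExponentialSearchSketch {
  /**
   * Returns the index of the first element >= target in docs[from, length),
   * or length if every remaining element is smaller than target.
   */
  static int advanceIndex(int[] docs, int length, int from, int target) {
    // Gallop: double the step until we reach the end or an element >= target.
    int bound = 1;
    while (from + bound < length && docs[from + bound] < target) {
      bound *= 2;
    }
    // The answer now lies in [from + bound / 2, from + bound], clipped to the array.
    int lo = from + bound / 2;
    int hi = Math.min(from + bound + 1, length);
    int idx = Arrays.binarySearch(docs, lo, hi, target);
    // A negative result encodes the insertion point as -(insertionPoint) - 1.
    return idx >= 0 ? idx : -1 - idx;
  }

  public static void main(String[] args) {
    int[] docs = {1, 3, 5, 7, 9, 11};
    System.out.println("advance to 8 lands on index " + advanceIndex(docs, docs.length, 0, 8));
  }
}
```

The cost is logarithmic in the distance between the current position and the target rather than in the size of the array, which suits iterators that usually advance by small amounts.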
* LUCENE-8865: Use incoming thread for execution if IndexSearcher has an executor.
Now caller threads execute at least one search on an index even if there is
an executor provided to minimize thread context switching. (Simon Willnauer)
* LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality.
It stores the distinct values once with the cardinality value reducing the
storage cost. (Ignacio Vera)
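The storage idea in LUCENE-8868 resembles run-length encoding, which the following sketch illustrates. It is a simplification under stated assumptions: real BKD leaves hold packed multi-dimensional byte[] values, whereas this example uses a sorted long[] for one leaf, and the class LowCardinalityLeafSketch with its encode/decode methods is hypothetical. Each distinct value is stored once, together with the number of consecutive points that share it.

```java
import java.util.ArrayList;
import java.util.List;

final class LowCardinalityLeafSketch {
  /** Encodes sorted values as {value, count} pairs, one pair per distinct value. */
  static long[][] encode(long[] sortedValues) {
    List<long[]> runs = new ArrayList<>();
    int i = 0;
    while (i < sortedValues.length) {
      int j = i;
      // Extend the run while the next value equals the current one.
      while (j < sortedValues.length && sortedValues[j] == sortedValues[i]) {
        j++;
      }
      runs.add(new long[] {sortedValues[i], j - i});
      i = j;
    }
    return runs.toArray(new long[0][]);
  }

  /** Expands {value, count} pairs back into the original sorted values. */
  static long[] decode(long[][] runs) {
    int total = 0;
    for (long[] run : runs) {
      total += (int) run[1];
    }
    long[] values = new long[total];
    int pos = 0;
    for (long[] run : runs) {
      for (int k = 0; k < run[1]; k++) {
        values[pos++] = run[0];
      }
    }
    return values;
  }

  public static void main(String[] args) {
    long[][] runs = encode(new long[] {5, 5, 5, 7, 7, 9});
    System.out.println("distinct values stored: " + runs.length);
  }
}
```

The saving grows with the number of repeats per distinct value; a leaf whose values are all different gains nothing from this layout, which is why the strategy targets low-cardinality leaves.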
* LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored
on leaves. (Ignacio Vera)
* LUCENE-8896: Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[])
for several queries. (Ignacio Vera)
* LUCENE-8901: Load frequencies lazily only when needed in BlockDocsEnum and
BlockImpactsEverythingEnum (Mayya Sharipova).
* LUCENE-8888: Optimize distribution of points with data dimensions in
BKD tree leaves. (Ignacio Vera)
* LUCENE-8311: Phrase queries now leverage impacts. (Adrien Grand)
Test Framework
* LUCENE-8825: CheckHits now displays the shard index in case of mismatch
between top hits. (Atri Sharma via Adrien Grand)
Other
* LUCENE-8847: Code Cleanup: Remove StringBuilder.append with concatenated
strings. (Koen De Groote via Uwe Schindler)
* LUCENE-8861: Script to find open Github PRs that need attention (janhoy)
* LUCENE-8852: ReleaseWizard tool for release managers (janhoy)
* LUCENE-8838: Remove support for Steiner points on Tessellator. (Ignacio Vera)
* LUCENE-8879: Improve BKDRadixSelector tests. (Ignacio Vera)
* LUCENE-8886: Fix TestMutablePointsReaderUtils tests. (Ignacio Vera)
======================= Lucene 8.1.1 =======================
(No Changes)
Improvements
* LUCENE-8781: FST lookup performance has been improved in many cases by
encoding Arcs using full-sized arrays with gaps. The new encoding is
enabled for postings in the default codec and for suggesters. (Mike Sokolov)
======================= Lucene 8.1.0 =======================
API Changes
* LUCENE-3041: A query introspection API has been added. Queries should
implement a visit() method, taking a QueryVisitor, and either pass the
visitor down to any child queries, or call a visitX() or consumeX() method
on it. All locations in the code that called Weight.extractTerms()
have been changed to use this API, and the extractTerms() method has
been deprecated. (Alan Woodward, Simon Willnauer, David Smiley, Luca
Cavanna)
* LUCENE-8735: Directory.getPendingDeletions is now abstract to ensure
subclasses override it. FilterDirectory now delegates the call, ensuring
correct default behaviour for subclasses. (Henning Andersen)
New Features
* LUCENE-2562: The well-known graphical user interface for inspecting Lucene
indexes "Luke" was added as a Lucene module. It can be started from the
binary distribution by calling the shell scripts in the module folder
or from the source checkout by using `ant -f lucene/luke/build.xml run`.
Luke provides a Swing-based user interface and can be used to open
Lucene or Solr (or Elasticsearch) indexes, inspect documents, check index
commits and segments, or test (custom) analyzers. It also has maintenance
functions to check index structures and force merge indexes for archival.
Luke was originally developed by Andrzej Bialecki, later maintained by
Dmitry Kan and finally rewritten by Tomoko Uchida to use the ASF licensing
compatible Swing framework (as shipped with JDKs).
(Tomoko Uchida, Uwe Schindler)
Bug fixes
* LUCENE-8736: LatLonShapePolygonQuery returns incorrect WITHIN results
with shared boundaries. Point in Polygon now correctly includes boundary
points. Box and Polygon relations with triangles have also been improved to
correctly include boundary points. (Nick Knize)
* LUCENE-8712: Polygon2D does not detect crossings through segment edges.
(Ignacio Vera)
* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
overflow bug that disabled cleaning of the cache (Russell A Brown)
* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
IndexSearcher (Alan Woodward, Yury Pakhomov)
* LUCENE-8719: FixedShingleFilter can miss shingles at the end of a token stream if
there are multiple paths with different lengths. (Alan Woodward)
* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
cheapest merges that allow the index to go down to `maxSegmentCount` segments
or less. (Armin Braun via Adrien Grand)
* LUCENE-8477: Interval disjunctions could miss valid hits if some of the
clauses of the disjunction are minimized away. We now rewrite intervals
if a source contains a disjunction and the internal gaps matter for
matching. This behaviour can be disabled if users are more interested
in speed rather than accuracy of matching. (Alan Woodward, Jim Ferenczi)
* LUCENE-8741: ValueSource.fromDoubleValuesSource() was casting to
Scorer instead of Scorable, leading to ClassCastExceptions (Markus Jelsma,
Alan Woodward)
* LUCENE-8754: Fix ConcurrentModificationException in SegmentInfo if
attributes are accessed in MergePolicy while the merge is running (Simon Willnauer)
* LUCENE-8765: Fixed validation of the number of added points in KD trees.
(Zhao Yang via Adrien Grand)
Improvements
* LUCENE-8673: Use radix partitioning when merging dimensional points instead
of sorting all dimensions before hand. (Ignacio Vera, Adrien Grand)
* LUCENE-8687: Optimise radix partitioning for points on heap. (Ignacio Vera)
* LUCENE-8699: Change HeapPointWriter to use a single byte array instead of a list
of byte arrays. In addition a new interface PointValue is added to abstract out
the different formats between offline and on-heap writers. (Ignacio Vera)
* LUCENE-8703: Build point writers in the BKD tree only when they are needed.
(Ignacio Vera)
* LUCENE-8652: SynonymQuery can now deboost the document frequency of each term when
blending the score of the synonym. (Jim Ferenczi)
* LUCENE-8631: The Korean user dictionary now picks the longest-matching word and discards
the other matches. (Yeongsu Kim via Jim Ferenczi)
* LUCENE-8732: ConstantScoreQuery can now early terminate the query if the minimum score is
greater than the constant score and total hits are not requested. (Jim Ferenczi)
* LUCENE-8750: Implements setMissingValue() on sort fields produced from
DoubleValuesSource and LongValuesSource (Mike Sokolov via Alan Woodward)
* LUCENE-8701: ToParentBlockJoinQuery now creates a child scorer that disallows skipping over
non-competitive documents if the score of a parent depends on the score of multiple
children (avg, max, min). Additionally the score mode `none` that assigns a constant score to
each parent can early terminate the collection of top scores. (Jim Ferenczi)
* LUCENE-8751: Weight#matches now uses the ScorerSupplier to build scorers with a lead cost of 1
(single document). (Jim Ferenczi)
* LUCENE-8752: Japanese new era name '令和' (Reiwa) is added to the dictionary used in
JapaneseTokenizer so that the analyzer handles the era name correctly.
Reiwa is set to replace the Heisei Era on May 1, 2019. (Tomoko Uchida)
* LUCENE-8671: Introduced reader attributes that allow per-IndexReader configuration
of codec internals. This enables per-reader configuration of whether FSTs are on- or off-heap on a
per-field basis (Simon Willnauer)
* LUCENE-8787: spatial-extras DateRangePrefixTree used to only parse ISO-8601 timestamps with 0 or 3
digits of milliseconds precision but now parses other lengths (although > 3 not used).
(Thomas Lemmé via David Smiley)
Changes in Runtime Behavior
* LUCENE-8671: Load FST off-heap also for ID-like fields if reader is not opened
from an IndexWriter. (Simon Willnauer)
* LUCENE-8730: WordDelimiterGraphFilter always emits its original token first. This
brings its behaviour into line with the deprecated WordDelimiterFilter, so that
the only difference in output between the two is in the position length
attribute. (Alan Woodward, Jim Ferenczi)
* LUCENE-7386: Disjunctions nested in disjunctions are now flattened. This might
trigger changes in the produced scores due to changes to the order in which
scores of sub clauses are summed up. (Adrien Grand)
* LUCENE-8756: MoreLikeThisQuery now respects custom term frequencies
(TermFrequencyAttribute) at search time (Olli Kuonanoja)
Other
* LUCENE-8680: Refactor EdgeTree#relateTriangle method. (Ignacio Vera)
* LUCENE-8685: Refactor LatLonShape tests. (Ignacio Vera)
* LUCENE-8713: Add Line2D tests. (Ignacio Vera)
* LUCENE-8729: Workaround: Disable accessibility doclints (Java 13+),
so compilation with recent JDK succeeds. (Uwe Schindler)
* LUCENE-8725: Make TermsQuery.SeekingTermSetTermsEnum a top level class and public (noble)
======================= Lucene 8.0.0 =======================
API Changes
* LUCENE-8662: Make TermsEnum.seekExact(BytesRef) abstract and delegate seekExact(BytesRef)
in FilterLeafReader.FilterTermsEnum. (Jeffery Yuan via Tomás Fernández Löbbe, Simon Willnauer)
* LUCENE-8469: Deprecated StringHelper.compare has been removed. (Dawid Weiss)
* LUCENE-8039: Introduce a "delta distance" method set to GeoDistance. This
allows distance calculations, especially for paths, to take into account an
"excursion" to include the specified point.
* LUCENE-8007: Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are
now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq()
and Terms.getSumTotalTermFreq() are now required: if frequencies are not
stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(),
respectively, because all freq() values equal 1. (Adrien Grand, Robert Muir)
* LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed (Alan
Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been removed (Alan Woodward)
* LUCENE-7996: Queries are now required to produce positive scores.
(Adrien Grand)
* LUCENE-8099: CustomScoreQuery, BoostedQuery and BoostingQuery have been
removed (Alan Woodward)
* LUCENE-8012: Explanation now takes Number rather than float (Alan Woodward,
Robert Muir)
* LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document
scoring factors. (Adrien Grand)
* LUCENE-8113: TermContext has been renamed to TermStates, and can now be
constructed lazily if term statistics are not required (Alan Woodward)
* LUCENE-8242: Deprecated method IndexSearcher#createNormalizedWeight() has
been removed (Alan Woodward)
* LUCENE-8267: Memory codecs removed from the codebase (MemoryPostings,
MemoryDocValues). (Dawid Weiss)
* LUCENE-8144: Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
(Nhat Nguyen via Adrien Grand)
* LUCENE-8356: StandardFilter and StandardFilterFactory have been removed
(Alan Woodward)
* LUCENE-8373: StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
(Alan Woodward)
* LUCENE-8388: Unused PostingsEnum#attributes() method has been removed
(Alan Woodward)
* LUCENE-8405: TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector
no longer have an option to compute the maximum score when sorting by field.
(Adrien Grand)
* LUCENE-8411: TopFieldCollector no longer takes a fillFields option, it now
always fills fields. (Adrien Grand)
* LUCENE-8412: TopFieldCollector no longer takes a trackDocScores option. Scores
need to be set on top hits via TopFieldCollector#populateScores instead.
(Adrien Grand)
* LUCENE-6228: A new Scorable abstract class has been added, containing only those
methods from Scorer that should be called from Collectors. LeafCollector.setScorer()
now takes a Scorable rather than a Scorer. (Alan Woodward, Adrien Grand)
* LUCENE-8475: Deprecated constants have been removed from RamUsageEstimator.
(Dimitrios Athanasiou)
* LUCENE-8483: Scorers may no longer take null as a Weight (Alan Woodward)
* LUCENE-8352: TokenStreamComponents is now final, and can take a Consumer<Reader>
in its constructor (Mark Harwood, Alan Woodward, Adrien Grand)
* LUCENE-8498: LowerCaseTokenizer has been removed, and CharTokenizer no longer
takes a normalizer function. (Alan Woodward)
* LUCENE-7875: Moved MultiFields static methods out of the class. getLiveDocs is now
in MultiBits which is now public. getMergedFieldInfos and getIndexedFields are now in
FieldInfos. getTerms is now in MultiTerms. getTermPositionsEnum and getTermDocsEnum
were collapsed and renamed to just getTermPostingsEnum and moved to MultiTerms.
(David Smiley)
* LUCENE-8513: MultiFields.getFields is now removed. Please avoid this class,
and Fields in general, when possible. (David Smiley)
* LUCENE-8497: MultiTermAwareComponent has been removed, and in its place
TokenFilterFactory and CharFilterFactory now expose type-safe normalize()
methods. This decouples normalization from tokenization entirely.
(Mayya Sharipova, Alan Woodward)
* LUCENE-8597: IntervalIterator now exposes a gaps() method that reports the
number of gaps between its component sub-intervals. This can be used in a
new filter available via Intervals.maxgaps(). (Alan Woodward)
* LUCENE-8609: Remove IndexWriter#numDocs() and IndexWriter#maxDoc() in favor
of IndexWriter#getDocStats(). (Simon Willnauer)
* LUCENE-8292: Make TermsEnum fully abstract. (Simon Willnauer)
Changes in Runtime Behavior
* LUCENE-8333: Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of
numDocs. (Robert Muir, Dawid Weiss).
* LUCENE-7837: Indices that were created before the previous major version
will now fail to open even if they have been merged with the previous major
version. (Adrien Grand)
* LUCENE-8020: Similarities are no longer passed terms that don't exist by
queries such as SpanOrQuery, so scoring formulas no longer require
divide-by-zero hacks. IndexSearcher.termStatistics/collectionStatistics return null
instead of returning bogus values for a non-existent term or field. (Robert Muir)
* LUCENE-7996: FunctionQuery and FunctionScoreQuery now return a score of 0
when the function produces a negative value. (Adrien Grand)
* LUCENE-8116: Similarities now score fields that omit norms as if the norm was
1. This might change score values on fields that omit norms. (Adrien Grand)
* LUCENE-8134: Index options are no longer automatically downgraded.
(Adrien Grand)
* LUCENE-8031: Length normalization correctly reflects omission of term frequencies.
(Robert Muir, Adrien Grand)
* LUCENE-7444: StandardAnalyzer no longer defaults to removing English stopwords
(Alan Woodward)
* LUCENE-8060: IndexSearcher's search and searchAfter methods now only compute
total hit counts accurately up to 1,000 in order to enable top-hits
optimizations such as block-max WAND (LUCENE-8135). (Adrien Grand)
* LUCENE-8505: IndexWriter#addIndices will now fail if the target index is sorted but
the candidate is not. (Jim Ferenczi)
* LUCENE-8535: Highlighter and FVH no longer support ToParent and ToChildBlockJoinQuery out of the
box. In order to highlight on Block-Join Queries a custom WeightedSpanTermExtractor / FieldQuery
should be used. (Simon Willnauer, Jim Ferenczi, Julie Tibshirani)
* LUCENE-8563: BM25 scores don't include the (k1+1) factor in their numerator
anymore. This doesn't affect ordering as this is a constant factor which is
the same for every document. (Luca Cavanna via Adrien Grand)
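The claim in LUCENE-8563 that ordering is unaffected can be checked with a small numeric sketch. It uses the textbook BM25 term-frequency component, not Lucene's BM25Similarity, which differs in details such as how document lengths are encoded; the class Bm25FactorSketch, its method names, and the sample statistics are assumptions made for this example. Both variants share the same denominator, so they differ only by the constant factor (k1 + 1).

```java
final class Bm25FactorSketch {
  /** Length normalization term shared by both variants. */
  static double norm(double k1, double b, double docLen, double avgDocLen) {
    return k1 * (1 - b + b * docLen / avgDocLen);
  }

  /** Textbook form: freq * (k1 + 1) / (freq + norm). */
  static double tfWithFactor(double freq, double k1, double b, double docLen, double avgDocLen) {
    return freq * (k1 + 1) / (freq + norm(k1, b, docLen, avgDocLen));
  }

  /** Form without the constant numerator factor: freq / (freq + norm). */
  static double tfWithoutFactor(double freq, double k1, double b, double docLen, double avgDocLen) {
    return freq / (freq + norm(k1, b, docLen, avgDocLen));
  }

  public static void main(String[] args) {
    double k1 = 1.2, b = 0.75, avgDocLen = 120;
    // Two hypothetical documents: {term frequency, document length}.
    double[][] docs = {{3, 100}, {1, 300}};
    for (double[] d : docs) {
      System.out.println(
          "with factor: " + tfWithFactor(d[0], k1, b, d[1], avgDocLen)
              + ", without: " + tfWithoutFactor(d[0], k1, b, d[1], avgDocLen));
    }
  }
}
```

Absolute score values do change, so any code that compares scores against fixed thresholds would need to account for the missing factor.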
* LUCENE-8509: WordDelimiterGraphFilter will no longer set the offsets of internal
tokens by default, preventing a number of bugs when the filter is chained with
tokenfilters that change the length of their tokens (Alan Woodward)
* LUCENE-8633: IntervalQuery scores do not use term weighting any more, the score
is instead calculated as a function of the sloppy frequency of the matching
intervals. (Alan Woodward, Jim Ferenczi)
* LUCENE-8635: FSTs can now remain off-heap, accessed via
IndexInput, and the default codec's term dictionary
(BlockTreeTermsReader) will now leave the FST for the terms index
off-heap for non-primary-key fields using MMapDirectory, reducing
heap usage for such fields. (Ankit Jain)
New Features
* LUCENE-8340: LongPoint#newDistanceFeatureQuery may be used to boost scores based on
how close a value of a long field is to a configurable origin. This is
typically useful to boost by recency. (Adrien Grand)
* LUCENE-8482: LatLonPoint#newDistanceFeatureQuery may be used to boost scores
based on the haversine distance of a LatLonPoint field to a provided point. This is
typically useful to boost by distance. (Ignacio Vera)
* LUCENE-8216: Added a new BM25FQuery in sandbox to blend statistics across several fields
using the BM25F formula. (Adrien Grand, Jim Ferenczi)
* LUCENE-8564: GraphTokenFilter is an abstract class useful for token filters that need
to read-ahead in the token stream and take into account graph structures. This
also changes FixedShingleFilter to extend GraphTokenFilter (Alan Woodward)
* LUCENE-8612: Intervals.extend() treats an interval as if it covered a wider
span than it actually does, allowing users to force minimum gaps between
intervals in a phrase. (Alan Woodward)
* LUCENE-8629: New interval functions: Intervals.before(), Intervals.after(),
Intervals.within() and Intervals.overlapping(). (Alan Woodward)
* LUCENE-8622: Adds a minimum-should-match interval function that produces intervals
spanning a subset of a set of sources. (Alan Woodward)
* LUCENE-8645: Intervals.fixField() allows you to report intervals from one field
as if they came from another. (Alan Woodward)
* LUCENE-8646: New interval functions: Intervals.prefix() and Intervals.wildcard()
(Alan Woodward)
* LUCENE-8655: Add a getter in FunctionScoreQuery class in order to access the
underlying DoubleValuesSource. (Gérald Quaire via Alan Woodward)
* LUCENE-8697: GraphTokenStreamFiniteStrings correctly handles side paths
containing gaps (Alan Woodward)
* LUCENE-8702: Simplify intervals returned from vararg Intervals factory methods
(Alan Woodward)
Improvements
* LUCENE-7997: Add BaseSimilarityTestCase to sanity check similarities.
SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues.
Add missing range checks for similarity parameters.
Improve BM25 and ClassicSimilarity's explanations. (Robert Muir)
* LUCENE-8011: Improved similarity explanations.
(Mayya Sharipova via Adrien Grand)
* LUCENE-4198: Codecs now have the ability to index score impacts.
(Adrien Grand)
* LUCENE-8135: Boolean queries now implement the block-max WAND algorithm in
order to speed up selection of top scored documents. (Adrien Grand)
* LUCENE-8279: CheckIndex now cross-checks terms with norms. (Adrien Grand)
* LUCENE-8660: TopDocsCollectors now return an accurate count (instead of a lower bound)
if the total hit count is equal to the provided threshold. (Adrien Grand, Jim Ferenczi)
Optimizations
* LUCENE-8040: Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms
(David Smiley, Robert Muir)
* LUCENE-4100: Disjunctions now support faster collection of top hits when the
total hit count is not required. (Stefan Pohl, Adrien Grand, Robert Muir)
* LUCENE-7993: Phrase queries are now faster if total hit counts are not
required. (Adrien Grand)
* LUCENE-8109: Boolean queries propagate information about the minimum
competitive score in order to make collection faster if there are disjunctions
or phrase queries as sub queries, which know how to leverage this information
to run faster. (Adrien Grand)
* LUCENE-8439: Disjunction max queries can skip blocks to select the top documents
if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
* LUCENE-8204: Boolean queries with a mix of required and optional clauses are
now faster if the total hit count is not required. (Jim Ferenczi, Adrien Grand)
* LUCENE-8448: Boolean queries now propagate the minimum score to their sub-scorers.
(Jim Ferenczi, Adrien Grand)
* LUCENE-8511: MultiFields.getIndexedFields is now optimized; does not call getMergedFieldInfos
(David Smiley)
* LUCENE-8507: TopFieldCollector can now update the minimum competitive score if the primary sort
is by relevancy and the total hit count is not required. (Jim Ferenczi)
* LUCENE-8464: ConstantScoreScorer now implements setMinCompetitiveScore in order
to early terminate the iterator if the minimum score is greater than the constant
score. (Christophe Bismuth via Jim Ferenczi)
* LUCENE-8607: MatchAllDocsQuery can shortcut when total hit count is not
required (Alan Woodward, Adrien Grand)
* LUCENE-8585: Index-time jump-tables for DocValues, for O(1) advance when retrieving doc values.
(Toke Eskildsen, Adrien Grand)
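
Several of the optimizations above rely on a minimum competitive score: once the
top k hits are known, any candidate whose best possible score cannot beat the
current k-th score can be skipped, provided the total hit count is not needed.
The following is a minimal sketch of that idea in plain Java. TopKSketch and its
parameters are hypothetical names for illustration, not Lucene's collector or
scorer classes, and the per-candidate score upper bounds are assumed to be given.

```java
import java.util.PriorityQueue;

/** Hypothetical illustration of skipping non-competitive candidates. */
final class TopKSketch {

  /**
   * Collects the k highest scores into topK (a min-heap) and returns how many
   * candidates had to be scored. maxScores[i] is an assumed upper bound on
   * scores[i].
   */
  static int collect(float[] maxScores, float[] scores, int k, PriorityQueue<Float> topK) {
    int scored = 0;
    float minCompetitive = 0f;
    for (int i = 0; i < scores.length; i++) {
      // Once k hits are held, a candidate whose upper bound does not exceed
      // the k-th best score can never enter the top k, so it is skipped.
      if (topK.size() == k && maxScores[i] <= minCompetitive) {
        continue;
      }
      scored++;
      float score = scores[i];
      if (topK.size() < k) {
        topK.add(score);
      } else if (score > topK.peek()) {
        topK.poll();
        topK.add(score);
      }
      if (topK.size() == k) {
        minCompetitive = topK.peek();
      }
    }
    return scored;
  }
}
```

The skipped candidates are never counted, which is why this shortcut only
applies when an exact total hit count is not required.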
======================= Lucene 7.7.2 =======================
Bug fixes
* LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to
IndexSearcher (Alan Woodward, Yury Pakhomov)
* LUCENE-8735: FilterDirectory.getPendingDeletions now forwards to the delegate
even though the method is not abstract in the super class. This prevents issues
where IndexWriter's best effort to carry generations forward fails because pending
deletions are swallowed by the FilterDirectory. (Henning Andersen, Simon Willnauer)
* LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the
cheapest merges that allow the index to go down to `maxSegmentCount` segments
or less. (Armin Braun via Adrien Grand)
* LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates.
Without this, assertion errors and potentially broken field attributes can occur in the IndexWriter when
IndexWriter#deleteAll is called while actively indexing. (Simon Willnauer)
* LUCENE-8720: NameIntCacheLRU (in the facets module) had an int
overflow bug that disabled cleaning of the cache (Russell A Brown)
* LUCENE-8809: Refresh and rollback concurrently can leave segment states unclosed (Nhat Nguyen)
======================= Lucene 7.7.1 =======================
(No Changes)
======================= Lucene 7.7.0 =======================
Changes in Runtime Behavior
* LUCENE-8527: StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0,
and provide Unicode UTS#51 v11.0 Emoji tokenization with the "<EMOJI>" token type.
Build
* LUCENE-8611: Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core
dependency. (Dawid Weiss)
* LUCENE-8537: ant test command fails under lucene/tools (Peter Somogyi)
Bug fixes
* LUCENE-8669: Fix LatLonShape WITHIN queries that fail with multiple search polygons
that share the dateline. (Nick Knize)
* LUCENE-8603: Fix the inversion of right ids for additional nouns in the Korean user dictionary.
(Yoo Jeongin via Jim Ferenczi)
* LUCENE-8624: int overflow in ByteBuffersDataOutput.size(). (Mulugeta Mammo,
Dawid Weiss)
* LUCENE-8625: int overflow in ByteBuffersDataInput.sliceBufferList. (Mulugeta Mammo,
Dawid Weiss)
* LUCENE-8639: Newly created threadstates while flushing / refreshing can cause duplicated
sequence IDs on IndexWriter. (Simon Willnauer)
* LUCENE-8649: LatLonShape's within and disjoint queries can return false positives with
indexed multi-shapes. (Ignacio Vera)
* LUCENE-8654: Polygon2D#relateTriangle returns the wrong answer if polygon is inside
the triangle. (Ignacio Vera)
* LUCENE-8650: ConcatenatingTokenStream did not correctly clear its state in reset(), and
was not propagating final position increments from its child streams correctly.
(Dan Meehl, Alan Woodward)
* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
by a big buffer (1024 chars). (Jim Ferenczi)
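
The two int overflow entries above (LUCENE-8624, LUCENE-8625) belong to a common
class of bug: a size is computed by multiplying two int values, and the product
wraps before it is widened to long. A minimal sketch of that pattern follows.
SizeOverflowSketch and its parameters are hypothetical and are not the actual
ByteBuffersDataOutput code.

```java
/** Hypothetical illustration of an int overflow in a size computation. */
final class SizeOverflowSketch {

  /** Buggy: the multiplication happens in 32-bit arithmetic and can wrap. */
  static long sizeBuggy(int fullBlocks, int blockSize, int tailBytes) {
    return fullBlocks * blockSize + tailBytes;
  }

  /** Fixed: widen one operand to long before multiplying. */
  static long sizeFixed(int fullBlocks, int blockSize, int tailBytes) {
    return (long) fullBlocks * blockSize + tailBytes;
  }
}
```

With three full blocks of 1 GB each, sizeBuggy returns a negative number while
sizeFixed returns the true byte count.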
New Features
* LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
points such as range queries or geo queries.
(Christophe Bismuth via Adrien Grand)
* LUCENE-8508: IndexWriter can now set the created version via
IndexWriterConfig#setIndexCreatedVersionMajor. This is an expert feature.
(Adrien Grand)
* LUCENE-8601: Attributes set in the IndexableFieldType for each field during indexing will
now be recorded into the corresponding FieldInfo's attributes, accessible at search
time (Murali Krishna P)
Improvements
* LUCENE-8463: TopFieldCollector can now early-terminate queries when sorting by SortField.DOC.
(Christophe Bismuth via Jim Ferenczi)
* LUCENE-8562: Speed up merging segments of points with data dimensions by only sorting on the indexed
dimensions. (Ignacio Vera)
* LUCENE-8529: TopSuggestDocsCollector will now use the completion key to tiebreak completion
suggestions with identical scores. (Jim Ferenczi)
* LUCENE-8575: SegmentInfos#toString now includes attributes and diagnostics.
(Namgyu Kim via Adrien Grand)
* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining diacritics and
detects script boundaries more accurately with Character#UnicodeScript#of.
(Christophe Bismuth, Jim Ferenczi)
* LUCENE-8581: Change LatLonShape encoding to use 4 bytes per dimension.
(Ignacio Vera, Nick Knize, Adrien Grand)
* LUCENE-8527: Upgrade JFlex dependency to 1.7.0; in StandardTokenizer and UAX29URLEmailTokenizer,
increase supported Unicode version from 6.3 to 9.0, and support Unicode UTS#51 v11.0 Emoji tokenization.
* LUCENE-8640: Date Range format validation (Lucky Sharma, David Smiley via Mikhail Khludnev)
Optimizations
* LUCENE-8552: FieldInfos.getMergedFieldInfos no longer does any merging if there is <= 1 segment.
(Christophe Bismuth via David Smiley)
* LUCENE-8590: BufferedUpdates now uses an optimized storage for buffering docvalues updates that
can save up to 80% of the heap used compared to the previous implementation and uses non-object
based data structures. (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grand)
* LUCENE-8598: Moving to the default accepted overhead ratio for packed ints in DocValuesFieldUpdates
yields an up to 4x performance improvement when applying doc values updates. (Simon Willnauer, Adrien Grand)
* LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates.
(Simon Willnauer, Adrien Grand)
* LUCENE-8600: Doc-value updates get applied faster by sorting with quicksort,
which needs to perform fewer swaps than an in-place mergesort.
(Adrien Grand)
* LUCENE-8623: Decrease I/O pressure when merging high dimensional points. (Ignacio Vera)
Test Framework
* LUCENE-8604: TestRuleLimitSysouts now has an optional "hard limit" of bytes that can be written
to stderr and stdout (anything beyond the hard limit is ignored). The default hard limit is 2 GB of
logs per test class. (Dawid Weiss)
Other
* LUCENE-8573: BKDWriter now uses FutureArrays#mismatch to compute shared prefixes.
(Christoph Büscher via Adrien Grand)
* LUCENE-8605: Separate bounding box spatial logic from query logic on LatLonShapeBoundingBoxQuery.
(Ignacio Vera)
* LUCENE-8609: Deprecated IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats(),
which returns consistent numDocs and maxDoc stats that are not subject to concurrent changes.
(Simon Willnauer, Nhat Nguyen)
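
For LUCENE-8573 above, the shared prefix of two byte arrays can be derived from
the index of their first mismatch. The sketch below uses the JDK's
java.util.Arrays.mismatch, assuming a Java 9 or later runtime.
SharedPrefixSketch is a hypothetical helper, not BKDWriter code.

```java
import java.util.Arrays;

/** Hypothetical helper: shared prefix length via the first mismatch index. */
final class SharedPrefixSketch {

  static int sharedPrefixLength(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    // Arrays.mismatch returns -1 when the compared ranges are identical.
    int firstMismatch = Arrays.mismatch(a, 0, len, b, 0, len);
    return firstMismatch == -1 ? len : firstMismatch;
  }
}
```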
======================= Lucene 7.6.0 =======================
Build
* LUCENE-8504: Upgrade forbiddenapis to version 2.6. (Uwe Schindler)
* LUCENE-8493: Stop publishing insecure .sha1 files with releases (janhoy)
Bug fixes
* LUCENE-8479: QueryBuilder#analyzeGraphPhrase now throws a TooManyClauses exception
if the number of expanded paths reaches the BooleanQuery#maxClause limit. (Jim Ferenczi)
* LUCENE-8522: throw InvalidShapeException when constructing a polygon and
all points are coplanar. (Ignacio Vera)
* LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per finite string
in the graph if the slop is greater than 0. Span queries cannot be used in this case because
they don't handle slop the same way as phrase queries. (Steve Rowe, Uwe Schindler, Jim Ferenczi)
* LUCENE-8524: Add the Hangul Letter Araea (interpunct) as a separator in Nori's tokenizer.
This change also removes empty terms and trims surface forms in Nori's Korean dictionary. (Trey Jones, Jim Ferenczi)
* LUCENE-8550: Fix filtering of coplanar points when creating linked list on
polygon tessellator. (Ignacio Vera)
* LUCENE-8549: Polygon tessellator throws an error if some parts of the shape
could not be processed. (Ignacio Vera)
* LUCENE-8540: Better handling of min/max values for Geo3d encoding. (Ignacio Vera)
* LUCENE-8534: Fix incorrect computation for triangles intersecting polygon edges in
shape tessellation. (Ignacio Vera)
* LUCENE-8559: Fix bug where polygon edges were skipped when checking for intersections.
(Ignacio Vera)
* LUCENE-8556: Use latitude and longitude instead of encoding values to check if triangle is ear
when using morton optimisation. (Ignacio Vera)
* LUCENE-8586: Intervals.or() could get stuck in an infinite loop on certain indexes
(Alan Woodward)
* LUCENE-8595: Fix interleaved DV update and reset. An interleaved update and reset of a value
for the same doc in the same updates package loses the update if the reset comes before
the update, and loses the reset if the update comes first. (Simon Willnauer, Adrien Grand)
* LUCENE-8592: Fix index sorting corruption due to numeric overflow. The merge of sorted segments
can produce an invalid sort if the sort field is an Integer/Long that uses reverse order and contains
values equal to Integer/Long#MIN_VALUE. These values are always sorted first during a merge
(instead of last because of the reverse order) due to this bug. Indices affected by the bug can be
detected by running the CheckIndex command on a distribution that contains the fix (7.6+).
(Jim Ferenczi, Adrien Grand, Mike McCandless, Simon Willnauer)
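
The overflow behind LUCENE-8592 can be reproduced in isolation. The sketch
below assumes the reverse order was obtained by negating values, which is one
common way to hit this bug and not necessarily how Lucene's merge code was
written. ReverseSortSketch and its comparators are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of reverse sorting and Integer.MIN_VALUE. */
final class ReverseSortSketch {

  // Buggy: -Integer.MIN_VALUE overflows back to Integer.MIN_VALUE, so that
  // value keeps the smallest key and sorts first instead of last.
  static final Comparator<Integer> BUGGY_REVERSE = (a, b) -> Integer.compare(-a, -b);

  // Safe: reverse the order by swapping the arguments, with no negation.
  static final Comparator<Integer> SAFE_REVERSE = (a, b) -> Integer.compare(b, a);

  /** Returns a sorted copy of values using the given comparator. */
  static Integer[] sorted(Comparator<Integer> cmp, Integer... values) {
    Integer[] copy = values.clone();
    Arrays.sort(copy, cmp);
    return copy;
  }
}
```

Sorting 5, Integer.MIN_VALUE and -3 with BUGGY_REVERSE puts Integer.MIN_VALUE
first, while SAFE_REVERSE puts it last, matching the symptom described in the
entry.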
New Features
* LUCENE-8496: Selective indexing - modify BKDReader/BKDWriter to allow users
to select fewer dimensions to be used for creating the index than
the total number of dimensions used for field encoding. i.e., dimensions 0 to N
may be used to determine how to split the inner nodes, and dimensions N+1 to D
are ignored and stored as data dimensions at the leaves. (Nick Knize)
* LUCENE-8538: Add a Simple WKT Shape Parser for creating Lucene Geometries (Polygon, Line,
Rectangle) from WKT format. (Nick Knize)
* LUCENE-8462: Adds an Arabic snowball stemmer based on
https://github.com/snowballstem/snowball/blob/master/algorithms/arabic.sbl
(Ryadh Dahimene via Jim Ferenczi)
* LUCENE-8554: Add new LatLonShapeLineQuery that queries indexed LatLonShape fields
by arbitrary lines. (Nick Knize)
* LUCENE-8555: Add dateline crossing support to LatLonShapeBoundingBoxQuery. (Nick Knize)
Improvements
* LUCENE-8521: Change LatLonShape encoding to 7 dimensions instead of 6; where the
first 4 are index dimensions defining the bounding box of the Triangle and the
remaining 3 data dimensions define the vertices of the triangle. (Nick Knize)
* LUCENE-8557: LeafReader.getFieldInfos is now documented as returning the same cached
instance, and this is tested. MemoryIndex's impl now pre-creates the FieldInfos instead of
re-calculating a new instance each time. (Tim Underwood, David Smiley)
* LUCENE-8558: Replace O(N) lookup with O(1) lookup in PerFieldMergeState#FilterFieldInfos.
(Kranthi via Simon Willnauer)
Other
* LUCENE-8523: Correct typo in JapaneseNumberFilterFactory javadocs (Ankush Jhalani
via Alan Woodward)
* LUCENE-8533: Fix Javadocs of DataInput#readVInt(): Negative numbers are
supported, but should be avoided. (Vladimir Dolzhenko via Uwe Schindler)
======================= Lucene 7.5.1 =======================
Bug Fixes
* LUCENE-8454: Fix incorrect vertex indexing and other computation errors in
shape tessellation that would sometimes cause an infinite loop. (Nick Knize)
======================= Lucene 7.5.0 =======================
API Changes
* LUCENE-8467: RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated
(Dawid Weiss)
* LUCENE-8356: StandardFilter is deprecated (Alan Woodward)
* LUCENE-8373: ENGLISH_STOP_WORD_SET on StandardAnalyzer is deprecated. Instead
use EnglishAnalyzer.ENGLISH_STOP_WORD_SET. The default constructor for
StopAnalyzer is also deprecated, and a stop word set should be explicitly
passed to the constructor. (Alan Woodward)
* LUCENE-8378: Add DocIdSetIterator.range static method to return an iterator
matching a range of docids (Mike McCandless)
* LUCENE-8379: Add experimental TermQuery.getTermStates method (Mike McCandless)
* LUCENE-8407: Add experimental SpanTermQuery.getTermStates method (David Smiley)
* LUCENE-8390: MatchesIteratorSupplier replaced by IOSupplier (Alan Woodward,
David Smiley)
* LUCENE-8397: Add DirectoryTaxonomyWriter.getCache (Mike McCandless)
* LUCENE-8387: Add experimental IndexSearcher.getSlices API to see which slices
IndexSearcher is searching concurrently when it's created with an ExecutorService
(Mike McCandless)
* LUCENE-8263: TieredMergePolicy's reclaimDeletesWeight has been replaced with a
new deletesPctAllowed setting to control how aggressively deletes should be
reclaimed. (Erick Erickson, Adrien Grand)
* LUCENE-7314: Graduate LatLonPoint and query classes to core (Nick Knize)
* LUCENE-8428: The way that oal.util.PriorityQueue creates sentinel objects has
been changed from a protected method to a java.util.function.Supplier as a
constructor argument. (Adrien Grand)
* LUCENE-8437: CheckIndex.Status.cantOpenSegments and missingSegmentVersion
have been removed as they were not computed correctly. (Adrien Grand)
* LUCENE-8286: The UnifiedHighlighter has a new HighlightFlag.WEIGHT_MATCHES flag that
will tell this highlighter to use the new MatchesIterator API as the underlying
approach to navigate matching hits for a query. This mode will highlight more
accurately than any other highlighter, and can mark up phrases as one span instead of
word-by-word. The UH's public internal APIs changed a bit in the process.
(David Smiley)
* LUCENE-8471: IndexWriter.getFlushingBytes() returns how many bytes are currently
being flushed to disk. (Alan Woodward)
* LUCENE-8422: Static helper functions for Matches and MatchesIterator implementations
have been moved from Matches to MatchesUtils (Alan Woodward)
* LUCENE-8343: Suggesters now require Long (versus long, previously) from weight() method
while indexing, and provide double (versus long, previously) scores at lookup time
(Alessandro Benedetti)
* LUCENE-8459: SearcherTaxonomyManager now has a constructor taking already opened
IndexReaders, allowing the caller to pass a FilterDirectoryReader, for example.
(Mike McCandless)
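
To illustrate the kind of iterator that LUCENE-8378 describes, here is a
minimal stand-alone sketch of iterating a docid range. RangeIteratorSketch is a
hypothetical class that only mimics the nextDoc/advance style of
DocIdSetIterator. It assumes the range is half-open, [minDoc, maxDoc), and that
exhaustion is signalled by Integer.MAX_VALUE.

```java
/** Hypothetical stand-alone iterator over the docid range [minDoc, maxDoc). */
final class RangeIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int minDoc;
  private final int maxDoc;
  private int doc = -1; // not positioned yet

  RangeIteratorSketch(int minDoc, int maxDoc) {
    if (minDoc < 0 || minDoc >= maxDoc) {
      throw new IllegalArgumentException("need 0 <= minDoc < maxDoc");
    }
    this.minDoc = minDoc;
    this.maxDoc = maxDoc;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    // Guard against overflowing doc + 1 once the iterator is exhausted.
    return doc == NO_MORE_DOCS ? doc : advance(doc + 1);
  }

  /** Moves to the first docid in the range that is at or beyond target. */
  int advance(int target) {
    if (target < minDoc) {
      doc = minDoc;
    } else if (target >= maxDoc) {
      doc = NO_MORE_DOCS;
    } else {
      doc = target;
    }
    return doc;
  }
}
```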
Bug Fixes
* LUCENE-8445: Tighten condition when two planes are identical to prevent constructing
bogus tiles when building GeoPolygons. (Ignacio Vera)
* LUCENE-8444: Prevent building functionally identical plane bounds when constructing
DualCrossingEdgeIterator. (Ignacio Vera)
* LUCENE-8380: UTF8TaxonomyWriterCache inconsistency. (Ruslan Torobaev, Dawid Weiss)
* LUCENE-8164: IndexWriter silently accepts broken payload. This has been fixed
via LUCENE-8165 since we are now checking for offset+length going out of bounds.
(Robert Muir, Nhat Nguyen, Simon Willnauer)
* LUCENE-8370: Reproducing
TestLucene{54,70}DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields()
failures (Erick Erickson)
* LUCENE-8376, LUCENE-8371: ConditionalTokenFilter.end() would not propagate correctly
if the last token in the stream was subsequently dropped; FixedShingleFilter did
not set position increment in end() (Alan Woodward)
* LUCENE-8395: WordDelimiterGraphFilter would incorrectly insert a hole into a
TokenStream if a token consisting entirely of delimiter characters was
encountered, but preserve_original was set. (Alan Woodward)
* LUCENE-8398: TieredMergePolicy.getMaxMergedSegmentMB has rounding error (Erick Erickson)
* LUCENE-8429: DaciukMihovAutomatonBuilder is no longer prone to stack
overflows by enforcing a maximum term length. (Adrien Grand)
* LUCENE-8441: IndexWriter now checks doc value type for index sort fields
and fails the document if they are not compatible. (Jim Ferenczi, Mike McCandless)
* LUCENE-8458: Adjust initialization condition of PendingSoftDeletes and ensure
it is initialized before accepting deletes (Simon Willnauer, Nhat Nguyen)
* LUCENE-8466: IndexWriter.deleteDocs(Query... query) incorrectly applies deletes on flush
if the index is sorted. (Adrien Grand, Jim Ferenczi, Vish Ramachandran)
* LUCENE-8502: Allow access to delegate in FilterCodecReader. FilterCodecReader didn't
allow access to its delegate like other filter readers. This adds a new #getDelegate method
to access the wrapped reader. (Simon Willnauer)
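
A rounding error like the one reported in LUCENE-8398 typically appears when a
byte count is converted to megabytes with an integer division somewhere in the
chain. The sketch below assumes that cause. SegmentSizeSketch is hypothetical
and is not the TieredMergePolicy source.

```java
/** Hypothetical illustration of truncation when converting bytes to MB. */
final class SegmentSizeSketch {

  /** Truncating: bytes / 1024 is a long division that drops the remainder. */
  static double toMbTruncating(long bytes) {
    return bytes / 1024 / 1024.0;
  }

  /** Exact: every division is carried out in floating point. */
  static double toMbExact(long bytes) {
    return bytes / 1024.0 / 1024.0;
  }
}
```

For 1536 bytes the truncating version reports 1/1024 MB, while the
floating-point version reports 1.5/1024 MB.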
Changes in Runtime Behavior
* LUCENE-7976: TieredMergePolicy now respects maxSegmentSizeMB by default when executing
findForcedMerges and findForcedDeletesMerges (Erick Erickson)
* LUCENE-8263: TieredMergePolicy now reclaims deleted documents more
aggressively by default ensuring that no more than ~1/3 of the index size is
used by deleted documents. (Adrien Grand)
* LUCENE-8503: Call #getDelegate instead of direct member access during unwrap.
Filter*Reader instances access the member or the delegate directly instead of
calling getDelegate(). In order to track access of the delegate these methods
should call #getDelegate() (Simon Willnauer)
Improvements
* LUCENE-8468: A ByteBuffer based Directory implementation. (Dawid Weiss)
* LUCENE-8447: Add DISJOINT and WITHIN support to LatLonShape queries. (Nick Knize)
* LUCENE-8440: Add support for indexing and searching Line and Point shapes using LatLonShape encoding (Nick Knize)
* LUCENE-8435: Add new LatLonShapePolygonQuery for querying indexed LatLonShape fields by arbitrary polygons (Nick Knize)
* LUCENE-8367: Make per-dimension drill down optional for each facet dimension (Mike McCandless)
* LUCENE-8396: Add Points Based Shape Indexing and Search that decomposes shapes
into a triangular mesh and indexes individual triangles as a 6 dimension point (Nick Knize)
* LUCENE-8345, GitHub PR #392: Remove instantiation of redundant wrapper classes for primitives;
add wrapper class constructors to forbiddenapis. (Michael Braun via Uwe Schindler)
* LUCENE-8415: Clean up Directory contracts and JavaDoc comments. (Dawid Weiss)
* LUCENE-8414: Make segmentInfos private in IndexWriter (Simon Willnauer, Nhat Nguyen)
* LUCENE-8446: The UnifiedHighlighter's DefaultPassageFormatter now treats overlapping matches in
the passage as merged (as if one larger match). (David Smiley)
* LUCENE-8460: Better argument validation in StoredField. (Namgyu Kim)
* LUCENE-8432: TopFieldComparator stops comparing documents if the index is
sorted, even if hits still need to be visited to compute the hit count.
(Nikolay Khitrin)
* LUCENE-8422: IntervalQuery now returns useful Matches (Alan Woodward)
* LUCENE-7862: Store the real bounds of the leaf cells in the BKD index when the
number of dimensions is bigger than 1. It improves performance when there is
correlation between the dimensions, for example ranges. (Ignacio Vera, Adrien Grand)
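
The merging of overlapping matches described for LUCENE-8446 above can be
pictured as a standard interval merge over [start, end) offsets. This is a
sketch under that assumption. MergeMatchesSketch is hypothetical, is not the
DefaultPassageFormatter code, and treats matches that merely touch as
separate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration: merge overlapping [start, end) match offsets. */
final class MergeMatchesSketch {

  static List<int[]> merge(int[][] matches) {
    int[][] sorted = matches.clone();
    Arrays.sort(sorted, Comparator.comparingInt((int[] m) -> m[0]));
    List<int[]> merged = new ArrayList<>();
    for (int[] m : sorted) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && m[0] < last[1]) {
        // Overlaps the previous match: extend it to cover both.
        last[1] = Math.max(last[1], m[1]);
      } else {
        // Copy so the caller's arrays are never modified.
        merged.add(new int[] {m[0], m[1]});
      }
    }
    return merged;
  }
}
```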
Build
* LUCENE-5143: Stop publishing KEYS file with each version, use topmost lucene/KEYS file only.
The buildAndPushRelease.py script validates that RM's PGP key is in the KEYS file.
Remove unused 'copy-to-stage' and '-dist-keys' targets from ant build. (janhoy)
Other
* LUCENE-8485: Update randomizedtesting to version 2.6.4. (Dawid Weiss)
* LUCENE-8366: Upgrade to ICU 62.1. Emoji handling now uses Unicode 11's
Extended_Pictographic property. (Robert Muir)
* LUCENE-8408: original Highlighter: Remove obsolete static AttributeFactory instance
in TokenStreamFromTermVector. (Michael Braun, David Smiley)
* LUCENE-8420: Upgrade OpenNLP to 1.9.0 so OpenNLP tool can read the new model format which 1.8.x
cannot read. 1.9.0 can read the old format. (Koji Sekiguchi)
* LUCENE-8453: Add documentation to analysis factories of Korean (Nori) analyzer
module. (Tomoko Uchida via Uwe Schindler)
* LUCENE-8455: Upgrade ECJ compiler to 4.6.1 in lucene/common-build.xml (Erick Erickson)
* LUCENE-8456: Upgrade Apache Commons Compress to v1.18 (Steve Rowe)
* LUCENE-765: Improved org.apache.lucene.index javadocs. (Mike Sokolov)
* LUCENE-8476: Remove redundant nullity check and switch to optimized List.sort in the
Korean's user dictionary. (Namgyu Kim)
======================= Lucene 7.4.1 =======================
Bug Fixes
* LUCENE-8365: Fix ArrayIndexOutOfBoundsException in UnifiedHighlighter. This fixes
an "off by one" error in the UnifiedHighlighter's code that is only triggered when
two nested SpanNearQueries contain the same term. (Marc-Andre Morissette via Simon Willnauer)
* LUCENE-8381: Fix IndexWriter incorrectly interpreting hard-deletes as soft-deletes
while wrapping reader for merges. (Simon Willnauer, Nhat Nguyen)
* LUCENE-8384: Fix missing advance docValues generation while handling docValues
update in PendingSoftDeletes. (Simon Willnauer, Nhat Nguyen)
* LUCENE-8472: Always rewrite the soft-deletes merge retention query. (Adrien Grand, Nhat Nguyen)
======================= Lucene 7.4.0 =======================
Upgrading
* LUCENE-8344: If you are using the AnalyzingSuggester or FuzzySuggester subclass, and if you
explicitly use the preservePositionIncrements=false setting (not the default), then you ought
to rebuild your suggester index. If you don't, queries or indexed data with trailing position
gaps (e.g. stop words) may not work correctly. (David Smiley, Jim Ferenczi)
API Changes
* LUCENE-8242: IndexSearcher.createNormalizedWeight() has been deprecated.
Instead use IndexSearcher.createWeight(), rewriting the query first.
(Alan Woodward)
* LUCENE-8248: MergePolicyWrapper is renamed to FilterMergePolicy and now
also overrides getMaxCFSSegmentSizeMB (Mike Sokolov via Mike McCandless)
* LUCENE-8303: LiveDocsFormat is now only responsible for (de)serialization of
live docs. (Adrien Grand)
Changes in Runtime Behavior
* LUCENE-8309: Live docs are no longer backed by a FixedBitSet. (Adrien Grand)
* LUCENE-8330: Detach IndexWriter from MergePolicy. Instead of requiring IndexWriter
as a hard dependency, MergePolicy now expects a MergeContext, which
IndexWriter implements. (Simon Willnauer, Robert Muir, Dawid Weiss, Mike McCandless)
New Features
* LUCENE-8200: Allow doc-values to be updated atomically together
with a document. Doc-values updates can now be used as a soft-delete
mechanism, allowing several versions of a document or already
deleted documents to be kept around for later reuse. See "IW.softUpdateDocument(...)"
for reference. (Simon Willnauer)
* LUCENE-8197: A new FeatureField makes it easy and efficient to integrate
static relevance signals into the final score. (Adrien Grand, Robert Muir)
* LUCENE-8202: Add a FixedShingleFilter (Alan Woodward, Adrien Grand, Jim
Ferenczi)
* LUCENE-8125: ICUTokenizer support for emoji/emoji sequence tokens. (Robert Muir)
* LUCENE-8196, LUCENE-8300: A new IntervalQuery in the sandbox allows efficient proximity
searches based on minimum-interval semantics. (Alan Woodward, Adrien Grand,
Jim Ferenczi, Simon Willnauer, Matt Weber)
* LUCENE-8233: Add support for soft deletes to IndexWriter delete accounting.
Soft deletes are accounted for inside the index writer and therefore also
by merge policies. A SoftDeletesRetentionMergePolicy is added that allows
soft-deleted documents to be selectively carried over across merges for retention
policies. (Simon Willnauer, Mike McCandless, Robert Muir)
* LUCENE-8237: Add a SoftDeletesDirectoryReaderWrapper that allows soft deletes
to be respected if the reader is opened from a directory. (Simon Willnauer,
Mike McCandless, Uwe Schindler, Adrien Grand)
* LUCENE-8229, LUCENE-8270: Add a method Weight.matches(LeafReaderContext, doc)
that returns an iterator over matching positions for a given query and document.
This allows exact hit extraction and will enable implementation of accurate
highlighters. (Alan Woodward, Adrien Grand, David Smiley)
* LUCENE-8249: Implement Matches API for phrase queries (Alan Woodward, Adrien
Grand)
* LUCENE-8246: Allow customizing the number of deletes a merge claims. This
helps merge policies in the soft-delete case to correctly implement retention
policies without triggering unnecessary merges. (Simon Willnauer, Mike McCandless)
* LUCENE-8231: A new analysis module (nori) similar to Kuromoji
but to handle Korean using mecab-ko-dic and morphological analysis.
(Robert Muir, Jim Ferenczi)
* LUCENE-8265: WordDelimiter/GraphFilter now have an option to skip tokens
marked with KeywordAttribute (Mike Sokolov via Mike McCandless)
* LUCENE-8297: Add IW#tryUpdateDocValues(Reader, int, Fields...). IndexWriter can
update doc values for a specific term but this might affect all documents
containing the term. With tryUpdateDocValues users can update doc-values
fields for individual documents. This allows, for instance, soft-deleting
individual documents. (Simon Willnauer)
* LUCENE-8298: Allow DocValues updates to reset a value. Passing a DV field with a null
value to IW#updateDocValues or IW#tryUpdateDocValues will now remove the value from the
provided document. This allows undeleting a soft-deleted document unless it's been claimed
by a merge. (Simon Willnauer)
* LUCENE-8273: ConditionalTokenFilter allows analysis chains to skip particular token
filters based on the attributes of the current token. This generalises the keyword
token logic currently used for stemmers and WDF. It is integrated into
CustomAnalyzer by using the `when` and `whenTerm` builder methods, and a new
ProtectedTermFilter is added as an example. (Alan Woodward, Robert Muir,
David Smiley, Steve Rowe, Mike Sokolov)
* LUCENE-8310: Ensure IndexFileDeleter accounts for pending deletes. Today we fail
creating the IndexWriter when the directory has a pending delete. Yet, this
is mainly done to prevent writing still existing files more than once.
IndexFileDeleter already accounts for that for existing files which we can
now use to also take pending deletes into account which ensures that all file
generations per segment always go forward. (Simon Willnauer)
* LUCENE-7960: Add preserveOriginal option to the NGram and EdgeNGram filters.
(Ingomar Wesp, Shawn Heisey via Robert Muir)
* LUCENE-8335: Enforce soft-deletes field up-front. Soft deletes field must be marked
as such once it's introduced and can't be changed after the fact.
(Nhat Nguyen via Simon Willnauer)
* LUCENE-8332: New ConcatenateGraphFilter for concatenating all tokens into one (or more
in the event of a graph input). This is useful for fast analyzed exact-match lookup,
suggesters, and as a component of a named entity recognition system. This was excised
out of CompletionTokenStream in the NRT doc suggester. (David Smiley, Jim Ferenczi)
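
The preserveOriginal option from LUCENE-7960 above can be sketched for the edge
n-gram case as follows. This assumes the original term is kept only when it is
shorter than minGram or longer than maxGram, and that lengths are counted in
Java chars. EdgeNGramSketch is hypothetical, and the order in which the real
filters emit tokens may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of edge n-grams with a preserveOriginal flag. */
final class EdgeNGramSketch {

  static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean preserveOriginal) {
    List<String> out = new ArrayList<>();
    int len = term.length();
    // Emit prefixes of length minGram..maxGram, capped at the term length.
    for (int n = minGram; n <= Math.min(maxGram, len); n++) {
      out.add(term.substring(0, n));
    }
    // Keep the original only when no emitted gram already equals it.
    if (preserveOriginal && (len < minGram || len > maxGram)) {
      out.add(term);
    }
    return out;
  }
}
```

With minGram=2 and maxGram=3, the term "lucene" yields "lu", "luc" and, when
preserveOriginal is set, "lucene" as well.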
Bug Fixes
* LUCENE-8221: MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger
indexes.
* LUCENE-8266: Detect bogus tiles when creating a standard polygon and
throw a TileException. (Ignacio Vera)
* LUCENE-8234: Fixed bug in how spatial relationship is computed for
GeoStandardCircle when it covers the whole world. (Ignacio Vera)
* LUCENE-8236: Filter duplicated points when creating GeoPath shapes to
avoid creation of bogus planes. (Ignacio Vera)
* LUCENE-8243: IndexWriter.addIndexes(Directory[]) did not properly preserve
index file names for updated doc values fields (Simon Willnauer,
Michael McCandless, Nhat Nguyen)
* LUCENE-8275: Push up #checkPendingDeletes to Directory to ensure IW fails if
the directory has pending deletes files even if the directory is filtered or
a FileSwitchDirectory (Simon Willnauer, Robert Muir)
* LUCENE-8244: Do not leak open file descriptors in SearcherTaxonomyManager's
refresh on exception (Mike McCandless)
* LUCENE-8305: ComplexPhraseQuery.rewrite now handles an embedded MultiTermQuery
that rewrites to a MatchNoDocsQuery instead of throwing an exception.
(Bjarke Mortensen, Andy Tran via David Smiley)
* LUCENE-8287: Ensure that empty regex completion queries always return no results.
(Julie Tibshirani via Jim Ferenczi)
* LUCENE-8317: Prevent concurrent deletes from being applied during full flush.
Future deletes could potentially be exposed to flushes/commits/refreshes if the
amount of RAM used by deletes is greater than half of the IW RAM buffer. (Simon Willnauer)
* LUCENE-8320: Fix WindowsFS to correctly account for rename and hardlinks.
(Simon Willnauer, Nhat Nguyen)
* LUCENE-8328: Ensure ReadersAndUpdates consistently executes under lock.
(Nhat Nguyen via Simon Willnauer)
* LUCENE-8325: Fixed the smartcn tokenizer to not split UTF-16 surrogate pairs.
(chengpohi via Jim Ferenczi)
* LUCENE-8186: LowerCaseTokenizerFactory now lowercases text in multi-term
queries. (Tim Allison via Adrien Grand)
* LUCENE-8278: Some end-of-input no-scheme domain-only URL tokens are typed as
<ALPHANUM> rather than <URL>. (Junte Zhang, Steve Rowe)
* LUCENE-8355: Prevent IW from opening an already dropped segment while DV updates
are written. (Nhat Nguyen via Simon Willnauer)
* LUCENE-8344: TokenStreamToAutomaton (used by some suggesters) was not ignoring a trailing
position increment when the preservePositionIncrement setting is false.
(David Smiley, Jim Ferenczi)
* LUCENE-8357: FunctionScoreQuery.boostByQuery() and boostByValue() were
producing truncated Explanations (Markus Jelsma, Alan Woodward)
* LUCENE-8360: NGramTokenFilter and EdgeNGramTokenFilter did not correctly
set position increments in end() (Alan Woodward)
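
The surrogate pair issue fixed by LUCENE-8325 above arises whenever text is
split one Java char at a time, because characters outside the Basic
Multilingual Plane occupy two chars. The sketch below shows a split that
advances by code point instead. CodePointSplitSketch is hypothetical and is
not the smartcn tokenizer code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: split text without breaking surrogate pairs. */
final class CodePointSplitSketch {

  static List<String> splitByCodePoint(String s) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < s.length(); ) {
      int codePoint = s.codePointAt(i);
      // 1 char for BMP characters, 2 chars for a surrogate pair.
      int width = Character.charCount(codePoint);
      out.add(s.substring(i, i + width));
      i += width;
    }
    return out;
  }
}
```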
Other
* LUCENE-8301: Update randomizedtesting to 2.6.0. (Dawid Weiss)
* LUCENE-8299: Geo3D wrapper uses new polygon method factory that gives better
support for polygons with many points (>100). (Ignacio Vera)
* LUCENE-8261: InterpolatedProperties.interpolate and recursive property
references. (Steve Rowe, Dawid Weiss)
* LUCENE-8228: removed obsolete IndexDeletionPolicy clone() requirements from
the javadoc. (Dawid Weiss)
* LUCENE-8219: Use a realistic estimate of the number of nodes and links in
LevensteinAutomaton.java, to save reallocation of arrays.
(Christian Ziech)
* LUCENE-8214: Improve selection of testPoint for GeoComplexPolygon.
(Ignacio Vera)
* SOLR-10912: Add automatic patch validation. (Mano Kovacs, Steve Rowe)
* LUCENE-8122, LUCENE-8175: Upgrade analysis/icu to ICU 61.1.
(Robert Muir, Adrien Grand, Uwe Schindler)
* LUCENE-8291: Remove QueryTemplateManager utility class from XML queryparser.
This class is just a general XML transforming tool (using property files and
XSLT) and has nothing to do with query parsing. It can easily be implemented
using more sophisticated libraries or using XSL transformers from the JDK.
This change also removes the Lucene demo webapp to prevent XSS issues in
untested/unmaintained code. (Uwe Schindler)
Build
* LUCENE-7935: Publish .sha512 hash files with the release artifacts and stop
publishing .md5 hashes since the algorithm is broken (janhoy)
* LUCENE-8230: Upgrade forbiddenapis to version 2.5. (Uwe Schindler)
Documentation
* LUCENE-8238: Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
(Mike Sokolov via Mike McCandless)
======================= Lucene 7.3.1 =======================
Bug fixes
* LUCENE-8254: LRUQueryCache could cause IndexReader to hang on close, when
shared with another reader with no CacheHelper (Alan Woodward, Simon Willnauer,
Adrien Grand)
======================= Lucene 7.3.0 =======================
API Changes
* LUCENE-8051: LevensteinDistance renamed to LevenshteinDistance.
(Pulak Ghosh via Adrien Grand)
* LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery.
Users should instead use FunctionScoreQuery, possibly combined with
a lucene expression (Alan Woodward)
* LUCENE-8104: Remove facets module compile-time dependency on queries
(Alan Woodward)
* LUCENE-8145: UnifiedHighlighter now uses a unitary OffsetsEnum rather
than a list of enums (Alan Woodward, David Smiley, Jim Ferenczi, Timothy
Rodriguez)
New Features
* LUCENE-2899: Add new module analysis/opennlp, with analysis components
to perform tokenization, part-of-speech tagging, lemmatization and phrase
chunking by invoking the corresponding OpenNLP tools. Named entity
recognition is also provided as a Solr update request processor.
(Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau,
Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)
* LUCENE-8126: Add new spatial prefix tree (SPT) based on Google S2 geometry.
It can only be used currently with Geo3D spatial context and it provides
improvements on indexing time for non-points shapes and on query performance.
(Ignacio Vera, David Smiley).
Improvements
* LUCENE-8081: Allow IndexWriter to opt out of flushing on indexing threads.
Index/Update threads try to help out flushing pending document buffers to
disk. This change adds an expert setting to opt out of this behavior unless
flushing is falling behind. (Simon Willnauer)
* LUCENE-8086: spatial-extras Geo3dFactory: Use GeoExactCircle with
configurable precision for non-spherical planet models.
(Ignacio Vera via David Smiley)
* LUCENE-8093: TrimFilterFactory implements MultiTermAwareComponent (Alan Woodward)
* LUCENE-8094: TermInSetQuery.toString now returns "field:(A B C)" (Mike McCandless)
* LUCENE-8121: UnifiedHighlighter passage relevancy is improved for terms that are
position sensitive (e.g. part of a phrase) by having an accurate freq.
(David Smiley)
* LUCENE-8129: A Unicode set filter can now be specified when using ICUFoldingFilter.
(Ere Maijala)
* LUCENE-7966: Build Multi-Release JARs to enable usage of optimized intrinsic methods
from Java 9 for index bounds checking and array comparison/mismatch. This change
introduces Java 8 replacements for those Java 9 methods and patches the compiled
classes to use the optimized variants through the MR-JAR mechanism.
(Uwe Schindler, Robert Muir, Adrien Grand, Mike McCandless)
* LUCENE-8127: Speed up rewriteNoScoring when there are no MUST clauses.
(Michael Braun via Adrien Grand)
* LUCENE-8152: Improve consumption of doc-value iterators. (Horatiu Lazu via
Adrien Grand)
* LUCENE-8033: FieldInfos now always use a dense encoding. (Mayya Sharipova
via Adrien Grand)
* LUCENE-8190: Specialized cell interface to allow any spatial prefix tree to
benefit from the setting setPruneLeafyBranches on RecursivePrefixTreeStrategy.
(Ignacio Vera)
Bug Fixes
* LUCENE-8077: Fixed bug in how CheckIndex verifies doc-value iterators.
(Xiaoshan Sun via Adrien Grand)
* SOLR-11758: Fixed FloatDocValues.boolVal to correctly return true for all values != 0.0F
(Munendra S N via hossman)
* LUCENE-8121: The UnifiedHighlighter would highlight some terms within some nested
SpanNearQueries at positions where it should not have. It's fixed in the UH by
switching to the SpanCollector API. The original Highlighter still has this
problem (LUCENE-2287, LUCENE-5455, LUCENE-6796). Some public but internal parts of
the UH were refactored. (David Smiley, Steve Davids)
* LUCENE-8120: Fix LatLonBoundingBox's toString() method (Martijn van Groningen, Adrien Grand)
* LUCENE-8130: Fix NullPointerException from TermStates.toString() (Mike McCandless)
* LUCENE-8124: Fixed HyphenationCompoundWordTokenFilter to correctly handle
hyphenation patterns with indicator >= 7. (Holger Bruch via Adrien Grand)
* LUCENE-8163: BaseDirectoryTestCase could produce random filenames that fail
on Windows (Alan Woodward)
* LUCENE-8174: Fixed {Float,Double,Int,Long}Range.toString(). (Oliver Kaleske
via Adrien Grand)
* LUCENE-8182: Fixed BoostingQuery to apply the context boost instead of the parent query
boost (Jim Ferenczi)
* LUCENE-8188: Fixed bugs in OpenNLPOpsFactory that were causing InputStreams fetched from the
ResourceLoader to be leaked (hossman)
Other
* LUCENE-8111: IndexOrDocValuesQuery Javadoc references outdated method name.
(Kai Chan via Adrien Grand)
* LUCENE-8106: Add script (reproduceJenkinsFailures.py) to attempt to reproduce
failing tests from a Jenkins log. (Steve Rowe)
* LUCENE-8075: Removed unnecessary null check in IntersectTermsEnum.
(Pulak Ghosh via Adrien Grand)
* LUCENE-8156: Require users to not have ASM on the Ant classpath during build.
This is required by LUCENE-7966. (Adrien Grand, Uwe Schindler)
* LUCENE-8161: spatial-extras: the Spatial4j dependency has been updated from 0.6 to 0.7,
which is drop-in compatible (Lucene doesn't expressly use any of the few API differences).
Spatial4j 0.7 is compatible with JTS 1.15.0 and not any prior version. JTS 1.15.0 is
dual-licensed to include BSD; prior versions were LGPL. (David Smiley)
* LUCENE-8155: Add back support in smoke tester to run against later Java versions.
(Uwe Schindler)
* LUCENE-8169: Migrated build to use OpenClover 4.2.1 for checking code coverage.
(Uwe Schindler)
* LUCENE-8170: Improve OpenClover reports (separate test from production code);
enable coverage reports inside test-frameworks. (Uwe Schindler)
Build
* LUCENE-8168: Moved Groovy scripts in build files to separate files.
Update Groovy to 2.4.13. (Uwe Schindler)
* LUCENE-8176: HttpReplicatorTest awaits more than a minute for stopping Jetty threads
(Mikhail Khludnev)
======================= Lucene 7.2.1 =======================
Bug Fixes
* LUCENE-8117: Fix advanceExact on SortedNumericDocValues produced by Lucene54DocValues. (Jim Ferenczi).
======================= Lucene 7.2.0 =======================
API Changes
* LUCENE-8017, LUCENE-8042: Weight, DoubleValuesSource and related objects
now implement a SegmentCacheable interface, with a single method
isCacheable(LeafReaderContext) determining whether or not the object may
be cached against a LeafReader. (Alan Woodward, Robert Muir)
* LUCENE-8038: Payload factors for scoring in PayloadScoreQuery are now
calculated by a PayloadDecoder, instead of delegating to the Similarity.
(Alan Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been deprecated. (Alan Woodward)
* LUCENE-6278: Scorer.freq() has been removed (Alan Woodward)
* LUCENE-7736: DoubleValuesSource and LongValuesSource now expose a
rewrite(IndexSearcher) function. (Alan Woodward)
* LUCENE-7998: DoubleValuesSource.fromQuery() allows you to use the scores
from a Query as a DoubleValuesSource. (Alan Woodward)
* LUCENE-8049: IndexWriter.getMergingSegments()'s return type was changed from
Collection to Set to more accurately reflect its nature. (David Smiley)
* LUCENE-8059: TopFieldDocCollector can now early terminate collection when
the sort order is compatible with the index order. As a consequence,
EarlyTerminatingSortingCollector is now deprecated. (Adrien Grand)
New Features
* LUCENE-8061: Add convenience factory methods to create BBoxes and XYZSolids
directly from bounds objects.
* LUCENE-7736: IndexReaderFunctions expose various IndexReader statistics as
DoubleValuesSources. (Alan Woodward)
* LUCENE-8068: Allow IndexWriter to write a single DWPT to disk. Adds a
flushNextBuffer method to IndexWriter that allows the caller to
synchronously move the next pending or the biggest non-pending index buffer to
disk. This enables flushing a selected buffer to disk without hijacking an
indexing thread. This is useful, for instance, if more than one IW (shards) must
be maintained in a single JVM / system. (Simon Willnauer)
Bug Fixes
* LUCENE-8076: Normalize Vincenty distance calculation for planet models that aren't normalized.
(Ignacio Vera)
* LUCENE-8057: Exact circle bounds computation was incorrect.
(Ignacio Vera)
* LUCENE-8056: Exact circle segment bounding suffered from precision errors.
(Karl Wright)
* LUCENE-8054: Fix the exact circle case where relationships fail when the
planet model has c <= ab, because the planes are constructed incorrectly.
(Ignacio Vera)
* LUCENE-7991: KNearestNeighborDocumentClassifier.knnSearch no longer applies
a previous boosted field's factor to subsequent unboosted fields.
(Christine Poerschke)
* LUCENE-7999: Switch from int to long to track the name for the next
segment to write, so that very long lived indices with very frequent
refreshes or commits, and high indexing thread counts, do not
overflow an int (Mykhailo Demianenko via Mike McCandless)
* LUCENE-8025: Use sumTotalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
that omit term frequency information, as it is equivalent in that case.
Previously bogus numbers were used, and many similarities would
completely degrade. (Robert Muir, Adrien Grand)
* LUCENE-8045: ParallelLeafReader did not correctly report FieldInfo.dvGen
(Alan Woodward)
* LUCENE-8034: Use subtraction instead of addition to sidestep int
overflow in SpanNotQuery. (Hari Menon via Mike McCandless)
* LUCENE-8078: The query cache should not cache instances of
MatchNoDocsQuery. (Jon Harper via Adrien Grand)
* LUCENE-8048: Filesystems do not guarantee the order of directory updates
(Nikolay Martynov, Simon Willnauer, Erick Erickson)
Optimizations
* LUCENE-8018: Smaller FieldInfos memory footprint by not retaining unnecessary
references to TreeMap entries. (Julian Vassev via Adrien Grand)
* LUCENE-7994: Use int/int scatter map to gather facet counts when the
number of hits is small relative to the number of unique facet labels
(Dawid Weiss, Robert Muir, Mike McCandless)
* LUCENE-8062: GlobalOrdinalsQuery is no longer eligible for caching. (Jim Ferenczi)
* LUCENE-8058: Large instances of TermInSetQuery are no longer eligible for
caching as they could break memory accounting of the query cache.
(Adrien Grand)
* LUCENE-8055: MemoryIndex.MemoryDocValuesIterator returns 2 documents
instead of 1. (Simon Willnauer)
* LUCENE-8043: Fix document accounting in IndexWriter to prevent writing too many
documents. Once this happens, Lucene refuses to open the index and throws a
CorruptIndexException. (Simon Willnauer, Yonik Seeley, Mike McCandless)
Tests
* LUCENE-8035: Run tests with JDK-specific options: --illegal-access=deny
on Java 9+. (Uwe Schindler)
Build
* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)
======================= Lucene 7.1.0 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
New Features
* LUCENE-7970: Add a shape to Geo3D that consists of multiple planes that
approximate a true circle, rather than an ellipse, for non-spherical planet models.
(Karl Wright, Ignacio Vera)
* LUCENE-7955: Add support for the concept of "nearest distance" to Geo3D's
GeoPath abstraction, which is the distance along the path to the point that is
closest to the provided point. (Karl Wright)
* LUCENE-7906: Add spatial relationships between all currently-defined Geo shapes.
(Ignacio Vera)
* LUCENE-7955: Add support for zero-width paths. (Karl Wright)
* LUCENE-7936: Add serialization and deserialization support to Geo3D. (Karl Wright,
Ignacio Vera)
* LUCENE-7942: Distance computations now have the ability to accurately aggregate
distances, rather than just doing sums. (Karl Wright)
* LUCENE-7934: Add a planet model interface. (Karl Wright)
* LUCENE-7918: Revamp the API for composites so that it's generic and can be used
for many kinds of shapes. (Ignacio Vera)
* LUCENE-7621: Add CoveringQuery, a query whose required number of matching
clauses can be defined per document. (Adrien Grand)
* LUCENE-7927: Add LongValueFacetCounts, to compute facet counts for individual
numeric values (Mike McCandless)
* LUCENE-7940: Add BengaliAnalyzer. (Md. Abdulla-Al-Sun via Robert Muir)
* LUCENE-7392: Add point based LatLonBoundingBox as new RangeField Type.
(Nick Knize)
* LUCENE-7951: Spatial-extras has much better Geo3d support by implementing Spatial4j
abstractions: SpatialContextFactory, ShapeFactory, BinaryCodec, DistanceCalculator.
(Ignacio Vera, David Smiley)
* LUCENE-7973: Update dictionary version for Ukrainian analyzer to 3.9.0 (Andriy
Rysin via Dawid Weiss)
* LUCENE-7974: Add FloatPointNearestNeighbor, an N-dimensional FloatPoint
K-nearest-neighbor search implementation. (Steve Rowe)
* LUCENE-7975: Change the default taxonomy facets cache to a faster
byte[] (UTF-8) based cache. (Mike McCandless)
* LUCENE-7972: DirectoryTaxonomyReader, in Lucene's facet module, now
implements Accountable, so you can more easily track how much heap
it's using. (Mike McCandless)
* LUCENE-7982: A new NormsFieldExistsQuery matches documents that have
norms in a specified field (Colin Goodheart-Smithe via Mike McCandless)
Optimizations
* LUCENE-7905: Optimize how OrdinalMap (used by
SortedSetDocValuesFacetCounts and others) builds its map (Robert
Muir, Adrien Grand, Mike McCandless)
* LUCENE-7655: Speed up geo-distance queries in case of dense single-valued
fields when most documents match. (Maciej Zasada via Adrien Grand)
* LUCENE-7897: IndexOrDocValuesQuery now requires the range cost to be more
than 8x greater than the cost of the lead iterator in order to use doc values.
(Murali Krishna P via Adrien Grand)
* LUCENE-7925: Collapse duplicate SHOULD or MUST clauses by summing up their
boosts. (Adrien Grand)
* LUCENE-7939: MinShouldMatchSumScorer now leverages two-phase iteration in
order to be faster when used in conjunctions. (Adrien Grand)
* LUCENE-7827: AnalyzingInfixSuggester doesn't create "textgrams"
when minPrefixChar=0 (Mikhail Khludnev)
Bug Fixes
* LUCENE-8066: It was still possible to construct a concave GeoExactCircle, so use
a sector approach to prevent that. (Ignacio Vera)
* LUCENE-7967: The GeoDegeneratePoint isWithin() method needed allowance for
numerical precision. (Karl Wright)
* LUCENE-7965: GeoBBoxFactory was constructing the wrong shape at the poles
if the longitude span was greater than 180 degrees. (Karl Wright)
* LUCENE-7916: Prevent ArrayIndexOutOfBoundsException if ICUTokenizer is used
with a different ICU JAR version than it is compiled against. Note, this is
not recommended, lucene-analyzers-icu contains binary data structures
specific to ICU/Unicode versions it is built against. (Chris Koenig, Robert Muir)
* LUCENE-7891: Lucene's taxonomy facets now use a non-buggy LRU cache
by default. (Jan-Willem van den Broek via Mike McCandless)
* LUCENE-7959: Improve NativeFSLockFactory's exception message if it cannot create
write.lock for an empty index due to bad permissions/read-only filesystem/etc.
(Erick Erickson, Shawn Heisey, Robert Muir)
* LUCENE-7968: AnalyzingSuggester would sometimes order suggestions incorrectly:
it did not properly break ties on the surface forms when both the weights and
the analyzed forms were equal. (Robert Muir)
* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
child scorers (Adrien Grand, Mike McCandless)
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
Build
* SOLR-11181: Switch order of maven artifact publishing procedure: deploy first
instead of locally installing first, to work around a double repository push of
*-sources.jar and *-javadoc.jar files. (Lynn Monson via Steve Rowe)
* LUCENE-6673: Maven build fails for target javadoc:jar.
(Ramkumar Aiyengar, Daniel Collins via Steve Rowe)
* LUCENE-7985: Upgrade forbiddenapis to 2.4.1. (Uwe Schindler)
Other
* LUCENE-7948, LUCENE-7937: Upgrade randomizedtesting to 2.5.3 (minor fixes
in test filtering for IDEs). (Mike Sokolov, Dawid Weiss)
* LUCENE-7933: LongBitSet now validates the numBits parameter (Won
Jonghoon, Mike McCandless)
* LUCENE-7978: Add some more documentation about setting up build
environment. (Anton R. Yuste via Uwe Schindler)
* LUCENE-7983: IndexWriter.IndexReaderWarmer is now a functional interface
instead of an abstract class with a single method (Dawid Weiss)
* LUCENE-5753: Update TLDs recognized by UAX29URLEmailTokenizer. (Steve Rowe)
======================= Lucene 7.0.1 =======================
Bug Fixes
* LUCENE-7957: ConjunctionScorer.getChildren was failing to return all
child scorers (Adrien Grand, Mike McCandless)
======================= Lucene 7.0.0 =======================
New Features
* LUCENE-7703: SegmentInfos now record the major Lucene version at index
creation time. (Adrien Grand)
* LUCENE-7756: LeafReader.getMetaData now exposes the index created version as
well as the oldest Lucene version that contributed to the segment.
(Adrien Grand)
* LUCENE-7854: The new TermFrequencyAttribute used during analysis
with a custom token stream allows indexing custom term frequencies
(Mike McCandless)
* LUCENE-7866: Add a new DelimitedTermFrequencyTokenFilter that allows marking
tokens with a custom term frequency (LUCENE-7854). It parses a numeric
value after a separator char ('|') at the end of each token and changes
the term frequency to this value. (Uwe Schindler, Robert Muir, Mike
McCandless)
* LUCENE-7868: Multiple threads can now resolve deletes and doc values
updates concurrently, giving sizable speedups in update-heavy
indexing use cases (Simon Willnauer, Mike McCandless)
* LUCENE-7823: Pure query based naive bayes classifier using BM25 scores (Tommaso Teofili)
* LUCENE-7838: Knn classifier based on fuzzified term queries (Tommaso Teofili)
* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory.
(Juan Pedro via Adrien Grand)
API Changes
* LUCENE-2605: Classic QueryParser no longer splits on whitespace by default.
Use setSplitOnWhitespace(true) to get the old behavior. (Steve Rowe)
* LUCENE-7369: Similarity.coord and BooleanQuery.disableCoord are removed.
(Adrien Grand)
* LUCENE-7368: Removed query normalization. (Adrien Grand)
* LUCENE-7355: AnalyzingQueryParser has been removed as its functionality has
been folded into the classic QueryParser. (Adrien Grand)
* LUCENE-7407: Doc values APIs have been switched from random access
to iterators, enabling future codec compression improvements. (Mike
McCandless)
* LUCENE-7475: Norms now support sparsity, so that you only pay for what is
actually used. (Adrien Grand)
* LUCENE-7494: Points now have a per-field API, like doc values. (Adrien Grand)
* LUCENE-7410: Cache keys and close listeners have been refactored in order
to be less trappy. See IndexReader.getReaderCacheHelper and
LeafReader.getCoreCacheHelper. (Adrien Grand)
* LUCENE-6819: Index-time boosts are not supported anymore. As a replacement,
index-time scoring factors should be indexed into a doc value field and
combined at query time using e.g. FunctionScoreQuery. (Adrien Grand)
* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
(David Smiley)
* LUCENE-7701: Grouping collectors have been refactored, such that groups are
now defined by a GroupSelector implementation. (Alan Woodward)
* LUCENE-7741: DoubleValuesSource now has an explain() method (Alan Woodward,
Adrien Grand)
* LUCENE-7815: Removed the PostingsHighlighter; you should use the UnifiedHighlighter
instead, which was derived from the PostingsHighlighter. WholeBreakIterator and
CustomSeparatorBreakIterator were moved to UH's package. (David Smiley)
* LUCENE-7850: Removed support for legacy numerics. (Adrien Grand)
* LUCENE-7500: Removed abstract LeafReader.fields(); instead terms(fieldName)
has been made abstract; formerly it was final. Also, MultiFields.getTerms
was optimized to work directly instead of being implemented on getFields.
(David Smiley)
* LUCENE-7872: TopDocs.totalHits is now a long. (Adrien Grand, hossman)
* LUCENE-7868: IndexWriterConfig.setMaxBufferedDeleteTerms is
removed. (Simon Willnauer, Mike McCandless)
* LUCENE-7877: PrefixAwareTokenStream is replaced with ConcatenatingTokenStream
(Alan Woodward, Uwe Schindler, Adrien Grand)
* LUCENE-7867: The deprecated Token class is now only available in the test
framework (Alan Woodward, Adrien Grand)
* LUCENE-7723: DoubleValuesSource enforces implementation of equals() and
hashCode() (Alan Woodward)
* LUCENE-7737: The spatial-extras module no longer has a dependency on the
queries module. All uses of ValueSource are either replaced with core
DoubleValuesSource extensions, or with the new ShapeValuesSource and
ShapeValuesPredicate classes (Alan Woodward, David Smiley)
* LUCENE-7892: Doc-values query factory methods have been renamed so that their
name contains "slow" in order to clearly indicate that they would usually be a
bad choice. (Adrien Grand)
* LUCENE-7899: FieldValueQuery is renamed to DocValuesFieldExistsQuery
(Adrien Grand, Mike McCandless)
Bug Fixes
* LUCENE-7626: IndexWriter will no longer accept broken token offsets
(Mike McCandless)
* LUCENE-7859: Spatial-extras PackedQuadPrefixTree bug that only revealed itself
with the new pointsOnly optimizations in LUCENE-7845. (David Smiley)
* LUCENE-7871: Fix false positive match in BlockJoinSelector when children have no value, introducing
wrap methods accepting children as DISI. Extracting ToParentDocValues (Mikhail Khludnev)
* LUCENE-7914: Add a maximum recursion level in automaton recursive
functions (Operations.isFinite and Operations.topsortState) to prevent
large automata from overflowing the stack (Robert Muir, Adrien Grand, Jim Ferenczi)
* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even
if possible). (Dawid Weiss)
* LUCENE-7956: Fixed potential stack overflow error in ICUNormalizer2CharFilter.
(Adrien Grand)
* LUCENE-7963: Remove useless getAttribute() in DefaultIndexingChain that
causes performance drop, introduced by LUCENE-7626. (Daniel Mitterdorfer
via Uwe Schindler)
Improvements
* LUCENE-7489: Better storage of sparse doc-values fields with the default
codec. (Adrien Grand)
* LUCENE-7730: More accurate encoding of the length normalization factor
thanks to the removal of index-time boosts. (Adrien Grand)
* LUCENE-7901: Original Highlighter now eagerly throws an exception if you
provide components that are null. (Jason Gerlowski, David Smiley)
* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss)
Optimizations
* LUCENE-7416: BooleanQuery optimizes queries that have sub-queries that occur both
in the sets of SHOULD and FILTER clauses, or both in MUST/FILTER and MUST_NOT
clauses. (Spyros Kapnissis via Adrien Grand, Uwe Schindler)
* LUCENE-7506: FastTaxonomyFacetCounts should use CPU in proportion to
the size of the intersected set of hits from the query and documents
that have a facet value, so sparse faceting works as expected
(Adrien Grand via Mike McCandless)
* LUCENE-7519: Add optimized APIs to compute browse-only top level
facets (Mike McCandless)
* LUCENE-7589: Numeric doc values now have the ability to encode blocks of
values using different numbers of bits per value if this proves to save
storage. (Adrien Grand)
* LUCENE-7845: Enhance spatial-extras RecursivePrefixTreeStrategy queries when the
query is a point (for 2D) or is a simple date interval (e.g. 1 month). When
the strategy is marked as pointsOnly, the result is a TermQuery. (David Smiley)
* LUCENE-7874: DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1. (Jim Ferenczi)
* LUCENE-7828: Speed up range queries on range fields by improving how we
compute the relation between the query and inner nodes of the BKD tree.
(Adrien Grand)
Other
* LUCENE-7923: Removed FST.Arc.node field (unused). (Dawid Weiss)
* LUCENE-7328: Remove LegacyNumericEncoding from GeoPointField. (Nick Knize)
* LUCENE-7360: Remove Explanation.toHtml() (Alan Woodward)
* LUCENE-7681: MemoryIndex uses new DocValues API (Alan Woodward)
* LUCENE-7753: Make fields static when possible.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7540: Upgrade ICU to 59.1 (Mike McCandless, Jim Ferenczi)
* LUCENE-7852: Correct copyright year(s) in lucene/LICENSE.txt file.
(Christine Poerschke, Steve Rowe)
* LUCENE-7719: Generalized the UnifiedHighlighter's support for AutomatonQuery
for character & binary automata. Added AutomatonQuery.isBinary. (David Smiley)
* LUCENE-7873: Due to serious problems with context class loaders in several
frameworks (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats,
DocValuesFormats and all analysis factories was changed to only inspect the
current classloader that defined the interface class (lucene-core.jar).
See MIGRATE.txt for more information! (Uwe Schindler, Dawid Weiss)
* LUCENE-7883: Lucene no longer uses the context class loader when resolving
resources in CustomAnalyzer or ClassPathResourceLoader. Resources are only
resolved against Lucene's class loader by default. Please use another builder
method to change to a custom classloader. (Uwe Schindler)
* LUCENE-5822: Convert README to Markdown (Jason Gerlowski via Mike Drob)
* LUCENE-7773: Remove unused/deprecated token types from StandardTokenizer.
(Ahmet Arslan via Steve Rowe)
* LUCENE-7800: Remove code that potentially rethrows checked exceptions
from methods that don't declare them ("sneaky throw" hack). (Robert Muir,
Uwe Schindler, Dawid Weiss)
* LUCENE-7876: Avoid calls to LeafReader.fields() and MultiFields.getFields()
that are trivially replaced by LeafReader.terms() and MultiFields.getTerms()
(David Smiley)
======================= Lucene 6.6.5 =======================
(No Changes)
======================= Lucene 6.6.4 =======================
(No Changes)
======================= Lucene 6.6.3 =======================
Build
* LUCENE-6144: Upgrade Ivy to 2.4.0; 'ant ivy-bootstrap' now removes old Ivy
jars in ~/.ant/lib/. (Shawn Heisey, Steve Rowe)
======================= Lucene 6.6.2 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
Bug Fixes
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
======================= Lucene 6.6.1 =======================
Bug Fixes
* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects
that these points are visited in ascending order. The memory index doesn't do this, which can result in documents
with multiple points that should match to not match. (Martijn van Groningen)
* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi)
======================= Lucene 6.6.0 =======================
New Features
* LUCENE-7811: Add a concurrent SortedSet facets implementation.
(Mike McCandless)
Bug Fixes
* LUCENE-7777: ByteBlockPool.readBytes sometimes throws
ArrayIndexOutOfBoundsException when byte blocks larger than 32 KB
were added (Mike McCandless)
* LUCENE-7797: The static FSDirectory.listAll(Path) method was always
returning an empty array. (Atkins Chang via Mike McCandless)
* LUCENE-7481: Fixed missing rewrite methods for SpanPayloadCheckQuery
and PayloadScoreQuery. (Erik Hatcher)
* LUCENE-7808: Fixed PayloadScoreQuery and SpanPayloadCheckQuery
.equals and .hashCode methods. (Erik Hatcher)
* LUCENE-7798: Add .equals and .hashCode to ToParentBlockJoinSortField
(Mikhail Khludnev)
* LUCENE-7814: DateRangePrefixTree (in spatial-extras) had edge-case bugs for
years >= 292,000,000. (David Smiley)
* LUCENE-5365, LUCENE-7818: Fix incorrect condition in queryparser's
QueryNodeOperation#logicalAnd(). (Olivier Binda, Amrit Sarkar,
AppChecker via Uwe Schindler)
* LUCENE-7821: The classic and flexible query parsers, as well as Solr's
"lucene"/standard query parser, should require " TO " in range queries,
and accept "TO" as endpoints in range queries. (hossman, Steve Rowe)
* LUCENE-7824: Fix graph query analysis for multi-word synonym rules with common terms (eg. new york, new york city).
(Jim Ferenczi)
* LUCENE-7817: Pass cached query to onQueryCache instead of null.
(Christoph Kaser via Adrien Grand)
* LUCENE-7831: CodecUtil should not seek to negative offsets. (Adrien Grand)
* LUCENE-7833: ToParentBlockJoinQuery computed the min score instead of the max
score with ScoreMode.MAX. (Adrien Grand)
* LUCENE-7847: Fixed all-docs-match optimization of range queries on range
fields. (Adrien Grand)
* LUCENE-7810: Fix equals() and hashCode() methods of several join queries.
(Hossman, Adrien Grand, Martijn van Groningen)
Improvements
* LUCENE-7782: OfflineSorter now passes the total number of items it
will write to getWriter (Mike McCandless)
* LUCENE-7785: Move dictionary for Ukrainian analyzer to external dependency.
(Andriy Rysin via Steve Rowe, Dawid Weiss)
* LUCENE-7801: SortedSetDocValuesReaderState now implements
Accountable so you can see how much RAM it's using (Robert Muir,
Mike McCandless)
* LUCENE-7792: OfflineSorter can now run concurrently if you pass it
an optional ExecutorService (Dawid Weiss, Mike McCandless)
* LUCENE-7811: Sorted set facets now use sparse storage when
collecting hits, when appropriate. (Mike McCandless)
Optimizations
* LUCENE-7787: spatial-extras HeatmapFacetCounter will now short-circuit its
work when Bits.MatchNoBits is passed. (David Smiley)
Other
* LUCENE-7796: Make IOUtils.reThrow idiom declare Error return type so
callers may use it in a way that compiler knows subsequent code is
unreachable. reThrow is now deprecated in favor of IOUtils.rethrowAlways
with slightly different semantics (see javadoc). (Hossman, Robert Muir,
Dawid Weiss)
* LUCENE-7754: Inner classes should be static whenever possible.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7751: Avoid boxing primitives only to call compareTo.
(Daniel Jelinski via Adrien Grand)
* LUCENE-7743: Never call new String(String).
(Daniel Jelinski via Adrien Grand)
* LUCENE-7761: Fixed comment in ReqExclScorer.
(Pablo Pita Leira via Adrien Grand)
======================= Lucene 6.5.1 =======================
Bug Fixes
* LUCENE-7755: Fixed join queries to not reference IndexReaders, as it could
cause leaks if they are cached. (Adrien Grand)
* LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.
(Martin Amirault via Adrien Grand)
* LUCENE-7769: The UnifiedHighlighter wasn't highlighting portions of the query
wrapped in BoostQuery or SpanBoostQuery. (David Smiley, Dmitry Malinin)
Other
* LUCENE-7763: Remove outdated comment in IndexWriterConfig.setIndexSort javadocs.
(马可阳 via Christine Poerschke)
======================= Lucene 6.5.0 =======================
API Changes
* LUCENE-7740: Refactor Range Fields to remove Field suffix (e.g., DoubleRange),
move InetAddressRange and InetAddressPoint from sandbox to misc module, and
refactor all other range fields from sandbox to core. (Nick Knize)
* LUCENE-7624: TermsQuery has been renamed as TermInSetQuery and moved to core.
(Alan Woodward)
* LUCENE-7637: TermInSetQuery requires that all terms come from the same field.
(Adrien Grand)
* LUCENE-7644: FieldComparatorSource.newComparator() and
SortField.getComparator() no longer throw IOException (Alan Woodward)
* LUCENE-7643: Replaced doc-values queries in lucene/sandbox with factory
methods on the *DocValuesField classes. (Adrien Grand)
* LUCENE-7659: Added an IndexWriter#getFieldNames() method (experimental) to return
all field names as visible from the IndexWriter. This would be useful for
IndexWriter#updateDocValues() calls, to prevent calling with non-existent
docValues fields (Ishan Chattopadhyaya, Adrien Grand, Mike McCandless)
* LUCENE-6959: Removed ToParentBlockJoinCollector in favour of
ParentChildrenBlockJoinQuery, which can return the matching children documents per
parent document. This query should be executed for each matching parent document
after the main query has been executed. (Adrien Grand, Martijn van Groningen,
Mike McCandless)
* LUCENE-7628: Scorer.getChildren() now only returns Scorers that are
positioned on the current document, and can throw an IOException.
AssertingScorer checks that getChildren() is not called on an unpositioned
Scorer. (Alan Woodward, Adrien Grand)
* LUCENE-7702: Removed GraphQuery in favour of simple boolean query. (Matt Weber via Jim Ferenczi)
* LUCENE-7707: TopDocs.merge now takes a boolean option telling it
when to use the incoming shard index versus when to assign the shard
index itself, allowing users to merge shard responses incrementally
instead of once all shard responses are present. (Simon Willnauer,
Mike McCandless)
* LUCENE-7700: A cleanup of merge throughput control logic. Refactored all the
code previously scattered throughout the IndexWriter and
ConcurrentMergeScheduler into a more accessible set of public methods (see
MergePolicy.OneMergeProgress, MergeScheduler.wrapForMerge and
OneMerge.mergeInit). (Dawid Weiss, Mike McCandless).
* LUCENE-7734: FieldType's copy constructor was widened to accept any IndexableFieldType.
(David Smiley)
New Features
* LUCENE-7738: Add new InetAddressRange for indexing and querying InetAddress
ranges. (Nick Knize)
* LUCENE-7449: Add CROSSES relation support to RangeFieldQuery. (Nick Knize)
* LUCENE-7623: Add FunctionScoreQuery and FunctionMatchQuery (Alan Woodward,
Adrien Grand, David Smiley)
* LUCENE-7619: Add WordDelimiterGraphFilter, just like
WordDelimiterFilter except it produces correct token graphs so that
proximity queries at search time will produce correct results (Mike
McCandless)
* LUCENE-7656: Added the LatLonDocValuesField.new(Box/Distance)Query() factory
methods that are the equivalent of factory methods on LatLonPoint but operate
on doc values. These new methods should be wrapped in an IndexOrDocValuesQuery
for best performance. (Adrien Grand)
* LUCENE-7673: Added MultiValued[Int/Long/Float/Double]FieldSource that given a
SortedNumericSelector.Type can give a ValueSource view of a
SortedNumericDocValues field. (Tomás Fernández Löbbe)
* LUCENE-7465: Add SimplePatternTokenizer and
SimplePatternSplitTokenizer, using Lucene's regexp/automaton
implementation for analysis/tokenization (Clinton Gormley, Mike
McCandless)
* LUCENE-7688: Add OneMergeWrappingMergePolicy class.
(Keith Laban, Christine Poerschke)
* LUCENE-7686: The near-real-time document suggester can now
efficiently filter out duplicate suggestions (Uwe Schindler, Mike
McCandless)
* LUCENE-7712: SimpleQueryParser now supports default fuzziness
syntax, mapping foo~ to a FuzzyQuery with edit distance 2. (Lee
Hinman, David Pilato via Mike McCandless)
Bug Fixes
* LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads
and preserve all attributes. (Nathan Gass via Uwe Schindler)
* LUCENE-7679: MemoryIndex was ignoring omitNorms settings on passed-in
IndexableFields. (Alan Woodward)
* LUCENE-7692: PatternReplaceCharFilterFactory now implements MultiTermAware.
(Adrien Grand)
* LUCENE-7685: ToParentBlockJoinQuery and ToChildBlockJoinQuery now use the
rewritten child query in their equals and hashCode implementations.
(Adrien Grand)
* LUCENE-7698: CommonGramsQueryFilter was producing a disconnected
token graph, messing up phrase queries when it was used during query
parsing (Ere Maijala via Mike McCandless)
* LUCENE-7708: ShingleFilter without unigram was producing a disconnected
token graph, messing up queries when it was used during query
parsing (Jim Ferenczi)
Improvements
* LUCENE-7055: Added Weight#scorerSupplier, which allows estimating the cost
of a Scorer before actually building it, in order to optimize how the query
should be run, e.g. using points or doc values depending on costs of other
parts of the query. (Adrien Grand)
* LUCENE-7643: IndexOrDocValuesQuery allows executing range queries using
either points or doc values depending on which one is more efficient.
(Adrien Grand)
* LUCENE-7662: If index files are missing, throw CorruptIndexException instead
of the less descriptive FileNotFound or NoSuchFileException (Mike Drob via
Mike McCandless, Erick Erickson)
* LUCENE-7680: UsageTrackingQueryCachingPolicy never caches term filters anymore
since they are plenty fast. This also has the side-effect of leaving more
space in the history for costly filters. (Adrien Grand)
* LUCENE-7677: UsageTrackingQueryCachingPolicy now caches compound queries a bit
earlier than regular queries in order to improve cache efficiency.
(Adrien Grand)
* LUCENE-7710: BlockPackedReader throws CorruptIndexException and includes
IndexInput description instead of plain IOException (Mike Drob via
Mike McCandless)
* LUCENE-7695: ComplexPhraseQueryParser to support query time synonyms (Markus Jelsma
via Mikhail Khludnev)
* LUCENE-7747: QueryBuilder now iterates lazily over the possible paths when building a graph query
(Jim Ferenczi)
Optimizations
* LUCENE-7641: Optimized point range queries to compute documents that do not
match the range on single-valued fields when more than half the documents in
the index would match. (Adrien Grand)
* LUCENE-7656: Speed up LatLonPointDistanceQuery by computing distances even
less often. (Adrien Grand)
* LUCENE-7661: Speed up LatLonPointInPolygonQuery by pre-computing the
relation of the polygon with a grid. (Adrien Grand)
* LUCENE-7660: Speed up LatLonPointDistanceQuery by improving the detection of
whether BKD cells are entirely within the distance close to the dateline.
(Adrien Grand)
* LUCENE-7654: ToParentBlockJoinQuery now implements two-phase iteration and
computes scores lazily in order to be faster when used in conjunctions.
(Adrien Grand)
* LUCENE-7667: BKDReader now calls IntersectVisitor.grow() in larger
increments. (Adrien Grand)
* LUCENE-7638: Query parsers now analyze the token graph for articulation
points (or cut vertices) in order to create more efficient queries for
multi-token synonyms. (Jim Ferenczi)
* LUCENE-7699: Query parsers now use span queries to produce more efficient
phrase queries for multi-token synonyms. (Matt Weber via Jim Ferenczi)
* LUCENE-7742: Fix places where we were unboxing and then re-boxing
according to FindBugs (Daniel Jelinski via Mike McCandless)
* LUCENE-7739: Fix places where we unnecessarily boxed while parsing
a numeric value according to FindBugs (Daniel Jelinski via Mike
McCandless)
Build
* LUCENE-7653: Update randomizedtesting to version 2.5.0. (Dawid Weiss)
* LUCENE-7665: Remove grouping dependency from the join module.
(Martijn van Groningen)
* SOLR-10023: Add non-recursive 'test-nocompile' target: Only runs unit tests.
Jars are not downloaded; compilation is not updated; and Clover is not enabled.
(Steve Rowe)
* LUCENE-7694: Update forbiddenapis to version 2.3. (Uwe Schindler)
* LUCENE-7693: Replace "org.apache." logic in GetMavenDependenciesTask.
(Daniel Collins, Christine Poerschke)
* LUCENE-7726: Fix HTML entity bugs in Javadocs to be able to build with
Java 9. (Uwe Schindler, Hossman)
* LUCENE-7727: Replace end-of-life Markdown parser "Pegdown" by "Flexmark"
for compatibility with Java 9. (Uwe Schindler)
Other
* LUCENE-7666: Fix typos in lucene-join package info javadoc.
(Tom Saleeba via Christine Poerschke)
* LUCENE-7658: queryparser/xml CoreParser now implements SpanQueryBuilder interface.
(Daniel Collins, Christine Poerschke)
* LUCENE-7715: NearSpansUnordered simplifications.
(Paul Elschot via Adrien Grand)
======================= Lucene 6.4.2 =======================
Bug Fixes
* LUCENE-7676: Fixed FilterCodecReader to override more super-class methods.
Also added TestFilterCodecReader class. (Christine Poerschke)
* LUCENE-7717: The UnifiedHighlighter and PostingsHighlighter were not highlighting
prefix queries with multi-byte characters. TermRangeQuery is affected too.
(Dmitry Malinin, David Smiley)
======================= Lucene 6.4.1 =======================
Build
* LUCENE-7651: Fix Javadocs build for Java 8u121 by injecting "Google Code
Prettify" without adding Javascript to Javadocs's -bottom parameter.
Also update Prettify to latest version to fix Google Chrome issue.
(Uwe Schindler)
Bug Fixes
* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
with a TermContext is cached. (Adrien Grand)
* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
issues. (Adrien Grand)
* LUCENE-7670: AnalyzingInfixSuggester should not immediately open an
IndexWriter over an already-built index. (Steve Rowe)
======================= Lucene 6.4.0 =======================
API Changes
* LUCENE-7533: Classic query parser no longer allows autoGeneratePhraseQueries
to be set to true when splitOnWhitespace is false (and vice-versa).
* LUCENE-7607: LeafFieldComparator.setScorer and SimpleFieldComparator.setScorer
are declared as throwing IOException (Alan Woodward)
* LUCENE-7617: Collector construction for two-pass grouping queries is
abstracted into a new Grouper class, which can be passed as a constructor
parameter to GroupingSearch. The abstract base classes for the different
grouping Collectors are renamed to remove the Abstract* prefix.
(Alan Woodward, Martijn van Groningen)
* LUCENE-7609: The expressions module now uses the DoubleValuesSource API, and
no longer depends on the queries module. Expression#getValueSource() is
replaced with Expression#getDoubleValuesSource(). (Alan Woodward, Adrien
Grand)
* LUCENE-7610: The facets module now uses the DoubleValuesSource API, and
methods that take ValueSource parameters are deprecated (Alan Woodward)
* LUCENE-7611: DocumentValueSourceDictionary now takes a LongValuesSource
as a parameter, and the ValueSource equivalent is deprecated (Alan Woodward)
New Features
* LUCENE-5867: Added BooleanSimilarity. (Robert Muir, Adrien Grand)
* LUCENE-7466: Added AxiomaticSimilarity. (Peilin Yang via Tommaso Teofili)
* LUCENE-7590: Added DocValuesStatsCollector to compute statistics on DocValues
fields. (Shai Erera)
* LUCENE-7587: The new FacetQuery and MultiFacetQuery helper classes
make it simpler to execute drill down when drill sideways counts are
not needed (Emmanuel Keller via Mike McCandless)
* LUCENE-6664: A new SynonymGraphFilter outputs a correct graph
structure for multi-token synonyms, separating out a
FlattenGraphFilter that is hardwired into the current
SynonymFilter. This finally makes it possible to implement
correct multi-token synonyms at search time. See
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
for details. (Mike McCandless)
* LUCENE-5325: Added LongValuesSource and DoubleValuesSource, intended as
type-safe replacements for ValueSource in the queries module. These
expose per-segment LongValues or DoubleValues iterators. (Alan Woodward, Adrien Grand)
* LUCENE-7603: Graph token streams are now handled accurately by query
parsers, by enumerating all paths and creating the corresponding
query/ies as sub-clauses (Matt Weber via Mike McCandless)
* LUCENE-7588: DrillSideways can now run queries concurrently, and
supports an IndexSearcher using an executor service to run each query
concurrently across all segments in the index (Emmanuel Keller via
Mike McCandless)
* LUCENE-7627: Added .intersect methods to SortedDocValues and
SortedSetDocValues to allow filtering their TermsEnums with a
CompiledAutomaton (Alan Woodward, Mike McCandless)
Bug Fixes
* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
dictionary file it opened (Markus via Mike McCandless)
* LUCENE-7562: CompletionFieldsConsumer sometimes throws
NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
* LUCENE-7533: Classic query parser: disallow autoGeneratePhraseQueries=true
when splitOnWhitespace=false (and vice-versa). (Steve Rowe)
* LUCENE-7536: ASCIIFoldingFilterFactory used to return an illegal multi-term
component when preserveOriginal was set to true. (Adrien Grand)
* LUCENE-7576: Fix Terms.intersect in the default codec to detect when
the incoming automaton is a special case and throw a clearer
exception than NullPointerException (Tom Mortimer via Mike McCandless)
* LUCENE-6989: Fix Exception handling in MMapDirectory's unmap hack
support code to work with Java 9's new InaccessibleObjectException
that does not extend ReflectiveOperationException.
(Uwe Schindler)
* LUCENE-7581: Lucene now prevents updating a doc values field that is used
in the index sort, since this would lead to corruption. (Jim
Ferenczi via Mike McCandless)
* LUCENE-7570: IndexWriter may deadlock if a commit is running while
there are too many merges running and one of the merges hits a
tragic exception (Joey Echeverria via Mike McCandless)
* LUCENE-7594: Fixed point range queries on floating-point types to recommend
using helpers for exclusive bounds that are consistent with Double.compare.
(Adrien Grand, Dawid Weiss)
* LUCENE-7606: Normalization with CustomAnalyzer would only apply the last
token filter. (Adrien Grand)
* LUCENE-7612: Removed an unused dependency from the suggester to the misc
module. (Alan Woodward)
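For LUCENE-7594 above, the point is that Double.compare orders -0.0 strictly before +0.0, while Math.nextUp and Math.nextDown step over the other zero. The following is a minimal sketch of bound helpers that stay consistent with Double.compare. The class and method names are hypothetical and the changelog entry does not name the actual helpers, so treat this as an illustration under that assumption.

```java
public final class DoubleBoundsSketch {

  // Smallest double strictly greater than d in Double.compare order.
  // Math.nextUp(-0.0) returns Double.MIN_VALUE and would skip +0.0.
  static double nextUp(double d) {
    if (Double.doubleToLongBits(d) == 0x8000_0000_0000_0000L) { // d is -0.0
      return +0.0d;
    }
    return Math.nextUp(d);
  }

  // Largest double strictly less than d in Double.compare order.
  // Math.nextDown(+0.0) returns -Double.MIN_VALUE and would skip -0.0.
  static double nextDown(double d) {
    if (Double.doubleToLongBits(d) == 0L) { // d is +0.0
      return -0.0d;
    }
    return Math.nextDown(d);
  }

  public static void main(String[] args) {
    // An exclusive range (lower, upper) becomes the inclusive range
    // [nextUp(lower), nextDown(upper)].
    double lower = -0.0d;
    double upper = 1.0d;
    System.out.println(nextUp(lower) + " .. " + nextDown(upper));
  }
}
```

With these helpers, an exclusive lower bound of -0.0 still admits +0.0, which matches what Double.compare would report for the two values.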
Improvements
* LUCENE-7532: Add back lost codec file format documentation
(Shinichiro Abe via Mike McCandless)
* LUCENE-6824: TermAutomatonQuery now rewrites to TermQuery,
PhraseQuery or MultiPhraseQuery when the word automaton is simple
(Mike McCandless)
* LUCENE-7431: Allow a certain amount of overlap to be specified between the include
and exclude arguments of SpanNotQuery via negative pre and/or post arguments.
(Marc Morissette via David Smiley)
* LUCENE-7544: UnifiedHighlighter: add extension points for handling custom queries.
(Michael Braun, David Smiley)
* LUCENE-7538: Asking IndexWriter to store a too-massive text field
now throws IllegalArgumentException instead of a cryptic exception
that closes your IndexWriter (Steve Chen via Mike McCandless)
* LUCENE-7524: Added more detailed explanation of how IDF is computed in
ClassicSimilarity and BM25Similarity. (Adrien Grand)
* LUCENE-7564: AnalyzingInfixSuggester should close its IndexWriter by default
at the end of build(). (Steve Rowe)
* LUCENE-7526: Enhanced UnifiedHighlighter's passage relevancy for queries with
wildcards and sometimes just terms. Added shouldPreferPassageRelevancyOverSpeed()
which can be overridden to return false to eke out more speed in some cases.
(Timothy M. Rodriguez, David Smiley)
* LUCENE-7560: QueryBuilder.createFieldQuery is no longer final,
giving custom query parsers subclassing QueryBuilder more freedom to
control how text is analyzed and converted into a query (Matt Weber
via Mike McCandless)
* LUCENE-7537: Index time sorting now supports multi-valued sorts
using selectors (MIN, MAX, etc.) (Jim Ferenczi via Mike McCandless)
* LUCENE-7575: UnifiedHighlighter can now highlight fields with queries that don't
necessarily refer to that field (AKA requireFieldMatch==false). Disabled by default.
See UH get/setFieldMatcher. (Jim Ferenczi via David Smiley)
* LUCENE-7592: If the segments file is truncated, we now throw
CorruptIndexException instead of the more confusing EOFException
(Mike Drob via Mike McCandless)
* LUCENE-6989: Make MMapDirectory's unmap hack work with Java 9 EA (b150+):
Unmapping uses new sun.misc.Unsafe#invokeCleaner(ByteBuffer).
Java 9 now needs the same permissions as Java 8;
RuntimePermission("accessClassInPackage.jdk.internal.ref")
is no longer needed. Support for older Java 9 builds was removed.
(Uwe Schindler)
* LUCENE-7401: Changed the way BKD trees pick the split dimension in order to
ensure all dimensions are indexed. (Adrien Grand)
* LUCENE-7614: Complex Phrase Query parser ignores double quotes around single token
prefix, wildcard, range queries (Mikhail Khludnev)
* LUCENE-7620: Added LengthGoalBreakIterator, a wrapper around another B.I. to skip breaks
that would create Passages that are too short. Only for use with the UnifiedHighlighter
(and probably PostingsHighlighter). (David Smiley)
Optimizations
* LUCENE-7568: Optimize merging when index sorting is used but the
index is already sorted (Jim Ferenczi via Mike McCandless)
* LUCENE-7563: The BKD in-memory index for dimensional points now uses
a compressed format, using substantially less RAM in some cases
(Adrien Grand, Mike McCandless)
* LUCENE-7583: BKD writing now buffers each leaf block in heap before
writing to disk, giving a small speedup in points-heavy use cases.
(Mike McCandless)
* LUCENE-7572: Doc values queries now cache their hash code. (Adrien Grand)
Other
* LUCENE-7546: Fixed references to benchmark wikipedia data and the Jenkins line-docs file
(David Smiley)
* LUCENE-7534: fix smokeTestRelease.py to run on Cygwin (Mikhail Khludnev)
* LUCENE-7559: UnifiedHighlighter: Make Passage and OffsetsEnum more exposed to allow
passage creation to be customized. (David Smiley)
* LUCENE-7599: Simplify TestRandomChains using Java's built-in Predicate and
Function interfaces. (Ahmet Arslan via Adrien Grand)
* LUCENE-7595: Improve RAMUsageTester in test-framework to estimate memory usage of
runtime classes and work with Java 9 EA (b148+). Disable static field heap usage
checker in LuceneTestCase. (Uwe Schindler, Dawid Weiss)
Build
* LUCENE-7387: fix defaultCodec in build.xml to account for the line ending (hossman)
* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
Lucene and Solr DOAP RDF files into the Git source repository under
dev-tools/doap/ and then pulling release dates from those files, rather than
from JIRA. (Mano Kovacs, hossman, Steve Rowe)
* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
build 148+. Also update JGit version for working-copy checks. (Uwe Schindler)
======================= Lucene 6.3.0 =======================
API Changes
New Features
* LUCENE-7438: New "UnifiedHighlighter" derivative of the PostingsHighlighter that
can consume offsets from postings, term vectors, or analysis. It can highlight phrases
as accurately as the standard Highlighter. Light term vectors can be used with offsets
in postings for fast wildcard (MultiTermQuery) highlighting.
(David Smiley, Timothy Rodriguez)
* LUCENE-7490: SimpleQueryParser now parses '*' to MatchAllDocsQuery
(Lee Hinman via Mike McCandless)
Bug Fixes
* LUCENE-7507: Upgrade morfologik-stemming to version 2.1.1 (fixes security
manager issue with Polish dictionary lookup). (Dawid Weiss)
* LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are
neither BooleanQuery nor TermQuery. (Steve Rowe)
* LUCENE-7456: PerFieldPostings/DocValues was failing to delegate the
merge method (Julien MASSENET via Mike McCandless)
* LUCENE-7468: ASCIIFoldingFilter should not emit duplicated tokens when
preserve original is on. (David Causse via Adrien Grand)
* LUCENE-7484: FastVectorHighlighter failed to highlight SynonymQuery
(Jim Ferenczi via Mike McCandless)
* LUCENE-7476: JapaneseNumberFilter should not invoke incrementToken
on its input after it's exhausted (Andy Hind via Mike McCandless)
* LUCENE-7486: DisjunctionMaxQuery does not work correctly with queries that
return negative scores. (Ivan Provalov, Uwe Schindler, Adrien Grand)
* LUCENE-7491: Suddenly turning on dimensional points for some fields
that already exist in an index but didn't previously index
dimensional points could cause unexpected merge exceptions (Hans
Lund, Mike McCandless)
* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
(Hossman)
* LUCENE-7493: FacetCollector.search threw an unexpected exception if
you asked for zero hits but wanted facets (Mahesh via Mike McCandless)
* LUCENE-7505: AnalyzingInfixSuggester returned invalid results when
allTermsRequired is false and context filters are specified (Mike
McCandless)
* LUCENE-7429: AnalyzerWrapper can now modify the normalization chain too and
DelegatingAnalyzerWrapper does the right thing automatically. (Adrien Grand)
* LUCENE-7135: Lucene's check for 32 or 64 bit JVM now works around security
manager blocking access to some properties (Aaron Madlon-Kay via
Mike McCandless)
Improvements
* LUCENE-7439: FuzzyQuery now matches all terms within the specified
edit distance, even if they are short terms (Mike McCandless)
* LUCENE-7496: Better toString for SweetSpotSimilarity (janhoy)
* LUCENE-7520: Highlighter's WeightedSpanTermExtractor shouldn't attempt to expand a MultiTermQuery
when its field doesn't match the field the extraction is scoped to.
(Cao Manh Dat via David Smiley)
Optimizations
* LUCENE-7501: BKDReader should not store the split dimension explicitly in the
1D case. (Adrien Grand)
Other
* LUCENE-7513: Upgrade randomizedtesting to 2.4.0. (Dawid Weiss)
* LUCENE-7452: Block join query exception suggests how to find a doc, which
violates orthogonality requirement. (Mikhail Khludnev)
* LUCENE-7438: Renovate the Benchmark module's support for benchmarking highlighting. All
highlighters are supported via SearchTravRetHighlight. (David Smiley)
Build
* LUCENE-7292: Fix build to use "--release 8" instead of "-release 8" on
Java 9 (this changed with recent EA build b135). (Uwe Schindler)
======================= Lucene 6.2.1 =======================
API Changes
* LUCENE-7436: MinHashFilter's constructor, and some of its default
settings, should be public. (Doug Turnbull via Mike McCandless)
Bug Fixes
* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
term. (Thomas Kappler via David Smiley)
* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
with large skips. (yonik)
* LUCENE-7442: MinHashFilter's ctor should validate its args.
(Cao Manh Dat via Steve Rowe)
* LUCENE-7318: Fix backwards compatibility issues around StandardAnalyzer
and its components, introduced with Lucene 6.2.0. The moved classes
were restored in their original packages: LowerCaseFilter and StopFilter,
as well as several utility classes. (Uwe Schindler, Mike McCandless)
======================= Lucene 6.2.0 =======================
API Changes
* ScoringWrapperSpans was removed since it had no purpose or effect as of Lucene 5.5.
New Features
* LUCENE-7388: Add point based IntRangeField, FloatRangeField, LongRangeField along with
supporting queries and tests (Nick Knize)
* LUCENE-7381: Add point based DoubleRangeField and RangeFieldQuery for
indexing and querying on Ranges up to 4 dimensions (Nick Knize)
* LUCENE-6968: LSH Filter (Tommaso Teofili, Andy Hind, Cao Manh Dat)
* LUCENE-7302: IndexWriter methods that change the index now return a
long "sequence number" indicating the effective equivalent
single-threaded execution order (Mike McCandless)
* LUCENE-7335: IndexWriter's commit data is now late binding,
recording key/values from a provided iterable based on when the
commit actually takes place (Mike McCandless)
* LUCENE-7287: UkrainianMorfologikAnalyzer is a new dictionary-based
analyzer for the Ukrainian language (Andriy Rysin via Mike
McCandless)
* LUCENE-7373: Directory.renameFile, which did both renaming and fsync
of the directory metadata, has been deprecated; use the new separate
methods Directory.rename and Directory.syncMetaData instead (Robert Muir,
Uwe Schindler, Mike McCandless)
* LUCENE-7355: Added Analyzer#normalize(), which only applies normalization to
an input string. (Adrien Grand)
* LUCENE-7380: Add Polygon.fromGeoJSON for more easily creating
Polygon instances from a standard GeoJSON string (Robert Muir, Mike
McCandless)
* LUCENE-7395: PerFieldSimilarityWrapper requires a default similarity
for calculating query norm and coordination factor in Lucene 6.x.
Lucene 7 will no longer have those factors. (Uwe Schindler, Sascha Markus)
* SOLR-9279: Queries module: new ComparisonBoolFunction base class
(Doug Turnbull via David Smiley)
Bug Fixes
* LUCENE-6662: Fixed potential resource leaks. (Rishabh Patel via Adrien Grand)
* LUCENE-7340: MemoryIndex.toString() could throw NPE; fixed. Renamed to toStringDebug().
(Daniel Collins, David Smiley)
* LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the
wrong default AttributeFactory for new Tokenizers.
(Terry Smith, Uwe Schindler)
* LUCENE-7389: Fix FieldType.setDimensions(...) validation for the dimensionNumBytes
parameter. (Martijn van Groningen)
* LUCENE-7391: Fix performance regression in MemoryIndex's fields() introduced
in Lucene 6. (Steve Mason via David Smiley)
* LUCENE-7395, SOLR-9315: Fix PerFieldSimilarityWrapper to also delegate query
norm and coordination factor using a default similarity added as ctor param.
(Uwe Schindler, Sascha Markus)
* SOLR-9413: Fix analysis/kuromoji's CSVUtil.quoteEscape logic, add TestCSVUtil test.
(AppChecker, Christine Poerschke)
* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
Improvements
* LUCENE-7323: Compound file writing now verifies the incoming
sub-files' checksums and segment IDs, to catch hardware issues or
filesystem bugs earlier (Robert Muir, Mike McCandless)
* LUCENE-6766: Index time sorting has graduated from the misc module
to core, is much simpler to use, via
IndexWriter.setIndexSort, and now works with dimensional points.
(Adrien Grand, Mike McCandless)
* LUCENE-5931: Detect when an application tries to reopen an
IndexReader after (illegally) removing the old index and
reindexing (Vitaly Funstein, Robert Muir, Mike McCandless)
* LUCENE-6171: Lucene now passes the StandardOpenOption.CREATE_NEW
option when writing new files so the filesystem enforces our
write-once architecture, possibly catching externally caused
issues sooner (Robert Muir, Mike McCandless)
* LUCENE-7318: StandardAnalyzer has been moved from the analysis
module into core and is now the default analyzer in
IndexWriterConfig (Robert Muir, Mike McCandless)
* LUCENE-7345: RAMDirectory now enforces write-once files as well
(Robert Muir, Mike McCandless)
* LUCENE-7337: MatchNoDocsQuery now scores with 0 normalization factor
and empty boolean queries now rewrite to MatchNoDocsQuery instead of
vice versa (Jim Ferenczi via Mike McCandless)
* LUCENE-7359: Add equals() and hashCode() to Explanation (Alan Woodward)
* LUCENE-7353: ScandinavianFoldingFilterFactory and
ScandinavianNormalizationFilterFactory now implement MultiTermAwareComponent.
(Adrien Grand)
* LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
control whether to split on whitespace prior to text analysis. Default
behavior remains unchanged: split-on-whitespace=true. (Steve Rowe)
* LUCENE-7276: MatchNoDocsQuery now includes an optional reason for
why it was used (Jim Ferenczi via Mike McCandless)
* LUCENE-7355: AnalyzingQueryParser now only applies the subset of the analysis
chain that is about normalization for range/fuzzy/wildcard queries.
(Adrien Grand)
* LUCENE-7376: Add support for ToParentBlockJoinQuery to fast vector highlighter's
FieldQuery. (Martijn van Groningen)
* LUCENE-7385: Improve/fix assert messages in SpanScorer. (David Smiley)
* LUCENE-7393: Add ICUTokenizer option to parse Myanmar text as syllables instead of words,
because the ICU word-breaking algorithm has some issues. This allows for the previous
tokenization used before Lucene 5. (AM, Robert Muir)
* LUCENE-7409: Changed MMapDirectory's unmapping to work more safely, but still with
no guarantees. This uses a store-store barrier and yields the current thread
before unmapping to allow in-flight requests to finish. The new code no longer
uses WeakIdentityMap as it delegates all ByteBuffer reads through a new
ByteBufferGuard wrapper that is shared between all ByteBufferIndexInput clones.
(Robert Muir, Uwe Schindler)
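The write-once enforcement described in LUCENE-6171 above relies on standard JDK behavior: opening a file with StandardOpenOption.CREATE_NEW fails if the file already exists. A minimal sketch using only java.nio.file follows; the class name, method names, and the file name are hypothetical and this is not Lucene's Directory code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class CreateNewSketch {

  // Writes data to a file that must not exist yet.
  static void writeOnce(Path file, byte[] data) throws IOException {
    try (OutputStream out = Files.newOutputStream(
        file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      out.write(data);
    }
  }

  // Returns true if a second write to the same path is rejected.
  static boolean secondWriteRejected() {
    try {
      Path dir = Files.createTempDirectory("write-once");
      Path file = dir.resolve("_0.dat"); // hypothetical file name
      try {
        writeOnce(file, new byte[] {1, 2, 3});
        try {
          writeOnce(file, new byte[] {4});
          return false; // the overwrite was allowed
        } catch (FileAlreadyExistsException expected) {
          return true; // the filesystem enforced write-once
        }
      } finally {
        Files.deleteIfExists(file);
        Files.deleteIfExists(dir);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("second write rejected: " + secondWriteRejected());
  }
}
```

Letting the filesystem reject the second open means an accidental overwrite surfaces as an exception at write time rather than as a corrupt file later.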
Optimizations
* LUCENE-7330, LUCENE-7339: Speed up conjunction queries. (Adrien Grand)
* LUCENE-7356: SearchGroup tweaks. (Christine Poerschke)
* LUCENE-7351: Doc id compression for points. (Adrien Grand)
* LUCENE-7371: Point values are now better compressed using run-length
encoding. (Adrien Grand)
* LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.
(Adrien Grand)
* LUCENE-7396, LUCENE-7399: Faster flush of points.
(Adrien Grand, Mike McCandless)
* LUCENE-7406: Automaton and PrefixQuery tweaks (fewer object (re)allocations).
(Christine Poerschke)
Other
* LUCENE-4787: Fixed some highlighting javadocs. (Michael Dodsworth via Adrien
Grand)
* LUCENE-7334: Update ASM dependency to 5.1. (Uwe Schindler)
* LUCENE-7346: Update forbiddenapis to version 2.2.
(Uwe Schindler)
* LUCENE-7360: Explanation.toHtml() is deprecated. (Alan Woodward)
* LUCENE-7372: Factor out an org.apache.lucene.search.FilterWeight class.
(Christine Poerschke, Adrien Grand, David Smiley)
* LUCENE-7384: Removed ScoringWrapperSpans. And tweaked SpanWeight.buildSimWeight() to
reuse the existing Similarity instead of creating a new one. (David Smiley)
======================= Lucene 6.1.0 =======================
New Features
* LUCENE-7099: Add LatLonDocValuesField.newDistanceSort to the sandbox.
(Robert Muir)
* LUCENE-7140: Add PlanetModel.bisection to spatial3d (Karl Wright via
Mike McCandless)
* LUCENE-7069: Add LatLonPoint.nearest, to find nearest N points to a
provided query point (Mike McCandless)
* LUCENE-7234: Added InetAddressPoint.nextDown/nextUp to easily generate range
queries with excluded bounds. (Adrien Grand)
* LUCENE-7300: The misc module now has a directory wrapper that uses hard-links if
applicable and supported when copying files from another FSDirectory in
Directory#copyFrom. (Simon Willnauer)
API Changes
* LUCENE-7184: Refactor LatLonPoint encoding methods to new GeoEncodingUtils
helper class in core geo package. Also refactors LatLonPointTests to
TestGeoEncodingUtils (Nick Knize)
* LUCENE-7163: refactor GeoRect, Polygon, and GeoUtils tests to geo
package in core (Nick Knize)
* LUCENE-7152: Refactor GeoUtils from lucene-spatial package to
core (Nick Knize)
* LUCENE-7141: Switch OfflineSorter's ByteSequencesReader to
BytesRefIterator (Mike McCandless)
* LUCENE-7150: Spatial3d gets useful APIs to create common shape
queries, matching LatLonPoint. (Karl Wright via Mike McCandless)
* LUCENE-7243: Removed the LeafReaderContext parameter from
QueryCachingPolicy#shouldCache. (Adrien Grand)
Optimizations
* LUCENE-7071: Reduce bytes copying in OfflineSorter, giving ~10%
speedup on merging 2D LatLonPoint values (Mike McCandless)
* LUCENE-7105, LUCENE-7215: Optimize LatLonPoint's newDistanceQuery.
(Robert Muir)
* LUCENE-7097: IntroSorter now recurses to 2 * log_2(count) quicksort
stack depth before switching to heapsort (Adrien Grand, Mike McCandless)
* LUCENE-7115: Speed up FieldCache.CacheEntry toString by setting initial
StringBuilder capacity (Gregory Chanan)
* LUCENE-7147: Improve disjoint check for geo distance query traversal
(Ryan Ernst, Robert Muir, Mike McCandless)
* LUCENE-7153: GeoPointField and LatLonPoint polygon queries now support
multiple polygons and holes, with memory usage independent of
polygon complexity. (Karl Wright, Mike McCandless, Robert Muir)
* LUCENE-7159: Speed up LatLonPoint polygon performance. (Robert Muir, Ryan Ernst)
* LUCENE-7211: Reduce memory & GC for spatial RPT Intersects when the number of
matching docs is small. (Jeff Wartes, David Smiley)
* LUCENE-7235: LRUQueryCache should not take a lock for segments that it will
not cache on anyway. (Adrien Grand)
* LUCENE-7238: Explicitly disable the query cache in MemoryIndex#createSearcher.
(Adrien Grand)
* LUCENE-7237: LRUQueryCache now prefers returning an uncached Scorer over
waiting on a lock. (Adrien Grand)
* LUCENE-7261, LUCENE-7262, LUCENE-7264, LUCENE-7258: Speed up DocIdSetBuilder
(which is used by TermsQuery, multi-term queries and several point queries).
(Adrien Grand, Jeff Wartes, David Smiley)
* LUCENE-7299: Speed up BytesRefHash.sort() using radix sort. (Adrien Grand)
* LUCENE-7306: Speed up points indexing and merging using radix sort.
(Adrien Grand)
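To illustrate the depth limit mentioned in LUCENE-7097 above, here is a
minimal introsort sketch. It is not Lucene's IntroSorter: the class name, the
Lomuto partition, and the middle-element pivot are assumptions chosen for
brevity. It only shows the idea of giving quicksort a recursion budget of
2 * floor(log_2(count)) and falling back to heapsort once that budget is spent.

```java
import java.util.Arrays;

/** Illustrative introsort; a sketch, not Lucene's IntroSorter. */
final class IntroSortSketch {

  /** Sorts the array in place. */
  static void sort(int[] a) {
    if (a.length < 2) {
      return;
    }
    // Recursion budget of 2 * floor(log2(count)).
    int log2 = 31 - Integer.numberOfLeadingZeros(a.length);
    quicksort(a, 0, a.length, 2 * log2);
  }

  // Sorts a[from, to); switches to heapsort when the budget is exhausted.
  private static void quicksort(int[] a, int from, int to, int budget) {
    if (to - from < 2) {
      return;
    }
    if (budget == 0) {
      heapsort(a, from, to);
      return;
    }
    int p = partition(a, from, to);
    quicksort(a, from, p, budget - 1);
    quicksort(a, p + 1, to, budget - 1);
  }

  // Lomuto partition around the middle element; returns the pivot's final index.
  private static int partition(int[] a, int from, int to) {
    swap(a, (from + to) >>> 1, to - 1);
    int pivot = a[to - 1];
    int store = from;
    for (int i = from; i < to - 1; i++) {
      if (a[i] < pivot) {
        swap(a, i, store++);
      }
    }
    swap(a, store, to - 1);
    return store;
  }

  // In-place heapsort of a[from, to) using a max-heap.
  private static void heapsort(int[] a, int from, int to) {
    int n = to - from;
    for (int i = n / 2 - 1; i >= 0; i--) {
      siftDown(a, from, i, n);
    }
    for (int end = n - 1; end > 0; end--) {
      swap(a, from, from + end);
      siftDown(a, from, 0, end);
    }
  }

  // Restores the max-heap property below heap index i (heap size n, array offset base).
  private static void siftDown(int[] a, int base, int i, int n) {
    while (true) {
      int child = 2 * i + 1;
      if (child >= n) {
        return;
      }
      if (child + 1 < n && a[base + child + 1] > a[base + child]) {
        child++;
      }
      if (a[base + i] >= a[base + child]) {
        return;
      }
      swap(a, base + i, base + child);
      i = child;
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] data = {5, 3, 9, 1, 5, 7, 0, 2};
    sort(data);
    System.out.println(Arrays.toString(data));
  }
}
```

With this partition scheme, an input of all-equal values degenerates into
one-sided splits, so the budget runs out quickly and heapsort takes over. That
is the case the depth limit is meant to bound.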
Bug Fixes
* LUCENE-7127: Fix corner case bugs in GeoPointDistanceQuery. (Robert Muir)
* LUCENE-7166: Fix corner case bugs in LatLonPoint/GeoPointField bounding box
queries. (Robert Muir)
* LUCENE-7168: Switch to stable encode for geo3d, remove quantization
test leniency, remove dead code (Mike McCandless)
* LUCENE-7301: Multiple doc values updates to the same document within
one update batch could be applied in the wrong order resulting in
the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
* LUCENE-7312: Fix geo3d's x/y/z double to int encoding to ensure it always
rounds down (Karl Wright, Mike McCandless)
* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
where ranges of documents had only a single clause matching while
other ranges had more than one clause matching (Ahmet Arslan,
hossman, Mike McCandless)
* LUCENE-7286: Added support for highlighting SynonymQuery. (Adrien Grand)
* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
dateline and indexed non-point shapes are much bigger than the heatmap region.
(David Smiley)
* LUCENE-7333: Fix test bug where randomSimpleString() generated a filename
that is a reserved device name on Windows. (Uwe Schindler, Mike McCandless)
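The rounding issue named in LUCENE-7312 above can be shown with a small
sketch. The step size and method names below are hypothetical and are not
geo3d's actual encoding; the point is only that a plain (int) cast truncates
toward zero, so negative values round up, while Math.floor always rounds down.

```java
/** Illustrative quantization; a sketch, not geo3d's encoding. */
final class FloorEncodingSketch {

  // Hypothetical quantization step, chosen only for this example.
  static final double STEP = 0.25;

  /** Truncates toward zero, so negative inputs are rounded up. */
  static int encodeTruncate(double x) {
    return (int) (x / STEP);
  }

  /** Always rounds down, for positive and negative inputs alike. */
  static int encodeFloor(double x) {
    return (int) Math.floor(x / STEP);
  }

  /** Returns the lower edge of the cell for an encoded value. */
  static double decode(int encoded) {
    return encoded * STEP;
  }

  public static void main(String[] args) {
    double x = -0.3;
    System.out.println(encodeTruncate(x) + " vs " + encodeFloor(x));
  }
}
```

With the floor variant, decode(encodeFloor(x)) never exceeds x, which is the
kind of invariant a consistent round-down encoding provides.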
Other
* LUCENE-7295: TermAutomatonQuery.hashCode no longer calculates Automaton.toDot().hash;
the equivalence relationship has been replaced with object identity. (Dawid Weiss)
* LUCENE-7277: Make Query.hashCode and Query.equals abstract. (Paul Elschot,
Dawid Weiss)
* LUCENE-7174: Upgrade randomizedtesting to 2.3.4. (Uwe Schindler, Dawid Weiss)
* LUCENE-7205: Remove repeated nl.getLength() calls in
(Boolean|DisjunctionMax|FuzzyLikeThis)QueryBuilder. (Christine Poerschke)
* LUCENE-7210: Make TestCore*Parser's analyzer choice override-able
(Christine Poerschke, Daniel Collins)
* LUCENE-7263: Make queryparser/xml/CoreParser's SpanQueryBuilderFactory
accessible to deriving classes. (Daniel Collins via Christine Poerschke)
* SOLR-9109/SOLR-9121: Allow specification of a custom Ivy settings file via system
property "ivysettings.xml". (Misha Dmitriev, Christine Poerschke, Uwe Schindler, Steve Rowe)
* LUCENE-7206: Improve the ToParentBlockJoinQuery's explain by including the explain
of the best matching child doc. (Ilya Kasnacheev, Jeff Evans via Martijn van Groningen)
* LUCENE-7307: Add getters to the PointInSetQuery and PointRangeQuery queries.
(Martijn van Groningen, Adrien Grand)
Build
* LUCENE-7292: Use '-release' instead of '-source/-target' during
compilation on Java 9+ to ensure real cross-compilation.
(Uwe Schindler)
* LUCENE-7296: Update forbiddenapis to version 2.1.
(Uwe Schindler)
======================= Lucene 6.0.1 =======================
New Features
* LUCENE-7278: Spatial-extras DateRangePrefixTree's Calendar is now configurable, to
e.g. clear the Gregorian Change Date. Also, toString(cal) is now identical to
DateTimeFormatter.ISO_INSTANT. (David Smiley)
Bug Fixes
* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
should delegate to the wrapped weight. (Martijn van Groningen)
* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
* LUCENE-7232: Fixed InetAddressPoint.newPrefixQuery, which was generating an
incorrect query when the prefix length was not a multiple of 8. (Adrien Grand)
* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
on some valid inputs (Mike McCandless)
* LUCENE-7188: Remove an incorrect sanity check in NRTCachingDirectory.listAll()
that led to IllegalStateException being thrown when nothing was wrong.
(David Smiley, yonik)
* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
match the underlying queries' (lower|upper)Term optionality logic.
(Kaneshanathan Srivisagan, Christine Poerschke)
* LUCENE-7257: Fixed PointValues#size(IndexReader, String), docCount,
minPackedValue and maxPackedValue to skip leaves that do not have points
rather than raising an IllegalStateException. (Adrien Grand)
* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
Woodward)
* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
phrase queries. (Eva Popenda, Alan Woodward)
* LUCENE-7293: Don't try to highlight GeoPoint queries (Britta Weber,
Nick Knize, Mike McCandless, Uwe Schindler)
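For LUCENE-7232 above, the case that matters is a prefix length that ends in
the middle of a byte. The following sketch shows one way to derive the lower
and upper bounds of a prefix range bit by bit. The class and method names are
hypothetical, and this is not presented as the code used by InetAddressPoint.

```java
import java.util.Arrays;

/** Illustrative prefix-to-range computation; a sketch with hypothetical names. */
final class PrefixRangeSketch {

  /** Returns {lower, upper} for the address bytes and a prefix length in bits. */
  static byte[][] prefixRange(byte[] address, int prefixLength) {
    if (prefixLength < 0 || prefixLength > 8 * address.length) {
      throw new IllegalArgumentException("illegal prefixLength: " + prefixLength);
    }
    byte[] lower = address.clone();
    byte[] upper = address.clone();
    // Clear every bit after the prefix for the lower bound, set it for the upper bound.
    for (int bit = prefixLength; bit < 8 * address.length; bit++) {
      int mask = 1 << (7 - (bit & 7));
      lower[bit >> 3] &= ~mask;
      upper[bit >> 3] |= mask;
    }
    return new byte[][] {lower, upper};
  }

  public static void main(String[] args) {
    byte[] address = {(byte) 192, (byte) 168, 77, (byte) 200};
    byte[][] range = prefixRange(address, 20);
    System.out.println(Arrays.toString(range[0]) + " .. " + Arrays.toString(range[1]));
  }
}
```

Working per bit rather than per byte means a /20 prefix is handled the same
way as a /16 or /24 one, with no special case for partial bytes.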
Documentation
* LUCENE-7223: Improve XXXPoint javadocs to make it clear that you
should separately add StoredField if you want to retrieve these
field values at search time (Greg Huber, Robert Muir, Mike McCandless)
======================= Lucene 6.0.0 =======================
System Requirements
* LUCENE-5950: Move to Java 8 as minimum Java version.
(Ryan Ernst, Uwe Schindler)
* LUCENE-6069: Lucene Core now gets compiled with Java 8 "compact1" profile,
all other modules with "compact2". (Robert Muir, Uwe Schindler)
New Features
* LUCENE-6631: Lucene Document classification (Tommaso Teofili, Alessandro Benedetti)
* LUCENE-6747: FingerprintFilter is a TokenFilter that outputs a single
token which is a concatenation of the sorted and de-duplicated set of
input tokens. Useful for normalizing short text in clustering/linking
tasks. (Mark Harwood, Adrien Grand)
* LUCENE-5735: NumberRangePrefixTreeStrategy now includes interval/range faceting
for counting ranges that align with the underlying terms as defined by the
NumberRangePrefixTree (e.g. familiar date units like days). (David Smiley)
* LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field
length computations, to avoid skew from documents that don't have the field.
(Ahmet Arslan via Robert Muir)
* LUCENE-6758: Use docCount+1 for DefaultSimilarity's IDF, so that queries
containing nonexistent fields won't screw up querynorm. (Terry Smith, Robert Muir)
* SOLR-7876: The QueryTimeout interface now has an isTimeoutEnabled method
that can return false to exit from ExitableDirectoryReader wrapping at
the point fields() is called. (yonik)
* LUCENE-6825: Add low-level support for block-KD trees (Mike McCandless)
* LUCENE-6852, LUCENE-6975: Add support for points (dimensionally
indexed values) to index, document and codec APIs, including a
simple text implementation. (Mike McCandless)
* LUCENE-6861: Create Lucene60Codec, supporting points.
(Mike McCandless)
* LUCENE-6879: Allow defining custom CharTokenizer instances without
subclassing using Java 8 lambdas or method references. (Uwe Schindler)
* LUCENE-6881: Cutover all BKD implementations to points
(Mike McCandless)
* LUCENE-6837: Add N-best output support to JapaneseTokenizer.
(Hiroharu Konno via Christian Moen)
* LUCENE-6962: Add per-dimension min/max to points
(Mike McCandless)
* LUCENE-6975: Add ExactPointQuery, to match a single N-dimensional
point (Robert Muir, Mike McCandless)
* LUCENE-6989: Add preliminary support for MMapDirectory unmapping in Java 9.
(Uwe Schindler, Chris Hegarty, Peter Levart)
* LUCENE-7040: Upgrade morfologik-stemming to version 2.1.0.
(Dawid Weiss)
* LUCENE-7048: Add XXXPoint.newSetQuery, to create a query that
efficiently matches all documents containing any of the specified
point values. This is the analog of TermsQuery, but for points
instead. (Adrien Grand, Robert Muir, Mike McCandless)
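The fingerprint described in LUCENE-6747 above (a concatenation of the sorted,
de-duplicated input tokens) can be sketched on plain strings. This is not the
FingerprintFilter implementation, which operates on a TokenStream; the class
name and the single-space separator are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Illustrative fingerprint over a token list; a sketch, not FingerprintFilter. */
final class FingerprintSketch {

  /** Joins the sorted, de-duplicated tokens with a single space (assumed separator). */
  static String fingerprint(List<String> tokens) {
    // TreeSet removes duplicates and iterates in natural String order.
    return String.join(" ", new TreeSet<>(tokens));
  }

  public static void main(String[] args) {
    System.out.println(fingerprint(Arrays.asList("the", "quick", "the", "brown", "fox")));
  }
}
```

Because the output ignores token order and repetition, two short texts with
the same vocabulary produce the same fingerprint, which is what makes it
useful for the clustering and linking tasks the entry mentions.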
API Changes
* LUCENE-7094: BBoxStrategy and PointVectorStrategy now support
PointValues (in addition to legacy numeric trie). Their APIs
were changed a little and also made more consistent. PointValues/Trie
is optional, DocValues is optional, stored value is optional.
(Nick Knize, David Smiley)
* LUCENE-6067: Accountable.getChildResources has a default
implementation returning the empty list. (Robert Muir)
* LUCENE-6583: FilteredQuery has been removed. Instead, you can construct a
BooleanQuery with one MUST clause for the query, and one FILTER clause for
the filter. (Adrien Grand)
* LUCENE-6651: AttributeImpl#reflectWith(AttributeReflector) was made
abstract and has no reflection-based default implementation anymore.
(Uwe Schindler)
* LUCENE-6706: PayloadTermQuery and PayloadNearQuery have been removed.
Instead, use PayloadScoreQuery to wrap any SpanQuery. (Alan Woodward)
* LUCENE-6829: OfflineSorter, and the classes that use it (suggesters,
hunspell) now do all temporary file IO via Directory instead of
directly through java's temp dir. Directory.createTempOutput
creates a uniquely named IndexOutput, and the new
IndexOutput.getName returns its name (Dawid Weiss, Robert Muir, Mike
McCandless)
* LUCENE-6917: Deprecate and rename NumericXXX classes to
LegacyNumericXXX in favor of points (Mike McCandless)
* LUCENE-6947: SortField.missingValue is now protected. You can read its
value using the new SortField.getMissingValue getter. (Adrien Grand)
* LUCENE-7028: Remove duplicate method in LegacyNumericUtils.
(Uwe Schindler)
* LUCENE-7052, LUCENE-7053: Remove custom comparators from BytesRef
class and solely use natural byte[] comparator throughout codebase.
This also simplifies the API of BytesRefHash and replaces the natural
comparator in ArrayUtil with Java 8's Comparator#naturalOrder().
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-7060: Update Spatial4j to 0.6. The package com.spatial4j.core
is now org.locationtech.spatial4j. (David Smiley)
* LUCENE-7058: Add getters to various Query implementations (Guillaume Smet via
Alan Woodward)
* LUCENE-7064: MultiPhraseQuery is now immutable and should be constructed
with MultiPhraseQuery.Builder. (Luc Vanlerberghe via Adrien Grand)
* LUCENE-7072: Geo3DPoint always uses WGS84 planet model.
(Robert Muir, Mike McCandless)
* LUCENE-7056: Geo3D classes are in different packages now. (David Smiley)
* LUCENE-6952: These classes are now abstract: FilterCodecReader, FilterLeafReader,
FilterCollector, FilterDirectory. And some Filter* classes in
lucene-test-framework too. (David Smiley)
* SOLR-8867: FunctionValues.getRangeScorer now takes a LeafReaderContext instead
of an IndexReader, and avoids matching documents without a value in the field
for numeric fields. (yonik)
Optimizations
* LUCENE-6891: Use prefix coding when writing points in
each leaf block in the default codec, to reduce the index
size (Mike McCandless)
* LUCENE-6901: Optimize points indexing: use faster
IntroSorter instead of InPlaceMergeSorter, and specialize 1D
merging to merge sort the already sorted segments instead of
re-indexing (Mike McCandless)
* LUCENE-6793: LegacyNumericRangeQuery.hashCode() is now less subject to hash
collisions. (J.B. Langston via Adrien Grand)
* LUCENE-7050: TermsQuery is now cached more aggressively by the default
query caching policy. (Adrien Grand)
* LUCENE-7066: PointRangeQuery got optimized for the case that all documents
have a value and all points from the segment match. (Adrien Grand)
Changes in Runtime Behavior
* LUCENE-6789: IndexSearcher's default Similarity is changed to BM25Similarity.
Use ClassicSimilarity to get the old vector space DefaultSimilarity. (Robert Muir)
* LUCENE-6886: Reserve the .tmp file name extension for temp files,
and codec components are no longer allowed to use this extension
(Robert Muir, Mike McCandless)
* LUCENE-6835: Directory.listAll now returns entries in sorted order,
to not leak platform-specific behavior, and "retrying file deletion"
is now the responsibility of Directory.deleteFile, not the caller.
(Robert Muir, Mike McCandless)
Tests
* LUCENE-7009: Add expectThrows utility to LuceneTestCase. This uses a lambda
expression to encapsulate a statement that is expected to throw an exception.
(Ryan Ernst)
Bug Fixes
* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
explain would also indicate that non-matching documents match.
In addition, with score mode average, the explain would fail with an NPE.
(Martijn van Groningen)
* LUCENE-7101: OfflineSorter had O(N^2) merge cost, and used too many
temporary file descriptors, for large sorts (Mike McCandless)
* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
implementation (Karl Wright via Mike McCandless)
* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
on queries that could not be extracted. (Adrien Grand)
* LUCENE-7126: Remove GeoPointDistanceRangeQuery. This query was implemented
with boolean NOT, and incorrect for multi-valued documents. (Robert Muir)
* LUCENE-7158: Consistently use earth's WGS84 mean radius wherever our
geo search implementations approximate the earth as a sphere (Karl
Wright via Mike McCandless)
Other
* LUCENE-7035: Upgrade icu4j to 56.1/unicode 8. (Robert Muir)
* LUCENE-7087: Let MemoryIndex#fromDocument(...) accept 'Iterable<? extends IndexableField>'
as document instead of 'Document'. (Martijn van Groningen)
* LUCENE-7091: Add doc values support to MemoryIndex
(Martijn van Groningen, David Smiley)
* LUCENE-7093: Add point values support to MemoryIndex
(Martijn van Groningen, Mike McCandless)
* LUCENE-7095: Add point values support to the numeric field query time join.
(Martijn van Groningen, Mike McCandless)
======================= Lucene 5.5.5 =======================
Changes in Runtime Behavior
* Resolving of external entities in queryparser/xml/CoreParser is disallowed
by default. See SOLR-11477 for details.
Bug Fixes
* LUCENE-7419: Fix performance bug with TokenStream.end(), where it would look up
PositionIncrementAttribute every time. (Mike McCandless, Robert Muir)
* SOLR-11477: Disallow resolving of external entities in queryparser/xml/CoreParser
by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
======================= Lucene 5.5.4 =======================
Bug Fixes
* LUCENE-7417: The standard Highlighter could throw an IllegalArgumentException when
trying to highlight a query containing a degenerate case of a MultiPhraseQuery with one
term. (Thomas Kappler via David Smiley)
* LUCENE-7657: Fixed potential memory leak in the case that a (Span)TermQuery
with a TermContext is cached. (Adrien Grand)
* LUCENE-7647: Made stored fields reclaim native memory more aggressively when
configured with BEST_COMPRESSION. This could otherwise result in out-of-memory
issues. (Adrien Grand)
* LUCENE-7562: CompletionFieldsConsumer sometimes throws
NullPointerException on ghost fields (Oliver Eilhard via Mike McCandless)
* LUCENE-7547: JapaneseTokenizerFactory was failing to close the
dictionary file it opened (Markus via Mike McCandless)
* LUCENE-6914: Fixed DecimalDigitFilter in case of supplementary code points.
(Hossman)
* LUCENE-7440: Document id skipping (PostingsEnum.advance) could throw an
ArrayIndexOutOfBoundsException on large index segments (>1.8B docs)
with large skips. (yonik)
* LUCENE-7570: IndexWriter may deadlock if a commit is running while
there are too many merges running and one of the merges hits a
tragic exception (Joey Echeverria via Mike McCandless)
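LUCENE-6914 above concerns decimal digits outside the Basic Multilingual
Plane. As a sketch of the general technique (not the DecimalDigitFilter code
itself), the following folds Unicode decimal digits to ASCII while iterating
by code point rather than by char, so surrogate pairs are handled as one
unit. The class and method names are hypothetical.

```java
/** Illustrative digit folding by code point; a sketch, not DecimalDigitFilter. */
final class DecimalDigitSketch {

  /** Replaces every non-ASCII Unicode decimal digit with its ASCII equivalent. */
  static String foldDigits(String input) {
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); ) {
      // Read a full code point, which may span two chars (a surrogate pair).
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      if (cp > 0x7F && Character.isDigit(cp)) {
        out.append((char) ('0' + Character.digit(cp, 10)));
      } else {
        out.appendCodePoint(cp);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // U+1D7CE MATHEMATICAL BOLD DIGIT ZERO, written as a surrogate pair.
    System.out.println(foldDigits("a\uD835\uDFCEb"));
  }
}
```

Iterating with charAt instead would see each half of a surrogate pair
separately, and neither half is a digit on its own.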
Other
* LUCENE-6989: Backport MMapDirectory's unmapping code from Lucene 6.4 to use
MethodHandles. This allows it to work with Java 9 (EA build 150 and later).
(Uwe Schindler)
Build
* LUCENE-7543: Make changes-to-html target an offline operation, by moving the
Lucene and Solr DOAP RDF files into the Git source repository under
dev-tools/doap/ and then pulling release dates from those files, rather than
from JIRA. (Mano Kovacs, hossman, Steve Rowe)
* LUCENE-7596: Update Groovy to version 2.4.8 to allow building with Java 9
build 148+. Also update JGit version for working-copy checks. This does not
fix all issues with Java 9, but allows building the distribution.
(Uwe Schindler)
* LUCENE-7651: Backport (Lucene 6.4.1) fix for Java 8u121 to allow documentation
build to inject "Google Code Prettify" without adding JavaScript to Javadoc's
-bottom parameter. Unfortunately, this fix disables Prettify if Javadocs are
built with Java 7, as there is no generic way in Java 7 to inject JavaScript
without breaking Java 8 (and possibly paid Java 7 security updates). This
fix also updates Prettify to latest version to work around a Google Chrome
issue. (Uwe Schindler)
======================= Lucene 5.5.3 =======================
(No Changes)
======================= Lucene 5.5.2 =======================
Bug Fixes
* LUCENE-7065: Fix the explain for the global ordinals join query. Previously, the
explain would also indicate that non-matching documents match.
In addition, with score mode average, the explain would fail with an NPE.
(Martijn van Groningen)
* LUCENE-7111: DocValuesRangeQuery.newLongRange behaves incorrectly for
Long.MAX_VALUE and Long.MIN_VALUE (Ishan Chattopadhyaya via Steve Rowe)
* LUCENE-7139: Fix bugs in geo3d's Vincenty surface distance
implementation (Karl Wright via Mike McCandless)
* LUCENE-7187: Block join queries' Weight#extractTerms(...) implementations
should delegate to the wrapped weight. (Martijn van Groningen)
* LUCENE-7279: JapaneseTokenizer throws ArrayIndexOutOfBoundsException
on some valid inputs (Mike McCandless)
* LUCENE-7219: Make queryparser/xml (Point|LegacyNumeric)RangeQuery builders
match the underlying queries' (lower|upper)Term optionality logic.
(Kaneshanathan Srivisagan, Christine Poerschke)
* LUCENE-7284: GapSpans needs to implement positionsCost(). (Daniel Bigham, Alan
Woodward)
* LUCENE-7231: WeightedSpanTermExtractor didn't deal correctly with single-term
phrase queries. (Eva Popenda, Alan Woodward)
* LUCENE-7301: Multiple doc values updates to the same document within
one update batch could be applied in the wrong order resulting in
the wrong updated value (Ishan Chattopadhyaya, hossman, Mike McCandless)
* LUCENE-7132: BooleanQuery sometimes assigned too-low scores in cases
where ranges of documents had only a single clause matching while
other ranges had more than one clause matching (Ahmet Arslan,
hossman, Mike McCandless)
* LUCENE-7291: Spatial heatmap faceting could mis-count when the heatmap crosses the
dateline and indexed non-point shapes are much bigger than the heatmap region.
(David Smiley)
======================= Lucene 5.5.1 =======================
Bug Fixes
* LUCENE-7112: WeightedSpanTermExtractor.extractUnknownQuery is only called
on queries that could not be extracted. (Adrien Grand)
* LUCENE-7188: Remove an incorrect sanity check in NRTCachingDirectory.listAll()
that led to IllegalStateException being thrown when nothing was wrong.
(David Smiley, yonik)
* LUCENE-7209: Fixed explanations of FunctionScoreQuery. (Adrien Grand)
======================= Lucene 5.5.0 =======================
New Features
* LUCENE-5868: JoinUtil.createJoinQuery(..,NumericType,..) query-time join
for LONG and INT fields with NUMERIC and SORTED_NUMERIC doc values.
(Alexey Zelin via Mikhail Khludnev)
* LUCENE-6939: Add exponential reciprocal scoring to
BlendedInfixSuggester, to even more strongly favor suggestions that
match closer to the beginning (Arcadius Ahouansou via Mike McCandless)
* LUCENE-6958: Improved CustomAnalyzer to take class references to factories
as alternative to their SPI name. This enables compile-time safety when
defining analyzer's components. (Uwe Schindler, Shai Erera)
* LUCENE-6818, LUCENE-6986: Add DFISimilarity implementing the divergence
from independence model. (Ahmet Arslan via Robert Muir)
* SOLR-4619: Added removeAllAttributes() to AttributeSource, which removes
all previously added attributes.
* LUCENE-7010: Added MergePolicyWrapper to allow easy wrapping of other policies.
(Shai Erera)
API Changes
* LUCENE-6997: Refactor sandboxed GeoPointField and query classes to lucene-spatial
module under new lucene.spatial.geopoint package (Nick Knize)
* LUCENE-6908: GeoUtils static relational methods have been refactored to new
GeoRelationUtils and now correctly handle large irregular rectangles, and
pole crossing distance queries. (Nick Knize)
* LUCENE-6900: Grouping sortWithinGroup variables used to allow null to mean
Sort.RELEVANCE. Null is no longer permitted. (David Smiley)
* LUCENE-6919: The Scorer class has been refactored to expose an iterator
instead of extending DocIdSetIterator. asTwoPhaseIterator() has been renamed
to twoPhaseIterator() for consistency. (Adrien Grand)
* LUCENE-6973: TeeSinkTokenFilter no longer accepts a SinkFilter (the latter