v4.8.0-beta00012

Pre-release
@NightOwl888 released this 19 Sep 08:22 · 1120 commits to master since this release

This release contains important bug fixes and performance enhancements.

Known Issues

  • Upgrading from Lucene.Net 4.8.0-beta00009 or higher may require restarting all instances of Visual Studio after installation so that the code analysis analyzer is reloaded.

  • The lucene-cli tool requires an appsettings.json file, but none was shipped. Upon running lucene on the command line, the following error will be presented:

    F:\Projects\lucenenet>lucene
    Unhandled exception. System.IO.FileNotFoundException: The configuration file 'appsettings.json' was not found and is not optional. The physical path is 'C:\Users\shad\.dotnet\tools\.store\lucene-cli\4.8.0-beta00010\lucene-cli\4.8.0-beta00010\tools\netcoreapp3.1\any\appsettings.json'.
    at Microsoft.Extensions.Configuration.FileConfigurationProvider.HandleException(ExceptionDispatchInfo info)
    at Microsoft.Extensions.Configuration.FileConfigurationProvider.Load(Boolean reload)
    at Microsoft.Extensions.Configuration.FileConfigurationProvider.Load()
    at Microsoft.Extensions.Configuration.ConfigurationRoot..ctor(IList`1 providers)
    at Microsoft.Extensions.Configuration.ConfigurationBuilder.Build()
    at Lucene.Net.Cli.Program.Main(String[] args) in D:\a\1\s\src\dotnet\tools\lucene-cli\Program.cs:line 27
    

    Adding a text file named appsettings.json that contains only opening and closing curly brackets to the location specified in the error message will prevent the exception.

    appsettings.json

    {
    }
    

    IMPORTANT: There must be at least opening and closing curly brackets in the file, or it won't be parsed as valid JSON.

  • J2N versions prior to version 2.0.0-beta-0012 had an infinite recursion bug on Xamarin.Android which caused fatal crashes in Lucene.NET. Upgrading J2N to 2.0.0-beta-0012 or higher will prevent these crashes from occurring.

Change Log

Breaking Changes

  • Lucene.Net.Facet: Renamed LRUHashMap > LruDictionary. Changed all members to be virtual to allow users to provide their own LRU cache.
  • Lucene.Net.Facet.FacetsConfig: Removed ProcessSSDVFacetFields from public API (as was done in Lucene). Avoid lock (this).
  • Lucene.Net.Facet.TaxonomyReader: Changed DoClose() to Dispose(bool) and implemented proper dispose pattern. Avoid lock (this).
  • Lucene.Net.Facet.WriterCache: Renamed NameInt32CacheLRU > NameInt32CacheLru, NameHashInt32CacheLRU > NameHashInt32CacheLru. Refactored to utilize a generic type internally using composition to avoid boxing/unboxing without exposing the generic closing type publicly. Added public INameInt32CacheLru as a common interface between NameInt32CacheLru and NameHashInt32CacheLru.
  • Lucene.Net.Facet.Taxonomy.TaxonomyReader: Restructured ChildrenIterator into ChildrenEnumerator
  • Lucene.Net.Facet.Taxonomy.CategoryPath: Changed FullPathLength from a method to a property
  • Lucene.Net.Facet.DrillSideways: Changed ScoreSubDocsAtOnce from a method to a property
  • Lucene.Net.Facet: Refactored OrdAndValue into a generic struct that can be used in both TopOrdAndSingleQueue and TopOrdAndInt32Queue. Added Insert method to Util.PriorityQueue<T> to allow adding value types without reading the previous value for reuse.
  • Lucene.Net.Analysis.Common.Miscellaneous.CapitalizationFilter: Changed default behavior to use invariant culture instead of the current thread's culture to match Lucene, which seems more natural when using filters inside of analyzers. This also fits more in line with how other filters are selected.
  • #279 - Lucene.Net.Analysis.Compound.Hyphenation.TernaryTree: Renamed Iterator > Enumerator, Keys() > GetEnumerator()
  • #279 - Lucene.Net.Benchmarks.ByTask.Feeds.DirContentSource: Renamed Iterator > Enumerator
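
Several of the breaking changes above move types to the standard .NET dispose pattern and away from lock (this). The following is a minimal sketch of that general pattern, not the actual Lucene.NET code. The class name SampleTaxonomyResource and its members are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical resource type for illustration; not part of Lucene.NET.
public class SampleTaxonomyResource : IDisposable
{
    // A private lock object is used instead of lock (this), so outside code
    // cannot take the same lock and cause contention or deadlocks.
    private readonly object syncLock = new object();
    private bool disposed;

    public bool IsDisposed
    {
        get { lock (syncLock) { return disposed; } }
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Subclasses override Dispose(bool) rather than a custom DoClose() method.
    protected virtual void Dispose(bool disposing)
    {
        lock (syncLock)
        {
            if (disposed) return; // calling Dispose() more than once is safe

            if (disposing)
            {
                // Release managed resources here.
            }

            disposed = true;
        }
    }
}
```

Code that previously overrode a DoClose()-style method would, under this pattern, override Dispose(bool) and call base.Dispose(disposing).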

Bugs

  • #269 - Removed cast from NGramTokenAnymousInnerClassHelper::IsTokenChar(int) that was causing surrogate pairs to fail in the TestUTF8FullRange() tests of NGramTokenizerTest and EdgeNGramTokenizerTest
  • Fixed potential issue with ArgumentExceptions being thrown from char.ConvertToUtf32(string, int) by reverting to the CodePointAt() method in TestCharTokenizers.TestCrossPlaneNormalization().
  • Lucene.Net.QueryParser.Surround.Query.ComposedQuery::MakeLuceneSubQueriesField(): Added missing using block on enumerator
  • #296 - Fixed surrogate pair and culture-sensitivity issues with many analyzers.
  • Lucene.Net.Analysis.Common: Fixed classes that were originally using invariant culture to do so again. J2N's Character class default is to use the current culture, which had changed from the prior Character class from Lucene.Net.Support that used invariant culture. Fixes TestICUFoldingFilter::TestRandomStrings().
  • Lucene.Net.ICU: Fixed ThaiWordBreaker to account for surrogate pairs. Also added locking to help with thread safety. Note that the class is still not completely thread-safe, but this patch fixes the behavior.
  • Lucene.Net.Spatial.Util.ShapeFieldCache: Removed unnecessary array allocation
  • Lucene.Net.TestFramework: Fixed LineFileDocs to read byte by byte the same way that Lucene does, except using a BufferedStream to improve performance.
  • Lucene.Net.TestFramework: Fixed NightlyAttribute, WeeklyAttribute, AwaitsFixAttribute, and SlowAttribute so they work at the class level
  • Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer: Corrected call to ICU4N.UChar.IsWhiteSpace() rather than System.Char.IsWhiteSpace(), which may return different results.
  • Lucene.Net.TestFramework.Search.SearchEquivalenceTestBase: Fixed exception when using OpenBitSet.FastGet() instead of OpenBitSet.Get(), since the size of the bit set is unknown.
  • Lucene.Net.Index.DocumentsWriterFlushControl: Fixed an issue caused by misbehaving locking with Monitor.TryEnter(). The code was restructured so that no thread without a lock can enter InternalTryCheckoutForFlush(), so the threads do not compete for a lock.
  • #274 - Lucene.Net.Facet: Fixed null reference exception in DrillSidewaysScorer from patch in Lucene 4.10.4 https://issues.apache.org/jira/browse/LUCENE-6001
  • Lucene.Net.Facet.Taxonomy.WriterCache.Cl2oTaxonomyWriterCache: Fixed locking on Dispose() method and made it safe to call dispose multiple times
  • Reviewed and added asserts that existed in Lucene and were missing in Lucene.NET. Effectively, this meant we were missing several test conditions that have now been put into place.
  • Lucene.Net.ICU: Added locking to ICUTokenizer to only allow a single thread to manipulate the BreakIterator at a time. This is a temporary fix to get the tests to pass until a solution is found for making BreakIterator threadsafe.
  • #332 - Lucene.Net.Replicator: Fixed an issue in IndexInputStream where the read method could return a number larger than the requested read count or the capacity of the buffer. It now returns the total number of bytes read into the buffer, which logically cannot exceed the size of the buffer itself.
  • Lucene.Net.Tests.Index.TestIndexWithThreads::TestRollbackAndCommitWithThreads(): Must catch and ignore AssertionException, as was done in Lucene
  • Lucene.Net.Search.TopScoreDocCollector: Disabled optimizations on .NET Framework because of float comparison failures on x86 in Release mode. Fixes TestSearchAfter::TestQueries(), TestTopDocsMerge::TestSort_1(), TestTopDocsMerge::TestSort_2().
  • Lucene.Net.Sandbox.Queries.SlowFuzzyTermsEnum: Disabled optimizations on .NET Framework because of float comparison failures on x86 in Release mode. Fixes TestTokenLengthOpt().
  • Lucene.Net.Search.FuzzyTermsEnum: Disabled optimizations for Accept() method on .NET Framework because of float comparison failures on x86 in Release mode. Fixes TestTokenLengthOpt().
  • Fixed several references to J2N.BitConversion that were calling the overload that normalizes NaN when they should have been calling the raw bit conversion instead (as was done in Lucene).
  • #323 - Lucene.Net.Configuration: Removed the IConfigurationRoot interface from the ConfigurationRoot class when targeting a version of Microsoft.Extensions.Configuration less than 2.0. This will allow the end user to upgrade Microsoft.Extensions.Configuration seamlessly to versions 2.0 or higher.
  • #286 - Lucene.Net.CodeAnalysis: Separated CSharp and VisualBasic into different assemblies to prevent cross-language dependency issues when using analyzers
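
The IndexInputStream fix (#332) above concerns the general contract of Stream.Read: the return value is the number of bytes actually placed in the buffer, which can never exceed the requested count. The sketch below illustrates that contract with a hypothetical BoundedReadStream class over an in-memory byte array. It is not the Lucene.Net.Replicator implementation.

```csharp
using System;
using System.IO;

// Hypothetical read-only stream for illustration; not Lucene.NET's IndexInputStream.
public sealed class BoundedReadStream : Stream
{
    private readonly byte[] source;
    private int position;

    public BoundedReadStream(byte[] source)
    {
        this.source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || count < 0 || count > buffer.Length - offset)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Never copy more than the caller asked for or more than is left.
        int remaining = source.Length - position;
        int toCopy = Math.Min(count, remaining);

        Array.Copy(source, position, buffer, offset, toCopy);
        position += toCopy;

        // Return the bytes actually read; 0 signals end of stream.
        return toCopy;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => source.Length;

    public override long Position
    {
        get => position;
        set => throw new NotSupportedException();
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

A caller that loops until Read returns 0 can rely on each return value being at most the count it passed in.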

Improvements

  • #261 PERFORMANCE - Fixed FSTTester to delete while iterating forward instead of using .ElementAt() to iterate in reverse, which takes about 3x longer
  • #261 PERFORMANCE - Lucene.Net.Facet.Taxonomy.WriterCache.NameInt32CacheLRU: Changed from Dictionary to ConcurrentDictionary so we can delete items from the cache while forward iterating through it.
  • #261 PERFORMANCE - Lucene.Net.Index.FieldInfos: Changed Builder.FieldInfo() method to TryGetFieldInfo() to optimize check for value
  • Upgraded NuGet dependency J2N to 2.0.0-beta-0009
  • Upgraded NuGet dependency ICU4N to 60.1.0-alpha.352
  • Upgraded NuGet dependency Morfologik.Stemming to 2.1.6-beta-0007
  • #261 PERFORMANCE - Use J2N's ICollection<T>.ToArray() extension method that uses ICollection<T>.CopyTo(), which takes precedence over the LINQ IEnumerable<T>.ToArray() extension method. Benchmarks show about a 1/3 increase in performance.
  • #261 PERFORMANCE - Lucene.Net.Support.IO.FileSupport::CreateTempFile(): Optimized the check for invalid characters
  • Directory.Build.props: Disabled warnings for features that require .NET Standard 2.1
  • #261 PERFORMANCE - Eliminated several calls to FirstOrDefault(), LastOrDefault(), Skip(), First(), and Last()
  • Lucene.Net.Support.ListExtensions: Factored out BinarySearch in favor of implementation from J2N
  • Lucene.Net.Suggest.FreeTextSuggester: Converted from SubList().Clear() to RemoveRange()
  • #261 PERFORMANCE - Changed handling of LineFileDocs to unzip the file to a temp directory once per test run instead of using a MemoryStream to pick a random line from the file on each test. This significantly improves performance of many of the tests.
  • Lucene.Net.Analysis.Icu.Segmentation.ScriptIterator: Removed static constructor and initialized static state inline
  • Converted all explicit Analyzer classes to use the Analyzer.NewAnonymous() method to declare Analyzers inline.
  • #261 PERFORMANCE - Lucene.Net.Tests.Util.TestCollectionUtil: Optimized by using array instead of list for sorting tests
  • #261 PERFORMANCE - Lucene.Net.Util: Switched implementation of DisposableThreadLocal with that from RavenDB, with permission from its maintainers (https://issues.apache.org/jira/browse/LUCENENET-640?focusedCommentId=17033146&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17033146). The new implementation improves GC during several operations.
  • Lucene.Net.TestFramework.Util.LuceneTestCase: Removed TaskMergeScheduler completely from random testing
  • #261 PERFORMANCE - Moved scratch BytesRef instances outside of the loops that they were nested in so they can be reused in each iteration (as was done in Lucene)
  • #261 PERFORMANCE - Lucene.Net.Facet.Taxonomy.CachedOrdinalsReader: Refactored locking to make reads more efficient
  • #261 PERFORMANCE - Lucene.Net.Facet.Taxonomy.Directory.DirectoryTaxonomyReader: Refactored to use ReaderWriterLockSlim to make reads more efficient
  • #261 PERFORMANCE - Lucene.Net.Facet.Taxonomy.Directory.DirectoryTaxonomyWriter: Refactored locking for better efficiency
  • #265 - Lucene.Net.Facet.Taxonomy.WriterCache.Cl2oTaxonomyWriterCache: Added proper dispose pattern
  • #265 - Lucene.Net.Facet.Taxonomy.WriterCache.LruTaxonomyWriterCache: Added proper dispose pattern
  • Lucene.Net.Facet.Taxonomy.Directory.TaxonomyIndexArrays: Changed to use LazyInitializer and avoid lock (this)
  • #261 PERFORMANCE - Lucene.Net.Tests.Facet: Convert int to string in the invariant culture
  • Lucene.Net.Analysis.ICU: Updated Segmentation files to Lucene 8.6.1 to account for the latest features of ICU
  • #261 PERFORMANCE - Lucene.Net.Util.AttributeSource: Eliminated unnecessary try catch and made more efficient by using TryGetValue instead of ContainsKey followed by a lookup
  • Lucene.Net.Util: Streamlined DefaultAttributeFactory to make the get/update process of creating an attribute WeakReference atomic
  • #208 - Switch to simpler LIFO thread to ThreadState allocator during indexing. Technically, this is something from releases/lucene-solr/4.8.1, but profiling indicates it makes a huge difference in multithreaded scenarios
  • SWEEP - Removed unnecessary .NET Framework references from all test projects
  • Converted remaining compilation constants from target platforms to features to make it simpler to change targets. Eliminated references to NETSTANDARD.
  • Inverted logic so FEATURE_STACKTRACE is enabled rather than disabled when the System.Diagnostics.StackTrace class is available.
  • Lucene.Net.Constants: Refactored to use System.Runtime.InteropServices.RuntimeInformation on .NET Framework
  • Lucene.Net.Expressions: Eliminated .NET settings file and reused JavascriptCompiler.properties file in .NET Framework so we don't have to branch for different target platforms. Simplified reading the settings by using J2N PropertyExtensions.
  • #261 PERFORMANCE - Lucene.Net.Support.AssemblyUtils: restructured to use IEnumerable<T> for deferred execution
  • #279 - Lucene.Net.Index.Terms/TermsEnum, Lucene.Net.Suggest: Refactored iterators into enumerators. Deprecated the iterators.
  • #279 - Lucene.Net.Util.FilterIterator<T>: Converted to FilterEnumerator<T> using a predicate passed into the constructor rather than having to subclass. Deprecated FilterIterator<T>. Swapped only usage in FieldFilterAtomicReader with a LINQ query/yield return, since performance is better.
  • #279 - Lucene.Net.Util.MergedIterator<T>: Converted to MergedEnumerator<T> and deprecated MergedIterator<T>
  • #279 - Lucene.Net.Codecs.Memory.DirectDocValuesConsumer: Renamed IteratorAnonymousInnerClassHelper > Enumerator, IterableAnonymousInnerClassHelper > EnumerableAnonymousInnerClassHelper
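
Two of the techniques named in the improvements above are using TryGetValue instead of ContainsKey followed by a lookup, and using ReaderWriterLockSlim to make reads cheaper. The sketch below combines them in a hypothetical OrdinalCache class. The class and its members are assumptions for illustration and do not reflect the actual Lucene.NET taxonomy reader code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cache for illustration; not part of Lucene.NET.
public sealed class OrdinalCache : IDisposable
{
    private readonly Dictionary<string, int> ordinals = new Dictionary<string, int>();
    private readonly ReaderWriterLockSlim sync = new ReaderWriterLockSlim();

    // Many readers may hold the read lock at the same time.
    public bool TryGetOrdinal(string path, out int ordinal)
    {
        sync.EnterReadLock();
        try
        {
            // One hash lookup instead of ContainsKey() followed by the indexer.
            return ordinals.TryGetValue(path, out ordinal);
        }
        finally
        {
            sync.ExitReadLock();
        }
    }

    // Writers take the exclusive lock only while mutating the dictionary.
    public void Put(string path, int ordinal)
    {
        sync.EnterWriteLock();
        try
        {
            ordinals[path] = ordinal;
        }
        finally
        {
            sync.ExitWriteLock();
        }
    }

    public void Dispose() => sync.Dispose();
}
```

The read path takes only a shared lock and performs a single dictionary lookup, which is the kind of saving the read-efficiency items above describe.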

New Features

  • Added DeadlockAttribute to identify tests that are known to have threading contention issues and may deadlock during test runs
  • Added ability to turn asserts on in a Release build by using the system property "assert": "true". This is necessary to ensure all of the test conditions are being hit in all builds and to enable more thorough CheckIndex from the command line.
  • Lucene.Net.TestFramework: Added ability to turn off asserts when running tests by ignoring a few tests that require the asserts to be enabled in order to pass. This makes it possible to ensure that Lucene.NET works properly with asserts disabled. This feature didn't exist in Lucene.
  • Lucene.Net.Search.FieldCacheDocIdSet: Added public constructor with predicate parameter for filtering without having to create a subclass