0.8.0

jbaiter released this 01 Jun 10:53

The major improvement in this version is compatibility with Solr 9.

Due to a number of API changes in Solr and Lucene, we now have to ship two separate releases,
one for Solr 7 and 8 and one for Solr 9, so please take extra care when downloading to pick
the correct release. In the Package Repository, the Solr 7/8 release will always have a
version with the suffix -solr78.

We also changed the package namespaces for all user-facing components so they are easier
to identify and write. This means you need to change the class="..." attributes in your
solrconfig.xml and schema.xml to match the new package namespaces. Wherever you previously
had de.digitalcollections.solrocr.<other stuff>.ClassName, you now simply write
solrocr.ClassName.
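
For larger configurations, the renaming rule above can be applied mechanically. The following is a minimal sketch in Python, not a tool shipped with the plugin: the function name `migrate_class_names` and the regular expression are assumptions made for illustration, and the sample line is a hypothetical schema entry. It rewrites every dotted name under the old prefix, so review the result before deploying it.

```python
import re

# Assumption: every reference to a plugin class is a dotted name that starts
# with the old prefix, passes through lowercase sub-packages, and ends in a
# capitalised class name. This regex is illustrative, not part of the plugin.
OLD_CLASS = re.compile(
    r"de\.digitalcollections\.solrocr\.(?:[a-z0-9_]+\.)*([A-Z]\w*)"
)


def migrate_class_names(config_text: str) -> str:
    """Rewrite old fully qualified class names to the short solrocr form."""
    return OLD_CLASS.sub(r"solrocr.\1", config_text)


# Hypothetical schema line using one of the renamed classes
old_line = (
    '<charFilter class="de.digitalcollections.solrocr'
    '.lucene.filters.OcrCharFilterFactory"/>'
)
print(migrate_class_names(old_line))
```

Running the function over the full text of solrconfig.xml and schema.xml and comparing the result with the originals should leave only the class="..." attributes changed.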

New Features:

  • For users running Solr in SolrCloud mode, the plugin can now be installed via Solr's
    Package Manager:
    $ bin/solr package add-repo dbmdz.github.io https://dbmdz.github.io/solr
    $ bin/solr package install ocrhighlighting  # For Solr 9
    $ bin/solr package install ocrhighlighting:0.8.0-solr78  # For Solr 7 and 8
    
    Note that Solr 7/8 users need to manually specify the version.

API changes:

  • Changed the deployment process to use two separate packages, one for Solr 9 and later and
    one for Solr 7/8, the latter with a -solr78.jar suffix
  • Changed namespace of all user-facing components to simply solrocr and moved all
    user-facing component classes to it:
    • de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory
      → solrocr.OcrCharFilterFactory
    • de.digitalcollections.solrocr.lucene.filters.ExternalUtf8ContentFilterFactory
      → solrocr.ExternalUtf8ContentFilterFactory
    • de.digitalcollections.solrocr.lucene.OcrAlternativesFilterFactory
      → solrocr.OcrAlternativesFilterFactory
    • de.digitalcollections.solrocr.lucene.OcrHighlightComponent
      → solrocr.OcrHighlightComponent

Bugfixes:

  • Fix handling of quoted property values in hOCR title attributes. We deviate a bit from
    the spec to be more compatible with existing real-world data: values like x_source can
    now be quoted with single or double quotes, or not quoted at all, and the parser handles
    every case.
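
To illustrate the three accepted quoting styles, here is a small Python sketch of a lenient title parser. It is not the plugin's actual implementation (the plugin is written in Java), the function name `parse_hocr_title` is made up for this example, and the file paths are hypothetical. It assumes properties are separated by semicolons and that quoted values do not contain their own quote character.

```python
import re

# One property: a name, then optionally a quoted or unquoted value,
# terminated by a semicolon or the end of the string.
PROPERTY = re.compile(
    r"""\s*(?P<name>[A-Za-z_][\w-]*)      # property name, e.g. bbox or x_source
        (?:\s+(?:"(?P<dq>[^"]*)"          # double-quoted value
              |'(?P<sq>[^']*)'            # single-quoted value
              |(?P<bare>[^;]*)))?         # unquoted value
        \s*(?:;|$)""",
    re.VERBOSE,
)


def parse_hocr_title(title: str) -> dict:
    """Map property names to values, accepting all three quoting styles."""
    props = {}
    for match in PROPERTY.finditer(title):
        value = ""
        for group in ("dq", "sq", "bare"):
            if match.group(group) is not None:
                value = match.group(group).strip()
                break
        props[match.group("name")] = value
    return props


# The same hypothetical x_source value in each accepted form
for title in (
    'bbox 0 0 10 20; x_source "/data/p1.tif"',
    "bbox 0 0 10 20; x_source '/data/p1.tif'",
    "bbox 0 0 10 20; x_source /data/p1.tif",
):
    print(parse_hocr_title(title))
```

All three inputs yield the same mapping, which is the behaviour the fix describes.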