Release 0.8.0 · dbmdz/solr-ocrhighlighting

The major improvement in this version is compatibility with Solr 9.

Due to a number of API changes in Solr and Lucene, we now have to ship two separate releases,
one for Solr 7 and 8 and one for Solr 9, so please take extra care when downloading to pick
the correct release. In the Package Repository, the Solr 7/8 release always has a version
with the suffix -solr78.

We also changed the package namespaces for all user-facing components so they are easier
to identify and write. What this means is that you will need to change the class="..."
attributes in your solrconfig.xml and schema.xml to match the new package namespaces.
Wherever you previously had de.digitalcollections.solrocr.<other stuff>.ClassName, you
now simply write solrocr.ClassName.

New Features:

For users running Solr in SolrCloud mode, the plugin can now be installed via Solr's
Package Manager:

$ bin/solr package add-repo dbmdz.github.io https://dbmdz.github.io/solr
$ bin/solr package install ocrhighlighting  # For Solr 9
$ bin/solr package install ocrhighlighting:0.8.0-solr78  # For Solr 7 and 8

Note that Solr 7/8 users need to manually specify the version.

API changes:

Changed the deployment process to use two separate packages: one for Solr 9 and later, and
one for Solr 7/8, whose JAR carries a -solr78.jar suffix.
Changed the namespace of all user-facing components to simply solrocr and moved all
user-facing component classes to it:
- de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory
  → solrocr.OcrCharFilterFactory
- de.digitalcollections.solrocr.lucene.filters.ExternalUtf8ContentFilterFactory
  → solrocr.ExternalUtf8ContentFilterFactory
- de.digitalcollections.solrocr.lucene.OcrAlternativesFilterFactory
  → solrocr.OcrAlternativesFilterFactory
- de.digitalcollections.solrocr.lucene.OcrHighlightComponent
  → solrocr.OcrHighlightComponent
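
For larger configurations, the renames listed above can be applied mechanically. The
following is a minimal sketch, not a tool shipped with the plugin: the function name
migrate_class_names is hypothetical, and it assumes the class names appear in
double-quoted class="..." attributes. Only the four mappings from the list above are
rewritten; everything else is left untouched.

```python
import re

# Old -> new class names, copied from the list in these release notes.
RENAMES = {
    "de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory":
        "solrocr.OcrCharFilterFactory",
    "de.digitalcollections.solrocr.lucene.filters.ExternalUtf8ContentFilterFactory":
        "solrocr.ExternalUtf8ContentFilterFactory",
    "de.digitalcollections.solrocr.lucene.OcrAlternativesFilterFactory":
        "solrocr.OcrAlternativesFilterFactory",
    "de.digitalcollections.solrocr.lucene.OcrHighlightComponent":
        "solrocr.OcrHighlightComponent",
}

# Assumption: class attributes use double quotes, as in class="...".
CLASS_ATTR = re.compile(r'class="([^"]+)"')


def migrate_class_names(xml_text: str) -> str:
    """Rewrite known old class names inside class="..." attributes."""
    def swap(match: re.Match) -> str:
        old = match.group(1)
        # Unknown class names are written back unchanged.
        return 'class="{}"'.format(RENAMES.get(old, old))

    return CLASS_ATTR.sub(swap, xml_text)
```

Running this over the contents of solrconfig.xml and schema.xml and reviewing the
resulting diff before writing the files back is probably the safest way to use it.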

Bugfixes:

Fix handling of quoted property values in hOCR title tags. We deviate a bit from the spec
to be more compatible with existing real-world data: values like x_source can now be
quoted in single or double quotes, or not quoted at all, and the parser handles every case.
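
To illustrate the three accepted forms, here is a small Python sketch of a lenient
property parser. It is not the plugin's actual parser (the plugin is written in Java);
the function name parse_title, the regular expression, and the assumption that
properties are separated by semicolons are illustrative only.

```python
import re

# One "key value" property, terminated by ';' or end of string.
# The value may be double-quoted, single-quoted, or bare.
PROP = re.compile(
    r"""
    \s*(?P<key>\w+)            # property name, e.g. x_source
    (?:\s+(?:
        "(?P<dq>[^"]*)"        # double-quoted value
      | '(?P<sq>[^']*)'        # single-quoted value
      | (?P<bare>[^;]*)        # unquoted value, up to the next ';'
    ))?
    \s*(?:;|$)
    """,
    re.VERBOSE,
)


def parse_title(title: str) -> dict:
    """Return the properties of a title string as a dict of strings."""
    props = {}
    for m in PROP.finditer(title):
        value = m.group("dq")
        if value is None:
            value = m.group("sq")
        if value is None:
            # Bare values may carry trailing whitespace before the ';'.
            value = (m.group("bare") or "").strip()
        props[m.group("key")] = value
    return props
```

With this sketch, x_source "scan.png", x_source 'scan.png' and x_source scan.png all
yield the same value for the x_source key.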
