0.8.0
The major improvement in this version is compatibility with Solr 9.
Due to a number of API changes in Solr and Lucene, we now have to ship two separate releases,
one for Solr 7 and 8 and one for Solr 9, so please take extra care when downloading to pick
the correct release. In the Package Repository, the Solr 7/8 release will always have a version
with the suffix -solr78.
We also changed the package namespaces for all user-facing components so they are easier
to identify and write. This means that you will need to change the class="..."
attributes in your solrconfig.xml and schema.xml to match the new package namespaces.
Wherever you previously had de.digitalcollections.solrocr.<other stuff>.ClassName, you
now simply have to write solrocr.ClassName.
New Features:
- For users running Solr in SolrCloud mode, the plugin can now be installed via Solr's
  Package Manager:
    $ bin/solr package add-repo dbmdz.github.io https://dbmdz.github.io/solr
    $ bin/solr package install ocrhighlighting                # For Solr 9
    $ bin/solr package install ocrhighlighting:0.8.0-solr78   # For Solr 7 and 8
  Note that Solr 7/8 users need to manually specify the version.
API changes:
- Changed deployment process to use two separate packages, one for Solr 9 and later and one for Solr 7/8, with a
  -solr78.jar suffix
- Changed namespace of all user-facing components to simply solrocr and moved all
  user-facing component classes to it:
  - de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory → solrocr.OcrCharFilterFactory
  - de.digitalcollections.solrocr.lucene.filters.ExternalUtf8ContentFilterFactory → solrocr.ExternalUtf8ContentFilterFactory
  - de.digitalcollections.solrocr.lucene.OcrAlternativesFilterFactory → solrocr.OcrAlternativesFilterFactory
  - de.digitalcollections.solrocr.lucene.OcrHighlightComponent → solrocr.OcrHighlightComponent
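As a rough illustration of the rename, the following Python sketch rewrites the old class names in the text of a configuration file. It is not part of the plugin; the function name `migrate_class_names` and the sample XML element are hypothetical, and only the four class names listed in these notes are covered.

```python
# Old fully qualified class name -> new name, as listed in these release notes.
RENAMES = {
    "de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory":
        "solrocr.OcrCharFilterFactory",
    "de.digitalcollections.solrocr.lucene.filters.ExternalUtf8ContentFilterFactory":
        "solrocr.ExternalUtf8ContentFilterFactory",
    "de.digitalcollections.solrocr.lucene.OcrAlternativesFilterFactory":
        "solrocr.OcrAlternativesFilterFactory",
    "de.digitalcollections.solrocr.lucene.OcrHighlightComponent":
        "solrocr.OcrHighlightComponent",
}


def migrate_class_names(config_text: str) -> str:
    """Return config_text with every known old class name replaced (hypothetical helper)."""
    for old, new in RENAMES.items():
        config_text = config_text.replace(old, new)
    return config_text


# Hypothetical fragment; your actual schema.xml or solrconfig.xml will differ.
sample = '<charFilter class="de.digitalcollections.solrocr.lucene.filters.OcrCharFilterFactory"/>'
print(migrate_class_names(sample))
```

Because the replacement only matches the full old names, running it again on an already migrated file leaves the text unchanged. Review the result by hand before deploying it, since a configuration may reference classes that are not in the list above.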
Bugfixes:
- Fix handling of quoted property values in hOCR title attributes. We deviate a bit from the spec
  to be more compatible with existing real-world data: values like x_source can now be
  enclosed in single or double quotes, or left unquoted, and the parser handles every case.
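To make the three accepted forms concrete, here is a minimal Python sketch of a tolerant title parser. It is an illustration only and not the plugin's actual parser; the function name `parse_hocr_title` and the sample values are assumptions.

```python
import re

# A chunk is a run of quoted strings or non-semicolon characters, so a ';'
# inside single or double quotes does not end the property.
_CHUNK = re.compile(r"""(?:"[^"]*"|'[^']*'|[^;])+""")


def parse_hocr_title(title: str) -> dict:
    """Map each property name in an hOCR title string to its value (hypothetical helper)."""
    props = {}
    for chunk in _CHUNK.findall(title):
        # The property name is everything up to the first space.
        name, _, value = chunk.strip().partition(" ")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if name:
            props[name] = value
    return props


# All three spellings yield the same value for x_source.
for raw in ("x_source 'scan.png'", 'x_source "scan.png"', "x_source scan.png"):
    print(parse_hocr_title(raw))
```

The sketch keeps multi-token values such as bounding boxes as a single string and leaves a value with an unbalanced quote untouched rather than guessing where it ends.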