Annif 1.2

@juhoinkinen released this 02 Oct 10:17
v1.2.0
3297ce1

This release introduces language detection capabilities in the REST API and CLI, improves 🤗 Hugging Face Hub integration, and also includes the usual maintenance work and minor bug fixes.

The new REST API endpoint /v1/detect-language expects POST requests that contain a JSON object with the text whose language is to be detected and a list of candidate languages. Similarly, the CLI has a new command annif detect-language. Annif projects are typically language specific, so a text in a given language needs to be processed with a project intended for that language; the language detection feature can help choose the right project. For details see this Wiki page. The language detection is performed with the Simplemma library by @adbar et al.
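
As an illustration of the request described above, here is a minimal Python sketch that builds (but does not send) a POST request for /v1/detect-language and then picks a language-specific project from the detection results. The JSON field names "text" and "languages", the response shape (a list of language/score pairs), the server address http://localhost:5000, and the project IDs are assumptions made for this example, not details confirmed by these notes; check the Wiki page and the API documentation before relying on them.

```python
import json
import urllib.request


def build_detect_language_request(base_url, text, languages):
    """Build (but do not send) a POST request for /v1/detect-language."""
    # Assumption: the JSON object uses the keys "text" and "languages".
    payload = {"text": text, "languages": list(languages)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/detect-language",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pick_project(results, projects_by_language):
    """Return the project ID for the best-scoring known language, or None."""
    # Assumption: results look like [{"language": "fi", "score": 0.9}, ...].
    known = [r for r in results if r.get("language") in projects_by_language]
    if not known:
        return None
    best = max(known, key=lambda r: r["score"])
    return projects_by_language[best["language"]]


if __name__ == "__main__":
    req = build_detect_language_request(
        "http://localhost:5000",  # hypothetical local Annif server
        "Tämä on suomenkielinen lause.",
        ["en", "fi", "sv"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending the request needs a running Annif server, for example:
    # with urllib.request.urlopen(req) as resp:
    #     results = json.load(resp)["results"]  # "results" key is assumed
    # project_id = pick_project(results, {"en": "my-en", "fi": "my-fi"})
```

Keeping the request construction separate from the project selection means the routing logic can be exercised without a server.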

The annif download command has a new --trust-repo option, which must be used if the repository to download from has not been used previously (that is, if the repository does not appear in the local Hugging Face Hub cache). This option is introduced to raise awareness of the risks of downloading projects from the internet; projects should only be downloaded from trusted sources. For more information see the Hugging Face Hub documentation.
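
For scripted downloads, the option can be added to the command line programmatically. The sketch below only assembles the argument list and prints it. The positional order (a project ID pattern followed by the repository ID) is an assumption to verify with annif download --help, and the pattern and repository name are hypothetical placeholders.

```python
import shlex


def build_download_command(project_ids_pattern, repo_id, trust_repo=False):
    """Assemble an `annif download` command line as a list of arguments."""
    # Assumption: the pattern comes first and the repository ID second.
    cmd = ["annif", "download", project_ids_pattern, repo_id]
    if trust_repo:
        # Only pass this for repositories you have decided to trust.
        cmd.append("--trust-repo")
    return cmd


if __name__ == "__main__":
    cmd = build_download_command(
        "example-*",                 # hypothetical project ID pattern
        "example-org/annif-models",  # hypothetical Hugging Face Hub repo
        trust_repo=True,
    )
    # Print a shell-safe rendering; run it with subprocess.run(cmd) if desired.
    print(shlex.join(cmd))
```

Leaving trust_repo off by default mirrors the intent of the option: trusting a new repository should be an explicit decision.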

This release also automates downloading the NLTK datapackage used for tokenization, which simplifies Annif installation. Maintenance work includes upgrading dependencies, among them a new version of Simplemma that allows better control over memory usage. The bug fixes include restoring the --host option of the annif run command.

Python 3.12 is now fully supported (previously the NN ensemble and STWFSA backends were not supported on Python 3.12).

Supported Python versions:

  • 3.9, 3.10, 3.11 and 3.12

Backward compatibility:

  • NN ensemble projects trained with Annif v1.1 or older need to be retrained.
  • For other projects, the warnings emitted by scikit-learn are harmless.

Enhancements

#659/#799/#800/#801/#802 Language detection in REST API and CLI
#779 Python 3.12 support
#790/#793 Automatically add metadata to Hugging Face Hub repos when uploading projects
#809 Make field widths variable in the projects list of the Hugging Face Hub Model Card
#803 Automate NLTK datapackage punkt_tab download
#807 Add --trust-repo option to download CLI command

Maintenance

#724 Upgrade Simplemma & limit its memory usage
#796 Update dependencies for 1.2 release
#797/#811 Bump the github-actions versions
#805 Upgrade Docker baseimage to Python 3.12

Bug fixes

#788 Add --host option to annif run (credit: @dwinston)
#792 Fix limit parameter not passed to requests by HTTP backend
#808 Fix missing Hugging Face Hub token from preupload_lfs_files() parameters