
Instructions, exercises and example data sets for Annif hands-on tutorial



Annif tutorial

The tutorial includes short video presentations and hands-on exercises, which can be explored via the outline page. Two example data sets are provided for use in the exercises.

The tutorial was initially organized at SWIB19 and later updated for DCMI Virtual 2020 and SWIB22, but the materials are freely available for self-study.


You will need a computer with sufficient resources (at least 8 GB RAM and 20 GB free disk space) to install Annif and complete the exercises. Installing Annif is itself one of the exercise topics.

It may also be convenient to have either Docker or VirtualBox installed beforehand. When using Docker Desktop (Windows), you might want to increase the memory available to it to 8 GB under Settings -> Advanced.
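
As a rough aid, the requirements above can be checked with a short script before starting. This is a minimal sketch and not part of the tutorial materials: the function names and the list of tools (git, make, docker, VBoxManage) are illustrative assumptions, and the RAM check relies on os.sysconf, which is not available on Windows (the script then reports the RAM as unknown).

```python
import os
import shutil

# Thresholds taken from the prerequisites text above.
MIN_RAM_GB = 8
MIN_DISK_GB = 20
GIB = 1024 ** 3

# Assumption: these command names are what the tools are called on your PATH.
TOOLS = ("git", "make", "docker", "VBoxManage")


def free_disk_gb(path="."):
    """Return the free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / GIB


def total_ram_gb():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some names are missing elsewhere.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / GIB


def meets_minimum(value_gb, minimum_gb):
    """Return True/False when the value is known, None when it is unknown."""
    if value_gb is None:
        return None
    return value_gb >= minimum_gb


def check_prerequisites(path="."):
    """Collect RAM, disk and tool availability into one report dictionary."""
    ram = total_ram_gb()
    disk = free_disk_gb(path)
    return {
        "ram_gb": ram,
        "ram_ok": meets_minimum(ram, MIN_RAM_GB),
        "disk_gb": disk,
        "disk_ok": meets_minimum(disk, MIN_DISK_GB),
        "tools": {name: shutil.which(name) is not None for name in TOOLS},
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print("RAM (GiB):", report["ram_gb"], "sufficient:", report["ram_ok"])
    print("Free disk (GiB):", round(report["disk_gb"], 1),
          "sufficient:", report["disk_ok"])
    for name, found in report["tools"].items():
        print(f"{name}: {'found' if found else 'not found'}")
```

Treat the result as a guide only: a machine sold with 8 GB of RAM usually reports slightly less than 8 GiB to the operating system, so a "False" for RAM on such a machine does not necessarily mean the exercises cannot be completed. Only one of Docker or VirtualBox is needed, so a missing entry for the other is not a problem.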

Getting the tutorial materials

To complete the exercises of this tutorial, you will need a local copy of the materials, especially the data sets (unless you use the pre-built VirtualBox VM, which includes them). The easiest way to get them is to either clone this repository or download it as a zip archive from GitHub (click the green "Code" button near the top for clone and download options).

Once you have the files locally, you also need to download the example full text documents for either or both data sets. The downloads are automated using make; see the README files for both data sets (yso-nlf, stw-zbw) for details.

Upcoming help sessions

  • To be confirmed

From time to time, we organize (online or in-person) help sessions for people working on the tutorial exercises. To register, you must have watched the videos and at least attempted to complete the exercises. Info will be posted here.

Past help sessions


The tutorial material was created by:

  • Osma Suominen, National Library of Finland
  • Mona Lehtinen, National Library of Finland
  • Juho Inkinen, National Library of Finland
  • Anna Kasprzik, ZBW - Leibniz Information Centre for Economics
  • Moritz Fürneisen, ZBW - Leibniz Information Centre for Economics


The materials created for this tutorial (presentations and exercises) are licensed under a Creative Commons Attribution 4.0 International License.


The data sets were collected from other sources and have their own licensing; see each individual data set for details.