optimization: load a vocabulary only once even if used in different languages #736

osma · 2023-09-22T13:12:11Z

While looking at ways to implement #735, I discovered an opportunity for optimization in the registry code that handles loading of vocabularies. For some reason (probably my mistake) the registry loads each vocabulary multiple times, once per language in which it is used. This wastes both startup time and memory.

This PR adjusts the code slightly so that vocabularies are always loaded just once. This was always the intention since the introduction of multilingual vocabularies (#559, PR #600 etc.) and especially PR #610 which implemented vocabularies that are shared between projects.
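
To illustrate the idea (this is not the actual Annif registry code), here is a minimal sketch of a cache keyed by vocabulary ID alone rather than by vocabulary ID and language. The names `Vocabulary`, `VocabRegistry`, `get_vocab` and `load_count` are hypothetical and exist only for this example.

```python
class Vocabulary:
    """Hypothetical stand-in for a multilingual vocabulary object."""

    load_count = 0  # counts how many times any vocabulary was "loaded"

    def __init__(self, vocab_id):
        self.vocab_id = vocab_id
        Vocabulary.load_count += 1  # simulate the expensive load step


class VocabRegistry:
    """Hypothetical registry that shares one vocabulary across languages."""

    def __init__(self):
        self._vocabs = {}  # keyed by vocab_id only, not (vocab_id, language)

    def get_vocab(self, vocab_id, language):
        # Load the vocabulary the first time it is requested, in any language.
        if vocab_id not in self._vocabs:
            self._vocabs[vocab_id] = Vocabulary(vocab_id)
        # The language is returned alongside the shared object instead of
        # being baked into a separate per-language copy.
        return self._vocabs[vocab_id], language


registry = VocabRegistry()
results = [registry.get_vocab("yso", lang) for lang in ("fi", "sv", "en")]
print(Vocabulary.load_count)
```

With a cache keyed on (vocabulary ID, language) the same three requests would have triggered three loads; keyed on the ID alone, all three projects share a single object.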

I benchmarked this with an installation where I have three Finto AI MLLM projects (languages fi, sv, en) that all use the YSO vocabulary, but in different languages. I ran the command

ANNIF_CONFIG=annif.default_config.ProductionConfig /usr/bin/time -v annif list-projects

The idea here is to use ProductionConfig, which causes all projects to be loaded on startup instead of on demand. This means that the vocabulary is loaded as well.

Before

(showing selected stats)

	User time (seconds): 13.04
	System time (seconds): 6.13
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.27
	Maximum resident set size (kbytes): 539600

After

	User time (seconds): 12.82
	System time (seconds): 7.26
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.66
	Maximum resident set size (kbytes): 428940

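So there's a slight speedup (wall clock time goes from 12.27 to 11.66 seconds), and peak memory usage drops by about 110MB (maximum resident set size goes from 539600 to 428940 kbytes). Not bad for a patch that also reduces the amount of code by 3 lines.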
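
For anyone who wants to check the arithmetic, here is a small sketch that computes the differences from the figures quoted above. The dictionary keys and variable names are made up for this example; only the numbers come from the `/usr/bin/time -v` output.

```python
# Figures copied from the "Before" and "After" measurements above.
before = {"wall_s": 12.27, "max_rss_kb": 539600}
after = {"wall_s": 11.66, "max_rss_kb": 428940}

# Absolute and relative reduction in peak memory (maximum resident set size).
rss_saved_kb = before["max_rss_kb"] - after["max_rss_kb"]
rss_saved_pct = 100 * rss_saved_kb / before["max_rss_kb"]

# Reduction in elapsed wall clock time.
wall_saved_s = before["wall_s"] - after["wall_s"]

print(f"memory saved: {rss_saved_kb} kbytes ({rss_saved_pct:.1f}%)")
print(f"wall clock time saved: {wall_saved_s:.2f} s")
```

This is a single run of each variant, so the timing difference in particular should be read as indicative rather than as a precise measurement.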