optimization: load a vocabulary only once even if used in different languages #736
While looking at ways to implement #735, I discovered an opportunity for optimization in the registry code that handles loading of vocabularies. For some reason (probably my mistake) the registry loads vocabularies multiple times, once per language. This wastes both time and memory.
This PR adjusts the code slightly so that vocabularies are always loaded just once. This was always the intention since the introduction of multilingual vocabularies (#559, PR #600 etc.) and especially PR #610 which implemented vocabularies that are shared between projects.
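To illustrate the idea (this is a minimal sketch, not the actual registry code): if the cache is keyed by vocabulary ID alone rather than by vocabulary ID plus language, a vocabulary requested in several languages is loaded only once. The names `Vocabulary`, `VocabularyRegistry`, `get_vocab` and `load_count` are hypothetical and chosen only for this example.

```python
class Vocabulary:
    """Hypothetical stand-in for a multilingual vocabulary."""

    load_count = 0  # counts how many times a vocabulary is (expensively) loaded

    def __init__(self, vocab_id):
        self.vocab_id = vocab_id
        Vocabulary.load_count += 1


class VocabularyRegistry:
    """Hypothetical registry that shares one vocabulary object per ID."""

    def __init__(self):
        # Keyed by vocabulary ID only, NOT by (vocabulary ID, language),
        # so every language shares the same loaded object.
        self._vocabs = {}

    def get_vocab(self, vocab_id, language):
        if vocab_id not in self._vocabs:
            self._vocabs[vocab_id] = Vocabulary(vocab_id)
        # The language is returned alongside the shared vocabulary;
        # it does not trigger a separate load.
        return self._vocabs[vocab_id], language


registry = VocabularyRegistry()
results = [registry.get_vocab("yso", lang) for lang in ("fi", "sv", "en")]
```

In this sketch, the three requests return the same `Vocabulary` object, and `Vocabulary.load_count` stays at 1 regardless of how many languages ask for it.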
I benchmarked this with an installation where I have three Finto AI MLLM projects (languages fi, sv, en) that all use the YSO vocabulary, but in different languages. I ran the command
The idea here is to use ProductionConfig, which causes all projects to be loaded on startup instead of on demand. This means that the vocabulary is loaded as well.
Before
(showing selected stats)
After
So there's a slight speedup, and the memory usage drops by 110MB. Not bad for a patch that also reduces the amount of code by 3 lines.