Conversation

@alhendrickson (Collaborator) commented Oct 28, 2025

This fixes a threading issue on startup that caused multiple MedCAT processors to be created.

I think this will also fix #141, though I'm not certain how to reproduce that one. It keeps lazy initialisation rather than initialising on startup, which I'd generally prefer.

Previous Behaviour

  • Start the service whilst loading it with 20 parallel requests in an infinite loop. I've been using Locust and have the code in the ops repo
  • get_medcat_processor is entered 20 times, because none of the calls has finished and stored its result in the cache yet
  • We end up with log lines like this (a minimal sketch reproducing the race follows the excerpt):
2025-10-28 17:14:33,155 [INFO] MedCatProcessor: MedCAT processor is ready
Training was enabled during inference. It was automatically disabled.
2025-10-28 17:14:35,625 [INFO] MedCatProcessor: MedCAT processor is ready
Training was enabled during inference. It was automatically disabled.
2025-10-28 17:14:36,652 [INFO] MedCatProcessor: MedCAT processor is ready
Training was enabled during inference. It was automatically disabled.
2025-10-28 17:14:36,795 [INFO] MedCatProcessor: MedCAT processor is ready
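
For illustration, here is a minimal sketch of the race (not the service code; the function and counter names are made up): with lru_cache alone, every request that arrives before the first call has finished runs the wrapped factory itself.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

call_count = 0
count_lock = threading.Lock()

@lru_cache(maxsize=1)
def get_processor():
    # Stand-in for the real factory: slow, like loading a model pack.
    global call_count
    with count_lock:
        call_count += 1
    time.sleep(2)
    return object()

with ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(lambda _: get_processor(), range(20)))

print(call_count)  # typically prints 20, not 1 - each concurrent first call built its own "processor"
```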

New Behaviour

  • Start the service whilst loading it with 20 simultaneous requests, as before

  • get_medcat_processor is still entered 20 times, because none of the calls has finished and stored its result in the cache yet

  • Each call then enters get_medcat_processor_singleton, but every thread hits the lock, so only the first one actually creates the processor

  • We end up with log lines like this, where only one processor is initialised and used (a sketch of the new pattern follows the excerpt):

cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | 2025-10-28 18:08:39,073 [INFO] MedCatProcessor: Initializing MedCAT processor ...
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | Creating new medcat processor from cache miss
cogstack-medcat-service-production  | 2025-10-28 18:08:50,844 [INFO] MedCatProcessor: MedCAT processor is ready
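
A rough sketch of the pattern this PR describes (not the exact diff; the lock, dictionary, and settings-tuple parameter are illustrative names, and MedCatProcessor is stubbed out): the lru_cache-wrapped getter logs the miss and defers to a lock-guarded singleton function, so only the first thread per settings key constructs a processor.

```python
import threading
from functools import lru_cache

class MedCatProcessor:
    """Stand-in for the real service class, which loads the model pack on init."""
    def __init__(self, settings_key: tuple):
        self.settings_key = settings_key

_processor_lock = threading.Lock()
_processors: dict[tuple, MedCatProcessor] = {}  # global store keyed by a settings tuple

def get_medcat_processor_singleton(settings_key: tuple) -> MedCatProcessor:
    # Every thread that missed the lru_cache queues on the lock; only the first
    # one for a given settings key actually constructs a processor.
    with _processor_lock:
        if settings_key not in _processors:
            _processors[settings_key] = MedCatProcessor(settings_key)
        return _processors[settings_key]

@lru_cache(maxsize=None)
def get_medcat_processor(settings_key: tuple) -> MedCatProcessor:
    # Concurrent first calls all land here (hence the repeated log line above),
    # but they converge on a single instance inside the singleton getter. Once
    # one call completes, later calls are served from lru_cache without locking.
    print("Creating new medcat processor from cache miss")
    return get_medcat_processor_singleton(settings_key)
```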

Notes

This excerpt from the lru_cache documentation explains why the change is required:

The cache is threadsafe so that the wrapped function can be used in multiple threads. This means that the underlying data structure will remain coherent during concurrent updates.
It is possible for the wrapped function to be called more than once if another thread makes an additional call before the initial call has been completed and cached.

So this is saying that if I call /process 20 times in parallel while the cache is still empty, all 20 calls create their own processor.

The change here still uses lru_cache so that subsequent calls don't have to take the lock. It keys the processor on a global tuple of settings, so it is still possible to swap the processor later if the settings change - this keeps all the tests working, though during regular use that path should never trigger (if we trust lru_cache).
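
As a usage example of the sketch above (the settings tuples here are made up), the settings-tuple key is what lets the tests get a different processor while regular traffic keeps reusing one instance:

```python
prod = get_medcat_processor(("model_pack_a", "no_train"))
warm = get_medcat_processor(("model_pack_a", "no_train"))   # served straight from lru_cache, no lock taken
test = get_medcat_processor(("model_pack_b", "no_train"))   # different settings key -> a different processor

assert prod is warm
assert prod is not test
```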

@mart-r (Collaborator) left a comment

This sounds great to me!
Looks like you've identified why there was a memory leak. And it does make a lot of sense - the processor has to load the model pack, so it won't actually return too quickly.

@alhendrickson alhendrickson merged commit dddf7ce into main Oct 29, 2025
11 checks passed
@alhendrickson alhendrickson deleted the fix/medcat-service/thread-safe-processor branch October 29, 2025 11:00