fix: Fix device state restoration in NamedEntityExtractor and add release …#11259

Merged
sjrl merged 8 commits into deepset-ai:main from ritikraj2425:fix-ner-device-state on May 11, 2026

Conversation

@ritikraj2425
Contributor

Related Issues

Proposed Changes:

Previously, the NamedEntityExtractor (spaCy backend) unconditionally called spacy.require_cpu() after execution. Because spaCy and Thinc keep their device configuration in global state, this overrode any pre-existing user configuration (e.g., a specific GPU selected for other parts of the application).

This PR:

  • Captures the current Thinc Ops state at the start of the _select_device context manager.
  • Restores the original Ops state in the finally block instead of forcing a reset to CPU.
  • Removes the outdated TODO regarding device restoration.
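
The capture-and-restore pattern described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `StandInOps`, `get_current_ops`, `set_current_ops`, and `select_device` are hypothetical stand-ins defined here so the example runs without spaCy or Thinc installed. In the real component the equivalent calls would presumably go through Thinc's own global Ops accessors.

```python
from contextlib import contextmanager
from typing import Iterator


class StandInOps:
    """Hypothetical stand-in for a Thinc Ops object (e.g. NumpyOps)."""

    def __init__(self, device: str) -> None:
        self.device = device


# Hypothetical process-wide state, mimicking a single global "current Ops".
_current_ops = StandInOps("cpu")


def get_current_ops() -> StandInOps:
    return _current_ops


def set_current_ops(ops: StandInOps) -> None:
    global _current_ops
    _current_ops = ops


@contextmanager
def select_device(device: str) -> Iterator[None]:
    """Switch the global device for the duration of the block, then restore it."""
    previous_ops = get_current_ops()  # capture the caller's configuration first
    try:
        # Stands in for the component's device switch (assumption).
        set_current_ops(StandInOps(device))
        yield
    finally:
        # Restore the captured object instead of forcing a reset to CPU.
        set_current_ops(previous_ops)


user_ops = StandInOps("cuda:1")  # configuration chosen elsewhere in the application
set_current_ops(user_ops)
with select_device("cpu"):
    device_inside = get_current_ops().device
restored = get_current_ops() is user_ops
```

Because the restore happens in the `finally` block, the caller's Ops object is put back even if the body raises.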

How did you test it?

  • Regression Test: Added TestNamedEntityExtractorDeviceRestoration to test/components/extractors/test_named_entity_extractor.py. This test sets a custom attribute on the global Ops object and verifies that it is preserved after the component's internal device switching logic.
  • Unit Tests: Ran the existing NamedEntityExtractor test suite (11 tests passed).
  • Manual Verification: Verified that custom NumpyOps objects are not replaced by fresh instances after component execution.
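
The idea behind the regression test can be illustrated with a small, self-contained sketch. It is not the test added in this PR: `StubOps`, `_state`, `force_cpu_after`, `restore_after`, and `marker_survives` are hypothetical names, and the global state is simulated with a plain dictionary so the example runs without spaCy. It contrasts the old "always reset" behaviour with the new "restore" behaviour by checking whether a custom attribute set on the global object survives.

```python
import unittest
from contextlib import contextmanager
from typing import Callable, ContextManager, Iterator


class StubOps:
    """Hypothetical stand-in for the global Thinc Ops object."""


_state = {"ops": StubOps()}  # simulated global device state (assumption)


@contextmanager
def force_cpu_after() -> Iterator[None]:
    """Old behaviour: always install a fresh Ops object on exit."""
    try:
        _state["ops"] = StubOps()
        yield
    finally:
        _state["ops"] = StubOps()


@contextmanager
def restore_after() -> Iterator[None]:
    """New behaviour: put back whatever Ops object was active on entry."""
    previous = _state["ops"]
    try:
        _state["ops"] = StubOps()
        yield
    finally:
        _state["ops"] = previous


def marker_survives(switch: Callable[[], ContextManager[None]]) -> bool:
    """Tag the global object, run the device switch, and check the tag afterwards."""
    ops = StubOps()
    setattr(ops, "custom_marker", "user-config")
    _state["ops"] = ops
    with switch():
        pass
    return getattr(_state["ops"], "custom_marker", None) == "user-config"


class DeviceRestorationSketch(unittest.TestCase):
    def test_restoring_preserves_marker(self) -> None:
        self.assertTrue(marker_survives(restore_after))

    def test_forced_reset_loses_marker(self) -> None:
        self.assertFalse(marker_survives(force_cpu_after))
```

The second test case documents why the check is meaningful: under the old behaviour the marker is lost, so the assertion would have failed before the fix.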

Checklist

ritikraj2425 requested a review from a team as a code owner May 5, 2026 10:10
ritikraj2425 requested review from sjrl and removed request for a team May 5, 2026 10:10
@vercel

vercel Bot commented May 5, 2026

@ritikraj2425 is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

sjrl self-assigned this May 8, 2026
@sjrl
Contributor

sjrl commented May 8, 2026

Hey @ritikraj2425, is this an issue you ran into with scripts you were running yourself? Also, please look at the failing CI and fix the issues there.

github-actions Bot added the type:documentation (Improvements on the docs) label May 8, 2026
@ritikraj2425
Contributor Author

Hi @sjrl!
Yes, I was running some custom test scripts while exploring how Haystack components manage hardware resources. I noticed that NamedEntityExtractor calls the global spacy.require_cpu() in its finally block, which could unintentionally wipe out device settings for other components sharing the same GPU. I implemented the Thinc Ops state restoration to prevent this global side effect.

I also pushed a fix for the CI. I had mistakenly placed my new device restoration test in the standard unit tests (where spaCy isn't installed in CI), causing an ImportError during collection. I have now moved it to the e2e pipeline tests where the spaCy backend is properly tested.

Let me know if there's anything else I should adjust!

@sjrl
Contributor

sjrl commented May 11, 2026

Hey @ritikraj2425 thanks for the changes! Could you fix the mypy issues? https://github.com/deepset-ai/haystack/actions/runs/25561973899/job/75037156133?pr=11259

@ritikraj2425
Contributor Author

Hey @sjrl!
I synced with main and noticed that some recent upstream changes caused mypy failures in the HuggingFace and OpenAI response components. I've re-added the necessary # type: ignore comments in this PR to bypass the errors and get the CI green again.

Everything should be passing now! Let me know if there's anything else you need.

Comment thread haystack/components/embedders/hugging_face_api_document_embedder.py
Comment thread haystack/components/generators/chat/azure_responses.py Outdated
Comment thread haystack/components/generators/chat/openai_responses.py Outdated
@ritikraj2425
Contributor Author

Done! I had just added those because they were failing the mypy CI on my branch and I was trying to get the build green, but I've reverted them now to keep this PR strictly scoped to the NER fix.

sjrl changed the title from "Fix device state restoration in NamedEntityExtractor and add release …" to "fix: Fix device state restoration in NamedEntityExtractor and add release …" May 11, 2026
sjrl removed their assignment May 11, 2026
Contributor

sjrl left a comment

Thanks!

sjrl enabled auto-merge (squash) May 11, 2026 07:11
sjrl merged commit 2121b53 into deepset-ai:main May 11, 2026
22 of 24 checks passed

Labels

topic:tests, type:documentation (Improvements on the docs)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] NamedEntityExtractor (spaCy) fails to restore device state after execution

2 participants