Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ONNXMiniLM_L6_V2._init_model_and_tokenizer with tokenizer and model cached properties #1194

Merged
merged 1 commit into from
Jan 5, 2024

Conversation

gsakkis
Copy link
Contributor

@gsakkis gsakkis commented Sep 28, 2023

Description of changes

Summarize the changes made by this PR.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readbility, Modularity, Intuitiveness)

self.DOWNLOAD_PATH, self.EXTRACTED_FOLDER_NAME, "tokenizer.json"
)
@cached_property
def tokenizer(self) -> "Tokenizer":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't race conditions still possible even if this is cached?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ostensibly they are still possible but they are harmless, at worst the model or the tokenizer are initialized more than once but this is idempotent.
The reason the current race condition is problematic is the and in if self.model is None and self.tokenizer is None:

  1. Initially the condition is True as both model and tokenizer are None
  2. Thread A enters the block and initializes the tokenizer
  3. The context switches to thread B before A gets to initialize the model
  4. Thread B evaluates the condition as False (model is None, tokenizer is not None)
  5. Thread B attempts to access self.model.run
  6. AttributeError 💥

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough :) Thanks for the analysis!

@HammadB
Copy link
Collaborator

HammadB commented Oct 6, 2023

Tests are failing - I think because

ImportError: cannot import name 'cached_property' from 'functools' this is not supported on 3.7

@gsakkis
Copy link
Contributor Author

gsakkis commented Oct 7, 2023

Oh 3.7 is still supported? It has reached its end-of-life in June. If you don't want to bump the minimum version to 3.8, you can add cached_property as conditional requirement for 3.7.

@beggers
Copy link
Member

beggers commented Jan 4, 2024

@gsakkis we're not supporting it anymore if you want to push this over the finish line!

@gsakkis
Copy link
Contributor Author

gsakkis commented Jan 5, 2024

@beggers rebased!

@beggers beggers merged commit bdec54a into chroma-core:main Jan 5, 2024
97 checks passed
@beggers
Copy link
Member

beggers commented Jan 5, 2024

🚀 🚀 🚀

@gsakkis gsakkis deleted the issues/1193 branch January 7, 2024 23:41
thorwhalen pushed a commit to thorwhalen/chroma that referenced this pull request Jan 11, 2024
… model cached properties (chroma-core#1194)

## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
- Fixes chroma-core#1193: race condition in
`ONNXMiniLM_L6_V2._init_model_and_tokenizer`

## Test plan
*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python, `yarn test` for js
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: AttributeError: 'NoneType' object has no attribute 'run'
3 participants