-
Notifications
You must be signed in to change notification settings - Fork 0
USE 112 - refactor model load #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: Many of the CLI commands will require an embedding class and model to work. A decorator was created originally that injected a --model-uri CLI argument, but it also provides a place to load the class itself and become more of a middleware. How this addresses that need: Updates the model_required decorator to also load the embedding model class. This DRY's up the CLI commands that use it and centralizes that logic and conventions for the CLI argument, env vars, and whatnot. Lastly, it is now required to include a 'model_path' when instantiating a model class instance, and this location is used for both download and load. Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-112
| def __init__(self, model_path: str | Path) -> None: | ||
| """Initialize the embedding model with a model path. | ||
| Args: | ||
| model_path: Path where the model will be downloaded to and loaded from. | ||
| """ | ||
| self.model_path = Path(model_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most changes can be traced back to this change. When we instantiate an embedding class, we pass a model_path always, which can be used for any downloading or loading of the model.
This removes the need to pass around paths, and we can assume it's always required.
| def model_required(f: Callable) -> Callable: | ||
| """Middleware decorator for commands that require an embedding model. | ||
| This decorator adds two CLI options: | ||
| - "--model-uri": defaults to environment variable "TE_MODEL_URI" | ||
| - "--model-path": defaults to environment variable "TE_MODEL_PATH" | ||
| The decorator intercepts these parameters, uses the model URI to identify and | ||
| instantiate the appropriate embedding model class with the provided model path, | ||
| and stores the model instance in the Click context at ctx.obj["model"]. | ||
| Both model_uri and model_path parameters are consumed by the decorator and not | ||
| passed to the decorated command function. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted in the docstring, this decorator grew a bit in responsibility.
When applied to a CLI command, the arguments are injected, and now we get a nearly fully initialized embedding class instance back, with the model_path set for use in download() or load() if the CLI commands do either of those.
This decorator was also moved in the file, hence the git diff showing all new.
| This CLI command is NOT used during normal workflows. This is used primary | ||
| during development and after model downloading/loading changes to ensure the | ||
| model loads correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated docstring for this CLI command @ehanson8 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change!
| This CLI command is NOT used during normal workflows. This is used primary | ||
| during development and after model downloading/loading changes to ensure the | ||
| model loads correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| @abstractmethod | ||
| def download(self, output_path: str | Path) -> Path: | ||
| """Download and prepare model, saving to output_path. | ||
| def download(self) -> Path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 for the rename to model_path, more descriptive and just fits better throughout the code
Purpose and background context
This PR continues a bit of refactoring, going into the stubbing of actual embeddings creation CLI commands, around how the model is found and loaded for CLI commands.
Updates:
TE_MODEL_DOWNLOAD_PATHtoTE_MODEL_PATHto reflect it's used for both download and loadingmodel_pathto an embedding class instance and save to self, then use for methods likedownload()andload()@model_requiredto have it handle the injection of CLI arguments and the loading of the model, all before you hit the business logic of the CLI commandHow can a reviewer manually see the effects of these changes?
No functional changes, just refactoring.
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Code review