Huggingface downloader & Simpler log message & InterruptMixin #2
This pull request brings several usability enhancements and code refactorings. The primary changes are:
- **Automatic Model Downloader:** Previously, the `model_path` attribute in `model_definitions.py` required the actual filename of a model. It now also accepts the name of a HuggingFace repository, and the specified model is downloaded automatically when needed. For example, if you define `TheBloke/NewHope-GPTQ` as the `model_path`, the necessary files are downloaded into `models/gptq/thebloke_newhope_gptq`. This works the same way for GGML models.
- **Simpler Log Messages:** Log messages for the Completions, Chat Completions, and Embeddings endpoints are now more concise. Each log line shows the essentials: elapsed time, token usage, and tokens generated per second.
- **Improved Responsiveness for Job Cancellation:** The `Event` object in the `SyncManager` now sends an interrupt signal to worker processes. Each worker checks the `is_interrupted` property at the lowest accessible level and attempts to cancel the operation.

These changes make the application more intuitive to use and more responsive overall. Model handling is streamlined by allowing automatic downloads from a repository rather than relying on specific filenames. Job cancellation is now more reactive, potentially saving compute and time when a process must be halted. Finally, the log messages are cleaner and more informative, providing the essential data for monitoring performance and usage.
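The repository-name resolution described above could be sketched roughly like this. This is a minimal illustration, not the PR's actual code: `resolve_model_path` and `repo_to_folder` are hypothetical helper names, and it assumes the `huggingface_hub` package is used for downloading.

```python
from pathlib import Path


def repo_to_folder(repo_id: str) -> str:
    # "TheBloke/NewHope-GPTQ" -> "thebloke_newhope_gptq"
    return repo_id.replace("/", "_").replace("-", "_").lower()


def resolve_model_path(model_path: str, models_dir: str = "models/gptq") -> Path:
    """Return a local path for model_path, downloading from the
    HuggingFace Hub first when it names a repository rather than a file."""
    local = Path(model_path)
    if local.exists():
        return local
    target = Path(models_dir) / repo_to_folder(model_path)
    if not target.exists():
        # Lazy import, so already-downloaded models work without the hub package.
        from huggingface_hub import snapshot_download
        snapshot_download(repo_id=model_path, local_dir=str(target))
    return target
```

With this shape, `resolve_model_path("TheBloke/NewHope-GPTQ")` would populate `models/gptq/thebloke_newhope_gptq` on first use and reuse it afterwards.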
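The condensed log line (elapsed time, token usage, tokens per second) can be sketched as a small formatter. `format_completion_log` is a hypothetical name and the exact layout is an assumption:

```python
def format_completion_log(prompt_tokens: int, completion_tokens: int,
                          elapsed: float) -> str:
    """One-line summary of a Completions / Chat Completions / Embeddings call."""
    total = prompt_tokens + completion_tokens
    # Guard against division by zero for instantaneous (cached) responses.
    tps = completion_tokens / elapsed if elapsed > 0 else 0.0
    return (f"elapsed: {elapsed:.2f}s | "
            f"tokens: {prompt_tokens} prompt + {completion_tokens} completion "
            f"= {total} | {tps:.1f} tokens/s")
```

For example, `format_completion_log(10, 20, 2.0)` yields `elapsed: 2.00s | tokens: 10 prompt + 20 completion = 30 | 10.0 tokens/s`.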
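The cancellation flow can be illustrated with a `multiprocessing` `SyncManager` `Event` shared with a worker process. The `InterruptMixin` name comes from this PR's title, but its interface here, along with `worker`, is an assumption for illustration only:

```python
import multiprocessing as mp
import time


class InterruptMixin:
    """Exposes an is_interrupted property backed by a shared Event."""

    def __init__(self, interrupt_event):
        self._interrupt_event = interrupt_event

    @property
    def is_interrupted(self) -> bool:
        return self._interrupt_event.is_set()


def worker(interrupt_event, result_queue):
    job = InterruptMixin(interrupt_event)
    for step in range(1000):
        # Check at the innermost loop so cancellation takes effect promptly.
        if job.is_interrupted:
            result_queue.put(("cancelled", step))
            return
        time.sleep(0.01)  # simulated unit of work
    result_queue.put(("done", 1000))


if __name__ == "__main__":
    with mp.Manager() as manager:  # manager is a SyncManager
        event = manager.Event()
        results = manager.Queue()
        p = mp.Process(target=worker, args=(event, results))
        p.start()
        time.sleep(0.1)
        event.set()  # request cancellation from the parent
        p.join(timeout=5)
        status, step = results.get()
        print(status)  # "cancelled"
```

Because the check sits inside the tightest loop, the worker reacts within one unit of work instead of finishing the whole job after an interrupt is requested.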