-
Notifications
You must be signed in to change notification settings - Fork 0
v1 Deploy Adjustments #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: With the json.dumps() approach, we had a lot of JSON characters and cruft in the final string used for embedding. How this addresses that need: Produces a <field>:<value><newline> repeating structure from the full record. Side effects of this change: * None Relevant ticket(s): * None
Why these changes are being introduced: Now that we can support multiprocessing, we should expose the chunk_size parameter for inference. How this addresses that need: The env var TE_CHUNK_SIZE is added that will set the chunk_size configuration for inference. Side effects of this change: * When multiprocessing is used, and TE_CHUNK_SIZE is set, it will be used for inference. Relevant ticket(s): * None
Why these changes are being introduced: Our initial pass with the embedding class OSNeuralSparseDocV3GTE was to save both the sparse vector and the decoded token:weights. Each sparse vector was the length of the model vocabulary, about 30k, with mostly zeros. While technically this could be used for analysis beyond just the decoded token:weights given to OpenSearch, the data transfer and storage overhead exceeds any known use cases at the moment. How this addresses that need: The OSNeuralSparseDocV3GTE embedding model is updated to not include the sparse vector for the Embedding.embedding_vector property on output. This can easily be turned on later, with an inline code comment showing how to toggle that back on. Side effects of this change: * No sparse vectors are stored for now, storage is decreased. Relevant ticket(s): * None
3810f03 to
ee1c26f
Compare
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved!
| ) | ||
|
|
||
| # get sparse vector embedding for input text(s) | ||
| inference_start = time.perf_counter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the var and logging name change for specificity on what it is tracking
| embedding_vector = sparse_vector.to_dense().tolist() | ||
| # # prepare sparse vector for JSON serialization | ||
| # NOTE: at this time we are NOT including the sparse vector for output. This | ||
| # block can be uncommented in the future to include it when wanted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good approach to this change!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought you'd like this update @ehanson8! Glad we have this stubbed if we do want them in the future, but I think your instinct was right to not include them at first. Going to be lots of churn in the embeddings creation for a bit as we tune things.
Purpose and background context
This PR makes a handful of small adjustments after testing in Dev1. While more tweaks are expected, this reflects another pass before full pipeline usage.
It is recommended to view each commit seperately:
full_recordstrategychunk_sizeconfiguration to environment variableOSNeuralSparseDocV3GTEHow can a reviewer manually see the effects of these changes?
Not much to see! There are no longer any sparse vectors in the final output, but this is readily evident from the commits. Normal usage is mostly untouched.
Includes new or updated dependencies?
YES: dependency updates
Changes expectations for external applications?
NO
What are the relevant tickets?
Code review