Conversation

@ghukill ghukill commented Nov 14, 2025

Purpose and background context

This PR makes a handful of small adjustments after testing in Dev1. While more tweaks are expected, this reflects another pass before full pipeline usage.

It is recommended to view each commit separately.

How can a reviewer manually see the effects of these changes?

Not much to see! There are no longer any sparse vectors in the final output, but this is readily evident from the commits. Normal usage is mostly untouched.

Includes new or updated dependencies?

YES: dependency updates

Changes expectations for external applications?

NO

What are the relevant tickets?

  • None

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Why these changes are being introduced:

With the json.dumps() approach, we had a lot of JSON characters and cruft in the
final string used for embedding.

How this addresses that need:

Produces a <field>:<value><newline> repeating structure from the full record.

Side effects of this change:
* None

Relevant ticket(s):
* None
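The repeating `<field>:<value><newline>` structure described above could be produced with something like the following minimal sketch. The helper name `record_to_embedding_text` and the list-joining behavior are illustrative assumptions, not the pipeline's actual implementation:

```python
def record_to_embedding_text(record: dict) -> str:
    """Flatten a record into one `<field>:<value>` pair per line.

    Hypothetical helper: joins list values with ", " so each field
    still occupies a single line, avoiding the JSON punctuation that
    json.dumps() would introduce into the embedding input.
    """
    lines = []
    for field, value in record.items():
        if isinstance(value, list):
            value = ", ".join(str(v) for v in value)
        lines.append(f"{field}:{value}")
    return "\n".join(lines)
```

Compared to `json.dumps()`, this drops the braces, quotes, and commas from the string handed to the embedding model, leaving only field names and values.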
@ghukill ghukill marked this pull request as ready for review November 14, 2025 14:54
@ghukill ghukill requested a review from a team November 14, 2025 14:55
Why these changes are being introduced:

Now that we can support multiprocessing, we should expose the chunk_size parameter
for inference.

How this addresses that need:

The env var TE_CHUNK_SIZE is added that will set the chunk_size configuration for
inference.

Side effects of this change:
  • When multiprocessing is used and TE_CHUNK_SIZE is set, its value is used as the
chunk size for inference.

Relevant ticket(s):
* None
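Reading an env var like TE_CHUNK_SIZE into an inference setting could look like this sketch. The helper name `get_chunk_size` and the fallback default of 100 are assumptions; the real configuration mechanism and default may differ:

```python
import os

def get_chunk_size(default: int = 100) -> int:
    """Return the inference chunk size from TE_CHUNK_SIZE, else a default.

    Hypothetical helper: an unset or empty TE_CHUNK_SIZE falls back to
    the provided default rather than raising.
    """
    raw = os.environ.get("TE_CHUNK_SIZE")
    return int(raw) if raw else default
```

A value set in the deployment environment (e.g. `TE_CHUNK_SIZE=250`) would then flow through to the `chunk_size` used when multiprocessing is enabled.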
Why these changes are being introduced:

Our initial pass with the embedding class OSNeuralSparseDocV3GTE was to save both
the sparse vector and the decoded token:weights.  Each sparse vector was the length of the
model vocabulary, about 30k, with mostly zeros.  While technically this could be used for
analysis beyond just the decoded token:weights given to OpenSearch, the data transfer
and storage overhead exceeds any known use cases at the moment.

How this addresses that need:

The OSNeuralSparseDocV3GTE embedding model is updated to not include the sparse vector
for the Embedding.embedding_vector property on output.

This can easily be turned on later, with an inline code comment showing how to toggle
that back on.

Side effects of this change:
  • No sparse vectors are stored for now, so storage is decreased.

Relevant ticket(s):
* None
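The toggle described above can be sketched as a flag guarding whether the full vector is attached to the output alongside the decoded token:weight pairs. The names here (`INCLUDE_SPARSE_VECTOR`, `build_embedding_payload`, `decoded_token_weights`) are illustrative, not the actual internals of `OSNeuralSparseDocV3GTE`:

```python
# Flip to True to re-enable including the full (~30k-length, mostly zero)
# vector in the output, per the inline code comment mentioned above.
INCLUDE_SPARSE_VECTOR = False

def build_embedding_payload(token_weights: dict, dense_vector: list) -> dict:
    """Assemble embedding output, optionally including the full vector.

    Hypothetical sketch: only the decoded token:weight pairs (what
    OpenSearch consumes) are always included; the dense-ified sparse
    vector is gated behind the flag.
    """
    payload = {"decoded_token_weights": token_weights}
    if INCLUDE_SPARSE_VECTOR:
        payload["embedding_vector"] = dense_vector
    return payload
```

With the flag off, the data transfer and storage overhead of the vocabulary-length vector is avoided while the OpenSearch-facing token:weights are unchanged.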
@ghukill ghukill force-pushed the v1-deploy-adjustments branch from 3810f03 to ee1c26f on November 14, 2025 15:00

@ehanson8 ehanson8 left a comment

Approved!

```python
)

# get sparse vector embedding for input text(s)
inference_start = time.perf_counter()
```


Appreciate the var and logging name change for specificity on what it is tracking

```python
embedding_vector = sparse_vector.to_dense().tolist()
# # prepare sparse vector for JSON serialization
# NOTE: at this time we are NOT including the sparse vector for output. This
# block can be uncommented in the future to include it when wanted.
```


A good approach to this change!

Collaborator Author


Thought you'd like this update @ehanson8! Glad we have this stubbed if we do want them in the future, but I think your instinct was right to not include them at first. Going to be lots of churn in the embeddings creation for a bit as we tune things.

@ghukill ghukill merged commit 1cfd0af into main Nov 14, 2025
4 checks passed