
Conversation

@ghukill ghukill (Collaborator) commented Nov 10, 2025

NOTE: the +1,692 −598 line count diff is mostly the addition of a couple of fixtures and dependency churn.

Purpose and background context

This PR is a fairly substantial pivot: our first embedding class, `OSNeuralSparseDocV3GTE`, now uses the `sentence-transformers` library instead of the lower-level `transformers` library for creating embeddings.

We decided to revisit `sentence-transformers` while exploring multiprocessing in USE-137, as it provides more out-of-the-box functionality tailored to our model and our purposes, including parallel processing. The learnings along the way were complicated, but they ultimately pointed to a good solution!

On one hand, it became clear that manual multiprocessing with `transformers` -- specifically, multiple processes vs threads -- was not delivering the performance gain that the initial spike code had suggested. We are still exploring why, but this is fairly consistent with articles and blog posts on the topic. That said, because `sentence-transformers` exposes multiprocessing options more easily, it keeps the door open for tuning later on; a rough sketch of that API follows below.
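
For reference, here is roughly what the `sentence-transformers` multi-process API looks like (illustrative only, not our actual code; it assumes `SparseEncoder` inherits the pool helpers from `SentenceTransformer`, and exact helper names can vary by library version):

```python
from sentence_transformers import SparseEncoder

# Load the sparse encoder; trust_remote_code is assumed here because the
# GTE-based model ships custom modeling code on the Hugging Face Hub.
model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte",
    trust_remote_code=True,
)

texts = ["record one text", "record two text"]

# Start worker processes, encode across them, then shut the pool down
# explicitly so no multiprocessing resources are leaked at exit.
pool = model.start_multi_process_pool()
try:
    embeddings = model.encode_multi_process(texts, pool)
finally:
    model.stop_multi_process_pool(pool)
```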

On the other hand, what tipped the scales was the simplicity of `sentence-transformers` versus `transformers`. Even the Hugging Face model card leads with a `sentence-transformers` example.

Assuming, then, that `sentence-transformers` is a good fit for both performance and ergonomics, what actually changes?

  • drastically reduced code complexity for creating embeddings
  • out-of-the-box support for MPS on Apple Silicon (mostly good for testing)
  • simpler parallel processing options, good for our first pass at this CLI
  • small things, like decoded token weights arriving pre-sorted by weight (see the sketch below)
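
To make the first and last bullets concrete, here is a minimal sketch of the new style (illustrative only, not our exact code; it assumes the `SparseEncoder` class and its `decode` helper from recent `sentence-transformers` releases):

```python
from sentence_transformers import SparseEncoder

# Load the model; trust_remote_code is assumed because the GTE-based
# model ships custom modeling code on the Hugging Face Hub.
model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte",
    trust_remote_code=True,
)

docs = ["TIMDEX record text goes here"]

# One call handles tokenization, batching, and inference.
embeddings = model.encode_document(docs)

# Decoded (token, weight) pairs come back already sorted by weight,
# so no manual sorting is needed downstream.
decoded = model.decode(embeddings[0])
print(decoded[:5])
```

Compare that with the previous `transformers` approach, which required explicit tokenization, a forward pass, and manual sparse-vector post-processing.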

Overall, while `transformers` was demonstrated to work, revisiting `sentence-transformers` suggests it is a better fit for us and for this application at this time. In addition to a simpler API, testing also suggests better out-of-the-box performance without any complicated multiprocessing or batching.

How can a reviewer manually see the effects of these changes?

For both local and Docker tests, set these env vars:

```
HF_HUB_DISABLE_PROGRESS_BARS=true
TE_MODEL_URI=opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
TE_MODEL_PATH=/tmp/te-model
```

Note the lack of any performance-tuning env vars; the defaults are generally a good bet at this time.

If you haven't already, download the model first:

```bash
uv run --env-file .env embeddings --verbose download-model
```

Local Run

Invoke the CLI via uv:

```bash
uv run --env-file .env embeddings \
  --verbose \
  create-embeddings \
  --input-jsonl tests/fixtures/cli_inputs/test-100-records.jsonl \
  --strategy full_record \
  --output-jsonl /tmp/test.jsonl
```

If interested, you can also test using MPS on Macs with Apple Silicon chips (e.g. M1 - M4).

First, run a job with more records to get a baseline time:

```bash
uv run --env-file .env embeddings \
  --verbose \
  create-embeddings \
  --input-jsonl tests/fixtures/cli_inputs/test-1000-records.jsonl \
  --strategy full_record \
  --output-jsonl /tmp/test.jsonl
```

Total time to complete process: 0:00:40.621678

Then, re-run the job with the env var TE_TORCH_DEVICE=mps for a fairly significant speedup:

```bash
TE_TORCH_DEVICE=mps \
uv run --env-file .env embeddings \
  --verbose \
  create-embeddings \
  --input-jsonl tests/fixtures/cli_inputs/test-1000-records.jsonl \
  --strategy full_record \
  --output-jsonl /tmp/test.jsonl
```

Total time to complete process: 0:00:30.934107
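
For context on what TE_TORCH_DEVICE is doing: device selection in `sentence-transformers` is just a constructor argument, so the wiring might look roughly like this (a sketch under assumptions, not our actual code):

```python
import os

from sentence_transformers import SparseEncoder

# TE_TORCH_DEVICE is our env var: e.g. "mps", "cuda", "cpu", or unset.
device = os.environ.get("TE_TORCH_DEVICE")

# SentenceTransformer-family constructors accept a `device` argument;
# when it is None, the library auto-detects the best available device.
model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte",
    trust_remote_code=True,
    device=device,
)
```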

Docker Run

First, build the image:

```bash
make docker-build
```

Invoke the Docker container:

```bash
docker run -it \
  timdex-embeddings:latest \
  --verbose \
  create-embeddings \
  --strategy full_record \
  --input-jsonl /fixtures/cli_inputs/test-100-records.jsonl \
  --output-jsonl /tmp/test.jsonl
```

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

  • USE-137: https://mitlibraries.atlassian.net/browse/USE-137

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

@ghukill ghukill force-pushed the USE-137-sentence-transformers branch from fedbdd2 to 8f5ca49 on November 10, 2025 17:04
@ghukill ghukill marked this pull request as ready for review November 10, 2025 17:07
@ghukill ghukill requested a review from a team November 10, 2025 17:07
@ehanson8

> Then, re-run the job with the env var TE_TORCH_DEVICE=mps for a fairly significant speedup:

This command worked but did give me a truncated warning; not sure if it's significant:

```
2025-11-10 14:31:11,460 INFO embeddings.cli.create_embeddings() line 281: Embeddings creation complete.
2025-11-10 14:31:11,461 INFO embeddings.cli._log_command_elapsed_time() line 49: Total time to complete process: 0:00:14.320226
/Users/ehanson/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```

@ghukill ghukill (Collaborator, Author) commented Nov 10, 2025

> Then, re-run the job with the env var TE_TORCH_DEVICE=mps for a fairly significant speedup:

> This command worked but did give me a truncated warning; not sure if it's significant:

```
2025-11-10 14:31:11,460 INFO embeddings.cli.create_embeddings() line 281: Embeddings creation complete.
2025-11-10 14:31:11,461 INFO embeddings.cli._log_command_elapsed_time() line 49: Total time to complete process: 0:00:14.320226
/Users/ehanson/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```

Thanks for surfacing this, @ehanson8 -- yes, it's known. I've been doing some reading and it sounds harmless.

Furthermore, in our ECS-deployed context, the container shuts down immediately after the embeddings work anyway, so it's mostly moot there.

But good catch!


@ehanson8 ehanson8 left a comment


Works as expected, and this is a lot cleaner overall, so a very positive switch in my mind!

Why these changes are being introduced:

With a pivot to sentence-transformers coming, rework the dependencies.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137

Why these changes are being introduced:

The first approach for the embedding class OSNeuralSparseDocV3GTE used
the `transformers` library for creating embeddings. Some early exploratory
code seemed to indicate that this lower-level library would provide more
flexibility in response formats and better performance.

However, when exploring multiprocessing with the `transformers` library,
the alternate approach of using the `sentence-transformers` library was
revisited, given its more out-of-the-box multiprocessing support. During
that exploration, based on learnings since the original spike code, it was
determined that the `sentence-transformers` library might be a better fit
overall for our purposes.

Pivoting to this library will simplify our actual embedding logic, while providing
some out-of-the-box tuning capabilities that should be sufficient for our
purposes.

How this addresses that need:

The embedding class `OSNeuralSparseDocV3GTE` is reworked to use
`sentence-transformers` instead of `transformers` for creating embeddings.

This considerably reduces complexity in the actual creation of embeddings,
while also exposing an API for multiprocessing. It's worth noting that
testing indicates multiprocessing will *not* speed up embeddings, at least
in the contexts in which we aim to create them, but the
`sentence-transformers` library also handles parallelism better without
explicit multiprocessing.

In summary, switching to `sentence-transformers` results in a simpler API
for creating embeddings and better out-of-the-box performance, while still
allowing for more tuning later.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137
@ghukill ghukill force-pushed the USE-137-sentence-transformers branch from 8f5ca49 to b859edd on November 13, 2025 14:07
@ghukill ghukill merged commit acc3f95 into main Nov 13, 2025
4 checks passed