
Conversation

@ghukill ghukill (Collaborator) commented Nov 10, 2025

NOTE: the +1,692 −598 line count diff is mostly the addition of a couple of fixtures and dependency churn.

Purpose and background context

This PR is a fairly substantial pivot: our first embedding class, `OSNeuralSparseDocV3GTE`, now uses the `sentence-transformers` library instead of the lower-level `transformers` library for creating embeddings.

We decided to revisit `sentence-transformers` while exploring multiprocessing in USE-137, as it provides more out-of-the-box functionality tailored to our model and our purposes, including parallel processing. The learnings along the way were complicated, but they ultimately pointed to a good solution!

On one hand, it became clear that manual multiprocessing with `transformers` -- specifically, multiple processes vs threads -- was not delivering the performance gain that the initial spike code had suggested. We are still exploring why, but this is fairly consistent with articles and blog posts on the topic. That said, because `sentence-transformers` exposes multiprocessing options more easily, it keeps the door open for tuning later on; a rough sketch of that API follows below.
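
For reference, here is roughly what the `sentence-transformers` multi-process API looks like (illustrative only, not our actual code; it assumes `SparseEncoder` inherits the pool helpers from `SentenceTransformer`, and exact helper names can vary by library version):

```python
from sentence_transformers import SparseEncoder

# Load the sparse encoder; trust_remote_code is assumed here because the
# GTE-based model ships custom modeling code on the Hugging Face Hub.
model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte",
    trust_remote_code=True,
)

texts = ["record one text", "record two text"]

# Start worker processes, encode across them, then shut the pool down
# explicitly so no multiprocessing resources are leaked at exit.
pool = model.start_multi_process_pool()
try:
    embeddings = model.encode_multi_process(texts, pool)
finally:
    model.stop_multi_process_pool(pool)
```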

On the other hand, what tipped the scales was the simplicity of `sentence-transformers` versus `transformers`. Even the Hugging Face model card leads with a `sentence-transformers` example.

Assuming, then, that `sentence-transformers` is a good fit for both performance and ergonomics, what actually changes?

  • drastically reduced code complexity for creating embeddings
  • out-of-the-box support for MPS on Apple Silicon (mostly good for testing)
  • simpler parallel processing options, good for our first pass at this CLI
  • small things, like decoded token weights arriving pre-sorted by weight (see the sketch below)
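
To make the first and last bullets concrete, here is a minimal sketch of the new style (illustrative only, not our exact code; it assumes the `SparseEncoder` class and its `decode` helper from recent `sentence-transformers` releases):

```python
from sentence_transformers import SparseEncoder

# Load the model; trust_remote_code is assumed because the GTE-based
# model ships custom modeling code on the Hugging Face Hub.
model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte",
    trust_remote_code=True,
)

docs = ["TIMDEX record text goes here"]

# One call handles tokenization, batching, and inference.
embeddings = model.encode_document(docs)

# Decoded (token, weight) pairs come back already sorted by weight,
# so no manual sorting is needed downstream.
decoded = model.decode(embeddings[0])
print(decoded[:5])
```

Compare that with the previous `transformers` approach, which required explicit tokenization, a forward pass, and manual sparse-vector post-processing.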

Overall, while `transformers` was demonstrated to work, revisiting `sentence-transformers` suggests it is a better fit for us and for this application at this time. In addition to a simpler API, testing also suggests better out-of-the-box performance without any complicated multiprocessing or batching.

How can a reviewer manually see the effects of these changes?

For both local and Docker tests, set these env vars:

```
HF_HUB_DISABLE_PROGRESS_BARS=true
TE_MODEL_URI=opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
TE_MODEL_PATH=/tmp/te-model
```

Note the lack of any performance-tuning env vars; the defaults are generally a good bet at this time.

If you haven't already, download the model first:

```bash
uv run --env-file .env embeddings --verbose download-model
```

Local Run

Invoke the CLI via uv:

```bash
uv run --env-file .env embeddings \
  --verbose \
  create-embeddings \
  --input-jsonl tests/fixtures/cli_inputs/test-100-records.jsonl \
  --strategy full_record \
  --output-jsonl /tmp/test.jsonl
```

If interested, you can also test using MPS on Macs with Apple Silicon chips (e.g. M1 - M4).

First, run a job with more records to get a baseline time:

```bash
uv run --env-file .env embeddings \
  --verbose \
  create-embeddings \
  --input-jsonl tests/fixtures/cli_inputs/test-1000-records.jsonl \
  --strategy full_record \
  --output-jsonl /tmp/test.jsonl
```

Total time to complete process: 0:00:40.621678

Then, re-run the job with the env var TE_TORCH_DEVICE=mps for a fairly significant speedup:

```bash
TE_TORCH_DEVICE=mps \
uv run --env-file .env embeddings \
  --verbose \
  create-embeddings \
  --input-jsonl tests/fixtures/cli_inputs/test-1000-records.jsonl \
  --strategy full_record \
  --output-jsonl /tmp/test.jsonl
```

Total time to complete process: 0:00:30.934107
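
For context on what TE_TORCH_DEVICE is doing: device selection in `sentence-transformers` is just a constructor argument, so the wiring might look roughly like this (a sketch under assumptions, not our actual code):

```python
import os

from sentence_transformers import SparseEncoder

# TE_TORCH_DEVICE is our env var: e.g. "mps", "cuda", "cpu", or unset.
device = os.environ.get("TE_TORCH_DEVICE")

# SentenceTransformer-family constructors accept a `device` argument;
# when it is None, the library auto-detects the best available device.
model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte",
    trust_remote_code=True,
    device=device,
)
```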

Docker Run

First, build the image:

```bash
make docker-build
```

Invoke the Docker container:

```bash
docker run -it \
  timdex-embeddings:latest \
  --verbose \
  create-embeddings \
  --strategy full_record \
  --input-jsonl /fixtures/cli_inputs/test-100-records.jsonl \
  --output-jsonl /tmp/test.jsonl
```

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

  • USE-137: https://mitlibraries.atlassian.net/browse/USE-137

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

@ghukill ghukill force-pushed the USE-137-sentence-transformers branch from fedbdd2 to 8f5ca49 on November 10, 2025 17:04
@ghukill ghukill marked this pull request as ready for review November 10, 2025 17:07
@ghukill ghukill requested a review from a team November 10, 2025 17:07
@ehanson8

> Then, re-run the job with the env var TE_TORCH_DEVICE=mps for a fairly significant speedup:

This command worked but did give me a truncated warning; not sure if it's significant:

```
2025-11-10 14:31:11,460 INFO embeddings.cli.create_embeddings() line 281: Embeddings creation complete.
2025-11-10 14:31:11,461 INFO embeddings.cli._log_command_elapsed_time() line 49: Total time to complete process: 0:00:14.320226
/Users/ehanson/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```

@ghukill ghukill (Collaborator, Author) commented Nov 10, 2025

> Then, re-run the job with the env var TE_TORCH_DEVICE=mps for a fairly significant speedup:

> This command worked but did give me a truncated warning; not sure if it's significant:

```
2025-11-10 14:31:11,460 INFO embeddings.cli.create_embeddings() line 281: Embeddings creation complete.
2025-11-10 14:31:11,461 INFO embeddings.cli._log_command_elapsed_time() line 49: Total time to complete process: 0:00:14.320226
/Users/ehanson/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```

Thanks for surfacing this, @ehanson8 -- yes, it's known. I've been doing some reading and it sounds harmless.

Furthermore, in our ECS-deployed context, the container shuts down immediately after the embeddings work anyway, so it's mostly moot there.

But good catch!


@ehanson8 ehanson8 left a comment


Works as expected, and this is a lot cleaner overall, so a very positive switch in my mind!

Why these changes are being introduced:

With a pivot to sentence-transformers coming, rework the dependencies.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137

Why these changes are being introduced:

The first approach for the embedding class OSNeuralSparseDocV3GTE used
the `transformers` library for creating embeddings. Some early exploratory
code seemed to indicate that this lower-level library would provide more
flexibility in response formats and better performance.

However, when exploring multiprocessing with the `transformers` library,
the alternate approach of using the `sentence-transformers` library was
revisited, given its more out-of-the-box multiprocessing support. During
that exploration, based on learnings since the original spike code, it was
determined that the `sentence-transformers` library might be a better fit
overall for our purposes.

Pivoting to this library will simplify our actual embedding logic, while providing
some out-of-the-box tuning capabilities that should be sufficient for our
purposes.

How this addresses that need:

The embedding class `OSNeuralSparseDocV3GTE` is reworked to use
`sentence-transformers` instead of `transformers` for creating embeddings.

This considerably reduces complexity in the actual creation of embeddings,
while also exposing an API for multiprocessing. It's worth noting that
testing indicates multiprocessing will *not* speed up embeddings, at least
in the contexts in which we aim to create them, but the
`sentence-transformers` library also handles parallelism better without
explicit multiprocessing.

In summary, switching to `sentence-transformers` results in a simpler API
for creating embeddings and better out-of-the-box performance, while still
allowing for more tuning later.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-137
@ghukill ghukill force-pushed the USE-137-sentence-transformers branch from 8f5ca49 to b859edd on November 13, 2025 14:07
@ghukill ghukill merged commit acc3f95 into main Nov 13, 2025
4 checks passed