fix: resolve GPU embedding performance bottleneck in TransformerEmbedder by calebevans · Pull Request #43 · calebevans/cordon

calebevans · 2026-05-28T16:58:55Z

Summary

- Pass device directly to the SentenceTransformer constructor instead of using .to(), which left _target_device out of sync and caused the model to potentially move back to CPU before each forward pass on CUDA
- Replace manual per-batch encode() loop with a single model.encode() call to eliminate repeated DataLoader/tokenization overhead and enable length-based sorting for optimal padding
- Update test mock to return correctly-sized embedding arrays based on input batch size
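
The two code changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `load_model`, `encode_per_batch`, `encode_once`, and `CountingStub` are hypothetical names, and the stub only stands in for a real model so the sketch runs without downloading weights. The `device=` constructor argument and the `batch_size` argument of `encode()` are part of the sentence-transformers API.

```python
from typing import Any, Sequence


def load_model(model_name: str, device: str) -> Any:
    """Construct the model directly on the target device.

    Requires sentence-transformers; imported lazily so the rest of the
    sketch runs without it.
    """
    from sentence_transformers import SentenceTransformer

    # Pass device= to the constructor instead of calling .to(device) later.
    # Per the PR description, .to() left _target_device out of sync.
    return SentenceTransformer(model_name, device=device)


def encode_per_batch(model: Any, texts: Sequence[str], batch_size: int) -> list:
    """Old shape: slice batches by hand and call encode() once per slice."""
    rows: list = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        rows.extend(model.encode(chunk, batch_size=batch_size))
    return rows


def encode_once(model: Any, texts: Sequence[str], batch_size: int) -> list:
    """New shape: one encode() call; the library batches internally."""
    return list(model.encode(list(texts), batch_size=batch_size))


class CountingStub:
    """Hypothetical stand-in that counts how often encode() is called."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, sentences, batch_size=32, **kwargs):
        self.calls += 1
        # One fake 1-dimensional "embedding" per input sentence.
        return [[float(len(s))] for s in sentences]
```

With 500 inputs and a batch size of 64, the hand-written loop calls `encode()` once per slice, while the single-call version leaves the slicing to the library.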

Benchmark Results

All benchmarks compare the old code (manual batching + .to() device init) against the fixed code (single encode() + constructor device param), using --batch-size 64 and BAAI/bge-base-en-v1.5.

CPU (Apple Silicon, local)

File	Windows	Batches	Old Code	Fixed Code	Speedup
apache_sample.log (2K lines)	500	8	4.24 batch/s	4.90 batch/s	+15.6%
15x apache_sample.log (30K lines)	7,515	118	3.67 batch/s	4.23 batch/s	+15.3%

GPU (NVIDIA RTX 3060, 12GB VRAM, CUDA cu121)

File	Windows	Batches	Old Code	Fixed Code	Speedup
apache_sample.log (2K lines)	500	8	11.09 batch/s	12.39 batch/s	+12%
15x apache_sample.log (30K lines)	7,515	118	15.96 batch/s	18.70 batch/s	+17%

The GPU improvement grows with workload size: +12% on the small file and +17% on the large one. On the large file, the fixed code peaks at ~21 batch/s versus ~16.8 batch/s for the old code.
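
The Batches and Speedup columns in both tables are consistent with simple arithmetic on the other columns. The sketch below assumes that Batches is the window count divided by the batch size of 64, rounded up, and that Speedup is the relative change in batch/s; the helper names are hypothetical.

```python
import math


def batch_count(windows: int, batch_size: int) -> int:
    """Number of batches needed to cover all windows."""
    return math.ceil(windows / batch_size)


def speedup_percent(old_rate: float, new_rate: float) -> float:
    """Relative throughput change, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


# (label, windows, old batch/s, fixed batch/s), copied from the tables above.
rows = [
    ("CPU, 2K lines", 500, 4.24, 4.90),
    ("CPU, 30K lines", 7515, 3.67, 4.23),
    ("GPU, 2K lines", 500, 11.09, 12.39),
    ("GPU, 30K lines", 7515, 15.96, 18.70),
]

for label, windows, old, new in rows:
    print(f"{label}: {batch_count(windows, 64)} batches, "
          f"{speedup_percent(old, new):+.1f}%")
```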

Why the measured GPU improvement is conservative

The device initialization bug (_target_device staying out of sync) may not fully manifest on all hardware/driver combinations. On the reporter's RTX A4000 setup, the real-world improvement may be larger.

Closes #42

Summary by CodeRabbit

Bug Fixes
- Improved error messages when model initialization fails for clearer troubleshooting.
Refactor
- Optimized embedding workflow to handle multi-input encoding in a single, device-aware operation and ensure returned embeddings align with inputs.
Tests
- Updated test fixtures to simulate multiple-input embeddings and validate batching behavior.
Chores
- Bumped package version to 1.1.1.

- Pass device directly to SentenceTransformer constructor instead of using .to(), which left _target_device out of sync and caused the model to be moved back to CPU before each forward pass on CUDA
- Replace manual per-batch encode loop with a single model.encode() call to eliminate repeated DataLoader/tokenization overhead and enable length-based sorting
- Measured ~15% embedding throughput improvement on CPU; GPU improvement to be measured soon

coderabbitai · 2026-05-28T16:59:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 634449e8-7e43-4e9c-a4ae-9cf4a3fdc761

📥 Commits

Reviewing files that changed from the base of the PR and between 11e8c5b and 8c6b44c.

📒 Files selected for processing (3)

pyproject.toml
src/cordon/__init__.py
src/cordon/embedding/transformer.py

✅ Files skipped from review due to trivial changes (2)

src/cordon/__init__.py
pyproject.toml

🚧 Files skipped from review as they are similar to previous changes (1)

src/cordon/embedding/transformer.py

📝 Walkthrough

TransformerEmbedder now constructs SentenceTransformer on the target device and encodes all windows in one call (delegating batching and normalization to sentence-transformers). Tests mock encode to return correctly shaped normalized embeddings. Package version metadata updated to 1.1.1.

Changes

GPU embedding generation optimization

Layer / File(s)	Summary
Device-aware model initialization `src/cordon/embedding/transformer.py`	Removes unused `tqdm` import and updates `TransformerEmbedder.__init__` to pass `device=str(self.device)` to `SentenceTransformer` inside the existing try/except; docstring updated to document device-aware initialization.
Single-call embedding generation with library batching `src/cordon/embedding/transformer.py`, `tests/test_transformer.py`	Refactors `embed_windows()` to collect all window texts and call `model.encode(..., batch_size=self.config.batch_size, normalize_embeddings=True, convert_to_numpy=True)` once, validates returned length, and yields `(window, embedding)` pairs using `zip(..., strict=True)`. Test mock for `SentenceTransformer.encode` updated to `side_effect` producing correctly sized normalized embeddings.
Version metadata updates `pyproject.toml`, `src/cordon/__init__.py`	Bumps package/project version and module `__version__` from `1.1.0` to `1.1.1`.
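
A `side_effect` mock of the kind described for `tests/test_transformer.py` might look like the sketch below. It is an assumption-laden illustration: the real test presumably returns NumPy arrays, whereas this version uses plain lists to stay dependency-free, and `fake_encode` and the dimension of 4 are hypothetical.

```python
import math
from unittest.mock import MagicMock


def fake_encode(sentences, **kwargs):
    """Return one unit-length vector per input sentence."""
    dim = 4  # hypothetical embedding size, chosen only for the example
    value = 1.0 / math.sqrt(dim)
    return [[value] * dim for _ in sentences]


# side_effect makes the mock compute its result from the actual arguments,
# so the number of returned rows always matches the number of inputs.
mock_model = MagicMock()
mock_model.encode.side_effect = fake_encode
```

A fixed `return_value` would hand back the same number of rows regardless of input, which no longer fits once all windows go through a single `encode()` call.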

Sequence Diagram

sequenceDiagram
  participant TransformerEmbedder
  participant SentenceTransformer
  participant Config
  TransformerEmbedder->>SentenceTransformer: __init__(device=str(self.device))
  SentenceTransformer-->>TransformerEmbedder: model instance on device
  TransformerEmbedder->>TransformerEmbedder: collect window texts
  TransformerEmbedder->>SentenceTransformer: encode(texts, batch_size=Config.batch_size, normalize_embeddings=True, convert_to_numpy=True)
  SentenceTransformer-->>TransformerEmbedder: numpy normalized embeddings
  TransformerEmbedder->>TransformerEmbedder: yield (window, embedding) pairs (zip strict)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

calebevans/cordon#35: Modifies sentence-transformers embedding backend in TransformerEmbedder and updates test mocks for SentenceTransformer.encode outputs.

Suggested labels

enhancement

Poem

🐰 One encode to bind them, no loops to repeat,
GPU-ready model sits snug on its seat,
Windows all gathered, embeddings aligned,
Tests hum approving, version bumped and signed,
Hopping off happy with bytes under feet.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: fixing GPU embedding performance by optimizing the TransformerEmbedder, which aligns with the primary objective of resolving the performance bottleneck reported in issue `#42`.
Linked Issues check	✅ Passed	The PR implements all key requirements from issue `#42`: passes device to SentenceTransformer constructor to prevent device sync issues [`#42`], replaces per-batch loops with single encode() call to eliminate tokenization overhead [`#42`], and validates embedding output length with strict validation [`#42`].
Out of Scope Changes check	✅ Passed	All changes are directly related to the performance fix: TransformerEmbedder optimization, test mock updates, and version bumps (1.1.0→1.1.1) are in-scope for a bug fix release.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cordon/embedding/transformer.py`:
- Line 90: Replace the non-strict zip to fail fast: before yielding, compare
len(window_list) and len(all_embeddings) (the outputs of window_list and
encode()) and raise a ValueError with a clear message if they differ, then use
zip(window_list, all_embeddings, strict=True) (or simply zip(..., strict=True)
if you prefer relying on the built-in check); reference the variables
window_list, all_embeddings and the encode() call so the error message explains
that encode() returned a different row count than inputs.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 15616cfb-981e-45a8-b5d7-a8c9dc9fa282

📥 Commits

Reviewing files that changed from the base of the PR and between 035bbf2 and 11e8c5b.

📒 Files selected for processing (2)

src/cordon/embedding/transformer.py
tests/test_transformer.py

- Add explicit length check between window_list and encode() output before yielding results, raising ValueError with a descriptive message if they differ
- Switch zip from strict=False to strict=True as a secondary safeguard

sonarqubecloud · 2026-05-28T17:06:06Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai · 2026-05-28T17:06:33Z

Actionable comments posted: 0

calebevans self-assigned this May 28, 2026

calebevans added the bug Something isn't working label May 28, 2026

release: v1.1.1

d26dcf5

coderabbitai Bot reviewed May 28, 2026

Comment thread src/cordon/embedding/transformer.py Outdated

calebevans merged commit be6623e into main May 28, 2026
14 checks passed

calebevans deleted the fix/issue-42 branch May 28, 2026 20:17

calebevans merged 3 commits into main from fix/issue-42
