Feat: Improve Translation quality and performance. by SloMR · Pull Request #11 · SloMR/TransLora

SloMR · 2026-04-21T18:14:38Z

This pull request updates both the documentation (README.md) and core CLI implementation to introduce a new pre-translation "cast & register prepass" for better consistency, improve language auto-detection, enhance validation and retry logic, and make the CLI interface clearer and more robust. The changes also adjust default behaviors and add new configuration options.

Key improvements and new features:

Major pipeline and feature enhancements

- Added a "cast & register prepass" step to extract characters, recurring terms, and written register before translation, ensuring consistent translation of names and formality across batches. This is now included in both the documentation and the CLI pipeline.
- Enabled auto-detection of the source language by default, both in the web and CLI interfaces, simplifying usage for mixed-language files.

Validation and retry improvements

- Improved validation logic: on repeated validation failures, batches are recursively split down to single blocks before giving up, increasing robustness for difficult files.
- Enhanced retry/back-off logic, including more precise handling of rate limits and batch splitting.
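
The recursive split described in this section can be sketched roughly as follows. This is an illustration only, not the code in `cli/core/batch_runner.py`: the function name `translate_with_split`, the injected `translate` callable, and the `(number, text)` tuple standing in for a subtitle block are all assumptions made for the example.

```python
from typing import Callable, Optional

# Stand-in for the real block type: (block number, text).
Block = tuple[int, str]


def translate_with_split(
    batch: list[Block],
    translate: Callable[[list[Block]], Optional[list[Block]]],
    attempts: int = 2,
) -> list[Block]:
    """Translate `batch`, halving it whenever validation keeps failing.

    `translate` is a hypothetical callable that returns validated output,
    or None when the model's reply did not pass validation.
    """
    for _ in range(attempts):
        result = translate(batch)
        if result is not None:
            return result

    if len(batch) == 1:
        # A single block cannot be split any further, so give up here.
        raise RuntimeError(f"block {batch[0][0]} failed validation")

    # Split in half and recurse; concatenation keeps the original order.
    mid = len(batch) // 2
    return (
        translate_with_split(batch[:mid], translate, attempts)
        + translate_with_split(batch[mid:], translate, attempts)
    )
```

Halving keeps the number of extra requests logarithmic in the batch size when only one block is problematic, which is presumably why the PR splits rather than retrying the same payload.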

CLI and configuration changes

- Changed the default batch size from 15 to 10 for better reliability with small/medium models, and updated the CLI/configuration accordingly.
- Added a verbose flag to the CLI to show retry/validation warnings, which are now hidden by default for cleaner output.
- Clarified CLI usage and documentation, including clearer flag descriptions and examples for both auto-detect and explicit source language.
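
As a rough illustration of the CLI defaults listed above, an `argparse` setup along these lines would behave as described. Only `--source` and `--verbose` are flag names confirmed by this PR; `--batch-size`, the `prog` name, and the help strings are assumptions, and the real `cli/translora.py` takes further arguments that are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="translora")
    # Omitting --source leaves it as None, which stands for "auto-detect".
    parser.add_argument(
        "--source",
        default=None,
        help="source language (default: auto-detect)",
    )
    # Hypothetical flag name; only the new default of 10 comes from the PR.
    parser.add_argument(
        "--batch-size",
        type=int,
        default=10,
        help="subtitle blocks per request (default: 10)",
    )
    # Retry/validation warnings are hidden unless this flag is given.
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="show retry and validation warnings",
    )
    return parser
```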

Codebase cleanup and minor improvements

- Refactored and clarified docstrings and comments for better readability and maintainability.
- Updated the roadmap to reflect the new "cast & register prepass" and removed the translation memory item, which is now implemented.

- Added functions to handle subtitle blocks without timestamps for easier parsing.
- Updated translation logic to enhance handling of subtitle blocks.
- Improved model interaction by eliminating noise from timestamps.
- Streamlined serialization of subtitle data, boosting translation accuracy.

- Updated subtitle serialization to exclude timestamps, improving translation accuracy.
- Added logic to reattach timestamps after translation.
- Simplified input format in translation prompt to align with changes.

- Integrates file context scanning for consistent character names and term translations.
- Enhances translation accuracy by addressing gendered-pronoun errors.
- Introduces a glossary system to improve scene-context translations.
- Provides better support for language-specific nuances in subtitle translations.

- Introduces batch splitting on validation failures to enhance translation accuracy.
- Adds register consistency for coherent translations across subtitle files.
- Ensures non-empty inputs do not produce empty outputs.
- Improves context detection for better glossary generation.
- Provides more granular retry logic, benefiting subtitle translation stability.

- Introduces context scanning for gendered-pronoun accuracy.
- Ensures consistent language register and terminology.
- Improves batch handling with retry logic and batch splitting.
- Enhances translation fidelity for non-Latin languages.

- Enables automatic detection of source language for translations.
- Enhances usability for mixed-language subtitle batches.
- Updates UI to include an "Auto-detect" option in language selection.
- Improves translation workflow by eliminating the need to specify a source language.

- Introduces a verbose flag to control the visibility of retry and validation warnings.
- By default, warnings are suppressed, enhancing user experience during translation tasks.
- Verbose mode helps users monitor detailed translation attempts and issues.
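
One way to get "silent by default, visible with the verbose flag" is to route every warning through the configuration object. The sketch below assumes a `Config` dataclass with a `verbose` field (the PR confirms a `verbose` flag in `cli/core/config.py`); the `warn_sink` field and the `warn` method are hypothetical names introduced for the example.

```python
import sys
from dataclasses import dataclass
from typing import Callable


def _stderr_sink(message: str) -> None:
    """Default destination for warnings: standard error."""
    print(message, file=sys.stderr)


@dataclass
class Config:
    verbose: bool = False
    # Hypothetical hook so callers (e.g. a live status display or tests)
    # can reroute warnings somewhere other than stderr.
    warn_sink: Callable[[str], None] = _stderr_sink

    def warn(self, message: str) -> None:
        # Warnings are dropped unless verbose mode was requested.
        if self.verbose:
            self.warn_sink(message)
```

Keeping the sink injectable means the batch runner never has to know whether warnings are printed, collected, or discarded.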

- Introduces contextual hints for form fields to guide user input.
- Enhances user experience by clarifying field purposes and options.
- Adds advice for optional fields and server-specific inputs.
- Provides better visual structure and alignment in the UI.

- Decreased default batch size from 15 to 10 for better handling.
- Updated help text to reflect batch size adjustment.
- Aligns batch size across CLI and web for consistency.

Copilot

Pull request overview

This PR improves subtitle translation consistency and robustness across both the web app and CLI by adding a file-level context prepass (“cast & register”), enabling source-language auto-detection by default, and refining batching/validation/retry behavior.

Changes:

Added a one-shot context scan to extract register/characters/terms and attach a batch-scoped glossary to each translation request (web + CLI).
Switched to a timestamp-free “lite” wire format for LLM calls and added recursive batch splitting on repeated validation failures; updated defaults (batch size 10, source auto-detect) and added CLI --verbose.
Updated UI hints, CLI help text, and README; bumped versions to 0.4.0.
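
To make "batch-scoped glossary" concrete: the idea is to send each request only the file-level terms that actually occur in that batch. The sketch below is an assumption about how that could work, not the contents of `cli/core/context_pass.py`. `FileContext` is named in the PR, but its fields (`register`, `terms`), the function `render_batch_glossary`, and the output layout are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileContext:
    """Hypothetical shape of the per-file scan result."""
    register: str = ""
    # Source-language term or name -> agreed target rendering.
    terms: dict[str, str] = field(default_factory=dict)


def render_batch_glossary(ctx: FileContext, batch_texts: list[str]) -> str:
    """Render only the glossary entries that appear in this batch."""
    haystack = "\n".join(batch_texts).casefold()
    lines = []
    if ctx.register:
        lines.append(f"Register: {ctx.register}")
    for source_term, target_term in ctx.terms.items():
        # Case-insensitive substring match keeps the example simple;
        # a real implementation may need word-boundary handling.
        if source_term.casefold() in haystack:
            lines.append(f"{source_term} -> {target_term}")
    return "\n".join(lines)
```

Scoping the glossary this way keeps the prompt short for small models while still pinning down names that recur across batches.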

Reviewed changes

Copilot reviewed 29 out of 32 changed files in this pull request and generated 6 comments.

File	Description
web/src/index.html	Removes alternate favicon link while keeping SVG favicon setup.
web/src/app/core/translation.service.ts	Adds context prepass + glossary injection; refactors request building; adds split-on-validation-failure logic.
web/src/app/core/translation-prompt.ts	Updates system prompt for “no timestamps” wire format and supports optional glossary in user message.
web/src/app/core/time-tracker.ts	Refactors comments/docstrings for brevity/clarity (no logic change).
web/src/app/core/subtitle-formats/types.ts	Shortens interface documentation (no functional change).
web/src/app/core/subtitle-formats/index.ts	Shortens parsing documentation (no functional change).
web/src/app/core/srt-parser.ts	Adds `serializeLite`/`parseLite` wire format helpers for timestamp-free translation requests.
web/src/app/core/languages.ts	Condenses comments on language metadata (no functional change).
web/src/app/core/file-types.ts	Clarifies upload parsing comment (no functional change).
web/src/app/core/context-pass.ts	New web context-pass implementation (scan budget, parsing tagged response, batch-scoped glossary rendering).
web/src/app/app.component.ts	Defaults source language to auto-detect and adjusts swap behavior; comment cleanups.
web/src/app/app.component.scss	Adds layout/hint styles for language + advanced form hints.
web/src/app/app.component.html	Adds “Auto-detect” source option and explanatory field hints across advanced/basic sections.
web/public/favicon.ico	Includes favicon asset (supports browsers that request `.ico`).
web/package.json	Bumps web app version to 0.4.0.
web/package-lock.json	Updates lockfile version metadata to 0.4.0.
cli/uv.lock	Bumps CLI package version to 0.4.0 in lock metadata.
cli/translora.py	Makes `--source` optional (auto-detect default), changes default batch size to 10, adds `--verbose`, refreshes help/examples.
cli/tests/test_srt_parser.py	Adds validation coverage for “empty output text” cases.
cli/tests/test_context_pass.py	Adds tests for context-pass parsing/rendering and scan sampling.
cli/pyproject.toml	Bumps CLI package version to 0.4.0.
cli/core/translator.py	Adds per-file context scan step and passes FileContext into batch translation.
cli/core/time_tracker.py	Condenses docstrings (no logic change).
cli/core/srt_parser.py	Adds `serialize_lite`/`parse_lite` and extends validation to catch empty output on non-empty input.
cli/core/prompt.py	Updates system prompt for “no timestamps” wire format + glossary/register guidance.
cli/core/live_status.py	Condenses docstrings/comments (no logic change).
cli/core/formats/types.py	Condenses SubtitleDocument docstring (no functional change).
cli/core/formats/__init__.py	Condenses module docstring (no functional change).
cli/core/context_pass.py	New CLI context-pass implementation + scan extraction call.
cli/core/config.py	Adds `verbose` flag and makes warnings silent by default (reroutable).
cli/core/batch_runner.py	Adds lite wire format, glossary user message, recursive split on validation failure, and updated retry/warn behavior.
README.md	Documents prepass/context scan, auto-detect source, new defaults/flags, and pipeline updates.

Files not reviewed (1)

web/package-lock.json: Language not supported

Copilot · 2026-04-21T18:19:25Z

+            if len(output) == len(batch):
+                output = [
+                    SubtitleBlock(number=batch[i].number,
+                                  timestamp=batch[i].timestamp,
+                                  text=output[i].text)


Here the parsed lite output is rewritten with the input batch’s numbers/timestamps before validation. That prevents `validate_batch` from catching incorrect numbering or reordered blocks (those fields get overwritten), which can silently misalign text with timestamps. Prefer validating the returned numbering/order first, and then reattaching timestamps by matching on the returned block number (or only overwriting timestamps).
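
A minimal sketch of the suggested order of operations, assuming `SubtitleBlock` has the `number`, `timestamp`, and `text` fields seen in the diff above (with `number` as an `int`, which is a guess). The helper name `reattach_timestamps` is hypothetical, and raising `ValueError` stands in for whatever the project's validation path actually does.

```python
from dataclasses import dataclass


@dataclass
class SubtitleBlock:
    number: int
    timestamp: str
    text: str


def reattach_timestamps(
    batch: list[SubtitleBlock], output: list[SubtitleBlock]
) -> list[SubtitleBlock]:
    """Check the model's numbering first, then restore timestamps by number."""
    expected = [b.number for b in batch]
    returned = [b.number for b in output]
    if returned != expected:
        # Missing, extra, or reordered blocks are reported instead of
        # being papered over by a positional overwrite.
        raise ValueError(
            f"numbering mismatch: expected {expected}, got {returned}"
        )
    timestamp_by_number = {b.number: b.timestamp for b in batch}
    return [
        SubtitleBlock(o.number, timestamp_by_number[o.number], o.text)
        for o in output
    ]
```

Because the numbers are compared before anything is overwritten, a reordered reply triggers a retry or split rather than attaching text to the wrong cue.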

Copilot · 2026-04-21T18:19:25Z

+    can_split = len(batch) > 1
+    attempts = _ATTEMPTS_BEFORE_SPLIT if can_split else cfg.max_retries
+    hit_validation_failure = False


`attempts` is set to `_ATTEMPTS_BEFORE_SPLIT` for any batch with >1 block, which limits retries for transient HTTP/network errors as well as validation failures. This can reduce resilience to 429/5xx spikes. Consider keeping `cfg.max_retries` for request failures, and only triggering split-after-N when validation keeps failing.
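
The separation suggested here could look roughly like the following: one counter for transport errors, bounded by the full retry budget, and a second counter for validation failures that triggers the split. Everything in this sketch is hypothetical (the exception classes, the `send` callable, and the `None` return meaning "split this batch"), and back-off sleeps are left out for brevity.

```python
from typing import Callable, Optional


class RequestFailed(Exception):
    """Transient transport problem, e.g. HTTP 429 or 5xx."""


class ValidationFailed(Exception):
    """The reply arrived but did not pass batch validation."""


def run_batch(
    send: Callable[[], list],
    max_retries: int = 5,
    attempts_before_split: int = 2,
) -> Optional[list]:
    """Return validated output, or None to ask the caller to split."""
    request_failures = 0
    validation_failures = 0
    while True:
        try:
            return send()
        except RequestFailed:
            # Network errors get the full retry budget.
            request_failures += 1
            if request_failures >= max_retries:
                raise
        except ValidationFailed:
            # Only repeated validation failures trigger the split.
            validation_failures += 1
            if validation_failures >= attempts_before_split:
                return None
```

With this shape, a burst of 429 responses no longer uses up the attempts reserved for deciding whether a batch needs to be halved.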

Copilot · 2026-04-21T18:19:26Z

+# Wire format: number + text only. Timestamps are stripped before sending
+# because small models sometimes corrupt them; callers reattach positionally.
+def serialize_lite(blocks: list[SubtitleBlock]) -> str:
+    return "\n\n".join(f"{b.number}\n{b.text}" for b in blocks) + "\n"
+
+
+def parse_lite(content: str) -> list[SubtitleBlock]:


serialize_lite/parse_lite are new parsing/serialization paths but the existing tests don’t appear to cover them (e.g., multi-line text, trailing newlines, invalid blocks). Adding focused tests for these helpers would reduce the risk of wire-format regressions.
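
As a starting point, focused tests could look like the sketch below. `serialize_lite` is copied from the diff above; the body of `parse_lite` is not shown in this PR excerpt, so the version here is a hypothetical stand-in written only to make the tests runnable, and the real parser may treat invalid blocks differently. The `SubtitleBlock` field types are likewise assumed.

```python
import re
from dataclasses import dataclass


@dataclass
class SubtitleBlock:
    number: int
    timestamp: str
    text: str


def serialize_lite(blocks: list[SubtitleBlock]) -> str:
    return "\n\n".join(f"{b.number}\n{b.text}" for b in blocks) + "\n"


def parse_lite(content: str) -> list[SubtitleBlock]:
    """Hypothetical parser: blank-line separated 'number, then text' blocks."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", content.strip()):
        lines = chunk.split("\n")
        if not lines[0].strip().isdigit():
            continue  # skip chunks that do not start with a block number
        blocks.append(
            SubtitleBlock(int(lines[0]), "", "\n".join(lines[1:]))
        )
    return blocks


def test_round_trip_multiline():
    blocks = [
        SubtitleBlock(1, "", "Hello"),
        SubtitleBlock(2, "", "Line one\nLine two"),
    ]
    assert parse_lite(serialize_lite(blocks)) == blocks


def test_trailing_newlines_are_ignored():
    blocks = [SubtitleBlock(1, "", "Hello")]
    assert parse_lite(serialize_lite(blocks) + "\n\n") == blocks


def test_invalid_block_is_skipped():
    assert parse_lite("x\nfoo\n\n2\nbar\n") == [SubtitleBlock(2, "", "bar")]
```

Note that text containing an empty line would not survive this wire format, since blank lines delimit blocks; a test pinning down the intended behavior for that case would be worth adding as well.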

Copilot · 2026-04-21T18:19:26Z

+        // Wire format strips timestamps; reattach positionally.
+        if (output.length === inputBlocks.length) {
+          output = output.map((b, i) => ({
+            number: inputBlocks[i].number,
+            timestamp: inputBlocks[i].timestamp,


The code reattaches block numbers/timestamps positionally before calling `validateBatch`. This means `validateBatch` can no longer detect a model that outputs incorrect block numbers or reorders blocks (the mapping overwrites those fields), which can silently attach the wrong text to a timestamp. Consider validating the model’s returned numbering/order first, and then reattaching timestamps by matching on the returned block number (or only overwriting timestamps, not numbers).

Copilot · 2026-04-21T18:19:26Z

+    const canSplit = inputBlocks.length > 1;
+    // Splittable batches give up early — halving resolves persistent count
+    // mismatches faster than more retries on the same payload.
+    const attempts = canSplit ? ATTEMPTS_BEFORE_SPLIT : maxRetries;


`attempts` is reduced to `ATTEMPTS_BEFORE_SPLIT` for any batch with >1 block, which also limits retries for transient HTTP/network errors (not just validation mismatches). This makes large batches much more likely to fail on temporary 429/5xx. Consider using `maxRetries` for request failures, and only switching to split-after-N logic when repeated validation failures occur.

Copilot · 2026-04-21T18:19:27Z

+// Wire format: number + text only. Timestamps are stripped before sending
+// because small models sometimes corrupt them; callers reattach positionally.
+export function serializeLite(blocks: SubtitleBlock[]): string {
+  return blocks.map((b) => `${b.number}\n${b.text}`).join('\n\n') + '\n';
+}


New wire-format helpers (serializeLite/parseLite) are introduced here but there are no unit tests covering their round-trip behavior, multi-line text handling, or invalid input tolerance. Since this module already has a spec file, adding tests for these functions would help prevent subtle parsing regressions.

SloMR added 15 commits April 21, 2026 18:21

CLI: Enhance subtitle translation by refining parsing

7553672

- Updated subtitle serialization to exclude timestamps, improving translation accuracy.
- Added logic to reattach timestamps after translation.
- Simplified input format in translation prompt to align with changes.

Web: Enhance translation consistency

8593999

- Introduces context scanning for gendered-pronoun accuracy.
- Ensures consistent language register and terminology.
- Improves batch handling with retry logic and batch splitting.
- Enhances translation fidelity for non-Latin languages.

Web/CLI: Adjust batch size for improved performance

529011e

- Decreased default batch size from 15 to 10 for better handling.
- Updated help text to reflect batch size adjustment.
- Aligns batch size across CLI and web for consistency.

Doc: Update README

acdd885

Web: Remove unneeded favicon.

05a3af5

CLI: Remove unneeded comments.

d262658

Web: Remove unneeded comments.

06e92cb

Web: Bump version to 0.4.0

947aa06

CLI: Bump version to 0.4.0

eb20dce

SloMR self-assigned this Apr 21, 2026

Copilot AI review requested due to automatic review settings April 21, 2026 18:14

Copilot started reviewing on behalf of SloMR April 21, 2026 18:14

SloMR merged commit 51d84ff into main Apr 21, 2026
5 checks passed

SloMR deleted the feat/ImproveTranslation branch April 21, 2026 18:15

Copilot AI reviewed Apr 21, 2026

SloMR merged 15 commits into main from feat/ImproveTranslation

2 participants
