
Translator

Translates content from a CSV, Excel, plain-text, or HTML file — or directly from a URL — into one or more target languages using a local LLM via Ollama or the OpenRouter API. Each translation is back-translated into the source language and scored for round-trip quality. Results are written back to the output file.

Three model roles are configured independently and may be the same or different models:

Role             Purpose
Translator       Forward translation: source → target
Back-translator  Reverse translation: target → source
Evaluator        Scores semantic similarity of original vs back-translation (0.0–1.0)
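
The round-trip flow implied by the three roles can be sketched as follows; `translate` and `score` here are hypothetical stand-ins for the real model calls, not the tool's actual API:

```python
# Minimal sketch of the round-trip evaluation flow.
# `translate(text, src, tgt)` and `score(a, b)` are placeholders for the
# Translator/Back-translator/Evaluator backends.

def round_trip(phrase: str, translate, score, source: str, target: str):
    """Forward-translate, back-translate, then score the round trip."""
    forward = translate(phrase, source, target)   # Translator role
    back = translate(forward, target, source)     # Back-translator role
    quality = score(phrase, back)                 # Evaluator role, 0.0-1.0
    return forward, back, quality
```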

Prerequisites

uv

Install uv — the package manager used to install and run the tool:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Ollama (only needed if you plan to use local Ollama mode)

NOTE Ollama mode works for demonstration purposes, but throughput limitations make it a poor fit for real workloads; OpenRouter-based translation is recommended for production use.

Install Ollama and pull at least one model:

ollama pull phi4-mini

The tool will attempt to start Ollama automatically if it is not already running. If you prefer to start it yourself, run ollama serve before launching the tool.
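
A liveness check like the one the tool performs before auto-starting Ollama could look roughly like this sketch, which probes Ollama's /api/tags endpoint; the tool's actual startup logic may differ:

```python
# Hypothetical sketch: return True if an Ollama server is already
# listening at base_url, so we know whether to start one ourselves.
import urllib.error
import urllib.request

def ollama_is_running(base_url: str = "http://localhost:11434") -> bool:
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=2):
            return True
    except (urllib.error.URLError, OSError):
        return False
```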

OpenRouter API key (cloud mode only)

If you want to use OpenRouter instead of Ollama, create an account at openrouter.ai and obtain an API key. Then set it as an environment variable:

# macOS / Linux — add to ~/.bashrc or ~/.zshrc to make permanent
export OPENROUTER_API_KEY=your-key-here

# Windows (Command Prompt) — sets permanently for your user account
setx OPENROUTER_API_KEY your-key-here
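
In cloud mode the tool reads this variable at startup. A minimal sketch of such a check (the function name and error message here are illustrative, not the tool's actual code):

```python
# Fail fast with a clear message if the API key is missing.
import os

def get_openrouter_key() -> str:
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set - see the Prerequisites section"
        )
    return key
```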

Installation

IMPORTANT Do not unzip the project into a folder managed by OneDrive. Copy the archive to a local location on your C: drive and unzip it there; otherwise uv and the installed libraries will fail to work correctly.

Clone or unzip the project files into a new folder. Navigate to the folder in a command window and then install all dependencies from the lockfile:

uv sync --frozen

This creates a .venv in the project directory and installs exact locked versions with SHA256 hash verification.


Usage

# navigate to your environment first
cd C:\Location\of\translator

# activate the project's virtual environment (optional; uv run also works without activation)
.venv\Scripts\activate

# Cloud models via OpenRouter (default config - recommended)
uv run translator --input phrases.csv --output results.csv

# Local models via Ollama (alternate config)
uv run translator --input phrases.csv --output results.csv --config config_ollama.toml

All options

Flag             Default        Description
--input          (optional)     Input file path (.csv, .xlsx, .txt, .html) or a URL. Omit to process all files in input/.
--output         (optional)     Output file path. Omit to overwrite --input, or write to output/ in folder mode. Not used for HTML/URL input.
--source-column  phrase         CSV/Excel column containing source text.
--config         ./config.toml  Path to config file.
--verbose                       Enable debug-level console output.
--quiet                         Suppress all console output except errors.
--log-file       log_file.txt   Path to log file.

All model, language, backend, and threshold settings are configured in config.toml — see Configuration below.


Note on first run

On first run, the tool downloads evaluation model weights (~2–3 GB total) from HuggingFace for the sentence-transformer and BERTScore models. This is a one-time download; subsequent runs use the cached weights. Any configured Ollama models not already present are also pulled automatically.


Where to put your files

Single file: pass --input and optionally --output directly:

uv run translator --input phrases.csv --output results.csv

Batch mode: place all your files in an input/ folder next to the project, then run with no flags. Translated files are written to output/ automatically:

translator/
├── input/
│   ├── phrases.csv
│   ├── marketing.xlsx
│   └── legal.txt
└── output/       ← created automatically
    ├── phrases.csv
    ├── marketing.xlsx
    └── legal.txt
uv run translator
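
Folder-mode file discovery can be sketched as below, assuming the extension list from the options table; the tool's real selection logic may differ:

```python
# Sketch: collect supported files from the input/ folder in a stable order.
from pathlib import Path

SUPPORTED = {".csv", ".xlsx", ".xls", ".txt", ".html", ".htm"}

def batch_files(input_dir: str = "input") -> list[Path]:
    root = Path(input_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```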

HTML and URL translation

Pass a URL or a local .html/.htm file as --input and the tool will translate every visible text node on the page:

# Translate a web page directly from its URL
uv run translator --input https://example.com/about

# Translate a local HTML file
uv run translator --input page.html

For each target language the tool writes two files into output/:

File                 Description
{stem}_{lang}.html   Full translated HTML with all markup preserved
{stem}_summary.xlsx  One row per text node: original, translation, back-translation, score

The --output flag is not used for HTML/URL input — output is always written to output/.
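
Extracting visible text nodes while skipping script and style content can be sketched with the standard-library parser; the tool itself may use a different HTML library:

```python
# Sketch: walk an HTML document and collect visible text nodes,
# ignoring anything inside <script> or <style>.
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.nodes: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.nodes.append(text)

def visible_text_nodes(html: str) -> list[str]:
    parser = VisibleText()
    parser.feed(html)
    return parser.nodes
```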


Translation cache

Short phrases (those with max_words words or fewer; default 5, set in the [cache] section) are cached in a local SQLite database at cache/translations.db. On subsequent runs any cached phrase is served instantly with no LLM call, which makes re-translating pages with repeated navigation text (menus, labels, headings) much faster. Short phrases that score below the quality threshold are accepted and cached without retry, since a two-word label is unlikely to improve on a second attempt.

To start fresh, delete the cache/ folder:

rm -rf cache/

To disable caching entirely, set max_words = 0 in the [cache] section of your config file.
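
The word-count gate and cache lookup can be sketched as follows; the table and column names here are illustrative, not the tool's actual schema:

```python
# Sketch: only short phrases pass the gate; longer ones always hit the LLM.
import sqlite3

def cacheable(phrase: str, max_words: int = 5) -> bool:
    """True if the phrase is short enough to cache (0 disables caching)."""
    return max_words > 0 and len(phrase.split()) <= max_words

def lookup(conn: sqlite3.Connection, phrase: str, lang: str):
    """Return a cached translation, or None on a cache miss."""
    row = conn.execute(
        "SELECT translation FROM cache WHERE phrase = ? AND lang = ?",
        (phrase, lang),
    ).fetchone()
    return row[0] if row else None
```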


Input format

The input CSV must have a column containing the source phrases (default column name: phrase).

id,phrase
1,The early bird catches the worm
2,Better late than never

Excel (.xlsx, .xls) and plain-text (.txt) files are also supported.


Output format

The original columns are preserved and three columns are appended per target language:

id,phrase,fr_translation,fr_back,fr_score,de_translation,de_back,de_score
1,The early bird catches the worm,Le lève-tôt attrape le ver,The early bird catches the worm,0.923,...
Column              Description
{lang}_translation  Forward translation produced by the Translator model
{lang}_back         Back-translation produced by the Back-translator model
{lang}_score        Round-trip quality score 0.0–1.0

Scores near 1.0 indicate the meaning was very well preserved. Scores below 0.6 suggest the translation may be unreliable.

For plain-text input, the output file begins with an average score line followed by each language's translation.
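
Interpreting a score against the pass/warn thresholds from config.toml (see Configuration below) might look like this illustrative sketch; the labels are not the tool's own terminology:

```python
# Sketch: bucket a round-trip score using the informational thresholds.
def classify(score: float,
             pass_threshold: float = 0.8,
             warn_threshold: float = 0.6) -> str:
    if score >= pass_threshold:
        return "pass"
    if score >= warn_threshold:
        return "warn"
    return "fail"
```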


Configuration

All model, language, backend, and threshold settings live in config.toml. Pass --config path/to/file.toml to use a different config. Every key listed below is required — the tool will report all missing keys if any are absent rather than silently using defaults.

[models]
backend = "ollama"          # "ollama" or "openrouter"
translator = "phi4-mini"
backtranslator = "phi4-mini"
evaluator = "phi4-mini"
request_timeout = 60        # seconds per LLM call; raise for slow models

[ollama]
base_url = "http://localhost:11434"

[openrouter]
# Set OPENROUTER_API_KEY as an environment variable (see Prerequisites above)

[scoring]
quality_threshold = 0.7     # score at which the retry loop exits early
retries = 3                 # max additional attempts per phrase/chunk
pass_threshold = 0.8        # informational: score considered passing
warn_threshold = 0.6        # informational: score considered acceptable
max_chunk_chars = 4000      # split inputs longer than this; 0 to disable

[cache]
max_words = 5               # cache phrases with this many words or fewer; 0 to disable

[languages]
source = "en"
targets = ["fr", "de", "nl", "fr-BE", "nl-BE"]

Four ready-made configs are included:

File                  Backend         Models
config.toml           Ollama (local)  phi4-mini for all roles
config_ollama.toml    Ollama (local)  phi4-mini for all roles
config_or_cheap.toml  OpenRouter      Claude Sonnet / GPT / DeepSeek
config_or_pricey.toml OpenRouter      Claude Opus / GPT / Gemini

Running tests

The project has a full test suite (unit and integration). Tests are excluded from the distribution zip to keep it lightweight, but are available in the full repository.

uv run pytest                  # Unit tests only (default)
uv run pytest -m integration   # Integration tests (requires live Ollama / OpenRouter)
uv run pytest -m ""            # Everything

About

LLM translation with accuracy estimation.
