A CLI tool for translating book chapters with LLMs.
cipher is built for long-form translation workflows where consistency matters across many chapters. It combines profile-based provider configuration, glossary injection, validation, repair retries, and checkpointed run state so you can translate iteratively instead of treating every run as a one-shot batch job.
It is especially suited for serialized web novels and other chapter-based books, but the workflow also fits any markdown-based long-form source text.
A cipher book project is a directory containing:
- raw source chapters
- translated output
- a canonical glossary
- a style guide
- internal state used for resumability and rerun planning
For each chapter, cipher:
- loads the raw markdown
- selects glossary terms using smart or full injection
- sends the chapter, glossary, and style guide to the configured model
- validates the returned translation
- attempts one repair pass if validation fails
- writes accepted output atomically
- merges any newly discovered glossary terms
- saves run and chapter state under .cipher/
This makes later runs safer and more explainable, especially when the glossary grows over time.
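The validate-then-repair part of the per-chapter flow can be sketched roughly as follows. This is a minimal illustration, not cipher's actual code: `ChapterOutcome`, `process_chapter`, and the three closures standing in for the model call, the validator, and the repair request are all hypothetical names.

```rust
/// Outcome of processing one chapter (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum ChapterOutcome {
    Accepted { text: String, repaired: bool },
    Failed { errors: Vec<String> },
}

/// Translate, validate, and attempt at most one repair pass.
/// `translate`, `validate`, and `repair` are stand-ins supplied by the caller.
fn process_chapter<T, V, R>(raw: &str, translate: T, validate: V, repair: R) -> ChapterOutcome
where
    T: Fn(&str) -> String,
    V: Fn(&str) -> Vec<String>,
    R: Fn(&str, &str, &[String]) -> String,
{
    let first = translate(raw);
    let errors = validate(first.as_str());
    if errors.is_empty() {
        return ChapterOutcome::Accepted { text: first, repaired: false };
    }

    // Exactly one repair attempt, given the source, the failed draft, and the errors.
    let second = repair(raw, first.as_str(), errors.as_slice());
    let errors = validate(second.as_str());
    if errors.is_empty() {
        ChapterOutcome::Accepted { text: second, repaired: true }
    } else {
        ChapterOutcome::Failed { errors }
    }
}
```

Only an `Accepted` outcome would go on to the atomic write and glossary merge steps; a `Failed` outcome is recorded in chapter state instead.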
cargo install --git https://www.github.com/siddhj2206/cipher.git
cipher uses profiles to choose a provider and model.
cipher profile new
This interactive flow lets you:
- create or reuse a provider
- enter or reuse an API key
- choose a model
- optionally set the profile as default
Built-in providers currently include gemini and openai, and you can also add custom OpenAI-compatible providers.
You can inspect profiles with:
cipher profile list
cipher profile show myprofile
cipher profile test myprofile
From scratch:
cipher init my-book
From an EPUB:
cipher import my-book.epub
You can also initialize a book with a profile or imported glossary:
cipher init my-book --profile myprofile
cipher init my-book --from other-book
cipher init my-book --import-glossary terms.json
Place source markdown files in raw/:
my-book/
  raw/
    001.md
    002.md
    003.md
cipher translate my-book
Translated chapters are written to tl/.
cipher status my-book
This shows the latest recorded run metadata and chapter summary.
my-book/
  config.json      # Book configuration
  glossary.json    # Canonical glossary
  style.md         # Style guide injected into prompts
  raw/             # Source chapters
    001.md
    002.md
    ...
  tl/              # Translated output
    001.md
    002.md
    ...
  .cipher/         # Internal run state, chapter state, glossary state, backups
Translate a book. If book_dir is omitted, the current directory is used.
cipher translate
cipher translate my-book
cipher translate my-book --profile fast
cipher translate my-book --overwrite
cipher translate my-book --fail-fast
cipher translate my-book --rerun
cipher translate my-book --rerun-affected-glossary
cipher translate my-book --rerun-affected-chapters
Current translate flags:
- --profile <name>: override the book/global profile for this run
- --overwrite: retranslate even when output already exists
- --fail-fast: stop on the first failed chapter
- --rerun: retranslate chapters whose tracked source or glossary-relevant inputs changed
- --rerun-affected-glossary: retranslate chapters whose glossary-relevant inputs changed since the tracked baseline
- --rerun-affected-chapters: retranslate chapters whose raw markdown changed since the last tracked chapter state
Default behavior:
- chapters are discovered from raw/
- chapter order is stable and numeric-first
- existing outputs are skipped unless overwrite or rerun logic applies
- output is validated before being accepted
- failed API calls retry with exponential backoff
- validation failures get one repair attempt
- accepted outputs are written atomically
- overwriting creates timestamped backups in .cipher/backups/
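Two of the defaults above, numeric-first ordering and exponential backoff, can be illustrated with a short sketch. The exact rules cipher applies are not documented here, so treat the details as assumptions: `chapter_order` reads "numeric-first" as "file stems that parse as integers sort first, by value", and `backoff_delay` doubles a base delay per attempt up to a cap. Both function names are hypothetical.

```rust
use std::cmp::Ordering;
use std::time::Duration;

/// Compare two chapter file names. Stems that parse as integers come first,
/// ordered numerically; all other names follow in lexicographic order.
/// (Assumed interpretation of "numeric-first".)
fn chapter_order(a: &str, b: &str) -> Ordering {
    let stem = |name: &str| name.strip_suffix(".md").unwrap_or(name).to_string();
    let (sa, sb) = (stem(a), stem(b));
    match (sa.parse::<u64>(), sb.parse::<u64>()) {
        // Tie-break on the full name so the order stays stable for "1.md" vs "001.md".
        (Ok(x), Ok(y)) => x.cmp(&y).then_with(|| a.cmp(b)),
        (Ok(_), Err(_)) => Ordering::Less,
        (Err(_), Ok(_)) => Ordering::Greater,
        (Err(_), Err(_)) => a.cmp(b),
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
/// The base and cap values are caller-chosen assumptions.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}
```

With this comparator, 2.md sorts before 10.md, which a plain string sort would get wrong.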
Show the latest recorded run state for a book.
cipher status my-book
Status currently includes:
- profile, provider, and model used for the last run
- start/update/finish timestamps
- chapter counts for translated, skipped, failed, and pending
- tracking counts for smart-tracked chapters, smart fallback-to-full chapters, legacy primary full-tracked chapters, approximate legacy fallback, exported-term tracking, and source hashes
- a list of failed chapters with short error previews
Create a new book scaffold.
cipher init my-book
cipher init my-book --profile myprofile
cipher init my-book --from other-book
cipher init my-book --import-glossary terms.json
Import an EPUB into a new book directory.
cipher import novel.epub
cipher import novel.epub --force
Current import behavior:
- creates a book directory alongside the EPUB
- extracts chapters into raw/
- converts HTML to markdown
- skips very small/empty chapters
- initializes the standard book scaffold
Manage the canonical glossary.
cipher glossary list my-book
cipher glossary import my-book new-terms.json
cipher glossary export my-book backup.json
Manage profiles.
cipher profile new
cipher profile list
cipher profile show myprofile
cipher profile set-default myprofile
cipher profile test myprofile
Run diagnostics.
cipher doctor
cipher doctor my-book
Without a book directory, doctor checks global configuration.
With a book directory, it checks book layout and effective profile resolution.
Global configuration is stored using XDG config directories. On Linux, the current path resolves to:
~/.config/cipher/cipher/config.json
It contains:
- providers
- API keys
- profiles
- default profile
The current implementation stores API keys as plain text in this config. Improving secret storage is planned.
Each book contains a portable config.json:
{
  "profile": "",
  "raw_dir": "raw",
  "out_dir": "tl",
  "glossary_path": "glossary.json",
  "style_path": "style.md",
  "glossary_injection": "smart"
}
Profile resolution order:
- --profile
- book config.json
- global default profile
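The resolution order can be expressed as a simple fallback chain. The sketch below uses a hypothetical `resolve_profile` function, and it assumes that an empty string (such as the `"profile": ""` in the book config above) counts as unset; cipher's real handling may differ.

```rust
/// Pick the effective profile name: CLI flag first, then the book config,
/// then the global default. Empty or whitespace-only names are skipped
/// (an assumption made for this sketch).
fn resolve_profile<'a>(
    cli: Option<&'a str>,
    book: Option<&'a str>,
    global: Option<&'a str>,
) -> Option<&'a str> {
    cli.into_iter()
        .chain(book)
        .chain(global)
        .find(|name| !name.trim().is_empty())
}
```

A `None` result would mean no profile is configured anywhere, which is the kind of problem `cipher doctor` is meant to surface.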
The glossary is a JSON array of terms:
[
  {
    "term": "Starship",
    "og_term": "星空舰",
    "definition": "The main character's vessel"
  },
  {
    "term": "River Map",
    "og_term": "山河图",
    "definition": "An ancient artifact containing a sealed dimension",
    "notes": "Sometimes referred to as 'The Map' in casual dialogue"
  }
]
Fields:
- term: translated term to enforce
- og_term: original-language term used for matching
- definition: explanation/context
- notes: optional extra guidance
Glossary behavior:
- canonical source of truth is glossary.json
- merges are deterministic
- duplicate terms are skipped during merge/import
- new terms returned by successful chapters are appended after dedupe
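A deterministic append-after-dedupe merge might look like the sketch below. The duplicate rule is an assumption: here two entries count as duplicates when their (og_term, term) pair matches after trimming and lowercasing, and the README does not say which key cipher actually uses. `Term` and `merge_terms` are hypothetical names.

```rust
use std::collections::HashSet;

/// Glossary entry mirroring the JSON fields shown above.
#[derive(Debug, Clone, PartialEq)]
struct Term {
    term: String,
    og_term: String,
    definition: String,
    notes: Option<String>,
}

/// Append `incoming` terms to `glossary` in order, skipping duplicates.
/// Returns how many terms were added.
/// ASSUMPTION: the dedupe key is the normalized (og_term, term) pair.
fn merge_terms(glossary: &mut Vec<Term>, incoming: Vec<Term>) -> usize {
    let key = |t: &Term| (t.og_term.trim().to_lowercase(), t.term.trim().to_lowercase());
    let mut seen: HashSet<(String, String)> = glossary.iter().map(key).collect();
    let mut added = 0;
    for t in incoming {
        // `insert` returns false when the key is already present.
        if seen.insert(key(&t)) {
            glossary.push(t);
            added += 1;
        }
    }
    added
}
```

Because existing entries are never reordered and new ones are appended in input order, the same inputs always produce the same glossary.json.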
Book config now treats smart as the canonical mode.
- smart - select relevant glossary terms for the current chapter
- legacy full config values are deprecated and treated as smart
smart is the default.
Current smart-mode behavior:
- matches glossary terms against the chapter text using deterministic selection logic
- always includes terms with empty og_term
- falls back to full glossary when too few matches are found
- legacy primary full-tracking state is migrated opportunistically when a successful smart-era run proves it is equivalent to smart fallback tracking
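The selection rules above can be approximated with plain substring matching. This is only a sketch under stated assumptions: the real matching logic and the "too few matches" threshold are not specified in this README, so `min_matches` is a caller-supplied parameter, and `Entry` and `select_terms` are hypothetical names.

```rust
/// Minimal glossary entry with only the fields needed for selection.
struct Entry {
    term: String,
    og_term: String,
}

/// Return the indices of glossary entries to inject for one chapter.
/// - entries with an empty `og_term` are always included
/// - entries whose `og_term` occurs in the chapter text are included
/// - if fewer than `min_matches` entries matched by text, the whole glossary is used
fn select_terms(chapter: &str, glossary: &[Entry], min_matches: usize) -> Vec<usize> {
    let mut matched = 0;
    let mut picked = Vec::new();
    for (i, entry) in glossary.iter().enumerate() {
        if entry.og_term.is_empty() {
            picked.push(i);
        } else if chapter.contains(entry.og_term.as_str()) {
            matched += 1;
            picked.push(i);
        }
    }
    if matched < min_matches {
        // Fallback to full injection.
        (0..glossary.len()).collect()
    } else {
        picked
    }
}
```

Iterating in glossary order keeps the result deterministic for a given chapter and glossary.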
If present, style.md is injected into every translation request.
Use it for:
- tone
- narration style
- dialogue conventions
- recurring translation preferences
- rules that are broader than glossary terms
Before output is accepted, cipher validates it.
Current checks include:
- non-empty output
- heading presence/shape
- balanced code fences
- JSON/schema leakage detection
- rejection of raw structured response artifacts leaking into prose
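A few of these checks are easy to sketch. The version below covers non-empty output, a simplified heading-presence rule, and balanced code fences; the heading rule (if the source opens with a markdown heading, the output must too) is an assumption, and `validate_translation` is a hypothetical name rather than cipher's validator.

```rust
/// Return human-readable validation errors; an empty list means "accepted".
fn validate_translation(source: &str, output: &str) -> Vec<String> {
    let mut errors = Vec::new();

    if output.trim().is_empty() {
        errors.push("output is empty".to_string());
    }

    // Heading presence (assumed rule): compare the first non-blank line of each text.
    let first_is_heading = |text: &str| {
        text.lines()
            .find(|l| !l.trim().is_empty())
            .map_or(false, |l| l.trim_start().starts_with('#'))
    };
    if first_is_heading(source) && !first_is_heading(output) {
        errors.push("chapter heading is missing".to_string());
    }

    // Balanced fences: an odd number of fence lines means one block is unclosed.
    let fences = output
        .lines()
        .filter(|l| l.trim_start().starts_with("```"))
        .count();
    if fences % 2 != 0 {
        errors.push("unbalanced code fence".to_string());
    }

    errors
}
```

Returning a list of messages rather than a boolean lets the same errors be recorded in chapter state and passed along with a repair request.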
If validation fails:
- the failure is recorded
- one repair request is attempted using the original text, failed translation, and validation errors
- the repaired output is validated again
- if it still fails, the chapter is marked failed
cipher stores internal state under .cipher/ so runs are resumable and future rerun decisions can be more informed.
Current tracked state includes:
- run metadata
- per-chapter result state
- glossary-state snapshots
- chapter glossary usage
- exported glossary term fingerprints
--rerun-affected-glossary uses tracked state to detect when a chapter should be rerun because glossary-relevant inputs changed.
Current support includes:
- direct comparison between saved chapter glossary state and the current expected glossary usage for tracked chapters
- changed glossary fingerprints for previously selected terms
- changed fingerprints for exported terms
- smart-selection changes when newly relevant or removed terms alter the effective injected set
- fallback-to-full behavior changes when smart selection now recovers or degrades
- forward-only incremental replanning for remaining chapters when new glossary terms are discovered mid-run
For tracked chapters, the global glossary baseline is no longer the primary rerun decision input. It is kept mainly for legacy untracked approximation and run-level baseline commits.
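To make the fingerprint comparison concrete, here is a minimal sketch. It assumes a fingerprint is a hash over a term's fields and that a chapter's saved glossary state maps each injected og_term to its fingerprint; cipher's actual hash function and state layout are not described in this README. FNV-1a is used only because it is small and stable across runs, and all function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// FNV-1a 64-bit hash (standard offset basis and prime).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Fingerprint of one glossary term over the fields that reach the prompt.
fn term_fingerprint(term: &str, og_term: &str, definition: &str, notes: &str) -> u64 {
    // A separator keeps ("ab", "c") distinct from ("a", "bc").
    let joined = [term, og_term, definition, notes].join("\u{1f}");
    fnv1a(joined.as_bytes())
}

/// A chapter is stale when the injected set changed (a term was added or
/// removed) or when any previously injected term now has a different fingerprint.
fn glossary_stale(saved: &BTreeMap<String, u64>, current: &BTreeMap<String, u64>) -> bool {
    saved != current
}
```

Comparing per-chapter maps, rather than one global glossary hash, is what lets an edit to a single term trigger reruns only for chapters that actually used it.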
These are different tools:
- --overwrite means redo outputs regardless of tracked equivalence
- --rerun means rerun chapters whose tracked source or glossary inputs changed
- --rerun-affected-glossary means rerun chapters whose tracked glossary inputs became stale
- --rerun-affected-chapters means rerun chapters whose tracked raw source became stale
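The difference between these flags can be summarized as a per-chapter decision function. This is a sketch of the semantics as described in this README, not the planner itself; `Flags`, `ChapterState`, and `should_translate` are hypothetical names, and the staleness booleans are assumed to come from tracked state.

```rust
/// Rerun-related CLI flags for one invocation.
#[derive(Clone, Copy, Default)]
struct Flags {
    overwrite: bool,
    rerun: bool,
    rerun_affected_glossary: bool,
    rerun_affected_chapters: bool,
}

/// What tracked state says about one chapter.
#[derive(Clone, Copy)]
struct ChapterState {
    has_output: bool,
    source_stale: bool,
    glossary_stale: bool,
}

/// Decide whether a chapter should be (re)translated in this run.
fn should_translate(flags: Flags, ch: ChapterState) -> bool {
    // Missing output is always translated; --overwrite ignores tracked equivalence.
    if !ch.has_output || flags.overwrite {
        return true;
    }
    // --rerun covers both staleness kinds; the narrower flags cover one each.
    let source_hit = ch.source_stale && (flags.rerun || flags.rerun_affected_chapters);
    let glossary_hit = ch.glossary_stale && (flags.rerun || flags.rerun_affected_glossary);
    source_hit || glossary_hit
}
```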
Current file-safety behavior:
- accepted outputs are written atomically
- overwriting creates backups in .cipher/backups/
- glossary and state are saved incrementally during runs
This keeps runs resumable and reduces the chance of corrupted outputs after interruptions.
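A common way to get atomic writes is to write a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern using only the standard library; whether cipher uses exactly this approach is an assumption, and `write_atomic` and the `.tmp` suffix are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path` so that readers see either the old file or the
/// complete new file, never a partial write.
fn write_atomic(path: &Path, contents: &str) -> io::Result<()> {
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    // Same directory as the target, so the rename stays on one filesystem.
    let tmp = path.with_file_name(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(contents.as_bytes())?;
    // Flush data to disk before the rename makes it visible.
    file.sync_all()?;

    fs::rename(&tmp, path)
}
```

If the process is interrupted before the rename, the previous output is left untouched and only a stray temporary file remains.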
A few areas are intentionally still evolving:
- API keys are not yet stored in a proper secret store
- dry-run rerun preview is not implemented yet
- status output does not yet expose all tracked-vs-approximate rerun details
- repair and glossary extraction are still more coupled than they should be long-term
Useful commands while working on the project:
cargo build
cargo check
cargo fmt
cargo test
cargo run -- translate ./test-book
cargo run -- status ./test-book
cargo run -- doctor ./test-book