A pure-Python text diff library and command-line tool built around the classic Longest Common Subsequence algorithm.
The project intentionally implements the LCS dynamic-programming table by hand.
It does not use difflib, Hirschberg, Myers diff, or an external diff library.
- Library API for comparing two lists of lines
- `argparse` CLI
- Classic `O(n x m)` LCS table with backtrace
- Edit script using `EQUAL`, `DELETE`, and `INSERT`
- Unified, inline, side-by-side, and summary output
- Configurable unified context lines
- UTF-8 input with clean decode errors
- Simple binary-file detection (first 4 KiB scanned for NUL bytes)
- One-sided stdin using `-`
- Exit codes: `0` identical, `1` different, `2` usage error, `3` runtime error
- Optional comparison flags for whitespace and case
- Optional word-level diff for paired delete/insert lines
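
The binary check and UTF-8 decoding listed above can be pictured with a short sketch. This is an illustration only, not the code in `io.py`: the names `SNIFF_BYTES`, `InputError`, `looks_binary`, and `decode_lines` are hypothetical, and the error wording is assumed.

```python
SNIFF_BYTES = 4096  # first 4 KiB, as stated in the feature list


class InputError(Exception):
    """Hypothetical error type for unreadable input."""


def looks_binary(data: bytes, limit: int = SNIFF_BYTES) -> bool:
    """Return True if a NUL byte appears within the first `limit` bytes."""
    return b"\x00" in data[:limit]


def decode_lines(raw: bytes, name: str = "<input>") -> list[str]:
    """Reject binary-looking input, then decode as UTF-8 and split into lines."""
    if looks_binary(raw):
        raise InputError(f"{name}: binary input is not supported")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report a short message instead of a raw traceback.
        raise InputError(f"{name}: invalid UTF-8 at byte {exc.start}") from exc
    return text.splitlines()
```

Because only a prefix is scanned, a NUL byte that first appears after the 4 KiB window would not be flagged by this sketch.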
python -m diff_tool.cli old.txt new.txt
python -m diff_tool.cli old.txt new.txt --format summary
python -m diff_tool.cli old.txt new.txt --stat
python -m diff_tool.cli old.txt new.txt --format side-by-side --width 120
python -m diff_tool.cli old.txt new.txt --ignore-case
python -m diff_tool.cli old.txt new.txt --ignore-trailing-space
Get-Content old.txt | python -m diff_tool.cli - new.txt

If installed as a package, the console script is:
pip install -e .
diff-tool old.txt new.txt

| Flag | Description |
|---|---|
| `--version` | Print version and exit |
| `--format` | `unified`, `side-by-side`, `inline`, or `summary` |
| `--stat` | Shortcut for summary output |
| `-U`, `--context` | Unified context lines (default: 3) |
| `--ignore-trailing-space` | Ignore trailing whitespace when comparing |
| `--ignore-all-space` | Collapse all whitespace when comparing |
| `--ignore-blank-lines` | Skip blank lines during comparison and output |
| `--ignore-case` | Case-insensitive comparison |
| `--word-diff` | Token-level highlights for paired changed lines |
| `--color` | `auto`, `always`, or `never` |
| `--no-color` | Disable ANSI color |
| `--width` | Side-by-side total width (minimum 40) |
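
To make the `-U`/`--context` setting concrete, here is a minimal sketch of how context grouping for unified output could work. It is not the code in `hunks.py`; the function name `hunk_ranges`, the half-open index ranges, and the rule that touching ranges merge are all assumptions for illustration.

```python
def hunk_ranges(ops: list[str], context: int = 3) -> list[tuple[int, int]]:
    """Group changed positions of an edit script into half-open index ranges.

    Each non-EQUAL step is padded with `context` steps on both sides.
    Ranges that overlap or touch are merged into one hunk.
    """
    ranges: list[tuple[int, int]] = []
    for idx, op in enumerate(ops):
        if op == "EQUAL":
            continue
        start = max(0, idx - context)
        end = min(len(ops), idx + context + 1)
        if ranges and start <= ranges[-1][1]:
            # Extend the previous hunk instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


ops = ["EQUAL"] * 10
ops[2] = "DELETE"
ops[8] = "INSERT"
print(hunk_ranges(ops, context=1))  # two separate hunks
print(hunk_ranges(ops, context=3))  # one merged hunk
```

With no changed steps the function returns an empty list, which is consistent with unified output being empty for identical inputs.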

| Code | Meaning |
|---|---|
| `0` | Inputs are identical |
| `1` | Inputs differ |
| `2` | CLI usage error (invalid flags or arguments) |
| `3` | Runtime error (missing file, encoding, binary input, algorithm guard) |

The hidden --max-table-cells guard rejects inputs that would allocate an
oversized DP table. Values below 1 are rejected.
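
A guard of this kind can be sketched as a size check that runs before any table is allocated. The names `TableTooLargeError` and `check_table_size` are hypothetical, and counting `(n + 1) * (m + 1)` cells assumes the table includes the empty-prefix row and column.

```python
class TableTooLargeError(RuntimeError):
    """Hypothetical error raised when the DP table would exceed the limit."""


def check_table_size(n: int, m: int, max_cells: int) -> int:
    """Return the cell count for an (n + 1) x (m + 1) table, or raise."""
    if max_cells < 1:
        raise ValueError("max_cells must be at least 1")
    cells = (n + 1) * (m + 1)
    if cells > max_cells:
        raise TableTooLargeError(
            f"table needs {cells} cells, limit is {max_cells}"
        )
    return cells
```

Checking the product up front keeps the failure cheap: two files of 100,000 lines each would need about 10 billion cells, and the check rejects them without touching memory.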
Sample files in examples/ demonstrate a small text change:
python -m diff_tool.cli examples/old.txt examples/new.txt
python -m diff_tool.cli examples/old.txt examples/new.txt --word-diff

from diff_tool import diff_lines
from diff_tool.models import DiffOptions

# Compare two single-line inputs with token-level highlights enabled.
result = diff_lines(
    ["The quick brown fox"],
    ["The quick red fox"],
    DiffOptions(word_diff=True),
)
for edit in result.edits:
    print(edit.op, edit.old_text, edit.new_text)

Given sequences of length n and m, the engine builds a DP table where:
table[i][j] = LCS length of prefixes a[:i] and b[:j]
Matching items extend the diagonal; mismatches take the row/column maximum.
Backtrace recovers an edit script of EQUAL, DELETE, and INSERT steps.
- Time: `O(n x m)`
- Space: `O(n x m)` for the full table
- Tie-breaking: documented in ADR 0001
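
The recurrence and backtrace described above can be sketched in a few lines. This is a textbook version under stated assumptions, not the code in `lcs.py`: the function names are hypothetical, and the tie-breaking shown here (prefer the insert branch while walking backwards, so deletes precede inserts in the final script) is an arbitrary choice that may differ from the rule in ADR 0001.

```python
from typing import Sequence

EQUAL, DELETE, INSERT = "EQUAL", "DELETE", "INSERT"


def lcs_table(a: Sequence[str], b: Sequence[str]) -> list[list[int]]:
    """table[i][j] is the LCS length of the prefixes a[:i] and b[:j]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1  # extend the diagonal
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def backtrace(a: Sequence[str], b: Sequence[str]) -> list[tuple[str, str]]:
    """Walk the table from the bottom-right corner to recover an edit script."""
    table = lcs_table(a, b)
    i, j = len(a), len(b)
    steps: list[tuple[str, str]] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            steps.append((EQUAL, a[i - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            steps.append((INSERT, b[j - 1]))  # assumed tie-break
            j -= 1
        else:
            steps.append((DELETE, a[i - 1]))
            i -= 1
    steps.reverse()
    return steps


for op, text in backtrace(["a", "b", "c"], ["a", "x", "c"]):
    print(op, text)
```

A useful sanity check on any such script is reconstruction: the `EQUAL` and `DELETE` steps in order reproduce the old sequence, and the `EQUAL` and `INSERT` steps reproduce the new one.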
Formatters render the edit script, not a verbatim replay of both files. See ADR 0006 for the full policy.

| Format | Output when files match | Exit code |
|---|---|---|
| `unified` (default) | Empty string (silent) | 0 |
| `inline` / `side-by-side` | Full file with context prefixes | 0 |
| `summary` / `--stat` | Stats block (Status: identical) | 0 |

When lines compare equal under normalization flags, formatters show the old
side text (old_text) for EQUAL rows. This keeps output stable when
--ignore-case or --ignore-trailing-space is enabled. The underlying
Edit still stores both originals for library callers.
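
The split between comparison keys and display text can be illustrated as follows. This is a sketch, not the code in `normalize.py`: `KeyOptions` and `comparison_key` are hypothetical names, and collapsing whitespace runs to a single space (which also trims both ends) is an assumed reading of `--ignore-all-space`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyOptions:
    """Hypothetical subset of the comparison flags."""

    ignore_trailing_space: bool = False
    ignore_all_space: bool = False
    ignore_case: bool = False


def comparison_key(line: str, opts: KeyOptions) -> str:
    """Derive the string used for equality tests; the original line is kept for display."""
    key = line
    if opts.ignore_all_space:
        key = " ".join(key.split())  # assumed: collapse runs, trim both ends
    elif opts.ignore_trailing_space:
        key = key.rstrip()
    if opts.ignore_case:
        key = key.casefold()
    return key


opts = KeyOptions(ignore_trailing_space=True, ignore_case=True)
old_line, new_line = "Hello World  ", "hello world"
if comparison_key(old_line, opts) == comparison_key(new_line, opts):
    # The row is EQUAL; a formatter would print the old side unchanged.
    print(repr(old_line))
```

Comparing keys while printing the untouched old line is what keeps the output deterministic: the same pair of files renders the same way regardless of which side carried the different casing or trailing spaces.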
With --ignore-blank-lines, blank lines are removed from the edit script.
They do not appear in unified, inline, or side-by-side output. Raw line
counts in summary stats still include those blank lines.
Version 1 is library + CLI only. A Django + HTMX showcase is intentionally not
part of this build. Very large files are also a non-goal because the selected
algorithm uses an O(n x m) table.
Word diff applies only to paired delete/insert lines within a change block. Consecutive deletes are paired with consecutive inserts in order; extra unpaired lines are shown without token highlights.
Tokens are split on whitespace, so punctuation attached to a word (for example
brown,) is treated as part of that token.
Highlighted token output rejoins words with a single space, so spacing-only changes may still appear as line-level delete/insert pairs without preserving original gaps between words.
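
The pairing and tokenization rules above can be shown with a small sketch. It is not the code in `word_diff.py`; `pair_block` and `tokenize` are hypothetical helpers that only mirror the behavior described in the text.

```python
from itertools import zip_longest
from typing import Optional


def pair_block(
    deletes: list[str], inserts: list[str]
) -> list[tuple[Optional[str], Optional[str]]]:
    """Pair deletes with inserts in order; leftovers get None on the other side."""
    return list(zip_longest(deletes, inserts))


def tokenize(line: str) -> list[str]:
    """Split on whitespace only, so punctuation stays attached to its word."""
    return line.split()


print(pair_block(["old 1", "old 2", "old 3"], ["new 1"]))
print(tokenize("The quick brown, fox"))
print(" ".join(tokenize("wide    gaps\there")))
```

The last line shows why spacing-only changes lose their original gaps: once a line is split on whitespace and rejoined with single spaces, the width of each gap is no longer recoverable.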
- `old_line_count` / `new_line_count`: raw input line counts
- `insert_count`: number of `INSERT` operations in the edit script
- `delete_count`: number of `DELETE` operations
- `equal_count`: number of `EQUAL` operations
- `changed_count`: for each non-equal run, count `max(deletes, inserts)`
- `similarity`: `equal_count / max(comparable_old_count, comparable_new_count) * 100`

When --ignore-blank-lines is enabled, blank lines are omitted from comparison
and formatted output, but still appear in the raw line counts above. Similarity
is computed from the filtered edit script.
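
The formulas above can be worked through on a short edit script. This sketch makes two assumptions that the text does not spell out: the comparable counts are taken to be `equal + delete` for the old side and `equal + insert` for the new side, and two empty inputs are treated as 100% similar. The function name `summarize` is hypothetical.

```python
from itertools import groupby


def summarize(ops: list[str]) -> dict[str, float]:
    """Compute summary counters from a list of EQUAL/DELETE/INSERT steps."""
    insert_count = ops.count("INSERT")
    delete_count = ops.count("DELETE")
    equal_count = ops.count("EQUAL")

    # For each maximal run of non-equal steps, count max(deletes, inserts).
    changed_count = 0
    for is_equal, run in groupby(ops, key=lambda op: op == "EQUAL"):
        if is_equal:
            continue
        run_ops = list(run)
        changed_count += max(run_ops.count("DELETE"), run_ops.count("INSERT"))

    comparable_old = equal_count + delete_count  # assumed definition
    comparable_new = equal_count + insert_count  # assumed definition
    denominator = max(comparable_old, comparable_new)
    similarity = 100.0 if denominator == 0 else equal_count / denominator * 100

    return {
        "insert_count": insert_count,
        "delete_count": delete_count,
        "equal_count": equal_count,
        "changed_count": changed_count,
        "similarity": similarity,
    }


print(summarize(["EQUAL", "DELETE", "DELETE", "INSERT", "EQUAL", "INSERT"]))
```

In this example the first change run has two deletes and one insert (counted as 2) and the second has a single insert (counted as 1), so `changed_count` is 3, while similarity is 2 equal rows out of 4 comparable lines, or 50%.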
diff_tool/            Library and CLI (stdlib only)
  cli.py              argparse entry point and exit codes
  engine.py           normalize → LCS → hunks → stats → word diff
  lcs.py              DP table and backtrace
  hunks.py            Context grouping for unified output
  normalize.py        Comparison keys vs display text
  word_diff.py        Token-level LCS for changed lines
  io.py               UTF-8 files, stdin, binary detection
  formatters/         unified, inline, side-by-side, summary
tests/                Pytest suite (145 tests, ~100% coverage)
  conftest.py         Shared helpers
  test_*.py           Layer tests (LCS, engine, formatters, CLI, …)
  fixtures/           Golden expected outputs
docs/adr/             Architecture decision records
examples/             Sample inputs (old.txt, new.txt)

The suite exercises the library and CLI by layer: LCS table and backtrace, normalization, hunks, stats, word diff, all formatters, I/O errors, and integration examples with golden unified output.
pip install -e ".[dev]"
python -m pytest
python -m pytest --cov=diff_tool --cov-report=term-missing

Coverage is enforced at 95% minimum on diff_tool/ (currently ~100% on
library modules; the CLI __main__ guard is excluded). CI runs the same
pytest + coverage command on Python 3.11–3.13.

| Test module | Focus |
|---|---|
| `test_lcs.py`, `test_backtrace.py` | DP table, `lcs_steps`, direct backtrace |
| `test_engine.py` | `diff_lines`, options, reconstruction property |
| `test_normalize_and_color.py` | Keys, ANSI color policy |
| `test_hunks.py`, `test_hunks_stats.py` | Hunk merge/split, stats formulas |
| `test_word_diff.py` | Token diff and render markers |
| `test_formatters.py` | All output formats and color |
| `test_io.py` | Files, stdin, encoding, binary |
| `test_cli.py` | Exit codes, flags, errors |
| `test_examples.py` | `examples/` + golden fixture |
| `test_models.py`, `test_errors.py`, `test_package.py` | API surface |

Install editable with dev tools:
pip install -e .
pip install -r requirements-dev.lock
# or non-editable tools only:
pip install -r requirements-dev.lock

Runtime install has no third-party dependencies (requirements.txt is
documentation only). Dev dependencies are declared in pyproject.toml
under [project.optional-dependencies.dev] with floor and upper bounds.
Exact resolved versions for reproducible installs are pinned in
requirements-dev.lock.
python -m pytest
python -m pytest --cov=diff_tool --cov-report=term-missing
python -m ruff check .
python -m ruff format --check .
python -m mypy diff_tool

CI runs tests with coverage, ruff (check + format), and mypy on Python
3.11–3.13. See .github/workflows/ci.yml.
See CHANGELOG.md.
MIT — see LICENSE.