72 changes: 47 additions & 25 deletions .github/workflows/ci.yml
@@ -33,43 +33,65 @@ jobs:
with:
dotnet-version: '8.0.x'

- name: Install dependencies
- name: Determine runtime identifier
id: rid
shell: bash
run: |
python -m pip install --upgrade pip
pip install hatch
case "${{ runner.os }}" in
Linux) echo "rid=linux-x64" >> "$GITHUB_OUTPUT" ;;
Windows) echo "rid=win-x64" >> "$GITHUB_OUTPUT" ;;
macOS) echo "rid=osx-arm64" >> "$GITHUB_OUTPUT" ;;
esac

- name: Build engine binaries
run: python build_differ.py
- name: Build engine binaries for this platform
run: python build_differ.py ${{ steps.rid.outputs.rid }}

- name: Install packages (editable)
run: pip install -e packages/core -e packages/ooxmlpowertools -e packages/docxodus pytest

- name: Run tests
run: hatch run test
run: python -m pytest tests/ -v

build:
build-core:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
submodules: recursive
python-version: '3.11'
- run: pip install build twine
- name: Build core sdist + wheel
run: python -m build packages/core --outdir dist
- name: Check distributions
run: twine check dist/*

- name: Set up Python
uses: actions/setup-python@v5
build-engine-wheels:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
include:
- { os: ubuntu-latest, rids: "linux-x64 linux-arm64" }
- { os: windows-latest, rids: "win-x64 win-arm64" }
- { os: macos-latest, rids: "osx-x64 osx-arm64" }
steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Set up .NET
uses: actions/setup-dotnet@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: '8.0.x'

- name: Install build dependencies
run: |
python -m pip install --upgrade pip
pip install hatch hatchling

- name: Build package
run: hatch build

- name: Check package
- run: pip install build hatchling twine
- name: Build per-platform engine wheels
shell: bash
run: |
pip install twine
twine check dist/*
for rid in ${{ matrix.rids }}; do
python build_differ.py "$rid"
python -m build --wheel --no-isolation packages/ooxmlpowertools --outdir dist
python -m build --wheel --no-isolation packages/docxodus --outdir dist
done
- name: Check wheels
run: twine check dist/*
79 changes: 63 additions & 16 deletions .github/workflows/python-publish.yml
@@ -1,5 +1,10 @@
name: Upload Python Package

# Builds and publishes all three packages on a tagged release:
# python-redlines (core, pure-Python sdist + wheel)
# python-redlines-ooxmlpowertools (per-platform engine wheels)
# python-redlines-docxodus (per-platform engine wheels)

on:
release:
types: [published]
@@ -8,28 +13,70 @@ permissions:
contents: read

jobs:
deploy:

build-core:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- run: pip install build
- name: Build core sdist + wheel
run: python -m build packages/core --outdir dist
- uses: actions/upload-artifact@v4
with:
name: dist-core
path: dist/*

build-engine-wheels:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
include:
- { os: ubuntu-latest, rids: "linux-x64 linux-arm64" }
- { os: windows-latest, rids: "win-x64 win-arm64" }
- { os: macos-latest, rids: "osx-x64 osx-arm64" }
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Python
uses: actions/setup-python@v3
- uses: actions/setup-python@v5
with:
python-version: '3.x'
- name: Setup .NET
uses: actions/setup-dotnet@v3
python-version: '3.11'
- uses: actions/setup-dotnet@v4
with:
dotnet-version: '8.0.x'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install hatch hatchling
- name: Build package
run: hatch build
- name: Publish package
- run: pip install build hatchling
- name: Build per-platform engine wheels
shell: bash
run: |
hatch publish -u "__token__" -a ${{ secrets.PYPI_API_TOKEN }}
for rid in ${{ matrix.rids }}; do
python build_differ.py "$rid"
python -m build --wheel --no-isolation packages/ooxmlpowertools --outdir dist
python -m build --wheel --no-isolation packages/docxodus --outdir dist
done
- uses: actions/upload-artifact@v4
with:
name: dist-${{ matrix.os }}
path: dist/*

publish:
needs: [build-core, build-engine-wheels]
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v4
with:
path: dist
merge-multiple: true
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- run: pip install twine
- name: Check distributions
run: twine check dist/*
- name: Publish to PyPI
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
run: twine upload dist/*
4 changes: 4 additions & 0 deletions .gitignore
@@ -9,6 +9,10 @@ csproj/obj/*
docxodus/**/bin/*
docxodus/**/obj/*

# Engine binary archives (built by build_differ.py, embedded in wheels by CI)
packages/*/src/*/_binaries/*.tar.gz
packages/*/src/*/_binaries/*.zip

# C extensions
*.so

108 changes: 71 additions & 37 deletions CLAUDE.md
@@ -4,68 +4,102 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

Python-Redlines is a Python wrapper around compiled C# binaries that generate `.docx` redline/tracked-changes documents by comparing two Word files. The Python layer handles platform detection, binary extraction, temp file management, and subprocess execution.
Python-Redlines generates `.docx` redline/tracked-changes documents by comparing two Word files. A pure-Python wrapper drives compiled C# (.NET 8) engine binaries; the Python layer handles platform detection, binary extraction, temp file management, and subprocess execution.
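
As a rough illustration of the platform-detection step, the sketch below maps the running OS and CPU architecture onto the six RIDs this repo builds for. The function name `detect_rid` and the lookup tables are assumptions made for this example, not the code in `engines.py`.

```python
import platform

# Lookup tables are an assumption for illustration; only the resulting RID
# strings (linux-x64, win-arm64, osx-arm64, ...) come from this repo's docs.
_OS = {"Linux": "linux", "Windows": "win", "Darwin": "osx"}
_ARCH = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def detect_rid(system=None, machine=None):
    """Return a .NET-style runtime identifier such as 'linux-x64'."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    try:
        return f"{_OS[system]}-{_ARCH[machine]}"
    except KeyError:
        raise RuntimeError(f"Unsupported platform: {system}/{machine}") from None
```

Calling `detect_rid()` with no arguments uses the current interpreter's platform; passing explicit values is mainly useful in tests.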

Two comparison engines are available:
- **XmlPowerToolsEngine** — wraps Open-XML-PowerTools WmlComparer (original engine)
- **DocxodusEngine** — wraps Docxodus, a modernized .NET 8.0 fork with better move detection

## Commands

```bash
# Run tests
hatch run test
## Monorepo structure — three published packages

# Run a single test
hatch run test tests/test_openxml_differ.py::test_run_redlines_with_real_files
This repo publishes **three** PyPI packages, each with its own `pyproject.toml` under `packages/`:

# Run tests with coverage
hatch run cov
| Directory | PyPI name | Contents | Wheel |
|---|---|---|---|
| `packages/core` | `python-redlines` | Pure-Python wrapper (`engines.py`) | `py3-none-any` |
| `packages/ooxmlpowertools` | `python-redlines-ooxmlpowertools` | Open-XML-PowerTools binary | per-platform |
| `packages/docxodus` | `python-redlines-docxodus` | Docxodus binary | per-platform |

# Type checking
hatch run types:check
Engine binaries are **optional dependencies**. Users install an engine via an extra:
`pip install python-redlines[docxodus]`, `[ooxmlpowertools]`, or `[all]`. The core
package has no binaries; each binary package ships one platform's compiled binary as a
prebuilt wheel, so end users never compile anything.

# Build C# binaries for all platforms (requires .NET 8.0 SDK)
hatch run build
The repo root is **not** an installable project — its `pyproject.toml` holds only
shared pytest/coverage config.

# Build Python package (triggers C# build via custom hook)
hatch build
## Commands

# Initialize Docxodus submodule (required before building)
```bash
# Initialize the Docxodus submodule (required before building its engine)
git submodule update --init --recursive
```

## Architecture
# Build engine binaries for one or more platforms (requires .NET 8.0 SDK).
# RIDs: linux-x64 linux-arm64 win-x64 win-arm64 osx-x64 osx-arm64
python build_differ.py linux-x64
python build_differ.py --all

# Install all three packages editable for development
pip install -e packages/core -e packages/ooxmlpowertools -e packages/docxodus pytest

The system uses a two-layer wrapper pattern with a shared base class:
# Run tests (from repo root)
python -m pytest tests/
python -m pytest tests/test_openxml_differ.py::test_run_redlines_with_real_files

1. **Python layer** (`src/python_redlines/engines.py`):
- `BaseEngine` — shared logic for binary extraction, subprocess invocation, and temp file management
- `XmlPowerToolsEngine(BaseEngine)` — sets constants for the Open-XML-PowerTools binary (`dist/`, `bin/`, `redlines`)
- `DocxodusEngine(BaseEngine)` — sets constants for the Docxodus binary (`dist_docxodus/`, `bin_docxodus/`, `redline`)
# Build a package wheel
python -m build packages/core
python -m build --wheel packages/docxodus # needs an archive in _binaries/ first
```

Both engines expose `run_redline(author_tag, original, modified, **kwargs)`. `DocxodusEngine` overrides `_build_command()` to translate kwargs (e.g. `detect_moves`, `detail_threshold`) into CLI flags for the Docxodus binary. `XmlPowerToolsEngine` uses the legacy 4-positional-arg format and ignores kwargs.
## Architecture

2. **C# binaries**:
1. **Core Python layer** (`packages/core/src/python_redlines/engines.py`):
- `BaseEngine` — locates the engine binary in its companion package via
`importlib.resources`, extracts the platform archive once into a writable
user cache dir (`platformdirs.user_cache_dir`), and runs it via subprocess.
- `XmlPowerToolsEngine` / `DocxodusEngine` — subclasses declaring `BINARY_PACKAGE`,
`BINARY_BASE_NAME`, and `EXTRA_NAME`.
- `EngineNotInstalledError` — raised on instantiation if the companion binary
package is missing, with the `pip install` command to fix it.

Both engines expose `run_redline(author_tag, original, modified, **kwargs)`.
`DocxodusEngine` overrides `_build_command()` to translate kwargs (e.g. `detect_moves`,
`detail_threshold`) into CLI flags. `XmlPowerToolsEngine` uses the legacy
4-positional-arg format and ignores kwargs.

2. **Binary packages** ship one platform archive under
`src/<pkg>/_binaries/<rid>.tar.gz` (or `.zip` for Windows). The archive is
gitignored; CI builds it. The hatchling build hook `hatch_build.py` reads which
RID archive is present and stamps the wheel's platform tag accordingly.

3. **C# sources**:
- `csproj/Program.cs` — Open-XML-PowerTools CLI tool
- `docxodus/tools/redline/Program.cs` — Docxodus CLI tool (git submodule)
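
The kwargs-to-flags translation mentioned under item 1 could look roughly like the sketch below. The helper name `kwargs_to_flags` and the flag spellings (`--detect-moves`, `--detail-threshold=...`) are hypothetical; the real Docxodus CLI flags should be checked against `docxodus/tools/redline/Program.cs`.

```python
def kwargs_to_flags(**kwargs):
    """Turn Python keyword arguments into a list of CLI flag strings.

    Assumed convention (not confirmed by the source): snake_case names become
    --kebab-case flags, True becomes a bare flag, False/None are dropped, and
    any other value is rendered as --flag=value.
    """
    flags = []
    for name, value in kwargs.items():
        if value is None or value is False:
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            flags.append(flag)
        else:
            flags.append(f"{flag}={value}")
    return flags
```

An engine that ignores kwargs, as `XmlPowerToolsEngine` is described as doing, would simply never call such a helper.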

Pre-compiled binaries for 6 platform targets (linux/win/osx x x64/arm64) are stored as archives in `src/python_redlines/dist/` and `src/python_redlines/dist_docxodus/`, included in the wheel. The build script `build_differ.py` compiles both engines using `dotnet publish`.
`build_differ.py` compiles an engine for a given RID with `dotnet publish` and
writes a single flat archive into the corresponding binary package's `_binaries/`.
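
On the consuming side, the extract-once behaviour described for `BaseEngine` might be sketched as follows. This is a minimal sketch under stated assumptions: `extract_once` and the `.extracted` marker file are invented for the example, and `cache_root` stands in for whatever `platformdirs.user_cache_dir` returns in the real code.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_once(archive, cache_root):
    """Extract a platform archive into cache_root/<rid>, skipping repeat work."""
    archive = Path(archive)
    # "linux-x64.tar.gz" -> "linux-x64"; "win-x64.zip" -> "win-x64"
    rid = archive.name.removesuffix(".tar.gz").removesuffix(".zip")
    target = Path(cache_root) / rid
    marker = target / ".extracted"  # hypothetical marker, written after success
    if marker.exists():
        return target
    target.mkdir(parents=True, exist_ok=True)
    if archive.suffix == ".zip":
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    else:
        with tarfile.open(archive, "r:gz") as tf:
            tf.extractall(target)
    marker.touch()
    return target
```

Writing the marker only after extraction finishes means an interrupted first run is retried rather than leaving a half-populated cache in use.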

## Build & release flow

- A binary-package wheel must contain **exactly one** platform archive. Each
`build_differ.py <rid>` invocation clears old archives, so build one RID, build
the wheel, repeat.
- `.github/workflows/ci.yml` — tests on each OS (native RID) + builds all wheels.
- `.github/workflows/python-publish.yml` — on release, builds per-platform engine
wheels across 3 OS runners, the core sdist+wheel, and publishes all three packages.
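
The "exactly one archive" rule is the kind of check a build hook can enforce before stamping a platform tag. The sketch below is illustrative only: `find_single_rid` is a hypothetical helper, and the tag values in `PLATFORM_TAGS` are plausible wheel platform tags chosen for the example, not necessarily the ones `hatch_build.py` emits.

```python
from pathlib import Path

# Assumption: example wheel platform tags per RID; the repo's hook may differ.
PLATFORM_TAGS = {
    "linux-x64": "manylinux2014_x86_64",
    "linux-arm64": "manylinux2014_aarch64",
    "win-x64": "win_amd64",
    "win-arm64": "win_arm64",
    "osx-x64": "macosx_10_9_x86_64",
    "osx-arm64": "macosx_11_0_arm64",
}


def find_single_rid(binaries_dir):
    """Return the RID of the only archive in binaries_dir, or raise."""
    archives = sorted(
        p.name
        for p in Path(binaries_dir).iterdir()
        if p.name.endswith((".tar.gz", ".zip"))
    )
    if len(archives) != 1:
        raise RuntimeError(
            f"expected exactly one platform archive, found {len(archives)}: {archives}"
        )
    return archives[0].removesuffix(".tar.gz").removesuffix(".zip")
```

Failing loudly on zero or multiple archives surfaces a skipped or doubled `build_differ.py` run at build time instead of producing a mis-tagged wheel.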

## Key Files
## Version management

- `src/python_redlines/engines.py` — BaseEngine, XmlPowerToolsEngine, and DocxodusEngine classes
- `src/python_redlines/__init__.py` — Exports all engine classes
- `src/python_redlines/__about__.py` — Single source of truth for package version
- `csproj/Program.cs` — Open-XML-PowerTools C# comparison utility
- `docxodus/` — Docxodus git submodule (tools/redline/ contains the CLI)
- `build_differ.py` — Cross-platform C# build orchestration for both engines
- `hatch_run_build_hook.py` — Hatch build hook that triggers C# compilation
- `tests/fixtures/` — Test `.docx` files (original, modified, expected_redline)
`packages/core/src/python_redlines/__about__.py` is the single source of truth.
The two binary packages read it via `[tool.hatch.version] path = "../core/..."`,
so all three always share one version. Bump only that file.
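
For intuition, reading a version from a shared `__about__.py` amounts to a small regex match over the file's text. The sketch below assumes the file contains a conventional `__version__ = "..."` assignment, which the source does not show; `read_version` is a hypothetical helper, not part of hatch or this repo.

```python
import re


def read_version(about_text):
    """Extract the version string from the text of an __about__.py file."""
    # Assumption: the file uses a plain `__version__ = "x.y.z"` assignment.
    match = re.search(
        r"""^__version__\s*=\s*['"]([^'"]+)['"]""", about_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)
```

Because every package resolves its version from the same file, a single edit there is enough to keep all three releases in step.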

## Testing Notes

Tests must be run from the project root (fixtures use relative paths like `tests/fixtures/original.docx`). The XmlPowerToolsEngine integration test validates that comparing the fixture documents produces exactly 9 revisions. Docxodus uses a different stdout format (`"revision(s) found"` vs `"Revisions found: 9"`).
Tests live in repo-root `tests/` and must be run from the repo root (fixtures use
relative paths like `tests/fixtures/original.docx`). They require all three packages
installed and the binaries built for the current platform. The XmlPowerToolsEngine
integration test validates exactly 9 revisions on the fixture documents.

## Stdout Format Differences
