Universal Code Database

UCDB converts legal documents into a portable SQLite database with Akoma Ntoso / LegalDocML as the canonical XML format.

The database stores canonical Akoma Ntoso XML per expression and derives query, diff, blame, search, and downstream data surfaces from normalized node tables.

See docs/architecture/020-akoma-ntoso-design.md for the design rationale.

Pipeline

source document or Akoma Ntoso XML
-> normalized legal model
-> Akoma Ntoso / LegalDocML XML
-> SQLite works + expressions + nodes
-> diff / blame / search / exports
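To make the XML -> nodes step concrete, here is a minimal standard-library sketch that walks an Akoma Ntoso body and flattens every element carrying an `eId` into a row with its parent, `num`, `heading`, content text, and document order. It is not the code in `ingest.py`. The `NodeRow` shape and `extract_nodes` function are hypothetical, and the sketch assumes the Akoma Ntoso 3.0 namespace and that body text lives under a `content` child.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch, not ucdb's ingest.py. Assumes the AKN 3.0 namespace.
NS = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}


@dataclass
class NodeRow:
    eid: str
    parent_eid: Optional[str]
    tag: str
    num: Optional[str]
    heading: Optional[str]
    text: str
    ordinal: int  # position in document order


def _local_name(tag: str) -> str:
    # "{namespace}article" -> "article"
    return tag.rsplit("}", 1)[-1]


def _text_of(elem: Optional[ET.Element]) -> Optional[str]:
    if elem is None:
        return None
    return "".join(elem.itertext()).strip()


def extract_nodes(xml_text: str) -> List[NodeRow]:
    root = ET.fromstring(xml_text)
    body = root.find(".//akn:body", NS)
    rows: List[NodeRow] = []
    if body is None:
        return rows

    def walk(elem: ET.Element, parent_eid: Optional[str]) -> None:
        for child in elem:
            eid = child.get("eId")
            if eid is None:
                # Elements without an eId (num, heading, content, ...) are not
                # rows; their text is folded into the parent and not descended.
                continue
            rows.append(
                NodeRow(
                    eid=eid,
                    parent_eid=parent_eid,
                    tag=_local_name(child.tag),
                    num=_text_of(child.find("akn:num", NS)),
                    heading=_text_of(child.find("akn:heading", NS)),
                    text=_text_of(child.find("akn:content", NS)) or "",
                    ordinal=len(rows),
                )
            )
            walk(child, eid)  # depth-first traversal keeps document order

    walk(body, None)
    return rows
```

Each row maps naturally onto one record in a node table keyed by expression and `eId`.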

The storage artifact is a single SQLite database. It includes:

source hashes and processing provenance;
canonical Akoma Ntoso XML and canonical hash;
normalized structural nodes with eId, num, heading, text, XML fragment, text hashes, hierarchy, and document ordering;
revision summaries and node-level changes;
line-level provenance for ucdb query blame;
FTS5 trigram search for CJK-friendly substring matching;
placeholders for RAG chunks and reproducible exports.
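A rough picture of how some of these pieces can fit together in SQLite, as a sketch rather than UCDB's actual schema: the table and column names (`works`, `expressions`, `nodes`, `node_fts`) and the helper functions are assumptions, and the "canonical hash" here is simply SHA-256 over the stored XML string. Search falls back to `LIKE` when FTS5's trigram tokenizer is unavailable (it needs SQLite 3.34.0 or later with FTS5) or when the query is shorter than three characters, since trigram full-text queries cannot match shorter substrings.

```python
import hashlib
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical, heavily simplified schema; UCDB's real tables may differ.
SCHEMA = """
CREATE TABLE works (work_id TEXT PRIMARY KEY);
CREATE TABLE expressions (
    expression_id INTEGER PRIMARY KEY,
    work_id TEXT NOT NULL REFERENCES works(work_id),
    version_label TEXT NOT NULL,
    akn_xml TEXT NOT NULL,
    canonical_hash TEXT NOT NULL,
    UNIQUE (work_id, version_label)
);
CREATE TABLE nodes (
    node_id INTEGER PRIMARY KEY,
    expression_id INTEGER NOT NULL REFERENCES expressions(expression_id),
    eid TEXT NOT NULL,
    text TEXT NOT NULL,
    text_hash TEXT NOT NULL,
    ordinal INTEGER NOT NULL
);
"""


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_db(path: str = ":memory:") -> Tuple[sqlite3.Connection, bool]:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    try:
        # The trigram tokenizer requires SQLite 3.34.0+ built with FTS5.
        conn.execute("CREATE VIRTUAL TABLE node_fts USING fts5(text, tokenize='trigram')")
        return conn, True
    except sqlite3.OperationalError:
        return conn, False


def add_expression(conn: sqlite3.Connection, has_fts: bool, work_id: str,
                   version_label: str, akn_xml: str,
                   nodes: Sequence[Tuple[str, str]]) -> int:
    conn.execute("INSERT OR IGNORE INTO works (work_id) VALUES (?)", (work_id,))
    cur = conn.execute(
        "INSERT INTO expressions (work_id, version_label, akn_xml, canonical_hash)"
        " VALUES (?, ?, ?, ?)",
        (work_id, version_label, akn_xml, sha256_hex(akn_xml)),
    )
    expression_id = cur.lastrowid
    for ordinal, (eid, text) in enumerate(nodes):
        cur = conn.execute(
            "INSERT INTO nodes (expression_id, eid, text, text_hash, ordinal)"
            " VALUES (?, ?, ?, ?, ?)",
            (expression_id, eid, text, sha256_hex(text), ordinal),
        )
        if has_fts:
            # Reuse node_id as the FTS rowid so hits map straight back to nodes.
            conn.execute("INSERT INTO node_fts (rowid, text) VALUES (?, ?)",
                         (cur.lastrowid, text))
    conn.commit()
    return expression_id


def search(conn: sqlite3.Connection, has_fts: bool, query: str) -> List[int]:
    if has_fts and len(query) >= 3:
        phrase = '"' + query.replace('"', '""') + '"'  # match as one substring
        sql = "SELECT rowid FROM node_fts WHERE node_fts MATCH ? ORDER BY rowid"
        params: Tuple[str, ...] = (phrase,)
    else:
        escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        sql = "SELECT node_id FROM nodes WHERE text LIKE ? ESCAPE '\\' ORDER BY node_id"
        params = ("%" + escaped + "%",)
    return [row[0] for row in conn.execute(sql, params)]
```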

Install

pip install ucdb
# or
uv tool install ucdb

For local development:

uv sync
uv run ucdb --help

Quick Start

ucdb init

# Import canonical Akoma Ntoso XML produced elsewhere.
ucdb import-akn ./law.xml --work-id civil-code --version 2026-04-29 --no-schema

ucdb query works
ucdb query expressions civil-code
ucdb query nodes 1
ucdb query search "契約" --work-id civil-code
ucdb query akn 1

AI-assisted processing is still available for PDF/DOCX/ODT/TXT/Markdown inputs:

export OPENAI_API_KEY=sk-...
ucdb process ./input

Input repositories are scanned as:

./input/<work-id>/<version-label>/<document>.{pdf,docx,odt,txt,md}
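The directory convention can be read with `pathlib` alone. This sketch uses hypothetical `scan_inputs` and `InputDocument` names and is not necessarily what `ucdb scan` does internally: it takes the first two path components under the root as the work id and version label and keeps only the listed extensions.

```python
from pathlib import Path
from typing import List, NamedTuple, Union

SUPPORTED_SUFFIXES = {".pdf", ".docx", ".odt", ".txt", ".md"}


class InputDocument(NamedTuple):
    work_id: str
    version_label: str
    path: Path


def scan_inputs(root: Union[str, Path]) -> List[InputDocument]:
    """Hypothetical scanner for <root>/<work-id>/<version-label>/<document>.<ext>."""
    base = Path(root)
    found: List[InputDocument] = []
    # Exactly three levels deep: work dir / version dir / document file.
    for path in sorted(base.glob("*/*/*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            work_id, version_label, _ = path.relative_to(base).parts
            found.append(InputDocument(work_id, version_label, path))
    return found
```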

Configuration

Environment variable  Purpose                                              Default
`UCDB_DB`             Default SQLite path                                  `ucdb.sqlite3`
`OPENAI_API_KEY`      API key for AI-assisted normalization                required for `process`
`OPENAI_BASE_URL`     OpenAI-compatible endpoint                           OpenAI default
`UCDB_MODEL`          Model used for structured extraction                 `gpt-5.4-mini`
`UCDB_AKN_XSD`        Optional Akoma Ntoso XSD path for strict validation  off
`UCDB_JSON`           Emit JSON summaries for process/import commands      off
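One way to resolve these variables with their documented defaults, shown as a sketch: the `Settings` dataclass, the `load_settings` and `require_api_key` helpers, and the set of strings treated as "on" for `UCDB_JSON` are assumptions, not UCDB's own configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical: which strings count as "on" for UCDB_JSON is an assumption.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    db_path: str
    api_key: Optional[str]
    base_url: Optional[str]  # None means "use the client's default endpoint"
    model: str
    akn_xsd: Optional[str]   # None means strict XSD validation is off
    json_output: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    return Settings(
        db_path=env.get("UCDB_DB", "ucdb.sqlite3"),
        api_key=env.get("OPENAI_API_KEY") or None,
        base_url=env.get("OPENAI_BASE_URL") or None,
        model=env.get("UCDB_MODEL", "gpt-5.4-mini"),
        akn_xsd=env.get("UCDB_AKN_XSD") or None,
        json_output=env.get("UCDB_JSON", "").strip().lower() in _TRUTHY,
    )


def require_api_key(settings: Settings) -> str:
    # Only the AI-assisted `process` path needs a key.
    if not settings.api_key:
        raise RuntimeError("OPENAI_API_KEY is required for `ucdb process`")
    return settings.api_key
```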

Main Commands

ucdb init
ucdb scan <root>
ucdb process <root>
ucdb process-one <file> --work-id ... --version ...
ucdb import-akn <xml> --work-id ... --version ...
ucdb export json <expression-id>
ucdb export rag <expression-id>
ucdb export markdown <expression-id>
ucdb export html <expression-id>
ucdb serve

ucdb query works
ucdb query expressions <work-id>
ucdb query nodes <expression-id>
ucdb query node <node-id> [--xml]
ucdb query search <text> [--work-id ...]
ucdb query akn <expression-id>
ucdb query revisions <work-id>
ucdb query revision <revision-id>
ucdb query diff <change-id>
ucdb query diff-expressions <work-id> --from <v1> --to <v2> [--node-eid ...]
ucdb query blame <work-id> <node-eid> [--version ...]
ucdb query history <work-id> <node-eid>
ucdb query log
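`diff-expressions` compares two versions of a work node by node. A minimal sketch of that idea, assuming each expression has been reduced to an ordered mapping of `eId` to node text and that `--node-eid` selects one exact `eId` (the real command may treat it differently, for example as a subtree). `NodeChange` and `diff_expressions` are hypothetical names, not the API of `revisions.py`.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class NodeChange:
    eid: str
    kind: str  # "added", "removed", or "modified"
    old_text: Optional[str]
    new_text: Optional[str]


def diff_expressions(old: Mapping[str, str], new: Mapping[str, str],
                     node_eid: Optional[str] = None) -> List[NodeChange]:
    changes: List[NodeChange] = []
    # Old-document order first, then eIds that only exist in the new document.
    eids = list(old) + [e for e in new if e not in old]
    for eid in eids:
        if node_eid is not None and eid != node_eid:
            continue
        before, after = old.get(eid), new.get(eid)
        if before is None:
            changes.append(NodeChange(eid, "added", None, after))
        elif after is None:
            changes.append(NodeChange(eid, "removed", before, None))
        elif before != after:
            changes.append(NodeChange(eid, "modified", before, after))
    return changes
```

Keying on `eId` rather than position means a renumbered or reordered document still lines up provision by provision, as long as identifiers stay stable.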

Core Modules

model.py       normalized LegalDocument / LegalNode dataclasses
akn.py         Akoma Ntoso parser, serializer, and validation helpers
tw_profile.py  Taiwan profile helpers and identifier normalization
db.py          SQLite schema and data access
ingest.py      Akoma Ntoso XML -> normalized tables
ai.py          OpenAI-compatible extraction -> normalized model -> AKN XML
process.py     pipeline orchestration
revisions.py   structural node diff engine
blame.py       line-level provenance
web.py         read-only browser data layer and HTTP server
exporters.py   JSON, RAG JSONL, Markdown, and HTML renderers
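The line-level provenance behind `ucdb query blame` can be approximated with `difflib`: walk a node's text from the oldest version to the newest and let lines that survive unchanged keep the version label where they first appeared. This is a sketch of that general technique with a hypothetical `blame` function, not the algorithm in `blame.py`.

```python
import difflib
from typing import List, Tuple


def blame(versions: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical: versions is [(version_label, text), ...], oldest first.

    Returns (version_label, line) for each line of the newest text, where the
    label is the version in which that line last appeared as new or changed.
    """
    if not versions:
        return []
    first_label, first_text = versions[0]
    lines = first_text.splitlines()
    origins = [first_label] * len(lines)
    for label, text in versions[1:]:
        new_lines = text.splitlines()
        new_origins = [label] * len(new_lines)  # default: introduced here
        matcher = difflib.SequenceMatcher(a=lines, b=new_lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            # Unchanged lines inherit the provenance from the previous version.
            for k in range(block.size):
                new_origins[block.b + k] = origins[block.a + k]
        lines, origins = new_lines, new_origins
    return list(zip(origins, lines))
```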

Development

Run the current no-network regression tests:

uv run python tests/test_history.py
uv run python tests/test_web.py

The test fixture imports ten Akoma Ntoso snapshots and verifies search, diffs between arbitrary version pairs, repeal/re-enactment handling, line blame, history, and web store queries.
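The repeal/re-enactment case can be illustrated by classifying one node's status across ordered snapshots. This sketch assumes a repeal shows up as the `eId` disappearing from a snapshot; real Akoma Ntoso documents may instead keep a repealed provision in place and mark it, so treat the `node_history` function and its event names as hypothetical.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def node_history(snapshots: Sequence[Tuple[str, Mapping[str, str]]],
                 eid: str) -> List[Tuple[str, str]]:
    """Hypothetical: snapshots is [(version_label, {eId: text}), ...], oldest first."""
    events: List[Tuple[str, str]] = []
    previous: Optional[str] = None
    seen_before = False
    for label, nodes in snapshots:
        current = nodes.get(eid)
        if current is None:
            if previous is not None:
                events.append((label, "repealed"))  # present before, gone now
        elif previous is None:
            # Appearing after an absence is a re-enactment, not a first enactment.
            events.append((label, "re-enacted" if seen_before else "enacted"))
            seen_before = True
        elif current != previous:
            events.append((label, "amended"))
        previous = current
    return events
```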
