AI-friendly semantic markup for source code contracts. Methodology + reference validator + per-language profiles. Origin: bsdOS, extracted 2026-06-13.
sema is a methodology for placing structured, natural-language
descriptions of what code does and why directly inside source files.
The two simultaneous goals:
- Top-down. The markup gives the LLM a hidden plan before it writes code. When the model reads a function's contract before its body, it generates more accurate code, because contract-then-body is the native (prompt, response) format of its training distribution.
- Bottom-up. The markup gives RAG agents stable semantic coordinates for navigating a codebase. An agent does not need vector embeddings to find "the validation function"; it finds it by anchor.
The markup does not replace code, does not duplicate code, does not explain code to humans. It is built into the code and exists for the mechanics of LLM work with context.
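A minimal sketch of what "finds it by anchor" can look like in practice. It assumes anchors are written as `// <name>:start` / `// <name>:end` comment lines, as in the contract template later in this README. `find_anchor` is a hypothetical helper, not part of sema's tooling. The lookup is plain string matching, with no index and no embeddings.

```rust
/// Hypothetical helper: return the lines strictly between `// <name>:start`
/// and `// <name>:end`, or None if either marker is missing.
/// Assumes each marker sits on its own line (leading whitespace allowed).
fn find_anchor<'a>(src: &'a str, name: &str) -> Option<Vec<&'a str>> {
    let start = format!("// {name}:start");
    let end = format!("// {name}:end");
    let mut lines = src.lines();
    // Skip forward to the start marker; bail out if it is absent.
    lines.by_ref().find(|l| l.trim() == start)?;
    let mut body = Vec::new();
    for line in lines {
        if line.trim() == end {
            return Some(body);
        }
        body.push(line);
    }
    None // start marker found, but the scope was never closed
}

fn main() {
    let src = "\
// parse_config:start
// purpose: read key=value pairs
fn parse_config() {}
// parse_config:end
fn other() {}
";
    let body = find_anchor(src, "parse_config").expect("anchor present");
    assert_eq!(body.len(), 2);
    println!("{}", body.join("\n"));
}
```

Because the lookup keys on the anchor name rather than on a line number or an embedding, it keeps working after unrelated edits move the function around the file.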
Eight independent foundations (full text in spec/sema.md §2):
- Sparse attention. LLM attention is non-uniform. Structured markers give stable anchors.
- RAG anchors. Markers survive index rebuilds; embeddings don't.
- Scope-lock. `START_X`/`END_X` markers make scope explicit and prevent "fix the deliberate decision" bugs.
- SFT distribution. Contracts before bodies match the (prompt, response) training template.
- Structure recovery. Hierarchical markers recover module structure from flat token streams.
- Semantic accumulators. Contracts at the top of files accumulate as the agent reads.
- Diff-patch anchoring. `START_X`/`END_X` markers survive rebases; a JSON `}` doesn't.
- Distillation. Contracts in code distill "why" knowledge into the artifact itself.
These are not best practices to choose between. They are structural responses to a structural problem (sparse attention, context loss between sessions, regression to the mean). They work because the LLM is what it is, not because of a clever trick.
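Scope-lock and diff-patch anchoring can be illustrated together: an edit addressed by anchor name rather than by line number replaces exactly one scope, and refuses to apply if the anchor is gone. The sketch below assumes the same `// <name>:start` / `// <name>:end` comment form used in this README's contract template; `replace_scoped` is a hypothetical helper, not something the sema repository ships.

```rust
/// Hypothetical helper: replace everything between `// <name>:start` and
/// `// <name>:end` with `new_body`, keeping both markers. Returns None if the
/// pair is missing, so a stale patch fails loudly instead of landing on the
/// wrong lines.
fn replace_scoped(src: &str, name: &str, new_body: &str) -> Option<String> {
    let start = format!("// {name}:start");
    let end = format!("// {name}:end");
    let lines: Vec<&str> = src.lines().collect();
    // Locate the start marker, then the first end marker after it.
    let s = lines.iter().position(|l| l.trim() == start)?;
    let e = s + lines[s..].iter().position(|l| l.trim() == end)?;
    let mut out: Vec<&str> = Vec::new();
    out.extend_from_slice(&lines[..=s]); // everything up to and including start
    out.extend(new_body.lines()); // the new scoped body
    out.extend_from_slice(&lines[e..]); // end marker and everything after
    Some(out.join("\n") + "\n")
}

fn main() {
    let src = "fn a() {}\n// b:start\nfn b() { old() }\n// b:end\nfn c() {}\n";
    let patched = replace_scoped(src, "b", "fn b() { new() }").unwrap();
    assert!(patched.contains("new()") && !patched.contains("old()"));
    print!("{patched}");
    // A patch against a missing anchor is rejected rather than misapplied.
    assert!(replace_scoped(src, "zzz", "").is_none());
}
```

If a rebase inserted lines above `b`, every line number would shift but the anchor would not, so the same patch would still land on the intended span.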
```text
sema/
├── spec/
│   └── sema.md          1430 lines, full theory + rule set + golden samples
├── profiles/
│   ├── bsdOS.md         228 lines, operating profile for bsdOS
│   └── README.md        how to write a profile for your project
├── tools/
│   ├── sema-check.sh    71 lines, POSIX sh validator
│   └── README.md        install + CI integration
├── examples/            golden samples per language
│   ├── rust/
│   └── zig/
├── .github/
│   └── workflows/
│       └── check.yml    CI: run sema-check on PR
├── README.md            this file
├── CHANGELOG.md
└── LICENSE              MIT
```
The 1430-line spec is the source of truth. Profiles adapt it to specific
projects (which files to mark, which functions get full vs compressed
contracts, naming conventions). The validator, `tools/sema-check.sh`,
checks that `START_X` / `END_X` anchors are correctly paired.
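To make "anchor pairing" concrete, here is a rough Rust re-sketch of the idea. It is not a port of `tools/sema-check.sh`, whose exact matching rules live in that script. It assumes the `// name:start` / `// name:end` comment form from this README's contract template, treats each such pair as one `START_X`/`END_X` scope, and requires nested anchors to close innermost-first. `check_pairs` and `is_ident` are hypothetical names.

```rust
/// Anchor names are assumed to be identifier-like, so ordinary comments that
/// merely end in ":end" are not mistaken for markers.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Return one message per pairing error; an empty Vec means all anchors pair.
fn check_pairs(src: &str) -> Vec<String> {
    let mut errors = Vec::new();
    let mut open: Vec<(String, usize)> = Vec::new(); // (name, 1-based line)
    for (i, line) in src.lines().enumerate() {
        let n = i + 1;
        let Some(marker) = line.trim().strip_prefix("// ") else { continue };
        if let Some(name) = marker.strip_suffix(":start").filter(|s| is_ident(s)) {
            open.push((name.to_string(), n));
        } else if let Some(name) = marker.strip_suffix(":end").filter(|s| is_ident(s)) {
            // The innermost open scope must be the one being closed.
            match open.pop() {
                Some((top, _)) if top == name => {}
                Some((top, at)) => errors.push(format!(
                    "line {n}: `{name}:end` closes `{top}:start` from line {at}"
                )),
                None => errors.push(format!("line {n}: `{name}:end` without start")),
            }
        }
    }
    // Anything still open at end of file was never closed.
    for (name, at) in open {
        errors.push(format!("line {at}: `{name}:start` never closed"));
    }
    errors
}

fn main() {
    let src = "// outer:start\n// inner:start\n// inner:end\n// outer:end\n// lost:start\n";
    for e in check_pairs(src) {
        println!("{e}"); // prints "line 5: `lost:start` never closed"
    }
}
```

A stack is the natural structure here: it accepts properly nested scopes and flags both crossed pairs and dangling markers.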
Start by reading the spec:

```sh
# Either
cat spec/sema.md
# Or load it into your agent's context
# (Claude Code, Cursor, etc.; depends on your tool)
```

The spec is structured so a human can read it top-to-bottom in 30-40 minutes, or an LLM can read it as rules and apply them.
If you have a profile (e.g., profiles/bsdOS.md),
read it. It tells you:
- which files in YOUR project need contracts
- which functions get full vs compressed forms
- naming conventions for anchors
- the project's tolerance for mass-annotation
If you don't have a profile, follow the spec's "Minimum Contract Fields" (§3) and write a profile as you go.
Install the validator:

```sh
cp tools/sema-check.sh /usr/local/bin/sema-check
chmod +x /usr/local/bin/sema-check
```

Or invoke it directly:

```sh
./tools/sema-check.sh
```

Add it to your CI (see `.github/workflows/check.yml` for an example).
Annotation-first. Before you write a function, write its contract. The contract says what the function does, what it takes, what it returns, and what side effects it has. Then write the body.
```rust
// function_name:start
// purpose: ...
// input: ...
// output: ...
// sideEffects: ...
fn function_name(...) -> ... { ... }
// function_name:end
```

Or for trivial helpers:

```rust
// CONTRACT: parse → validate → store
fn helper() { ... }
```

Or for one-liners:

```rust
// ensureThreadState: creates thread if missing
fn ensure_thread_state() { ... }
```

Spec §6 explains when to use which form.
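A small worked example applying all three forms to real, compilable Rust. The field names follow the full-form template in this README; the field wording and the functions `parse_kv`, `load_config`, and `ensure_key` are hypothetical illustrations, not taken from spec §3 or from bsdOS.

```rust
use std::collections::HashMap;

// parse_kv:start
// purpose: turn "key=value" lines into pairs for config loading
// input: text with one pair per line; blank lines and '#' comments are skipped
// output: Ok(pairs), or Err naming the first malformed line
// sideEffects: none
fn parse_kv(text: &str) -> Result<Vec<(String, String)>, String> {
    let mut pairs = Vec::new();
    for (i, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (k, v) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: missing '='", i + 1))?;
        pairs.push((k.trim().to_string(), v.trim().to_string()));
    }
    Ok(pairs)
}
// parse_kv:end

// CONTRACT: parse → validate → store
fn load_config(text: &str, store: &mut HashMap<String, String>) -> Result<(), String> {
    let pairs = parse_kv(text)?;
    if pairs.iter().any(|(k, _)| k.is_empty()) {
        return Err("empty key".to_string());
    }
    store.extend(pairs);
    Ok(())
}

// ensure_key: inserts an empty value if the key is missing
fn ensure_key(store: &mut HashMap<String, String>, key: &str) {
    store.entry(key.to_string()).or_default();
}

fn main() {
    let mut store = HashMap::new();
    load_config("# demo\nhost = example.org\nport=8080\n", &mut store).unwrap();
    ensure_key(&mut store, "user");
    assert_eq!(store.get("port").map(String::as_str), Some("8080"));
    assert!(load_config("no equals sign", &mut store).is_err());
    println!("{} keys loaded", store.len()); // prints "3 keys loaded"
}
```

The full form goes on the function whose behavior and failure modes matter to callers, the pipeline form on a short orchestration helper, and the one-liner on a helper whose name already says almost everything.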
The spec is language-agnostic. The bsdOS profile is for one specific Rust + Zig codebase. To use sema in YOUR project:
- Read `spec/sema.md` §6 (operational rules).
- Read `profiles/README.md` (how to write a profile).
- Write your profile in `profiles/<your-project>.md`.
- Add golden samples in `examples/<your-language>/` from your codebase.
- Run the validator in CI.
This repository accepts profiles via PR. See
CONTRIBUTING.md for the contribution flow.
This methodology was developed as part of the bsdOS
project (privacy-first mobile OS on FreeBSD 15.1, June 2026). The
bsdOS profile (profiles/bsdOS.md) is the original and most-tested
application. After 100% of bsdOS source files (97/97) reached full
contract coverage in June 2026, the methodology was extracted to a
standalone repository.
Citation: see CITATION.cff (TODO: add before v1.0 publish).
MIT. See LICENSE.
- Spec: `spec/sema.md`
- bsdOS profile: `profiles/bsdOS.md`
- Validator: `tools/sema-check.sh`
- bsdOS (the project this came from): https://github.com/bzdos/sema