Tant is a statically-typed, compiled programming language with first-class information-flow control. Taint qualifiers are part of the type system — the compiler statically proves that tainted data cannot flow into clean sinks, and that implicit flows through control-flow branches are tracked correctly.
The compiler is written in C23 (~3 months of independent work). The novel portion of the project, the type system and static analysis pipeline, is complete. Code generation is planned for a future phase. A bytecode compiler and serializer was implemented, but project goals have since shifted and will not be continued. Existing work can be seen in /bytecode-implementation.
Most languages treat security as a runtime concern: sanitize inputs, validate at the boundary, hope nothing slips through. Tant makes information-flow a compile-time guarantee. If the program compiles, the taint policy is provably enforced — no runtime checks required.
This is the core idea behind academic languages like Jif and FlowCaml. Tant brings that into a practical compiled language with a clean syntax.
Every value in Tant carries a qualifier drawn from a three-level security lattice:
clean ≤ agnostic ≤ tainted
clean— data that is verified, safe to use anywhereagnostic— data of unknown or unverified origin (default)tainted— data that is explicitly untrusted (e.g., user input, external sources)
The ordering is strict: a tainted value can never flow into a clean sink. A clean value can flow into anything.
The compiler enforces the lattice at every assignment, declaration, and function call boundary.
// ACCEPTED: clean flows into agnostic — clean ≤ agnostic.
process: (val: num agnostic) -> num agnostic {
return val;
}
const input: num clean = 7;
mut result: num agnostic = process(input);
// REJECTED: tainted flows into clean — tainted ≰ clean.
mut user_input: num tainted = 42;
const safe_value: num clean = user_input;
// Error: Taint violation: cannot initialize clean variable 'safe_value' with tainted value.
This is where most taint systems fail. Consider:
mut secret: num tainted = 1;
mut output: num clean = 0;
secret == 1 => {
output = 99; // output's value depends on secret — implicit flow
}
Tant tracks a program counter qualifier through every branch. When execution enters a block guarded by a tainted condition, all assignments inside that block are treated as if they carry that taint — even if the assigned value is a literal. The above is rejected:
Error: Taint violation: cannot assign tainted value to clean variable 'output'.
This catches the class of implicit channel attacks that explicit-flow-only systems miss entirely.
Taint is viral through binary and logical operations. The qualifier of a compound expression is the join (least upper bound) of its operands:
mut a: num clean = 1;
mut b: num tainted = 2;
mut c: num agnostic = a + b; // join(clean, tainted) = tainted — REJECTED
// tainted ≰ agnostic
Tant uses => for conditionals (implies) and !=> for else (not-implies), making the logical structure of branching explicit. This syntax was inspired by propositional logic and implication statements.
// Variable declarations
mut x: num agnostic = 10;
const name: string clean = "tant";
// Multi-binding declaration
mut a: num agnostic, mut b: num agnostic = 1, 2;
// Functions: name: (params) -> return_type qualifier { body }
add: (x: num clean, y: num clean) -> num clean {
return x + y;
}
// Conditionals
x > 5 => {
x = x - 1;
} !=> {
x = x + 1;
}
// While loops
x > 0 => { // while x > 0
x = x - 1;
}
The compiler is a single-pass pipeline with a two-pass semantic analysis phase:
Source text
│
▼
Scanner Hand-written lexer. Produces a flat TokenStream.
│ Zero heap allocations per token — tokens are
│ pointer-length pairs into the source buffer.
▼
Parser Pratt parser for expressions (precedence climbing).
│ AST nodes are allocated from a chunked arena —
│ no per-node malloc, cache-friendly layout.
▼
Semantic Two-pass analyzer.
Analyzer Pass 1: Hoist all global function declarations.
│ Pass 2: Full traversal — type checking, scope
│ resolution, taint enforcement, PC tracking.
▼
[TAC generation → RISC-V emission — planned]
Arena allocator for AST nodes. Rather than malloc-ing each node individually, the parser allocates from 1024-node chunks. This keeps the AST cache-local and makes cleanup a single free-walk rather than a recursive destructor.
Taint as a lattice, not a boolean. Most taint implementations are binary (tainted/untainted). Tant's three-level lattice (clean/agnostic/tainted) allows gradated policies — a function can accept agnostic inputs without accepting fully tainted ones. The join and leq operations are the formal lattice operations, not ad-hoc logic.
Program counter propagation. The program_counter token threaded through analyze_node represents the implicit taint of the current execution context. Every branch that checks a tainted condition raises the PC for its body, infecting all assignments regardless of the assigned value's own qualifier.
Clean header separation. Public API headers (include/) expose only what callers need. Internal headers (src/*/parser_internal.h, src/semantic/semantic_internal.h) expose implementation details only within their subsystem.
make
./tant path/to/file.tantCompiled with -Wall -Wextra -Werror -pedantic. AddressSanitizer and UBSan are enabled in debug builds.
The test suite is organized as explicit accept/ and reject/ directories — programs that must compile cleanly and programs that must be rejected with specific errors.
| Phase | Status |
|---|---|
| Scanner | ✅ Complete |
| Pratt Parser | ✅ Complete |
| Semantic Analysis (types, scopes, taint) | ✅ Complete |
| TAC Generation | 📋 Planned |
| RISC-V Emission | 📋 Planned |
The taint system is directly influenced by the information-flow control literature:
- Sabelfeld & Myers, Language-Based Information-Flow Security (2003) — the foundational survey
- Volpano, Smith & Irvine, A Sound Type System for Secure Flow Analysis (1996) — formal basis for PC-style implicit flow tracking