Skip to content

djbertolo/tant-programming-language

Repository files navigation

Tant

Tant is a statically-typed, compiled programming language with first-class information-flow control. Taint qualifiers are part of the type system — the compiler statically proves that tainted data cannot flow into clean sinks, and that implicit flows through control-flow branches are tracked correctly.

The compiler is written in C23 (~3 months of independent work). The novel portion of the project, the type system and static analysis pipeline, is complete. Code generation is planned for a future phase. A bytecode compiler and serializer was implemented, but project goals have since shifted and will not be continued. Existing work can be seen in /bytecode-implementation.


Why Tant exists

Most languages treat security as a runtime concern: sanitize inputs, validate at the boundary, hope nothing slips through. Tant makes information-flow a compile-time guarantee. If the program compiles, the taint policy is provably enforced — no runtime checks required.

This is the core idea behind academic languages like Jif and FlowCaml. Tant brings that into a practical compiled language with a clean syntax.


The Taint System

The Lattice

Every value in Tant carries a qualifier drawn from a three-level security lattice:

clean  ≤  agnostic  ≤  tainted
  • clean — data that is verified, safe to use anywhere
  • agnostic — data of unknown or unverified origin (default)
  • tainted — data that is explicitly untrusted (e.g., user input, external sources)

The ordering is strict: a tainted value can never flow into a clean sink. A clean value can flow into anything.

Explicit Flow (Direct Assignment)

The compiler enforces the lattice at every assignment, declaration, and function call boundary.

// ACCEPTED: clean flows into agnostic — clean ≤ agnostic.
process: (val: num agnostic) -> num agnostic {
    return val;
}

const input: num clean = 7;
mut result: num agnostic = process(input);
// REJECTED: tainted flows into clean — tainted ≰ clean.
mut user_input: num tainted = 42;
const safe_value: num clean = user_input;
// Error: Taint violation: cannot initialize clean variable 'safe_value' with tainted value.

Implicit Flow (Control-Flow Tracking)

This is where most taint systems fail. Consider:

mut secret: num tainted = 1;
mut output: num clean = 0;

secret == 1 => {
    output = 99;  // output's value depends on secret — implicit flow
}

Tant tracks a program counter qualifier through every branch. When execution enters a block guarded by a tainted condition, all assignments inside that block are treated as if they carry that taint — even if the assigned value is a literal. The above is rejected:

Error: Taint violation: cannot assign tainted value to clean variable 'output'.

This catches the class of implicit channel attacks that explicit-flow-only systems miss entirely.

Taint Propagation in Expressions

Taint is viral through binary and logical operations. The qualifier of a compound expression is the join (least upper bound) of its operands:

mut a: num clean   = 1;
mut b: num tainted = 2;
mut c: num agnostic = a + b;  // join(clean, tainted) = tainted — REJECTED
                               // tainted ≰ agnostic

Language Syntax

Tant uses => for conditionals (implies) and !=> for else (not-implies), making the logical structure of branching explicit. This syntax was inspired by propositional logic and implication statements.

// Variable declarations
mut x: num agnostic = 10;
const name: string clean = "tant";

// Multi-binding declaration
mut a: num agnostic, mut b: num agnostic = 1, 2;

// Functions: name: (params) -> return_type qualifier { body }
add: (x: num clean, y: num clean) -> num clean {
    return x + y;
}

// Conditionals
x > 5 => {
    x = x - 1;
} !=> {
    x = x + 1;
}

// While loops
x > 0 => {  // while x > 0
    x = x - 1;
}

Compiler Architecture

The compiler is a single-pass pipeline with a two-pass semantic analysis phase:

Source text
    │
    ▼
Scanner          Hand-written lexer. Produces a flat TokenStream.
    │            Zero heap allocations per token — tokens are
    │            pointer-length pairs into the source buffer.
    ▼
Parser           Pratt parser for expressions (precedence climbing).
    │            AST nodes are allocated from a chunked arena —
    │            no per-node malloc, cache-friendly layout.
    ▼
Semantic         Two-pass analyzer.
Analyzer         Pass 1: Hoist all global function declarations.
    │            Pass 2: Full traversal — type checking, scope
    │            resolution, taint enforcement, PC tracking.
    ▼
[TAC generation → RISC-V emission — planned]

Design decisions worth noting

Arena allocator for AST nodes. Rather than malloc-ing each node individually, the parser allocates from 1024-node chunks. This keeps the AST cache-local and makes cleanup a single free-walk rather than a recursive destructor.

Taint as a lattice, not a boolean. Most taint implementations are binary (tainted/untainted). Tant's three-level lattice (clean/agnostic/tainted) allows gradated policies — a function can accept agnostic inputs without accepting fully tainted ones. The join and leq operations are the formal lattice operations, not ad-hoc logic.

Program counter propagation. The program_counter token threaded through analyze_node represents the implicit taint of the current execution context. Every branch that checks a tainted condition raises the PC for its body, infecting all assignments regardless of the assigned value's own qualifier.

Clean header separation. Public API headers (include/) expose only what callers need. Internal headers (src/*/parser_internal.h, src/semantic/semantic_internal.h) expose implementation details only within their subsystem.


Building

make
./tant path/to/file.tant

Compiled with -Wall -Wextra -Werror -pedantic. AddressSanitizer and UBSan are enabled in debug builds.

The test suite is organized as explicit accept/ and reject/ directories — programs that must compile cleanly and programs that must be rejected with specific errors.


Status

Phase Status
Scanner ✅ Complete
Pratt Parser ✅ Complete
Semantic Analysis (types, scopes, taint) ✅ Complete
TAC Generation 📋 Planned
RISC-V Emission 📋 Planned

Further reading

The taint system is directly influenced by the information-flow control literature:

About

A statically-typed compiler language that enforces explicit and implicit data flow security at compile-time.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages