CodeAxiom

https://codeaxiom.avixosec.xyz | contact@avixosec.xyz

CodeAxiom trains coding models through executable verification.

The goal is to train a specialized coding model that writes, edits, tests, and repairs code. The model does not exist yet. This repository is the public plan, verifier demo, and GPU-credit package for the first training sprint.

The current priority is GPU credit outreach. No training run is planned until credits, trial access, or an approved budget is available.

The training direction is simple:

task -> code or patch -> compile -> run tests -> inspect failure -> improve -> verify
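
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's runner: the names `Attempt`, `verify`, and `repair_loop` are hypothetical, and the "model" is replaced by a prepared list of candidates so the example stays offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    source: str
    passed: bool
    failure: Optional[str]


def verify(source: str, tests: Callable[[dict], None]) -> Attempt:
    """Compile the candidate, run the tests, and record any failure."""
    namespace: dict = {}
    try:
        code = compile(source, "<candidate>", "exec")  # compile step
        exec(code, namespace)  # load definitions (unsafe outside a sandbox)
        tests(namespace)       # run tests
    except Exception as exc:   # syntax error, runtime error, or failed assert
        return Attempt(source, False, f"{type(exc).__name__}: {exc}")
    return Attempt(source, True, None)


def repair_loop(candidates: List[str], tests: Callable[[dict], None]) -> List[Attempt]:
    """Try candidates in order and stop at the first verified one."""
    history: List[Attempt] = []
    for source in candidates:
        attempt = verify(source, tests)
        history.append(attempt)
        if attempt.passed:
            break
    return history


def add_tests(ns: dict) -> None:
    assert ns["add"](2, 3) == 5


# Hypothetical stand-ins for model output: a broken draft, then a repair.
candidates = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]
history = repair_loop(candidates, add_tests)
print([(a.passed, a.failure) for a in history])
```

In a trained system the recorded failure would feed the next generation step. Here it is only stored, which is enough to show where the verification signal comes from.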

The first serious training phase needs B200-class GPU compute. A small self-funded card budget is reserved for Modal access tests, and it is spent only if doing so helps secure larger credit grants.

Current state

This repository includes:

local verified-run demo
YAML task format
noop backend
prepared-patch backend
local toy repository
result, log, patch, report, plan, and trace artifacts
benchmark and compute plans
GPU-credit funding brief
public site
CI for demo tasks

The current runner is local and offline. It is not a trained model and it does not call model APIs.

Why executable verification

Raw code data teaches syntax and patterns, but it does not show whether the code runs.

CodeAxiom is built around feedback from execution:

compiler result
unit tests
patch apply result
runtime errors
repository-level checks
benchmark harnesses
failure repair attempts

A generated answer is not enough. A code model should be measured by code that runs.
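
As an illustration of how such signals might be collected, the sketch below runs a candidate script in a fresh interpreter and summarizes the outcome. The function name `run_check` and the result keys are assumptions made for this example, not the format used by the repository's runner.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_check(source: str, timeout: float = 10.0) -> dict:
    """Run a script in a separate process and summarize what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "returncode": None, "stderr_tail": ""}
    lines = proc.stderr.strip().splitlines()
    return {
        "status": "passed" if proc.returncode == 0 else "failed",
        "returncode": proc.returncode,
        # The last stderr line usually names the exception type.
        "stderr_tail": lines[-1] if lines else "",
    }


good = run_check("assert sum([1, 2, 3]) == 6\n")
bad = run_check("assert sum([1, 2, 3]) == 7\n")
syntax = run_check("def broken(:\n")
print(good["status"], bad["status"], syntax["status"])
```

Running the candidate in its own process keeps crashes and hangs away from the caller. A real harness would also need sandboxing and resource limits.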

Target benchmarks

CodeAxiom targets coding benchmarks as evaluation goals, not claimed results.

Benchmark                          What it tests
LiveCodeBench                      fresh algorithmic coding tasks
HumanEval+                         Python function correctness
MBPP+                              small Python programs with stronger tests
MultiPL-E                          multilingual code generation
Aider Code Editing Benchmark       editing existing files
RepoQA                             long-context repository understanding
SWE-bench Verified                 real repository bug fixing
SWE-agent style tasks              agentic edit and shell loop
SWE-bench Long or SWE-bench Pro    long-horizon software engineering

See docs/benchmark-plan.md and eval/benchmark-matrix.md.

Training stack

The planned training stack:

1. Baseline evaluation of the selected base model.
2. Core SFT on public and licensed coding data.
3. Edit SFT for patches and existing-file changes.
4. Multilingual compiler loop for Python, JavaScript, TypeScript, Java, C++, Go, and Rust.
5. Long repository training for RepoQA and SWE-style tasks.
6. Execution feedback training from compiler and test failures.
7. Patch search with many candidate fixes.
8. CodeWorldModel for patch pass prediction.
9. Verifier-guided RL on narrow task families.
10. One-shot distillation from verified search winners.

See docs/training-roadmap.md.
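
The patch search and distillation stages can be pictured with a toy example. This is a sketch under stated assumptions: the candidates are hand-written lambdas rather than model output, and the names `score`, `search`, and `distill_pairs` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

TestCase = Tuple[tuple, object]  # (arguments, expected result)


def score(candidate: Callable, tests: List[TestCase]) -> int:
    """Count how many test cases a candidate passes."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed


def search(candidates: Dict[str, Callable], tests: List[TestCase]) -> List[str]:
    """Return the names of candidates that pass every test."""
    return [name for name, fn in candidates.items() if score(fn, tests) == len(tests)]


# Hypothetical candidate fixes for an "absolute value" task.
candidates = {
    "identity": lambda x: x,
    "negate": lambda x: -x,
    "branch": lambda x: x if x >= 0 else -x,
    "crash": lambda x: x / 0,
}
tests = [((3,), 3), ((-4,), 4), ((0,), 0)]

winners = search(candidates, tests)
# Verified winners become (task, solution) pairs for later distillation.
distill_pairs = [("abs_value", name) for name in winners]
print(winners)
```

Only candidates that pass every test are kept, so the distillation set contains verified solutions rather than plausible-looking ones.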

GPU credit sprint

The first target is to get GPU credits in the first week, ideally within 48 hours.

Fast starter ask:

$5k to $10k GPU credits

Direct hardware ask:

2 to 4 B200 GPUs for 3 to 7 days

Fallback for a first proof checkpoint:

H100, H200, GB200, A100 80GB, or equivalent GPU credits

See docs/compute-plan.md and docs/funding-brief.md.

Card budget

A $30 card budget is not enough for meaningful model training.

Use it only on Modal, and only for:

account verification
trial activation
short GPU access checks
a tiny smoke run if it supports a credit application

With the listed prices, $30 buys about:

11.9 hours on A100 80GB
7.6 hours on H100
6.6 hours on H200
4.8 hours on B200
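
The arithmetic behind these figures is hours = budget / hourly rate. The sketch below works backwards from the hours quoted above to the hourly rate each one implies. The resulting rates are derived values for illustration only, not quoted prices, so check current pricing before relying on them.

```python
BUDGET_USD = 30.0

# Hours quoted in this README for a $30 budget.
quoted_hours = {
    "A100 80GB": 11.9,
    "H100": 7.6,
    "H200": 6.6,
    "B200": 4.8,
}


def implied_rate(budget: float, hours: float) -> float:
    """Hourly rate implied by spending `budget` over `hours`."""
    return budget / hours


def hours_for(budget: float, rate: float) -> float:
    """Hours of GPU time a budget buys at a given hourly rate."""
    return budget / rate


for gpu, hours in quoted_hours.items():
    rate = implied_rate(BUDGET_USD, hours)
    print(f"{gpu}: about ${rate:.2f}/hour implied")
```

For example, `hours_for(30.0, 6.25)` gives 4.8, which matches the B200 line.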

The current plan is to preserve the card budget, spend it only on Modal and only if needed, and focus on GPU credits first.

Local verifier demo

Install dependencies:

python -m pip install -r requirements.txt

Run a task that should stay failed:

python agent/run_task.py examples/tasks/toy_noop.yaml

Run a task that should pass after a prepared patch:

python agent/run_task.py examples/tasks/toy_fix.yaml

On Windows:

py -m pip install -r requirements.txt
py .\agent\run_task.py .\examples\tasks\toy_noop.yaml
py .\agent\run_task.py .\examples\tasks\toy_fix.yaml

Expected result:

toy_noop -> failed
toy_fix -> passed

Each run writes:

artifacts/runs/<run_id>.json
artifacts/logs/<run_id>.log
artifacts/patches/<run_id>.patch
artifacts/reports/<run_id>.md
artifacts/plans/<run_id>.md
artifacts/traces/<run_id>.jsonl
artifacts/tmp/<run_id>/

Generated artifacts are ignored by Git.
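
To locate the outputs of a single run, the layout above can be mapped to paths with a small helper. This is a convenience sketch, not part of the repository: `artifact_paths` and the example run id `demo-001` are hypothetical, and the real run id format may differ.

```python
from pathlib import Path
from typing import Dict

# Folder and file suffix for each artifact kind, copied from the list above.
ARTIFACT_LAYOUT = {
    "run": ("runs", ".json"),
    "log": ("logs", ".log"),
    "patch": ("patches", ".patch"),
    "report": ("reports", ".md"),
    "plan": ("plans", ".md"),
    "trace": ("traces", ".jsonl"),
}


def artifact_paths(run_id: str, root: Path = Path("artifacts")) -> Dict[str, Path]:
    """Build the expected artifact paths for one run id."""
    paths = {
        kind: root / folder / f"{run_id}{suffix}"
        for kind, (folder, suffix) in ARTIFACT_LAYOUT.items()
    }
    paths["tmp"] = root / "tmp" / run_id  # scratch directory, not a file
    return paths


paths = artifact_paths("demo-001")
for kind, path in paths.items():
    print(f"{kind:7s} {path.as_posix()}")
```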

Data policy

The first training phase uses public and licensed datasets. Private user code is not training data by default.

Benchmark data must stay separate from training data. Every public score needs a contamination note.

See docs/data-policy.md and docs/contamination-policy.md.
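
One simple way to check that benchmark items have not leaked into training data is an exact-match fingerprint comparison. The sketch below is an assumption about what such a check could look like, not the procedure defined in docs/contamination-policy.md. The names `fingerprint` and `find_overlap` are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List


def fingerprint(text: str) -> str:
    """Hash a sample after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_overlap(training: Iterable[str], benchmark: Iterable[str]) -> List[str]:
    """Return training samples whose fingerprint matches a benchmark item."""
    bench_prints = {fingerprint(item) for item in benchmark}
    return [sample for sample in training if fingerprint(sample) in bench_prints]


# Toy data: the second training sample differs from a benchmark item
# only in whitespace and case, so it should be flagged.
benchmark = ["def add(a, b):\n    return a + b"]
training = [
    "def mul(a, b):\n    return a * b",
    "DEF add(a, b):   return a + b",
]
leaked = find_overlap(training, benchmark)
print(len(leaked))
```

Exact matching after normalization catches only verbatim copies. Paraphrased or partially overlapping items need fuzzier methods, such as n-gram overlap.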

Files worth reading

ONE_PAGER.md                               short funding summary
docs/codeaxiom-model-training-plan.md      implementation handoff plan
docs/benchmark-plan.md                     target benchmark plan
docs/compute-plan.md                       GPU credit and $30 smoke-run plan
docs/contamination-policy.md               benchmark leakage rules
docs/funding-brief.md                      GPU credit application brief
docs/training-roadmap.md                   model training stages
docs/architecture.md                       system shape
docs/roadmap.md                            fast execution roadmap
docs/limitations.md                        current limits
eval/benchmark-matrix.md                   benchmark matrix
agent/README.md                            verifier demo notes
examples/tasks/                            demo task files
site/                                      static public site

Scope

CodeAxiom is not a trained model yet. It is a public training plan, verifier demo, and GPU-credit package.

No benchmark score is claimed before evaluation.

Contact

contact@avixosec.xyz

License

MIT
