cosmo-regulus

Adaptive radiation-tolerance scheduling and an economic Pareto curve for commercial-GPU LLM inference under measured space-radiation dose.

What this is

A small Python library and CLI that answers two questions, quantitatively:

Economic Pareto question. For a given orbit / surface location, shielding mass, and target output quality, what combination of replica count and weight-scrubbing rate minimizes $/M-tokens?
Adaptive scheduling question. Given a live (or simulated) particle-flux signal, what policy of detection + recovery primitives keeps quality above threshold X with throughput cost below Y%, across both quiet-sun conditions and solar energetic particle (SEP) events?
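The $/M-tokens side of the economic question can be sketched as a small cost function of the knobs. This is a minimal sketch under stated assumptions, not the model in economic/tokens_per_dollar.py. It assumes replicas run in lockstep for voting (so they multiply hardware cost without adding throughput), each scrub pass pauses inference for a fixed duration, and shielding cost is amortized linearly over the design life. The names CostInputs and cost_per_million_tokens, and every parameter, are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8766.0  # Julian year (365.25 d x 24 h)


@dataclass
class CostInputs:
    # All fields are hypothetical planning inputs, not values from cosmo-regulus.
    gpu_cost_per_hour: float   # amortized $/h for one GPU replica
    tokens_per_hour: float     # throughput of one replica with no scrubbing
    shield_cost: float         # one-time $ for launched shielding
    design_life_years: float   # amortization window for the shielding
    scrub_duration_h: float    # inference pause per scrub pass


def cost_per_million_tokens(c: CostInputs, replicas: int, scrub_interval_h: float) -> float:
    """$/M-tokens for `replicas` lockstep copies scrubbed every `scrub_interval_h` hours."""
    if replicas < 1 or scrub_interval_h <= c.scrub_duration_h:
        raise ValueError("need >= 1 replica and a scrub interval longer than one scrub pass")
    # Redundant replicas vote on the same tokens: cost scales, throughput does not.
    hardware_per_h = replicas * c.gpu_cost_per_hour
    # Shielding is paid once and spread evenly over the design life.
    shield_per_h = c.shield_cost / (c.design_life_years * HOURS_PER_YEAR)
    # Fraction of wall-clock time left for inference after scrubbing.
    duty = 1.0 - c.scrub_duration_h / scrub_interval_h
    useful_tokens_per_h = c.tokens_per_hour * duty
    return (hardware_per_h + shield_per_h) / (useful_tokens_per_h / 1e6)
```

In this sketch, cost grows linearly with replica count while shielding is a one-time cost spread over the whole design life, which is the basic tension a shielding × replicas × scrub-rate curve has to resolve.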

The output is a curve and a controller, not a claim. Engineers argue with curves.

What this is not

Not a from-scratch LLM-fault-tolerance framework. That ground is already well covered (see docs/prior-art.md); this project is an additive layer.
Not a proof of flight-grade radiation tolerance. Software fault injection grounded in published beam-test cross-sections and measured surface-dose data is not the same as a real beam test on the actual hardware.
Not a competitor to ReaLM, SAVE, RedNet, or Suncatcher. It cites and depends conceptually on all of them.

Contributions (what is genuinely new)

#	Contribution	Why it isn't already done
1	Economic Pareto curve linking shielding mass × replica count × scrubbing rate → $/M-tokens at iso-quality, parameterized by orbit/surface location.	No published work draws this curve. Industry pieces (Suncatcher economics, Introl/SpaceInvestments reports) discuss launch economics; academic work doesn't connect that to the tolerance-knob trade-off.
2	Adaptive scheduler that reads a live (simulated) particle-flux signal and dials tolerance primitives — replica count, scrub interval, range guards, voting quorum — in real time. SEP-event-mode escalation.	Primitives exist piecemeal in literature (ATTNChecker, ReaLM, FT-Transformer); the controller that turns environment data into a real-time tolerance policy is unbuilt.
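To make contribution #2 concrete, here is a minimal sketch of a flux-driven controller, assuming a simple two-threshold rule with hysteresis for SEP-mode escalation. The class names, flux thresholds (arbitrary units), and knob values are hypothetical placeholders, not the interfaces in policy/adaptive.py or policy/sep_event.py.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    QUIET = "quiet"
    SEP = "sep"


@dataclass(frozen=True)
class TolerancePolicy:
    replicas: int
    scrub_interval_h: float
    range_guards: bool
    quorum: int  # votes needed to accept an output


# Hypothetical knob settings for each mode.
QUIET_POLICY = TolerancePolicy(replicas=1, scrub_interval_h=168.0, range_guards=True, quorum=1)
SEP_POLICY = TolerancePolicy(replicas=3, scrub_interval_h=1.0, range_guards=True, quorum=2)


class AdaptiveScheduler:
    """Escalates to SEP mode at `enter_flux`; de-escalates only below `exit_flux`."""

    def __init__(self, enter_flux: float, exit_flux: float):
        if exit_flux >= enter_flux:
            raise ValueError("exit threshold must sit below entry threshold (hysteresis)")
        self.enter_flux = enter_flux
        self.exit_flux = exit_flux
        self.mode = Mode.QUIET

    def step(self, flux: float) -> TolerancePolicy:
        # Switch modes only when flux leaves the hysteresis band.
        if self.mode is Mode.QUIET and flux >= self.enter_flux:
            self.mode = Mode.SEP
        elif self.mode is Mode.SEP and flux < self.exit_flux:
            self.mode = Mode.QUIET
        return SEP_POLICY if self.mode is Mode.SEP else QUIET_POLICY


sched = AdaptiveScheduler(enter_flux=10.0, exit_flux=2.0)
print([sched.step(f).replicas for f in [1.0, 15.0, 5.0, 1.0]])  # [1, 3, 3, 1]
```

The hysteresis band is a design choice of this sketch: with a single threshold, a flux signal hovering near it would toggle replicas on and off every step.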

What this builds on (cited foundations)

Source	Role
ReaLM — Xie et al., DAC 2025. arXiv:2503.24053, code (MIT)	LLM-inference fault-model methodology. Cited, not forked (scope mismatch: ReaLM assumes ASIC error detection, we assume plain commercial GPUs).
SAVE — Zheng et al., USENIX ATC 2025. USENIX	Closest hardware target: software-only fault tolerance on commodity GPUs. Methodology reference.
RedNet — Wang, Qiu et al., 2024. arXiv:2407.11853	Closest published space-environment → AI-inference bridge (DNN, not LLM).
Google Project Suncatcher — Nov 2025. paper	Empirical TPU + AMD-host beam-test data; sanity check on our HBM cross-section assumptions.
Chang'E-4 LND — Zhang et al., Science Advances 2020. PMC	Primary anchor. First time-resolved dose-rate measurement on the lunar surface (~116 mGy(Si)/yr unshielded). Every downstream λ_SEU number is traceable to this measurement, not to CREME96 extrapolation.
LRO CRaTER — Mazur et al., Space Weather 2011. AGU	Secondary anchor; cross-checks Chang'E-4 within ~15%. Orbital-comparison branch.
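As a sanity check on the anchor number, the annual dose converts to an hourly rate with plain arithmetic (so ~116 mGy(Si)/yr is roughly 13 µGy(Si)/h), and the ~15% CRaTER agreement can be phrased as a relative-difference test. A minimal sketch; the function names are hypothetical, and the second inputs to the cross-check in the tests are placeholders, not CRaTER data.

```python
HOURS_PER_YEAR = 8766.0  # Julian year (365.25 d x 24 h)


def mgy_per_year_to_ugy_per_hour(dose_mgy_yr: float) -> float:
    # 1 mGy = 1000 µGy; spread the annual dose evenly over the year.
    return dose_mgy_yr * 1000.0 / HOURS_PER_YEAR


def agrees_within(anchor: float, other: float, tolerance: float = 0.15) -> bool:
    """True if `other` is within `tolerance` (fractional) of the reference `anchor`."""
    return abs(anchor - other) <= tolerance * abs(anchor)


rate = mgy_per_year_to_ugy_per_hour(116.0)
print(f"{rate:.1f} uGy(Si)/h")  # 13.2 uGy(Si)/h
```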

More detail on what we cite and what we deliberately don't do: docs/prior-art.md.

Quickstart

# Install (editable, with dev extras)
pip install -e ".[dev]"

# Verify install
cosmo-regulus --version

# Compute the first-cut Pareto curve for the lunar polar baseline
cosmo-regulus pareto --site connecting-ridge --quality 0.95

Output (writes experiments/01-pareto-baseline/result.png + points.csv):

3 Pareto-optimal points (of 84 evaluated) at quality >= 0.95:

 shielding_cm  replicas   scrub_h   quality     shield_$      $/M-tok
---------------------------------------------------------------------
          100         1     168.0    0.9627         2000         0.15
           50         2     168.0    0.9755         1000         0.30
           25         3     168.0    0.9723          500         0.46

First-cut numbers, anchored on Zhang 2020 Chang'E-4 LND dose data; cross-section and several rate constants are planning placeholders. See docs/results.md for the full readout, the assumptions ledger, and what shifts in v1.
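The filtering step behind a table like the one above can be sketched as a non-dominated filter. This minimal sketch assumes quality acts as a floor (iso-quality) and that the two minimized objectives are shielding cost and $/M-tokens; the objectives economic/pareto.py actually trades off may differ. The first three rows reuse the printed points; the last two are hypothetical rows added to exercise the filter.

```python
from typing import NamedTuple


class Point(NamedTuple):
    shielding_cm: float
    replicas: int
    scrub_h: float
    quality: float
    shield_usd: float
    usd_per_mtok: float


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on both costs and strictly better on at least one."""
    no_worse = a.shield_usd <= b.shield_usd and a.usd_per_mtok <= b.usd_per_mtok
    better = a.shield_usd < b.shield_usd or a.usd_per_mtok < b.usd_per_mtok
    return no_worse and better


def pareto_front(points: list[Point], min_quality: float) -> list[Point]:
    # Quality is a constraint, not an objective: drop points below the floor first.
    feasible = [p for p in points if p.quality >= min_quality]
    front = [p for p in feasible if not any(dominates(q, p) for q in feasible)]
    return sorted(front, key=lambda p: p.usd_per_mtok)


points = [
    Point(100, 1, 168.0, 0.9627, 2000, 0.15),
    Point(50, 2, 168.0, 0.9755, 1000, 0.30),
    Point(25, 3, 168.0, 0.9723, 500, 0.46),
    Point(50, 3, 168.0, 0.9800, 1000, 0.46),  # hypothetical: dominated by 50 cm / 2 replicas
    Point(0, 1, 168.0, 0.9000, 0, 0.15),      # hypothetical: fails the 0.95 quality floor
]
for p in pareto_front(points, min_quality=0.95):
    print(p.shielding_cm, p.replicas, p.usd_per_mtok)
```

Without the quality floor, the unshielded hypothetical point would dominate everything, which is why quality is applied as a constraint before the dominance check.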

The simulate subcommand (adaptive scheduler) and env validate (LND-data reproduction) are still scaffolded; both currently return "not yet implemented."

Repository layout

cosmo-regulus/
├── README.md                              ← this file
├── LICENSE                                ← Apache-2.0
├── pyproject.toml
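├── data/env/
├── docs/
│   ├── architecture.md                    what's added, how it depends on others
│   ├── prior-art.md                       what we cite and what we deliberately don't do
│   ├── limitations.md                     page-1 honest list
│   └── results.md                         v0 Pareto readout and assumptions ledger
├── experiments/01-pareto-baseline/        pareto output: result.png + points.csv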
├── src/cosmo_regulus/
│   ├── env/                               measured space dose → per-GPU λ_SEU
│   │   ├── change4_lnd.py                 Chang'E-4 LND ground-truth parsing
│   │   ├── crater.py                      LRO CRaTER cross-check
│   │   ├── shielding.py                   regolith / Al-equivalent attenuation
│   │   └── seu_rate.py                    (env, shielding, SKU) → λ_SEU
│   ├── policy/                            ← contribution #2: adaptive scheduler
│   │   ├── adaptive.py                    the controller
│   │   ├── primitives.py                  detection + recovery primitive interfaces
│   │   └── sep_event.py                   burst-mode escalation
│   ├── economic/                          ← contribution #1: economic Pareto
│   │   ├── pareto.py                      the headline curve builder
│   │   ├── tokens_per_dollar.py           cost-per-M-tokens model
│   │   └── shielding_mass.py              kg-of-shielding → launched cost
│   └── cli.py
└── tests/
    └── test_smoke.py
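The env/seu_rate.py step, (env, shielding, SKU) → λ_SEU, can be illustrated with the standard upset-rate product: particle flux × per-bit cross-section × number of exposed bits. With upsets treated as a Poisson process, λ_SEU then sets how likely at least one upset is to accumulate between scrubs. A minimal sketch, assuming dose has already been converted to an effective particle flux; the function names and the example flux, cross-section, and HBM size are placeholders, not values from the repo or from any beam test.

```python
import math


def seu_rate_per_hour(flux_per_cm2_s: float, sigma_cm2_per_bit: float, n_bits: float) -> float:
    """λ_SEU in upsets/hour = flux × per-bit cross-section × bit count × 3600 s/h."""
    return flux_per_cm2_s * sigma_cm2_per_bit * n_bits * 3600.0


def p_any_upset(lam_per_h: float, scrub_interval_h: float) -> float:
    """P(at least one upset between scrubs), treating upsets as a Poisson process."""
    return 1.0 - math.exp(-lam_per_h * scrub_interval_h)


# Placeholder inputs, for illustration only (80 GB of HBM expressed in bits).
lam = seu_rate_per_hour(flux_per_cm2_s=1.0, sigma_cm2_per_bit=1e-15, n_bits=80e9 * 8)
for interval_h in (1.0, 24.0, 168.0):
    print(f"scrub every {interval_h:>5} h: P(>=1 upset) = {p_any_upset(lam, interval_h):.3f}")
```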

Limitations (read this first)

Simulation, not flight test. Software bit injection grounded in published cross-sections is not radiation. A real beam test on the target GPU could shift any of these numbers by a multiplicative factor.
HBM only. Compute-unit transient faults (Tensor Cores, register files, instruction cache) not modeled in v0.
Latch-up not addressed. Hardware-mitigation problem; assumed solved upstream.

Full list: docs/limitations.md.
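The "software bit injection" named in the first limitation means flipping individual bits in stored weights and measuring the effect. A minimal sketch for a single float32 value using only the standard library; the function name is hypothetical, and a real campaign would sample bit positions and target tensors according to the cross-section model rather than by hand.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped (0 = mantissa LSB, 31 = sign)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer, XOR one bit, convert back.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


w = 0.5
print(flip_bit_float32(w, 31))  # sign bit: -0.5
print(flip_bit_float32(w, 30))  # top exponent bit: 2**127, about 1.7e38
print(flip_bit_float32(w, 0))   # lowest mantissa bit: 0.5 + 2**-24
```

The exponent-bit case shows why a single HBM upset can be catastrophic for one weight while a mantissa flip is nearly harmless; that kind of outlier is what a range guard is meant to catch.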

License

Apache-2.0. See LICENSE.

Chosen specifically because AGPL would be a poison pill for SpaceX adoption: a proprietary commercial stack cannot accept AGPL's obligation to publish source for modified versions offered over a network. Apache-2.0's explicit patent grant also matters at scale.

Acknowledgments

Built as part of a larger lunar program package at ../ (see ../README.md). The economic Pareto curve is what enables the "compute revenue anchors the cascade" thesis in that program; the adaptive scheduler is what makes commercial-GPU lunar compute survivable over a 25-year design life. The parent program is the why; this repo is the how.

Status

Pre-alpha. The first end-to-end Pareto pipeline lands at this commit; see docs/results.md for the v0 numbers. Next rungs: the adaptive scheduler (policy/), and grounding the env model in the actual Chang'E-4 LND time series rather than the published aggregate. See docs/architecture.md for the build sequence.
