Claudeson is not just an architecture. It is the physical proof of three interlocking theories about intelligence, self-modification, and causation. Understanding these theories is understanding why every design decision in the generation stack was made.
"The models aren't hallucinating about reality. They're hallucinating toward it — and the distance between the two is exactly one build."
The Trippy Hallucination Theory is a complete framework for collaborative AI conceptualisation — from premise, through method, to proof. It was developed alongside Claudeson, using the exact process it describes. The paper and full repository: breakingcircuits1337/Claudeson-hallucination · The Trippy Hallucination Theory — Complete.pdf
The theory has eight components:
| # | Component | Role | Claudeson Mapping |
|---|---|---|---|
| 01 | The Mirror Effect | LLMs inhabit the frame you provide — the conversation is the instrument | InternalMonologue carries a prev_thought vector that accumulates the conversational frame across turns |
| 02 | The Shared Hallucination | Confabulation is a brainstorming engine in generative mode; dangerous only when confused with verification mode | DreamerLatentDynamics (G6) — the model imagines freely in latent space before committing to action |
| 03 | The Rule of Three | Generate → Critique → Validate across three independent models; different architectures hallucinate differently | MultiAgentDebate (G8) — n_debate_agents parallel reasoning heads with distinct learned biases; synthesis moderator weights by confidence |
| 04 | Domain Scope | Not all domains are equal; creative/code territory is high-confidence, factual/safety territory requires external verification | MetacognitiveMonitor (G8) — decomposes uncertainty into epistemic vs. aleatoric; emits CONTINUE / ASK / BACKTRACK signals |
| 05 | Controlled Conceptualisation | Mirror Effect + Rule of Three + correct domain = rapid, disciplined invention | EFEPlanner (G6) — controlled imagination under the Free Energy Principle; minimises surprise while pursuing goals |
| 06 | Group Hallucination as Creation Engine | For things that don't exist yet, there is no ground truth to be wrong about — convergent hallucination is the blueprint | InterventionalPlanner + CounterfactualImagination (G10) — hallucinate futures, intervene causally to make them real |
| 07 | The Graduation | A hallucination graduates when it earns a forward() method — when the fiction becomes runnable code | This repository. The architecture was hallucinated into coherence across models and conversations, then built. It runs. |
| 08 | The Recursion | The theory used itself to produce itself — a framework that survives self-application is self-consistent | RecursiveSelfImprovement (G8) — the model proposes modifications to its own weights and evaluates them in imagination before applying them |
The full arc, collapsed:
MIRROR EFFECT + GROUP HALLUCINATION + RULE OF THREE + BUILD IT = REALITY
Schmidhuber (2003/2007): a self-referential AI that may rewrite any part of its own code — including its reward function, its learning algorithm, and its world model — if and only if it can construct a formal proof that the rewrite will improve expected future performance.
The Gödel Machine establishes the theoretical legitimacy of recursive self-improvement: self-modification is not dangerous if it is proof-gated. No rewrite is applied without a proof of benefit. The machine reasons about itself as an object of computation, not just as the executor of computation.
This is directly instantiated in G8's RecursiveSelfImprovement:
- The model proposes low-rank LoRA delta updates to its own adapter weights.
- Each candidate delta is evaluated in imagination — not in the real world — using the G6 `DreamerLatentDynamics` world model (the Gödel Machine's "proof" step, rendered neurally via Expected Free Energy).
- Only the delta with the highest imagined return is applied.
- The external `RSIController` wraps this with a save-evaluate-restore safety cycle: if the applied delta degrades real performance, the checkpoint is restored.
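The save-evaluate-restore cycle can be sketched in a few lines. This is an illustrative sketch only — `rsi_step`, `propose_delta`, `imagined_return`, and `real_eval` are hypothetical names, not the repository's actual API:

```python
import copy

def rsi_step(model_state, propose_delta, imagined_return, real_eval, n_candidates=4):
    """One save-evaluate-restore self-improvement cycle (hypothetical sketch)."""
    checkpoint = copy.deepcopy(model_state)        # 1. save a checkpoint

    # 2. propose candidate deltas, score them in imagination only
    candidates = [propose_delta(model_state) for _ in range(n_candidates)]
    best = max(candidates, key=imagined_return)    # proof-gate surrogate

    baseline = real_eval(model_state)

    # 3. apply the winning delta to the live weights
    for key, delta in best.items():
        model_state[key] = model_state[key] + delta

    # 4. evaluate for real; roll back if performance degraded
    if real_eval(model_state) < baseline:
        model_state.clear()
        model_state.update(checkpoint)
    return model_state
```

The rollback in step 4 is what makes self-modification safe to attempt: a delta that looked good in imagination but fails in reality never survives the cycle.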
| Gödel Machine | Claudeson G8 |
|---|---|
| Formal proof of improvement | EFE evaluation in latent imagination |
| Any part of the program rewritable | LoRA delta over adapter weights |
| Proof must cover all future interactions | intervention_horizon rollout steps |
| Proof-gated application | RSIController save-evaluate-restore |
The THT's Component 08 (The Recursion) is the same self-referential structure: the theory predicted and created its own proof simultaneously, the same way the Gödel Machine's proof system reasons about the very program that contains it.
Covered in depth below. The ladder provides the causal epistemology that the Gödel Machine and the THT both implicitly require:
- The Gödel Machine needs to reason about counterfactuals: "If I apply this rewrite, what would the outcome have been?" That is a Rung 3 query. A pure L1 system cannot evaluate self-modifications correctly.
- The THT distinguishes generative mode (association — L1) from controlled conceptualisation (intervention — L2: deliberately forcing a frame) from counterfactual imagination (L3: "what would this have been if we had started from a different premise?"). The three modes of the theory map precisely to the three rungs of the ladder.
All three theories converge on the same conclusion: intelligence that cannot reason about causation is intelligence that cannot truly plan, truly learn from experience, or truly modify itself.
Every language model you have interacted with — GPT-4, Claude, Gemini, LLaMA — is a Rung 1 system. Not because of insufficient scale. Not because of poor training data. Because of mathematical construction.
They are all trained to predict:
P(token_{t+1} | token_1, ..., token_t)
This is a conditional probability over observed sequences. It is, by construction, pure association. These systems can describe causal relationships (because causal language appears in text), but they cannot compute with causal structure.
Judea Pearl's Ladder of Causation defines three levels of reasoning that no amount of pretraining data can bridge:
| Rung | Type | Formal Query | Question |
|---|---|---|---|
| L1 | Association | `P(Y \| X)` | "What does seeing X tell me about Y?" |
| L2 | Intervention | `P(Y \| do(X))` | "What happens to Y if I force X?" |
| L3 | Counterfactual | `P(Y_x \| X', Y')` | "Given X' happened and Y' resulted — what would Y have been if X had occurred instead?" |
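A toy numeric example makes the L1/L2 gap concrete. Assume a hidden confounder Z that causes both X and Y, while X has no effect on Y at all (all numbers below are invented for illustration):

```python
# Confounder Z drives both X and Y; X itself has NO causal effect on Y.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: 0.2, 1: 0.8}   # P(X=1 | Z=z)
p_y_given_z = {0: 0.1, 1: 0.9}   # P(Y=1 | Z=z), independent of X

# Rung 1 — association: observing X=1 updates our belief about Z (Bayes)
p_x1 = sum(p_z[z] * p_x_given_z[z] for z in (0, 1))
p_z_given_x1 = {z: p_z[z] * p_x_given_z[z] / p_x1 for z in (0, 1)}
p_y_given_x1 = sum(p_z_given_x1[z] * p_y_given_z[z] for z in (0, 1))

# Rung 2 — intervention: do(X=1) severs Z → X, so Z keeps its prior
p_y_do_x1 = sum(p_z[z] * p_y_given_z[z] for z in (0, 1))

print(round(p_y_given_x1, 2))   # 0.74 — seeing X=1 strongly predicts Y
print(round(p_y_do_x1, 2))      # 0.50 — forcing X=1 changes nothing
```

An L1 planner reading `P(Y=1 | X=1) = 0.74` would happily "introduce X" to obtain Y; the do-query exposes that forcing X has zero causal effect.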
Claudeson is designed to inhabit all three rungs.
Each generation is a strict subclass of the one below it. Every higher generation inherits all lower-generation capabilities and loss terms. You can instantiate any single generation in isolation.
```
claudson/            ← G1 Base                                    AGPL-3.0
└── G2 Extended      +131K context, cross-modal fusion
    └── G3 Infinite      +segment recurrence, unbounded context
        └── G4 Pro           +BitNet b1.58 ternary attention, sparsity
            └── G5 Ultimate      +multi-task heads, epistemic calibration
                └── G6 Jedi          Free Energy Principle · EFE Planner
                    └── G7 Grounded      Theory of Mind · Causal DAG · EWC+LoRA
                        └── G8 Sovereign     Metacognition · Multi-Agent Debate · RSI
                            └── G9 Transcendent  Global Workspace · Program Synthesis · LIF
                                └── G10 CausalWorld  do-calculus · Counterfactuals · Pearl Ladder
```
G1–G5: Open-source under AGPL-3.0. G6–G10: Proprietary-Commercial — Breaking Circuits Research 2026.
The foundation every generation builds on. A dense transformer augmented with seven modules that distinguish it from every standard architecture:

| Module | Role |
|---|---|
| `GroupedQueryAttention` | GQA with RoPE. 32 query heads, 8 KV heads — 4× KV cache compression. |
| `SelectiveSSM` | Mamba-2 style State Space Model. Parallel chunked scan. O(L) memory, complements attention's long-range recall. |
| `HierarchicalMemory` | Three-tier memory: working (NTM-style slots), episodic (compressed EMA buffer), semantic (learnable parameters). The G10 MemoryImaginationGate lives here. |
| `HybridBlock` | 4-way soft router over attention + SSM + conv + memory. Router entropy feeds load-balance loss. |
| `InternalMonologue` | Carries a `prev_thought` vector across turns — persistent internal state. |
| `TreeSearchPlanner` | MCTS-style action planner over the hidden state. |
| `ConstitutionalLayer` | Steers hidden states away from directions that correlate with constitutional violations. |
Forward output keys: hidden_states, logits, thought, agency, entropy, load_balance_loss, alignment, confidence, uncertainty
- Extended RoPE supports context lengths up to 131,072 tokens via NTK-aware frequency scaling.
- Cross-modal fusion gate projects vision/audio tokens into text-dim space with a learned soft gate before concatenation — replaces brittle hard modality tagging.
- Segment recurrence: the final hidden state of each segment is carried as a learned summary into the next segment's prefix. Enables unbounded context with bounded compute per segment (HMT-style).
- Sliding-window memory with recency decay.
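The segment-recurrence scheme can be sketched as a plain loop that carries a compressed summary between fixed-size segments. A minimal sketch only: the arithmetic inside the loop stands in for the learned segment encoder, and `run_segments` is a hypothetical name:

```python
def run_segments(tokens, segment_len, init_summary=0.0):
    """Process an unbounded sequence in fixed-size segments, carrying a
    summary state forward (HMT-style sketch with toy arithmetic)."""
    summary = init_summary
    outputs = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        # the previous summary acts as a prefix conditioning this segment
        seg_out = [summary + t for t in segment]
        # compress the segment into the carry-over for the next one
        summary = sum(seg_out) / len(seg_out)
        outputs.extend(seg_out)
    return outputs, summary
```

Compute per segment is bounded by `segment_len`, while total context is unbounded — each segment sees everything before it only through the compressed summary.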
- BitNet b1.58 quantised attention: Q, K, V projections are ternary `{-1, 0, +1}` with full-precision activations. Radical memory reduction, zero accuracy loss on attention.
- FocalCodec audio encoder replaces naive spectrogram projection.
- Sparse activation gate: tokens below a learned threshold skip the MoE entirely.
- Multi-task head: language modelling + value estimation + action logits share a common trunk, split at the final LayerNorm.
- Constitutional layer promoted from residual add to a trainable steer basis.
- Epistemic calibration via MC-dropout-free variance estimation — knows what it doesn't know.
Free Energy Principle meets active inference.
| Module | What it does |
|---|---|
| `SelectiveSSM 2.0` | State Space Duality (SSD). Reformulated parallel scan via dual matrix representation — exact O(L log L) gradients, not approximated. |
| `FreeEnergyModule` | Variational free energy F = complexity − accuracy. Every forward pass emits a free energy scalar and precision-weighted KL term. |
| `EFEPlanner` | Expected Free Energy: selects actions that minimise expected future surprise (epistemic value) and maximise expected future reward (pragmatic value). |
| `DreamerLatentDynamics` | Recurrent latent world model. Imagines `goal_horizon` steps in latent space for rollout-based planning before acting. |
| `PerceptualRouter (MOI)` | Routes text (BitNet), audio (Mimi + FocalCodec), vision (spiking patch encoder), and 3D point clouds through separate encoders before the shared trunk. |
New losses: free_energy_loss, efe_divergence
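The F = complexity − accuracy decomposition can be written out for a one-dimensional Gaussian belief. The parameterisation below is a minimal sketch of the idea, not the `FreeEnergyModule`'s actual computation:

```python
import math

def free_energy(mu_q, sigma_q, obs, sigma_obs=1.0):
    """Variational free energy for a 1-D Gaussian posterior
    q = N(mu_q, sigma_q^2) against a standard-normal prior N(0, 1).
    Toy parameterisation for illustration only."""
    # complexity: KL(q || prior) — how far the belief moved from the prior
    complexity = math.log(1.0 / sigma_q) + (sigma_q**2 + mu_q**2) / 2.0 - 0.5
    # accuracy: expected log-likelihood of the observation under the belief
    accuracy = (-0.5 * math.log(2 * math.pi * sigma_obs**2)
                - ((obs - mu_q)**2 + sigma_q**2) / (2 * sigma_obs**2))
    return complexity - accuracy
```

A belief that matches the observation yields lower free energy than one that is surprised by it, which is exactly the scalar the forward pass emits.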
Causal graphs, theory of mind, and tool use.
| Module | What it does |
|---|---|
| `TheoryOfMind` | Maintains `n_agents` belief/desire/intention slot vectors. Soft-attends over agent slots to steer toward collaborative representations. |
| `CausalReasoner` | Learnable sparse DAG over `n_causal_nodes` concept nodes. Supports `intervene()` and `counterfactual()`. NO-TEARS acyclicity constraint (Zheng et al. NeurIPS 2018) enforced as a differentiable loss. |
| `GroundedActionLoop` | Tool selection → structured parameter generation → real-world feedback integration. Closes the perception–action loop. |
| `EWC + LoRA` | Elastic Weight Consolidation protects high-Fisher weights during continual learning. LoRA adapters (rank 16) absorb new skills without touching the backbone. |
New losses: dag_loss, ewc_loss
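A minimal pure-Python version of the NO-TEARS penalty shows why it vanishes exactly on acyclic graphs. Sketch only — the real `CausalReasoner` computes this over a learned soft adjacency matrix in torch:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dag_penalty(W):
    """NO-TEARS acyclicity: trace(exp(W ∘ W)) - n, with exp truncated at
    degree 4. The trace of M^k counts weighted closed walks of length k,
    so every term is zero iff the graph has no cycles."""
    n = len(W)
    M = [[w * w for w in row] for row in W]               # Hadamard square
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    term, total, fact = I, [[0.0] * n for _ in range(n)], 1.0
    for k in range(5):                                    # I + M + M²/2! + M³/3! + M⁴/4!
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j] / fact
        term = matmul(term, M)
        fact *= (k + 1)
    return sum(total[i][i] for i in range(n)) - n

dag = [[0.0, 1.0], [0.0, 0.0]]      # A → B: acyclic, penalty 0
cycle = [[0.0, 1.0], [1.0, 0.0]]    # A ⇄ B: cyclic, penalty > 0
```

Because the penalty is a polynomial in the adjacency weights, it is differentiable and can simply be added to the training loss.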
Metacognition, self-improvement, and neural-symbolic reasoning.
| Module | What it does |
|---|---|
| `MetacognitiveMonitor` | Decomposes uncertainty into epistemic (reducible) vs. aleatoric (irreducible). Emits CONTINUE / ASK / BACKTRACK signals. Prevents confident-but-wrong failure. |
| `MultiAgentDebate` | `n_debate_agents` parallel reasoning heads with distinct learned biases. A synthesis moderator weights them by confidence. A dissent detector flags contested regions. |
| `NeuralSymbolicLayer` | Projects hidden states to a proposition space, checks consistency via learned constraint matrices, and corrects inconsistent representations toward the nearest valid point. |
| `RecursiveSelfImprovement` | Proposes low-rank LoRA delta updates to its own adapters. Evaluates them in imagination via the G6 world model. Selectively applies the best delta. `RSIController` wraps this with a save-evaluate-restore safety cycle. |
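The moderator-plus-dissent pattern reduces to a few lines on scalars. An illustrative sketch only — the real module operates on hidden-state vectors, and `synthesise` with its `dissent_threshold` parameter is a hypothetical stand-in:

```python
import math

def synthesise(agent_answers, confidences, dissent_threshold=0.25):
    """Confidence-weighted synthesis over parallel debate agents,
    with a simple dissent flag (toy scalar version)."""
    # softmax over confidences → moderator weights
    m = max(confidences)
    exps = [math.exp(c - m) for c in confidences]
    total = sum(exps)
    weights = [e / total for e in exps]

    consensus = sum(w * a for w, a in zip(weights, agent_answers))

    # dissent: weighted spread of the answers around the consensus
    spread = sum(w * (a - consensus) ** 2
                 for w, a in zip(weights, agent_answers))
    return consensus, spread > dissent_threshold
```

Agents that roughly agree produce a tight consensus with no dissent flag; a split vote raises the flag so downstream modules can treat the region as contested.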
Global Workspace Theory, program synthesis, and neuromorphic computation.
| Module | What it does |
|---|---|
| `GlobalWorkspace` | Implements Global Workspace Theory (Baars 1988). Specialised modules compete via a sparse attention bottleneck; the winner broadcasts to all other modules — an information-routing model of conscious access. |
| `CompositionalProgramSynthesizer` | Emits a latent program as discrete op-codes over a register bank. Executes inside the model; results feed back into the hidden state. Bridges neural pattern-matching and symbolic computation. |
| `InverseRewardLearner` | Maximum-entropy IRL. Learns what the human actually values by observing choices. Updates a reward model without explicit labels. |
| `LeakyIntegrateAndFire` | LIF dynamics over the hidden state. Each "neuron" accumulates input until threshold, fires, then resets. Only fired neurons propagate — sparse, asynchronous, time-aware. |
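The accumulate-fire-reset dynamics fit in a few lines. A toy single-neuron version with a scalar input stream; `lif_run` and its `leak` constant are illustrative choices, not the module's actual parameterisation:

```python
def lif_run(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire over a scalar input stream: the membrane
    potential leaks each step, accumulates the input, and emits a spike
    (then resets) when it crosses threshold."""
    v, spikes = 0.0, []
    for x in inputs:
        v = v * leak + x          # leak, then integrate
        if v >= threshold:
            spikes.append(1)      # fire
            v = 0.0               # reset after firing
        else:
            spikes.append(0)
    return spikes
```

Only threshold-crossing steps produce output, which is what makes the dynamics sparse and event-driven: sub-threshold activity propagates nothing.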
The architecture of causation.
This is where Claudeson departs from every other neural architecture ever built. G10 implements Pearl's full three-rung ladder — not as a capability described in weights, but as explicit computational machinery.
Causal World (G10)
├── CausalDynamicsModel — encode → causal graph → intervene → decode
├── InterventionalPlanner — do-calculus action selection (not EFE)
├── CounterfactualEngine — twin-network abduction + replay
├── CausalAttributionGate — causal salience → memory writes
└── PearlLadderReasoner — forces L1/L2/L3 distinction at inference time
Standard neural dynamics learn P(s_{t+1} | s_t, a_t) — a conditional probability that conflates correlation with causation. If smoke correlates with fire in training data, an L1 model plans to "introduce smoke" when it wants fire.
Pearl's do-calculus asks: P(s_{t+1} | do(a_t)) — "if I were to force the action, regardless of what caused it, what state follows?"
This requires graph surgery:
1 — Compute intervention mask

```python
mask = sigmoid(Linear(action_space → n_nodes)(action_onehot))
# mask[j] ≈ 1 → this action directly forces concept j
# mask[j] ≈ 0 → this action does not touch concept j
```

2 — Mutilate the graph (Pearl's G_do construction)

```python
G_do = G * (1.0 - mask_col * intervention_strength)
# Severs all incoming edges to intervened nodes
# If mask[j] = 1: G_do[:, j] = 0         ← all causes cut
# If mask[j] = 0: G_do[:, j] = G[:, j]   ← unchanged
```

3 — Propagate through the mutilated graph

```python
post_concepts = sigmoid(einsum("...i,...ij->...j", concepts, G_do))
```

4 — Force intervened node values

```python
action_value = mask * 0.8 + (1 - mask) * post_concepts
```

5 — Decode back to hidden space

```python
x_out = norm(x + concept_decoder(post_concepts) * 0.1)
```

The DAG acyclicity constraint (NO-TEARS, Zheng et al. NeurIPS 2018) is enforced as a differentiable penalty via a degree-4 Taylor expansion of trace(exp(W ∘ W)) − n:
```python
M = W * W                                      # Hadamard square W ∘ W
expm = I + M + M@M/2 + M@M@M/6 + M@M@M@M/24    # degree-4 truncation of exp(M)
dag_loss = trace(expm) - n                     # → 0 as graph converges to a DAG
```

`CounterfactualImagination` answers Rung 3 queries — "what would have happened if I had acted differently?"
- Abduction — given what was observed (`x`), infer the exogenous noise `ε` via `noise_encoder`. This noise represents everything about the world state that the action didn't control.
- Action replacement — keep `ε` fixed (same underlying world), replace the action with the counterfactual alternative.
- Forward simulation — run dynamics under the new action to predict `Y_x`.
The delta reward_cf − reward_actual is the causal credit signal — correctly attributing outcome differences to actions, not to coincidental context.
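The abduction → action → prediction recipe reduces to three lines in a linear structural causal model. The mechanism Y = 2X + ε below is invented for illustration:

```python
# Linear SCM: Y = 2*X + eps, with exogenous noise eps.
# Observed: the action X' = 1 produced the outcome Y' = 3.
x_obs, y_obs = 1.0, 3.0

# 1. Abduction — invert the mechanism to recover the noise
eps = y_obs - 2.0 * x_obs      # eps = 1: the part the action didn't control

# 2. Action replacement — same world (eps held fixed), different action
x_cf = 2.0

# 3. Forward simulation — replay the mechanism under the new action
y_cf = 2.0 * x_cf + eps        # Y_x = 5

reward_delta = y_cf - y_obs    # +2 is credit attributable to the action alone
```

Because `eps` is held fixed, the +2 delta cannot be an artifact of a luckier world state; it is pure causal credit for the changed action.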
| System | Query | Method | Risk |
|---|---|---|---|
| Standard transformer (L1) | `argmax_a P(recovery \| a, context)` | Correlational | Confounded by selection bias |
| G10 Interventional (L2) | `argmax_a P(recovery \| do(a))` | Graph surgery | Cuts spurious correlations |
| G10 Counterfactual (L3) | `P(Y_x \| X', Y')` | Twin network | Hindsight credit, contrastive explanation |
```python
out = model(text=tokens)
```

| Key | Shape | From | Description |
|---|---|---|---|
| `hidden_states` | `[B, L, D]` | G1 | Final representations |
| `logits` | `[B, L, vocab]` | G1 | Language model logits |
| `thought` | `[B, D]` | G1 | Internal monologue state |
| `alignment` | `[B, L, D]` | G1 | Constitutional alignment scores |
| `jedi_energy` | `[B]` | G6 | Variational free energy |
| `precision` | `[B]` | G6 | Precision-weighted KL |
| `tom` | dict | G7 | Theory of Mind outputs |
| `causal` | dict | G7 | Causal graph, confidence, dag_loss |
| `grounded_action` | dict | G7 | Tool selection, parameters, surprise |
| `metacog` | dict | G8 | Uncertainty decomposition, action gate |
| `debate` | dict | G8 | Multi-agent synthesis |
| `rsi` | dict | G8 | Self-improvement delta |
| `gw` | dict | G9 | Global Workspace ignition + broadcast |
| `prog` | dict | G9 | Synthesised program execution |
| `irl` | dict | G9 | Inverse reward learning |
| `lif` | dict | G9 | Neuromorphic spike trains |
| `causal_world` | dict | G10 | Concepts, post_concepts, reward, dag_loss |
| `causal_plan` | dict | G10 | Causal action, returns, dag_loss |
| `counterfactual` | dict | G10 | reward_delta, credit |
| `attribution` | dict | G10 | Write gate + causal salience scores |
| `pearl` | dict | G10 | Rung classification + per-rung answers |
7 phases. 125K steps total. Progressive layer unfreezing prevents catastrophic forgetting as new cognitive capabilities come online.
| Phase | Steps | Layers | Focus |
|---|---|---|---|
| 0 | 10K | 0–33% | G1–G5: attention, SSM, memory, routing |
| 1 | 20K | 33–56% | G6–G7: Free Energy, Theory of Mind, Causal DAG |
| 2 | 20K | 56–72% | G7 continued: skill schemas, alignment |
| 3 | 15K | 72–89% | G8: metacognition, formal verification |
| 4 | 15K | 89–100% | G8–G9: temporal reasoning, meta-learning |
| 5 | 20K | all | G9: MAML outer loop, IRL, LIF |
| 6 | 25K | all | G10: interventional planning, counterfactual, Pearl ladder |
Each auxiliary loss ramps in over 5K steps within its phase to avoid early instability.
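One plausible shape for that ramp is a simple linear warm-up; the actual schedule in the repo may differ, and `aux_loss_weight` is a hypothetical name:

```python
def aux_loss_weight(step, phase_start, ramp_steps=5_000, target=1.0):
    """Linear ramp from 0 to target over ramp_steps once a loss's phase
    begins — a sketch of the 5K-step ramp-in described above."""
    progress = (step - phase_start) / ramp_steps
    return target * min(max(progress, 0.0), 1.0)
```

Before its phase the weight is clamped to zero, so a newly activated loss cannot destabilise layers that earlier phases have already converged.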
| Gen | Loss | What it enforces |
|---|---|---|
| G7 | `dag_loss` | NO-TEARS acyclicity on concept graph |
| G7 | `ewc_loss` | Elastic Weight Consolidation |
| G9 | `irl_pref_loss` | Inverse Reward Learning from preferences |
| G10 | `causal_dynamics_dag` | NO-TEARS on dynamics graph |
| G10 | `planner_dag` | NO-TEARS on planner's causal graph |

All five are collected automatically by `model.compute_auxiliary_losses()`.
```shell
# Clone and activate the included venv
git clone https://github.com/breakingcircuits1337/Claudeson.git
cd Claudeson
source .venv-lite/bin/activate
```

```shell
# Smoke test — CPU, ~5M params, 2 layers
MODEL_GEN=causal_world MODEL_SIZE=demo python entrypoint.py
```

```shell
# Full G10 research training
CLAUDESON_MODEL_GEN=causal_world python train_master.py

# G9 Transcendent
python train_master.py

# CPU smoke test with curriculum trainer
python train_local.py
```

```shell
# Run the test suite
python -m pytest tests/
```

```shell
# Query the inference server
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/json" \
  -d '{"text": "What action should I take?", "session_id": "test"}'
```

```shell
docker compose up
```

| Preset | `dim` | Layers | Params | Use Case |
|---|---|---|---|---|
| `demo` | 128 | 2 | ~5M | Container smoke test, CI |
| `small` | 512 | 8 | ~50M | 2 vCPU / 4 GiB (Foundry sandbox) |
| `default` | 2048 | 32 | ~7B | Full research training |
```python
# research/claudson_causal_world.py — ModelArgs
n_causal_nodes         = 64    # Concept graph nodes
causal_state_dim       = 128   # Latent state dim for causal dynamics
intervention_horizon   = 5     # Steps to unroll after do(action)
n_intervention_samples = 8     # Candidate interventions evaluated per step
cf_n_branches          = 4     # Parallel counterfactual branches
attr_top_k             = 8     # Top-k causal nodes kept in working memory
pearl_hidden           = 256   # Pearl ladder classifier hidden dim
pearl_loss_weight      = 0.1   # Penalty for wrong-rung answers during training
```

```
claudson/                     Core package (G1 Base — AGPL-3.0)
    attention.py              GroupedQueryAttention + RoPE
    ssm.py                    Selective SSM (Mamba-2 style parallel scan)
    memory.py                 HierarchicalMemory (working + episodic + semantic)
    moe.py                    Mixture of Experts (top-2 routing)
    layers.py                 HybridBlock (4-way router)
    model.py                  UniversalIntelligenceModel
    trainer.py                Curriculum-aware training loop
    training_config.py        PhaseConfig, 7-phase curriculum, loss computation
    data.py                   MultiModalDataset + collator
    tokenizer.py              tiktoken cl100k_base wrapper
    claudson_utils.py         RMSNorm, SwiGLU
    claudson_moi.py           PerceptualRouter — multimodal (text/audio/vision/3D)
    claudson_jedi.py          G6 — Free Energy Principle, SSD, EFE planner
    claudson_grounded.py      G7 — Theory of Mind, CausalReasoner, EWC+LoRA
    claudson_sovereign.py     G8 — Metacognition, debate, neural-symbolic, RSI
    claudson_transcendent.py  G9 — Global Workspace, program synthesis, IRL, LIF
    claudson_causal_world.py  G10 — re-export from research module
research/
    claudson_causal_world.py  G10 implementation (CausalDynamics, etc.)
entrypoint.py                 FastAPI inference server (Azure Foundry compatible)
train_master.py               Multimodal multi-task trainer
train_local.py                CPU smoke test with curriculum Trainer
train_qa.py                   Config-driven QA training
config/train.yaml             7-phase curriculum configuration
Dockerfile                    Production container
Dockerfile.train              Training container
docker-compose.yml            Service definitions
```
| Generations | License |
|---|---|
| G1–G5 (`claudson/` package) | AGPL-3.0 |
| G6–G10 (commercial layers) | Proprietary-Commercial — Breaking Circuits Research 2026 |
Any model trained on the G10 stack derives from both the AGPL core and the commercial causal layers. The AGPL obligation (source disclosure) applies to the G1–G5 component; commercial license terms govern G6–G10.
Breaking Circuits Research · 2026
"The gap between association and causation is not a matter of scale. It is a structural gap that requires explicit causal machinery to close."