feat: MoE gate topology + expert clustering + scaffold cross-reference#58

Merged: AdaWorldAPI merged 1 commit into master from claude/gate-topology on Mar 30, 2026
Conversation

@AdaWorldAPI (Owner)

What

Extends causal_diff.rs with MoE gate topology analysis — the other half of the reasoning reverse-engineering pipeline.

New functions

extract_gate_topology(bgz7_path)

Scans a bgz7 file for ffn_gate_inp tensors. Each row is one expert's activation fingerprint, encoded as Base17. For Maverick that means 128 rows per MoE block.
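The extraction shape can be sketched as follows. `GateTensor`, `extract_fingerprints`, and the concrete row representation are illustrative assumptions; only the idea that each router-tensor row is one expert's fingerprint comes from the PR.

```rust
// Hypothetical sketch: one fingerprint row per expert in a router gate tensor.
struct GateTensor {
    block: usize,
    rows: Vec<Vec<u8>>, // one Base17-digit row per expert (assumed layout)
}

// Flatten a gate tensor into (block, expert index, fingerprint) triples.
fn extract_fingerprints(gate: &GateTensor) -> Vec<(usize, usize, Vec<u8>)> {
    gate.rows
        .iter()
        .enumerate()
        .map(|(expert, row)| (gate.block, expert, row.clone()))
        .collect()
}

fn main() {
    // Maverick-style block: 128 experts, so 128 rows.
    let gate = GateTensor { block: 0, rows: vec![vec![1, 2, 3]; 128] };
    let fps = extract_fingerprints(&gate);
    assert_eq!(fps.len(), 128); // one fingerprint per expert
}
```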

cluster_experts(fingerprints, threshold)

Computes pairwise L1 distance between all experts within each block, then uses connected-component grouping to find structurally interchangeable expert groups. Given the 123,000× compression on expert weights, we expect >90% of pairs to be redundant.
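A minimal sketch of the clustering step, using union-find for the connected-component grouping. The fingerprint representation (`Vec<i64>`) and the toy threshold are assumptions; the pairwise-L1-then-components structure is what the PR describes.

```rust
// L1 distance between two expert fingerprints.
fn l1(a: &[i64], b: &[i64]) -> i64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

// Group experts into connected components: any pair within `threshold`
// L1 distance is linked, and components are the interchangeable groups.
fn cluster_experts(fps: &[Vec<i64>], threshold: i64) -> Vec<usize> {
    fn find(parent: &mut Vec<usize>, x: usize) -> usize {
        if parent[x] != x {
            let p = parent[x];
            let r = find(parent, p);
            parent[x] = r; // path compression
        }
        parent[x]
    }
    let n = fps.len();
    let mut parent: Vec<usize> = (0..n).collect();
    for i in 0..n {
        for j in (i + 1)..n {
            if l1(&fps[i], &fps[j]) <= threshold {
                let ri = find(&mut parent, i);
                let rj = find(&mut parent, j);
                if ri != rj {
                    parent[ri] = rj;
                }
            }
        }
    }
    (0..n).map(|i| find(&mut parent, i)).collect()
}

fn main() {
    let fps = vec![vec![0, 0], vec![1, 0], vec![100, 100]];
    let labels = cluster_experts(&fps, 5);
    assert_eq!(labels[0], labels[1]); // near-identical experts merge
    assert_ne!(labels[0], labels[2]); // distant expert stays separate
}
```

The O(n²) pair loop is affordable at 128 experts per block (8,128 pairs), which is why restricting the tensor match to true router gates matters.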

cross_reference_gate_scaffold(clusters, scaffold_blocks)

The key insight connector:

  • Attention scaffold (from Qwen3.5 diff): blocks where Q+O shifted → reasoning circuit
  • Gate redundancy (from Maverick topology): blocks where experts are interchangeable → routing dominates
  • Cross-reference: scaffold blocks WITH high redundancy → reasoning changes work THROUGH the router, not the experts
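The connector above amounts to a set intersection, sketched below. The block indices, the per-block redundancy fraction, and the 0.9 cutoff are illustrative assumptions, not values from the PR.

```rust
use std::collections::HashSet;

// Blocks that are both in the attention scaffold (Q+O shifted) and
// highly redundant in their expert clusters: routing-dominated blocks.
fn routing_dominated(scaffold: &HashSet<usize>, redundancy: &[(usize, f64)]) -> Vec<usize> {
    redundancy
        .iter()
        .filter(|(blk, r)| scaffold.contains(blk) && *r > 0.9)
        .map(|(blk, _)| *blk)
        .collect()
}

fn main() {
    // Scaffold blocks from a (hypothetical) attention diff.
    let scaffold: HashSet<usize> = [3, 7, 12].into_iter().collect();
    // (block index, fraction of expert pairs within the L1 threshold)
    let redundancy = vec![(3, 0.95), (7, 0.40), (12, 0.92), (20, 0.99)];
    let hits = routing_dominated(&scaffold, &redundancy);
    assert_eq!(hits, vec![3, 12]);
}
```

Block 20 is redundant but not in the scaffold, and block 7 is in the scaffold but not redundant; only the intersection counts as evidence that reasoning changes work through the router.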

Tests

  • test_maverick_gate_topology — loads all 18 Maverick bgz7 shards, extracts gates, clusters
  • test_cross_reference_gate_scaffold — full pipeline: Qwen3.5 diff → scaffold blocks → Maverick gates → routing dominance check

The loop that closes

Maverick 123,000× → experts are commodity (gate topology)
Qwen3.5 Q+O shift → reasoning is routing (attention diff)
Cross-reference   → reasoning = routing at both scales
NARS truth        → first observed evidence for the stack

Commit message

extract_gate_topology() — pulls ffn_gate_inp Base17 rows from bgz7,
one row per expert. Each row IS the expert's structural identity.

cluster_experts() — pairwise L1 between experts within each block,
connected-component grouping of structurally interchangeable experts.
At threshold=500, Maverick's 123,000× compression predicts >90% redundancy.

cross_reference_gate_scaffold() — links attention scaffold blocks
(Q+O shifted from Qwen3.5 diff) with gate redundancy per block.
Routing-dominated blocks = reasoning changes work through the router,
not through the expert weights.

Tests:
- test_maverick_gate_topology: load all 18 Maverick bgz7 shards
- test_cross_reference_gate_scaffold: full pipeline connecting
  Qwen3.5 attention diff with Maverick gate structure
@AdaWorldAPI AdaWorldAPI merged commit e900ad0 into master Mar 30, 2026
5 of 14 checks passed
@AdaWorldAPI AdaWorldAPI deleted the claude/gate-topology branch March 30, 2026 08:24

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54ed7eedf3


Comment thread: src/hpc/causal_diff.rs, lines +374 to +375

    if !t.name.contains("gate_inp") && !t.name.contains("gate.weight") {
        continue;
The reason will be displayed to describe this comment to others. Learn more.

P1: Restrict gate tensor matching to router-only names

The narrowing logic here is too broad: matching "gate.weight" also pulls in dense FFN gate tensors (e.g., blk.{i}.ffn_gate.weight is a SiLU MLP gate, not a router gate), so extract_gate_topology will treat thousands of dense FFN rows as experts and feed them into cluster_experts. That corrupts the redundancy conclusions and can make the O(n²) adjacency allocation and computation explode for normal dense blocks, especially when running the Maverick shard pipeline.
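The review's point can be illustrated with the two predicates side by side; the tensor names are illustrative, and the suggested fix (keying only on "gate_inp") is the reviewer's proposal, not merged code.

```rust
// Current filter (inverted here): keeps anything matching either substring,
// which also admits the dense SiLU gate "ffn_gate.weight".
fn too_broad(name: &str) -> bool {
    name.contains("gate_inp") || name.contains("gate.weight")
}

// Reviewer's suggestion: match only router gate tensors.
fn router_only(name: &str) -> bool {
    name.contains("gate_inp")
}

fn main() {
    let dense = "blk.5.ffn_gate.weight"; // SiLU MLP gate, not a router
    let router = "blk.5.ffn_gate_inp.weight"; // MoE router gate
    assert!(too_broad(dense)); // current filter wrongly includes it
    assert!(!router_only(dense)); // restricted filter excludes it
    assert!(router_only(router)); // router tensors still pass
}
```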

