
P0: Restore CORTEX architectural fidelity — fix Metroid/medoid/centroid conflation #55

Merged
devlux76 merged 6 commits into main from copilot/p0-restore-architecture-documentation
Mar 13, 2026

Contributor

Copilot AI commented Mar 13, 2026

Early conceptual drift collapsed three distinct constructs into one. The term "Metroid" was applied to the sparse proximity/neighbor graph (MetroidNeighbor, MetroidSubgraph, metroid_neighbors) when it should exclusively denote the dialectical search probe { m1, m2, c }. The actual MetroidBuilder — the core of CORTEX's epistemic capability — was entirely absent from design, plan, and code.

The three constructs that must never be conflated:

| Term | Correct meaning |
| --- | --- |
| Medoid | Existing cluster-representative node |
| Centroid | Computed vector average; never stored |
| Metroid | Ephemeral dialectical search primitive { m1, m2, c } built at query time |

DESIGN.md (v1.6)

  • Added "Conceptual Constructs: Medoid, Centroid, and Metroid" section with the critical invariant and distinction table
  • Added MetroidBuilder algorithm as a thesis→freeze→antithesis→synthesis loop aligned to the original spec:
    • Thesis: m1 found via medoid search from query vector q — always an existing memory node, never a phantom
    • Freeze: lock the first n protected Matryoshka dimensions as fixed structure
    • Antithesis: m2 is the medoid of the cosine-opposite set — each candidate scored as -cosine_similarity(candidate_free_dims, m1_free_dims); the medoid of the top-scoring candidates is m2. Not merely a semantically-opposing node; not a raw vector negation.
    • Synthesis: centroid c computed once and immediately frozen as a stable evaluation platform; never recomputed during Matryoshka unwinding. All subsequent candidates (m3, m4, ...) evaluated relative to this frozen c.
  • Added conceptual framing: the antithesis medoid (m2) produces the frozen centroid (c), which becomes the stable platform for deeper exploration — opposition turned into a navigable step
  • Added philosophical foundation: centroid (mean) provides gravitational pull; medoid (median) ensures the anchor is always an actual memory node. Neither alone guarantees epistemic honesty. The Metroid loop combines them.
  • Stated explicitly: the Metroid primitive replaces prior sparse nearest-neighbor graph constructions as the canonical mechanism for guided semantic exploration
  • Enriched centroid description with "Kansas space" model: c is a synthetic center of mass where no real node typically exists — a neutral vantage point. Scoring from c gives equal weight to both poles. Candidates closer to m1 are thesis-supporting; closer to m2 are antithesis-supporting; near c are synthesis-zone; far from all three signals a third conceptual region or knowledge gap.
  • Updated Dialectical Search zone table to use "closer to m1/m2 than to c" framing; scoring paragraph explains anchoring bias introduced by scoring from m1 or m2 instead of c
  • Updated Matryoshka Dimensional Unwinding section: new candidates at each unwind step evaluated against the frozen c (not a recomputed centroid); search stops at protected floor or broadcasts curiosity query
  • Added edge-role distinction table in SemanticNeighbor section: SemanticNeighbor.cosineSimilarity drives neighbor discovery and Bayesian belief updates; Hebbian edge weights (edges_hebbian) drive TSP tour traversal — these must never be conflated
  • Updated SemanticNeighborSubgraph with inline note that OpenTSPSolver uses Hebbian weights for tour traversal, not cosine distances
  • Updated Incremental Strategy section: neighbor insertion uses Williams-cutoff distance (not a fixed K); Daydreamer lazily builds additional cross-edges during idle passes
  • Added Knowledge Gap Detection and expanded P2P Curiosity Request — CuriosityProbe now includes mimeType and modelUrn fields
  • Completely rewrote Terminology section; Metroid and MetroidBuilder entries updated with cosine-opposite medoid algorithm and frozen centroid invariant
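The thesis and antithesis steps described above can be sketched roughly as follows. This is a minimal illustration, not the MetroidBuilder from the repository: the `Medoid` shape, the `topK` parameter, and the function names are assumptions made for the example, and the medoid of the opposite set is taken to be the member with the highest total cosine similarity to the other members on the free dimensions.

```typescript
// Hypothetical sketch: type and function names are illustrative, not the repository's API.
type Vec = number[];

interface Medoid {
  id: string;
  vec: Vec;
}

function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Thesis: m1 is the existing medoid most similar to the query (assumes a non-empty list).
function selectThesis(q: Vec, medoids: Medoid[]): Medoid {
  return medoids.reduce((best, m) => (cosine(q, m.vec) > cosine(q, best.vec) ? m : best));
}

// Antithesis: score each candidate by -cosine on the free (unprotected) dimensions,
// keep the topK highest scores, then return the medoid of that set.
// Returns undefined when there is no candidate, which a caller could treat as a knowledge gap.
function selectAntithesis(
  m1: Medoid,
  medoids: Medoid[],
  protectedDim: number,
  topK: number, // assumed tuning parameter; the PR text does not name one
): Medoid | undefined {
  const free = (v: Vec): Vec => v.slice(protectedDim);
  const m1Free = free(m1.vec);
  const opposite = medoids
    .filter((m) => m.id !== m1.id)
    .map((m) => ({ m, score: -cosine(free(m.vec), m1Free) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.m);
  if (opposite.length === 0) return undefined;

  // Medoid of the set: the member with the highest total similarity to all members.
  let best = opposite[0];
  let bestSum = -Infinity;
  for (const cand of opposite) {
    const sum = opposite.reduce((acc, o) => acc + cosine(free(cand.vec), free(o.vec)), 0);
    if (sum > bestSum) {
      bestSum = sum;
      best = cand;
    }
  }
  return best;
}
```

Because m2 is the medoid of the cosine-opposite set rather than the single highest-scoring candidate, the chosen node can differ from the most extreme opposite, which is the behavior the updated test cases in TODO.md are meant to pin down.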

PLAN.md (v1.5)

  • Fixed Hippocampus module statuses: Chunker.ts ✅, PageBuilder.ts ✅, Ingest.ts 🟡 partial (hierarchy + neighbor insertion deferred)
  • Fixed Cortex module statuses: Query.ts 🟡 Needs Rework, QueryResult.ts 🟡 Needs Rework
  • Added MetroidBuilder, DialecticalSearch, KnowledgeGapDetector to CORTEX module table
  • Renamed FastMetroidInsert → FastNeighborInsert, FullMetroidRecalc → FullNeighborRecalc
  • FastNeighborInsert row corrected: Williams-cutoff distance, cosine similarity for discovery + Bayesian updates, Hebbian weights for TSP traversal, Daydreamer lazy reconnection
  • Rewrote Phase 2 to include MetroidBuilder and knowledge gap detection as first-class deliverables

TODO.md

  • P0-X (7 subtasks): rename MetroidNeighbor → SemanticNeighbor, MetroidSubgraph → SemanticNeighborSubgraph, 6 MetadataStore methods, IDB store metroid_neighbors → neighbor_graph, and planned file names
  • P1-M fully rewritten to match the corrected algorithm:
    • m1 via medoid search (existing node, not centroid)
    • m2 via -cosine_similarity scoring + medoid of top-scoring candidates (not raw vector negation)
    • c computed once and frozen; never recomputed during Matryoshka unwinding
    • Iterative unwind evaluates candidates against frozen c; close enough → freeze and leap; otherwise → curiosity broadcast
    • Exit criteria explicitly states the frozen centroid invariant
    • Updated test cases: test c is frozen; test m2 is medoid of cosine-opposite set (not nearest-opposing node)
  • P1-N (4 subtasks): implement cortex/KnowledgeGapDetector.ts and CuriosityProbe construction (including mimeType + modelUrn)
  • P1-C1 updated: Williams-cutoff distance, edge-role invariant (cosineSimilarity vs Hebbian) explicitly stated
  • P1-E renamed "Upgrade" → "Rewrite": existing flat top-K code path fully superseded; non-backward-compatible

ARCHITECTURE-REVIEW.md (new)

15 divergences catalogued. Zero-drift section updated with an explicit caveat: "zero drift" means no logic conflicts, not "no future work needed" — Query.ts and QueryResult.ts must be substantially rewritten (P1-E); Ingest.ts needs hierarchy and neighbor insertion (P1-B, P1-C). PLAN.md is the authoritative completion status source.

README.md

  • Cortex section rewritten as an engineering spec using "must" language: "Required behavior (v0.5+ engineering target)"
  • Kansas space note added as sub-bullet of the Metroid bullet
  • Current behavior relabelled "(v0.1 — placeholder)"

Code changes (core/)

core/ModelProfile.ts

  • Added matryoshkaProtectedDim?: number to ModelProfile and ModelProfileSeed

core/ModelDefaults.ts

  • Updated buildModelProfileFromSeed to validate and pass through matryoshkaProtectedDim

core/ModelProfileResolver.ts

  • Added matryoshkaProtectedDim?: number to ModelProfileRegistryEntry; propagated through register() and resolve()

core/BuiltInModelProfiles.ts

  • Added matryoshkaProtectedDim: 128 to EMBEDDING_GEMMA_300M_PROFILE; registry comment documents nomic-embed-text-v1.5 = 64

scripts/guard-model-derived.mjs

  • Extended MODEL_FIELD_PATTERN to include matryoshkaProtectedDim, enforcing it is never hardcoded outside BuiltInModelProfiles.ts
Original prompt

This section details the original issue you should resolve.

<issue_title>P0- We have a fundamental misunderstanding of the design and it led to drift. Everything is blocked until this is resolved.</issue_title>
<issue_description>CORTEX Architecture Alignment & Documentation Reconstruction

Objective

The current repository has diverged from the intended architecture of CORTEX due to an early conceptual collapse between medoids and Metroids. This drift propagated through DESIGN.md, PLAN.md, and implementation assumptions.

The result is that the system now resembles a vector neighbor graph, while the original architecture is a dialectical epistemic exploration system.

Your task is to restore architectural correctness by rewriting documentation and auditing the codebase against the intended design.

Do not modify code in this pass.

Instead:
1. Rewrite DESIGN.md to accurately describe the architecture.
2. Update PLAN.md to align with the corrected design.
3. Perform a comprehensive code review identifying architectural drift.
4. Rewrite TODO.md so implementation tasks match the true design.

Architectural Overview

CORTEX is a distributed epistemic memory system designed to enable:
• structured recall
• epistemically honest exploration
• knowledge gap detection
• collaborative learning through P2P exchange

The architecture is inspired by biological cognition and is composed of three major subsystems.

HIPPOCAMPUS

Hebbian Influenced Parametric Projection Over Clustered Autoassociative Memory Patterns to Unify Systems

Responsibilities:
• embedding generation
• hierarchical clustering
• medoid discovery
• Hebbian association edges
• append-only vector storage

HIPPOCAMPUS handles fast ingestion of experience.

CORTEX

Responsible for recall, structured exploration, and knowledge discovery.

This subsystem constructs Metroids, performs dialectical search, and detects knowledge boundaries.

DAYDREAMER

Idle-time consolidation process responsible for:
• LTP/LTD edge adjustments
• cluster stabilization
• prototype recomputation
• experience replay
• community detection

DAYDREAMER maintains long-term memory structure.

Critical Conceptual Distinction

The repository currently conflates three separate mathematical constructs:

| Concept | Meaning |
| --- | --- |
| Medoid | Actual memory node representing cluster center |
| Centroid | Mathematical average of vectors |
| Metroid | Structured dialectical search probe |

The current documentation incorrectly treats Metroid as a synonym for medoid-derived neighbors.

This is incorrect and must be corrected.

Correct Definition of a Metroid

A Metroid is a structured search probe used for epistemically balanced exploration of a topic.

It is constructed as:

Metroid = { m1, m2, c }

Where:
• m1 = thesis medoid representing the topic
• m2 = antithesis medoid discovered through constrained search
• c = centroid between them

The Metroid is built through recursive Matryoshka embedding exploration.

Metroid Construction Algorithm
1. Select the topic medoid (m1).
2. Freeze the protected lower embedding dimensions.
3. Within the remaining dimensions, search for the nearest medoid representing semantic opposition.

This medoid becomes:

m2

4.	Compute centroid:

c = centroid(m1, m2)

5.	Prefer the centroid as the search center to prevent semantic drift.
6.	Repeat the process while progressively unwinding Matryoshka embedding layers.
7.	Stop once reaching the protected dimension.

The protected dimension ensures semantic invariants remain stable.

Example:

Query:

How can I help pepperoni stick to my pizza?

Without dimensional protection, vector similarity could drift toward adhesives and glue.

Protected dimensions maintain the culinary context, preventing such semantic failures.

Dialectical Exploration

This algorithm ensures each search explores:
• supporting ideas (thesis)
• opposing ideas (antithesis)
• balanced conceptual center (centroid)

This prevents confirmation bias and enables epistemically balanced reasoning.

Knowledge Gap Detection

If at any stage of Metroid construction no suitable antithesis medoid can be found within constraints:

knowledge_gap = true

The system must conclude it does not possess sufficient knowledge to answer.

This triggers a P2P curiosity request.

The node broadcasts the Metroid probe representing the boundary of its knowledge.

Peers receiving this probe search their own graphs and may respond with relevant fragments.

This mechanism enables distributed learning without hallucination.

Required Documentation Corrections

  1. Rewrite DESIGN.md

DESIGN.md must clearly describe:
• HIPPOCAMPUS ingestion architecture
• medoid discovery
• Metroid construction
• Matryoshka dimensional unwinding
• dialectical search
• knowledge gap detection
• P2P curiosity requests

The document must allow a new engineer to understand and implement the architecture correctly.

  2. Update PLAN.md

After DESIGN.md...



…roid conflation

- Rewrite DESIGN.md v1.2: add MetroidBuilder, dialectical search, knowledge gap
  detection, P2P curiosity, fix all 'Metroid neighbor graph' naming drift
- Update PLAN.md v1.2: add MetroidBuilder/KnowledgeGapDetector/DialecticalSearch
  modules, fix naming errors, correct Phase 2 description
- Rewrite TODO.md: add P0-X (naming fix tasks), P1-M (MetroidBuilder), P1-N
  (knowledge gap), update P1-C/E/F/P2-C/G with correct terminology
- Create ARCHITECTURE-REVIEW.md: 15-divergence catalog with file/component/
  current/intended/correction/todo-task for each issue
- Update README.md and copilot-instructions.md with correct Cortex description

No code changes in this pass.

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [P0] Rewrite documentation for CORTEX architecture alignment" to "P0: Restore CORTEX architectural fidelity — fix Metroid/medoid/centroid conflation" Mar 13, 2026
Copilot AI requested a review from devlux76 March 13, 2026 18:50
@devlux76 devlux76 marked this pull request as ready for review March 13, 2026 19:04
Copilot AI review requested due to automatic review settings March 13, 2026 19:04
Contributor

Copilot AI left a comment

Pull request overview

Documentation-focused PR to restore architectural terminology fidelity by separating Medoid, Centroid, and Metroid (dialectical probe { m1, m2, c }), and to catalog/plan remediation of the existing “Metroid neighbor graph” naming drift across the repo.

Changes:

  • Updated DESIGN.md / PLAN.md / TODO.md to define the correct constructs and rename the proximity graph concept to “semantic neighbor graph” in docs/plans.
  • Added ARCHITECTURE-REVIEW.md to inventory divergences and map them to TODO tasks.
  • Updated README.md and .github/copilot-instructions.md to reflect the corrected (intended) Cortex dialectical pipeline.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| TODO.md | Adds P0-X rename workstream; updates planned tasks to use “semantic neighbor graph” and introduces MetroidBuilder/knowledge-gap work items. |
| README.md | Updates Cortex description to include Metroid construction, dimensional unwinding, knowledge gaps, and curiosity probes. |
| PLAN.md | Updates plan to include MetroidBuilder/dialectical/knowledge-gap modules and renames “Metroid neighbor” terminology to “semantic neighbor graph.” |
| DESIGN.md | Adds the key conceptual distinction section + MetroidBuilder/dialectical/knowledge-gap design narrative; renames proximity graph terminology. |
| ARCHITECTURE-REVIEW.md | New drift report documenting current divergences and a recommended fix order. |
| .github/copilot-instructions.md | Updates high-level Cortex description to include Metroid/dialectical/knowledge-gap/P2P curiosity items. |

Comment thread TODO.md Outdated
- [ ] **P1-M1:** Implement `cortex/MetroidBuilder.ts`
- Accept a query embedding and a list of resident medoids (shelf/volume/book representatives)
- Select m1: the medoid with highest cosine similarity to the query
- Freeze the protected lower Matryoshka dimensions (dimension count derived from ModelProfile; see `embeddingDimension` and `matryoshkaProtectedDim`)
Owner

@copilot protectedDim is the model-specific, most coarse-grained dimension. For nomic embed this is 64, for Gemma embed it is 128; I'm still searching for other models.

This will need to be part of the model-specific config, which probably needs to be specified on a per-model basis.

This also means we need to add a mimetype and model URN field when broadcasting a curiosity query. Otherwise we run the risk of getting completely incommensurable graph sections from dissimilar models at boundaries where they overlap the truncation points.

Please update accordingly.

Contributor Author

Done in commit cda985c.

matryoshkaProtectedDim as a per-model field:

  • Added matryoshkaProtectedDim?: number to ModelProfile, ModelProfileSeed, and ModelProfileRegistryEntry
  • EMBEDDING_GEMMA_300M_PROFILE now has matryoshkaProtectedDim: 128; the registry comment documents nomic-embed-text-v1.5 = 64 for when that provider is wired
  • buildModelProfileFromSeed validates the field (must be a positive integer and must not exceed embeddingDimension)
  • The guard script now enforces that matryoshkaProtectedDim is never hardcoded outside BuiltInModelProfiles.ts
  • MetroidBuilder will read the field from ModelProfile; if undefined (non-Matryoshka model), MetroidBuilder immediately returns knowledgeGap: true
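The validation rule and the undefined-field fallback listed above could look roughly like the sketch below. Only the field names `modelId`, `embeddingDimension`, and `matryoshkaProtectedDim` come from this thread; the `SeedSketch` type, the function names, and the error messages are hypothetical stand-ins, not the code in core/ModelDefaults.ts.

```typescript
// Hypothetical shapes: only the three field names below come from the PR text.
interface SeedSketch {
  modelId: string;
  embeddingDimension: number;
  matryoshkaProtectedDim?: number;
}

// Returns the validated protected dimension, or undefined for a non-Matryoshka model.
function validateProtectedDim(seed: SeedSketch): number | undefined {
  const dim = seed.matryoshkaProtectedDim;
  if (dim === undefined) return undefined;
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new RangeError(`matryoshkaProtectedDim must be a positive integer, got ${dim}`);
  }
  if (dim > seed.embeddingDimension) {
    throw new RangeError(
      `matryoshkaProtectedDim (${dim}) exceeds embeddingDimension (${seed.embeddingDimension})`,
    );
  }
  return dim;
}

// Mirrors the stated behavior: no protected dimension means an immediate knowledge gap.
function startMetroid(seed: SeedSketch): { knowledgeGap: boolean; protectedDim?: number } {
  const protectedDim = validateProtectedDim(seed);
  return protectedDim === undefined
    ? { knowledgeGap: true }
    : { knowledgeGap: false, protectedDim };
}
```

Keeping the check in one place means every consumer of the profile sees either a valid dimension or an explicit absence, never a partially valid number.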

CuriosityProbe with mimeType + modelUrn:

  • CuriosityProbe in DESIGN.md now reads { m1, partialMetroid, queryContext, knowledgeBoundary, mimeType, modelUrn }
  • modelUrn is derived from ModelProfile.modelId at construction time (never hardcoded)
  • DESIGN.md explains why both fields are required: peers must reject probes from incompatible embedding models to prevent incommensurable similarity scores at Matryoshka layer boundaries
  • TODO P1-N2 and P1-N4 tests updated accordingly
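A peer-side compatibility check for the probe fields listed above might be sketched as follows. The six field names are taken from this comment; their types, the `LocalModelInfo` shape, and the strict-equality rule are assumptions for illustration, since the thread only states that peers must reject probes from incompatible embedding models.

```typescript
// Hypothetical sketch: field names follow the comment above, types are assumed.
interface CuriosityProbeSketch {
  m1: unknown;
  partialMetroid: unknown;
  queryContext: unknown;
  knowledgeBoundary: unknown;
  mimeType: string;
  modelUrn: string;
}

interface LocalModelInfo {
  mimeType: string;
  modelUrn: string;
}

// A peer only answers probes whose embedding model and content type match its own,
// so similarity scores stay comparable at Matryoshka layer boundaries.
function acceptsProbe(probe: CuriosityProbeSketch, local: LocalModelInfo): boolean {
  return probe.modelUrn === local.modelUrn && probe.mimeType === local.mimeType;
}
```

Exact string equality is the most conservative choice here; a real implementation might allow a compatibility table between model versions, but nothing in this thread specifies one.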

Centroid construction clarification (also updated): protected dims (index < matryoshkaProtectedDim) are copied from m1 — not averaged — to preserve the domain anchor. Unfrozen dims are element-wise averaged.
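The centroid rule in the clarification above can be illustrated with a short sketch. The function name and the use of `Object.freeze` to model the "computed once and frozen" invariant are assumptions for the example, not the repository's implementation.

```typescript
// Hypothetical sketch of the stated rule: protected dims copied from m1, free dims averaged.
function buildFrozenCentroid(
  m1: readonly number[],
  m2: readonly number[],
  protectedDim: number,
): readonly number[] {
  if (m1.length !== m2.length) {
    throw new Error("m1 and m2 must have the same dimensionality");
  }
  const c = m1.map((x, i) => (i < protectedDim ? x : (x + m2[i]) / 2));
  // Freezing makes accidental recomputation or mutation during unwinding fail loudly in strict mode.
  return Object.freeze(c);
}
```

For example, with `m1 = [1, 2, 3, 4]`, `m2 = [5, 6, 7, 8]`, and `protectedDim = 2`, the first two entries stay at `1` and `2`, and the last two become `(3 + 7) / 2 = 5` and `(4 + 8) / 2 = 6`.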

Comment thread PLAN.md Outdated
Comment on lines +80 to +84
@@ -81,7 +81,7 @@ This document tracks the implementation status of each major module in CORTEX. I
| Page ID Generation | ❌ Missing | `hippocampus/PageIdGenerator.ts` (planned) | Deterministic hash-based ID creation |
| Ingest Orchestrator | ❌ Missing | `hippocampus/Ingest.ts` (planned) | Main entry point: chunk → embed → persist → initialise PageActivity → build hierarchy → fast neighbor insert → hotpath admission |
| Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves; attempt tier-quota hotpath admission for each level's medoid/prototype; Williams-derived fanout bounds; trigger split via ClusterStability when bounds exceeded |
| Fast Neighbor Insert | ❌ Missing | `hippocampus/FastMetroidInsert.ts` (planned) | Incremental Metroid neighbor update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission |
| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Incremental semantic neighbor graph update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission. **Note:** Not to be confused with Metroid construction, which is a CORTEX retrieval concern. |
Owner

@copilot Do your best to resolve that properly using your own best judgment.

Contributor Author

Done in commit cda985c. Hippocampus module statuses updated: Chunker ✅ Complete, PageBuilder ✅ Complete, Ingest 🟡 Partial (hierarchy + neighbor insertion deferred). What Works Today section and blockers updated accordingly.

Comment thread PLAN.md Outdated
| Ingest Orchestrator | ❌ Missing | `hippocampus/Ingest.ts` (planned) | Main entry point: chunk → embed → persist → initialise PageActivity → build hierarchy → fast neighbor insert → hotpath admission |
| Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves; attempt tier-quota hotpath admission for each level's medoid/prototype; Williams-derived fanout bounds; trigger split via ClusterStability when bounds exceeded |
| Fast Neighbor Insert | ❌ Missing | `hippocampus/FastMetroidInsert.ts` (planned) | Incremental Metroid neighbor update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission |
| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Incremental semantic neighbor graph update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission. **Note:** Not to be confused with Metroid construction, which is a CORTEX retrieval concern. |
Owner

@copilot We use Hebbian edge weights in our tour to determine how far to walk. Cosine similarity is used for Bayesian updates of beliefs.

We also use a precomputed fanout in insertion, where we insert the node with its cosine-nearest neighbors, not in k space but up to a distance that is a function of the Williams cutoff.

Daydreaming then walks that to reconnect the other side and build more edges so we don't get overwhelmed with a full graph update on every insert.

Contributor Author

Updated in commit be34986. The three-way distinction is now explicit across DESIGN.md, PLAN.md, and TODO.md:

  • Neighbor selection at ingest: cosine-nearest pages within Williams-cutoff distance (not a fixed K); cutoff derived from HotpathPolicy
  • Degree overflow eviction: lowest cosine-similarity neighbor
  • SemanticNeighbor.cosineSimilarity: drives neighbor discovery and Bayesian belief updates
  • Hebbian edge weights (edges_hebbian): drive TSP tour traversal — distance metric for how far to walk
  • Lazy reconnection: only initial forward/reverse edges at ingest; Daydreamer builds additional cross-edges during idle passes

DESIGN.md §SemanticNeighbor now has an explicit edge-role distinction table, and SemanticNeighborSubgraph carries an inline note that the TSP solver uses Hebbian weights for traversal (not the cosine distances in the subgraph edges). P1-C1 and P1-C3 in TODO.md updated to match.
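The ingest-time neighbor selection summarized above might be sketched as below. This is an illustration under several assumptions: `cutoffDistance` and `maxDegree` are passed in as plain parameters (the thread says they derive from HotpathPolicy and the Williams cutoff, but not how), distance is taken to be `1 - cosineSimilarity`, and the type names are hypothetical. The sketch deliberately creates no Hebbian edges.

```typescript
// Hypothetical sketch: parameter names and the distance definition are assumptions.
interface PageSketch {
  id: string;
  vec: number[];
}

interface SemanticNeighborSketch {
  neighborId: string;
  cosineSimilarity: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Select every page within the cutoff distance (not a fixed K), then cap the degree
// by dropping the lowest cosine-similarity neighbors first.
function selectNeighbors(
  newVec: number[],
  pages: PageSketch[],
  cutoffDistance: number,
  maxDegree: number,
): SemanticNeighborSketch[] {
  return pages
    .map((p) => ({ neighborId: p.id, cosineSimilarity: cosine(newVec, p.vec) }))
    .filter((n) => 1 - n.cosineSimilarity <= cutoffDistance)
    .sort((a, b) => b.cosineSimilarity - a.cosineSimilarity)
    .slice(0, maxDegree);
}
```

The number of neighbors returned varies with local density: a page in a crowded region gets up to `maxDegree` edges, while an isolated page may get none and is left for Daydreamer's later idle passes.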

Comment thread PLAN.md Outdated
Comment on lines +101 to +104
| Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph` — to be renamed `getInducedNeighborSubgraph`) | BFS expansion implemented in storage layer; needs dynamic Williams bounds; needs orchestration wrapper |
| Open TSP Solver | ❌ Missing | `cortex/OpenTSPSolver.ts` (planned) | Dummy-node open-path heuristic for coherent ordering |
| Query Orchestrator | ❌ Missing | `cortex/Query.ts` (planned) | Main entry point: embed → resident-first ranking → subgraph expansion with dynamic bounds → TSP path → query cost meter → early-stop; return result |
| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata (coherence path, subgraph size, hop count, edge weights) |
| Query Orchestrator | ❌ Missing | `cortex/Query.ts` (planned) | Main entry point: embed → select m1 → build Metroid → dialectical scoring → subgraph expansion → TSP path → query cost meter → early-stop; return result |
| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata (coherence path, subgraph size, hop count, edge weights, knowledge gap flag) |
Owner

@copilot Good catch! It exists but it's incomplete and now needs to be gone through and redone to match the new math.

Please use your best judgment and fix it the best you can.

Contributor Author

Updated in commit be34986. PLAN.md now marks both as "Needs Rework" with a note that the existing flat top-K implementation is a placeholder that doesn't use MetroidBuilder, Hebbian edge traversal, or cosine-similarity-bounded subgraph expansion. TODO.md P1-E changes: "Upgrade" → "Rewrite", explicit note that the flat-scoring code path is fully superseded, and P1-E1 now specifies that subgraph traversal uses Hebbian edge weights (not cosine distances) for tour ordering.

Comment thread DESIGN.md Outdated
Comment on lines +115 to +120
1. **Select m1** — Identify the topic medoid most relevant to the query embedding.
2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis.
3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1.
4. **Compute centroid** — `c = (m1_vec + m2_vec) / 2` (element-wise average over the unfrozen dimensions).
5. **Prefer centroid as search origin** — Use `c` as the primary starting point for subgraph expansion. This prevents semantic drift toward either pole.
6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search.
Owner

@copilot It's a good idea to understand completely what the centroid is doing here.

It's a center of mass between m1 and m2 so when we calculate the antithetical vector, we weight by closeness to the centroid not to m1 or m2.

But we need to recognize it as a Kansas space. There's likely nothing at all actually at the centroid. It's an empty field to stand in the middle of and look up.

But from the position of the centroid we can tell right away if it's closer to New York or LA, and also if it's closer to the South Pacific than either of NY or LA.

Contributor Author

Updated in commit 622570b. DESIGN.md now captures the "Kansas space" concept throughout:

  • The c definition describes it as a synthetic center of mass where no real node typically exists — a neutral vantage point for measuring distances to both poles without anchoring bias
  • Step 4 "Important" block explains why c is valuable precisely because it's empty: from it you can tell immediately whether a candidate is closer to NY (m1), closer to LA (m2), or somewhere in the South Pacific (third conceptual region)
  • Step 5 reorganised as a zone-classification list: near c = synthesis, much closer to m1 = thesis, much closer to m2 = antithesis, far from all three = signal for another Matryoshka unwind or knowledge gap
  • Dialectical Search: scoring paragraph explains that scoring from m1 or m2 instead of c would anchor all results toward one pole
  • Terminology Centroid entry updated with the same framing
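The zone classification described in the list above can be sketched as a small function. The zone names follow the thread; the use of cosine distance, the tie-breaking order, and the `farThreshold` parameter are assumptions made for this example and are not taken from DESIGN.md.

```typescript
// Hypothetical sketch: the distance measure and farThreshold are illustrative assumptions.
type Zone = "thesis" | "antithesis" | "synthesis" | "outside";

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const sim = na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
  return 1 - sim;
}

function classifyCandidate(
  candidate: number[],
  m1: number[],
  m2: number[],
  c: number[],
  farThreshold: number,
): Zone {
  const d1 = cosineDistance(candidate, m1);
  const d2 = cosineDistance(candidate, m2);
  const dc = cosineDistance(candidate, c);
  // Far from all three anchors: a third conceptual region or a knowledge gap.
  if (Math.min(d1, d2, dc) > farThreshold) return "outside";
  // At least as close to the centroid as to either pole: synthesis zone.
  if (dc <= d1 && dc <= d2) return "synthesis";
  // Otherwise the nearer pole decides.
  return d1 < d2 ? "thesis" : "antithesis";
}
```

In the geography analogy, `"outside"` is the South Pacific case: the candidate is not meaningfully near New York, Los Angeles, or Kansas, which is the signal to unwind another Matryoshka layer or report a gap.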

Comment thread ARCHITECTURE-REVIEW.md
- `hippocampus/Chunker.ts` — Text chunking; correct
- `hippocampus/PageBuilder.ts` — Page entity construction; correct
- All `VectorBackend` implementations — correct

Owner

@copilot Those were roughed in and marked complete earlier I think just so we didn't spin our wheels.

I'm unsure if they are impacted by this change at all and if so to what degree. The same is true for everything marked as "complete". Which is the actual end point of this exercise.

Contributor Author

Updated in commit 622570b. The zero-drift caveat is now a 4-point bulleted list:

  1. What "zero drift" means: no logic conflicts — not a redesign candidate
  2. What it does not mean: unaffected by future work
  3. Per-file impact: Query.ts and QueryResult.ts must be substantially rewritten (P1-E); Ingest.ts needs hierarchy + neighbor insertion (P1-B, P1-C)
  4. Authoritative status: PLAN.md, not this section

Query.ts and QueryResult.ts are also now explicitly flagged in the file list as "must be substantially rewritten" rather than "correct direction".

Comment thread README.md Outdated
Comment on lines +64 to +75
@@ -65,12 +65,16 @@ This is the rapid, multi-path "write" system that turns raw experience into stru
When you ask a question, Cortex does **not** return a bag of similar vectors.

Instead it:
- Constructs a **Metroid** `{ m1, m2, c }` for the query — a structured dialectical search probe pairing the thesis medoid (m1) with an antithesis medoid (m2) and a balanced centroid (c)
- Performs Matryoshka dimensional unwinding to discover semantically opposing knowledge
- Performs parallel WebGPU "scoops" across the entire active universe (sub-millisecond)
- Pulls relevant sub-graphs from IndexedDB
- Traces closed-loop paths through Hebbian connections
- Returns only self-consistent, coherent context chains
- Detects **knowledge gaps** when no antithesis medoid exists within dimensional constraints
- Broadcasts P2P curiosity probes to discover missing knowledge from peers
Owner

@copilot The simplest answer is to add the word "must". It's a design document and engineering spec, not a marketing document.

Contributor Author

Updated in commit 622570b. The Cortex section now reads as an engineering spec:

  • Header: "Required behavior (v0.5+ engineering target)"
  • Every bullet uses "must": "Must construct", "Must perform", "Must detect", "Must broadcast"
  • Kansas space note added as a sub-bullet under the Metroid bullet
  • Current behavior relabelled "(v0.1 — placeholder)"

…Probe fields; fix doc accuracy

- core/ModelProfile.ts: add matryoshkaProtectedDim?: number to ModelProfileSeed and ModelProfile
- core/ModelDefaults.ts: validate + pass through matryoshkaProtectedDim in buildModelProfileFromSeed
- core/ModelProfileResolver.ts: carry matryoshkaProtectedDim through register()/resolve()
- core/BuiltInModelProfiles.ts: add matryoshkaProtectedDim: 128 to EMBEDDING_GEMMA_300M_PROFILE
- scripts/guard-model-derived.mjs: add matryoshkaProtectedDim to MODEL_FIELD_PATTERN
- DESIGN.md: clarify centroid c construction (protected dims copied from m1); expand
  CuriosityProbe with mimeType + modelUrn fields; update model-derived numerics table
- TODO.md: update P1-M1 (matryoshkaProtectedDim from ModelProfile), P1-N2/N4 (mimeType+modelUrn)
- PLAN.md: fix module statuses (Chunker/PageBuilder/Ingest/Query/QueryResult exist);
  update What Works/Doesn't Work; fix blockers section
- README.md: reframe Cortex description as planned vs current behavior
- ARCHITECTURE-REVIEW.md: note implemented components in zero-drift section

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
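
The "validate + pass through" step for `matryoshkaProtectedDim` could look roughly like the sketch below. The interface and function names ending in `Sketch`, the `embeddingDimension` field, and the example values are assumptions made for illustration; the real `ModelProfileSeed` and `buildModelProfileFromSeed` are not shown in this thread and may differ.

```typescript
// Illustrative shapes only: the real ModelProfileSeed and ModelProfile carry more fields.
interface ModelProfileSeedSketch {
  modelId: string;
  embeddingDimension: number;      // assumed field name
  matryoshkaProtectedDim?: number; // optional, as in the commit message
}

type ModelProfileSketch = Readonly<ModelProfileSeedSketch>;

// Validate the optional field, then pass it through unchanged.
function buildProfileSketch(seed: ModelProfileSeedSketch): ModelProfileSketch {
  const dim = seed.matryoshkaProtectedDim;
  if (dim !== undefined) {
    const valid =
      Number.isInteger(dim) && dim > 0 && dim <= seed.embeddingDimension;
    if (!valid) {
      throw new RangeError(
        `matryoshkaProtectedDim must be an integer in 1..${seed.embeddingDimension}, got ${dim}`,
      );
    }
  }
  return Object.freeze({ ...seed });
}

// Example values are placeholders, not the contents of BuiltInModelProfiles.ts.
const exampleSeed: ModelProfileSeedSketch = {
  modelId: "example-embedding-model",
  embeddingDimension: 768,
  matryoshkaProtectedDim: 128,
};
const exampleProfile = buildProfileSketch(exampleSeed);
```

Keeping the field optional means profiles for models without Matryoshka support need no change, while a bad value fails at profile construction instead of at query time.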
… Query.ts rework scope

PLAN.md:
- FastNeighborInsert row: use Williams-cutoff distance (not K), lazy Daydreamer
  reconnection, cosine=discovery+Bayesian vs Hebbian=TSP traversal; DESIGN.md cross-ref
- Query.ts/QueryResult.ts: status "Needs Rework"; existing flat top-K code is superseded

TODO.md:
- P1-C1: neighbors found within Williams-cutoff distance; Daydreamer builds additional
  edges lazily; edge-role invariant (SemanticNeighbor.cosineSimilarity vs edges_hebbian)
- P1-C3: add test that FastNeighborInsert does NOT create Hebbian edges
- P1-E: "Rewrite" not "Upgrade"; note flat-scoring code path is fully superseded;
  add Hebbian edge traversal in P1-E1; recommended order updated

DESIGN.md:
- SemanticNeighbor: add edge-role distinction table; remove misleading "TSP-ready" comment
- SemanticNeighborSubgraph: inline note that TSP uses Hebbian weights for tour traversal
- Incremental Strategy: Williams-cutoff distance; Daydreamer lazy reconnection

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
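
The edge-role invariant from P1-C1 and P1-C3 can be illustrated with a small in-memory sketch: insertion records semantic neighbors within a cutoff distance (not a fixed K) and never writes Hebbian edges. Everything here is hypothetical, including `GraphSketch`, `fastNeighborInsertSketch`, and the use of `1 - cosine similarity` as the distance; how the Williams cutoff is derived is not described in this thread, so it is passed in as a plain number.

```typescript
interface SemanticNeighborSketch {
  neighborId: string;
  cosineSimilarity: number; // discovery signal only, never a traversal weight
}

class GraphSketch {
  readonly embeddings = new Map<string, readonly number[]>();
  readonly semanticNeighbors = new Map<string, SemanticNeighborSketch[]>();
  // Hebbian weights keyed "from->to"; insertion must leave this map untouched.
  readonly hebbianEdges = new Map<string, number>();
}

function cosine(a: readonly number[], b: readonly number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Link the new node to every existing node within cutoffDistance.
function fastNeighborInsertSketch(
  g: GraphSketch,
  id: string,
  embedding: readonly number[],
  cutoffDistance: number, // assumed to be supplied by the caller
): SemanticNeighborSketch[] {
  const found: SemanticNeighborSketch[] = [];
  for (const [otherId, other] of g.embeddings) {
    const sim = cosine(embedding, other);
    if (1 - sim <= cutoffDistance) {
      found.push({ neighborId: otherId, cosineSimilarity: sim });
      const back = g.semanticNeighbors.get(otherId) ?? [];
      back.push({ neighborId: id, cosineSimilarity: sim });
      g.semanticNeighbors.set(otherId, back);
    }
  }
  g.embeddings.set(id, embedding);
  g.semanticNeighbors.set(id, found);
  return found;
}
```

A test in the spirit of P1-C3 would insert a few nodes and then assert that `hebbianEdges` is still empty.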
… caveat, README must-language

DESIGN.md:
- c definition: synthetic center of mass, typically empty "Kansas space", neutral vantage
  point for unbiased scoring; properly formatted continuation lines
- Step 4: Kansas space explanation in short focused sentences
- Step 5: rewritten as bulleted zone-classification list (thesis/antithesis/synthesis/
  third-region); cross-ref to Dialectical Search; scoring-from-c rationale kept here
- Dialectical Search: removed duplicate "key property" paragraph; merged anchoring-bias
  explanation into single scoring paragraph; zone table uses closer-to-c framing
- Terminology Centroid: multiline, concise Kansas-space explanation

ARCHITECTURE-REVIEW.md:
- Query.ts / QueryResult.ts explicitly flagged "must be substantially rewritten"
- Zero-drift caveat restructured as 4 bulleted points for clarity: what it means,
  what it does not mean, per-file impact, authoritative status pointer (PLAN.md)
- Recommended Fix Order step 4: "Rewrite" not "Upgrade"

README.md:
- Cortex section: "Required behavior (v0.5+ engineering target)" with "must" language
  on every bullet; Kansas space note as sub-bullet of Metroid bullet
- Current behavior relabeled "(v0.1 — placeholder)"

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
@devlux76
Owner

@copilot This is close, but I can see some drift, and this is actually not your fault; it comes from my inability to articulate clearly what I see.

The following is the original actual draft of this part of the spec. I'd prefer not to be sued by Nintendo, so I'm not sure it belongs anywhere verbatim. However, everything needs to align more tightly with it, as though it were the official wording of the spec.

Metroid: A Dialectical Parallel-Search Primitive

A Metroid is a search primitive that transforms potential obstacles into progress, inspired by the mechanics of the Metroid video game. Like the floating jellyfish in the game, obstacles are not enemies; they don't usually swarm and attack. But they do represent an antithesis to progress.

Yet they can also be frozen into usable platforms, turning conceptual opposition into navigable steps toward a goal.

Core Mechanics (one full Metroid step):
1. Thesis — From the current topic vector q_1, perform a medoid search to find m_1, the median representative that keeps the search on the right conceptual road.
2. Freeze — Lock the first n vectors of the current matryoshka hierarchy in place, preserving early decisions as fixed structure (like freezing Metroids into platforms).
3. Antithesis — On the remaining free vectors, compute the opposite of cosine similarity and then find the medoid m_2 of that set. This ensures m_2 represents the closest coherent alternative in the direction of maximal divergence.
4. Synthesis — Compute the centroid C_1 of {m_1, m_2}. This centroid serves as a stable platform for subsequent exploration; it is never recalculated. Future vectors (m_3, m_4, ...) are evaluated relative to this frozen centroid to decide whether:
• They are close enough to C_1 to freeze and take the next conceptual leap, or
• Knowledge is insufficient, in which case a curiosity query is broadcast to gather additional graph fragments.

Iterative Process:
• Repeat thesis–freeze–antithesis–evaluation loops until the “protected dimension” (the coarse-grained vector) is reached.
• At any step where a valid thesis or antithesis cannot be found, broadcast the current Metroid tuple as a curiosity query, soliciting additional context to fill knowledge gaps.

Philosophical Foundation:
• Centroids (means) provide gravitational pull; medoids (medians) keep the search on the right road. Alone, neither guarantees epistemic honesty. Combined with the Metroid loop, they create a search process that actively leverages obstacles as stepping stones.
• The Metroid primitive replaces prior sparse nearest-neighbor graph constructions, establishing opposition, divergence, and curiosity-driven augmentation as the canonical mechanism for guided semantic exploration.

This keeps the platform role of C_1 fixed, clarifies that m_2 is the medoid of the cosine-opposite set (not itself the raw cosine-opposite vector), and preserves the dialectical, game-inspired metaphor.
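
One full Metroid step, as described in the draft above, might be sketched as follows. This is a simplified reading, not the project's MetroidBuilder: the thesis step is reduced to picking the existing node nearest to `q`, the size of the cosine-opposite set (`oppositeSetSize`) is a hypothetical parameter, and the centroid copies the protected dimensions from m1 as noted in the DESIGN.md commit message.

```typescript
type Vec = readonly number[];

interface MemoryNode { id: string; embedding: Vec; }
interface MetroidSketch { m1: MemoryNode; m2: MemoryNode; c: Vec; }

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Medoid: the existing member with the highest total similarity to its set.
function medoid(nodes: MemoryNode[], view: (n: MemoryNode) => Vec): MemoryNode {
  if (nodes.length === 0) throw new Error("medoid of an empty set");
  let best = nodes[0];
  let bestScore = -Infinity;
  for (const cand of nodes) {
    const score = nodes.reduce((s, o) => s + cosine(view(cand), view(o)), 0);
    if (score > bestScore) { best = cand; bestScore = score; }
  }
  return best;
}

function buildMetroidSketch(
  q: Vec,
  nodes: MemoryNode[],
  protectedDim: number,
  oppositeSetSize: number, // hypothetical knob, not named in the spec
): MetroidSketch | null {
  if (nodes.length < 2) return null;
  // Thesis (simplified): the existing node nearest to q, never a phantom.
  const m1 = nodes.reduce((best, n) =>
    cosine(n.embedding, q) > cosine(best.embedding, q) ? n : best);
  // Freeze: everything below protectedDim is fixed; only the rest is free.
  const free = (n: MemoryNode): Vec => n.embedding.slice(protectedDim);
  // Antithesis: score by negated cosine on the free dims, keep the top set,
  // then take the medoid of that set (not a raw vector negation).
  const oppositeSet = nodes
    .filter((n) => n.id !== m1.id)
    .map((n) => ({ n, score: -cosine(free(n), free(m1)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, oppositeSetSize)
    .map((s) => s.n);
  if (oppositeSet.length === 0) return null; // knowledge gap
  const m2 = medoid(oppositeSet, free);
  // Synthesis: protected dims copied from m1, free dims averaged; frozen once.
  const c = Object.freeze(
    m1.embedding.map((x, i) => (i < protectedDim ? x : (x + m2.embedding[i]) / 2)),
  );
  return { m1, m2, c };
}
```

Returning `null` stands in for the case where no valid thesis or antithesis exists; in the spec that is the point at which a curiosity query would be broadcast.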

… frozen centroid, loop structure

DESIGN.md:
- The Metroid: added conceptual framing — antithesis medoid (m2) produces the frozen
  centroid (c) which becomes the stable platform for deeper exploration; added
  philosophical foundation (centroid=gravitational pull, medoid=data point anchor;
  neither alone sufficient); Metroid replaces prior sparse NN-graph constructions
- m2 definition: explicit parallel structure with m1; m2 is always an existing memory
  node (medoid of cosine-opposite set), never a phantom computed position
- MetroidBuilder Algorithm: complete rewrite as thesis→freeze→antithesis→synthesis loop
  - Step 1 (Thesis): medoid search for m1 (not centroid, always existing node)
  - Step 2 (Freeze): lock protected Matryoshka dimensions
  - Step 3 (Antithesis): score each candidate as -cosine_similarity in free dims;
    find medoid of top-scoring (cosine-opposite) set — m2 is the medoid, not a
    raw vector negation
  - Step 4 (Synthesis): compute c once and freeze it; never recomputed
  - Step 5 (Evaluate): all subsequent candidates measured against frozen c
  - Steps 6-7: unwind and stop as before, but with frozen c invariant
- Matryoshka Dimensional Unwinding: new candidates evaluated against frozen c,
  not a recomputed centroid; stop on knowledge gap → broadcast curiosity
- Terminology: Metroid and MetroidBuilder entries updated with frozen c and
  cosine-opposite medoid algorithm

TODO.md P1-M:
- Added game-inspired framing (opposition becomes stepping stone via frozen c)
- Step-by-step algorithm: exact formula -cosine_similarity; medoid of top-scoring
  candidates; frozen c never recomputed
- Exit criteria now explicitly mentions frozen centroid invariant
- Updated test cases: test c is frozen; m2 is medoid not vector negation

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 13, 2026

@copilot This is close, but I can see some drift, and this is actually not your fault; it comes from my inability to articulate clearly what I see.

The following is the original actual draft of this p...

Updated in commit 62a719e. DESIGN.md and TODO.md now align tightly with the original spec:

m2 algorithm corrected — m2 is the medoid of the cosine-opposite set, not merely a semantically-opposing medoid. The exact formula: score each candidate as -cosine_similarity(candidate_free_dims, m1_free_dims), then find the medoid of the top-scoring candidates. m2 is always an existing memory node, never a computed phantom.

Centroid frozen — c is computed once after the first m1+m2 pair and never recomputed. All subsequent Matryoshka unwind candidates are evaluated relative to this frozen c. The frozen c is the stable platform.

Loop structure — thesis → freeze → antithesis → synthesis; at each further unwind step the new candidate is evaluated against the frozen c (not a shifting target). If no valid m2 is found at any layer, broadcast the curiosity query.

Philosophical foundation — centroid (mean) provides gravitational pull; medoid (median) ensures the anchor is an actual data point, never a phantom averaged position. Neither alone is sufficient for epistemic honesty.

Conceptual framing — the antithesis medoid (m2) produces the frozen centroid (c) which becomes the stable platform for deeper exploration; opposition turned into a navigable step.
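
The frozen-centroid rule can be sketched as an evaluation function that only ever reads `c`. The similarity threshold (`minSimilarity`), the use of cosine similarity as the "close enough" measure, and the shape of `CuriosityProbeSketch` are assumptions for illustration; the thread only states that the probe carries `mimeType` and `modelUrn` and that the Metroid tuple is broadcast.

```typescript
type Vec = readonly number[];

interface FrozenMetroid { m1Id: string; m2Id: string; c: Vec; }

// Hypothetical probe shape; only mimeType and modelUrn are named in the thread.
interface CuriosityProbeSketch {
  metroid: FrozenMetroid;
  mimeType: string;
  modelUrn: string;
}

type Evaluation =
  | { kind: "advance"; similarityToC: number }
  | { kind: "knowledge-gap"; probe: CuriosityProbeSketch };

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Score a later candidate (m3, m4, ...) against the frozen c.
// c is read, never recomputed; a miss becomes a curiosity probe.
function evaluateAgainstFrozenC(
  metroid: FrozenMetroid,
  candidate: Vec,
  minSimilarity: number, // assumed threshold for "close enough"
  mimeType: string,
  modelUrn: string,
): Evaluation {
  const similarityToC = cosine(candidate, metroid.c);
  if (similarityToC >= minSimilarity) {
    return { kind: "advance", similarityToC };
  }
  return { kind: "knowledge-gap", probe: { metroid, mimeType, modelUrn } };
}
```

Because `c` is created with `Object.freeze` at synthesis time, any accidental attempt to update it during unwinding either throws (in strict mode) or is silently ignored, which makes the invariant easy to assert in tests.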

Owner

@devlux76 devlux76 left a comment

Excellent work! Thank you!

@devlux76 devlux76 merged commit 15204f9 into main Mar 13, 2026
2 checks passed
@devlux76 devlux76 deleted the copilot/p0-restore-architecture-documentation branch March 13, 2026 20:36
Development

Successfully merging this pull request may close these issues.

P0- We have a fundamental misunderstanding of the design and it lead to drift. Everything is blocked until this is resolved.

3 participants