Integrate HierarchyBuilder into ingestion, add hierarchical query routing, enforce Williams-bound consistency #96

Merged
devlux76 merged 11 commits into main from copilot/audit-design-docs-implementation
Mar 14, 2026

Conversation

Contributor

Copilot AI commented Mar 14, 2026

Original prompt

This section details the original issue you should resolve

<issue_title>Design alignment review: audit docs vs implementation</issue_title>
<issue_description>### Goal
Audit the CORTEX design docs (wiki + DESIGN.md) against the current implementation and produce a concrete backlog of issues to fix any divergences.

Deliverables

  • A complete mapping of design claims → code implementation
  • A prioritized list of gaps (missing features, mismatches, outdated docs)
  • A set of linked subtasks (issues) that implement the fixes

Subtask Issues

Initial checklist

  • Audit ingestion/hierarchy (Books/Volumes/Shelves)
  • Audit retrieval pipeline (dialectical Metroid + subgraph + coherence)
  • Audit consolidation (Daydreamer throttles, LTP/LTD, recalc budgets)
  • Audit storage & sharing (OPFS integrity, MIME/model checks, eligibility)
  • Audit policy usage (Williams bounds, model-derived numerics, performance)
  • Create linked subtask issues for each major gap

Notes

This issue is the parent for the design alignment work; child issues should include Part of devlux76/cortex#89 references to this issue and be tracked as subtasks.
</issue_description>

<agent_instructions>The goal here is to finish the complete implementation of the library side of things. Design docs are now at https://github.com/devlux76/cortex/wiki please make sure to close each subissue as you go along. </agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits March 14, 2026 11:13
Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…d references from neighbor graph code

Closes #92

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
@github-actions github-actions bot added the layer: testing (Test coverage and integration tests) label Mar 14, 2026
Copilot AI and others added 6 commits March 14, 2026 11:17
ingestText() now calls buildHierarchy() after page creation to produce
the full Books→Volumes→Shelves hierarchy on every ingest call.
IngestResult extended with books[], volumes[], shelves[] fields.

Closes #90

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…peline

Query.ts now routes through Shelf→Volume→Book→Page hierarchy using
rankShelves/rankVolumes/rankBooks before flat page scoring. Combines
hierarchy-discovered pages with hotpath pages for comprehensive results.

Closes #91
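
The drill-down described in this commit can be pictured as ranking one tier at a time, collecting the winners' children, and finally merging the result with the hotpath pages. The following is only a simplified sketch under stated assumptions: `Entity`, `Hotpath`, `topHalf`, `drillDown`, and `collectCandidatePages` are hypothetical names, scores are precomputed numbers instead of embedding similarities, and everything is synchronous, whereas the real `Query.ts` is asynchronous and calls `rankShelves`/`rankVolumes`/`rankBooks`.

```typescript
// Hypothetical stand-in for a scored hierarchy entity (shelf, volume, or book).
interface Entity {
  id: string;
  score: number; // stand-in for similarity to the query embedding
  childIds: string[]; // IDs in the next tier down
}

// Hypothetical container for the IDs currently resident on the hotpath.
interface Hotpath {
  shelves: string[];
  volumes: string[];
  books: string[];
  pages: string[];
}

// Keep the top half of the candidates, but never fewer than two.
function topHalf(candidates: Entity[]): Entity[] {
  const k = Math.max(2, Math.ceil(candidates.length / 2));
  return candidates.slice().sort((a, b) => b.score - a.score).slice(0, k);
}

// Rank one tier and return the de-duplicated child IDs of the winners.
function drillDown(candidateIds: string[], tier: Map<string, Entity>): string[] {
  const candidates: Entity[] = [];
  for (const id of Array.from(new Set(candidateIds))) {
    const entity = tier.get(id);
    if (entity) candidates.push(entity);
  }
  const next = new Set<string>();
  for (const entity of topHalf(candidates)) {
    for (const childId of entity.childIds) next.add(childId);
  }
  return Array.from(next);
}

// Shelf -> Volume -> Book -> Page, then union with the hotpath pages.
function collectCandidatePages(
  shelves: Map<string, Entity>,
  volumes: Map<string, Entity>,
  books: Map<string, Entity>,
  hotpath: Hotpath,
): Set<string> {
  const volumeIdsFromShelves = drillDown(hotpath.shelves, shelves);
  const bookIdsFromVolumes = drillDown(hotpath.volumes.concat(volumeIdsFromShelves), volumes);
  const pageIdsFromBooks = drillDown(hotpath.books.concat(bookIdsFromVolumes), books);
  return new Set(hotpath.pages.concat(pageIdsFromBooks));
}
```

The returned set would then be handed to flat page scoring, so a page reachable only through the hierarchy competes with pages that are already resident.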

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
- FullNeighborRecalc now derives maxNeighbors from computeNeighborMaxDegree
  instead of using a fixed default of 16
- Query.ts uses getResidentCount() as proxy for corpus size to avoid
  scanning all pages on the hot path for computeSubgraphBounds

Closes #93

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…ms-derived max degree

HebbianUpdater.decayAndPrune now derives maxDegree from
computeNeighborMaxDegree(graphMass) instead of a fixed 16, matching
the design requirement for Williams-bound-driven edge pruning.

Closes #95
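
The pattern shared by this commit and the `FullNeighborRecalc` change (honor a caller-supplied limit, otherwise derive one from graph size) might look like the sketch below. The formula inside `neighborMaxDegreeStub` is a placeholder assumption, a sublinear `sqrt(t * log2(1 + t))` curve chosen purely for illustration; it is not taken from the CORTEX policy code behind `computeNeighborMaxDegree`. `Edge`, `resolveMaxDegree`, and `pruneToMaxDegree` are likewise hypothetical names.

```typescript
// PLACEHOLDER bound: illustrative only, the real policy formula may differ.
function neighborMaxDegreeStub(graphMass: number): number {
  if (graphMass <= 0) return 1;
  const log2 = Math.log(1 + graphMass) / Math.LN2;
  return Math.max(1, Math.ceil(Math.sqrt(graphMass * log2)));
}

// Hypothetical weighted edge to a semantic neighbor.
interface Edge {
  neighborId: string;
  weight: number;
}

// Prefer an explicit limit; otherwise derive it from the graph size
// instead of falling back to a fixed constant such as 16.
function resolveMaxDegree(graphMass: number, maxDegree?: number): number {
  return maxDegree ?? neighborMaxDegreeStub(graphMass);
}

// Keep only the strongest edges, up to the resolved limit.
function pruneToMaxDegree(edges: Edge[], graphMass: number, maxDegree?: number): Edge[] {
  const limit = resolveMaxDegree(graphMass, maxDegree);
  return edges.slice().sort((a, b) => b.weight - a.weight).slice(0, limit);
}
```

Keeping the optional parameter lets tests pin a small degree explicitly while production callers get a limit that grows with the graph.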

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Check off all implemented and tested items across Hippocampus (HierarchyBuilder,
FastNeighborInsert, Ingest), Cortex (Ranking, MetroidBuilder, KnowledgeGapDetector,
OpenTSPSolver, Query, QueryResult), Daydreamer (ExperienceReplay, ClusterStability),
integration tests, benchmarks, CI hardening, documentation, and product surfaces.

Update section headings with ✅ COMPLETE markers and refresh the summary table
to reflect v0.1/v0.5/v1.0 phases as complete. Only P3-A (WebGL provider) and
P3-H (GitHub sync smoke test) remain pending.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…implementation

Update PLAN.md to reflect full completion of Hippocampus (100%),
Cortex (100%), and all documentation sections. Update TODO.md
checkboxes for all completed P1/P2/P3 items. Fix HierarchyBuilder
test to expect volumes/shelves from ingestText.

Closes #94

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Audit CORTEX design docs against current implementation" to "Integrate HierarchyBuilder into ingestion, add hierarchical query routing, enforce Williams-bound consistency" Mar 14, 2026
Copilot AI requested a review from devlux76 March 14, 2026 11:38
@devlux76 devlux76 marked this pull request as ready for review March 14, 2026 12:33
Copilot AI review requested due to automatic review settings March 14, 2026 12:33
Contributor

Copilot AI left a comment

Pull request overview

Aligns implementation with the design across ingestion, querying, consolidation budgets, and semantic-neighbor naming by integrating hierarchy construction into ingest, adding hierarchical routing to queries, and removing remaining “metroid neighbor” terminology.

Changes:

  • Update ingest to build full Pages→Books→Volumes→Shelves hierarchy and return it in IngestResult.
  • Add Shelf→Volume→Book→Page routing in query() and avoid full-corpus scans for subgraph bounds by using resident count.
  • Replace hardcoded neighbor degree defaults with Williams-derived computeNeighborMaxDegree(...), and clean up semantic-neighbor naming in code/tests + sync plan/todo docs.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/sharing/SubgraphExchange.test.ts | Renames mock neighbor map from metroidNeighbors to semanticNeighbors. |
| tests/integration/IngestQuery.test.ts | Updates integration assertions for multi-book ingest + persisted volumes/shelves. |
| tests/hippocampus/Ingest.test.ts | Adjusts ingest unit test expectations for hierarchy output. |
| tests/hippocampus/HierarchyBuilder.test.ts | Updates hierarchy-builder ingest expectations to include volumes/shelves and multi-book coverage. |
| tests/daydreamer/PrototypeRecomputer.test.ts | Renames mock neighbor map to semanticNeighbors. |
| tests/daydreamer/HebbianUpdater.test.ts | Renames mock neighbor map to semanticNeighbors. |
| tests/daydreamer/FullNeighborRecalc.test.ts | Renames mock neighbor map + sync accessor naming. |
| tests/daydreamer/ClusterStability.test.ts | Updates comments/stubs to semantic-neighbor naming. |
| tests/cortex/Query.test.ts | Adds coverage ensuring query works with hierarchy-based routing present. |
| tests/SalienceEngine.test.ts | Removes obsolete Metroid-neighbor stub methods from mock store. |
| package-lock.json | Locks added dev deps (@vitest/coverage-v8, husky) already reflected in package.json. |
| lib/storage/IndexedDbMetadataStore.ts | Updates v3 store comment to semantic-neighbor wording. |
| lib/sharing/SubgraphExporter.ts | Renames local variables/comments to “semantic neighbors”. |
| lib/hippocampus/Ingest.ts | Changes ingest to insert semantic neighbors + build hierarchy; expands IngestResult. |
| lib/daydreamer/HebbianUpdater.ts | Derives maxDegree from Williams bounds when not provided. |
| lib/daydreamer/FullNeighborRecalc.ts | Derives maxNeighbors from Williams bounds when not provided. |
| lib/daydreamer/ClusterStability.ts | Renames documentation/comments to semantic-neighbor terminology. |
| lib/cortex/Query.ts | Implements hierarchical routing and uses resident count for Williams-derived subgraph bounds. |
| TODO.md | Updates completion status and “Last Updated” date. |
| PLAN.md | Updates plan status text to reflect completed modules and corrected terminology. |

Comment on lines +96 to +112
@@ -149,8 +102,20 @@ export async function ingestText(
    metadataStore,
  });

  // Run hotpath promotion for the newly ingested pages and book.
  await runPromotionSweep([...pageIds, bookId], metadataStore);
  // Build the full hierarchy: Pages → Books → Volumes → Shelves.
  // buildHierarchy handles medoid selection, adjacency edges, prototype
  // computation, Williams fanout enforcement, and promotion sweeps.
  const hierarchy = await buildHierarchy(pageIds, {
    modelProfile,
    vectorStore,
    metadataStore,
  });
Owner

@copilot use your best judgement to resolve the issue in accordance with the DESIGN to the best of your abilities even if that means the DESIGN will need to be updated as well. Try to solve the problem while minimizing downstream and knock-on effects.

Contributor Author

Fixed in 4335f67. Swapped the ordering so buildHierarchy() runs before insertSemanticNeighbors() — this ensures books/volumes (and their reverse indexes) exist when the neighbor inserter traverses page→book→volume to flag dirty volumes via flagVolumeForNeighborRecalc(). Also addressed the other review items in the same commit: renamed the misleading hierarchyPageIds in Query.ts to per-tier variables (volumeIdsFromShelves, bookIdsFromVolumes, pageIdsFromBooks) and strengthened the integration test to enforce exactly-once book membership.
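
To illustrate why the ordering matters, here is a minimal synchronous sketch. It assumes, as the reply above describes, that the neighbor inserter walks page→book→volume through reverse indexes and has nothing to flag when those indexes are still empty. `SketchStore`, `buildHierarchySketch`, `insertNeighborsSketch`, and `ingestSketch` are hypothetical stand-ins, not the actual `buildHierarchy()` or `insertSemanticNeighbors()` implementations, which are asynchronous and do considerably more.

```typescript
// Hypothetical in-memory reverse indexes; the real metadata store is async.
class SketchStore {
  pageToBook = new Map<string, string>();
  bookToVolume = new Map<string, string>();
  dirtyVolumes = new Set<string>();
}

// Stand-in for buildHierarchy(): places every page in one book and one volume.
function buildHierarchySketch(pageIds: string[], store: SketchStore): void {
  for (const pageId of pageIds) store.pageToBook.set(pageId, "book-1");
  store.bookToVolume.set("book-1", "volume-1");
}

// Stand-in for insertSemanticNeighbors(): flags the owning volume of each
// page as dirty. If the reverse indexes are empty, nothing gets flagged.
function insertNeighborsSketch(pageIds: string[], store: SketchStore): void {
  for (const pageId of pageIds) {
    const bookId = store.pageToBook.get(pageId);
    const volumeId = bookId === undefined ? undefined : store.bookToVolume.get(bookId);
    if (volumeId !== undefined) store.dirtyVolumes.add(volumeId);
  }
}

// Run both steps in either order so the difference can be observed.
function ingestSketch(pageIds: string[], hierarchyFirst: boolean): SketchStore {
  const store = new SketchStore();
  if (hierarchyFirst) {
    buildHierarchySketch(pageIds, store);
    insertNeighborsSketch(pageIds, store);
  } else {
    insertNeighborsSketch(pageIds, store);
    buildHierarchySketch(pageIds, store);
  }
  return store;
}
```

With `hierarchyFirst` set to `true` the sketch flags `volume-1`; with `false` the dirty set stays empty, which is the failure mode the reordering is meant to avoid.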

Comment on lines +61 to +116
if (hotpathShelfEntries.length > 0) {
  const topShelves = await rankShelves(
    queryEmbedding,
    hotpathShelfEntries.map((e) => e.entityId),
    Math.max(2, Math.ceil(hotpathShelfEntries.length / 2)),
    rankingOptions,
  );
  for (const s of topShelves) {
    const shelf = await metadataStore.getShelf(s.id);
    if (shelf) {
      for (const vid of shelf.volumeIds) hierarchyPageIds.add(vid);
    }
  }
}

// Rank volumes — include both hotpath volumes and those found via shelf drill-down
const volumeCandidateIds = new Set<Hash>([
  ...hotpathVolumeEntries.map((e) => e.entityId),
  ...hierarchyPageIds,
]);
hierarchyPageIds.clear();

if (volumeCandidateIds.size > 0) {
  const topVolumes = await rankVolumes(
    queryEmbedding,
    [...volumeCandidateIds],
    Math.max(2, Math.ceil(volumeCandidateIds.size / 2)),
    rankingOptions,
  );
  for (const v of topVolumes) {
    const volume = await metadataStore.getVolume(v.id);
    if (volume) {
      for (const bid of volume.bookIds) hierarchyPageIds.add(bid);
    }
  }
}

const hotResults = await rankPages(queryEmbedding, hotpathIds, topK, rankingOptions);
// Rank books — include both hotpath books and those found via volume drill-down
const bookCandidateIds = new Set<Hash>([
  ...hotpathBookEntries.map((e) => e.entityId),
  ...hierarchyPageIds,
]);
hierarchyPageIds.clear();

if (bookCandidateIds.size > 0) {
  const topBooks = await rankBooks(
    queryEmbedding,
    [...bookCandidateIds],
    Math.max(2, Math.ceil(bookCandidateIds.size / 2)),
    rankingOptions,
  );
  for (const b of topBooks) {
    const book = await metadataStore.getBook(b.id);
    if (book) {
      for (const pid of book.pageIds) hierarchyPageIds.add(pid);
    }
Owner

@copilot use your best judgement to resolve the issue in accordance with the DESIGN to the best of your abilities even if that means the DESIGN will need to be updated as well. Try to solve the problem while minimizing downstream and knock-on effects.

Contributor Author

Fixed in 91b10f4. Ranking functions (rankShelves/rankVolumes/rankBooks/rankPages/spillToWarm) now return RankedResult { id, score, childIds } — child IDs are captured during the scoring pass and returned alongside the score. Query.ts uses s.childIds/v.childIds/b.childIds directly from ranking results instead of re-fetching entities from IDB.
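
A rough sketch of the idea follows: the ranking pass already has each entity in hand while scoring it, so it can return the child IDs alongside the score and spare the caller a second lookup. Only the `RankedResult { id, score, childIds }` shape comes from the reply above. `Candidate`, `dot`, `rankWithChildren`, and `childIdsOf` are hypothetical, and the dot-product scoring is an assumption made for the example.

```typescript
// Shape named in the review reply.
interface RankedResult {
  id: string;
  score: number;
  childIds: string[];
}

// Hypothetical candidate: an entity already loaded for scoring.
interface Candidate {
  id: string;
  vector: number[];
  childIds: string[];
}

// Plain dot product as a stand-in similarity measure.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) sum += a[i] * b[i];
  return sum;
}

// Score each candidate once and carry its child IDs through the result.
function rankWithChildren(query: number[], candidates: Candidate[], topK: number): RankedResult[] {
  return candidates
    .map((c) => ({ id: c.id, score: dot(query, c.vector), childIds: c.childIds }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Collect the next tier's IDs straight from the ranking output,
// with no further store reads.
function childIdsOf(results: RankedResult[]): string[] {
  const seen = new Set<string>();
  for (const r of results) {
    for (const childId of r.childIds) seen.add(childId);
  }
  return Array.from(seen);
}
```

In the per-tier loop this replaces a `getShelf`/`getVolume`/`getBook` call per winner with a read of `childIds` on the ranked result.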

Comment on lines +57 to +82
// Collect candidate page IDs from hierarchical routing.
const hierarchyPageIds = new Set<Hash>();

// Shelf → Volume → Book → Page drill-down
if (hotpathShelfEntries.length > 0) {
  const topShelves = await rankShelves(
    queryEmbedding,
    hotpathShelfEntries.map((e) => e.entityId),
    Math.max(2, Math.ceil(hotpathShelfEntries.length / 2)),
    rankingOptions,
  );
  for (const s of topShelves) {
    const shelf = await metadataStore.getShelf(s.id);
    if (shelf) {
      for (const vid of shelf.volumeIds) hierarchyPageIds.add(vid);
    }
  }
}

// Rank volumes — include both hotpath volumes and those found via shelf drill-down
const volumeCandidateIds = new Set<Hash>([
  ...hotpathVolumeEntries.map((e) => e.entityId),
  ...hierarchyPageIds,
]);
hierarchyPageIds.clear();

Owner

@copilot use your best judgement to resolve the issue in accordance with the DESIGN to the best of your abilities even if that means the DESIGN will need to be updated as well. Try to solve the problem while minimizing downstream and knock-on effects.

Contributor Author

Already addressed in 4335f67. hierarchyPageIds was renamed to per-tier variables: volumeIdsFromShelves, bookIdsFromVolumes, pageIdsFromBooks. Each variable now clearly indicates its tier.

Comment on lines +427 to +431
// Every page must belong to exactly one book
const allBookPageIds = result.books.flatMap((b) => b.pageIds);
for (const page of result.pages) {
  expect(storedBook!.pageIds).toContain(page.pageId);
  expect(allBookPageIds).toContain(page.pageId);
}
Owner

@copilot it makes more sense if "exactly" is replaced with "at least". There is a small but non-zero chance that the same page appears in multiple books. For instance, legal corpora are famously long, duplicative, and full of cross-citations.

Contributor Author

Fixed in 91b10f4. Changed the test comment to "Every page must belong to at least one book" and removed the exactly-once membership assertion. Pages can now appear in multiple books without failing the test.
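
The relaxed invariant can be expressed as a small coverage check, sketched below under stated assumptions. `BookLike` and `pagesWithoutBook` are hypothetical names; only the `pageIds` field mirrors the test excerpt under review.

```typescript
// Hypothetical minimal view of a book for the purposes of this check.
interface BookLike {
  bookId: string;
  pageIds: string[];
}

// Return the page IDs that appear in no book at all. Pages listed in
// several books are fine: the invariant is "at least one", not "exactly one".
function pagesWithoutBook(pageIds: string[], books: BookLike[]): string[] {
  const covered = new Set<string>();
  for (const book of books) {
    for (const pageId of book.pageIds) covered.add(pageId);
  }
  return pageIds.filter((pageId) => !covered.has(pageId));
}
```

In a Vitest test this could be asserted as `expect(pagesWithoutBook(ids, result.books)).toEqual([])`, which passes even when a page such as a shared citation is a member of two books.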

@github-actions github-actions bot added the layer: documentation (API docs, developer guide, architecture diagrams) label Mar 14, 2026
…line

Swap the ordering so that Books/Volumes/Shelves (and their reverse
indexes) exist before semantic neighbor insertion attempts to flag
dirty volumes via page→book→volume traversal.

Also: rename misleading `hierarchyPageIds` in Query.ts to per-tier
variables (`volumeIdsFromShelves`, `bookIdsFromVolumes`,
`pageIdsFromBooks`) and strengthen the integration test assertion
to enforce exactly-once book membership.

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…book membership assertion

Ranking functions (rankShelves/rankVolumes/rankBooks/rankPages/spillToWarm)
now return RankedResult { id, score, childIds } so Query.ts can use child
IDs from the ranking pass directly instead of re-fetching entities from IDB.

Test assertion for page-to-book membership relaxed from "exactly one" to
"at least one" per user feedback — pages can legitimately appear in
multiple books (e.g., legal corpora with cross-citations).

Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
@devlux76 devlux76 merged commit 55d6249 into main Mar 14, 2026
3 checks passed
@devlux76 devlux76 deleted the copilot/audit-design-docs-implementation branch March 14, 2026 17:02
Labels

layer: documentation (API docs, developer guide, architecture diagrams)
layer: testing (Test coverage and integration tests)

Development

Successfully merging this pull request may close these issues.

Design alignment review: audit docs vs implementation

3 participants