…ODO; delete ERRATA.md Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
Pull request overview
This PR integrates the former ERRATA.md “Williams Bound” architecture into the project’s living docs (README/DESIGN/PLAN/TODO) and removes ERRATA.md, making sublinear hotpath growth a first-class design/planning concept for CORTEX.
Changes:
- Added a Williams-bound-based sublinear growth framework (H(t), salience, tier/community quotas, resident-first retrieval, bounded maintenance) to DESIGN.md.
- Rescoped execution sequencing in PLAN.md and TODO.md to put HotpathPolicy/SalienceEngine foundations first, and updated downstream tasks to integrate them.
- Updated README.md design principles to reflect the new HOT/WARM/COLD model and Williams-bound constraints; deleted ERRATA.md.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| TODO.md | Introduces P0-F/P0-G foundations and threads hotpath/salience/community quota requirements through later phases and tests. |
| README.md | Updates core principles to include Williams-bound sublinear growth + three-zone memory framing. |
| PLAN.md | Updates module status and phases to prioritize HotpathPolicy/SalienceEngine and hotpath-aware storage/query work. |
| DESIGN.md | Adds a comprehensive “Williams Bound & Sublinear Growth” section plus hotpath entities/stores and resident-first query path. |
| ERRATA.md | Removes the now-integrated errata document. |
> ### Policy Source of Truth
>
> All hotpath constants (`c`, `α`, `β`, `γ`, `q_s`, `q_v`, `q_b`, `q_p`) live in `core/HotpathPolicy.ts` as a frozen default policy object. These are **policy-derived constants** (not model-derived) and are kept strictly separate from `core/ModelDefaults.ts`. A companion guard (or an extension to `guard:model-derived`) prevents these constants from being hardcoded elsewhere.
@copilot Please fix as recommended.
Fixed in 4de4453. DESIGN.md now reads "is planned (see TODO.md P3-E3) to prevent these constants from being hardcoded elsewhere; until that guard is in place, discipline is enforced by convention." The matching note in TODO.md was updated to "P3-E3 will enforce this automatically; until then, enforce by convention."
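A minimal sketch of the frozen default policy object described above, assuming ASCII field names (`alpha`, `beta`, `gamma`, `qS`, `qV`, `qB`, `qP`) for α, β, γ, q_s, q_v, q_b, q_p. The numeric values are placeholders rather than the project's constants, and the `HotpathPolicy` interface, `DEFAULT_HOTPATH_POLICY`, and `validateHotpathPolicy` are hypothetical names used for illustration.

```typescript
// Hypothetical shape for the policy in core/HotpathPolicy.ts.
interface HotpathPolicy {
  readonly c: number;     // capacity scale c in H(t) = ⌈c·√(t·log₂(1+t))⌉
  readonly alpha: number; // salience weight α (placeholder)
  readonly beta: number;  // salience weight β (placeholder)
  readonly gamma: number; // salience weight γ (placeholder)
  readonly qS: number;    // tier quota ratios q_s, q_v, q_b, q_p (placeholders)
  readonly qV: number;
  readonly qB: number;
  readonly qP: number;
}

// PLACEHOLDER values for illustration only; not the project's tuned constants.
const DEFAULT_HOTPATH_POLICY: HotpathPolicy = Object.freeze({
  c: 1,
  alpha: 0.5,
  beta: 0.3,
  gamma: 0.2,
  qS: 0.25,
  qV: 0.25,
  qB: 0.25,
  qP: 0.25,
});

// Rejects non-finite or negative constants and tier ratios that do not sum to 1.
function validateHotpathPolicy(policy: HotpathPolicy): void {
  for (const [name, value] of Object.entries(policy)) {
    if (!Number.isFinite(value) || value < 0) {
      throw new RangeError(`policy.${name} must be a finite, non-negative number`);
    }
  }
  const tierSum = policy.qS + policy.qV + policy.qB + policy.qP;
  if (Math.abs(tierSum - 1) > 1e-9) {
    throw new RangeError(`tier quota ratios must sum to 1 (got ${tierSum})`);
  }
}
```

Other modules would import `DEFAULT_HOTPATH_POLICY` instead of restating the numbers, which is the convention the planned P3-E3 guard is meant to enforce automatically.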
> This **dual constraint** (tier quota × community quota) ensures both vertical coverage across hierarchy levels and horizontal coverage across topics.
@copilot Good catch! Please fix as described using your best reasoning for the domain involved while respecting the constraints imposed by the design.
Fixed in df00a1c. Replaced `max(1, ceil(...))` with a largest-remainder method that guarantees quotas sum to exactly `tier_budget` regardless of community count or sizes. The four-step allocation is now explicit in the design: compute fractional shares, floor to base allocations, distribute remaining slots to largest-remainder communities, and exclude zero-share communities from that tier. This makes the quota system both implementable and directly unit-testable.
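The four-step allocation can be sketched as a largest-remainder (Hamilton) apportionment. This is an illustration under stated assumptions, not the code from df00a1c: the tier budget and community sizes are taken to be non-negative integers, and `allocateCommunityQuotas` is a hypothetical name. Ranking fractional parts by the integer remainder of `budget * size / total` keeps the arithmetic exact.

```typescript
// Largest-remainder apportionment of one tier's slot budget across communities.
// Assumes integer inputs with budget * size below Number.MAX_SAFE_INTEGER.
function allocateCommunityQuotas(tierBudget: number, communitySizes: readonly number[]): number[] {
  if (!Number.isSafeInteger(tierBudget) || tierBudget < 0) {
    throw new RangeError("tierBudget must be a non-negative safe integer");
  }
  for (const size of communitySizes) {
    if (!Number.isSafeInteger(size) || size < 0 || !Number.isSafeInteger(tierBudget * size)) {
      throw new RangeError("community sizes must be non-negative safe integers");
    }
  }
  const total = communitySizes.reduce((sum, size) => sum + size, 0);
  // No communities, or no mass at all: nothing can receive a slot.
  if (total === 0) return communitySizes.map(() => 0);

  // Steps 1-2: fractional share = budget * size / total; base allocation = floor(share).
  const base = communitySizes.map((size) => Math.floor((tierBudget * size) / total));
  // Integer remainder of each share, used to rank the fractional parts exactly.
  const remainders = communitySizes.map((size) => (tierBudget * size) % total);

  // Step 3: give leftover slots to the largest remainders (ties broken by index).
  let leftover = tierBudget - base.reduce((sum, n) => sum + n, 0);
  const order = communitySizes
    .map((_, i) => i)
    .sort((a, b) => remainders[b] - remainders[a] || a - b);
  for (const i of order) {
    if (leftover === 0) break;
    base[i] += 1;
    leftover -= 1;
  }
  // Step 4: communities left with 0 slots are excluded from this tier by the caller.
  return base;
}
```

Because the leftover count equals the sum of the fractional parts, it is always smaller than the number of communities with a non-zero remainder, so the result sums to exactly `tierBudget` whenever the total size is positive. For example, a budget of 7 over three equal communities gives `[3, 2, 2]`.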
> ### Dynamic Subgraph Expansion Bounds
>
> The fixed `<30 node` subgraph target is replaced by dynamic formulas that shrink gracefully as the graph grows:
>
> ```
> maxSubgraphSize = min(30, ⌊√(t · log₂(1+t)) / log₂(t)⌋)
> maxHops = ⌈log₂(log₂(1 + t))⌉
> perHopBranching = ⌊maxSubgraphSize ^ (1 / maxHops)⌋
> ```
@copilot This was an excellent catch! For small corpora we do need something that won't introduce more problems at scale. Clamping ought to do that, but let's make sure that, whatever the solution looks like, it doesn't interfere with the dynamics at scale. Perhaps a corpus-size-based constraint that falls away above the threshold. Use your best judgment here.
Fixed in 06702d5. Added an effective-mass floor `t_eff = max(t, 2)` applied before all three formulas, plus `max(1, …)` secondary guards on `maxHops` and `perHopBranching`. At cold start (t < 2) the formulas evaluate to conservative safe values (1/1/1: a single-node subgraph, the only sensible result for fewer than two nodes). Once the corpus grows past the floor, the clamp becomes completely inactive (`t_eff = t` for all t ≥ 2), so large-corpus dynamics are unaffected. The valid domain and this fall-away behaviour are now documented inline.
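A sketch of the clamped bounds, assuming `t` is the graph mass passed in as a plain number; `subgraphBounds` and `SubgraphBounds` are hypothetical names. Rejecting non-finite `t` is an extra assumption, since `t = Infinity` would otherwise produce `NaN` in `maxSubgraphSize`.

```typescript
interface SubgraphBounds {
  maxSubgraphSize: number;
  maxHops: number;
  perHopBranching: number;
}

// Dynamic expansion bounds with the cold-start floor t_eff = max(t, 2).
function subgraphBounds(t: number): SubgraphBounds {
  // Assumption: non-finite mass is a caller bug, not a value to clamp.
  if (!Number.isFinite(t)) throw new RangeError("graph mass t must be finite");
  const tEff = Math.max(t, 2); // identical to t for every t >= 2
  const maxSubgraphSize = Math.min(
    30,
    Math.floor(Math.sqrt(tEff * Math.log2(1 + tEff)) / Math.log2(tEff)),
  );
  // Secondary guards keep both values at least 1.
  const maxHops = Math.max(1, Math.ceil(Math.log2(Math.log2(1 + tEff))));
  const perHopBranching = Math.max(1, Math.floor(maxSubgraphSize ** (1 / maxHops)));
  return { maxSubgraphSize, maxHops, perHopBranching };
}
```

With this floor, `t = 0` and `t = 1` both yield 1/1/1, while `t = 1000` yields a size of 10 over 4 hops; the `t_eff` clamp has no effect at that scale.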
> - `hotpath_index` (resident hotpath entries, keyed by `entityId`)
> - `page_activity` (per-page activity metadata for salience computation)
@copilot I'd periodically cache it somewhere so that we handle the case where the user reboots the machine. IndexedDB feels like a good place for it. It also means that the daydream module can touch it directly and restart/reload the process, which may be simpler since daydream is the default state.
Fixed in 2ce4dc6. `HotpathEntry` now documents its dual role explicitly: (1) the live RAM resident set that every query scans first, and (2) the `hotpath_index` IndexedDB store, which holds a periodic snapshot used to reconstruct the RAM index on startup/reboot, with no full corpus replay required. Daydreamer owns the write path to `hotpath_index` and checkpoints it at the end of each maintenance cycle (LTP/LTD pass), so the persisted snapshot is never more than one cycle stale. The object store description and the Hotpath glossary entry are updated to match.
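A sketch of that dual role, assuming a minimal `HotpathEntry` shape (`entityId` plus a `salience` score) and a hypothetical `ResidentHotpath` class. IndexedDB access is deliberately left out: the caller would write `snapshot()` to the `hotpath_index` store at the end of a Daydreamer cycle and pass the stored rows to `fromSnapshot` on startup.

```typescript
// Hypothetical entry shape; only the fields needed for this sketch.
interface HotpathEntry {
  entityId: string;
  salience: number;
}

// Live RAM resident set, bounded by a capacity such as H(t).
class ResidentHotpath {
  private readonly entries = new Map<string, HotpathEntry>();
  readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get size(): number {
    return this.entries.size;
  }

  has(entityId: string): boolean {
    return this.entries.has(entityId);
  }

  // Admit or update an entry; evict the lowest-salience resident when over capacity.
  upsert(entry: HotpathEntry): void {
    this.entries.set(entry.entityId, entry);
    if (this.entries.size > this.capacity) {
      let weakest: HotpathEntry | undefined;
      for (const e of Array.from(this.entries.values())) {
        if (weakest === undefined || e.salience < weakest.salience) weakest = e;
      }
      if (weakest !== undefined) this.entries.delete(weakest.entityId);
    }
  }

  // Plain-object copies suitable for persisting to the snapshot store.
  snapshot(): HotpathEntry[] {
    return Array.from(this.entries.values()).map((e) => ({ ...e }));
  }

  // Rebuild the RAM index from a persisted snapshot without replaying the corpus,
  // keeping only the highest-salience entries that fit the given capacity.
  static fromSnapshot(rows: readonly HotpathEntry[], capacity: number): ResidentHotpath {
    const index = new ResidentHotpath(capacity);
    const bySalience = rows.slice().sort((a, b) => b.salience - a.salience);
    for (const e of bySalience.slice(0, capacity)) index.upsert(e);
    return index;
  }
}
```

Re-applying the capacity during restore means a stale or oversized snapshot cannot push the rebuilt index past the bound supplied at startup. The linear eviction scan keeps the sketch short; a priority queue would suit larger capacities.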
> - H(t) is monotonically non-decreasing: verify H(t+1) ≥ H(t) for all t
> - Tier quotas sum exactly to capacity: `q_s + q_v + q_b + q_p === 1.0`
> - Community quotas sum to tier budget and each slot ≥ 1
> - Salience is deterministic for same inputs
@copilot Good catch; we should make sure to follow TDD throughout. Nothing scales to infinity, but we should always expressly guard against NaN, MAX and MIN, and be reasonably thorough in our testing while being reasonable in general about expectations.
Use your best judgement here and fix it please.
Fixed in 2905eb9. The test bullets in P0-F2 are now concretely testable and TDD-ready:
- Monotonicity is checked over the finite range `[0, 1, 2, 10, 100, 1_000, 10_000, 100_000]` instead of "for all t".
- NaN/Infinity/bounds guard added for H(t): edge inputs `t=0`, `t=1`, `t=Number.MAX_SAFE_INTEGER` must return a finite integer ≥ 1.
- Quota assertion now checks that derived tier-quota counts sum to capacity, not that ratios sum to 1.0.
- Community quota edge cases added: `budget=0`, empty `sizes` array, `budget < sizes.length`; result must never contain NaN, Infinity, or negative values.
- Salience finitude guard added: extreme weight/hit-count inputs must never produce NaN or Infinity.
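A TDD-style sketch of the H(t) checks listed above, using the formula H(t) = ⌈c·√(t·log₂(1+t))⌉ quoted from README.md with a placeholder `c = 1`. The `max(1, …)` floor (so `t = 0` still yields one slot) and the `RangeError` for negative or non-finite `t` are assumptions chosen to satisfy the edge-case bullets; `hotpathCapacity` and `checkHotpathCapacity` are hypothetical names.

```typescript
// Placeholder scale constant; in the design it would come from the policy object.
const PLACEHOLDER_C = 1;

// H(t) = ⌈c·√(t·log₂(1+t))⌉, floored at 1 (assumption) so an empty graph keeps one slot.
function hotpathCapacity(t: number, c: number = PLACEHOLDER_C): number {
  if (!Number.isFinite(t) || t < 0) {
    throw new RangeError("graph mass t must be a finite, non-negative number");
  }
  return Math.max(1, Math.ceil(c * Math.sqrt(t * Math.log2(1 + t))));
}

const MONOTONICITY_RANGE = [0, 1, 2, 10, 100, 1_000, 10_000, 100_000];

// Returns a list of failure messages; an empty list means every check passed.
function checkHotpathCapacity(): string[] {
  const failures: string[] = [];
  // Non-decreasing over the finite range instead of "for all t".
  for (let i = 1; i < MONOTONICITY_RANGE.length; i++) {
    const prev = hotpathCapacity(MONOTONICITY_RANGE[i - 1]);
    const next = hotpathCapacity(MONOTONICITY_RANGE[i]);
    if (next < prev) {
      failures.push(`H(${MONOTONICITY_RANGE[i]}) = ${next} < H(${MONOTONICITY_RANGE[i - 1]}) = ${prev}`);
    }
  }
  // Edge inputs must give a finite integer >= 1 (Number.isInteger rejects NaN and Infinity).
  for (const t of [0, 1, Number.MAX_SAFE_INTEGER]) {
    const h = hotpathCapacity(t);
    if (!Number.isInteger(h) || h < 1) failures.push(`H(${t}) = ${h} is not a finite integer >= 1`);
  }
  return failures;
}
```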
> - **Hebbian & Dynamic**: Connections strengthen and weaken naturally.
> - **Biological Scarcity**: Only a fixed number of active prototypes live in memory. Everything else is gracefully demoted to disk.
> - **Sublinear Growth (Williams Bound)**: The resident hotpath index is bounded to H(t) = ⌈c·√(t·log₂(1+t))⌉ where t = total graph mass (pages + edges). Memory scales sublinearly as the graph grows, trading time for space at a mathematically principled rate. See [`DESIGN.md`](DESIGN.md) for the full theorem mapping.
> - **Three-Zone Memory**: HOT (resident in-memory index, capacity H(t)), WARM (indexed in IndexedDB), COLD (raw bytes in OPFS only). All data is retained locally; zones control lookup cost, not data lifetime.
@copilot To me, a cold item is one that exists in the graph and perhaps has prev/next nodes connecting it to other nodes to form a book, but is otherwise an island unto itself. It can only be discovered during a random walk because at insertion it had no nearest neighbors in vector space.
This might need to be made explicit: cold means far away from the hot path of search and unlikely to be discovered except by a random walk.
Fixed in 5d1abcb. The COLD definition in README.md now reads:
COLD (metadata in IndexedDB + raw vectors in OPFS, but semantically isolated from the search path — no strong nearest neighbours in vector space at insertion time; only discoverable by a deliberate random walk)
Also updated "zones control lookup cost, not data lifetime" → "zones control lookup cost and discoverability, not data lifetime" to make the random-walk-only nature explicit at the principle level.
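One way to make the zone boundaries concrete, as a sketch only: a page is HOT if it is in the resident index, COLD if it had no strong nearest neighbour at insertion time, and WARM otherwise. The `PageZoneFacts` fields, `classifyZone`, and the 0.8 similarity threshold are hypothetical; the README does not fix a numeric threshold.

```typescript
type MemoryZone = "HOT" | "WARM" | "COLD";

// Hypothetical per-page facts; field names are illustrative, not the project's schema.
interface PageZoneFacts {
  isResident: boolean; // present in the in-memory hotpath index
  bestNeighbourSimilarityAtInsert: number | null; // null if there was no neighbour at all
}

// Assumed cut-off for a "strong" nearest neighbour.
const STRONG_NEIGHBOUR_SIMILARITY = 0.8;

// Zones describe lookup cost and discoverability only; no zone deletes data.
function classifyZone(page: PageZoneFacts, threshold: number = STRONG_NEIGHBOUR_SIMILARITY): MemoryZone {
  if (page.isResident) return "HOT";
  const best = page.bestNeighbourSimilarityAtInsert;
  if (best !== null && best >= threshold) return "WARM";
  // Semantically isolated at insertion: reachable only by a deliberate random walk.
  return "COLD";
}
```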
> - **TDD Approach:** Write failing tests first, then implement to green.
> - **Documentation Sync:** Update PLAN.md module status as tasks complete.
> - **Williams Bound Invariant:** The resident count must never exceed H(t). Every test that touches the hotpath should assert this.
> - **Policy constants:** Never hardcode hotpath constants outside `core/HotpathPolicy.ts`. The guard in P3-E3 enforces this.
@copilot Agreed, please use your best judgement and fix it.
This was already fixed in 4de4453. The note in TODO.md now reads: "P3-E3 will add a guard to enforce this automatically; until then, enforce by convention." — future tense throughout, no implication the guard currently exists.
Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…h exact sum guarantee Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
… and domain documentation Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…rsistence checkpoint Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…e, NaN/edge guards Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
…ed, random-walk-only discoverable Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com>
- `max(1, ceil(...))` community quota formula with largest-remainder method that guarantees quotas sum exactly to `tier_budget`; excluded zero-share communities explicitly
- `t_eff = max(t, 2)` bootstrap floor and `max(1, …)` secondary guards; documented valid domain, cold-start safe values, and that the clamp is inactive at scale
- `HotpathEntry` dual role: live RAM resident index and periodic IndexedDB persistence checkpoint; Daydreamer owns write path; startup reconstructs RAM index from IndexedDB without full corpus replay; updated Hotpath glossary entry to match
- `[0,1,2,10,100,1_000,10_000,100_000]`; quota assertions check derived counts sum to capacity (not ratios sum to 1.0); added NaN/Infinity/edge-case guards for H(t), tier quotas, community quotas, and salience