feat: Column H — EntityTypeId on BindSpace (Phase 1 of 4) #272

AdaWorldAPI merged 1 commit into main
Conversation
Foundry Vertex "Object Type" equivalent: per-row entity type binding in the BindSpace SoA, enabling type-filtered queries without schema re-parsing.

- D-H1: `EntityTypeId = u16` + `entity_type_id(ontology, name) -> u16` in `contract::ontology`. 1-based index into `Ontology.schemas`; 0 = untyped.
- D-H2: `entity_type: Box<[u16]>` field on the BindSpace SoA. +2 bytes/row (71774 → 71776 footprint for 1 row).
- D-H3: `BindSpaceBuilder::push_typed()` writes entity_type per row. `push()` defaults to 0 (untyped) for backward compat.
- D-H4: 4 tests (entity_type defaults to 0, set/get, builder push_typed, contract entity_type_id 1-based lookup). All pass.

Phase 1 complete per bindspace-columns-v1.md §5. Unblocks: LF-22 ObjectView usage, LF-40 type-filtered search.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
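The D-H2/D-H3 deliverables above can be sketched as a minimal stand-in. This is not the real `BindSpaceBuilder` — the actual SoA carries many more columns — only the `entity_type` column and the `push()` → `push_typed()` delegation named in the PR are shown; the struct layout and `finish()` helper are assumptions for illustration.

```rust
// Simplified stand-in for the Column H shape: only the entity_type
// column, not the full BindSpace SoA.
type EntityTypeId = u16; // 1-based index into Ontology.schemas; 0 = untyped

struct BindSpaceBuilder {
    entity_type: Vec<EntityTypeId>,
    // ... other SoA columns elided ...
}

impl BindSpaceBuilder {
    fn new() -> Self {
        Self { entity_type: Vec::new() }
    }

    /// Existing entry point: rows pushed here stay untyped (0).
    fn push(&mut self) {
        self.push_typed(0);
    }

    /// New in this PR: per-row entity type binding.
    fn push_typed(&mut self, ty: EntityTypeId) {
        self.entity_type.push(ty);
    }

    /// Hypothetical finalizer, matching the Box<[u16]> column type.
    fn finish(self) -> Box<[EntityTypeId]> {
        self.entity_type.into_boxed_slice()
    }
}

fn main() {
    let mut b = BindSpaceBuilder::new();
    b.push();        // backward-compatible: defaults to untyped (0)
    b.push_typed(3); // explicitly typed row
    let col = b.finish();
    assert_eq!(&*col, &[0, 3]);
}
```

The delegation direction (`push()` calling `push_typed(0)`) is what keeps existing callers source-compatible while adding the typed path.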
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 117a76a5ec
```rust
pub fn entity_type_id(ontology: &Ontology, name: &str) -> EntityTypeId {
    ontology.schemas.iter()
        .position(|s| s.name == name)
        .map(|idx| (idx + 1) as EntityTypeId)
```
Prevent EntityTypeId wraparound on large ontologies
Casting (idx + 1) to u16 here will silently wrap once ontology.schemas.len() >= 65_536, which can turn a real schema into 0 (the reserved “untyped” sentinel) or collide with another type ID. In those large-ontology cases this corrupts row typing semantics in BindSpace and makes type-based filtering unreliable; this lookup should detect overflow and fail explicitly (or otherwise enforce the maximum schema count) instead of truncating.
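One way to make the lookup fail explicitly, as the review suggests, is a checked `usize -> u16` conversion. A minimal sketch, assuming simplified stand-in `Ontology`/`Schema` types (the real ones live in `contract::ontology`) and a hypothetical `_checked` variant name — only the `try_from` idea is the point:

```rust
type EntityTypeId = u16;

// Stand-in types for illustration; not the real contract::ontology definitions.
struct Schema { name: String }
struct Ontology { schemas: Vec<Schema> }

/// Returns None both when the name is missing and when idx + 1 would
/// overflow u16, instead of silently wrapping into the 0 sentinel.
fn entity_type_id_checked(ontology: &Ontology, name: &str) -> Option<EntityTypeId> {
    ontology
        .schemas
        .iter()
        .position(|s| s.name == name)
        .and_then(|idx| EntityTypeId::try_from(idx + 1).ok())
}

fn main() {
    let ont = Ontology {
        schemas: vec![Schema { name: "Person".into() }],
    };
    assert_eq!(entity_type_id_checked(&ont, "Person"), Some(1));
    assert_eq!(entity_type_id_checked(&ont, "Missing"), None);
}
```

An alternative with the same effect is asserting a maximum schema count at ontology-load time, which keeps the lookup itself infallible.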
Three artifacts in one commit:

1. Post-merge governance for #352 (lance-graph-ontology v5 + ogit-cascade v1 plans, merged 2026-05-07 as 8e2f088):
   - PR_ARC_INVENTORY.md prepended with a full Added/Locked/Deferred/Docs entry; Confidence line updatable.
   - LATEST_STATE.md table prepended; "Last updated" refreshed to 2026-05-07.
2. New plan: .claude/plans/palantir-parity-cascade-v2.md (262 lines). Integration capstone over 4 prior Foundry parity docs and the v1 cascade. Pillar 0 carry-forward: Foundry parity IS SoA-as-canon parity. Column H (PR #272, shipped) is already the Foundry Object Type bridge; v2 makes the SoA carry the Foundry-equivalent shape. 15 deliverables; the top 3 ship with this plan (V2-1 ledger, V2-2 triangle, V2-3 BusDto bridge). The Business Logic ↔ Thinking-style ↔ OGIT triangle is introduced as a routing knowledge artifact.
3. New knowledge doc: .claude/knowledge/soa-dto-dependency-ledger.md (210 lines). Append-only entropy table of 22 DTOs across 4 tiers (sensor → engine → contract → callcenter). Three classifications: bare-metal (9), SoA-glue (7), bridge-projection (6, with 3 OPEN re-classifications). Internal vs external O(1) mapping diagrams. Codec cascade column status: all 8 cascade columns OPEN; the current registry uses (bridge_id, public_name) tuples + ogit_uri hashing per the 2026-05-07 audit. Probe queue with pass criteria for D-CASCADE-V1-1/7/11 + D-PARITY-V2-3/10. Maintenance protocol attached.

Findings driving the artifacts:

- StreamDto, ResonanceDto, and BusDto all live in thinking-engine::dto.rs (Tiers 0/1/2), upstream of contract.
- ResonanceDto IS the SoA (4096 ripple energies), not a glue layer.
- OntologyRegistry has NO codec cascade columns today; D-CASCADE-V1-7 is the wiring deliverable.
- Foundry parity has 5+ prior docs; v2 integrates, does not duplicate.

Append-only governance honored on PR_ARC, LATEST_STATE, INTEGRATION_PLANS (prepend only; no past entries edited). The Layer-2 AGENT_LOG.md (gitignored) will carry the entry post-push.
https://claude.ai/code/session_01WevBiZ3jzVocu8fBpTY8sq
Third addendum, written after actually loading Grok's bundle and the relevant source files into one mental space, the way the prompt asked for from the start. The audit shape used file-by-file slice reads; this addendum used full-file parallel reads with the bundle held together. The output is a different topology, not just additional findings.

Substantive observations the audit and Grok's pass both missed:

1. AwarenessPlane16K already exists, with six channels, not one. Shipped 2026-05-06 in crates/lance-graph-contract/src/splat.rs: Support / Contradiction / Forecast / Counterfactual / Style / Source. Forecast and Counterfactual are scenario-only and explicitly cannot promote ontology facts. Grok's single-channel AwarenessColumn undershoots — the workspace already separates "I am believing" from "I am imagining" at the substrate level.
2. The deposition kernel is geometric: (center_a << 8) ^ center_b mod 16384. TriadicProjection (S/P, P/O, S/O) selects which Pearl-2³ lens produced the codebook pair. Pearl 2³ as parallel dimensions is already implemented at the splat level via projection-byte addressing, spreading evidence across multiple Pearl-aware addresses.
3. Schema-as-MUL-priors lives in DomainProfile per StepDomain: Medcare = 0.92 + Human + fail-closed + HIPAA 6-year retention; SMB = 0.75 + Llm + commerce-grade. Unit-tested invariants. The "ontology-aware MUL trust thresholds" TODO at bindspace.rs:191-198 is exactly the missing wire between EntityTypeId (Column H, shipped in PR #272) and DomainProfile.
4. Investigation-as-substrate-traversal is concrete: anchor by entity_type → traverse CausalEdge64.forward() chains accumulating into cycle_fingerprint → deposit splats per channel during traversal → read CamSplatCertificate at stabilization → emit a four-way SplatDecision (Proceed / RequireExactReplay / PrefetchOnly / ScenarioOnly).
Forecast and Counterfactual channels are how the substrate runs hypothetical investigations without committing to facts — this IS the preemption framing Grok proposed, with sharper semantics.

5. The 3-byte polyglot tag is dialect:u8 (CognitiveEventRow membrane projection) + MetaWord.thinking:6 + MetaWord.nars_f:8 — already produced, scattered across two structs that compose at the membrane boundary. The work is composing them as one queryable predicate, not inventing a new tag.
6. CrystalFingerprint is a multi-carrier enum carrying Binary16K / Structured5x5 / Vsa10kI8 / Vsa10kF32 / Vsa16kF32 deliberately. The user's "Vsa10000 deprecated" correction is per-call-site routing, not deletion: each carrier has a purpose (Vsa10000 for Markov bundling, Vsa16kF32 for collapse-gate + cycle column, Binary16K for bit-deposition splat planes, CAM-PQ for compressed search). The TECH_DEBT entry needs reframing as N decisions, not one migration.
7. DEBUG-STRINGIFY-1 (entropy 5): 35 format!("{:?}", logical_plan) sites read the DataFusion LogicalPlan Debug output as a stable surface workspace-wide. The hot-path Cypher migration is 1-2x larger than Grok's Phase 2 plan accounts for, because the typed-visitor work to eradicate the 35 sites is comparable in size to the parser wiring itself.
8. The substrate shape is six operations on one SoA, each operation a different addressing of the same rows: cognition (cycle column + MetaColumn + CausalEdge64.forward()); memory (entity_type / temporal / cycle_fp_hi-lo); imagination (SplatChannel Forecast/Counterfactual); awareness (AwarenessPlane16K six-channel); external traffic (CognitiveEventRow scalar projection across BBB); audit (step_trajectory_hash). One SoA. Six operations. Each operation is a different way of *reading* the same rows; writes stay narrow via CollapseGate. This is not "BindSpace + accessories"; it's an SoA deliberately designed so every cognitive operation is an addressing mode rather than a layer.
The single most valuable observation: the workspace's own ARCHITECTURE_ENTROPY_LEDGER already classifies 22 DTOs across 4 tiers with concrete spaghetti clusters and per-row entropy scores. Grok's pass and my audit independently re-derived smaller versions of this ledger. The right framing for the next session is "complete the seams between structures that already exist" — sized as the high-entropy rows in the existing ledger — not "add the proposed structures." The substrate is more complete than either external pass realized.

Also: removed the .grok/ checkout from the working tree (per the user's note that the prefix is distinctive enough that bleed risk is low; the files live on origin/main, so there's no need to mirror them on the feature branch).

https://claude.ai/code/session_01WevBiZ3jzVocu8fBpTY8sq
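The deposition-kernel arithmetic from point 2 can be sketched concretely. This is a hedged illustration only: the commit gives the formula `(center_a << 8) ^ center_b mod 16384`, and this sketch assumes u8 codebook centers and a 16384-slot (16K) splat plane; the real kernel's operand widths, the TriadicProjection byte, and any further mixing are not shown.

```rust
// Assumed plane size, matching the "16K" in AwarenessPlane16K.
const PLANE_SLOTS: usize = 16_384;

/// Geometric deposition address: (center_a << 8) ^ center_b, folded
/// into the 16384-slot plane. Sketch only; operand widths are assumed.
fn deposit_address(center_a: u8, center_b: u8) -> usize {
    (((center_a as usize) << 8) ^ center_b as usize) % PLANE_SLOTS
}

fn main() {
    // Two codebook centers map to one of 16384 plane addresses.
    let addr = deposit_address(0x2A, 0x07);
    assert!(addr < PLANE_SLOTS);
    // The shift makes the kernel asymmetric in (a, b): swapping the
    // operands generally lands on a different address.
    assert_ne!(deposit_address(1, 2), deposit_address(2, 1));
}
```

Note that with u8 centers the raw XOR spans 0..65536, so the `% 16384` fold aliases four raw values onto each slot; that aliasing behavior is an artifact of the assumed widths, not a claim about the real kernel.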
Summary
Phase 1 of the BindSpace Columns E/F/G/H integration plan (PR #271).
Adds Column H (EntityTypeId) — the Palantir Vertex "Object Type" equivalent.
Deliverables
- `EntityTypeId = u16` type alias + `entity_type_id(ontology, name) -> u16` function in `contract::ontology`. 1-based index into `Ontology.schemas`, 0 = untyped.
- `entity_type: Box<[u16]>` field on the `BindSpace` SoA. +2 bytes/row.
- `BindSpaceBuilder::push_typed()` writes entity_type per row. Existing `push()` defaults to 0 for backward compat — no breaking change.

What it changes
- `BindSpace::byte_footprint()`: 71774 → 71776 per row (+2 bytes = 0.003%)
- `BindSpaceBuilder::push()`: unchanged (passes entity_type=0)
- `BindSpaceBuilder::push_typed()` with explicit entity_type

Brutal honest review
What's good:
- `entity_type_id()` is a simple position lookup — no complexity hidden behind the API.
- `push()` → `push_typed()` delegation keeps all existing callers working.

What's not great:
- `entity_type_id()` does a linear scan of `Ontology.schemas` — O(N) per lookup. Fine for N < 100 schemas, but if someone has 1000+ entity types this becomes a problem. Should be a `HashMap<&str, EntityTypeId>` cache on `Ontology`. Not worth optimizing now (N is ~10 for SMB), but flagged.
- The `dispatch()` step that writes entity_type into the SoA (D-H3 in the plan) is NOT wired yet — this PR adds the field but not the dispatch-time write. That's Phase 2 territory because it requires knowing which `OntologySpec` the current triplet matches, which is the novel-pattern-detection logic in D-E3.

Bottom line: Phase 1 is the foundation. Column H exists, has tests, doesn't break anything. The interesting work (dispatch-time type binding) is Phase 2.
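The flagged `HashMap` cache could look like the following. A sketch only, under the same stand-in `Ontology`/`Schema` types as above and a hypothetical `build_type_index` name — it is not part of this PR, just the shape of the O(N) → O(1) trade the review describes:

```rust
use std::collections::HashMap;

type EntityTypeId = u16;

// Stand-in types; the real ones live in contract::ontology.
struct Schema { name: String }
struct Ontology { schemas: Vec<Schema> }

/// Build the name -> id index once from Ontology.schemas. Ids stay
/// 1-based with 0 reserved for untyped, matching entity_type_id().
/// Assumes fewer than 65_536 schemas (the cast wraps beyond that).
fn build_type_index(ontology: &Ontology) -> HashMap<String, EntityTypeId> {
    ontology
        .schemas
        .iter()
        .enumerate()
        .map(|(idx, s)| (s.name.clone(), (idx + 1) as EntityTypeId))
        .collect()
}

fn main() {
    let ont = Ontology {
        schemas: vec![
            Schema { name: "Person".into() },
            Schema { name: "Company".into() },
        ],
    };
    let index = build_type_index(&ont);
    assert_eq!(index.get("Company").copied(), Some(2)); // 1-based
    assert_eq!(index.get("Unknown").copied(), None);
}
```

Whether the map is owned by `Ontology` or built lazily at first lookup is an open design choice; either way it trades one O(N) pass at build time for O(1) lookups afterwards.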
Test plan
- `cargo check` — workspace clean
- `push()` still works without an entity_type arg

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
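The second test-plan item — `push()` compiling and behaving unchanged with no entity_type argument — reduces to a check like this. The builder here is the same simplified stand-in used earlier in this thread, not the real type:

```rust
type EntityTypeId = u16;

// Minimal stand-in: only the entity_type column of the SoA.
#[derive(Default)]
struct BindSpaceBuilder {
    entity_type: Vec<EntityTypeId>,
}

impl BindSpaceBuilder {
    // Unchanged call signature: no entity_type argument.
    fn push(&mut self) {
        self.push_typed(0);
    }

    fn push_typed(&mut self, ty: EntityTypeId) {
        self.entity_type.push(ty);
    }
}

fn main() {
    let mut b = BindSpaceBuilder::default();
    b.push(); // old call sites compile as before
    b.push();
    // Every row pushed the old way lands as 0 (untyped).
    assert!(b.entity_type.iter().all(|&t| t == 0));
}
```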
Generated by Claude Code