Enrich ARCHITECTURE.md with detailed verification precedents by danbri · Pull Request #3 · danbri/factoidal

danbri · 2026-03-05T21:05:06Z

No description provided.

Systematic comparison of every type, constraint, and operation in the F* specification with the corresponding JavaScript implementation in docs/index.html and docs/tests.html. https://claude.ai/code/session_01V26rLo7FkbPghV7HUZLoNc

… suites Covers: JS vs WASM correctness comparison, F*→WASM extraction via KaRaMeL, browser WASM integrity (SRI/CSP/signatures), W3C RDF/SPARQL 1.1 test suite structure, and how persistence (SQLite/PG/binary/HTTP) fits with F* formalities. https://claude.ai/code/session_01V26rLo7FkbPghV7HUZLoNc

Add EverParse production deployment details, Benzaken SQL/Coq verification (directly adaptable to SPARQL), verified B+-trees, FSCQ/Argosy crash safety, and Oxigraph architecture details as closest Rust prior art. https://claude.ai/code/session_01V26rLo7FkbPghV7HUZLoNc

Copilot

Pull request overview

This PR adds two new documentation files to the Factoidal repository: ARCHITECTURE.md (a comprehensive architecture analysis and roadmap covering correctness comparisons between JS, Rust/WASM, and F* implementations, WASM signing/verification, W3C test integration, and persistence strategies) and docs/GROUNDING_ANALYSIS.md (a detailed analysis of how the JS demos map to the F* formal specification).

Changes:

New ARCHITECTURE.md with four sections analyzing implementation correctness, WASM signing, W3C test suite integration plans, and persistence strategies with verified systems precedents
New docs/GROUNDING_ANALYSIS.md with a type-by-type mapping of F* formal definitions to JS implementations, plus well-formedness constraint analysis and test coverage summary

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
ARCHITECTURE.md	New architecture analysis document covering JS vs Rust/WASM correctness, F*→WASM extraction pipeline, W3C test suite integration proposal, and persistence roadmap with verified systems references
docs/GROUNDING_ANALYSIS.md	New grounding analysis documenting how JS demo types and constraints map to the F* formal specification, including extensions, divergences, and test coverage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T21:09:30Z

@@ -0,0 +1,224 @@
+# Factoidal: Architecture Analysis & Roadmap


The PR title says "Enrich ARCHITECTURE.md with detailed verification precedents," implying this is an update to an existing file. However, this is a brand-new 224-line file. Additionally, the PR also adds docs/GROUNDING_ANALYSIS.md (74 lines), which is not mentioned in the title at all. The title should accurately reflect the scope of this PR.

Copilot · 2026-03-05T21:09:30Z

+
+**For the Rust/WASM path specifically:**
+- **Oxigraph** (Rust, SPARQL-compliant, RocksDB backend) is the closest prior art:
+  - 11 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)


Minor arithmetic inconsistency: "11 key-value tables: id2str mapping + 9 quad index orderings" adds up to 10, not 11. Either the count is wrong or there's an additional table not mentioned. Please verify and correct.

Suggested change

- 11 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)

- 10 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)

Copilot · 2026-03-05T21:09:31Z

+`tests.html` provides 31 test cases covering every formal constraint:
+- IRI well-formedness (6 tests)
+- BNode identity (4 tests)
+- Literal construction and well-formedness (9 tests)
+- Triple serialization (1 test)
+- Graph operations including bnodes (9+ tests)


The document states "tests.html provides 31 test cases" but the actual file contains 43 individual test assertions (6 IRI + 4 BNode + 14 Literal + 1 Triple + 16 Graph operations + 2 Cross-type equality). The breakdown also understates Literal tests as "9 tests" when there are actually 14 (the 4 literal equality tests on lines 206-209 of tests.html appear to be missing from the count). Please recount and correct the total and per-category numbers.

Suggested change

`tests.html` provides 31 test cases covering every formal constraint:

- IRI well-formedness (6 tests)

- BNode identity (4 tests)

- Literal construction and well-formedness (9 tests)

- Triple serialization (1 test)

- Graph operations including bnodes (9+ tests)

`tests.html` provides 43 test cases covering every formal constraint:

- IRI well-formedness (6 tests)

- BNode identity (4 tests)

- Literal construction and well-formedness (14 tests)

- Triple serialization (1 test)

- Graph operations including bnodes (16 tests)

…umentation Layer 1 (rule, prevents recurrence): - CLAUDE.md rule #11: experimental_ocaml_glue/ patches MUST NOT override F* runtime functions with semantic re-implementations. Three explicit allow-cases (assume val realisations, companion- file writers, trivial dispatch shims). Forbidden: replacing F* function bodies, adding pruning/optimisation/planning logic in OCaml, or "Shape A" agent prompts that bypass F*. - Top-level Claude is responsible for enforcing this in every agent dispatch. The drift between 2026-04-25 and 2026-04-26 happened because I (top-level) wrote prompts with "Shape A vs Shape B" clauses giving each agent permission to add OCaml-side logic. The cumulative effect was ~3000 LoC of OCaml shims replacing F* runtime. The verified-extraction story collapsed. Layer 2 plan (unwind): - docs/designissues/fstar-purity-unwind.md: inventory of 9 override patches (3094 LoC total) with phased retirement plan 2.2 -> 2.8 in agent-pace (~5 days, not 3 human-weeks). - First instrumentation: Util.Log F* module. - F* assume val emit + level/module/message API; pure-spec Tot unit; OCaml realisation in experimental_ocaml_glue/util_log_runtime.sh (compliant with rule #3: pure I/O, realises an assume val). - Industry-conventional: ISO-8601 UTC timestamps, RFC-5424-ish levels (Error/Warn/Info/Debug/Trace), env-var configured (FACTOIDAL_LOG_LEVEL, FACTOIDAL_LOG_FILE). - Call sites in F* SPARQL11.Store.choose_best_tp_backend + estimate_tp_backend_mu emit decisions at LL_Debug / LL_Info. This is the ground-truth instrumentation that Pe5's --explain reimplementation should have called instead of duplicating the planner in OCaml. Layer 1 ancillary: - data/logs/ added to .gitignore — gitignored runtime artefacts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- SPARQL.JSON.Escape.fst: add verification-surface comment noting fs_byte_at / fs_byte_length are assume-val primitives realised in patch 89 (review comment #5). - factoidal_http.ml: reword 'F*-verified' to 'F*-extracted' (review comment #4) — narrower claim, matches the actual trust boundary. Not addressing in this PR: - review #1 (perf: list/rev/string_of_list vs Buffer): byte-for-byte parity was the goal; perf is a separate concern. List the perf optimization as a follow-up issue if it shows up in profiling. - reviews #2/#3 (extracted .ml not committed → STEP=compile brittle): consistent with project-wide pattern; .ml files are gitignored and regenerated by build-ocaml.sh extract. Out of scope for this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds formal/fstar/SHACL.Validation.fst as the Phase 1 skeleton for the W3C SHACL Core sister track (epic #181). This commit is just the F* source + glue patch stub; a follow-up wires the build. Surface coverage (AST only): - Targets: T_Class, T_Node, T_SubjectsOf, T_ObjectsOf, T_ImplicitClass, T_Sparql. - Shapes: a single `shape` record covers NodeShape / PropertyShape (distinguished by `is_property` + `shape_path`). - Constraint components: 25 cases covering cardinality, datatype, nodeKind, class, in, hasValue, pattern, length, range, logical operators (Not / And / Or / Xone), property pairs, closed shapes, and sh:select. - Property paths: predicate, inverse, sequence, alternative, zero-or-more, one-or-more, zero-or-one. - Validation report: violation + validation_report records. Entry points (assume val, rule #11(c) host call-out): - validate : data -> shapes -> ML report - parse_shape_from_graph (Phase 2 makes this Tot) - eval_sparql_target_select (the long-term host call-out) The corresponding glue patch lives at formal/fstar/minimal_regrettable_glue_code_each_with_an_open_issue/ 181_shacl_validate_stub.sh and is tracked by issue #181. Phase 1 patch is a no-op idempotent acknowledgement that injects a marker comment into the extracted SHACL_Validation.ml — its purpose is to discharge CLAUDE.md rule #3 (every assume val has a glue file) without sneaking semantic logic into OCaml (rule #11). Phase 2 shrinks this to wire only eval_sparql_target_select. Verified clean under F* 2025.12.15 + z3 4.13.3, no --lax. Out of scope: build wiring (next commit), parse implementation, validation evaluator, W3C test corpus runner. https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1

…patch The F* rewrite_query_bnodes_pattern function (SPARQL11.Algebra.fst line 3648) already implements blank-node-to-variable rewriting for entailment regimes — it just wasn't wired in the runner. The runner duplicated the same logic inline 7 times. This commit: 1. Replaces all 7 inline OCaml rewrite blocks (~210 lines) with a single call: { query with q_pattern = SPARQL11_Algebra.rewrite_query_bnodes_pattern query.q_pattern } 2. Deletes 53_blank_node_variable_rewriting.sh entirely (the patch's job was to inject the same inline OCaml — now obsolete). Section E retirement #3 (after #95 + #66 yesterday). Patches in minimal_regrettable_glue_code_each_with_an_open_issue/: Start of yesterday: 15 After #95: 14 (56108f9) After #66: 13 (07638ee) After #53: 12 (this commit) Section E retirements: 3 / 6 confirmed VIOLATIONs cleared. The F* function `rewrite_query_bnodes_pattern` was already in SPARQL11.Algebra.fst since an earlier session — apparently written but never wired. The OCaml inline version was kept as a parallel implementation. Removing it gives a single F*-verified path; the rewrite is now sound by construction (F* type checks the AST traversal). Refs: #53, #200 (Section E), #237 (recovery workstream). https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1

…#200 PR5, #67) Migrates the byte-walking validation logic out of formal/fstar/minimal_regrettable_glue_code_each_with_an_open_issue/ 67_rdfxml_validation.sh into pure-F\* `XML.Wellformedness.fst`. What moved to F\*: - `is_valid_ncname` — XML 1.1 NCName production (NameStartChar followed by NameChar*); codepoint-aware via FStar.Char.char. - `forbidden_node_element_names` / `forbidden_property_element_names` — RDF/XML §7.2.11 / §7.2.12 forbidden-name lists. - `validate_rdf_id_attr` — checks any rdf:ID / rdf:nodeID attribute values are valid NCNames; returns option string. - `check_conflicting_attrs_common` / `_node` / `_property` — RDF/XML §7.2.4..7.2.10 mutual-exclusion rules. F\* verification clean (z3 4.13.3, no --lax / no --admit_smt_queries). What stays in OCaml (rule-#11(a) glue): - `Rdfxml_error` exception (the existing call sites in process_node_element / process_property_element raise it). - `attrs_to_pairs` — converts `Parser_XML.xml_attribute list` to the F\*-shaped `(string * string) list` (pure record-to-tuple mapping). - `option string -> raise Rdfxml_error` adapters. The patch (67_rdfxml_validation.sh) shrinks from ~120 lines of inline OCaml validation logic to ~50 lines of thin adapter code. Per the corrected boundary-audit taxonomy, this patch moves from `VIOLATION-SEM` to `ASSUME-IO`. build-ocaml.sh + tests/unit/run-all.sh extended to thread the new module through the F\* extraction pipeline AND the OCaml link order (XML_Wellformedness must precede Parser_RDFXML). Verification: - W3C rdf-xml suite: 166 pass / 0 fail (zero regression). - Unit suite: 8/8 files pass. Per CLAUDE.md rule #11 + Iron Rule #3 (assume val gap closed). https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1

Per #200 plan: attempt the formal F* round-trip lemma for DictWriter; admit-with-issue if SMT chokes. - Adds `lemma_parse_serialize_dict` to RDF.CottasStore.DictWriter.fst. - Statement: `parse_dict (serialize_dict tokens) == Some tokens` under the same overflow preconditions [serialize_dict] requires ([length tokens < 2^32] and the cumulative token-data offset fits in u64). - Body is `admit ()`. The proof composes four foundation lemmas in RDF.Bytes (parse∘write inverses for u32_le, u64_le, parse_n_bytes, parse_string_of_length); none exist yet, and writing them needs FStar.List.Tot.Properties / FStar.String composition lemmas in scope. Tracked separately in #252. - The CI hash-roundtrip witness in tests/unit/dict_writer_roundtrip.ml remains the empirical guarantee today (4 fixtures, SHA-256 pinned). F* verifies clean (z3 4.13.3, no --lax, no --admit_smt_queries — the `admit ()` is local to a single Lemma body, not a global escape). Per CLAUDE.md Iron Rule #3 spirit (no silent gaps): the admit is visible in the source, the issue references the four prerequisite lemmas, and the empirical CI witness covers the format-stability question on representative fixtures. https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1

…y 4.13.3 F*'s upstream installer asks for both z3-4.8.5 and z3-4.13.3 because the wider F* ecosystem (HACL*, EverParse, Vale) still targets 4.8.5 while newer libs are on 4.13.3. Factoidal pins 4.13.3 only — every --z3version flag in this repo (Makefiles, build-ocaml.sh, iterate.sh) says 4.13.3, and no module uses 4.8.5. z3's role is upstream of extraction: it runs during F* verification, not in the OCaml/JS/Wasm/C downstream toolchains. js_of_ocaml and KaRaMeL both consume already-verified F* output and never invoke z3, so the version pin is identical across all three extraction targets. Changes: - README Quick Start: note that the installer's 4.8.5 mention is for other ecosystem projects, not Factoidal. - skills/fstar-env iron-rule #3: same clarification. - skills/build-and-test: replace the stale "expected '4.8.5', got '4.13.3'" error-table row with the actual modern failure mode (opam's 4.15.x shadowing the binary release on PATH). - docs/skills/validating.md: drop z3-4.8.5 from the tooling list. https://claude.ai/code/session_01NVopnmZbHjwanF1asedHuM

@context

build-ocaml.sh gains the jsonld_runner target (sibling block to rdfc10_runner, PARQUET_NATIVE_STUBS after COMMON_MODULES per the http_client lesson). One OCaml comment in the runner spelled F-star with a star-paren, which closes an OCaml block comment early - the exact CLAUDE.md syntax-trap warning, now demonstrated in production. Measured baseline against the vendored W3C json-ld-api toRdf manifest: 33 pass, 423 fail, 11 skip (out of 467). Positive-eval 17 pass of 345 (expanded-form only; every @context input scores FAIL by design, not SKIP - anti-pattern #3); negative-eval 0 pass of 106 (the expanded-form parser is lenient); positive-syntax 16 pass of 16; 11 skips are option.specVersion json-ld-1.0. This is the burn-down that context processing (Phases 3-4) eats into. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RAihmCsGHo3uyFJJcsUUJ9

@graph

SPARQL11.IRI.Resolve rewritten as a direct RFC 3986 section 5.2/5.3 transliteration (dot-segment removal, network-path, query-only, empty-reference resolution) - validated against the RFC's own 42-case section 5.4 battery before porting. This is the first of the owner's reusable-foundations modules (the RDF.IRI role): the old SPARQL-lite heuristic was the root cause of 13+ JSON-LD failures, and SPARQL itself only gets more conformant (631 pass, 0 fail held). JSON-LD: top-level @graph wrapper no longer misread as a named graph (5 flips); JSON literals via a documented RFC 8785 JCS subset (20 flips); @included with strict node-object validation (5 flips); empty-IRI document-relative expansion. Measured: 287 pass, 169 fail, 11 skip (out of 467), zero regressions by per-test set-diff. JSONLD.Loader.fst declares the documentLoader seam (assume val, declaration-only, not yet in build lists); stub patch 275_jsonld_document_loader.sh per Iron Rule #3, tracked in issue #275 - wiring remote contexts + @import through it is the next slice and most of the remaining 169. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RAihmCsGHo3uyFJJcsUUJ9

claude added 3 commits March 5, 2026 20:39

Copilot AI review requested due to automatic review settings March 5, 2026 21:05

danbri merged commit ee68db5 into claude/main Mar 5, 2026
1 check passed

danbri deleted the claude/analyze-js-demos-V6anC branch March 5, 2026 21:05

Copilot started reviewing on behalf of danbri March 5, 2026 21:05 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

danbri mentioned this pull request Mar 11, 2026

Audit ocaml-patches.sh and w3c_runner.ml: elevate logic to F* #61

Open

danbri mentioned this pull request Apr 21, 2026

UTF-8 end-to-end: FastString speedup interacted with under-documented byte/codepoint boundary, broke invariants across the evaluator #92

Open

Copilot AI mentioned this pull request May 7, 2026

audit: Tableau.fst — completeness check + recommend continued staged completion #210

Merged

3 tasks

danbri mentioned this pull request Jul 4, 2026

JSONLD.Loader.jsonld_load_document: acknowledged assume-val gap (remote contexts / @import unwired) #275

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enrich ARCHITECTURE.md with detailed verification precedents#3

Enrich ARCHITECTURE.md with detailed verification precedents#3
danbri merged 3 commits into
claude/mainfrom
claude/analyze-js-demos-V6anC

danbri commented Mar 5, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -0,0 +1,224 @@
		# Factoidal: Architecture Analysis & Roadmap

	- 11 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)
	- 10 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)

Conversation

danbri commented Mar 5, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants