Skip to content

Enrich ARCHITECTURE.md with detailed verification precedents#3

Merged
danbri merged 3 commits into
claude/mainfrom
claude/analyze-js-demos-V6anC
Mar 5, 2026
Merged

Enrich ARCHITECTURE.md with detailed verification precedents#3
danbri merged 3 commits into
claude/mainfrom
claude/analyze-js-demos-V6anC

Conversation

@danbri

@danbri danbri commented Mar 5, 2026

Copy link
Copy Markdown
Owner

No description provided.

claude added 3 commits March 5, 2026 20:39
Systematic comparison of every type, constraint, and operation in
the F* specification with the corresponding JavaScript implementation
in docs/index.html and docs/tests.html.

https://claude.ai/code/session_01V26rLo7FkbPghV7HUZLoNc
… suites

Covers: JS vs WASM correctness comparison, F*→WASM extraction via KaRaMeL,
browser WASM integrity (SRI/CSP/signatures), W3C RDF/SPARQL 1.1 test suite
structure, and how persistence (SQLite/PG/binary/HTTP) fits with F* formalities.

https://claude.ai/code/session_01V26rLo7FkbPghV7HUZLoNc
Add EverParse production deployment details, Benzaken SQL/Coq verification
(directly adaptable to SPARQL), verified B+-trees, FSCQ/Argosy crash safety,
and Oxigraph architecture details as closest Rust prior art.

https://claude.ai/code/session_01V26rLo7FkbPghV7HUZLoNc
Copilot AI review requested due to automatic review settings March 5, 2026 21:05
@danbri danbri merged commit ee68db5 into claude/main Mar 5, 2026
1 check passed
@danbri danbri deleted the claude/analyze-js-demos-V6anC branch March 5, 2026 21:05

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds two new documentation files to the Factoidal repository: ARCHITECTURE.md (a comprehensive architecture analysis and roadmap covering correctness comparisons between JS, Rust/WASM, and F* implementations, WASM signing/verification, W3C test integration, and persistence strategies) and docs/GROUNDING_ANALYSIS.md (a detailed analysis of how the JS demos map to the F* formal specification).

Changes:

  • New ARCHITECTURE.md with four sections analyzing implementation correctness, WASM signing, W3C test suite integration plans, and persistence strategies with verified systems precedents
  • New docs/GROUNDING_ANALYSIS.md with a type-by-type mapping of F* formal definitions to JS implementations, plus well-formedness constraint analysis and test coverage summary

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
ARCHITECTURE.md New architecture analysis document covering JS vs Rust/WASM correctness, F*→WASM extraction pipeline, W3C test suite integration proposal, and persistence roadmap with verified systems references
docs/GROUNDING_ANALYSIS.md New grounding analysis documenting how JS demo types and constraints map to the F* formal specification, including extensions, divergences, and test coverage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ARCHITECTURE.md
@@ -0,0 +1,224 @@
# Factoidal: Architecture Analysis & Roadmap

Copilot AI Mar 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title says "Enrich ARCHITECTURE.md with detailed verification precedents," implying this is an update to an existing file. However, this is a brand-new 224-line file. Additionally, the PR also adds docs/GROUNDING_ANALYSIS.md (74 lines), which is not mentioned in the title at all. The title should accurately reflect the scope of this PR.

Copilot uses AI. Check for mistakes.
Comment thread ARCHITECTURE.md

**For the Rust/WASM path specifically:**
- **Oxigraph** (Rust, SPARQL-compliant, RocksDB backend) is the closest prior art:
- 11 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)

Copilot AI Mar 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor arithmetic inconsistency: "11 key-value tables: id2str mapping + 9 quad index orderings" adds up to 10, not 11. Either the count is wrong or there's an additional table not mentioned. Please verify and correct.

Suggested change
- 11 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)
- 10 key-value tables: `id2str` mapping + 9 quad index orderings (SPO, SOP, PSO, etc.)

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +65
`tests.html` provides 31 test cases covering every formal constraint:
- IRI well-formedness (6 tests)
- BNode identity (4 tests)
- Literal construction and well-formedness (9 tests)
- Triple serialization (1 test)
- Graph operations including bnodes (9+ tests)

Copilot AI Mar 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document states "tests.html provides 31 test cases" but the actual file contains 43 individual test assertions (6 IRI + 4 BNode + 14 Literal + 1 Triple + 16 Graph operations + 2 Cross-type equality). The breakdown also understates Literal tests as "9 tests" when there are actually 14 (the 4 literal equality tests on lines 206-209 of tests.html appear to be missing from the count). Please recount and correct the total and per-category numbers.

Suggested change
`tests.html` provides 31 test cases covering every formal constraint:
- IRI well-formedness (6 tests)
- BNode identity (4 tests)
- Literal construction and well-formedness (9 tests)
- Triple serialization (1 test)
- Graph operations including bnodes (9+ tests)
`tests.html` provides 43 test cases covering every formal constraint:
- IRI well-formedness (6 tests)
- BNode identity (4 tests)
- Literal construction and well-formedness (14 tests)
- Triple serialization (1 test)
- Graph operations including bnodes (16 tests)

Copilot uses AI. Check for mistakes.
danbri added a commit that referenced this pull request Apr 26, 2026
…umentation

Layer 1 (rule, prevents recurrence):
- CLAUDE.md rule #11: experimental_ocaml_glue/ patches MUST NOT
  override F* runtime functions with semantic re-implementations.
  Three explicit allow-cases (assume val realisations, companion-
  file writers, trivial dispatch shims). Forbidden: replacing F*
  function bodies, adding pruning/optimisation/planning logic in
  OCaml, or "Shape A" agent prompts that bypass F*.
- Top-level Claude is responsible for enforcing this in every
  agent dispatch. The drift between 2026-04-25 and 2026-04-26
  happened because I (top-level) wrote prompts with "Shape A vs
  Shape B" clauses giving each agent permission to add OCaml-side
  logic. The cumulative effect was ~3000 LoC of OCaml shims
  replacing F* runtime. The verified-extraction story collapsed.

Layer 2 plan (unwind):
- docs/designissues/fstar-purity-unwind.md: inventory of 9
  override patches (3094 LoC total) with phased retirement plan
  2.2 -> 2.8 in agent-pace (~5 days, not 3 human-weeks).
- First instrumentation: Util.Log F* module.
  - F* assume val emit + level/module/message API; pure-spec
    Tot unit; OCaml realisation in
    experimental_ocaml_glue/util_log_runtime.sh (compliant with
    rule #3: pure I/O, realises an assume val).
  - Industry-conventional: ISO-8601 UTC timestamps, RFC-5424-ish
    levels (Error/Warn/Info/Debug/Trace), env-var configured
    (FACTOIDAL_LOG_LEVEL, FACTOIDAL_LOG_FILE).
- Call sites in F* SPARQL11.Store.choose_best_tp_backend +
  estimate_tp_backend_mu emit decisions at LL_Debug / LL_Info.
  This is the ground-truth instrumentation that Pe5's --explain
  reimplementation should have called instead of duplicating the
  planner in OCaml.

Layer 1 ancillary:
- data/logs/ added to .gitignore — gitignored runtime artefacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
danbri added a commit that referenced this pull request Apr 30, 2026
- SPARQL.JSON.Escape.fst: add verification-surface comment noting
  fs_byte_at / fs_byte_length are assume-val primitives realised in
  patch 89 (review comment #5).
- factoidal_http.ml: reword 'F*-verified' to 'F*-extracted' (review
  comment #4) — narrower claim, matches the actual trust boundary.

Not addressing in this PR:
- review #1 (perf: list/rev/string_of_list vs Buffer): byte-for-byte
  parity was the goal; perf is a separate concern. List the perf
  optimization as a follow-up issue if it shows up in profiling.
- reviews #2/#3 (extracted .ml not committed → STEP=compile brittle):
  consistent with project-wide pattern; .ml files are gitignored and
  regenerated by build-ocaml.sh extract. Out of scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
danbri added a commit that referenced this pull request Apr 30, 2026
- SPARQL.JSON.Escape.fst: add verification-surface comment noting
  fs_byte_at / fs_byte_length are assume-val primitives realised in
  patch 89 (review comment #5).
- factoidal_http.ml: reword 'F*-verified' to 'F*-extracted' (review
  comment #4) — narrower claim, matches the actual trust boundary.

Not addressing in this PR:
- review #1 (perf: list/rev/string_of_list vs Buffer): byte-for-byte
  parity was the goal; perf is a separate concern. List the perf
  optimization as a follow-up issue if it shows up in profiling.
- reviews #2/#3 (extracted .ml not committed → STEP=compile brittle):
  consistent with project-wide pattern; .ml files are gitignored and
  regenerated by build-ocaml.sh extract. Out of scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
danbri pushed a commit that referenced this pull request May 7, 2026
Adds formal/fstar/SHACL.Validation.fst as the Phase 1 skeleton for
the W3C SHACL Core sister track (epic #181). This commit is just the
F* source + glue patch stub; a follow-up wires the build.

Surface coverage (AST only):
  - Targets: T_Class, T_Node, T_SubjectsOf, T_ObjectsOf,
    T_ImplicitClass, T_Sparql.
  - Shapes: a single `shape` record covers NodeShape /
    PropertyShape (distinguished by `is_property` + `shape_path`).
  - Constraint components: 25 cases covering cardinality, datatype,
    nodeKind, class, in, hasValue, pattern, length, range,
    logical operators (Not / And / Or / Xone), property pairs,
    closed shapes, and sh:select.
  - Property paths: predicate, inverse, sequence, alternative,
    zero-or-more, one-or-more, zero-or-one.
  - Validation report: violation + validation_report records.

Entry points (assume val, rule #11(c) host call-out):
  - validate         : data -> shapes -> ML report
  - parse_shape_from_graph (Phase 2 makes this Tot)
  - eval_sparql_target_select (the long-term host call-out)

The corresponding glue patch lives at
formal/fstar/minimal_regrettable_glue_code_each_with_an_open_issue/
181_shacl_validate_stub.sh and is tracked by issue #181. Phase 1
patch is a no-op idempotent acknowledgement that injects a marker
comment into the extracted SHACL_Validation.ml — its purpose is
to discharge CLAUDE.md rule #3 (every assume val has a glue file)
without sneaking semantic logic into OCaml (rule #11). Phase 2
shrinks this to wire only eval_sparql_target_select.

Verified clean under F* 2025.12.15 + z3 4.13.3, no --lax. Out of
scope: build wiring (next commit), parse implementation,
validation evaluator, W3C test corpus runner.

https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1
danbri pushed a commit that referenced this pull request May 9, 2026
…patch

The F* rewrite_query_bnodes_pattern function (SPARQL11.Algebra.fst
line 3648) already implements blank-node-to-variable rewriting for
entailment regimes — it just wasn't wired in the runner. The runner
duplicated the same logic inline 7 times.

This commit:
  1. Replaces all 7 inline OCaml rewrite blocks (~210 lines) with
     a single call:
       { query with q_pattern =
         SPARQL11_Algebra.rewrite_query_bnodes_pattern query.q_pattern }
  2. Deletes 53_blank_node_variable_rewriting.sh entirely (the
     patch's job was to inject the same inline OCaml — now obsolete).

Section E retirement #3 (after #95 + #66 yesterday). Patches in
minimal_regrettable_glue_code_each_with_an_open_issue/:
  Start of yesterday: 15
  After #95:          14 (56108f9)
  After #66:          13 (07638ee)
  After #53:          12 (this commit)

Section E retirements: 3 / 6 confirmed VIOLATIONs cleared.

The F* function `rewrite_query_bnodes_pattern` was already in
SPARQL11.Algebra.fst since an earlier session — apparently
written but never wired. The OCaml inline version was kept as
a parallel implementation. Removing it gives a single
F*-verified path; the rewrite is now sound by construction
(F* type checks the AST traversal).

Refs: #53, #200 (Section E), #237 (recovery workstream).

https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1
danbri pushed a commit that referenced this pull request May 10, 2026
…#200 PR5, #67)

Migrates the byte-walking validation logic out of
formal/fstar/minimal_regrettable_glue_code_each_with_an_open_issue/
67_rdfxml_validation.sh into pure-F\* `XML.Wellformedness.fst`.

What moved to F\*:
- `is_valid_ncname` — XML 1.1 NCName production (NameStartChar
  followed by NameChar*); codepoint-aware via FStar.Char.char.
- `forbidden_node_element_names` / `forbidden_property_element_names` —
  RDF/XML §7.2.11 / §7.2.12 forbidden-name lists.
- `validate_rdf_id_attr` — checks any rdf:ID / rdf:nodeID attribute
  values are valid NCNames; returns option string.
- `check_conflicting_attrs_common` / `_node` / `_property` —
  RDF/XML §7.2.4..7.2.10 mutual-exclusion rules.

F\* verification clean (z3 4.13.3, no --lax / no --admit_smt_queries).

What stays in OCaml (rule-#11(a) glue):
- `Rdfxml_error` exception (the existing call sites in process_node_element
  / process_property_element raise it).
- `attrs_to_pairs` — converts `Parser_XML.xml_attribute list` to the
  F\*-shaped `(string * string) list` (pure record-to-tuple mapping).
- `option string -> raise Rdfxml_error` adapters.

The patch (67_rdfxml_validation.sh) shrinks from ~120 lines of
inline OCaml validation logic to ~50 lines of thin adapter code.
Per the corrected boundary-audit taxonomy, this patch moves from
`VIOLATION-SEM` to `ASSUME-IO`.

build-ocaml.sh + tests/unit/run-all.sh extended to thread the new
module through the F\* extraction pipeline AND the OCaml link order
(XML_Wellformedness must precede Parser_RDFXML).

Verification:
- W3C rdf-xml suite: 166 pass / 0 fail (zero regression).
- Unit suite: 8/8 files pass.

Per CLAUDE.md rule #11 + Iron Rule #3 (assume val gap closed).

https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1
danbri pushed a commit that referenced this pull request May 10, 2026
Per #200 plan: attempt the formal F* round-trip lemma for
DictWriter; admit-with-issue if SMT chokes.

- Adds `lemma_parse_serialize_dict` to
  RDF.CottasStore.DictWriter.fst.
- Statement: `parse_dict (serialize_dict tokens) == Some tokens`
  under the same overflow preconditions [serialize_dict] requires
  ([length tokens < 2^32] and the cumulative token-data offset
  fits in u64).
- Body is `admit ()`. The proof composes four foundation lemmas in
  RDF.Bytes (parse∘write inverses for u32_le, u64_le, parse_n_bytes,
  parse_string_of_length); none exist yet, and writing them needs
  FStar.List.Tot.Properties / FStar.String composition lemmas in
  scope. Tracked separately in #252.
- The CI hash-roundtrip witness in tests/unit/dict_writer_roundtrip.ml
  remains the empirical guarantee today (4 fixtures, SHA-256 pinned).

F* verifies clean (z3 4.13.3, no --lax, no --admit_smt_queries —
the `admit ()` is local to a single Lemma body, not a global escape).

Per CLAUDE.md Iron Rule #3 spirit (no silent gaps): the admit is
visible in the source, the issue references the four prerequisite
lemmas, and the empirical CI witness covers the format-stability
question on representative fixtures.

https://claude.ai/code/session_01Gid59KLYfGtaXLNfFWnPt1
danbri pushed a commit that referenced this pull request May 14, 2026
…y 4.13.3

F*'s upstream installer asks for both z3-4.8.5 and z3-4.13.3 because
the wider F* ecosystem (HACL*, EverParse, Vale) still targets 4.8.5
while newer libs are on 4.13.3. Factoidal pins 4.13.3 only — every
--z3version flag in this repo (Makefiles, build-ocaml.sh, iterate.sh)
says 4.13.3, and no module uses 4.8.5.

z3's role is upstream of extraction: it runs during F* verification,
not in the OCaml/JS/Wasm/C downstream toolchains. js_of_ocaml and
KaRaMeL both consume already-verified F* output and never invoke z3,
so the version pin is identical across all three extraction targets.

Changes:
- README Quick Start: note that the installer's 4.8.5 mention is for
  other ecosystem projects, not Factoidal.
- skills/fstar-env iron-rule #3: same clarification.
- skills/build-and-test: replace the stale "expected '4.8.5', got
  '4.13.3'" error-table row with the actual modern failure mode
  (opam's 4.15.x shadowing the binary release on PATH).
- docs/skills/validating.md: drop z3-4.8.5 from the tooling list.

https://claude.ai/code/session_01NVopnmZbHjwanF1asedHuM
danbri pushed a commit that referenced this pull request Jul 4, 2026
build-ocaml.sh gains the jsonld_runner target (sibling block to
rdfc10_runner, PARQUET_NATIVE_STUBS after COMMON_MODULES per the
http_client lesson). One OCaml comment in the runner spelled
F-star with a star-paren, which closes an OCaml block comment early -
the exact CLAUDE.md syntax-trap warning, now demonstrated in
production.

Measured baseline against the vendored W3C json-ld-api toRdf
manifest: 33 pass, 423 fail, 11 skip (out of 467). Positive-eval
17 pass of 345 (expanded-form only; every @context input scores FAIL
by design, not SKIP - anti-pattern #3); negative-eval 0 pass of 106
(the expanded-form parser is lenient); positive-syntax 16 pass of
16; 11 skips are option.specVersion json-ld-1.0. This is the
burn-down that context processing (Phases 3-4) eats into.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RAihmCsGHo3uyFJJcsUUJ9
danbri pushed a commit that referenced this pull request Jul 4, 2026
SPARQL11.IRI.Resolve rewritten as a direct RFC 3986 section 5.2/5.3
transliteration (dot-segment removal, network-path, query-only,
empty-reference resolution) - validated against the RFC's own 42-case
section 5.4 battery before porting. This is the first of the owner's
reusable-foundations modules (the RDF.IRI role): the old SPARQL-lite
heuristic was the root cause of 13+ JSON-LD failures, and SPARQL
itself only gets more conformant (631 pass, 0 fail held).

JSON-LD: top-level @graph wrapper no longer misread as a named graph
(5 flips); JSON literals via a documented RFC 8785 JCS subset (20
flips); @included with strict node-object validation (5 flips);
empty-IRI document-relative expansion. Measured: 287 pass, 169 fail,
11 skip (out of 467), zero regressions by per-test set-diff.

JSONLD.Loader.fst declares the documentLoader seam (assume val,
declaration-only, not yet in build lists); stub patch
275_jsonld_document_loader.sh per Iron Rule #3, tracked in issue
#275 - wiring remote contexts + @import through it is the next
slice and most of the remaining 169.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RAihmCsGHo3uyFJJcsUUJ9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants