Cross-artifact conformance harness for the CIRIS federation stack — the substrate and fabric of CEWP, the CIRIS Epistemic Web Platform (pronounced "soup"): github.com/CIRISAI/CEWP · FSD. It doubles as the CEWP reference: the specs it conforms against are vendored under reference/.
This repo verifies that independently-distributed CIRIS wheels coexist correctly in a single Python process. The CIRIS stack ships as five separate PyO3 extension modules:
- `ciris-persist`: substrate (federation_keys directory, outbound queue, etc.)
- `ciris-verify` (`ciris-keyring` + `ciris-crypto`): hybrid Ed25519 + ML-DSA-65 signing
- `ciris-edge`: federation wire transport
- `ciris-node-core` (planned): node-mode serving + WA UX
- `ciris-lens-core` (planned): capacity-score + detector logic
These wheels are built independently but designed to cohabit in one Python interpreter — the CIRIS 3.0 cohabitation EPIC (CIRISPersist#85). Cohabitation is its own engineering surface: shared substrate handles, cross-module type identity, version-skew compatibility, import order. Per-crate unit tests and per-crate integration tests cannot cover this surface — they all run in single-binary test environments where cross-module problems vanish by construction.
This harness exists to close that test gap.
| Test class | Scope | Lives in |
|---|---|---|
| Unit | In-crate invariants | each crate |
| Integration | Crate against its dependencies, one binary | each crate |
| Conformance | Artifacts (wheels) conforming to a cross-artifact contract | this repo |
The name comes from the W3C / Khronos conformance-suite tradition: independent implementations of a contract are exercised against a separate, neutral suite that proves they conform to the contract.
The suite is partitioned into two tiers (pytest -m substrate / pytest -m fabric):
- Substrate: the independently-built ciris-* wheels cohabit in one process, and each primitive conforms to the CEG contract (cohabitation scenarios + the CEG CCP/CCC/CCS profiles).
- Fabric: the emergent federation behaviour, namely the replication discipline (per-actor eviction, eviction sweeper, trust-threshold intake) and the scaling factors (`effective_trust_set_multiplier`, the `k_eff` corridor, retention) from FEDERATION_SCALING_MODEL. This is how the CEWP "we don't need big tech" claim becomes a checked property.
See docs/FABRIC_CONFORMANCE.md for the tier coverage matrix.
Beyond cohabitation, this harness verifies the three CEG 0.1 conformance profiles (§0.2) — CCP (producer), CCC (consumer), CCS (substrate). See docs/CEG_CONFORMANCE.md for the profile definitions, the §0.5 fractal-self reading discipline, and a coverage matrix tracking which CEG paths are tested today vs. pending an upstream surface. Profile tests carry the ceg marker plus ccp/ccc/ccs; run one with pytest -m ccc.
# From a checkout of this repo:
pip install -e ".[dev]"
pytest
# Against a specific wheel matrix (CI default):
pytest --matrix matrices/current.yaml
# Single scenario:
pytest tests/test_030_cohabitation_init.py -v

Each scenario runs in a fresh Python subprocess because PyO3 type registration is process-global: once a module is imported, you cannot rewind it (the mechanics are in the first drop-down below).
The CIRIS stack ships as several separate libraries (storage, crypto, networking) that are built and released independently but are meant to run together inside one program. This suite checks that they actually do. Expand a section for the details.
Why each test runs in its own fresh Python process
These libraries are compiled extensions (written in Rust). When Python imports one, the import permanently registers things into the running interpreter — there's no way to "un-import" it afterward. So a test that loaded one library would leave traces that contaminate the next test.
To keep every check clean, the harness launches a brand-new Python process for each scenario, hands it a short script, and reads the result back as JSON. The test runner file itself deliberately imports none of the CIRIS libraries, so nothing leaks in by accident.
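The launch-and-read-back loop described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual runner: `run_scenario` and `SCENARIO` are hypothetical names, and the stand-in script imports `math` where a real scenario would import the CIRIS wheels under test.

```python
import json
import subprocess
import sys
import textwrap


def run_scenario(script: str, timeout: float = 60.0) -> dict:
    """Run `script` in a fresh interpreter and parse its last stdout line as JSON."""
    proc = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"scenario exited {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout.strip().splitlines()[-1])


# Hypothetical scenario body: only the child process performs the import,
# so the parent (the test runner) never loads the module under test.
SCENARIO = """
    import json, sys
    import math  # placeholder for a compiled CIRIS extension
    print(json.dumps({"ok": True, "imported": "math" in sys.modules}))
"""

result = run_scenario(SCENARIO)
```

Because the child exits after printing its JSON line, whatever it registered in its interpreter disappears with it, and the next scenario starts from a clean slate.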
What "cohabitation" means — and why the libraries' own tests miss these bugs
"Cohabitation" is just the situation where all these independently-shipped libraries run side by side in one process — which is exactly how they run in production.
That situation has its own failure modes that don't exist anywhere else: two libraries can each define what looks like "the same" type, but the program treats them as different and rejects the hand-off; they can fight over a shared resource; the order you load them in can matter. Each library's own test suite compiles everything into a single combined build, where these cross-library problems simply can't happen. This harness installs the real, separately-published libraries together — the only place those bugs actually surface.
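The "same type, but not really" failure mode can be reproduced in pure Python. The sketch below is an analogy only: `lib_a`, `lib_b`, and `Engine` are hypothetical stand-ins for two separately built extension modules, not the real CIRIS classes.

```python
import types


def make_module(name: str) -> types.ModuleType:
    """Build a throwaway module that defines its own `Engine` class."""
    module = types.ModuleType(name)
    exec("class Engine:\n    pass\n", module.__dict__)
    return module


# Two independently "built" modules, each with a class named Engine.
lib_a = make_module("lib_a")
lib_b = make_module("lib_b")

engine = lib_a.Engine()

same_name = lib_a.Engine.__name__ == lib_b.Engine.__name__  # names match
same_type = lib_a.Engine is lib_b.Engine                    # objects do not
accepted_by_b = isinstance(engine, lib_b.Engine)            # hand-off is rejected
```

A single combined build has only one `Engine` definition, so this mismatch has nothing to appear between; it only becomes possible once the definitions live in separately built artifacts.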
The two kinds of checks: the building blocks vs. the whole network
Two test groups, selectable with pytest -m substrate or pytest -m fabric:
- Substrate — do the building-block libraries load and work together, and does each one correctly produce, read, and store the shared message format the components use to talk to each other?
- Fabric — does the network behave correctly as a whole: the rules for which data a node keeps, whose data it's allowed to delete, when it drops stale data, and the math behind the claim that this scales to internet size on ordinary hardware.
Detailed coverage tables: docs/CEG_CONFORMANCE.md (building blocks) and docs/FABRIC_CONFORMANCE.md (network).
What "conforming" means for each component (producer / consumer / storage)
The components talk to each other using a shared, signed message format — the CEG ("CIRIS Epistemic Grammar"; full spec under reference/CEG/). Every claim ("this content is genuine," "I trust this peer") is a signed message. A component can play three roles, and the spec says what correct means for each:
- Producer — writes well-formed messages and signs them properly.
- Consumer — checks those signatures and applies the agreed rules before acting on a message.
- Storage — keeps and forwards messages without corrupting them (verifies content against its hash, doesn't silently duplicate, etc.).
Signatures use both a standard algorithm and a post-quantum one, so they stay valid for decades.
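As a rough picture of the storage role's two properties (verify content against its hash, do not silently duplicate), here is a small in-memory content-addressed store. It is a sketch under assumptions: `BlobStore` is a hypothetical class, and SHA-256 is chosen for illustration rather than taken from the CEG spec.

```python
import hashlib


class BlobStore:
    """In-memory content-addressed store keyed by full SHA-256 hex digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keying by digest makes a repeated put a no-op instead of a duplicate.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so corruption is detected rather than forwarded.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"blob {digest[:12]} failed integrity check")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
key = store.put(b"signed message bytes")
store.put(b"signed message bytes")  # same content maps to the same key
```

After both `put` calls the store holds a single entry, and a `get` on a tampered entry raises instead of returning bad bytes.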
How the tested versions are pinned
matrices/current.yaml lists the exact library versions expected to work together right now; CI installs precisely those into a clean environment. To move it forward: bump a version, run the tests, and update any test whose expected-failure now passes.
Why tests are marked "expected failure" instead of skipped
A skipped test silently hides untested code, which is easy to mistake for "it works." This suite never skips. A test either passes against the real library, or it's marked an expected failure linked to a specific bug report we've filed upstream.
The rule: when a library is missing a feature or has a bug, we report it upstream and mark the test expected-to-fail — we don't paper over it with a workaround that tests something easier. The moment the upstream fix ships, that test automatically becomes a real, enforced check.
Running inside the phone app (the Android build)
The CIRIS agent packages three of these libraries into a single Android app and runs them on the phone. Android does this with a tool that bundles the compiled libraries directly and runs them on its own bundled Python — skipping the usual version checks. That only works because the libraries are built against Python's stable binary interface, so one build runs across Python versions.
These tests confirm that's actually true, that the libraries cope with the phone's secure-key hardware, and that startup produces a valid network identity. CI also runs on ARM chips (what phones use) and reproduces the Android bundling trick, so a break shows up before it reaches an app store.
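One cheap pre-check for the stable-interface claim is the wheel's filename: under the standard wheel naming convention, the last three dash-separated fields are the Python, ABI, and platform tags. The sketch below uses hypothetical helper names and hypothetical filenames; a matching tag is only a hint, and it does not replace actually loading the library on the target.

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform gives 5 or 6 fields.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_abi3(filename: str) -> bool:
    """True when the ABI tag (possibly a dotted set) includes abi3."""
    return "abi3" in wheel_tags(filename)[1].split(".")


# Hypothetical filenames, for illustration only.
stable = "ciris_edge-0.0.0-cp39-abi3-manylinux_2_28_aarch64.whl"
versioned = "ciris_edge-0.0.0-cp312-cp312-manylinux_2_28_x86_64.whl"

stable_is_abi3 = is_abi3(stable)
versioned_is_abi3 = is_abi3(versioned)
```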
The specs this suite checks against (the reference copies)
reference/ holds copies of the specifications this suite verifies: the platform overview (CEWP), the scaling model and the small program that computes it, the message-format spec, and the research paper behind the scaling claims. These are snapshots for convenience — reference/README.md records exactly where each one came from.
Any CIRIS-stack repo can run this harness against its just-built artifact + the pinned sibling wheels:
# In e.g. CIRISEdge/.github/workflows/ci.yml
jobs:
  conformance:
    needs: [pyo3-wheel]
    uses: CIRISAI/CIRISConformance/.github/workflows/run-against-wheels.yml@main
    with:
      under-test-wheel: ciris_edge-wheel-linux-x86_64   # the just-built artifact
      under-test-package: ciris-edge
      matrix: matrices/current.yaml                     # pinned siblings

The reusable workflow installs the under-test wheel + pinned siblings into a clean venv and runs pytest. A regression in the under-test repo fails its own CI before merge.
- Identify the cross-artifact invariant under test (e.g. "import order doesn't affect engine type identity").
- Add a single Python file `tests/test_NNN_short_name.py` with one or more `pytest` functions.
- If the case requires specific wheel versions, parametrize via the `wheels` fixture (`conftest.py`).
- If the case is a known-failing regression seed (like persist#109 was for cohabitation init), mark it `@pytest.mark.xfail(strict=True, reason="...")` so the harness stays green while the bug is open and flags the unexpected pass once the upstream fix lands.
Each test file is self-contained — no shared imports between test files — so any failure reproduces in isolation and can be referenced verbatim in a bug report.
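For the known-failing case in the last step, a test file might look roughly like this. The test name, the issue reference in `reason`, and the failing assertion are all placeholders. With `strict=True`, pytest reports an unexpectedly passing test as a failure, which is what surfaces an upstream fix as soon as it lands.

```python
import pytest


@pytest.mark.xfail(strict=True, reason="hypothetical upstream issue: example#0")
def test_known_upstream_gap():
    # Stand-in for a real cross-artifact check that fails until upstream ships a fix.
    feature_available = False
    assert feature_available
```

Once the marked test starts passing, the marker is removed and the test becomes an ordinary enforced check.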
| File | Tier | Verifies | Status |
|---|---|---|---|
| `test_010_solo_imports.py` | substrate | Each ciris-* wheel imports cleanly alone | ✅ |
| `test_020_pairwise_imports.py` | substrate | Any two ciris-* wheels coexist in one process | ✅ |
| `test_030_cohabitation_init.py` | substrate | `edge.init_edge_runtime(persist.Engine)` capsule handshake | ✅ |
| `test_040_pyclass_identity.py` | substrate | Cross-module PyClass identity invariants | ✅ |
| `test_050_send_receive.py` | substrate | Send/receive surface; ephemeral refuses cleanly; loopback + durable xfail (edge#50) | ✅ |
| `test_060_version_skew.py` | substrate | Compatible / incompatible version-pair matrix | xfail (needs clean-venv fixture) |
| `test_070_hsm_transport_identity.py` | substrate | `hardware_hsm_only` cohab init → 32-byte transport identity | ✅ |
| `test_080_mobile_target.py` | substrate | Android/Chaquopy bundling (abi3), keystore taxonomy, bring-up gate | ✅ |
| `test_100_ccc_hybrid_verify.py` | substrate (CCC) | Hybrid-signature verify policy matrix | ✅ |
| `test_110_ccs_blob_integrity.py` | substrate (CCS) | Blob full-SHA integrity + signed round-trip | ✅ |
| `test_120_ccp_canonical_bytes.py` | substrate (CCP) | Canonical-bytes determinism + sign/verify round-trip | ✅ (§0.5 reject xfail persist#126) |
| `test_130_multimedia.py` | substrate + fabric | CEG 0.3 multimedia: media blob storage, perceptual-hash gate, takedown scheduling, key-grant retire, budget eviction | ✅ (takedown local-holder xfail persist#130) |
| `test_200_fabric_eviction.py` | fabric | Per-actor eviction + withdraws, sweeper, trust threshold | ✅ (holders/gate xfail persist#130/#129) |
| `test_210_fabric_scaling_factors.py` | fabric | Scaling-factor contract (multiplier curve, `k_eff`, retention) | ✅ |
When CIRISNodeCore / CIRISLensCore / CIRISRegistry start shipping wheels, add them to:
- `matrices/current.yaml`: pin the version
- `conftest.py::ALL_WHEELS`: register in the pairwise import test
- New test files for the crate-specific cohabitation invariants
The harness shape doesn't change.
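To see why registering a wheel is enough to extend the pairwise import coverage, consider this sketch. The list contents and import names are assumptions (the real registry lives in `conftest.py::ALL_WHEELS`), and `pairwise_cases` is a hypothetical helper.

```python
import itertools

# Hypothetical registry contents; import names are assumed for illustration.
ALL_WHEELS = ["ciris_persist", "ciris_verify", "ciris_edge"]


def pairwise_cases(wheels: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of wheels, each to be imported together in one process."""
    return list(itertools.combinations(wheels, 2))


cases_before = pairwise_cases(ALL_WHEELS)
# Registering one more wheel adds a pair against every existing wheel.
cases_after = pairwise_cases(ALL_WHEELS + ["ciris_node_core"])
```

`itertools.combinations` yields unordered pairs; a scenario that also cares about import order could use `itertools.permutations` instead, at the cost of twice as many cases.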
AGPL-3.0-or-later (matches the broader CIRIS stack).