v13.0.0
Three intertwined phases — DuckDB swap (CA), test-layer architecture
rewrite (CB), Oracle 19c image self-build (CB.14) — ship together as
v13.0.0. Breaking changes are all on the test/dev surface: SQLite is
gone, the runner reads typed markers instead of file lists, and CI is
collapsed onto a single self-hosted runner. Production dialects
(PostgreSQL, Oracle) and the four bundled dashboards are unchanged at
the operator surface.
Headline impact:
- DuckDB local-iteration backend with 54.9× faster bundled
refresh at ~1M rows vs the SQLite baseline (4.6s vs 252.9s) and
byte-identical matview output across all 20 L1/Investigation
matviews. - Test-layer architecture rebuilt around typed markers
(@tier(...),@dialects(...),@writes(...)); runner is a thin
--tier=X --dialect=Ydispatcher. Hardcoded file lists in
_layer_commanddeleted (CB.6). 4011 unit tests + 69 db tests + 137
qs_browser tests pass on the new dispatch shape. - Provider-marked isolation (CB.7 —
isolated_cfg/
IsolationScope) closes the long-standing DuckDB pytest-xdist
intra-cell file-lock bug (#199) by branching on test-declared write
intent. Dialect.SQLITEdeleted +aiosqlite/aiosqlitepool
dropped from[prod]extras. SQLite was a 25k-row local-iteration
convenience that DuckDB replaced; keeping it incurred parallel-test
matrix overhead. Manual SQLite triage works on commits ≤ v12.0.2
(--dialects=sl); on v13+ the dialect is gone.- AWS RDS Aurora decommissioned. Production-parity testing moves
to Docker containers on the self-hosted CI runner; QS data sources
reach the runner via thehotchkiss.ioDDNS forward documented in
the CB.10 spike. ~$45/mo of idle Aurora cost removed. - In-repo Oracle 19c image build (
tools/oracle-19c/build.sh)
built from Oracle's ownoracle/docker-imagesrecipe + the
operator-downloaded binary, with adocker commitpre-init pass
baking the initialized DB into the image (~16× cold-start speedup:
240s DBCA → 15s on arm64 Mac, 26-28s on WSL2 amd64). Replaces the
staledoctorkirk/oracle-19cimage (amd64-only, unmaintained ~2yr)
the CB.7-followup spike used. - CI collapsed onto a single self-hosted job.
ci.ymlruns
./run_tests.sh up_to=qs_browser;e2e.ymldeleted;release.yml
stays tag-driven and onubuntu-latestper the CB.7-followup
secret-isolation policy.
Migration
For operators on v12.0.2:
- DuckDB local cells are the default —
cfg.yaml's
dialect: duckdb+demo_database_url: duckdb:///<path>shape is
the canonical local-iteration form (operator workflow snippet
below). - No code change required for production PG / Oracle deploys. The
shape ofrecon-gen schema apply / data apply / json applyis
unchanged; only the local-dev backend swapped. - If you pinned
recon-gen[prod]with sqlite: that extra is
gone. Either drop the pin (no replacement needed — sqlite/aiosqlite
weren't operator-facing) or replace withrecon-gen[prod](just
PostgreSQL + Oracle). - Self-hosted CI runners must install AWS CLI v2; the
[interop]and[automount]flags in/etc/wsl.conflock down
Windows-host visibility (seetools/oracle-19c/README.mdfor the
WSL2 setup notes).
Operator workflow (unchanged at the surface)
cat > cfg.yaml <<EOF
aws_account_id: "111122223333"
aws_region: "us-east-1"
dialect: duckdb
demo_database_url: "duckdb:///./demo.duckdb"
deployment_name: "my-demo"
db_table_prefix: "my_demo"
studio_enabled: true
EOF
recon-gen schema apply --execute -c cfg.yaml --l2 tests/l2/spec_example.yaml
recon-gen data apply --execute -c cfg.yaml --l2 tests/l2/spec_example.yaml
recon-gen studio -c cfg.yaml --l2 tests/l2/spec_example.yaml
# http://localhost:8765Phase CA — DuckDB swap (engine-level detail)
DuckDB joins as the new default local-iteration backend alongside the
existing PG / Oracle production targets. The CA.0 spike measured a
54.9× bundled-refresh speedup vs the SQLite baseline at ~1M base-tx
rows (4.6s vs 252.9s) and a 59% smaller on-disk footprint, with
byte-identical matview output across all 20 L1/Investigation
matviews. Engine-level audit: docs/audits/ca_0_duckdb_spike.md.
What landed:
Dialect.DUCKDBalongside PG/Oracle/SQLite incommon/sql/dialect.py;
every helper has a DuckDB arm.duckdb>=1.1is now a core dependency
(not extras — everyrecon-geninvocation imports the enum).- Runner local cell swapped
sl→duinexpand_full()and
DEFAULT_DIALECTS. CI'se2e-sqlitejob becamee2e-duckdb.
--dialects=slstill parses (manual SQLite triage) but no longer
appears in the default 13-cell matrix. - Studio + App2 runs end-to-end against DuckDB.
cfg.yamlcarries
dialect: duckdb+demo_database_url: duckdb:///<path>; schema
apply / data apply / studio boot are operator commands unchanged. All
four dashboards (L1 / L2FT / Investigation / Executives) + the
/etl/surface + the/l2_shape/editor surfaces render. - Async pool (
_AsyncDuckdbPool) uses the one-root-connect+
per-acquire-cursor()pattern. DuckDB rejects a secondconnect()
to the same file from the same process (Unique file handle conflict);
cursor()forks a thread-safe sub-connection (verified with 8
parallel threads). - Multi-process pytest safety via
RECON_GEN_DB_READ_ONLY=1. The
runner injects this for db / app2 / browser layers when the cell URL
isduckdb://; pytest-xdist workers all open read-only against the
pre-seeded.duckdbfile (per DuckDB's read_only docs:
"Read-only mode is required if multiple Python processes want to
access the same database file at the same time"). Production CLI
invocations (schema/data apply) run before pytest dispatches under
sequential variant-seed steps that don't see the env and continue
to open read-write. - Schema sequences for DuckDB's
entrycolumn (DuckDB has no
BIGSERIAL-alias; auto-increment viaCREATE SEQUENCE+
DEFAULT nextval('<seq>')). - Paramstyle rewrite:
:name→$namefor DuckDB in
_sql_executor(DuckDB's Python driver rejects:name). - BZ.0 SQLite scratch-table workaround DELETED. DuckDB's vectorized
executor handles the correlatedSUM(...) WHERE posting <= day
subquery natively (8% FASTER than the scratch form at ~1M rows on
DuckDB per the CA.0 spike). 88 lines deleted, 18 added. - Semantic-lock files for DuckDB:
tests/data/_semantic_locks/<instance>.duckdb.json
alongside the existing.sqlite.jsonentries. Violation sets are
byte-identical (BZ.4 + CA.0 audit); onlyscenario_fingerprint.dialect
differs.
Operator workflow against this HEAD:
cat > cfg.yaml <<EOF
aws_account_id: "111122223333"
aws_region: "us-east-1"
dialect: duckdb
demo_database_url: "duckdb:///./demo.duckdb"
deployment_name: "my-demo"
db_table_prefix: "my_demo"
studio_enabled: true
EOF
recon-gen schema apply --execute -c cfg.yaml --l2 tests/l2/spec_example.yaml
recon-gen data apply --execute -c cfg.yaml --l2 tests/l2/spec_example.yaml
recon-gen studio -c cfg.yaml --l2 tests/l2/spec_example.yaml
# http://localhost:8765SQLite still works as a manual opt-in (--dialects=sl for the
runner, dialect: sqlite in cfg). Phase CB (next) restructures the
test-layer architecture per docs/audits/cb_test_layers_update.md —
includes per-xdist-worker DuckDB isolation (:memory: per worker),
broader test-fixture migration off SQLite, the Dialect.SQLITE removal,
and the aiosqlite/aiosqlitepool extras cleanup.
3246 unit tests + 8 semantic-lock tests + the sp_du_lo app2 cell all
pass. Local SQLite vs DuckDB wall-clock at the spec_example scale (~12k
base tx): SQLite db 4.41s + app2 18.47s, DuckDB db 7.15s + app2 15.44s
— essentially equivalent at small scale; the engine perf gap manifests
at large scale per the CA.0 spike measurements.
CA.11 — pyarrow-based seed apply fast path (opt-in via [prod]).
The seed-text → multi-row-VALUES coalescer (CA.10) cut sasquatch_pr
apply from ~179s to ~21s on M1 — but the rest of the wall was
Python-bound: 8.7s in the comment-aware statement splitter and ~10s
re-rendering parsed VALUES tuples back into multi-row VALUES literal
SQL for DuckDB to re-parse. CA.11 takes a structural shortcut: a
single re.finditer scan over the seed text extracts INSERTs
directly (skipping the 8.7s splitter), _parse_simple_values walks
each VALUES body once, and consecutive same-shape rows flush via
pa.Table.from_pylist + con.register + INSERT SELECT — DuckDB's
zero-copy Arrow ingest path. Measured 21.8s → 2.3s on M1 sasquatch_pr
(8.5× speedup), byte-identical DB state vs the CA.10 path (row
counts + content hashes match on both base tables; matview violation
set unchanged).
pyarrow is added to [prod] extras (pyarrow>=14) — same opt-in
posture as oracledb for Oracle. Operators on the minimal [dev]
install transparently fall back to CA.10's multi-row VALUES path; no
new hard dep. CA.12 (PG via adbc_ingest) and CA.13 (Oracle via
Connection.direct_path_load, added in oracledb 3.4.0) are filed
to extend the same pyarrow pipe to the other production dialects
once their integration testing lands.
CA.13 — Oracle direct_path_load fast path. Oracle was the dialect
with the most leverage from the pyarrow work: the current INSERT-ALL
batched coalescer takes ~10 minutes on a 131k-row sasquatch_pr seed
(measured 621s against Docker Oracle 23 + oracledb 3.4.2 thin mode);
the only viable production-scale seed apply was "schedule overnight."
CA.13 routes Oracle seed apply through Connection.direct_path_load,
added in oracledb 3.4.0 (Oct 2025) — a server-side bypass of the SQL
parser that writes blocks directly above the High Water Mark. The
implementation mirrors CA.11's structure: single re.finditer scan
of the seed SQL, group consecutive same-shape rows by (table, cols),
build a typed pyarrow Table per group (column types discovered via
all_tab_columns, cached by (conn, schema, table, col-tuple) —
the col-tuple is part of the key because the same Oracle table can
be the target of multiple INSERT shapes in one apply), then flush via
conn.direct_path_load(schema_name, table_name, column_names, data).
Non-INSERT residual (DDL, SET commands) flows through the existing
_execute_oracle_stmt_with_lock_retry between batches, preserving
script ordering. Measured 621.33s → 6.44s on the 131k-row
sasquatch_pr Oracle seed — 96.53× speedup; business-key hashes
identical to the existing path. (Identity column ENTRY values differ
by design — Oracle's INSERT-ALL with IDENTITY reuses values across
batches, a known footgun documented in [[project_oracle_19c_compat]];
direct_path_load handles identity natively.)
oracledb floor bumped from >=2.0 to >=3.4.0 in [prod] to pull
the direct_path_load method into every production install. The
dispatch in execute_script auto-detects via hasattr(conn, 'direct_path_load') + pyarrow availability; installs with older
oracledb (or without pyarrow) transparently fall back to the existing
INSERT-ALL coalescer. CA.12 (PG) stays in backlog with the rationale
documented — PG was never the Python-bound bottleneck Oracle and
DuckDB were (psycopg's cur.execute(big_script) already batches PG
INSERTs efficiently; measured win was only 1.95×, not worth the
complexity at current scale).
Phase CB — Typed test marks + Docker-to-AWS bridge + SQLite removal
CB rebuilt the test-layer architecture around test-declared markers
instead of operator-maintained file lists in the runner. The CB audit
at docs/audits/cb_test_layers_update.md is the destination doc;
implementation landed across CB.0 – CB.12.
Marks infrastructure (CB.0 – CB.6):
tests/_marks.pydeclares the typed enums + decorators:
Tier.{UNIT,APP2,DB,QS_API,QS_BROWSER},@tier(...),@dialects(...),
@writes(...),@needs(...). Each tier got migrated incrementally
(CB.2 – CB.5), with a progressive lint ratchet enforcing coverage
per tier.@inputs(*nodeids)(CB.5 addendum) — typed cross-test
dependency marker used by the 4-way audit-dashboard agreement
pipeline (scenario_plants ⊆ direct_matview_SELECT == QS == App2).
Replaces the previous fragile string-based wiring.- Hardcoded file lists deleted (CB.6). The runner's
_layer_commandnow reads--tier=X --dialect=Yand dispatches
pytest with the matching marker.MIGRATED_TIERS == all_tiers.
Provider-marked isolation (CB.7):
isolated_cfgfixture +IsolationScope.AGREEMENT_AUDIT/
INV_PIPELINEetc. branch isolation based on what the test
declares it writes, not on cell-shared prefixes. Closes #199
(DuckDB pytest-xdist intra-cell file-lock contention) by giving
each writer its own DuckDB file; readers share the producer's
output via theisolation_consumer(scope)decorator.- Producer + consumer share a scope-pinned short-hash prefix
derived from(nodeid, l2_instance, dialect, worker_id)— same
prefix → same isolated cfg → producer's writes are visible to
consumers without cross-test leakage. - 10 sibling test files migrated; the qs_browser audit-agreement
chain (test_audit_dashboard_agreement.py) reorganized so
producers write artifacts underruns/<id>/<tier>/<inv>_<view>_*.json
and consumers re-key by tier directory rather than by global prefix.
SQLite removal (CB.8 – CB.9):
Dialect.SQLITEenum entry deleted. Runner'ssldialect arm
removed. ~600 lines of SQLite-specific seed-generator code, custom
aggregate registration (STDDEV_SAMPwas missing in stdlib
sqlite3), and matview emulation removed.aiosqlite+aiosqlitepooldropped from[prod]extras. They
were carried for the old async pool shape that DuckDB replaced.tests/schema/test_l2_schema_sqlite.py→ renamed for DuckDB;
several SQLite-only unit tests deleted outright (CB.8.c – CB.8.h).
Docker-to-AWS bridge (CB.10 – CB.12):
hotchkiss.ioDDNS forward (CB.10 spike): QuickSight data
sources point athotchkiss.io:<port>, which routes to the dev /
CI runner's Docker PG over the same connection (PG + Oracle in
prod parity, no Aurora cluster idle cost). Cfg knob
qs_disable_pg_ssl: truefor the local-Docker endpoint.- Two-cfg-per-cell pattern (CB.11.b):
runs/<id>/<cell>/cfg/local.yaml
points at127.0.0.1:<container-port>;qs.yamlpoints at
hotchkiss.io:<forwarded-port>. Local layers (db / app2) use the
first; QS-touching layers (deploy / qs_api / qs_browser) use the
second. Exists because QS (us-east-1) and local pytest workers
need different hostnames pointing at the same Docker container. - CI collapsed (CB.11.c):
ci.ymlruns a single self-hosted
test-everythingjob driving./run_tests.sh up_to=qs_browser.
e2e.ymldeleted.release.ymlstays tag-driven + on
ubuntu-latestper the CB.7-followup secret-isolation policy. - AWS RDS Aurora removed (CB.12).
aws_rdsmodule deleted;
start-db-cluster/stop-db-clusterCLI verbs gone. ~$45/mo of
idle cluster cost reclaimed.
Phase CB.14 — In-repo Oracle 19c image build
Replaces the stale doctorkirk/oracle-19c image (amd64-only,
unmaintained ~2 years, broken on Apple Silicon under QEMU) with a
build-it-yourself recipe at tools/oracle-19c/.
build.shclones Oracle's ownoracle/docker-imagesat a
pinned upstream SHA, drops in the operator-downloaded Oracle 19c
binary zip, runs the upstreambuildContainerImage.sh -v 19.3.0 -e -i,
and retags asrecon-gen/oracle-19c:local. Architecture is implicit
in which binary is present (arm64LINUX.ARM64_1919000_db_home.zip,
amd64LINUX.X64_193000_db_home.zip).- Pre-init pass boots the freshly-built image once, waits for
DATABASE IS READY TO USE, runsSHUTDOWN IMMEDIATEvia sqlplus,
thendocker commits the writable layer back onto the same tag.
Subsequent boots find existing data files under
/opt/oracle/oradata/$ORACLE_SIDand skip DBCA entirely. Measured:
cold-start drops from ~240s to 15s on Apple Silicon and 26-28s on
WSL2 native amd64 (~16× speedup). - Runner adoption auto-prefers
recon-gen/oracle-19c:localwhen
Docker reports it, withgvenzl/oracle-free:23-faststart(Oracle
23ai) as a one-shot-warning fallback when the local image isn't
built.RECON_GEN_ORACLE_IMAGEenv override pins a specific tag. - Three latent oracledb bugs fixed that 23ai's leniency hid but
19c's stricter behavior surfaced:- ORA-26085 (
direct_path_loadrequires no active txn) —
conn.commit()immediately before each flush. - ORA-00933 (thin-mode rejects trailing
;) — strip in the
INSERT-fallback path. - ORA-01843 (NLS_DATE_FORMAT mismatch on the official 19c image's
defaultDD-MON-YY) — pin session NLS to ISO at connect.
- ORA-26085 (
- Oracle19cContainer subclass (in
runner.py) bridges three
gvenzl-vs-Oracle-official divergences:ORACLE_PWD(not
ORACLE_PASSWORD),ORACLE_PDB=FREEPDB1(default isORCLPDB1),
900swait_for_logstimeout (testcontainers' 120s fires
mid-init on the cold-start).
Tests
- 4011 unit tests + 12 skipped (~3.5 min prelude under WSL2 amd64).
- 69 db-layer tests + 12 skipped on the Oracle 19c image
(sp_or_locell); 137 qs_browser tests + 34 skipped + 15 xfailed
on the spec_example cell. - 8 semantic-lock fixtures cover DuckDB byte-identity vs PG / Oracle
for the 20 L1 + Investigation matviews.