·
200 commits
to master
since this release
What's new in 1.9.0
ProvSQL 1.9.0 brings broad support for subqueries outside FROM and for
outer joins, a redesigned probability engine built around a method
catalog with a cost-based chooser and user-requested guarantees, exact
probability evaluation for a wide range of HAVING aggregate
comparisons, SQL-faithful empty-group and NULL aggregation semantics,
native arithmetic on aggregation results, and a WebAssembly build that
powers the in-browser ProvSQL Playground.
Subqueries and outer joins
- Subqueries outside FROM.
EXISTS/NOT EXISTS,IN/NOT IN,
quantified comparisons (op ANY/op ALL, rowIN), scalar
subqueries, andARRAY(SELECT ...), correlated or not, are now
internally decorrelated and rewritten with the proper (m-)semiring
provenance: value bodies throughchoose()with an
at-most-one-row gate, aggregate bodies (count,sum,avg,
min,max) through the aggregate over the outer-join group,
semijoins and antijoins through count-predicate lowerings (so
NOT INcarries the same antijoin provenance as the equivalent
EXCEPT). The subquery body may involve a single
provenance-tracked relation or join several as a comma-separated
FROM list;SELECT DISTINCTbodies,ORDER BY ... LIMIT 1
(argmax) bodies, several coalesced subqueries, aggregate bodies
compared against outer columns, and untracked outer FROMs are all
handled. Bodies touching no tracked relation pass through to
PostgreSQL untouched. - Outer joins.
LEFT,RIGHTandFULLjoins are lowered to
their matched and null-padded antijoin arms, capturing the
non-monotone 0-match world;EXCEPT/EXCEPT ALLprovenance is
corrected to NOT-IN semantics. - SQL-faithful aggregate NULL handling.
count(expr),sum,
min,maxandavgnow ignore NULL inputs as SQL requires;
count(*)still counts rows.
Probability evaluation
- Method catalog and cost chooser. Probability-method dispatch is
rebuilt as a catalog of strategies under a unified cost-based
chooser (a lazy uniform-cost search over feature-acquisition and
method-execution costs, with calibrated constants and speculative
execution that budgets a candidate at the next-best method's cost
and escalates on overrun). - Guarantee-first evaluation. Request
exact, relative
(eps, delta), or additive(eps, delta)and let the chooser pick
the cheapest admissible method; every approximate method shares one
key=valueargument grammar and emits a machine-readable
approximation-guarantee NOTICE. The newprovsql.last_eval_method
GUC reports what actually ran, down to the resolved external tool
(compilation:d4,wmc:ganak). - New methods.
karp-luby(the DNF FPRAS, with the
Dagum-Karp-Luby-Ross self-adjusting stopping rule and stratified
sampling),stopping-rule(whole-circuit relative-error FPRAS),
sieve(exact inclusion-exclusion over monotone DNFs), and
d-tree(anytime probability interval bounds after
Olteanu-Huang-Koch, on arbitrary circuits, with the new
probability_boundsSQL function). - Exact HAVING aggregate probabilities. Closed-form and exact
evaluators forHAVINGcomparisons overCOUNT,SUM,MIN,
MAXandAVG: safe (hierarchical) joins at arbitrary depth,
cross-product joins,repair_keyBID blocks, branch-spanning sums
over UNION/EXCEPT contributors, numeric/float aggregates via
decimal scale-to-integer, constant arithmetic folded into the
threshold, and comparisons combining several aggregates resolved by
possible-worlds enumeration.COUNT(DISTINCT ...)in HAVING is fixed, and aggregates that no
exact arm covers fall back to an approximation-safe sampling route
rather than erroring.
Aggregation semantics and arithmetic
- Empty groups, SQL-faithfully. Scalar (ungrouped) aggregations
are now tagged in the circuit so true-on-empty predicates
(count(*) = 0,sum(x) IS NULL…) evaluate correctly in every
possible world;count(col)with NULLs,HAVING ... IS [NOT] NULL
on groups with NULL-valued rows, and moments/support ofagg_token
min/max(conditioned on a non-empty group, so they stay finite)
are all handled. - Arithmetic on aggregation results.
agg_tokengains native
+ - * /, unary-, and aggregate-vs-aggregate comparisons,
buildinggate_arithcircuits that carry the computed value (so
query results still displayvalue (*)), with comparisons over the
results evaluated exactly where possible.
SQL surface and usability
add_provenanceandcreate_provenance_mappingare idempotent
(NOTICE-and-return when already done).- New
setup_search_path()helper plus an installation-time advisory
whenprovsqlis missing from the defaultsearch_path; the
random_variabletouuidcast is demoted to ASSIGNMENT so it no
longer shadows operator resolution. - The SQL
probability_benchmarkhelper (added in 1.7.0) is removed;
ProvSQL Studio's per-tool-timeout probability benchmark supersedes
it. - Sessions that run
CREATE EXTENSION provsqlthemselves no longer
end up with a stale constants cache that silently disabled the
subquery rewrites until reconnect. provenance()inside a subquery expression now raises a clear
error, and hand-madeprovsqlcolumns are rejected instead of
crashing.
ProvSQL in the browser
- The extension compiles to WebAssembly and runs inside
PGlite entirely client-side: a single-process
circuit store (PROVSQL_INPROCESS_STORE) replaces the background
worker and shared memory, and subprocess-free builds
(PROVSQL_NO_SUBPROCESS) fall back to the in-process compiler. This
powers the ProvSQL Playground
(provsql.org/playground): the
full ProvSQL Studio, including its new notebook mode, with no
installation. Reproducible build underwasm/, driven by
make wasm/make playground.
Docker image
- The demonstration image now bundles the ganak and sharpsat-td
weighted model counters and ships the tutorial and case-study
databases pre-seeded (switchable from Studio's connection chip).
Robustness
- Fixed a heap overflow in the where-provenance column map (the
long-standing flaky macOS crash), a backend segfault on NULL
provenance-mapping values, and pathological-circuit recursion now
errors cleanly via stack-depth guards instead of crashing the
backend.