XorqStatPipeline issues 51 SQL queries per widget construction (should be 1-2)

## Summary

Constructing `XorqBuckarooWidget` issues **51 separate SQL queries** to the backend for an 8-column dataframe (50k rows), and **87 queries** for a 26-column dataframe. This is the dominant cost of widget construction (700–800 ms vs polars's 60–100 ms doing the same work in-memory).

The expectation, based on how polars does it, is that the entire stats pipeline collapses into **one** complex SQL query (or at most a small constant number).

## Repro

```bash
.venv/bin/python -W ignore::DeprecationWarning -c "
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import xorq.api as xo
from buckaroo.pluggable_analysis_framework.xorq_stat_pipeline import XorqStatPipeline
from buckaroo.xorq_buckaroo import XorqBuckarooWidget
import sys, traceback
sys.path.insert(0, 'scripts/perf')
from perf_data import make_pandas
from collections import Counter

orig = XorqStatPipeline._execute
sites = []
def counting(self, q):
    stack = traceback.extract_stack()
    site = next((f'{f.filename.split(\"/\")[-1]}:{f.lineno}:{f.name}'
                 for f in reversed(stack)
                 if 'buckaroo' in f.filename and 'xorq_stat_pipeline' not in f.filename), '?')
    sites.append(site)
    return orig(self, q)
XorqStatPipeline._execute = counting

pdf = make_pandas(50_000)
con = xo.connect()
table = con.create_table('t', pdf)
XorqBuckarooWidget(table)
print(f'total queries: {len(sites)}')
for site, n in Counter(sites).most_common():
    print(f'  {n:3}  {site}')
"
```

## Output (datafusion backend, 50k × 8)

```
total queries: 51
   40  xorq_stats_v2.py:245:_categorical_histogram
    6  xorq_stats_v2.py:213:_numeric_histogram
    4  dataflow.py:421:_get_summary_sd          # the batched-aggregate query
    1  dataflow.py:345:setup_options_from_analysis
```

Same numbers on duckdb backend.

On 50k × 26 (Boston restaurant inspections head): **87 queries**.

## Two problems stacked

### A. Pipeline runs the full query set 4× per widget construction

`dataflow.py:421:_get_summary_sd` is called **4 times**. Each call re-runs the entire stat pipeline — meaning 4 batched aggregates and ~4× the per-column histogram queries. Likely a traitlet observer chain firing on every transient trait assignment during `__init__`. Most of these calls are redundant; the final `widget_args_tuple` is the only one whose output is consumed.

A user constructing the widget once should be paying for one pipeline run, not four. This is similar in spirit to issue #706 (over-eager change-detection on traitlets-held DataFrames) — except here it's doing actual work, not just comparing.

### B. Histograms are one query per column instead of one query total

`_numeric_histogram` and `_categorical_histogram` (`buckaroo/customizations/xorq_stats_v2.py:191` and `:238`) each issue a per-column query — an `expr.group_by(col).aggregate(count).order_by(...).limit(10)` for categoricals, a bucketed group-by for numerics. **40 + 6 = 46 of the 51 queries (90%) are histograms.**

For categorical histograms specifically, all N column queries can be expressed as one SQL pass using `array_agg`/`struct_agg` over per-column top-K subqueries, or one CTE that pivots. For numerics, the bucket aggregation can be unified across columns by using the already-known `min`/`max` per column (which the batched aggregate already computes).

The Phase-1 batched aggregate (`table.aggregate(...)` for length / null_count / min / max / etc.) already proves the pattern works — extend it to cover histograms too.

## Why this matters

Comparison to pandas/polars on the same data (no SQL, all in-memory):

| widget | Boston 883k × 26 |
| --- | --- |
| `BuckarooInfiniteWidget` (pandas, post-#706 fix) | 715 ms |
| `PolarsBuckarooInfiniteWidget` | 96 ms |
| `XorqBuckarooInfiniteWidget` (datafusion) | 732 ms |
| `XorqBuckarooInfiniteWidget` (duckdb) | 680 ms |

Even on an in-process embedded engine like datafusion, per-query overhead × dozens of queries ≈ pandas's worst case (which itself is dominated by an unrelated traitlets bug). On a remote backend (Trino, DuckDB-over-network, etc.) the per-query latency would push this from "slow" into "completely unusable."

## Suggested fix shape

In rough priority order:

1. **Stop redundant pipeline runs.** Investigate why `_get_summary_sd` fires 4× per construction — likely some `@observe` chain. Coalesce into one run, or short-circuit when `widget_args_tuple` hasn't actually changed shape.
2. **Batch the histograms.** Either:
   - Fold all per-column histogram aggregates into the existing Phase-1 `table.aggregate(...)` call (one SELECT, many `…name(f"{col}|histogram_bucket_{i}")` columns).
   - Or run all categorical histograms as one `UNPIVOT` / `UNION ALL` query, all numeric histograms as another.
3. **Goal: one batched query for stats, plus one per-column-group histogram query** (still much better than 46). Long-term: one query for everything, like polars.

## Test plan

- [ ] Add a regression test that asserts `len(queries) <= N` for some small N (probably ≤ 10) on a 26-column input. Pipeline-runs-4× alone should be caught with `≤ 2`.
- [ ] Re-run `scripts/perf/perf_xorq.py`. Boston datafusion target: well under 200 ms (down from 793 ms).
- [ ] Confirm the resulting summary stats are byte-identical to the pre-fix pipeline on a representative test fixture.

## References

- Investigation: `docs/research/perf-polars-stats.md` (xorq section)
- Bench: `scripts/perf/perf_xorq.py` (on branch `perf/polars-stats-instrumentation`)
- Stat pipeline: `buckaroo/pluggable_analysis_framework/xorq_stat_pipeline.py:168` (`process_table`)
- Histograms: `buckaroo/customizations/xorq_stats_v2.py:191,238`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XorqStatPipeline issues 51 SQL queries per widget construction (should be 1-2) #709

Summary

Repro

Output (datafusion backend, 50k × 8)

Two problems stacked

A. Pipeline runs the full query set 4× per widget construction

B. Histograms are one query per column instead of one query total

Why this matters

Suggested fix shape

Test plan

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

widget	Boston 883k × 26
`BuckarooInfiniteWidget` (pandas, post-#706 fix)	715 ms
`PolarsBuckarooInfiniteWidget`	96 ms
`XorqBuckarooInfiniteWidget` (datafusion)	732 ms
`XorqBuckarooInfiniteWidget` (duckdb)	680 ms

XorqStatPipeline issues 51 SQL queries per widget construction (should be 1-2) #709

Description

Summary

Repro

Output (datafusion backend, 50k × 8)

Two problems stacked

A. Pipeline runs the full query set 4× per widget construction

B. Histograms are one query per column instead of one query total

Why this matters

Suggested fix shape

Test plan

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions