fix: splitNspans / splitEachNspans on spanset return SpanSet not LIST<Span>#58
Open
estebanzimanyi wants to merge 12 commits intomainfrom
Open
fix: splitNspans / splitEachNspans on spanset return SpanSet not LIST<Span>#58estebanzimanyi wants to merge 12 commits intomainfrom
estebanzimanyi wants to merge 12 commits intomainfrom
Conversation
…n types Lands the first MobilityDuck AggregateFunction registration: extent(span) that aggregates a column of spans into the bounding span of all values. Registers a single AggregateFunctionSet "extent" with five overloads (intspan, bigintspan, floatspan, datespan, tstzspan). Implementation pattern (in src/temporal/span_aggregates.cpp): - SpanExtentState holds an inline Span + bool isset. The MEOS Span is a fixed-size struct so the state is POD and trivially serialisable. - SpanExtentFunction provides Initialize / Operation / ConstantOperation / Combine / Finalize via DuckDB's UnaryAggregate template helper. - Operation and Combine bridge to MEOS span_extent_transfn, which expands state's bounds in place when state is non-null and otherwise returns a fresh palloc'd copy. Since our state is inline we just memcpy the Span on first input. - Finalize uses finalize_data.ReturnString(string_t(&state.span, sizeof(Span))) to emit the result blob. This pattern extends naturally to: - extent(spanset) via spanset_extent_transfn - extent(set) via set_extent_transfn - extent(tnumber) via tnumber_extent_transfn (returns TBox state) - tcount / tand / tor / tmin / tmax / tsum via skiplist-backed states - spanUnion(span) via spanset_union_transfn This first PR establishes the infrastructure; follow-up PRs land each of the listed aggregates against the same scaffolding.
Adds 10 new aggregate overloads to the existing extent() set:
- extent(<spanset>) -> <span> for int/bigint/float/date/tstzspanset
- extent(<set>) -> <span> for int/bigint/float/date/tstzset
(textset has no span equivalent)
Each new functor (SpansetExtentFunction, SetExtentFunction) reuses the
shared SpanExtentState (Span span + bool isset) and the shared
Initialize / Combine / Finalize via SpanExtentBase. Operation dispatches
to the corresponding MEOS transfn:
- spanset_extent_transfn(state, ss)
- set_extent_transfn(state, s)
These return a palloc'd Span* on first call (NULL state) which we
memcpy-into-state-and-free, then expand the inline state in place on
subsequent calls — same convention as the original SpanExtentFunction.
…ne set
Adds two new aggregate overloads:
- extent(tint) → tbox
- extent(tfloat) → tbox
Implementation in new module src/temporal/temporal_aggregates.cpp:
- TboxExtentState holds an inline TBox (period + value-span + flags),
fixed-size POD.
- TnumberExtentFunction copies the input Temporal blob into a heap
allocation before passing to MEOS tnumber_extent_transfn. On first
call MEOS palloc's a fresh TBox; we memcpy into state and free.
Subsequent calls expand the inline state via the same MEOS function.
- Combine merges two TBox states via tbox_expand (replicating MEOS's
inline path when both operands are non-null).
Refactors how extent() overloads are registered: previously each module
registered its own AggregateFunctionSet("extent"), which broke when
two modules both did it — DuckDB rejects overlap-by-name registrations
because CreateAggregateFunctionInfo has no GetAlterInfo override.
Now both span_aggregates.cpp and temporal_aggregates.cpp expose
AddExtentOverloads(AggregateFunctionSet&) that mutate a shared set. The
load site in LoadInternal builds one set, calls both, then registers
once.
…gregate
Adds tCount() temporal-count aggregate over all 4 temporal base types:
tint, tbool, tfloat, ttext. Returns a tint where each instant carries
the count of input temporal values valid at that instant.
Implementation pattern (new in MobilityDuck — establishes the scaffolding
for tand/tor/tmin/tmax/tsum):
- TCountState holds a heap-allocated MEOS SkipList* (the skiplist
itself lives outside the inline aggregate state buffer).
- Lifetime is managed by registering the aggregate with
AggregateFunction::UnaryAggregateDestructor, which installs a
destructor that calls our TCountFunction::Destroy → skiplist_free.
- Operation copies the input Temporal blob (varlena) into a heap
allocation and calls temporal_tcount_transfn, which allocates a
fresh SkipList on the first call and extends it in place after.
- Combine merges two skiplist states via temporal_skiplist_splice
with a sum merge function. MEOS's datum_sum_int32 isn't exported
so we provide a local equivalent (mduck_datum_sum_int32).
- Finalize calls temporal_tagg_finalfn which produces a Temporal*
AND frees the skiplist; we null state.skiplist after the call so
the destructor doesn't double-free.
Verified for:
* single-row, multi-distinct, dup-instant (sums to 2), overlapping
sequences (count rises in overlap region)
* tint, tfloat, tbool, ttext
* NULL handling: ignored mid-run, all-NULL → returns NULL
Adds 5 more skiplist-backed temporal aggregates on top of the infrastructure landed for tCount: tAnd(tbool) -> tbool per-instant boolean AND tOr(tbool) -> tbool per-instant boolean OR tagg_min(tint) -> tint per-instant minimum tagg_min(tfloat) -> tfloat tagg_max(tint) -> tint per-instant maximum tagg_max(tfloat) -> tfloat tagg_sum(tint) -> tint per-instant sum tagg_sum(tfloat) -> tfloat ttext skipped — text merge needs text_cmp plumbing not yet wired. Implementation factors out the boilerplate from TCountFunction into a templated SkiplistAggFunction<TRANSFN, MERGE_FN, CROSSINGS> functor. Each aggregate is one MakeSkiplistAggregate<...> instantiation. The merge functions used by Combine (datum_and, datum_or, datum_min_int32, datum_max_int32, datum_sum_int32, datum_min_float8, datum_max_float8, datum_sum_float8) are NOT exported from MEOS, so local equivalents are provided. Float8 conversion goes through a union to handle the IEEE 754 bit-pattern in Datum. Naming note: tagg_min / tagg_max / tagg_sum instead of tMin / tMax / tSum because Tmin / Tmax already exist as scalar time-min / time-max accessors on tbox / stbox. DuckDB's catalog is case-insensitive at the name level, so registering an aggregate named tMin would trigger an ALTER path on the existing scalar that CreateAggregateFunctionInfo doesn't implement, and the extension would fail to load.
Adds spanUnion aggregate that merges a column of spans (or spansets)
into a single canonical spanset. 10 overloads:
spanUnion(<span>) -> <spanset> for {int, bigint, float, date, tstz}span
spanUnion(<spanset>) -> <spanset> same 5 types
Closes the architectural gap documented in the parity manifest at
test/sql/parity/015_span_aggfuncs.test (PR #21).
Implementation: new SpansetUnionState holds a heap-allocated
SpanSet*. Same destructor pattern as the skiplist-backed aggregates.
Operation copies the input blob, calls span_union_transfn or
spanset_union_transfn, and reassigns state.spanset to the returned
pointer (the transfn may return a new spanset and free the old
state when it grows). Combine merges via spanset_union_transfn
on (target, source). Finalize calls spanset_union_finalfn which
compacts and frees the state — null state.spanset after to avoid
double-free.
The transfn dispatch is templated on a bool INPUT_IS_SPANSET so
the same SpanUnionFunction handles both Span and SpanSet inputs.
Verified for distinct / overlapping / adjacent spans, nested
spansets, floatspan / tstzspan, and NULL handling (mid-run + all-
NULL → returns NULL).
Adds the ttext branch of the temporal min/max aggregates that the prior PR explicitly skipped (ttext merge needs text_cmp plumbing). MEOS exports text_cmp as a public API, so the wrapper datum_min_text / datum_max_text equivalents are simple two-line functions. Each merge fn casts both Datums to text* (PG varlena) and invokes text_cmp; the result selects which Datum to return. tagg_min picks the smaller, tagg_max the larger.
…mestamptz) Adds 5 scalar overloads to the extent() AggregateFunctionSet, matching MobilityDB's CREATE AGGREGATE extent(integer) / (bigint) / (float) / (date) / (timestamptz) registrations: extent(integer) -> intspan extent(bigint) -> bigintspan extent(float) -> floatspan (DOUBLE in DuckDB) extent(date) -> datespan extent(timestamptz) -> tstzspan Each subtype gets its own functor that bridges to the matching MEOS *_extent_transfn (int_extent_transfn, bigint_extent_transfn, etc.). Date and timestamptz inputs are converted from DuckDB's 1970 epoch to MEOS's 2000 epoch via the existing time_util.hpp helpers (ToMeosDate / DuckDBToMeosTimestamp). The output Span is stored in MEOS-native epoch form, which round-trips correctly through the existing span in/out functions. Verified for distinct, NULL-handling (mid-run), and round-trip correctness for date / tstz across the epoch boundary.
Adds sliding-window aggregates over tint and tfloat: wmin(tint, INTERVAL) -> tint min over each [t, t+interval] window wmin(tfloat, INTERVAL) -> tfloat wmax(tint, INTERVAL) -> tint wmax(tfloat, INTERVAL) -> tfloat wsum(tint, INTERVAL) -> tint wsum(tfloat, INTERVAL) -> tfloat Each input row contributes its temporal value plus the (constant) window-width interval; the aggregate emits a single temporal value whose values at each instant reflect the minimum / maximum / sum over the window centred on that instant. Implementation: WindowAggFunction<TRANSFN, MERGE_FN, CROSSINGS> binary aggregate over (Temporal blob, interval_t). The interval is converted to MEOS Interval via the existing time_util.hpp helper and forwarded to MEOS tint_w*_transfn / tfloat_w*_transfn. State, Combine, Finalize, and Destroy reuse the skiplist-backed pattern. wavg deferred — needs datum_sum_double2 which is a MEOS-internal struct (sum, count) not available through the public API.
Adds the spatial branch of extent() — aggregates a column of tgeompoint values into the spatiotemporal bounding box (STBox) over all values. State holds an inline STBox (fixed-size POD: time period, x/y/z bounds, SRID, flags). Operation copies the input Temporal blob, calls MEOS tspatial_extent_transfn, and assigns or expands state.box. Combine merges via stbox_expand. Finalize returns the box blob or NULL. The new SpatialAggregates::AddExtentOverloads is wired into the same extent() AggregateFunctionSet that already carries span / spanset / set / tnumber / scalar overloads.
…<Span> Previously the spanset variants of splitNspans / splitEachNspans returned the MEOS Span* output array as a LIST<Span> column. MobilityDB returns these as a single SpanSet, which is the natural shape for a partitioned spanset. Fix: wrap the MEOS result array in spanset_make() and emit a single spanset blob per row. Update the four registrations in spanset.cpp (camelCase + lowercase variants, two functions) to declare return type `spanset_type` instead of `LIST(child_type)`.
Reflects the return-type change made in this branch (LIST<Span> -> SpanSet). The values themselves match the existing canonicalised output; only the wrapping container differs.
3 tasks
Member
Author
Consolidation note — title doesn't match content; surface overlaps #103This PR's title is "fix: splitNspans / splitEachNspans on spanset return SpanSet not LIST" but the 12 commits actually cover:
The first 10 commits are all aggregate-functions parity work that overlaps directly with #103 ( Suggested resolutionPer the new
Currently |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
`splitNspans(spanset, n)` and `splitEachNspans(spanset, n)` previously returned the MEOS `Span*` output array as a `LIST` column. MobilityDB returns these as a single `SpanSet`, which is the natural shape for a partitioned spanset and matches the upstream regression tests.
```sql
-- before:
SELECT splitNspans(intspanset '{[1, 5)}', 1);
-- ['[1, 5)'] :: intspan[]
-- after:
SELECT splitNspans(intspanset '{[1, 5)}', 1);
-- {[1, 5)} :: intspanset
```
Implementation
Two-line conceptual change:
Note on remaining value-divergence
The value divergence flagged in the parity test annotation
(MobilityDuck's
{[1, 8), [9, 10)}vs MobilityDB's{[1, 4), [5, 10)}for the same input) appears to be MEOS-internal normalization
behavior in
spanset_makeafter the partitioner runs — bothrepresentations are valid 2-partitions, just grouped differently.
This PR does not change that.
Test plan
intspanset/floatspanset/ etc., notintspan[]