Add support for aggregates with an internal stype#8505
Add support for aggregates with an internal stype#8505colm-mchugh merged 7 commits intocitusdata:mainfrom
Conversation
| SELECT key, internalsum(val), sum(val) from aggdata group by key order by key; | ||
|
|
||
| DROP AGGREGATE internalsum(int8); | ||
|
|
There was a problem hiding this comment.
Couple of additional test suggestions:
- An aggregate with an internal stype but no serial funcs. This should be caught by error handling, as before this PR - can this be tested ?
- An aggregate with internal stype that is text serializable only (is
falseforIsAggTransTypeBinarySerializable()) - This would exercise the serialization support added toworker_partial_agg_ffunc().
I'm not sure if the second suggestion is gated by the first - an internal stype with invalid serialfn should result in an error - if so then is the change to worker_partial_agg_ffunc() redundant ? In any case, the ask is to exercise the change in worker_partial_agg_ffunc() if possible.
There was a problem hiding this comment.
Added a test for the aggregate without serialfunc. the thing is SERIAL/DESERIAL will always go through the binary agg since the output of a SERIALFUNC is bytea (by construction) so the second is not testable
There was a problem hiding this comment.
Thanks for adding the additional tests. Given that 2) is not testable, does it make sense to remove the serial check from worker_partial_agg_ffunc() ? Given that these are execution functions, keep them as lean as possible.
sfc-gh-mslot
left a comment
There was a problem hiding this comment.
Just for context. One thing that has historically kept us from doing this was uncertainty around whether serialized state is actually safe to transfer over the network (e.g. never contains things like OIDS), since it's generally only used for IPC.
I would still go ahead, because it's a very worthwhile optimization, but maybe have a GUC to enable/disable in case nasty errors come up.
colm-mchugh
left a comment
There was a problem hiding this comment.
Concerned that a serialization function that returns NULL on non-NULL input is not handled - see comments in CheckAndCallSerialFunc() - and with no constraints on user-defined aggregates (such as strictness or non NULL) we could be exposed here.
colm-mchugh
left a comment
There was a problem hiding this comment.
Lgtm, couple of minor asks.
…ha/add-internal-agg-support
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (22.41%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files@@ Coverage Diff @@
## main #8505 +/- ##
===========================================
- Coverage 88.92% 44.80% -44.12%
===========================================
Files 286 286
Lines 63129 62253 -876
Branches 7914 7652 -262
===========================================
- Hits 56135 27892 -28243
- Misses 4727 31961 +27234
- Partials 2267 2400 +133 🚀 New features to boost your workflow:
|
DESCRIPTION: Add support for aggregates with an internal stype
Citus has historically required custom aggregates to not have an internal stype except for specific internal aggregates. This has led to a number of workarounds to get performance and custom aggregates working with distributed tables.
This change removes that restriction by mirroring Postgres's use of the SERIALFUNC and DESERIALFUNC to roundtrip state for aggregates' internal stype metadata between workers and coordinators allowing more natural use of custom aggregates in Citus.