-
Notifications
You must be signed in to change notification settings - Fork 201
Description
Apache Cloudberry version
PostgreSQL 14.4 (Apache Cloudberry 2.1.1 build 118781)
What happened
explain analyze select a,sum(a) ,count(distinct b),count(distinct c) from t1 group by a ;
In gp7 ,The query plan like this:
QUERY PLAN
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..7284102.91 rows=99750 width=35) (actual time=16873700.033..17017567.064 rows=100000 loops=1)
-> Sequence (cost=0.00..7284087.23 rows=49875 width=35) (actual time=17017450.151..17017547.817 rows=50056 loops=1)
-> Shared Scan (share slice:id 1:0) (cost=0.00..869248.66 rows=7603200000 width=1) (actual time=0.000..3095640.961 rows=0 loops=1)
-> Seq Scan on t1 (cost=0.00..196973.72 rows=7603200000 width=47) (actual time=1.265..1264155.093 rows=7611715584 loops=1)
-> Hash Join (cost=0.00..6414836.82 rows=49875 width=35) (actual time=13921809.200..13921903.013 rows=50056 loops=1)
Hash Cond: (NOT (share0_ref3.a IS DISTINCT FROM share0_ref2.a))
Extra Text: (seg1) Hash chain length 1.4 avg, 6 max, using 35002 of 65536 buckets.
-> Hash Join (cost=0.00..3772696.37 rows=49875 width=38) (actual time=8982097.168..8982163.085 rows=50056 loops=1)
Hash Cond: (NOT (share0_ref4.a IS DISTINCT FROM share0_ref3.a))
Extra Text: (seg1) Hash chain length 1.4 avg, 6 max, using 35002 of 65536 buckets.
-> HashAggregate (cost=0.00..1130555.97 rows=49875 width=19) (actual time=4044452.387..4044488.489 rows=50056 loops=1)
Group Key: share0_ref4.a
Extra Text: (seg0) hash table(s): 1; chain length 3.5 avg, 17 max; using 49944 of 65536 buckets; total 0 expansions.
-> Shared Scan (share slice:id 1:0) (cost=0.00..201991.83 rows=7603200000 width=11) (actual time=24.577..1015774.214 rows=7611715584 loops=1)
-> Hash (cost=2642098.02..2642098.02 rows=49875 width=19) (actual time=4937644.230..4937644.232 rows=50056 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 3006kB
-> HashAggregate (cost=0.00..2642098.02 rows=49875 width=19) (actual time=4937619.454..4937630.349 rows=50056 loops=1)
Group Key: share0_ref3.a
Extra Text: (seg0) hash table(s): 1; chain length 3.5 avg, 17 max; using 49944 of 65536 buckets; total 0 expansions.
-> HashAggregate (cost=0.00..2300327.77 rows=2798455079 width=22) (actual time=4937583.762..4937596.052 rows=50056 loops=1)
Group Key: share0_ref3.a, share0_ref3.b
Planned Partitions: 256
Extra Text: (seg0) hash table(s): 1; chain length 2.2 avg, 8 max; using 49944 of 131072 buckets; total 0 expansions.
-> Shared Scan (share slice:id 1:0) (cost=0.00..403552.66 rows=7603200000 width=22) (actual time=13.745..1010632.572 rows=7611715584 loops=1)
-> Hash (cost=2642098.02..2642098.02 rows=49875 width=19) (actual time=4939711.503..4939711.505 rows=50056 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 3006kB
-> HashAggregate (cost=0.00..2642098.02 rows=49875 width=19) (actual time=4939686.695..4939697.085 rows=50056 loops=1)
Group Key: share0_ref2.a
Extra Text: (seg0) hash table(s): 1; chain length 3.5 avg, 17 max; using 49944 of 65536 buckets; total 0 expansions.
-> HashAggregate (cost=0.00..2300327.77 rows=2798455079 width=22) (actual time=4939653.933..4939664.446 rows=50056 loops=1)
Group Key: share0_ref2.a, share0_ref2.c
Planned Partitions: 256
Extra Text: (seg0) hash table(s): 1; chain length 2.2 avg, 6 max; using 49944 of 131072 buckets; total 0 expansions.
-> Shared Scan (share slice:id 1:0) (cost=0.00..403552.66 rows=7603200000 width=22) (actual time=12.466..1005955.666 rows=7611715584 loops=1)
Optimizer: GPORCA
Planning Time: 46.188 ms
(slice0) Executor memory: 10868K bytes.
- (slice1) Executor memory: 43314K bytes avg x 2 workers, 43339K bytes max (seg1). Work_mem: 31876K bytes max, 607527036K bytes wanted.
Memory used: 128000kB
Memory wanted: 6682797886kB
Execution Time: 17029281.450 ms
But in clouddbery, It will fall back to postgres planner ,and generate another query plan :
INFO: GPORCA failed to produce a plan, falling back to Postgres-based planner
DETAIL: Falling back to Postgres-based planner because GPORCA does not support the following feature: Invalid system target list found for AO table.
QUERY PLAN
Gather Motion 4:1 (slice1; segments: 4) (cost=9686890.19..10527206.48 rows=980243 width=59) (actual time=55856.132..145504.949 rows=1000000 loops=1)
-> GroupAggregate (cost=9686890.19..10514953.45 rows=245061 width=59) (actual time=58029.176..144649.932 rows=251048 loops=1)
Group Key: a
-> Sort (cost=9686890.19..9851890.19 rows=66000000 width=33) (actual time=58029.176..80074.623 rows=66276672 loops=1)
Sort Key: a
Sort Method: external merge Disk: 11107072kB
-> Seq Scan on t1 (cost=0.00..663648.00 rows=66000000 width=33) (actual time=2.000..8175.166 rows=66276672 loops=1)
Planning Time: 47.174 ms
(slice0) Executor memory: 46K bytes.
(slice1) Executor memory: 61559K bytes avg x 4x(0) workers, 61559K bytes max (seg0). Work_mem: 61559K bytes max.
What you think should happen instead
No response
How to reproduce
create table t1 (
a numeric,
b numeric,
c numeric,
d numeric)
using ao_column with(compresstype=zstd)
explain analyze select a,sum(a) ,count(distinct b),count(distinct c) from t1 group by a ;
Operating System
Red Hat Enterprise Linux release 8.10 (Ootpa)
Anything else
No response
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.