[Bug] flaky incorrect results of hashjoin_spill in parallel mode on x86_64 #259
Comments
I couldn't reproduce it on my x86_64 machine; a 1000-iteration loop test showed no failed cases:

```sh
for i in {1..1000}; do
  PGOPTIONS='-c optimizer=off -c enable_parallel=true -c min_parallel_table_scan_size=0 -c min_parallel_index_scan_size=0 -c force_parallel_mode=1 -c gp_appendonly_insert_files=0' \
  ./pg_regress --inputdir=. --bindir='/home/gpadmin/install/gp9/bin' --load-extension=gp_inject_fault --dlpath=. --max-connections=20 --max-concurrent-tests=24 --init-file=./init_file \
    workfile/hashjoin_spill | grep -i 'workfile/hashjoin_spill' | tee -a debug.log
done
cat debug.log | grep -i ok | wc -l
# 1000
```
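For reference, the failing query can also be run directly in psql under the same parallel GUCs that the loop passes through PGOPTIONS. This is a minimal sketch: the settings, the query, and the expected value are all quoted from elsewhere in this thread.

```sql
-- sketch: run the failing query directly with the parallel settings above
set optimizer = off;
set enable_parallel = true;
set min_parallel_table_scan_size = 0;
set min_parallel_index_scan_size = 0;

select avg(i3::numeric)
from (select t1.* from test_hj_spill as t1
      right join test_hj_spill as t2 on t1.i1 = t2.i2) foo;
-- expected: 499.5000000000000000; flaky runs return a slightly different value
```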
We got the wrong results again: https://github.com/cloudberrydb/cloudberrydb/actions/runs/6661394625/job/18104487842?pr=265

```diff
--- /code/gpdb_src/src/test/regress/expected/workfile/hashjoin_spill.out	2023-10-27 01:05:56.552220732 +0000
+++ /code/gpdb_src/src/test/regress/results/workfile/hashjoin_spill.out	2023-10-27 01:05:56.554220688 +0000
@@ -43,7 +50,7 @@
 select avg(i3::numeric) from (SELECT t1.* FROM test_hj_spill AS t1 RIGHT JOIN test_hj_spill AS t2 ON t1.i1=t2.i2) foo;
          avg          
 ----------------------
- 499.5000000000000000
+ 499.4107171506497674
 (1 row)
```

PR #260 doesn't fix this.
First attempts at reproducing it succeeded, with the following diff:

```diff
--- /home/gpadmin/cbdb/src/test/regress/expected/workfile/hashjoin_spill.out	2023-10-27 11:44:25.385210186 +0800
+++ /home/gpadmin/cbdb/src/test/regress/results/workfile/hashjoin_spill.out	2023-10-27 11:44:25.385210186 +0800
@@ -43,7 +49,7 @@
 select avg(i3::numeric) from (SELECT t1.* FROM test_hj_spill AS t1 RIGHT JOIN test_hj_spill AS t2 ON t1.i1=t2.i2) foo;
          avg          
 ----------------------
- 499.5000000000000000
+ 498.7976339559297647
 (1 row)
 select * from hashjoin_spill.is_workfile_created('explain (analyze, verbose) SELECT t1.* FROM test_hj_spill AS t1 RIGHT JOIN test_hj_spill AS t2
 ON t1.i1=t2.i2');
```
@Ray-Eldath and I have been debugging this for a while. Under high load we can reproduce it in roughly 2–3% of the iterations of a 200-iteration loop, but still not stably. A deeper experiment shows that the wrong results vary across runs:

```
 count |   sum    |         avg          
-------+----------+----------------------
 37409 | 18716789 | 500.3285038359753001
(1 row)

 count |   sum    |         avg          
-------+----------+----------------------
 37398 | 18676962 | 499.4107171506497674
(1 row)

 count |   sum    |         avg          
-------+----------+----------------------
 37398 | 18676962 | 499.4107171506497674
(1 row)
```

The right results:

```
 count |   sum    |         avg          
-------+----------+----------------------
 45000 | 22477500 | 499.5000000000000000
(1 row)
```

The reason may be that CBDB's special memory limit (work file) doesn't take parallel-aware hash join into account, since it dates back to the initial GPDB commit. We may need to re-implement the work file feature with parallel-aware hash join in mind.
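The regress test checks spilling through the hashjoin_spill.is_workfile_created() helper that appears in the diff above. A minimal sketch of confirming the join actually spills, assuming a statement_mem of 1MB is small enough to force work files for this table:

```sql
-- sketch: force the hash join to spill, then check that a workfile was created
-- (assumption: 1MB statement_mem is low enough to force spilling here)
set statement_mem = '1MB';
select * from hashjoin_spill.is_workfile_created(
  'explain (analyze, verbose) SELECT t1.* FROM test_hj_spill AS t1
   RIGHT JOIN test_hj_spill AS t2 ON t1.i1=t2.i2');
```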
Update: I added some logs to print the shared hash table's total rows. A wrong result appears even when the total row counts are correct:

```
gpadmin=# select count(i3), avg(i3::numeric) from (SELECT t1.* FROM test_hj_spill AS t1 RIGHT JOIN test_hj_spill AS t2 ON t1.i1=t2.i2) foo;
INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=661224)
INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=661222)
INFO:  shared hash table has 15010 rows. (seg0 slice1 127.0.1.1:9002 pid=661221)
INFO:  shared hash table has 14930 rows. (seg2 slice1 127.0.1.1:9004 pid=661225)
INFO:  shared hash table has 14930 rows. (seg2 slice1 127.0.1.1:9004 pid=661223)
INFO:  shared hash table has 15010 rows. (seg0 slice1 127.0.1.1:9002 pid=661220)
 count |         avg          
-------+----------------------
 37409 | 500.3285038359753001
(1 row)

gpadmin=# explain (costs off) select count(i3), avg(i3::numeric) from (SELECT t1.* FROM test_hj_spill AS t1 RIGHT JOIN test_hj_spill AS t2 ON t1.i1=t2.i2) foo;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather Motion 6:1  (slice1; segments: 6)
         ->  Partial Aggregate
               ->  Parallel Hash Left Join
                     Hash Cond: (t2.i2 = t1.i1)
                     ->  Redistribute Motion 6:6  (slice2; segments: 6)
                           Hash Key: t2.i2
                           Hash Module: 3
                           ->  Parallel Seq Scan on test_hj_spill t2
                     ->  Parallel Hash
                           ->  Parallel Seq Scan on test_hj_spill t1
 Optimizer: Postgres query optimizer
(12 rows)
```

And when I change the SQL to an inner join, the issue can no longer be reproduced in a 200-iteration loop.
Negative. The inner-join variant still gets a Parallel Hash Join plan:

```
gpadmin=# explain select count(i3), avg(i3::numeric) from (SELECT t1.* FROM test_hj_spill AS t1 JOIN test_hj_spill AS t2 ON t1.i2=t2.i1) foo;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=684.26..684.27 rows=1 width=40)
   ->  Gather Motion 6:1  (slice1; segments: 6)  (cost=684.13..684.21 rows=6 width=40)
         ->  Partial Aggregate  (cost=684.13..684.14 rows=1 width=40)
               ->  Parallel Hash Join  (cost=196.75..627.88 rows=7500 width=4)
                     Hash Cond: (t1.i2 = t2.i1)
                     ->  Redistribute Motion 6:6  (slice2; segments: 6)  (cost=0.00..253.00 rows=7500 width=8)
                           Hash Key: t1.i2
                           Hash Module: 3
                           ->  Parallel Seq Scan on test_hj_spill t1  (cost=0.00..103.00 rows=7500 width=8)
                     ->  Parallel Hash  (cost=103.00..103.00 rows=7500 width=4)
                           ->  Parallel Seq Scan on test_hj_spill t2  (cost=0.00..103.00 rows=7500 width=4)
 Optimizer: Postgres query optimizer
(12 rows)
```
Update: logs of the 6 processes of a correct run:

```
psql:p.sql:3: INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=890950)
psql:p.sql:3: INFO:  shared hash table has 15010 rows. (seg0 slice1 127.0.1.1:9002 pid=890949)
psql:p.sql:3: INFO:  shared hash table has 15010 rows. (seg0 slice1 127.0.1.1:9002 pid=890948)
psql:p.sql:3: INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=890951)
psql:p.sql:3: INFO:  shared hash table has 14930 rows. (seg2 slice1 127.0.1.1:9004 pid=890952)
psql:p.sql:3: INFO:  shared hash table has 14930 rows. (seg2 slice1 127.0.1.1:9004 pid=890953)
psql:p.sql:3: INFO:  parallel outer batch has total 7592 rows. (seg0 slice1 127.0.1.1:9002 pid=890948)
psql:p.sql:3: INFO:  parallel outer batch has total 7420 rows. (seg0 slice1 127.0.1.1:9002 pid=890949)
psql:p.sql:3: INFO:  parallel outer batch has total 7603 rows. (seg2 slice1 127.0.1.1:9004 pid=890952)
psql:p.sql:3: INFO:  parallel outer batch has total 7329 rows. (seg2 slice1 127.0.1.1:9004 pid=890953)
psql:p.sql:3: INFO:  parallel outer batch has total 7470 rows. (seg1 slice1 127.0.1.1:9003 pid=890950)
psql:p.sql:3: INFO:  parallel outer batch has total 7592 rows. (seg1 slice1 127.0.1.1:9003 pid=890951)
 count |   sum    |         avg          
-------+----------+----------------------
 45000 | 22477500 | 499.5000000000000000
(1 row)
```

And logs of a wrong run, where only 5 of the 6 processes reported outer batch rows (pid 890918 printed its hash table rows but no "parallel outer batch" line):

```
psql:p.sql:3: INFO:  shared hash table has 15010 rows. (seg0 slice1 127.0.1.1:9002 pid=890919)
psql:p.sql:3: INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=890921)
psql:p.sql:3: INFO:  shared hash table has 14930 rows. (seg2 slice1 127.0.1.1:9004 pid=890922)
psql:p.sql:3: INFO:  shared hash table has 14930 rows. (seg2 slice1 127.0.1.1:9004 pid=890923)
psql:p.sql:3: INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=890920)
psql:p.sql:3: INFO:  parallel outer batch has total 7420 rows. (seg0 slice1 127.0.1.1:9002 pid=890919)
psql:p.sql:3: INFO:  parallel outer batch has total 7592 rows. (seg1 slice1 127.0.1.1:9003 pid=890921)
psql:p.sql:3: INFO:  parallel outer batch has total 7329 rows. (seg2 slice1 127.0.1.1:9004 pid=890923)
psql:p.sql:3: INFO:  parallel outer batch has total 7603 rows. (seg2 slice1 127.0.1.1:9004 pid=890922)
psql:p.sql:3: INFO:  parallel outer batch has total 7470 rows. (seg1 slice1 127.0.1.1:9003 pid=890920)
psql:p.sql:3: INFO:  shared hash table has 15010 rows. (seg0 slice1 127.0.1.1:9002 pid=890918)
 count |   sum    |         avg          
-------+----------+----------------------
 37409 | 18716789 | 500.3285038359753001
(1 row)
```
Adding more logs shows that when the wrong results happen, one worker of seg1 (pid 938112) didn't participate in fetching the outer batches:

```
psql:p.sql:3: INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=938113)
psql:p.sql:3: INFO:  parallel build_barrier total participants: 1. (seg1 slice1 127.0.1.1:9003 pid=938113)
psql:p.sql:3: INFO:  parallel build_barrier participants begin, nbatch: 2 (seg1 slice1 127.0.1.1:9003 pid=938113)
psql:p.sql:3: INFO:  parallel outer batch begin fetch rows. (seg1 slice1 127.0.1.1:9003 pid=938113)
psql:p.sql:3: INFO:  parallel outer batch has total 7592 rows. (seg1 slice1 127.0.1.1:9003 pid=938113)
psql:p.sql:3: INFO:  shared hash table has 15060 rows. (seg1 slice1 127.0.1.1:9003 pid=938112)
psql:p.sql:3: INFO:  parallel build_barrier total participants: 1. (seg1 slice1 127.0.1.1:9003 pid=938112)
```
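A hedged cross-check (not from the original logs): `explain (analyze, verbose)` reports per-process actual row counts in the plan, so a worker that skips the probe phase should show up as contributing no join output rows:

```sql
-- sketch: per-worker actual row counts expose a non-participating worker
explain (analyze, verbose)
select count(i3), avg(i3::numeric)
from (select t1.* from test_hj_spill as t1
      right join test_hj_spill as t2 on t1.i1 = t2.i2) foo;
```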
Another failed case: https://github.com/cloudberrydb/cloudberrydb/actions/runs/6729213443/job/18290602808?pr=278

```diff
 select avg(i3::numeric) from (SELECT t1.* FROM test_hj_spill AS t1 RIGHT JOIN test_hj_spill AS t2 ON t1.i1=t2.i2) foo;
          avg          
 ----------------------
- 499.5000000000000000
+ 498.7976339559297647
 (1 row)
```
A 1000-iteration loop test shows no incorrect results with #284; we could keep this issue open for several months to check the CI results.
[screenshot: 500-loop test comparison]
#284 has been merged, let's keep this issue open for a while to see if it's really resolved.
We haven't hit this issue in the three months since #284 was merged; shall we close it? @my-ship-it
The issue no longer occurs since #284; closing.
Cloudberry Database version
main branch 951be2d
What happened
The results diff shown in the comments above (expected avg 499.5000000000000000, got a slightly different value), occurring intermittently.
What you think should happen instead
No response
How to reproduce
We got this kind of results diff occasionally in several CI runs, e.g. https://github.com/cloudberrydb/cloudberrydb/actions/runs/5960899566/job/16169472506?pr=142
and in internal CI: https://code.hashdata.xyz/cloudberry/cbdb/-/jobs/366882
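The real table definition lives in the regress input file for workfile/hashjoin_spill; as a hypothetical minimal setup for local reproduction, the following shape is consistent with the expected aggregates in the test output (count 45000, sum 22477500, avg 499.5):

```sql
-- hypothetical setup, not the actual test DDL: 45000 rows with unique join
-- keys i1/i2 and i3 cycling 0..999, so avg(i3) = 499.5 and sum(i3) = 22477500
create table test_hj_spill (i1 int, i2 int, i3 int);
insert into test_hj_spill
select i, i, i % 1000 from generate_series(0, 44999) i;
```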
Operating System
x86_64
Anything else
No response