Cloudberry Database version
No response
What happened
When enable_parallel is off, inserts go into only one AO segfile, even if gp_appendonly_insert_files is greater than 1.
Consider this case: a user sets enable_parallel to on, inserts some data, runs queries, and then resets enable_parallel to off.
If a lot of data is inserted afterwards, or an online streaming ETL keeps running, all of the new data goes into a single segfile and the table's data becomes skewed across its segfiles.
That skew becomes a bottleneck for parallel plans.
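The skew described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cloudberry's actual segfile selection code: it assumes inserts are spread round-robin over the available segfiles, and the names `choose_segfile`, `simulate`, `skew`, and the `respect_guc_always` switch are hypothetical. The workload (8 inserts with enable_parallel on, then 92 with it off) is made up for illustration.

```python
def choose_segfile(insert_no, insert_files, enable_parallel, respect_guc_always):
    """Pick a segfile slot for one insert (hypothetical policy model).

    Assumption: when several segfiles may be used, inserts are spread
    round-robin. The real selection policy is not described in this issue.
    """
    if insert_files <= 1:
        return 0
    if not enable_parallel and not respect_guc_always:
        # Behaviour reported in this issue: everything lands in one segfile.
        return 0
    return insert_no % insert_files


def simulate(batches, insert_files=4, respect_guc_always=False):
    """Return rows per segfile slot after running all batches.

    batches: list of (enable_parallel, n_inserts, rows_per_insert).
    """
    rows = [0] * max(insert_files, 1)
    insert_no = 0
    for enable_parallel, n_inserts, rows_per_insert in batches:
        for _ in range(n_inserts):
            slot = choose_segfile(
                insert_no, insert_files, enable_parallel, respect_guc_always
            )
            rows[slot] += rows_per_insert
            insert_no += 1
    return rows


def skew(rows):
    """Largest segfile divided by the mean; 1.0 means perfectly balanced."""
    total = sum(rows)
    return max(rows) * len(rows) / total if total else 1.0


# Illustrative workload: a short parallel phase, then a long ETL phase
# with enable_parallel switched off.
WORKLOAD = [(True, 8, 100), (False, 92, 100)]

if __name__ == "__main__":
    current = simulate(WORKLOAD, respect_guc_always=False)
    proposed = simulate(WORKLOAD, respect_guc_always=True)
    print("current :", current, "skew =", round(skew(current), 2))
    print("proposed:", proposed, "skew =", round(skew(proposed), 2))
```

In this model, a parallel scan that hands whole segfiles to workers finishes no sooner than the worker holding the largest segfile, so a skew well above 1.0 means the other workers sit idle for most of the scan.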
We should revert that restriction and insert into multiple segfiles according to gp_appendonly_insert_files, regardless of the value of enable_parallel.
In general, we should try to create as many AO segfiles as gp_appendonly_insert_files allows and avoid data skew, whether or not users run parallel plans.
Keeping the default value of gp_appendonly_insert_files at 4 is enough.
What you think should happen instead
No response
How to reproduce
Need to create cases.
Operating System
Ubuntu
Anything else
Once this is fixed, the regression tests will only pass if the GUC gp_appendonly_insert_files is set to 0 when CBDB is deployed in the CI pipeline. Need help from @sandiandian.
Are you willing to submit PR?
Code of Conduct