Hi Team,

We load data into one table, but the sum of shard sizes differs across the worker nodes:
ALTER TABLE ONLY distribute_table ADD CONSTRAINT distribute_table_pkey PRIMARY KEY (tenant_uuid, id);
SELECT create_distributed_table('distribute_table', 'tenant_uuid');
psql -h localhost -p 5432 -d load15_taskmanager -U tm_dflt_user -f /opt/task_manager_dump/distribute_table_data.sql
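Side note: the layouts below show 16 shards per node × 4 nodes = 64 shards in total. As a sketch, assuming the count came from the citus.shard_count setting (a standard Citus GUC, default 32), it could have been pinned explicitly before the create_distributed_table call above:

-- Assumed approach: fix the shard count before distributing the table.
SET citus.shard_count = 64;
SELECT create_distributed_table('distribute_table', 'tenant_uuid');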
SELECT nodename, sum(shard_size) FROM citus_shards WHERE table_name = 'distribute_table'::regclass GROUP BY nodename;
nodename | sum
-------------------------+-------------
citus_vm_02 | 18296504320
citus_vm_03 | 5631139840
citus_vm_04 | 5836931072
citus_vm_05 | 5470191616
(4 rows)
SELECT nodename, count(*) FROM citus_shards WHERE table_name = 'tasks_export'::regclass GROUP BY nodename;
nodename | count
-------------------------+-------
taskmanager_citus_vm_02 | 16
taskmanager_citus_vm_03 | 16
taskmanager_citus_vm_04 | 16
taskmanager_citus_vm_05 | 16
(4 rows)
The shard count is the same on each worker node, so the size difference must come from some tenants having more records than others.
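To confirm the tenant-skew theory, a sketch using the same citus_shards view lists the biggest shards; a few oversized tenant shards landing on citus_vm_02 would explain the imbalance:

-- List the 10 largest shards of the table (citus_shards is the same
-- Citus metadata view queried above).
SELECT shardid, nodename, pg_size_pretty(shard_size) AS size
FROM citus_shards
WHERE table_name = 'distribute_table'::regclass
ORDER BY shard_size DESC
LIMIT 10;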
Please help answer:
1. We know the rebalancer can balance by size, but that makes storage planning difficult. For example, when migrating a 400 GB vanilla PostgreSQL database to a 4-worker-node Citus cluster, we plan for 100 GB per worker node. Is there any way to load the data so that the sum of shard sizes comes out the same on every worker node? (A rebalance sketch follows this list.)
2. If there is no solution to question 1:
2.1. Will the first worker node always end up with the biggest sum of shard sizes?
2.2. If the answer to 2.1 is yes, what is the largest percentage of the total size that can land on the first worker node: 50% (i.e. 200 GB) or some other value?
2.3. If the answer to 2.1 is yes, what is the largest percentage of the total size that can land on the other worker nodes?
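For reference, the rebalance mentioned in question 1 is roughly the following sketch (assuming a Citus version where the built-in by_disk_size rebalance strategy is available):

-- Move shards between workers so that disk usage, not shard count,
-- is what gets balanced.
SELECT rebalance_table_shards('distribute_table', rebalance_strategy := 'by_disk_size');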