Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intra-node isolation anomaly #7366

Open
michailtoksovo opened this issue Nov 28, 2023 · 2 comments
Open

Intra-node isolation anomaly #7366

michailtoksovo opened this issue Nov 28, 2023 · 2 comments

Comments

@michailtoksovo
Copy link

Simple ACID-test (sum calculation under pgbench), PostgreSQL 15.1/Citus 12.1

  1. Prepare data:
CREATE TABLE table_a 
( id bigint not null
, shard_n int not null
, value int not null default 0
, primary key(shard_n, id)
);

SELECT create_distributed_table('table_a', 'shard_n', shard_count := 2);

-- even to shard #1, odd to shard #2
INSERT INTO table_a (id, shard_n)
SELECT i, 2 - i % 2 FROM generate_series(1, 1024) c(i);

SELECT shard_name, nodename, nodeport, citus_table_type FROM citus_shards;
--    shard_name   | nodename  | nodeport | citus_table_type 
-- ----------------+-----------+----------+------------------
--  table_a_102008 | 127.0.0.1 |     5433 | distributed
--  table_a_102009 | 127.0.0.1 |     5433 | distributed
  1. Run pgbench:
pgbench -c 4 -T 100 -h 127.0.0.1 -p 5433 -f update.sql

update.sql:

\set id random(1, 512)
BEGIN;
-- update #1: random row from table_a_102008 (node 1)
UPDATE table_a SET value = value + 1 WHERE id = (2 * :id - 1) and shard_n = 1;
-- update #2: random row from table_a_102009 (node 1)
UPDATE table_a SET value = value - 1 WHERE id = 2 * :id and shard_n = 2;
COMMIT;
  1. Check sum under pgbench (should always be 0) - always failed
SELECT sum(value) FROM table_a; 
@aykut-bozkurt
Copy link
Contributor

pgbench -c 4 -T 100 -h 127.0.0.1 -p 5433 -f update.sql

I think I know the problem here. You distribute the table against the column 'shard_n'. (row goes to node1 if it is 1, to node2 if it is 2)

Assumption is that we increment a row and decrement another in each transaction, but this is wrong. id check in WHERE clause could cause no row is updated. Not all ids (random generated) exist in each node.

To fix the problem, you can change the insert query like below (to make sure all id in range (1,1024) exists in both shards):

INSERT INTO table_a (id, shard_n)
SELECT i, 1 FROM generate_series(1, 1024) c(i);

INSERT INTO table_a (id, shard_n)
SELECT i, 2 FROM generate_series(1, 1024) c(i);

@marcoslot
Copy link
Collaborator

The problem is lack of distributed snapshot isolation. A basic solution is proposed in #6489

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants