Insert into replicated S3 table not fully synchronous #63514

italodamato · 2024-05-08T10:58:06Z

Insert into table using storage policy of type s3 (GCS in my use case) is not fully synchronous and subsequent move/replace partition statement referencing the data that was just inserted can't be executed right away (they work correctly if executed with a delay after the insert).

Steps to reproduce:

DROP TABLE IF EXISTS default.users ON CLUSTER '{cluster}' SETTINGS max_table_size_to_drop=0;
CREATE TABLE default.users ON CLUSTER '{cluster}' (
    uid Int64, 
    ts Date,
    age1 Decimal256(8),  
    age2 Decimal256(8)
  )
  ENGINE=ReplicatedMergeTree -- change to MergeTree and it works correctly
  ORDER BY (uid, ts)
  PARTITION BY (toYYYYMM(ts))
 SETTINGS storage_policy = 'gcs_main' -- comment out  and it works correctly
; 

DROP TABLE IF EXISTS default.users1 ON CLUSTER '{cluster}' SETTINGS max_table_size_to_drop=0;
CREATE TABLE default.users1 ON CLUSTER '{cluster}' AS default.users;
DROP TABLE IF EXISTS default.users2 ON CLUSTER '{cluster}' SETTINGS max_table_size_to_drop=0;
CREATE TABLE default.users2 ON CLUSTER '{cluster}' AS default.users;

INSERT INTO default.users
  SELECT 
  	number,  
  	dateAdd(month, -number%36, date(now())), 
  	sum(number%3),
  	sum(number%3),
  FROM numbers(100000)
  GROUP BY 1
;

INSERT INTO default.users1
  SELECT * FROM default.users WHERE toYYYYMM(ts) = '202405'
;
DELETE FROM default.users WHERE toYYYYMM(ts) = '202405';
DELETE FROM default.users2 WHERE toYYYYMM(ts) = '202405';

INSERT INTO default.users2
  SELECT * FROM default.users1 WHERE toYYYYMM(ts) = '202405'
;

ALTER TABLE default.users ON CLUSTER '{cluster}' REPLACE PARTITION 202405 FROM default.users2  SETTINGS mutations_sync=2, alter_sync=2;
SELECT count(*) FROM default.users where toYYYYMM(ts) = '202405';  -- this shouldn't be zero, count users2 for confirmation

The same query works if either the engine is non-replicated or the storage policy is not of type s3.

CH version: 24.4.1.2088
3 zk nodes
storage on GCS

settings with changed = 1:

add_http_cors_header	1
allow_nondeterministic_mutations	1
async_insert_busy_timeout_max_ms	1000
async_insert_busy_timeout_ms	1000
background_pool_size	256
cancel_http_readonly_queries_on_client_close	1
connect_timeout_with_failover_ms	1000
date_time_input_format	best_effort
distributed_aggregation_memory_efficient	1
distributed_ddl_task_timeout	-1
distributed_foreground_insert	1
do_not_merge_across_partitions_select_final	1
insert_distributed_sync	1
load_balancing	nearest_hostname
log_queries	1
max_http_get_redirects	10
max_insert_block_size	2048000
max_insert_threads	16
max_result_rows	100
min_insert_block_size_bytes	750000000
min_insert_block_size_rows	0
os_thread_priority	2
parallel_view_processing	1
prefer_column_name_to_alias	1
prefer_localhost_replica	0
result_overflow_mode	break

The text was updated successfully, but these errors were encountered:

den-crane · 2024-05-08T15:34:13Z

It's not enough information to understand what is going on.
You need to share related fragments from clickhouse-server.log from ALL REPLICAS.

ALTER TABLE default.users ON CLUSTER '{cluster}' REPLACE PARTITION 202405 FROM default.users2 SETTINGS mutations_sync=2, alter_sync=2;

This is not really synchronous command, even with mutations_sync=2, alter_sync=2. So it's kinda expected if you read from a replica.

italodamato · 2024-05-08T15:59:43Z

All the queries are run on the same server and my suspect is the asynchronous part is in the insert instead of the replace partition. It's as if the replace partition command is not referencing the same data you'd get by running a select statement for some reason (i.e. there's a delay between when the two results match).

Ideally, after the insert is completed, all the following queries (select and move partitions) should reference the same underlying data.

Logs attached:
downloaded-logs-20240508-165525_1 (2).csv

italodamato added the potential bug To be reviewed by developers and confirmed/rejected. label May 8, 2024

alexey-milovidov added st-need-info We need extra data to continue (waiting for response) and removed potential bug To be reviewed by developers and confirmed/rejected. labels May 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Insert into replicated S3 table not fully synchronous #63514

Insert into replicated S3 table not fully synchronous #63514

italodamato commented May 8, 2024

den-crane commented May 8, 2024 •

edited

italodamato commented May 8, 2024 •

edited

Insert into replicated S3 table not fully synchronous #63514

Insert into replicated S3 table not fully synchronous #63514

Comments

italodamato commented May 8, 2024

den-crane commented May 8, 2024 • edited

italodamato commented May 8, 2024 • edited

den-crane commented May 8, 2024 •

edited

italodamato commented May 8, 2024 •

edited