Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insert into replicated S3 table not fully synchronous #63514

Open
italodamato opened this issue May 8, 2024 · 2 comments
Open

Insert into replicated S3 table not fully synchronous #63514

italodamato opened this issue May 8, 2024 · 2 comments
Labels
st-need-info We need extra data to continue (waiting for response)

Comments

@italodamato
Copy link

Insert into table using storage policy of type s3 (GCS in my use case) is not fully synchronous and subsequent move/replace partition statement referencing the data that was just inserted can't be executed right away (they work correctly if executed with a delay after the insert).

Steps to reproduce:

DROP TABLE IF EXISTS default.users ON CLUSTER '{cluster}' SETTINGS max_table_size_to_drop=0;
CREATE TABLE default.users ON CLUSTER '{cluster}' (
    uid Int64, 
    ts Date,
    age1 Decimal256(8),  
    age2 Decimal256(8)
  )
  ENGINE=ReplicatedMergeTree -- change to MergeTree and it works correctly
  ORDER BY (uid, ts)
  PARTITION BY (toYYYYMM(ts))
 SETTINGS storage_policy = 'gcs_main' -- comment out  and it works correctly
; 

DROP TABLE IF EXISTS default.users1 ON CLUSTER '{cluster}' SETTINGS max_table_size_to_drop=0;
CREATE TABLE default.users1 ON CLUSTER '{cluster}' AS default.users;
DROP TABLE IF EXISTS default.users2 ON CLUSTER '{cluster}' SETTINGS max_table_size_to_drop=0;
CREATE TABLE default.users2 ON CLUSTER '{cluster}' AS default.users;

INSERT INTO default.users
  SELECT 
  	number,  
  	dateAdd(month, -number%36, date(now())), 
  	sum(number%3),
  	sum(number%3),
  FROM numbers(100000)
  GROUP BY 1
;

INSERT INTO default.users1
  SELECT * FROM default.users WHERE toYYYYMM(ts) = '202405'
;
DELETE FROM default.users WHERE toYYYYMM(ts) = '202405';
DELETE FROM default.users2 WHERE toYYYYMM(ts) = '202405';

INSERT INTO default.users2
  SELECT * FROM default.users1 WHERE toYYYYMM(ts) = '202405'
;

ALTER TABLE default.users ON CLUSTER '{cluster}' REPLACE PARTITION 202405 FROM default.users2  SETTINGS mutations_sync=2, alter_sync=2;
SELECT count(*) FROM default.users where toYYYYMM(ts) = '202405';  -- this shouldn't be zero, count users2 for confirmation

The same query works if either the engine is non-replicated or the storage policy is not of type s3.

CH version: 24.4.1.2088
3 zk nodes
storage on GCS

settings with changed = 1:

add_http_cors_header	1
allow_nondeterministic_mutations	1
async_insert_busy_timeout_max_ms	1000
async_insert_busy_timeout_ms	1000
background_pool_size	256
cancel_http_readonly_queries_on_client_close	1
connect_timeout_with_failover_ms	1000
date_time_input_format	best_effort
distributed_aggregation_memory_efficient	1
distributed_ddl_task_timeout	-1
distributed_foreground_insert	1
do_not_merge_across_partitions_select_final	1
insert_distributed_sync	1
load_balancing	nearest_hostname
log_queries	1
max_http_get_redirects	10
max_insert_block_size	2048000
max_insert_threads	16
max_result_rows	100
min_insert_block_size_bytes	750000000
min_insert_block_size_rows	0
os_thread_priority	2
parallel_view_processing	1
prefer_column_name_to_alias	1
prefer_localhost_replica	0
result_overflow_mode	break
@italodamato italodamato added the potential bug To be reviewed by developers and confirmed/rejected. label May 8, 2024
@den-crane
Copy link
Contributor

den-crane commented May 8, 2024

It's not enough information to understand what is going on.
You need to share related fragments from clickhouse-server.log from ALL REPLICAS.


ALTER TABLE default.users ON CLUSTER '{cluster}' REPLACE PARTITION 202405 FROM default.users2 SETTINGS mutations_sync=2, alter_sync=2;

This is not really synchronous command, even with mutations_sync=2, alter_sync=2. So it's kinda expected if you read from a replica.

@italodamato
Copy link
Author

italodamato commented May 8, 2024

All the queries are run on the same server and my suspect is the asynchronous part is in the insert instead of the replace partition. It's as if the replace partition command is not referencing the same data you'd get by running a select statement for some reason (i.e. there's a delay between when the two results match).

Ideally, after the insert is completed, all the following queries (select and move partitions) should reference the same underlying data.

Logs attached:
downloaded-logs-20240508-165525_1 (2).csv

@alexey-milovidov alexey-milovidov added st-need-info We need extra data to continue (waiting for response) and removed potential bug To be reviewed by developers and confirmed/rejected. labels May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
st-need-info We need extra data to continue (waiting for response)
Projects
None yet
Development

No branches or pull requests

3 participants