Error when running OPTIMIZE FINAL #50547

Open
Shlomixg opened this issue Jun 4, 2023 · 0 comments
Labels
potential bug To be reviewed by developers and confirmed/rejected.

Comments

Shlomixg commented Jun 4, 2023

Describe what's wrong

Our system is sensitive to duplicated data, so we run OPTIMIZE FINAL on ReplacingMergeTree tables to remove duplicates.
When running OPTIMIZE FINAL on two large tables (~2 billion rows and ~200 million rows), we get the following error on both tables:

Orig exception: Code: 74. DB::ErrnoException: Cannot read from file: /var/lib/clickhouse/store/c48/c487a9b6-3d95-42e8-b168-cd9c45a9758b/all_1_1134741_47/<column_name>.bin, errno: 22, strerror: Invalid argument: Cache info: Buffer path: /var/lib/clickhouse/store/c48/c487a9b6-3d95-42e8-b168-cd9c45a9758b/all_1_1134741_47/<column_name>.bin, hash key: d8713dc8fbd955852c4cac2756e46d72, file_offset_of_buffer_end: 0, internal buffer remaining read range: [0:5554177], read_type: REMOTE_FS_READ_AND_PUT_IN_CACHE, last caller: c487a9b6-3d95-42e8-b168-cd9c45a9758b::all_1_1138616_48:67, file segment info: None: (while reading column <column_name>): (while reading from part /var/lib/clickhouse/store/c48/c487a9b6-3d95-42e8-b168-cd9c45a9758b/all_1_1134741_47/ from mark 0 with max_rows_to_read = 8192): While executing MergeTreeSequentialSource. (CANNOT_READ_FROM_FILE_DESCRIPTOR) (version 23.3.1.2823 (official build))
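
The part names in the error follow the MergeTree naming convention of partition ID, minimum block number, maximum block number, and merge level, with an optional mutation version at the end. Below is a minimal Python sketch for reading them. It assumes new-style part names and a partition ID that contains no underscores; `PartName`, `parse_part_name`, and `covers` are hypothetical helper names, not part of ClickHouse.

```python
from typing import NamedTuple, Optional


class PartName(NamedTuple):
    """Fields of a MergeTree part name (hypothetical helper type)."""
    partition_id: str
    min_block: int
    max_block: int
    level: int
    mutation: Optional[int] = None


def parse_part_name(name: str) -> PartName:
    """Split '<partition>_<min>_<max>_<level>[_<mutation>]' into fields.

    Assumption: the partition ID itself contains no underscores.
    """
    fields = name.split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected part name: {name!r}")
    numbers = [int(field) for field in fields[1:]]
    return PartName(fields[0], *numbers)


def covers(outer: PartName, inner: PartName) -> bool:
    """True if outer's block range contains inner's in the same partition."""
    return (
        outer.partition_id == inner.partition_id
        and outer.min_block <= inner.min_block
        and inner.max_block <= outer.max_block
    )


if __name__ == "__main__":
    source = parse_part_name("all_1_1134741_47")  # part that failed to read
    target = parse_part_name("all_1_1138616_48")  # "last caller" in the error
    print(source)
    print(target)
    print("target covers source:", covers(target, source))
```

Read this way, the part that could not be read (all_1_1134741_47) is a level-47 part, and the "last caller" (all_1_1138616_48) appears to be the level-48 part that the merge was trying to produce. That reading is an inference from the block ranges, not something the error message states directly.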

Both tables are defined as ReplacingMergeTree, and the problematic column is the first column in both tables.
We can query the tables, insert data and delete without any problem.

We have run these commands many times in the past and never encountered this error.
We can't reproduce it on other large tables in the same environment, or in other environments.

How to reproduce

ClickHouse Version: 23.3.1.2823
Query: OPTIMIZE TABLE <table_name> FINAL

Expected behavior

Duplicated rows are removed, keeping only the most recently updated row for each key.
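
To make the expected result concrete, here is a small Python model of the deduplication that a ReplacingMergeTree merge is expected to perform. It is only an illustration of the semantics, not ClickHouse code. It assumes the tables have a version column, which the report does not confirm; the row layout and the name `replacing_merge` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (sorting_key, version, payload); this layout is an assumption for illustration
Row = Tuple[int, int, str]


def replacing_merge(rows: Iterable[Row]) -> List[Row]:
    """Keep one row per sorting key: the one with the highest version.

    On equal versions the row seen later wins, mirroring the
    "last inserted row" rule for ties.
    """
    latest: Dict[int, Row] = {}
    for row in rows:
        key, version, _payload = row
        kept = latest.get(key)
        if kept is None or version >= kept[1]:
            latest[key] = row
    # Merged parts are stored in sorting-key order.
    return sorted(latest.values(), key=lambda r: r[0])


if __name__ == "__main__":
    rows = [
        (1, 1, "old"),
        (2, 1, "only"),
        (1, 2, "new"),
        (3, 5, "first"),
        (3, 5, "second"),
    ]
    for row in replacing_merge(rows):
        print(row)
```

After a successful OPTIMIZE TABLE <table_name> FINAL, each key would be expected to appear once, as in the output of this model.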

Shlomixg added the "potential bug" label on Jun 4, 2023