
backups: add prefixes to backup keys to improve S3 sharding #88418

Merged
jkartseva merged 48 commits into ClickHouse:master from jkartseva:backup-keys-shard
Nov 14, 2025
Conversation

@jkartseva (Member) commented Oct 13, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

S3 partitions objects internally based on key-name prefixes and automatically scales to high request rates per partition. This change introduces two new BACKUP settings: data_file_name_generator and data_file_name_prefix_length. When data_file_name_generator=checksum, backup data files are named using a hash of their contents. Example: for a checksum = abcd1234ef567890abcd1234ef567890 and data_file_name_prefix_length = 3, the resulting path will be: abc/d1234ef567890abcd1234ef567890. The resulting key distribution enhances load balancing across S3 partitions and reduces the risk of throttling.
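The checksum-to-key mapping described above can be sketched in a few lines of Python (illustrative only; the actual implementation is in ClickHouse's C++ backup code, and the function name here is made up):

```python
def sharded_key(checksum: str, prefix_length: int = 3) -> str:
    """Split a hex checksum into an S3-sharding prefix and the remainder.

    Mirrors the example in the changelog entry: the first `prefix_length`
    characters become a directory-like prefix, so keys spread evenly
    across S3's internal partitions.
    """
    return f"{checksum[:prefix_length]}/{checksum[prefix_length:]}"

print(sharded_key("abcd1234ef567890abcd1234ef567890"))
# abc/d1234ef567890abcd1234ef567890
```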

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Details

Addresses https://github.com/ClickHouse/clickhouse-private/issues/35866#issuecomment-3346135504

@clickhouse-gh (Contributor) commented Oct 13, 2025

Workflow [PR], commit [1517d12]

Summary:

job_name               test_name       status   info
BuzzHouse (amd_debug)  Buzzing result  failure  cidb
BuzzHouse (amd_tsan)   Buzzing result  failure  cidb

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Oct 13, 2025
@jkartseva jkartseva marked this pull request as draft October 13, 2025 05:04
@jkartseva jkartseva marked this pull request as ready for review October 14, 2025 05:55
@vitlibar vitlibar self-assigned this Oct 16, 2025
@nikitamikhaylov (Member)

@jkartseva Let's not expose the key_prefix_template, but automatically shard by 3 symbols. It should be enough.

@nikitamikhaylov (Member)

Also, please check how restores would work with this scheme.

@jkartseva (Member, Author)

Let's not expose the key_prefix_template, but automatically shard by 3 symbols. It should be enough.

I think it should be enabled through a setting – either a query setting or a configuration setting, currently under backups.key_prefix_length.

We'll also need to account for incremental backups. If a prefix length of 3 is enabled by default, a backup chain could mix both data path patterns. I haven't fully tested this yet.
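For a sense of scale (a back-of-the-envelope check, not something stated in the PR): with hexadecimal checksums, a 3-character prefix already yields 16^3 = 4096 distinct prefixes for S3 to partition on, which is why 3 symbols "should be enough":

```python
def prefix_space(prefix_length: int, alphabet_size: int = 16) -> int:
    """Number of distinct key prefixes for hex checksums of a given prefix length."""
    return alphabet_size ** prefix_length

print(prefix_space(3))  # 4096
```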

[
    pytest.param("", id="first_file_name"),
    pytest.param("checksum"),
],
Member

I would prefer to create a separate test*.py file, enable the checksum data file name generation there, and copy some of these tests into it.

Member Author

done

@pytest.fixture(
    params=[[], ["configs/data_file_name_generator_from_checksum.xml"]],
    ids=["data_file_name_from_first_file_name", "data_file_name_from_checksum"],
    scope="module",
Member

Can you explain how this parametrization works?

Member Author

This is a parametrized fixture: pytest will run every test that depends on setup_cluster twice, once for each value in params.
[] adds no extra config file.
["configs/data_file_name_generator_from_checksum.xml"] adds one more config file that makes checksum the default source of data_file_name.

The extra configs are passed as a list, which is concatenated with the main_configs list in the setup_cluster function.
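The same mechanism in a minimal standalone sketch (file names and the combine_configs helper are illustrative, not the actual test suite's code):

```python
import pytest

def combine_configs(main_configs, extra_configs):
    """Concatenate the base config list with per-parametrization extras."""
    return main_configs + extra_configs

# Parametrized fixture: every test that depends on it runs once per entry
# in `params` (here: no extra config vs. one checksum-naming config),
# and `ids` label the two runs in pytest's output.
@pytest.fixture(
    params=[[], ["data_file_name_generator_from_checksum.xml"]],
    ids=["from_first_file_name", "from_checksum"],
    scope="module",
)
def cluster_configs(request):
    return combine_configs(["backups_disk.xml"], request.param)

def test_cluster_configs(cluster_configs):
    # In the real suite this would start a cluster with these configs;
    # here we only check the combined list's shape.
    assert cluster_configs[0] == "backups_disk.xml"
```

With module scope, the fixture is built once per parametrization for the whole module rather than once per test.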

node1.query("SYSTEM SYNC REPLICA ON CLUSTER 'cluster' tbl")

backup_name = new_backup_name()
backup_settings = {"data_file_name_generator": name_gen} if name_gen else None
Member

We could randomly enable this feature in some tests; that way, if something is broken, it will surface as flaky tests.

Member Author

I don't know how to do this in the integration tests. I can think of fault injection in the C++ code, such as randomly adding a data_file_name_generator=checksum setting when a backup is created, but that may not be what we want.
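As a purely illustrative alternative at the test level (not something this PR implements, and the helper name is made up), the setting could be chosen at random per run, with a seed for reproducibility:

```python
import random

def random_backup_settings(seed=None):
    """Illustrative: enable checksum-based naming in roughly half of runs."""
    rng = random.Random(seed)
    if rng.random() < 0.5:
        return {"data_file_name_generator": "checksum"}
    return None  # fall back to the default first_file_name scheme
```

Logging the seed per CI run would keep any given run reproducible while still covering both code paths over time.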

@vitlibar (Member) commented Nov 5, 2025

If I understood correctly, we need this feature mostly for S3; we are not sure whether we need it for Azure or for backups on local disks. So it makes sense to test the feature broadly for S3 and only lightly elsewhere: keep your parametrized test approach for S3, and maybe have just one test for Azure and one for backups on local disks.

@jkartseva jkartseva enabled auto-merge November 14, 2025 07:40
@jkartseva jkartseva added this pull request to the merge queue Nov 14, 2025
Merged via the queue into ClickHouse:master with commit 7196ad8 Nov 14, 2025
248 of 256 checks passed
@jkartseva jkartseva deleted the backup-keys-shard branch November 14, 2025 21:36
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 14, 2025

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

5 participants