@bakwc bakwc commented Jun 29, 2025

Summary

Adds support for customizable PARTITION BY expressions in ClickHouse table creation to address issues with Snowflake IDs creating too many partitions.

Changes

  • New config option: partition_bys with database/table filtering (similar to indexes)
  • Custom expressions: Override default intDiv(id, 4294967) with user-defined partition logic
  • Backward compatible: Falls back to existing behavior when not configured
  • Test coverage: Modified existing test to verify custom partition functionality

Configuration Example

partition_bys:
  - databases: '*'
    tables: ['test_table']
    partition_by: 'toYYYYMM(created_at)'

Problem Solved

Fixes an issue where Snowflake-style IDs (e.g., 1849360358546407424), combined with the default intDiv(id, 4294967) partitioning, create an excessive number of partitions and trip ClickHouse's max_partitions_per_insert_block limit. Users can now specify time-based partitioning instead, such as toYYYYMM(created_at).
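
The scale of the problem can be checked with a little arithmetic. The sketch below assumes the common Twitter-style Snowflake layout, in which the millisecond timestamp sits above the low 22 bits; other Snowflake variants may differ. Under that assumption, it counts how many distinct default partitions one second's worth of IDs would touch.

```python
DEFAULT_DIVISOR = 4294967  # divisor in the default intDiv(id, 4294967)
TIMESTAMP_SHIFT = 22       # assumption: Twitter-style Snowflake bit layout


def default_partition(row_id: int) -> int:
    """Python equivalent of intDiv(id, 4294967) for non-negative ids."""
    return row_id // DEFAULT_DIVISOR


base_id = 1849360358546407424  # example ID from this PR

# One ID per millisecond for one second (sequence and machine bits held fixed).
one_second_of_ids = [base_id + (ms << TIMESTAMP_SHIFT) for ms in range(1000)]

distinct_partitions = len({default_partition(i) for i in one_second_of_ids})
print(distinct_partitions)  # close to 1000: roughly one partition per millisecond
```

Because one millisecond advances the ID by 2^22 = 4194304, which is close to the divisor, nearly every millisecond lands in its own partition. A time-based expression such as toYYYYMM(created_at) yields one partition per calendar month instead.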

Fixes #161

bakwc added 4 commits June 29, 2025 21:08
- Add partition_bys config option similar to indexes with database/table filtering
- Support custom PARTITION BY expressions to override default intDiv(id, 4294967)
- Useful for time-based partitioning like toYYYYMM(created_at) for Snowflake IDs
- Maintains backward compatibility with existing default behavior
- Add test verification for custom partition_by functionality

Fixes #161
- Add proper deterministic partition_by expression: intDiv(id, 1000000)
- Update test to verify custom vs default partition expressions
- Ensure both CONFIG_FILE and CONFIG_FILE_MARIADB tests pass
- Fix CI failures caused by non-deterministic partition expressions
@bakwc bakwc merged commit 3727e3d into master Jun 29, 2025
1 check passed
@bakwc bakwc deleted the feature/custom-partition-by branch June 29, 2025 18:31
jaredmdobson pushed a commit to ReMatter/mysql_ch_replicator that referenced this pull request Nov 5, 2025
Linked issue: [FEATURE] Allow Custom PARTITION BY When Using Snowflake ID (bigint) in Initial Migration