The Spark connector currently maps ClickHouse shards/partitions directly to Spark partitions. This limits parallelism when tables have few partitions or when querying a single shard, leading to underutilized Spark executors and slower performance.
Proposed Solution
Provide an option to partition the read independently of ClickHouse's physical layout. For example, apply hash-based partitioning on a primary key column:
WHERE cityHash64(primary_key) % N = partition_id
Here N is the desired number of Spark partitions and partition_id ranges over 0..N-1, so parallelism is no longer bound by the table's shard or partition count.
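A minimal sketch of the idea, assuming a connector option that pushes one such WHERE clause down per Spark task. The function name and option shape are illustrative, not the connector's actual API:

```python
# Sketch: hash-based read partitioning. Each Spark task would receive one
# predicate and read only its slice of the table, regardless of shard layout.
def hash_partition_predicates(primary_key: str, num_partitions: int) -> list:
    """Build one ClickHouse filter per Spark partition; together the
    predicates cover every row exactly once."""
    return [
        "cityHash64({pk}) % {n} = {i}".format(pk=primary_key, n=num_partitions, i=i)
        for i in range(num_partitions)
    ]

predicates = hash_partition_predicates("user_id", 4)
print(predicates[0])  # cityHash64(user_id) % 4 = 0

# The same slicing, simulated locally with Python's hash() as a stand-in
# for cityHash64: the N slices are disjoint and together cover all rows.
rows = list(range(1_000))
slices = [[r for r in rows if hash(r) % 4 == i] for i in range(4)]
assert sorted(r for s in slices for r in s) == rows
```

Spark's existing JDBC reader exposes an analogous mechanism through the `predicates` argument of `DataFrameReader.jdbc`, where each predicate string defines one partition; this proposal would bring the same flexibility to the ClickHouse connector.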