Support Flexible Partitioning Strategies for Parallel Reads #468

The Spark connector currently maps ClickHouse shards/partitions directly to Spark partitions. This limits parallelism when a table has few partitions or when a query targets a single shard, leaving Spark executors underutilized and slowing reads.

#### Proposed Solution

Provide an option to partition data differently from ClickHouse's physical layout. For example, use hash-based partitioning on the primary key:
```sql
WHERE cityHash64(primary_key) % N = partition_id
```
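
To make the idea concrete, here is a minimal sketch of how the `N` per-partition predicates could be generated, one per Spark partition. The helper name `hash_partition_predicates` and its parameters are hypothetical and are not part of the connector; it only builds the filter strings and does not touch Spark or ClickHouse.

```python
from typing import List


def hash_partition_predicates(key_expr: str, num_partitions: int) -> List[str]:
    """Build one filter predicate per Spark partition (hypothetical helper)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # cityHash64 returns a UInt64, so the modulo always falls in
    # [0, num_partitions) and every row matches exactly one predicate.
    return [
        f"cityHash64({key_expr}) % {num_partitions} = {partition_id}"
        for partition_id in range(num_partitions)
    ]


# Example: split a read into 4 disjoint slices of the same table.
predicates = hash_partition_predicates("primary_key", 4)
for predicate in predicates:
    print(predicate)
```

Each predicate selects a disjoint slice of the table, so the slices can be read in parallel regardless of how many physical partitions or shards the table has.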

