Parent Issue
Part of #124 (support partitioned table)
Depends on #126 (BinaryRow deserialization), #127 (partition path generation)
Background
`TableScan::plan_snapshot()` currently discards partition information when building `DataSplit`s:

```rust
// table_scan.rs:154
for ((_partition, bucket), group_entries) in groups {
    // ...
    // table_scan.rs:171-173
    // todo: consider partitioned table
    let bucket_path = format!("{base_path}/bucket-{bucket}");
    let partition = BinaryRow::new(0); // Always empty!
}
```
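For partitioned tables, the correct path should be `{table_path}/{partition_path}/bucket-{bucket}`, e.g., `{table_path}/dt=2024-01-01/bucket-0/`. Because the partition part of the group key is bound to `_partition` and never used, every split of a partitioned table currently points at `{table_path}/bucket-{bucket}` and carries an empty partition row.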
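A minimal sketch of the target directory layout, assuming the partition values have already been decoded to strings (the real `PartitionPathUtils` from #127 would work from the `BinaryRow` and the partition field types instead). `partition_path` and `bucket_path` are hypothetical helper names, and escaping of special characters in partition values, which a production version would likely need, is omitted:

```rust
/// Hypothetical helper: renders `k1=v1/k2=v2` from already-decoded values.
/// Does NOT escape special characters, unlike a real partition path utility.
pub fn partition_path(keys: &[&str], values: &[&str]) -> String {
    assert_eq!(keys.len(), values.len(), "one value per partition key");
    keys.iter()
        .zip(values)
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join("/")
}

/// Hypothetical helper: `{table_path}/{partition_path}/bucket-{bucket}`,
/// falling back to `{table_path}/bucket-{bucket}` for unpartitioned tables.
pub fn bucket_path(table_path: &str, partition_path: &str, bucket: i32) -> String {
    // Tolerate a trailing slash on the table path.
    let table_path = table_path.trim_end_matches('/');
    if partition_path.is_empty() {
        format!("{table_path}/bucket-{bucket}")
    } else {
        format!("{table_path}/{partition_path}/bucket-{bucket}")
    }
}
```

Keeping the unpartitioned fallback means the existing `{base_path}/bucket-{bucket}` behavior stays unchanged for tables without partition keys.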
What needs to be done
- Pass partition type info to `plan_snapshot()`
  - Add the partition keys (names) and partition field types (from `TableSchema`) as parameters, or pass the `TableSchema` itself
  - Alternatively, change `plan_snapshot()` from a static method to an instance method that can access `self.table.schema`
- Decode partition bytes into `BinaryRow`
  - For each group key `(partition_bytes, bucket)`, construct a `BinaryRow` from the raw bytes using `BinaryRow::from_bytes(arity, data)`
  - The arity is the number of partition keys
- Generate the partition path using `PartitionPathUtils`
  - Build `bucket_path` as `{table_path}/{partition_path}/bucket-{bucket}`
- Store the actual partition data in `DataSplit`
  - Pass the decoded `BinaryRow` (with real data) to `DataSplitBuilder.with_partition()` instead of the empty `BinaryRow::new(0)`
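The steps above can be sketched as a partition-aware version of the grouping loop. Everything here is a hypothetical stand-in, not the crate's actual types: `BinaryRow` only stores arity and raw bytes (real decoding is #126), `DataSplit` is trimmed to the fields this issue touches, `plan_splits` is an illustrative name, and the `render_partition_path` closure stands in for `PartitionPathUtils` (#127). The group key is assumed to be `(Vec<u8>, i32)`:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for paimon's `BinaryRow`; only keeps arity + bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct BinaryRow {
    arity: usize,
    data: Vec<u8>,
}

impl BinaryRow {
    pub fn from_bytes(arity: usize, data: Vec<u8>) -> Self {
        BinaryRow { arity, data }
    }

    pub fn arity(&self) -> usize {
        self.arity
    }

    pub fn data(&self) -> &[u8] {
        &self.data
    }
}

/// Hypothetical, trimmed-down split with only the fields this issue touches.
#[derive(Debug)]
pub struct DataSplit {
    pub partition: BinaryRow,
    pub bucket: i32,
    pub bucket_path: String,
    pub files: Vec<String>,
}

/// Sketch of the loop: `partition_keys` would come from `TableSchema` (step 1).
pub fn plan_splits<F>(
    base_path: &str,
    partition_keys: &[String],
    groups: BTreeMap<(Vec<u8>, i32), Vec<String>>,
    render_partition_path: F,
) -> Vec<DataSplit>
where
    F: Fn(&BinaryRow) -> String,
{
    // Step 2: the arity of every partition row is the number of partition keys.
    let arity = partition_keys.len();
    let mut splits = Vec::with_capacity(groups.len());
    for ((partition_bytes, bucket), files) in groups {
        // Step 2: decode the group-key bytes instead of discarding them.
        let partition = BinaryRow::from_bytes(arity, partition_bytes);
        // Step 3: partitioned tables get `{base}/{partition_path}/bucket-{n}`;
        // unpartitioned tables keep the current `{base}/bucket-{n}` layout.
        let bucket_path = if arity == 0 {
            format!("{base_path}/bucket-{bucket}")
        } else {
            let partition_path = render_partition_path(&partition);
            format!("{base_path}/{partition_path}/bucket-{bucket}")
        };
        // Step 4: keep the decoded row rather than `BinaryRow::new(0)`.
        splits.push(DataSplit { partition, bucket, bucket_path, files });
    }
    splits
}
```

Passing the path renderer as a closure keeps the sketch independent of how #127 ends up exposing `PartitionPathUtils`; in the real method it could be a direct call using the decoded row and the partition field types from the schema.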
Affected files
`crates/paimon/src/table/table_scan.rs`: `plan_snapshot()` method