Description
I use Kafka streaming to build my Pinot realtime partitioned table. The partition key is studentID, which I defined as an Integer in both the Pinot table schema and the Kafka producer. I use Integer's hashCode function to assign partitions, and I wrote a custom partitioner class for the Kafka producer. The partition function looks like this:
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    // I didn't convert `key` to String
    return Utils.toPositive(key.hashCode() % numPartitions);
}
I found that a query with the predicate where studentID = 111 returns no results. The reason is that the broker calculates the partition number for the partition column this way:
Line 216 in 46ed731
partitionInfo._partitionFunction.getPartition(operands.get(1).getLiteral().getFieldValue().toString()));
This converts the field value to a String before applying the hashCode function, so the broker hashes "111" while my producer hashed the Integer 111.
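To illustrate the mismatch, here is a minimal standalone sketch. It does not call Kafka or Pinot: the class name, both helper methods, and the partition count of 4 are hypothetical, and the masking with 0x7fffffff before the modulo is an assumption chosen to keep the result non-negative (for a small positive key like 111 it gives the same value as the partitioner above). It only shows that Integer.hashCode() and String.hashCode() disagree for the same logical value.

```java
class PartitionMismatchDemo {
    // Producer side (as in the custom partitioner): hash the key object as-is.
    static int partitionOfTypedKey(Object key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Broker side (as described above): convert to String first, then hash.
    static int partitionOfStringForm(Object key, int numPartitions) {
        return (key.toString().hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;   // hypothetical partition count
        Integer studentId = 111; // the value from the failing query

        // Integer.hashCode() is the int itself: 111 % 4 = 3
        System.out.println("typed key  -> " + partitionOfTypedKey(studentId, numPartitions));
        // "111".hashCode() is 48657: 48657 % 4 = 1
        System.out.println("string key -> " + partitionOfStringForm(studentId, numPartitions));
    }
}
```

With these assumptions the producer writes studentID 111 to partition 3, while a lookup based on the string form points to partition 1, which would explain the empty result. Hashing key.toString() in the producer is one way to line the two sides up until the broker behavior changes.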
In my opinion, both Kafka and Pinot have type systems, and different types have their own hashCode functions. So would it be better to use each type's own hashCode function to calculate the partition number for partition columns, instead of converting the value to a String first?