Skip to content

Does the partition field have to be converted to String type first? #8294

@GXM2333

Description

@GXM2333

I use kafka streaming to build my pinot realtime partition table.The partition key is studentID,which I defined it as Integer both in pinot table schema and kafka producer.Then I use Integer's HashCode function to make partitions, and I custom a partitioner class for kafka producer, the partition function is like this:

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        // I didn't convert `key` to String 
        return Utils.toPositive(key.hashCode() % numPartitions);
    }

I found I can't get result if I use where studentID = 111 as predicate, then I found the reason is the broker calculate partition number for partition column using this way:

partitionInfo._partitionFunction.getPartition(operands.get(1).getLiteral().getFieldValue().toString()));

which will convert field value to String value before doing the hashCode function.

In my opinion, both kafka and piont have type systems, and different types have their own hashCode function.So is it better using their own hashCode function to calculate partition number for partition columns, not have to convert to string value before?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions