Add kafkaMurmurHash function #48185

49 changes: 44 additions & 5 deletions docs/en/sql-reference/functions/hash-functions.md
@@ -441,11 +441,11 @@ SELECT farmHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:0

## javaHash

Calculates JavaHash from a [string](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/String.java#l1452),
[Byte](https://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/Byte.java#l405),
[Short](https://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/Short.java#l410),
[Integer](https://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/Integer.java#l959),
[Long](https://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/Long.java#l1060).
This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you need to compute exactly the same result.

Note that Java only supports hashing signed integers, so to hash an unsigned integer you must first cast it to the corresponding signed ClickHouse type.
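
For orientation, Java's `String.hashCode` folds the characters with the recurrence `h = 31 * h + c` in wrapping 32-bit arithmetic. The C++ sketch below illustrates only that recurrence, under the assumption of ASCII-only input (so that each byte corresponds to one Java `char`). The name `javaStringHashSketch` is hypothetical, and this is not ClickHouse's implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: Java's String.hashCode recurrence, h = 31 * h + c,
// evaluated in wrapping 32-bit arithmetic. Assumes ASCII-only input.
constexpr int32_t javaStringHashSketch(const char * data, size_t size)
{
    uint32_t h = 0;
    for (size_t i = 0; i < size; ++i)
        h = 31 * h + static_cast<unsigned char>(data[i]);
    // Java reports the result as a signed 32-bit int.
    return static_cast<int32_t>(h);
}

// "abc": (97 * 31 + 98) * 31 + 99 = 96354
static_assert(javaStringHashSketch("abc", 3) == 96354, "same value as \"abc\".hashCode() in Java");
```

The signed return type is the reason unsigned inputs need a cast to a signed type before hashing.
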
@@ -660,6 +660,45 @@ Result:
└──────────────────────┴─────────────────────┘
```


## kafkaMurmurHash

Calculates a 32-bit [MurmurHash2](https://github.com/aappleby/smhasher) hash value using the same hash seed as [Kafka](https://github.com/apache/kafka/blob/461c5cfe056db0951d9b74f5adc45973670404d7/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L482) and with the highest bit cleared, so that the result is compatible with Kafka's [Default Partitioner](https://github.com/apache/kafka/blob/139f7709bd3f5926901a21e55043388728ccca78/clients/src/main/java/org/apache/kafka/clients/producer/internals/BuiltInPartitioner.java#L328).

**Syntax**

```sql
kafkaMurmurHash(par1, ...)
```

**Arguments**

- `par1, ...` — A variable number of parameters that can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md/#data_types).

**Returned value**

- Calculated hash value.

Type: [UInt32](/docs/en/sql-reference/data-types/int-uint.md).

**Example**

Query:

```sql
SELECT
    kafkaMurmurHash('foobar') AS res1,
    kafkaMurmurHash(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS res2
```

Result:

```response
┌───────res1─┬─────res2─┐
│ 1357151166 │ 85479775 │
└────────────┴──────────┘
```
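
The description above can be illustrated with a small C++ sketch. It re-implements the 32-bit MurmurHash2 variant as used by Kafka's `Utils.murmur2` (seed `0x9747b28c`) and then clears the highest bit, which is what `KafkaMurmurHashImpl::apply` in this PR does for a single string argument. All names ending in `Sketch` are hypothetical, the code is not taken from ClickHouse, and the combining of multiple arguments is not covered.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads one byte as an unsigned value.
constexpr uint32_t byteAtSketch(const char * data, size_t i)
{
    return static_cast<unsigned char>(data[i]);
}

// Illustrative 32-bit MurmurHash2 with the seed used by Kafka.
constexpr uint32_t kafkaMurmur2Sketch(const char * data, size_t size)
{
    const uint32_t m = 0x5bd1e995U;
    uint32_t h = 0x9747b28cU ^ static_cast<uint32_t>(size);

    // Mix the input four bytes at a time, read as little-endian words.
    size_t i = 0;
    for (; i + 4 <= size; i += 4)
    {
        uint32_t k = byteAtSketch(data, i)
            | (byteAtSketch(data, i + 1) << 8)
            | (byteAtSketch(data, i + 2) << 16)
            | (byteAtSketch(data, i + 3) << 24);
        k *= m;
        k ^= k >> 24;
        k *= m;
        h *= m;
        h ^= k;
    }

    // Mix the remaining one to three bytes, if any.
    const size_t rem = size - i;
    if (rem == 3)
        h ^= byteAtSketch(data, i + 2) << 16;
    if (rem >= 2)
        h ^= byteAtSketch(data, i + 1) << 8;
    if (rem >= 1)
    {
        h ^= byteAtSketch(data, i);
        h *= m;
    }

    // Final avalanche.
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

// Clears the highest bit, as the documentation above describes.
constexpr uint32_t kafkaMurmurHashSketch(const char * data, size_t size)
{
    return kafkaMurmur2Sketch(data, size) & 0x7fffffffU;
}

// Convenience overload for string literals (drops the trailing '\0').
template <size_t N>
constexpr uint32_t kafkaMurmurHashSketch(const char (&literal)[N])
{
    return kafkaMurmurHashSketch(literal, N - 1);
}

// 'foobar' is res1 in the example above; '21' comes from this PR's test reference.
static_assert(kafkaMurmurHashSketch("foobar") == 1357151166U, "matches res1");
static_assert(kafkaMurmurHashSketch("21") == 1173551340U, "matches the test reference");
```

Because the functions are `constexpr`, the two `static_assert` lines check the sketch against the documented values at compile time.
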

## murmurHash3_32, murmurHash3_64

Produces a [MurmurHash3](https://github.com/aappleby/smhasher) hash value.
23 changes: 23 additions & 0 deletions src/Functions/FunctionsHashing.h
@@ -494,6 +494,28 @@ struct GccMurmurHashImpl
    static constexpr bool use_int_hash_for_pods = false;
};

/// To be compatible with Default Partitioner in Kafka:
/// murmur2: https://github.com/apache/kafka/blob/461c5cfe056db0951d9b74f5adc45973670404d7/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L480
/// Default Partitioner: https://github.com/apache/kafka/blob/139f7709bd3f5926901a21e55043388728ccca78/clients/src/main/java/org/apache/kafka/clients/producer/internals/BuiltInPartitioner.java#L328
struct KafkaMurmurHashImpl
{
    static constexpr auto name = "kafkaMurmurHash";

    using ReturnType = UInt32;

    static UInt32 apply(const char * data, const size_t size)
    {
        return MurmurHash2(data, size, 0x9747b28cU) & 0x7fffffff;
    }

    static UInt32 combineHashes(UInt32 h1, UInt32 h2)
    {
        return IntHash32Impl::apply(h1) ^ h2;
    }

    static constexpr bool use_int_hash_for_pods = false;
};
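
Because the highest bit is cleared in `apply`, the result can be reduced modulo the partition count in the same way Kafka's `BuiltInPartitioner.partitionForKey` does (`toPositive(murmur2(key)) % numPartitions`). The following is a minimal sketch, assuming the key was hashed as a single string whose bytes equal the serialized Kafka key and that `num_partitions` is nonzero. `kafkaPartitionSketch` is a hypothetical name, not part of this PR.

```cpp
#include <cstdint>

// Hypothetical helper: maps a kafkaMurmurHash value (highest bit already cleared)
// to a partition index. Assumes num_partitions > 0.
constexpr uint32_t kafkaPartitionSketch(uint32_t kafka_murmur_hash, uint32_t num_partitions)
{
    return kafka_murmur_hash % num_partitions;
}

// 1357151166 is kafkaMurmurHash('foobar') from this PR's documentation example.
static_assert(kafkaPartitionSketch(1357151166U, 4) == 2, "1357151166 % 4 == 2");
```

In SQL the same reduction would presumably be written as `kafkaMurmurHash(key) % 4`.
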

struct MurmurHash3Impl32
{
static constexpr auto name = "murmurHash3_32";
@@ -1727,6 +1749,7 @@ using FunctionMetroHash64 = FunctionAnyHash<ImplMetroHash64>;
using FunctionMurmurHash2_32 = FunctionAnyHash<MurmurHash2Impl32>;
using FunctionMurmurHash2_64 = FunctionAnyHash<MurmurHash2Impl64>;
using FunctionGccMurmurHash = FunctionAnyHash<GccMurmurHashImpl>;
using FunctionKafkaMurmurHash = FunctionAnyHash<KafkaMurmurHashImpl>;
using FunctionMurmurHash3_32 = FunctionAnyHash<MurmurHash3Impl32>;
using FunctionMurmurHash3_64 = FunctionAnyHash<MurmurHash3Impl64>;
using FunctionMurmurHash3_128 = FunctionAnyHash<MurmurHash3Impl128>;
1 change: 1 addition & 0 deletions src/Functions/FunctionsHashingMurmur.cpp
@@ -17,5 +17,6 @@ REGISTER_FUNCTION(HashingMurmur)
    factory.registerFunction<FunctionMurmurHash3_64>();
    factory.registerFunction<FunctionMurmurHash3_128>();
    factory.registerFunction<FunctionGccMurmurHash>();
    factory.registerFunction<FunctionKafkaMurmurHash>();
}
}
@@ -389,6 +389,7 @@ javaHashUTF16LE
joinGet
joinGetOrNull
jumpConsistentHash
kafkaMurmurHash
kostikConsistentHash
lcm
least
5 changes: 5 additions & 0 deletions tests/queries/0_stateless/02676_kafka_murmur_hash.reference
@@ -0,0 +1,5 @@
1173551340
1357151166
1161502112
661178819
2088585677
8 changes: 8 additions & 0 deletions tests/queries/0_stateless/02676_kafka_murmur_hash.sql
@@ -0,0 +1,8 @@
-- Tests are taken from: https://github.com/apache/kafka/blob/139f7709bd3f5926901a21e55043388728ccca78/clients/src/test/java/org/apache/kafka/common/utils/UtilsTest.java#L93
-- and the reference is generated with: https://pastila.nl/?06465d36/87f8ab2c9f6501c54f1c0879a13c8626

SELECT kafkaMurmurHash('21');
SELECT kafkaMurmurHash('foobar');
SELECT kafkaMurmurHash('a-little-bit-long-string');
SELECT kafkaMurmurHash('a-little-bit-longer-string');
SELECT kafkaMurmurHash('lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8');
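
The reference values relate to Kafka's own expectations through the highest-bit mask. Kafka's `UtilsTest` lists signed Java `int` results for `murmur2`, and clearing the sign bit yields the unsigned values in the `.reference` file. The sketch below checks this for the five keys. The signed constants are quoted from memory of Kafka's `UtilsTest` and should be treated as an assumption, and `toPositiveSketch` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: clears the sign bit of a Java-style signed 32-bit hash.
constexpr uint32_t toPositiveSketch(int32_t java_hash)
{
    return static_cast<uint32_t>(java_hash) & 0x7fffffffU;
}

// Left: signed murmur2 values assumed from Kafka's UtilsTest.
// Right: lines of 02676_kafka_murmur_hash.reference, in query order.
static_assert(toPositiveSketch(-973932308) == 1173551340U, "'21'");
static_assert(toPositiveSketch(-790332482) == 1357151166U, "'foobar'");
static_assert(toPositiveSketch(-985981536) == 1161502112U, "'a-little-bit-long-string'");
static_assert(toPositiveSketch(-1486304829) == 661178819U, "'a-little-bit-longer-string'");
static_assert(toPositiveSketch(-58897971) == 2088585677U, "'lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8'");
```
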