Implement H3 Bucket Hashing Function #33

beepy0 · 2019-06-02T10:39:39Z

for each row to max_row
key_i_bit & row

tmp_row = first_row
for each row to (max_row - 1)
tmp_row ^= next_row

return tmp_row

beepy0 · 2019-06-14T16:11:34Z

Write notes

beepy0 · 2019-06-16T09:17:33Z

Went through different iterations while optimizing the algorithm.

Simple with a conditional and a modulo to cap the hashed result
Simple without conditional but with modulo
Simple without conditional and with modulo substitute - truncation function

Having a modulo operation in the update loop proved to significantly reduce the speed. Timed results show a difference of about 6-7 fold increase of update-time due to the added modulo. Since the value space (number of buckets) that the H3 function is hashing to is assumed to be practically always smaller than the whole theoretical value space of H3, we must have a way to restrict the resulting values to the number of buckets. The solution to that is discussed in the third approach from the list above.

The difference in execution speed of the conditional didn't prove significant in the initial testing. The timed sketch-update execution times across 40+ runs were averaged and then compared to the ones with the conditional, with improvements being around 3% *[1]. I will need to re-test using a distribution from the whole range of unsigned int as it was pointed to me that this then should have a more significant impact on performance. *[2]

An alternative to the modulo was proposed by my advisor Martin Kiefer [REF], a so-called truncation method where the bits of the resulting unsigned integer are capped so that only the n most right bits are kept. These n bits give a range of values that correspond to the number of buckets. There are two cases for that:

1 If the buckets size is a power of 2, then n is ceil(log2(buckets size)), as there are enough bit combinations in an unsigned int of size n to fit all of them.
2 If the buckets size is not a power of 2, we also use the same n, but then have to subtract buckets size from each resulting value, making the end result always lower than buckets size since for these types of sizes, you cannot fit buckets size twice into ceil(log2(buckets size)).
It is also clear that this method presents another conditional, which could be mitigated by the following code *[3]:
1. Re-run the tests on the xeon machine to see difference in conditional removal with a numbers-range around the zero.
2. Test the conditional removal on the xeon machine with a numbers-range of the whole uint
3. Include a snippet of the code
Try to see if runtime improves if you shift the truncation in the H3 block
Do more testing on the xeon machine, and with more widespread data
Try to further optimize the code

beepy0 added this to the Baseline milestone Jun 2, 2019

beepy0 self-assigned this Jun 2, 2019

beepy0 closed this as completed Jun 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement H3 Bucket Hashing Function #33

Implement H3 Bucket Hashing Function #33

beepy0 commented Jun 2, 2019

beepy0 commented Jun 14, 2019

beepy0 commented Jun 16, 2019 •

edited

Loading

Implement H3 Bucket Hashing Function #33

Implement H3 Bucket Hashing Function #33

Comments

beepy0 commented Jun 2, 2019

beepy0 commented Jun 14, 2019

beepy0 commented Jun 16, 2019 • edited Loading

beepy0 commented Jun 16, 2019 •

edited

Loading