Implement H3 Bucket Hashing Function #33
Went through different iterations while optimizing the algorithm.
Having a modulo operation in the update loop significantly reduced speed: timed results show the update time increasing about 6-7 fold because of the added modulo. Since the number of buckets that the H3 function hashes to is assumed to be practically always smaller than the full theoretical value space of H3, we need a way to restrict the resulting values to the number of buckets. The solution is discussed in the third approach from the list above.

The difference in execution speed due to the conditional did not prove significant in initial testing. The timed sketch-update execution times were averaged across 40+ runs and compared to the ones with the conditional, with improvements of around 3% *[1]. I will need to re-test using a distribution over the whole range of unsigned int, as it was pointed out to me that this should have a more significant impact on performance. *[2]

An alternative to the modulo was proposed by my advisor Martin Kiefer [REF]: a so-called truncation method, where the bits of the resulting unsigned integer are capped so that only the n rightmost bits are kept. These n bits give a range of values that corresponds to the number of buckets. There are two cases for that:
for each row i in 0 .. max_row - 1
    masked_row[i] = row[i] if bit i of key is set, else 0
tmp_row = masked_row[0]
for each row i in 1 .. max_row - 1
    tmp_row ^= masked_row[i]
return tmp_row
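For reference, here is a minimal Python sketch of the pseudocode above combined with the truncation idea. It is only an illustration, not the project's implementation: the names `make_h3_seeds`, `h3_hash` and `truncate_to_buckets`, the 32-bit key width, and the restriction to a power-of-two number of buckets are all assumptions made for this example.

```python
import random

KEY_BITS = 32  # assumption: keys are 32-bit unsigned integers


def make_h3_seeds(rng, key_bits=KEY_BITS):
    """Build one random 32-bit row per key bit (hypothetical helper)."""
    return [rng.getrandbits(32) for _ in range(key_bits)]


def h3_hash(key, seeds):
    """XOR together the rows whose corresponding key bit is set."""
    result = 0
    for i, row in enumerate(seeds):
        if (key >> i) & 1:  # bit i of the key selects row i
            result ^= row
    return result


def truncate_to_buckets(hash_value, num_buckets):
    """Keep only the n rightmost bits of the hash.

    Assumption: num_buckets is a power of two, so masking with
    num_buckets - 1 yields the same value as hash_value % num_buckets.
    """
    if num_buckets <= 0 or num_buckets & (num_buckets - 1) != 0:
        raise ValueError("num_buckets must be a power of two")
    return hash_value & (num_buckets - 1)


rng = random.Random(42)  # fixed seed so runs are repeatable
seeds = make_h3_seeds(rng)
bucket = truncate_to_buckets(h3_hash(12345, seeds), 1024)
```

With a power-of-two bucket count the mask is a single AND in the update loop, which is the kind of operation the truncation method substitutes for the modulo. Bucket counts that are not a power of two are not covered by this sketch.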