Hi guys, we are trying to switch from fasthll to distinctcounthll. We use com.clearspring.analytics.stream.cardinality.HyperLogLog in our code and org.apache.pinot.core.startree.hll.HllUtil to serialize the HLL to a string.
With the same filter condition, the two functions return results that differ by about 1000x.
Example:
SELECT fasthll(my_hll), distinctcounthll(my_hll)
FROM counts_table WHERE timestamp >= 1500768000
FastHll converts each string into a HyperLogLog object, which may represent thousands of unique values. DistinctCountHLL treats the string as a plain value, not as a HyperLogLog object, so it returns an approximation of how many unique serialized HyperLogLog strings there are. That value should be close to the total number of rows scanned.
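To make the difference concrete, here is a minimal sketch in plain Java. It is not Pinot code: exact sets stand in for HyperLogLog sketches, and the class and method names (HllSemanticsSketch, mergeThenCount, countDistinctCells, sampleColumn) are made up for illustration. It only shows why merging deserialized sketches and counting distinct serialized strings give very different numbers on the same column.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: an exact Set<String> stands in for a HyperLogLog sketch.
class HllSemanticsSketch {

    // Stand-in for sketch serialization: sorted member ids joined by commas.
    // A real HyperLogLog serializes to an opaque register dump instead.
    static String serialize(Set<String> sketch) {
        return String.join(",", new TreeSet<>(sketch));
    }

    static Set<String> deserialize(String cell) {
        Set<String> sketch = new HashSet<>();
        if (!cell.isEmpty()) {
            for (String id : cell.split(",")) {
                sketch.add(id);
            }
        }
        return sketch;
    }

    // fasthll-like behavior: deserialize every cell, merge, count members.
    static long mergeThenCount(List<String> column) {
        Set<String> merged = new HashSet<>();
        for (String cell : column) {
            merged.addAll(deserialize(cell));
        }
        return merged.size();
    }

    // distinctcounthll-like behavior on a STRING column:
    // each cell is just one value, so this counts distinct cells.
    static long countDistinctCells(List<String> column) {
        return new HashSet<>(column).size();
    }

    // Builds a column of `rows` cells, each holding `idsPerRow` disjoint ids.
    static List<String> sampleColumn(int rows, int idsPerRow) {
        List<String> column = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            Set<String> sketch = new HashSet<>();
            for (int i = 0; i < idsPerRow; i++) {
                sketch.add("user" + (r * idsPerRow + i));
            }
            column.add(serialize(sketch));
        }
        return column;
    }

    public static void main(String[] args) {
        List<String> column = sampleColumn(5, 1000);
        System.out.println(mergeThenCount(column));     // prints 5000
        System.out.println(countDistinctCells(column)); // prints 5
    }
}
```

With 5 rows of 1000 distinct ids each, the merged count is 5000 while the distinct-cell count is 5, which matches the kind of 1000x gap described in the question.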
fasthll is deprecated because deserializing the string-encoded HyperLogLog is slow. Instead, you can store the serialized HyperLogLog in a BYTES column, generated with org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog), and query that column with distinctcounthll.
Could anyone suggest what's the big difference between them?