You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If looks like bfloat16 state merging is at least 2x times slower than tdigest even though compressed bfloat16 histogram is smaller on disk and faster in some/most benchmarks. Does bfloat16 merging requires more computational resources or there is an inefficiency in current implementation?
For example, for this table:
CREATETABLEtest (
td AggregateFunction(quantilesTDigest(0.5, 0.9, 0.99), UInt64),
bf AggregateFunction(quantilesBFloat16(0.5, 0.9, 0.99), UInt64),
grouping_key UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (grouping_key);
insert into test
select
quantilesTDigestState(0.5, 0.9, 0.99)(number),
quantilesBFloat16State(0.5, 0.9, 0.99)(number),
number % 10000as grouping_key
from numbers(10e7)
group by grouping_key;
optimize table test final;
TDigest:
SELECT quantilesTDigestMerge(0.5)(td)
FROM test
Query id: 2ec3e5e3-49fe-46be-9208-be8201b4775a
┌─quantilesTDigestMerge(0.5)(td)─┐
│ [100047170] │
└────────────────────────────────┘
1 rows inset. Elapsed: 0.172 sec. Processed 10.00 thousand rows, 2.43 MB (58.24 thousand rows/s., 14.16 MB/s.)
BFloat16:
SELECT quantilesBFloat16Merge(0.5)(bf)
FROM test
Query id: b72df84a-30db-4dbf-9fab-d24902e3bf2a
┌─quantilesBFloat16Merge(0.5)(bf)─┐
│ [99614720] │
└─────────────────────────────────┘
1 rows inset. Elapsed: 0.425 sec. Processed 10.00 thousand rows, 5.31 MB (23.51 thousand rows/s., 12.50 MB/s.)
The situation is similar with production data.
The text was updated successfully, but these errors were encountered:
It's because BFloat16 histogram use UInt64 as Weight parameter, so each pair use 10 bytes, 2 for bfloat16 key and 8 for UInt64 Weight, so state size for quantileBFloat16 is up to 50%+ bigger than for TDigest, which use 2 Float32 values for it's Centroids
If looks like bfloat16 state merging is at least 2x times slower than tdigest even though compressed bfloat16 histogram is smaller on disk and faster in some/most benchmarks. Does bfloat16 merging requires more computational resources or there is an inefficiency in current implementation?
For example, for this table:
TDigest:
BFloat16:
The situation is similar with production data.
The text was updated successfully, but these errors were encountered: