Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bfloat16 histogram state merging is slow when compared with tdigest #32228

Open
vmihailenco opened this issue Dec 4, 2021 · 3 comments
Open

Comments

@vmihailenco
Copy link
Contributor

vmihailenco commented Dec 4, 2021

If looks like bfloat16 state merging is at least 2x times slower than tdigest even though compressed bfloat16 histogram is smaller on disk and faster in some/most benchmarks. Does bfloat16 merging requires more computational resources or there is an inefficiency in current implementation?

For example, for this table:

CREATE TABLE test (
  td AggregateFunction(quantilesTDigest(0.5, 0.9, 0.99), UInt64),
  bf AggregateFunction(quantilesBFloat16(0.5, 0.9, 0.99), UInt64),
  grouping_key UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (grouping_key);

insert into test
select
  quantilesTDigestState(0.5, 0.9, 0.99)(number),
  quantilesBFloat16State(0.5, 0.9, 0.99)(number),
  number % 10000 as grouping_key
from numbers(10e7)
group by grouping_key;

optimize table test final;

TDigest:

SELECT quantilesTDigestMerge(0.5)(td)
FROM test

Query id: 2ec3e5e3-49fe-46be-9208-be8201b4775a

┌─quantilesTDigestMerge(0.5)(td)─┐
│ [100047170]                    │
└────────────────────────────────┘

1 rows in set. Elapsed: 0.172 sec. Processed 10.00 thousand rows, 2.43 MB (58.24 thousand rows/s., 14.16 MB/s.)

BFloat16:

SELECT quantilesBFloat16Merge(0.5)(bf)
FROM test

Query id: b72df84a-30db-4dbf-9fab-d24902e3bf2a

┌─quantilesBFloat16Merge(0.5)(bf)─┐
│ [99614720]                      │
└─────────────────────────────────┘

1 rows in set. Elapsed: 0.425 sec. Processed 10.00 thousand rows, 5.31 MB (23.51 thousand rows/s., 12.50 MB/s.)

The situation is similar with production data.

@vmihailenco
Copy link
Contributor Author

CC @RedClusive in case you have any input / ideas...

@vmihailenco
Copy link
Contributor Author

For what it is worth, I've tried to re-compile clickhouse using crc32 hash instead of trivial hash for bfloat16 HashMap. No change in performance.

@UnamedRus
Copy link
Contributor

It's because BFloat16 histogram use UInt64 as Weight parameter, so each pair use 10 bytes, 2 for bfloat16 key and 8 for UInt64 Weight, so state size for quantileBFloat16 is up to 50%+ bigger than for TDigest, which use 2 Float32 values for it's Centroids

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants