-
Notifications
You must be signed in to change notification settings - Fork 8.3k
count query returns different values #6735
Description
I'm running a simple count(*) on a distributed table, and I run the same query twice and am getting different row counts- as shown below:
SELECT count(*)
FROM metrics.histograms_dist
┌───count()─┐
│ 551447120 │
└───────────┘
1 rows in set. Elapsed: 0.043 sec. Processed 551.45 million rows, 551.67 MB (12.81 billion rows/s., 12.82 GB/s.)
cluster :) select count(*) from metrics.histograms_dist;
SELECT count(*)
FROM metrics.histograms_dist
┌───count()─┐
│ 679825498 │
└───────────┘
1 rows in set. Elapsed: 0.058 sec. Processed 679.83 million rows, 680.10 MB (11.77 billion rows/s., 11.77 GB/s.)
cluster :) select count(*) from metrics.histograms_dist;
SELECT count(*)
FROM metrics.histograms_dist
┌───count()─┐
│ 681610004 │
└───────────┘
1 rows in set. Elapsed: 0.062 sec. Processed 681.61 million rows, 681.88 MB (10.98 billion rows/s., 10.99 GB/s.)
cluster :) select count(*) from metrics.histograms_dist;
SELECT count(*)
FROM metrics.histograms_dist
┌───count()─┐
│ 553246686 │
└───────────┘
1 rows in set. Elapsed: 0.047 sec. Processed 553.25 million rows, 553.47 MB (11.70 billion rows/s., 11.70 GB/s.)
cluster :)
Running the query multiple times in a row is how I'm reproducing it.
I have a distributed table, histograms_dist that points to three replicated tables. Sharding is done on rand(). I expect the count to be incremental as rows are being added in the background, but the jump down from 681 million to 553 million is unexpected.
Using v19.13.1.11. No errors or messages appear in the logs to indicate a problem.
Schema:
```CREATE TABLE metrics.histograms (Timestamp DateTime CODEC(DoubleDelta), `Client` LowCardinality(String), `Path` LowCardinality(String), `Tags.Key` Array(String), `Tags.Value` Array(String), `Range` Array(Float32), `Steps` Float32, `IndexNotation` Int32, `Histogram` Array(Float32)) ENGINE = ReplicatedMergeTree('/clickhouse/tables/metrics/{shard}', '{replica}') PARTITION BY toStartOfDay(Timestamp) ORDER BY (Path, Timestamp) SETTINGS index_granularity = 8192
CREATE TABLE metrics.histograms_dist (Timestamp DateTime CODEC(DoubleDelta), Client LowCardinality(String), Path LowCardinality(String), Tags.Key Array(String), Tags.Value Array(String), Range Array(Float32), Steps Float32, IndexNotation Int32, Histogram Array(Float32)) ENGINE = Distributed(cluster, metrics, histograms, rand())```