count query returns different values

I'm running a simple `count(*)` on a distributed table, and I run the same query twice and am getting different row counts- as shown below:

```cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 551447120 │
└───────────┘

1 rows in set. Elapsed: 0.043 sec. Processed 551.45 million rows, 551.67 MB (12.81 billion rows/s., 12.82 GB/s.) 

cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 679825498 │
└───────────┘

1 rows in set. Elapsed: 0.058 sec. Processed 679.83 million rows, 680.10 MB (11.77 billion rows/s., 11.77 GB/s.) 

cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 681610004 │
└───────────┘

1 rows in set. Elapsed: 0.062 sec. Processed 681.61 million rows, 681.88 MB (10.98 billion rows/s., 10.99 GB/s.) 

cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 553246686 │
└───────────┘

1 rows in set. Elapsed: 0.047 sec. Processed 553.25 million rows, 553.47 MB (11.70 billion rows/s., 11.70 GB/s.) 

cluster :)
```

Running the query multiple times in a row is how I'm reproducing it. 

I have a distributed table, `histograms_dist` that points to three replicated tables. Sharding is done on `rand()`. I expect the count to be incremental as rows are being added in the background, but the jump down from 681 million to 553 million is unexpected. 

Using v19.13.1.11. No errors or messages appear in the logs to indicate a problem.  

Schema:
```CREATE TABLE metrics.histograms (`Timestamp` DateTime CODEC(DoubleDelta), `Client` LowCardinality(String), `Path` LowCardinality(String), `Tags.Key` Array(String), `Tags.Value` Array(String), `Range` Array(Float32), `Steps` Float32, `IndexNotation` Int32, `Histogram` Array(Float32)) ENGINE = ReplicatedMergeTree('/clickhouse/tables/metrics/{shard}', '{replica}') PARTITION BY toStartOfDay(Timestamp) ORDER BY (Path, Timestamp) SETTINGS index_granularity = 8192

CREATE TABLE metrics.histograms_dist (`Timestamp` DateTime CODEC(DoubleDelta), `Client` LowCardinality(String), `Path` LowCardinality(String), `Tags.Key` Array(String), `Tags.Value` Array(String), `Range` Array(Float32), `Steps` Float32, `IndexNotation` Int32, `Histogram` Array(Float32)) ENGINE = Distributed(cluster, metrics, histograms, rand())```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

count query returns different values #6735

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

count query returns different values #6735

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions