Skip to content

count query returns different values #6735

@volfco

Description

@volfco

I'm running a simple count(*) on a distributed table, and I run the same query twice and am getting different row counts- as shown below:


SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 551447120 │
└───────────┘

1 rows in set. Elapsed: 0.043 sec. Processed 551.45 million rows, 551.67 MB (12.81 billion rows/s., 12.82 GB/s.) 

cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 679825498 │
└───────────┘

1 rows in set. Elapsed: 0.058 sec. Processed 679.83 million rows, 680.10 MB (11.77 billion rows/s., 11.77 GB/s.) 

cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 681610004 │
└───────────┘

1 rows in set. Elapsed: 0.062 sec. Processed 681.61 million rows, 681.88 MB (10.98 billion rows/s., 10.99 GB/s.) 

cluster :) select count(*) from metrics.histograms_dist;

SELECT count(*)
FROM metrics.histograms_dist

┌───count()─┐
│ 553246686 │
└───────────┘

1 rows in set. Elapsed: 0.047 sec. Processed 553.25 million rows, 553.47 MB (11.70 billion rows/s., 11.70 GB/s.) 

cluster :)

Running the query multiple times in a row is how I'm reproducing it.

I have a distributed table, histograms_dist that points to three replicated tables. Sharding is done on rand(). I expect the count to be incremental as rows are being added in the background, but the jump down from 681 million to 553 million is unexpected.

Using v19.13.1.11. No errors or messages appear in the logs to indicate a problem.

Schema:
```CREATE TABLE metrics.histograms (Timestamp DateTime CODEC(DoubleDelta), `Client` LowCardinality(String), `Path` LowCardinality(String), `Tags.Key` Array(String), `Tags.Value` Array(String), `Range` Array(Float32), `Steps` Float32, `IndexNotation` Int32, `Histogram` Array(Float32)) ENGINE = ReplicatedMergeTree('/clickhouse/tables/metrics/{shard}', '{replica}') PARTITION BY toStartOfDay(Timestamp) ORDER BY (Path, Timestamp) SETTINGS index_granularity = 8192

CREATE TABLE metrics.histograms_dist (Timestamp DateTime CODEC(DoubleDelta), Client LowCardinality(String), Path LowCardinality(String), Tags.Key Array(String), Tags.Value Array(String), Range Array(Float32), Steps Float32, IndexNotation Int32, Histogram Array(Float32)) ENGINE = Distributed(cluster, metrics, histograms, rand())```

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official releasest-need-infoWe need extra data to continue (waiting for response). Either some details or a repro of the issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions