
[Bug] Wildcard ** + ALIGN BY DEVICE + GROUP BY + value filter returns incorrect COUNT when data spans multiple TsFiles #17520

@betterlmy

Description

Search before asking

  • I searched in the issues and found nothing similar.

Version

  • OS: Linux (WSL2, kernel 6.6.87)
  • IoTDB Version: 2.0.5 (production), 2.0.8 (reproduced locally with apache/iotdb:latest Docker image)
  • Deployment: Standalone

Describe the bug and provide the minimal reproduce step

Bug Summary

When querying with a wildcard path (**), ALIGN BY DEVICE, GROUP BY time interval, and a value filter (WHERE on non-timestamp columns), some time buckets return COUNT = 0 instead of the actual count.

The bug is only triggered when the device's data is spread across multiple TsFiles (i.e., written in separate batches before compaction merges them). The same query on the exact device path always returns the correct result.

Minimal Reproduce Steps

Step 1 — Disable compaction (to keep TsFiles separate and prevent the bug from being masked):

Add to iotdb-system.properties:

enable_seq_space_compaction=false
enable_unseq_space_compaction=false
enable_cross_space_compaction=false
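For reference, the three switches can be appended in one step; the `conf/iotdb-system.properties` path below assumes a default standalone layout and may differ for your install (restart the node afterwards):

```shell
# Append the compaction switches to the IoTDB config.
# The conf/ path is an assumption (default standalone layout); adjust as needed.
mkdir -p conf
cat >> conf/iotdb-system.properties <<'EOF'
enable_seq_space_compaction=false
enable_unseq_space_compaction=false
enable_cross_space_compaction=false
EOF
```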

Step 2 — Write data in 4 separate batches with FLUSH between each:

-- Batch 1 (day 1 data)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
  VALUES (1776147444096, 25, null, 'aaa'),
         (1776148288048, 26, 0,    'aaa');
FLUSH;

-- Batch 2 (day 2 data)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
  VALUES (1776233509152, 25, null, 'bbb'),
         (1776233509201, 26, 0,    'bbb');
FLUSH;

-- Batch 3 (day 3 data — this batch will be MISSING in wildcard query)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
  VALUES (1776300241254, 25, null, 'ccc'),
         (1776300241302, 26, 0,    'ccc');
FLUSH;

-- Batch 4 (day 7 data — this batch will also be MISSING in wildcard query)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
  VALUES (1776642467102, 25, null, 'ddd'),
         (1776642467150, 26, 0,    'ddd');
FLUSH;

Confirm multiple TsFiles exist (all ending in -0-0.tsfile, meaning not yet compacted):

sequence/root.test/3/xxxx/...-1-0-0.tsfile
sequence/root.test/3/xxxx/...-2-0-0.tsfile
sequence/root.test/3/yyyy/...-1-0-0.tsfile
sequence/root.test/3/yyyy/...-2-0-0.tsfile

Step 3 — Run Query A (wildcard):

SELECT COUNT(method)
FROM root.test.service.**
WHERE method = 26 AND code = 0
GROUP BY([1773936000000, 1776700800000), 1d)
ALIGN BY DEVICE;

Step 4 — Run Query B (exact path):

SELECT COUNT(method)
FROM root.test.service.tenant1.device1
WHERE method = 26 AND code = 0
GROUP BY([1773936000000, 1776700800000), 1d)
ALIGN BY DEVICE;

What did you expect to see?

Query A and Query B return identical COUNT values for every time bucket.

What did you see instead?

Query A returns COUNT = 0 for the time buckets that correspond to Batch 3 and Batch 4 (data written into separate TsFiles after the first flush). Query B returns the correct non-zero counts for those same buckets.
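As a sanity check (plain Python, not IoTDB code), the 1d bucket each batch falls into can be derived from the `GROUP BY([1773936000000, 1776700800000), 1d)` parameters. Batches 3 and 4 land in the buckets starting 2026-04-15T16:00Z and 2026-04-19T16:00Z, which are exactly the zeroed rows in the results below:

```python
from datetime import datetime, timezone

START = 1773936000000   # GROUP BY window start (ms since epoch)
INTERVAL = 86_400_000   # 1d in ms

# First timestamp of each insert batch from the reproduce steps.
batches = {
    "Batch 1": 1776147444096,
    "Batch 2": 1776233509152,
    "Batch 3": 1776300241254,  # -> bucket 27, starts 2026-04-15T16:00:00Z
    "Batch 4": 1776642467102,  # -> bucket 31, starts 2026-04-19T16:00:00Z
}

for name, ts in batches.items():
    idx = (ts - START) // INTERVAL                 # 0-based bucket index
    bucket_start = START + idx * INTERVAL          # bucket start (ms)
    utc = datetime.fromtimestamp(bucket_start / 1000, tz=timezone.utc)
    print(f"{name}: bucket {idx} starting {utc.isoformat()}")
```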

Results from real production data

Device: root.test.service.39eaf2bd9ca130da3368e3ac9cfcbdff.BaV7gK84WlEZz4RGArXa92knaxTEJU

| Time Bucket (UTC) | Query A `**` (wrong) | Query B exact path (correct) |
|---|---|---|
| 2026-04-13T16:00:00.000Z | 17 | 17 |
| 2026-04-14T16:00:00.000Z | 6 | 6 |
| 2026-04-15T16:00:00.000Z | 0 | 6 |
| 2026-04-16T16:00:00.000Z | 0 | 0 |
| 2026-04-17T16:00:00.000Z | 0 | 0 |
| 2026-04-18T16:00:00.000Z | 0 | 0 |
| 2026-04-19T16:00:00.000Z | 0 | 6 |

The two incorrect buckets (the 2026-04-15 and 2026-04-19 16:00 UTC rows, i.e. Apr 16 and Apr 20 in CST) correspond exactly to data batches written into TsFiles that were created after the first flush had already occurred.

Anything else?

Why this happens in production but is hard to reproduce in dev:

In a production IoT environment, devices continuously write data across multiple days. Each daily batch triggers a new TsFile via memtable flush. Compaction may not run fast enough to merge them before the next query arrives, so the bug is consistently observable. In a development environment with sparse data (small volume, single batch), all data lands in one TsFile and the bug never triggers.

Compaction masks the bug:

With compaction enabled (default), if the compaction task completes before the query runs, TsFiles are merged and the wildcard query returns correct results. This makes the bug appear intermittent, but it is actually deterministic: it always occurs when data spans multiple uncompacted TsFiles.

Affected versions:

Both 2.0.5 and 2.0.8 (apache/iotdb:latest as of 2026-04-20) were confirmed to reproduce the bug under identical conditions (compaction disabled, data in 4 separate TsFiles, 3+ devices under the wildcard path).

Workaround:

Query specific device paths instead of using **. Enumerate device paths from application-layer metadata and pass them explicitly to the query.
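A minimal sketch of the workaround, assuming the device list is available on the application side (the device names and query template below are illustrative; the list could also be obtained via `SHOW DEVICES root.test.service.**`):

```python
# Expand the wildcard on the application side and issue one
# exact-path query per device instead of a single ** query.
# Device names below are illustrative placeholders.
devices = [
    "root.test.service.tenant1.device1",
    "root.test.service.tenant1.device2",
]

QUERY_TEMPLATE = (
    "SELECT COUNT(method) FROM {device} "
    "WHERE method = 26 AND code = 0 "
    "GROUP BY([1773936000000, 1776700800000), 1d) "
    "ALIGN BY DEVICE;"
)

queries = [QUERY_TEMPLATE.format(device=d) for d in devices]
for q in queries:
    # Execute each statement via your IoTDB client of choice.
    print(q)
```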

Are you willing to submit a PR?

  • I'm willing to submit a PR!
