Search before asking
Version
- OS: Linux (WSL2, kernel 6.6.87)
- IoTDB Version: 2.0.5 (production), 2.0.8 (reproduced locally with
apache/iotdb:latest Docker image)
- Deployment: Standalone
Describe the bug and provide the minimal reproduce step
Bug Summary
When querying with a wildcard path (**), ALIGN BY DEVICE, GROUP BY time interval, and a value filter (WHERE on non-timestamp columns), some time buckets return COUNT = 0 instead of the actual count.
The bug is triggered only when the device's data is spread across multiple TsFiles (i.e., written in separate batches before compaction merges them). The same query against the exact device path always returns the correct result.
Minimal Reproduce Steps
Step 1 — Disable compaction (to keep TsFiles separate and prevent the bug from being masked):
Add to iotdb-system.properties:
enable_seq_space_compaction=false
enable_unseq_space_compaction=false
enable_cross_space_compaction=false
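The three flags above can be appended in one step; this is a sketch, assuming IOTDB_CONF points at your actual iotdb-system.properties (the default path used below is only a placeholder, and a DataNode restart is typically needed before the change takes effect):

```shell
# Append the compaction flags to iotdb-system.properties.
# IOTDB_CONF is an assumed variable; set it to your real conf file,
# e.g. IOTDB_CONF=conf/iotdb-system.properties.
CONF="${IOTDB_CONF:-/tmp/iotdb-system.properties}"

cat >> "$CONF" <<'EOF'
enable_seq_space_compaction=false
enable_unseq_space_compaction=false
enable_cross_space_compaction=false
EOF

# Echo back what was written, for a quick sanity check.
grep compaction "$CONF"
```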
Step 2 — Write data in 4 separate batches with FLUSH between each:
-- Batch 1 (day 1 data)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
VALUES (1776147444096, 25, null, 'aaa'),
(1776148288048, 26, 0, 'aaa');
FLUSH;
-- Batch 2 (day 2 data)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
VALUES (1776233509152, 25, null, 'bbb'),
(1776233509201, 26, 0, 'bbb');
FLUSH;
-- Batch 3 (day 3 data — this batch will be MISSING in wildcard query)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
VALUES (1776300241254, 25, null, 'ccc'),
(1776300241302, 26, 0, 'ccc');
FLUSH;
-- Batch 4 (day 7 data — this batch will also be MISSING in wildcard query)
INSERT INTO root.test.service.tenant1.device1(TIMESTAMP, method, code, msgId)
VALUES (1776642467102, 25, null, 'ddd'),
(1776642467150, 26, 0, 'ddd');
FLUSH;
Confirm multiple TsFiles exist (all ending in -0-0.tsfile, meaning not yet compacted):
sequence/root.test/3/xxxx/...-1-0-0.tsfile
sequence/root.test/3/xxxx/...-2-0-0.tsfile
sequence/root.test/3/yyyy/...-1-0-0.tsfile
sequence/root.test/3/yyyy/...-2-0-0.tsfile
Step 3 — Run Query A (wildcard):
SELECT COUNT(method)
FROM root.test.service.**
WHERE method = 26 AND code = 0
GROUP BY([1773936000000, 1776700800000), 1d)
ALIGN BY DEVICE;
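To see which 1d window each batch lands in, the GROUP BY arithmetic can be checked directly: bucket index = (ts - start) // interval. A minimal sketch in plain Python, using the timestamps from the four batches above:

```python
# Map each inserted timestamp to its GROUP BY([start, end), 1d) bucket.
from datetime import datetime, timezone

START = 1773936000000     # window start from the query, in ms
INTERVAL = 86_400_000     # 1d in ms

batches = {
    "batch1": 1776147444096,
    "batch2": 1776233509152,
    "batch3": 1776300241254,  # bucket 27 -> the first missing bucket
    "batch4": 1776642467102,  # bucket 31 -> the second missing bucket
}

for name, ts in batches.items():
    idx = (ts - START) // INTERVAL
    bucket_start = START + idx * INTERVAL
    label = datetime.fromtimestamp(bucket_start / 1000, tz=timezone.utc)
    print(f"{name}: bucket {idx} starting {label.isoformat()}")
```

Batch 3 maps to the bucket starting 2026-04-15T16:00:00Z and batch 4 to 2026-04-19T16:00:00Z, exactly the two buckets where Query A returns 0.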
Step 4 — Run Query B (exact path):
SELECT COUNT(method)
FROM root.test.service.tenant1.device1
WHERE method = 26 AND code = 0
GROUP BY([1773936000000, 1776700800000), 1d)
ALIGN BY DEVICE;
What did you expect to see?
Query A and Query B return identical COUNT values for every time bucket.
What did you see instead?
Query A returns COUNT = 0 for the time buckets that correspond to Batch 3 and Batch 4 (data written into separate TsFiles after the first flush). Query B returns the correct non-zero counts for those same buckets.
Results from real production data
Device: root.test.service.39eaf2bd9ca130da3368e3ac9cfcbdff.BaV7gK84WlEZz4RGArXa92knaxTEJU
| Time Bucket (UTC) | Query A ** (wrong) | Query B exact path (correct) |
|---|---|---|
| 2026-04-13T16:00:00.000Z | 17 | 17 |
| 2026-04-14T16:00:00.000Z | 6 | 6 |
| 2026-04-15T16:00:00.000Z | 0 ❌ | 6 |
| 2026-04-16T16:00:00.000Z | 0 | 0 |
| 2026-04-17T16:00:00.000Z | 0 | 0 |
| 2026-04-18T16:00:00.000Z | 0 | 0 |
| 2026-04-19T16:00:00.000Z | 0 ❌ | 6 |
The two incorrect buckets (Apr 16 and Apr 20 in CST, UTC+8) correspond exactly to the data batches written into TsFiles that were created after the first flush had already occurred.
Anything else?
Why this happens in production but is hard to reproduce in dev:
In a production IoT environment, devices continuously write data across multiple days. Each daily batch triggers a new TsFile via memtable flush. Compaction may not run fast enough to merge them before the next query arrives, so the bug is consistently observable. In a development environment with sparse data (small volume, single batch), all data lands in one TsFile and the bug never triggers.
Compaction masks the bug:
With compaction enabled (default), if the compaction task completes before the query runs, TsFiles are merged and the wildcard query returns correct results. This makes the bug appear intermittent, but it is actually deterministic: it always occurs when data spans multiple uncompacted TsFiles.
Affected versions:
Both 2.0.5 and 2.0.8 (apache/iotdb:latest as of 2026-04-20) were confirmed to reproduce the bug under identical conditions (compaction disabled, data in 4 separate TsFiles, 3+ devices under the wildcard path).
Workaround:
Query specific device paths instead of using **. Enumerate device paths from application-layer metadata and pass them explicitly to the query.
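The workaround can be sketched as a small helper that expands an application-side device list into one exact-path statement per device; the device names and query template below are illustrative, not an IoTDB API:

```python
# Build one exact-path query per device instead of a single ** query.
# Device paths come from application-layer metadata (illustrative values).
TEMPLATE = (
    "SELECT COUNT(method) FROM {device} "
    "WHERE method = 26 AND code = 0 "
    "GROUP BY([1773936000000, 1776700800000), 1d) "
    "ALIGN BY DEVICE;"
)

def per_device_queries(device_paths):
    """Expand a list of exact device paths into individual statements."""
    return [TEMPLATE.format(device=path) for path in device_paths]

queries = per_device_queries([
    "root.test.service.tenant1.device1",
    "root.test.service.tenant1.device2",  # hypothetical second device
])
for q in queries:
    print(q)
```

Each generated statement hits a single concrete device path, so the wildcard-resolution code path that loses buckets is never exercised.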
Are you willing to submit a PR?