Description
We are using Druid 0.20.1 as an OLAP store for our BI and Operations teams, with several Kafka CDC pipelines indexing data into Druid.
When we run the following query to fetch all records that have no deleted_at value assigned (deleted_at = 0), the result does not match the source system.
select __time, id, deleted_at, max(updated_at) as updated_at from "issues.issues"
where deleted_at = 0
GROUP BY __time, id, deleted_at
Several ids that have no deleted_at value in the source system are missing from the Druid result. For example, the following query returns no rows:
select __time, id, deleted_at, max(updated_at) as updated_at from "issues.issues"
where deleted_at = 0 and id = 'dad6e1c5-b9b5-4256-8fce-1cd84b035c71'
GROUP BY __time, id, deleted_at
However, if I remove the deleted_at = 0 condition from the WHERE clause, the original record is returned, and its deleted_at value is 0.
I can only conclude that Druid SQL is not returning all of the matching records, but I am not sure whether this is a Calcite problem or a Druid one.
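Two checks that might help narrow this down, as a rough sketch rather than a confirmed diagnosis. The first assumes the datasource is in the default druid schema and shows the SQL type Druid reports for deleted_at. The second lists every stored deleted_at value for the missing id without filtering on that column. The statements are meant to be run separately.

```sql
-- Check 1 (assumption: datasource is in the default 'druid' schema).
-- Shows which SQL type Druid reports for the filtered column.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'druid'
  AND TABLE_NAME = 'issues.issues'
  AND COLUMN_NAME = 'deleted_at'

-- Check 2 (run as a separate statement).
-- Lists each distinct deleted_at value stored for the missing id,
-- with no condition on deleted_at itself.
SELECT deleted_at, COUNT(*) AS row_count
FROM "issues.issues"
WHERE id = 'dad6e1c5-b9b5-4256-8fce-1cd84b035c71'
GROUP BY deleted_at
```

If the reported type is not numeric, or the second query shows a value other than 0, the deleted_at = 0 comparison may not be matching the stored value as expected.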
Any help on this would be much appreciated.