SQL Query Does Not Fetch All Relevant Data #11439

@AneeqYusuf

We are using Druid 0.20.1 as the OLAP store for our BI and Operations teams, with several Kafka CDC pipelines indexing data into Druid.
When we run the following query to fetch all records without a deleted_at value assigned (that is, rows where deleted_at is 0), the output does not match the source system.

SELECT __time, id, deleted_at, MAX(updated_at) AS updated_at
FROM "issues.issues"
WHERE deleted_at = 0
GROUP BY __time, id, deleted_at

Several ids that have no deleted_at value in the source system are missing from the Druid result. For example, the following query returns no rows:

SELECT __time, id, deleted_at, MAX(updated_at) AS updated_at
FROM "issues.issues"
WHERE deleted_at = 0 AND id = 'dad6e1c5-b9b5-4256-8fce-1cd84b035c71'
GROUP BY __time, id, deleted_at

However, if I remove deleted_at = 0 from the WHERE clause, the original record appears, and its deleted_at value is 0.
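
For reference, the unfiltered variant described above would look roughly like this. It is reconstructed from the description, not copied from the original session, and uses the same datasource and id as the earlier query:

```sql
-- Same lookup as before, with the deleted_at = 0 predicate removed.
-- Per the description, this returns the record, with deleted_at shown as 0.
SELECT __time, id, deleted_at, MAX(updated_at) AS updated_at
FROM "issues.issues"
WHERE id = 'dad6e1c5-b9b5-4256-8fce-1cd84b035c71'
GROUP BY __time, id, deleted_at
```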

I can only conclude that Druid SQL is not fetching all of the relevant records, but I am not sure whether this is a Calcite problem or something in Druid itself.
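
One possible way to narrow this down, sketched here as a suggestion and not as a confirmed diagnosis: inspect the plan Druid SQL produces for the failing query, and check which SQL type the deleted_at column is exposed as. Both statements use standard Druid SQL features (EXPLAIN PLAN FOR and INFORMATION_SCHEMA.COLUMNS); the assumption that the datasource lives in the default 'druid' schema is mine. Run each statement separately.

```sql
-- 1. Show the native query that the SQL layer plans for the failing lookup.
--    If the filter on deleted_at looks wrong in the plan, the problem is
--    more likely in SQL planning than in native query execution.
EXPLAIN PLAN FOR
SELECT __time, id, deleted_at, MAX(updated_at) AS updated_at
FROM "issues.issues"
WHERE deleted_at = 0 AND id = 'dad6e1c5-b9b5-4256-8fce-1cd84b035c71'
GROUP BY __time, id, deleted_at

-- 2. Check the column types as Druid SQL reports them.
--    Assumes the datasource is in the default 'druid' schema.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'druid'
  AND TABLE_NAME = 'issues.issues'
  AND COLUMN_NAME IN ('deleted_at', 'updated_at')
```

If the plan and the reported types both look as expected, that would point toward the stored segment data or native query execution instead of the Calcite planning step.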

Any help on this would be much appreciated.
