
IN filtering results are not correct. #11344

@t0mpere

Hey, I'm running queries like this:

          SELECT
              job_id,
              DATETRUNC('day', ts) AS dt,
              SUM(c)
          FROM TABLE
          WHERE
              job_id IN ('2023_08_14_10_10_01', '2023_08_14_10_25_13')
          GROUP BY job_id, dt

Query plan: (screenshot not included)

I'm expecting this to have the same result as the union of the following queries:

         SELECT
             job_id,
             DATETRUNC('day', ts) AS dt,
             SUM(c)
         FROM TABLE
         WHERE
             job_id = '2023_08_14_10_10_01'
         GROUP BY job_id, dt
---------------------------------------------------------------
         SELECT
             job_id,
             DATETRUNC('day', ts) AS dt,
             SUM(c)
         FROM TABLE
         WHERE
             job_id = '2023_08_14_10_25_13'
         GROUP BY job_id, dt

But on some occasions this is not the case, and some job_ids are left out of the result.
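
To pin down exactly which groups go missing, the two result sets can be compared programmatically. The following is only a rough sketch: `diff_results` is a hypothetical helper (not part of any Pinot client), it assumes each row has already been fetched as a `(job_id, dt, sum_c)` tuple, and the sample rows are made-up values for illustration, not data from my cluster.

```python
from typing import Dict, Iterable, List, Tuple

# One result row: (job_id, dt, sum_c). The shape mirrors the SELECT list above.
Row = Tuple[str, int, int]
Key = Tuple[str, int]


def diff_results(in_rows: Iterable[Row], union_rows: Iterable[Row]) -> Dict[str, List[Key]]:
    """Compare the IN-query rows with the concatenated rows of the per-job_id queries.

    Returns the (job_id, dt) groups that are missing from the IN result,
    present only in the IN result, or present in both with different sums.
    """
    in_map = {(job_id, dt): total for job_id, dt, total in in_rows}
    union_map = {(job_id, dt): total for job_id, dt, total in union_rows}

    missing = sorted(k for k in union_map if k not in in_map)
    extra = sorted(k for k in in_map if k not in union_map)
    mismatched = sorted(
        k for k in in_map if k in union_map and in_map[k] != union_map[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Illustrative (invented) rows: the IN query lost the second job_id entirely.
in_query_rows = [("2023_08_14_10_10_01", 1, 10)]
per_id_rows = [
    ("2023_08_14_10_10_01", 1, 10),
    ("2023_08_14_10_25_13", 1, 7),
]

report = diff_results(in_query_rows, per_id_rows)
print(report["missing"])
```

An empty `missing`, `extra`, and `mismatched` list would mean the IN query and the per-job_id queries agree; anything in `missing` is a group that the IN query dropped.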

Table is configured like this:

    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "sortedColumn": [],
      "bloomFilterColumns": [],
      "noDictionaryColumns": [
        "d"
      ],
      "invertedIndexColumns": [],
      "onHeapDictionaryColumns": [
        "c"
      ],
      "varLengthDictionaryColumns": [
        "b",
        "a"
      ],
      "enableDefaultStarTree": false,
      "starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": [
            "ts",
            "job_id",
            [...]
          ],
          "skipStarNodeCreationForDimensions": [],
          "functionColumnPairs": [
            "SUM__c"
          ],
          "maxLeafRecords": 10000
        }
      ],
      "enableDynamicStarTreeCreation": true,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "optimizeDictionary": true,
      "optimizeDictionaryForMetrics": true,
      "noDictionarySizeRatioThreshold": 0,
      "rangeIndexColumns": []
    },

Am I doing something wrong here or is this a bug?

Current configuration:
GKE
version 0.12.1
GCS for deep storage
3 ZK - 8 CPU and 18GB ram
6 Servers - 16CPU and 32 64GB ram 1.45TB SSD
2 Controllers - 16 CPU and 32GB ram
2 Brokers - 5 CPU 16.25GB ram
32 Minions - 2 CPU and 2GB of ram

1M Segments 4TB of data
