Skip to content

Boolean dimensions/segments become integers in pre-aggregation tables built with ClickHouse #9494

@igorlukanin

Description

@igorlukanin

Describe the bug
When a pre-aggregation is built, if it contains boolean dimensions or segments and the data source is ClickHouse, the pre-aggregation data will contain integers (0 or 1) instead of booleans (false or true). That will lead to subsequent errors when querying such a pre-aggregation.

To Reproduce
Steps to reproduce the behavior:

  1. Use the data model below.
  2. Use ClickHouse as the data source.
  3. Build the pre-aggregation (no errors).
  4. Run the query below. See the following error: Internal: Error during planning: Filter predicate must return boolean values, not Int64
  5. Check the pre-aggregation data. See ints for boolean members.

Expected behavior
No error when querying.

Screenshots
ClickHouse — see 0s:
Image

For comparison: same pre-agg built on Postgres:
Image

Minimally reproducible data model

cube(`top_card_metrics`, {
  title: `Top Card Metrics`,
  // sql_table: `top_card_metrics_mv`,
  sql: `
    SELECT
      '2025-04-01'::TIMESTAMP AS create_dt,
      123 AS brand_id,
      'foo' AS brand,
      1 AS brand_est_offtake_mrp,
      2 AS offtake_mrp,
      3 AS all_brands_est_offtake_mrp,
      4 AS brand_assortment,
      5 AS wt_osa_sum,
      6 AS doi_fe_be_sum
  `,
  meta: {
    description: `Top card metrics for Blinkit.`
  },
  dimensions: {
    created_at: {
      sql: `create_dt`,
      type: `time`,
      title: `Date`
    },
    brand_id: {
      sql: `brand_id`,
      type: `string`,
      title: `Brand ID`
    },
    brand: {
      sql: `brand`,
      type: `string`,
      title: `Brand`
    },
    is_current_brand: {
      sql: `brand_id = '17390'`,
      type: `boolean`,
      title: `Is Current Brand`,
      description: `Is the current brand Blinkit?`
    }
  },
  measures: {
    // --- Base Measures ---
    _estimated_offtake_mrp_sum: {
      sql: `brand_est_offtake_mrp`,
      type: `sum`,
      shown: false
    },
    _offtake_mrp_sum: {
      sql: `offtake_mrp`,
      type: `sum`,
      shown: false
    },
    _all_estimated_offtake_mrp_sum: {
      sql: `all_brands_est_offtake_mrp`,
      type: `sum`,
      shown: false
    },
    _brand_assortment_count: {
      sql: `brand_assortment`,
      type: `max`,
      shown: false
    },
    _wt_osa_sum: {
      sql: `wt_osa_sum`,
      type: `sum`,
      shown: false
    },
    _wt_osa_max_sum: {
      sql: `wt_osa_sum`,
      type: `sum`,
      filters: [
        {
          sql: `create_dt = toDate(parseDateTime64BestEffort(${FILTER_PARAMS.top_card_metrics.create_dt.filter(
            (x, y) => `${y}`
          )}))`
        }
      ],
      shown: false
    },
    _wt_osa_offtake_mrp_sum: {
      sql: `brand_est_offtake_mrp`,
      type: `sum`,
      filters: [
        {
          sql: `create_dt = toDate(parseDateTime64BestEffort(${FILTER_PARAMS.top_card_metrics.create_dt.filter(
            (x, y) => `${y}`
          )}))`
        }
      ],
      shown: false
    },
    _doi_fe_be_sum: {
      sql: `doi_fe_be_sum`,
      type: `sum`,
      shown: false
    },
    // --- User Requested Metrics ---
    sales_mrp: {
      sql: `${_offtake_mrp_sum}`,
      type: `number`,
      format: `currency`,
      title: `Sales (MRP)`,
      description: `Total sales revenue based on MRP.`
    },
    assortment: {
      sql: `${_brand_assortment_count}`,
      type: `number`,
      title: `Assortment`,
      description: `Total number of products available for the brand.`
    },
    estimated_market_share: {
      sql: `(${CUBE._estimated_offtake_mrp_sum} / NULLIF(${CUBE._all_estimated_offtake_mrp_sum}, 0))*100`,
      type: `number`,
      format: `percent`,
      title: `Est. Market Share`,
      description: `Estimated percentage of the total market captured by the brand. Brand's Sales on a daily basis / Total Sales on a daily basis.`
    },
    estimated_market_size: {
      sql: `${CUBE._offtake_mrp_sum} * (${CUBE._estimated_offtake_mrp_sum} / NULLIF(${CUBE._all_estimated_offtake_mrp_sum}, 0))`,
      type: `number`,
      format: `currency`,
      title: `Est. Market Size`,
      description: `Estimated total market size based on the brand's sales and market share. Brand's Sales * Brands Market Share.`
    },
    wt_osa_pct: {
      sql: `${_wt_osa_sum}/NULLIF(${CUBE._estimated_offtake_mrp_sum}, 0)*100`,
      type: `number`,
      format: `percent`,
      title: `OSA %`,
      description: `On Shelf Availability Percentage. weighted sum of wt_osa / total_est_offtake_mrp.`
    },
    doi_fe_be: {
      sql: `${_doi_fe_be_sum}/NULLIF(${CUBE._offtake_mrp_sum}, 0)`,
      type: `number`,
      title: `DOI FE BE`,
      description: `Days of inventory of a brand`
    }
  },
  segments: {
    curr_brand: {
      sql: `${CUBE}.brand_id = '17390'`
    }
  },
  pre_aggregations: {
    main: {
      type: `rollup`,
      measures: [
        top_card_metrics._offtake_mrp_sum,
        top_card_metrics._estimated_offtake_mrp_sum,
        top_card_metrics._all_estimated_offtake_mrp_sum,
        top_card_metrics._brand_assortment_count,
        top_card_metrics._wt_osa_sum,
        top_card_metrics._doi_fe_be_sum
      ],
      segments: [CUBE.curr_brand],
      dimensions: [
        top_card_metrics.brand_id,
        top_card_metrics.brand,
        top_card_metrics.created_at,
        top_card_metrics.is_current_brand
      ],
      time_dimension: top_card_metrics.created_at,
      granularity: "day",
      partition_granularity: "month",
      refresh_key: {
        every: "5 minute",
        update_window: "24 hour",
        incremental: true
      },
      build_range_start: {
        sql: `SELECT DATE_TRUNC('month', NOW() - INTERVAL '3 months')`
      },
      build_range_end: {
        sql: `SELECT NOW()`
      },
      indexes: {
        brand_index: {
          columns: [top_card_metrics.brand_id, top_card_metrics.created_at]
        }
      }
      // use_original_sql_pre_aggregations: true
    }
  }
});

Query:

{
  "measures": [
    "top_card_metrics.doi_fe_be"
  ],
  "dimensions": [
    "top_card_metrics.brand"
  ],
  "segments": [
    "top_card_metrics.curr_brand"
  ]
}

Version:
1.3.5

Additional context
Explicit casts on boolean dimensions works as a workaround:

    is_current_brand: {
      sql: `(brand_id = '17390')::BOOLEAN`,
      type: `boolean`,
      title: `Is Current Brand`,
      description: `Is the current brand Blinkit?`
    }

// ...

  segments: {
    curr_brand: {
      sql: `(${CUBE}.brand_id = '17390')::BOOLEAN`
    }
  },

Metadata

Metadata

Assignees

No one assigned

    Labels

    driver:clickhouseIssues related to the ClickHouse driverhelp wantedCommunity contributions are welcome.pre-aggregationsIssues related to pre-aggregations

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions