Best Practices for Dynamic Filtering on Array Columns (e.g., ClickHouse Array(Int32))

**Problem**

I'm using Cube.js with a ClickHouse data source and need to support dynamic filtering on an array column (Project IDs of type Array(Int32)) via the GraphQL API. Users should be able to provide an arbitrary list of values (e.g., [123, 456]) to filter rows where the array contains any of those values (OR semantics).
Cube.js doesn't natively support array types or operators like contains for arrays, so standard dimension filters append = ? to the WHERE clause, which breaks array functions (e.g., hasAny). I've tried several workarounds, but each has trade-offs (detailed below). I'm looking for:

Better approaches or best practices for this use case.
Confirmation if the "formatted string" method is acceptable (it works but feels hacky).
Any plans for native array support (e.g., custom operators like arrayContains)?

Attempts and Issues

1. Unnest with arrayJoin:
javascriptproject_id_element: {
  sql: `arrayJoin(${CUBE}."Project IDs")`,
  type: `number`
}
Issue: Filtering by multiple values duplicates measures (e.g., count) when multiple values match in one row.

2. Direct Array Function:
javascripthas_project_id: {
  sql: `arrayExists(x -> x = {filterValue}, ${CUBE}."Project IDs")`,
  type: `boolean`
}
Issue: Cube appends = ? to filters, breaking arrayExists or hasAny.


3. Using FILTER_PARAMS in Segment:
javascriptsegments: {
  project_filter: { sql: `hasAny(${CUBE}."Project IDs", ${FILTER_PARAMS.project_ids})` }
}
Issue: Still gets wrapped in = ? or is hard to maintain with dynamic values.

4. Format Array as String (Current Solution):
javascriptproject_id_filter: {
  type: `string`,
  sql: `arrayStringConcat(arrayMap(x -> concat('$', toString(x), '$'), ${CUBE}."Project IDs"), ',')`
}
Works: Filters with contains on ["$123$", "$456$"] produce WHERE ... LIKE '%$123$%' OR ... LIKE '%$456$%'.
Concerns: Brittle (client must format $x$), potential performance issues with string ops, edge cases (empty arrays).

**Related Cube.js schema**
```javascript
cube(`violations`, {
  sql_table: `violations`,
  
  data_source: `default`,

  measures: {
    // Assuming some measures like count
    count: {
      type: `count`,
    },
  },

  dimensions: {
    id: {
      sql: `${CUBE}."ID"`,
      type: `string`,
      primary_key: true
    },
    // Note: "Project IDs" is the array column (Array(Int32))
    project_id: {
      type: `number`,
      sql: `${CUBE}."Project IDs"`,  // This doesn't work for filtering as-is
    },
    has_project_id: {
      sql: `arrayExists(x -> x = {filterValue}, ${CUBE}."Project IDs")`,
      type: `boolean`,
    },
    project_id_element: {
      sql: `arrayJoin(${CUBE}."Project IDs")`,
      type: `number`,  // Or string
    },
    project_id_filter: {
      type: `string`,
      sql: `arrayStringConcat(arrayMap(x -> concat('$', toString(x), '$'), ${CUBE}."Project IDs"), ',')`,
    },
  }
});```

**Related Cube.js generated SQL**

```sql
SELECT 
  someOtherDim,
  COUNT(*) AS count
FROM violations
WHERE hasAny("Project IDs", [123, 456])  -- OR equivalent with arrayExists
GROUP BY someOtherDim;
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Best Practices for Dynamic Filtering on Array Columns (e.g., ClickHouse Array(Int32)) #10048

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Best Practices for Dynamic Filtering on Array Columns (e.g., ClickHouse Array(Int32)) #10048

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions