Skip to content

Enable skip index usage through monotonic function wrappers (assumeNotNull, coalesce, ifNull) #94689

@toddpayi

Description

@toddpayi

Company or project name

Pay-i - AI cost management and observability platform using ClickHouse for real-time analytics

Use case

We use CoalescingMergeTree (CMT) for CDC ingestion where data arrives incrementally from multiple source tables. The CMT pattern requires Nullable columns to handle the merge window where partial data exists.

Our architecture exposes consumption VIEWs that wrap CMT columns with assumeNotNull() to provide a clean, non-nullable interface to downstream queries. This is a standard pattern recommended for CMT usage.

The problem: Skip indexes defined on the underlying CMT table columns are not used when queries filter through the VIEW, because the predicate WHERE ts >= X doesn't match the transformed column assumeNotNull(ts).

This forces a full granule scan on tables with hundreds of millions of rows, turning sub-second queries into multi-second queries.

Describe the solution you'd like

The optimizer should recognize that assumeNotNull(), coalesce(), and ifNull() are monotonic functions that preserve ordering. For range predicates, the skip index condition can be safely pushed through:

  • WHERE assumeNotNull(col) >= X → evaluate minmax index on underlying col
  • WHERE coalesce(col, constant) >= X → same, when default is constant

Reproduction:

CREATE TABLE test_data (
    id UInt64,
    ts DateTime64(3)
) ENGINE = MergeTree()
ORDER BY id
INDEX idx_ts ts TYPE minmax GRANULARITY 1;

INSERT INTO test_data SELECT number, now64() - number FROM numbers(10000000);

CREATE VIEW test_view AS
SELECT id, assumeNotNull(ts) AS ts FROM test_data;

-- Skip index IS used
SELECT count() FROM test_data WHERE ts >= '2025-01-01';

-- Skip index NOT used (full granule scan)
SELECT count() FROM test_view WHERE ts >= '2025-01-01';

Describe alternatives you've considered

  1. Query the underlying table directly - Exposes internal CMT schema to application layer, breaks abstraction
  2. Remove assumeNotNull from VIEW - Works when querying with FINAL (CMT guarantees coalesced data), but the VIEW column type becomes Nullable, pushing NULL handling to all consumers
  3. Create skip index on assumeNotNull(col) - Not supported; skip indexes require direct column references
  4. Handle NULLs in application code - Every query consumer must handle Nullable types even though values are never NULL in practice with FINAL

None of these are optimal for production use.

Additional context

This likely affects anyone using CoalescingMergeTree (CMT) with a consumption VIEW layer

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions