Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MATERIALIZED column as trivial PROJECTION #56588

Open
qoega opened this issue Nov 10, 2023 · 3 comments
Open

MATERIALIZED column as trivial PROJECTION #56588

qoega opened this issue Nov 10, 2023 · 3 comments

Comments

@qoega
Copy link
Member

qoega commented Nov 10, 2023

Use case

Currently we do not use materialised columns to optimise queries automatically.
Projections allow to replace expression to precomputed column in projection.
It makes even more sense when materialized column exists in PK.

If this features and optimisations will be implemented it is possible to add MATERIALIZED columns that will improve user queries without need to change queries itself. Moreover there are several materialized columns and projections ClickHouse can choose best variant.

Example

CREATE TABLE users_m (uid Int16, name String, age Int16, name_len UInt8 MATERIALIZED length(name)) ENGINE=MergeTree ORDER BY (name_len, name);

CREATE TABLE users_p (uid Int16, name String, age Int16,
  PROJECTION p1 
  (
    SELECT 
      uid, name, age, length(name)
    ORDER BY length(name), name
  )
) ENGINE=MergeTree ORDER BY (name);


INSERT INTO users_m VALUES (1231, 'John', 33);
INSERT INTO users_m VALUES (6666, 'Ksenia', 48);
INSERT INTO users_m VALUES (8888, 'Alice', 50);
INSERT INTO users_p VALUES (1231, 'John', 33);
INSERT INTO users_p VALUES (6666, 'Ksenia', 48);
INSERT INTO users_p VALUES (8888, 'Alice', 50);

SELECT 'Here we do full scan';
EXPLAIN indexes=1 SELECT name FROM users_m WHERE length(name) < 5;
SELECT 'Here we use projection and PK';
EXPLAIN indexes=1 SELECT name FROM users_p WHERE length(name) < 5;

Here we do full scan

Expression ((Projection + Before ORDER BY))
  Filter (WHERE)
    ReadFromMergeTree (default.users_m)
    Indexes:
      PrimaryKey
        Condition: true
        Parts: 3/3
        Granules: 3/3

Here we use projection and PK

Expression ((Projection + Before ORDER BY))
  Filter
    ReadFromMergeTree (p1)
    Indexes:
      PrimaryKey
        Keys: 
          length(name)
        Condition: (length(name) in (-Inf, 4])
        Parts: 1/3
        Granules: 1/3

https://fiddle.clickhouse.com/69dfb8f5-5713-4959-8f95-30f3381df81e

Describe the solution you'd like

  1. ClickHouse guarantees that MATERIALIZED column always corresponds to source
  • MATERIALIZED column should be updated during ALTER UPDATE if column that is used in materialized expression is changed.
  • If MATERIALIZED expression is changed in ALTER TABLE query column should be invalidated and recomputed.
  • No manual inserts/alter update of materialized column allowed
  1. ClickHouse can replace same expression in a query to a precomputed materialized column as we do for projection.
  2. Optimisation is applied before index analysis to be able to use PK

Describe alternatives you've considered

  1. Separate type for such materialized column that has more constraints on expression(can't use ephemeral for example)
  2. Simplified PROJECTION case when only new columns are added without different ORDER BY - store this precomputed columns as ordinary columns in the same table.
  3. ASSUME CONSTRAINT / hypothesis: if we can add automatically add hypothesis for existing MATERIALIZED COLUMNS. This one actually works, we just do not add CONSTRAINT for MATERIALIZED expression automatically:
Expression ((Projection + Before ORDER BY))
  Filter (WHERE)
    ReadFromMergeTree (default.users_a)
    Indexes:
      PrimaryKey
        Keys: 
          name_len
        Condition: and((name_len in (-Inf, 4]), (name_len in (-Inf, 4]))
        Parts: 1/3
        Granules: 1/3

https://fiddle.clickhouse.com/30a2e2a1-ed60-4686-b11b-2ecd30445eba

Additional context

I see following problems to solve:

  • ALTER UPDATE can't trigger update materialised column change if this column is used in PK/partition same as
    Updating columns that are used in the calculation of the primary or the partition key is not supported.
  • Currently MATERIALIZED can use EPHEMERAL column in expression. This scenario is valid, but should have same behaviour as now - value is computed on insertion and is not updated even if some other columns in expression are updated.
@qoega qoega added the feature label Nov 10, 2023
@alexey-milovidov
Copy link
Member

We also have a similar mechanism - optimization with CONSTRAINTs.
If we will treat every MATERIALIZED column as a constraint, it should work.

@UnamedRus
Copy link
Contributor

UnamedRus commented Nov 12, 2023

If we will treat every MATERIALIZED column as a constraint, it should work.

#37137

We also have a similar mechanism - optimization with CONSTRAINTs.

It's not really useful, because of limitation, that it only check WHERE conditions.
#33544

@qoega
Copy link
Member Author

qoega commented Nov 13, 2023

We also have a similar mechanism - optimization with CONSTRAINTs.

It is mentioned in alternatives. It works but requires 3 settings to be enabled to work and this feature is even not documented - it may be not production ready. But having feature "materializes_as_constraint" can be a good small step to start with

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants