Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constraints for optimization #18787

Merged
merged 65 commits into from
Nov 22, 2021

Conversation

nikvas0
Copy link
Contributor

@nikvas0 nikvas0 commented Jan 6, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Added CONSTRAINT ... ASSUME ... (without checking during INSERT)
  • Added query transformation to CNF (Optimize boolean conditions using CNF / DNF #11749) for more convenient optimization
  • Added simple query rewriting using constraints (only simple matching now, will be improved to support <,=,>... later)
  • Added ability to replace heavy columns with light
  • Added ability to use the index in queries

Detailed description / Documentation draft:

TODO:

  • documentation
  • more tests for CNF
  • improve constraint matching (now it is a simple all-to-all matching)
  • refactoring
  • settings

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jan 6, 2021
@nikvas0 nikvas0 marked this pull request as ready for review May 12, 2021 20:46
@nikvas0 nikvas0 changed the title [WIP] Constraints for optimization Constraints for optimization May 16, 2021
@UnamedRus
Copy link
Contributor

EXPLAIN SYNTAX
SELECT count()
FROM
(
    SELECT
        number AS key_a
    FROM numbers(20)
) AS a
WHERE ((key_a >= 1) AND (key_a <= 5)) OR ((key_a >= 5) AND (key_a <= 10)) OR ((key_a >= 10) AND (key_a <= 15))

┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ SELECT count()                                                                                                         │
│ FROM                                                                                                                   │
│ (                                                                                                                      │
│     SELECT number AS key_a                                                                                             │
│     FROM numbers(20)                                                                                                   │
│     WHERE (((((key_a >= 10) AND (key_a >= 5)) AND (key_a <= 5)) AND (key_a <= 10)) AND (key_a >= 1)) AND (key_a <= 15) │
│ ) AS a                                                                                                                 │
│ WHERE (key_a <= 15) AND (key_a >= 1) AND (key_a <= 10) AND (key_a <= 5) AND (key_a >= 5) AND (key_a >= 10)             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

 set convert_query_to_cnf=0;
 
 EXPLAIN SYNTAX
SELECT count()
FROM
(
    SELECT number AS key_a
    FROM numbers(20)
) AS a
WHERE ((key_a >= 1) AND (key_a <= 5)) OR ((key_a >= 5) AND (key_a <= 10)) OR ((key_a >= 10) AND (key_a <= 15))

Query id: d2d4275f-07b2-412b-9eca-bc40e0b5d180

┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ SELECT count()                                                                                                     │
│ FROM                                                                                                               │
│ (                                                                                                                  │
│     SELECT number AS key_a                                                                                         │
│     FROM numbers(20)                                                                                               │
│     WHERE ((key_a >= 1) AND (key_a <= 5)) OR ((key_a >= 5) AND (key_a <= 10)) OR ((key_a >= 10) AND (key_a <= 15)) │
│ ) AS a                                                                                                             │
│ WHERE ((key_a >= 1) AND (key_a <= 5)) OR ((key_a >= 5) AND (key_a <= 10)) OR ((key_a >= 10) AND (key_a <= 15))     │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

But it looks like it working correctly if we have only 2 groups:

 set convert_query_to_cnf=1;
 
 EXPLAIN SYNTAX
SELECT count()
FROM
(
    SELECT number AS key_a
    FROM numbers(20)
) AS a
WHERE ((key_a >= 1) AND (key_a <= 5)) OR ((key_a >= 5) AND (key_a <= 10))

Query id: 966cb5fb-58a5-4bd2-b8aa-e39c8c7fc9e4

┌─explain─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ SELECT count()                                                                                                                                          │
│ FROM                                                                                                                                                    │
│ (                                                                                                                                                       │
│     SELECT number AS key_a                                                                                                                              │
│     FROM numbers(20)                                                                                                                                    │
│     WHERE ((((key_a <= 5) OR (key_a >= 5)) AND ((key_a <= 10) OR (key_a <= 5))) AND ((key_a >= 1) OR (key_a >= 5))) AND ((key_a >= 1) OR (key_a <= 10)) │
│ ) AS a                                                                                                                                                  │
│ WHERE ((key_a >= 1) OR (key_a <= 10)) AND ((key_a >= 1) OR (key_a >= 5)) AND ((key_a <= 10) OR (key_a <= 5)) AND ((key_a <= 5) OR (key_a >= 5))         │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

@nikvas0
Copy link
Contributor Author

nikvas0 commented May 19, 2021

But it looks like it working correctly if we have only 2 groups:

Fixed

@CurtizJ CurtizJ self-assigned this May 25, 2021
src/Interpreters/InterpreterCreateQuery.cpp Show resolved Hide resolved
src/Storages/ConstraintsDescription.cpp Show resolved Hide resolved
src/Interpreters/ComparisonGraph.cpp Show resolved Hide resolved
src/Interpreters/ComparisonGraph.h Show resolved Hide resolved
src/Interpreters/ComparisonGraph.h Show resolved Hide resolved
@UnamedRus
Copy link
Contributor

It's possible that #10685 going to be covered by this pr?

@CLAassistant
Copy link

CLAassistant commented Sep 28, 2021

CLA assistant check
All committers have signed the CLA.

@CurtizJ
Copy link
Member

CurtizJ commented Nov 18, 2021

@nikvas0 Please, can you sign the CLA? So, I'll be able to merge this PR.

CurtizJ added a commit that referenced this pull request Nov 22, 2021
@CurtizJ CurtizJ merged commit 833652e into ClickHouse:master Nov 22, 2021
@sevirov
Copy link
Contributor

sevirov commented Nov 22, 2021

Internal documentation ticket: DOCSUP-18735

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants