Rewrite countDistinctIf with count_distinct_implementation configuration. #42008

Closed
wants to merge 5 commits into from
3 changes: 2 additions & 1 deletion src/Core/Settings.h
@@ -25,7 +25,7 @@ class IColumn;

/** List of settings: type, name, default value, description, flags
*
- * This looks rather unconvenient. It is done that way to avoid repeating settings in different places.
+ * This looks rather inconvenient. It is done that way to avoid repeating settings in different places.
* Note: as an alternative, we could implement settings to be completely dynamic in form of map: String -> Field,
* but we are not going to do it, because settings is used everywhere as static struct fields.
*
@@ -488,6 +488,7 @@ class IColumn;
M(Bool, optimize_move_functions_out_of_any, false, "Move functions out of aggregate functions 'any', 'anyLast'.", 0) \
M(Bool, optimize_normalize_count_variants, true, "Rewrite aggregate functions that semantically equals to count() as count().", 0) \
M(Bool, optimize_injective_functions_inside_uniq, true, "Delete injective functions of one argument inside uniq*() functions.", 0) \
+ M(Bool, optimize_rewrite_count_distinct_if, false, "Rewrite countDistinctIf with count_distinct_implementation configuration", 0) \
M(Bool, convert_query_to_cnf, false, "Convert SELECT query to CNF", 0) \
M(Bool, optimize_or_like_chain, false, "Optimize multiple OR LIKE into multiMatchAny. This optimization should not be enabled by default, because it defies index analysis in some cases.", 0) \
M(Bool, optimize_arithmetic_operations_in_aggregate_functions, true, "Move arithmetic operations out of aggregation functions", 0) \
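
The header comment in this file describes keeping every setting in a single macro list so that it is not repeated in different places. A minimal sketch of that pattern (commonly called an X-macro) is shown here. `DEMO_SETTINGS`, `DemoSettings`, `DemoSettingInfo`, and `describeDemoSettings` are hypothetical names invented for illustration, and the two expansions are assumptions about how such a list can be consumed, not the actual contents of Settings.h.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of the macro list: type, name, default value, description, flags.
// The two entries mirror lines visible in the hunk above.
#define DEMO_SETTINGS(M) \
    M(bool, optimize_normalize_count_variants, true, "Rewrite aggregate functions that semantically equals to count() as count().", 0) \
    M(bool, optimize_rewrite_count_distinct_if, false, "Rewrite countDistinctIf with count_distinct_implementation configuration", 0)

struct DemoSettings
{
    // Expansion 1: declare one struct field per list entry, initialized to its default.
#define DECLARE_FIELD(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) TYPE NAME = DEFAULT;
    DEMO_SETTINGS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

struct DemoSettingInfo
{
    std::string name;
    std::string description;
};

// Expansion 2: build name/description metadata from the same list,
// so adding a setting requires touching only DEMO_SETTINGS.
std::vector<DemoSettingInfo> describeDemoSettings()
{
    std::vector<DemoSettingInfo> result;
#define ADD_INFO(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) result.push_back({#NAME, DESCRIPTION});
    DEMO_SETTINGS(ADD_INFO)
#undef ADD_INFO
    return result;
}
```

In this sketch the new setting becomes both a typed field and a described entry from one added line, which is the property the header comment is pointing at.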
3 changes: 2 additions & 1 deletion src/Core/SettingsChangesHistory.h
@@ -85,7 +85,8 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"input_format_csv_detect_header", false, true, "Detect header in CSV format by default"},
{"input_format_tsv_detect_header", false, true, "Detect header in TSV format by default"},
{"input_format_custom_detect_header", false, true, "Detect header in CustomSeparated format by default"},
- {"query_plan_remove_redundant_sorting", false, true, "Remove redundant sorting in query plan. For example, sorting steps related to ORDER BY clauses in subqueries"}}},
+ {"query_plan_remove_redundant_sorting", false, true, "Remove redundant sorting in query plan. For example, sorting steps related to ORDER BY clauses in subqueries"},
+ {"optimize_rewrite_count_distinct_if", false, true, "Rewrite countDistinctIf with count_distinct_implementation configuration"}}},
{"22.12", {{"max_size_to_preallocate_for_aggregation", 10'000'000, 100'000'000, "This optimizes performance"},
{"query_plan_aggregation_in_order", 0, 1, "Enable some refactoring around query plan"},
{"format_binary_max_string_size", 0, 1_GiB, "Prevent allocating large amount of memory"}}},
9 changes: 9 additions & 0 deletions src/Interpreters/TreeRewriter.cpp
@@ -109,6 +109,9 @@ using CustomizeCountDistinctVisitor = InDepthNodeVisitor<OneTypeMatcher<Customiz
char countifdistinct[] = "countifdistinct";
using CustomizeCountIfDistinctVisitor = InDepthNodeVisitor<OneTypeMatcher<CustomizeFunctionsData<countifdistinct>>, true>;

+ char countdistinctif[] = "countdistinctif";
+ using CustomizeCountDistinctIfVisitor = InDepthNodeVisitor<OneTypeMatcher<CustomizeFunctionsData<countdistinctif>>, true>;

char in[] = "in";
using CustomizeInVisitor = InDepthNodeVisitor<OneTypeMatcher<CustomizeFunctionsData<in>>, true>;

@@ -1410,6 +1413,12 @@ void TreeRewriter::normalize(
CustomizeIfDistinctVisitor::Data data_distinct_if{"DistinctIf"};
CustomizeIfDistinctVisitor(data_distinct_if).visit(query);

+ if (settings.optimize_rewrite_count_distinct_if)
+ {
+     CustomizeCountDistinctIfVisitor::Data data_count_distinct_if{settings.count_distinct_implementation.toString() + "If"};
+     CustomizeCountDistinctIfVisitor(data_count_distinct_if).visit(query);
+ }

ExistsExpressionVisitor::Data exists;
ExistsExpressionVisitor(exists).visit(query);
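
The effect of the added block on a single function name can be sketched in isolation. This is a simplified stand-in, not the visitor itself: `toLowerAscii` and `rewriteCountDistinctIf` are hypothetical helpers, and the case-insensitive comparison is an assumption inferred from the lowercase `countdistinctif` literal in the diff.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercase an ASCII function name so it can be compared with the
// lowercase "countdistinctif" literal (case-insensitivity is assumed here).
std::string toLowerAscii(std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical stand-in for the visitor: returns the name a function node
// would carry after the rewrite, given the two settings involved.
std::string rewriteCountDistinctIf(const std::string & function_name,
                                   const std::string & count_distinct_implementation,
                                   bool optimize_rewrite_count_distinct_if)
{
    if (optimize_rewrite_count_distinct_if && toLowerAscii(function_name) == "countdistinctif")
        return count_distinct_implementation + "If";  // e.g. "uniqExact" + "If"
    return function_name;  // setting disabled or a different function: leave unchanged
}
```

With `count_distinct_implementation` set to `uniqExact`, the sketch maps `countDistinctIf` to `uniqExactIf`, which matches the second query in the reference output of the test added by this PR.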

@@ -0,0 +1,6 @@
2
SELECT countDistinctIf(number % 10, (number % 5) = 2)
FROM numbers_mt(1000000)
2
SELECT uniqExactIf(number % 10, (number % 5) = 2)
FROM numbers_mt(1000000)
@@ -0,0 +1,8 @@
SET optimize_rewrite_count_distinct_if = FALSE;
SELECT countDistinctIf(number % 10, number % 5 = 2) FROM numbers_mt(1000000);
EXPLAIN SYNTAX SELECT countDistinctIf(number % 10, number % 5 = 2) FROM numbers_mt(1000000);

SET optimize_rewrite_count_distinct_if = TRUE;
SELECT countDistinctIf(number % 10, number % 5 = 2) FROM numbers_mt(1000000);
EXPLAIN SYNTAX SELECT countDistinctIf(number % 10, number % 5 = 2) FROM numbers_mt(1000000);
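
The value `2` in the reference output can be checked by hand with a brute-force equivalent of the test query. This sketch assumes `numbers_mt(1000000)` yields the integers 0 through 999999; `countDistinctIfReference` is a hypothetical helper, not part of ClickHouse.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Brute-force equivalent of
//   SELECT countDistinctIf(number % 10, number % 5 = 2) FROM numbers_mt(limit)
// assuming the table function produces the integers in [0, limit).
std::size_t countDistinctIfReference(std::uint64_t limit)
{
    std::set<std::uint64_t> distinct_values;
    for (std::uint64_t number = 0; number < limit; ++number)
        if (number % 5 == 2)                      // the If condition
            distinct_values.insert(number % 10);  // the counted expression
    return distinct_values.size();
}
```

Numbers with `number % 5 = 2` end in either 2 or 7, so `number % 10` takes exactly two distinct values. The result is the same whether or not the query is rewritten to `uniqExactIf`.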