Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop unused columns after join on/using #44545

Merged
merged 6 commits into from
Jan 12, 2023
Merged

Conversation

vdimir
Copy link
Member

@vdimir vdimir commented Dec 23, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@vdimir vdimir changed the title Set pipeline type in join step description [wip] drop unused columns after join on/using Dec 23, 2022
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 23, 2022
@vdimir
Copy link
Member Author

vdimir commented Dec 27, 2022

Difference in the query plan (running added test on current version):

Details
02514_analyzer_drop_join_on:                                            [ FAIL ] - result differs with reference: 
--- /home/ubuntu/ClickHouseA/tests/queries/0_stateless/02514_analyzer_drop_join_on.reference    2022-12-27 11:16:11.639907351 +0000
+++ /home/ubuntu/ClickHouseA/tests/queries/0_stateless/02514_analyzer_drop_join_on.stdout       2022-12-27 11:16:36.235829022 +0000
@@ -7,17 +7,23 @@
     Header: default.a.a2_4 String
       Join (JOIN FillRightFirst)
       Header: default.a.a2_4 String
+              default.a.a1_1 UInt64
+              default.b.b1_0 UInt64
               default.c.c1_2 UInt64
               default.d.d1_3 UInt64
         Expression ((JOIN actions + DROP unused columns after JOIN))
         Header: default.a.a2_4 String
+                default.a.a1_1 UInt64
+                default.b.b1_0 UInt64
                 default.c.c1_2 UInt64
           Join (JOIN FillRightFirst)
           Header: default.a.a2_4 String
+                  default.a.a1_1 UInt64
                   default.b.b1_0 UInt64
                   default.c.c1_2 UInt64
             Expression ((JOIN actions + DROP unused columns after JOIN))
             Header: default.a.a2_4 String
+                    default.a.a1_1 UInt64
                     default.b.b1_0 UInt64
               Join (JOIN FillRightFirst)
               Header: default.a.a2_4 String
@@ -47,18 +53,24 @@
   Join (JOIN FillRightFirst)
   Header: default.a.k_2 UInt64
           default.a.a2_0 String
+          default.b.k_3 UInt64
+          default.c.k_4 UInt64
           default.d.d2_1 String
           default.d.k_5 UInt64
     Expression (DROP unused columns after JOIN)
     Header: default.a.k_2 UInt64
             default.a.a2_0 String
+            default.b.k_3 UInt64
+            default.c.k_4 UInt64
       Join (JOIN FillRightFirst)
       Header: default.a.k_2 UInt64
               default.a.a2_0 String
+              default.b.k_3 UInt64
               default.c.k_4 UInt64
         Expression (DROP unused columns after JOIN)
         Header: default.a.k_2 UInt64
                 default.a.a2_0 String
+                default.b.k_3 UInt64
           Join (JOIN FillRightFirst)
           Header: default.a.k_2 UInt64
                   default.a.a2_0 String
@@ -93,23 +105,29 @@
             b.bx_0 String
       Join (JOIN FillRightFirst)
       Header: default.a.a2_6 String
+              default.a.a1_2 UInt64
               b.bx_0 String
+              b.b1_1 UInt64
               default.c.c2_5 String
               default.c.c1_3 UInt64
               d.d1_4 UInt64
         Filter (( + (JOIN actions + DROP unused columns after JOIN)))
         Header: default.a.a2_6 String
+                default.a.a1_2 UInt64
                 b.bx_0 String
+                b.b1_1 UInt64
                 default.c.c2_5 String
                 default.c.c1_3 UInt64
           Join (JOIN FillRightFirst)
           Header: default.a.a2_6 String
+                  default.a.a1_2 UInt64
                   b.bx_0 String
                   b.b1_1 UInt64
                   default.c.c2_5 String
                   default.c.c1_3 UInt64
             Expression ((JOIN actions + DROP unused columns after JOIN))
             Header: default.a.a2_6 String
+                    default.a.a1_2 UInt64
                     b.bx_0 String
                     b.b1_1 UInt64
               Join (JOIN FillRightFirst)

@vdimir vdimir changed the title [wip] drop unused columns after join on/using Drop unused columns after join on/using Dec 27, 2022
@vdimir vdimir marked this pull request as ready for review December 27, 2022 11:18
@kitaisreal kitaisreal self-assigned this Jan 2, 2023
Copy link
Collaborator

@kitaisreal kitaisreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Need to fix minor details and merge.

namespace DB
{

using ColumnIdentifierSet = std::unordered_set<ColumnIdentifier>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to move this into TableExpressionData.

if (node->getNodeType() != QueryTreeNodeType::COLUMN)
return;

const auto * column_ident = planner_context->getColumnNodeIdentifierOrNull(node);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to rename into column_identifier.

static bool needChildVisit(VisitQueryTreeNodeType &, VisitQueryTreeNodeType & child)
{
const auto & node_type = child->getNodeType();
return node_type != QueryTreeNodeType::TABLE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to remove multiple two spaces after != .


using ColumnIdentifierSet = std::unordered_set<ColumnIdentifier>;

/// Collect all top level column identifiers from query tree node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to use /** */ style for multiline comments.

@@ -0,0 +1,22 @@
#pragma once
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rename this file into CollectColumnIdentifiers.h from CollectColumnIndetifiers.h.


using ColumnIdentifierSet = std::unordered_set<ColumnIdentifier>;

/// Collect all top level column identifiers from query tree node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add in comment precondition that table expression data must be already collected in planner context.

@@ -11,10 +11,13 @@
namespace DB
{

using ColumnIdentifierSet = std::unordered_set<ColumnIdentifier>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to move this into TableExpressionData and reuse in CollectColumnIdentifiers.


SET allow_experimental_analyzer = 1;

EXPLAIN PLAN header = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to enable echoOn so test will be easier to verify.

@vdimir vdimir force-pushed the vdimir/analyzer_drop_join_on branch from 76be3b3 to 1b6e036 Compare January 11, 2023 11:54
@kitaisreal kitaisreal merged commit 30de77d into master Jan 12, 2023
@kitaisreal kitaisreal deleted the vdimir/analyzer_drop_join_on branch January 12, 2023 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-not-for-changelog This PR should not be mentioned in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants