Skip to content

Conversation

@yahoNanJing
Copy link
Contributor

@yahoNanJing yahoNanJing commented Mar 13, 2023

Which issue does this PR close?

Closes #5566.

Rationale for this change

Currently the TreeNodeRewriter is as a visitor to transform a node to another. However, sometimes we don't need to do the transformation and what we want is only to collect some info from the node. To achieve this, it's better to introduce another visitor for collecting info and keep the node unchanged.

What changes are included in this PR?

  • Introduced a common tree node visitor trait
  • Implemented the visitable trait for ExecutionPlan
  • Introduced a utility function to get all of the PartitionedFile for an ExecutionPlan based on the visitor mentioned above.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Mar 13, 2023
@yahoNanJing
Copy link
Contributor Author

Hi @mingmwang, @alamb, could you help review this PR?

@mingmwang
Copy link
Contributor

I think you can move the TreeNodeRewriter to the new added tree_node.rs also.

@alamb
Copy link
Contributor

alamb commented Mar 13, 2023

@mustafasrepo do you have time to review this PR?

@yahoNanJing
Copy link
Contributor Author

I think you can move the TreeNodeRewriter to the new added tree_node.rs also.

I'll add another commit to this PR for this.

@mustafasrepo
Copy link
Contributor

@mustafasrepo do you have time to review this PR?

Sure. I will look at it.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Mar 14, 2023
@yahoNanJing
Copy link
Contributor Author

Unfortunately, due to the Orphan rule, I fail to combine the trait TreeNodeRewriter for both of the ExecutionPan and PhysicalExpr

use datafusion_common::Result;
use std::sync::Arc;

impl TreeNodeRewritable for Arc<dyn ExecutionPlan> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for implementing TreeNodeRewritable trait in another file. I guess, you could have implement this trait in the file where ExecutionPlan is implemented. I am not familiar with these pattern. I am asking to understand better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Orphan rule is relaxed in the future, the same trait of TreeNodeRewritable can be used for the PhysicalExpr. The traits defined in the tree_node file are common abstractions. They are not just for ExecutionPlan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why TreeNodeRewritable trait is defined in datafusion/core/src/physical_plan/tree_node/mod.rs. What I mean was that Arc<dyn ExecutionPlan> implements TreeNodeRewritable trait. I guess, we could do this implementation in datafusion/core/src/physical_plan/mod.rs. However, it is just a stylistic issue. I just wondered the reason why you did it in datafusion/core/src/physical_plan/tree_node/rewritable.rs

use crate::physical_plan::ExecutionPlan;
use std::sync::Arc;

impl TreeNodeVisitable for Arc<dyn ExecutionPlan> {
Copy link
Contributor

@mustafasrepo mustafasrepo Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above case I guess TreeNodeVisitable could have been implemented where ExecutionPlan is defined. Is there any reason for this specific pattern.

@mustafasrepo
Copy link
Contributor

LGTM!. Thanks @yahoNanJing.

@yahoNanJing
Copy link
Contributor Author

Thanks @mustafasrepo. I'll merge this PR.

@yahoNanJing yahoNanJing merged commit 6bfede0 into apache:main Mar 15, 2023
@mingmwang
Copy link
Contributor

@yahoNanJing
I think if we just to collect/search some information from a tree like structure, we do not need to define a Visitor Trait.
A visitor in this case is actually just a map function (closure) we want to apply to a tree structure.
For example, if we want to traverse a Vec/List container and collect/search some information, we do not need to define a Visitor.

@yahoNanJing
Copy link
Contributor Author

Thanks @mingmwang for your suggestion. I'll raise another PR to replace the Visitor by closure which will be much easier to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a utility function to collect all of the PartitionedFile which is needed for an ExecutionPlan

4 participants