Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples of using TreeNode APIs to walk and manipulate LogicalPlans #10628

Open
alamb opened this issue May 22, 2024 · 6 comments
Open

Examples of using TreeNode APIs to walk and manipulate LogicalPlans #10628

alamb opened this issue May 22, 2024 · 6 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

Comments

@alamb
Copy link
Contributor

alamb commented May 22, 2024

Is your feature request related to a problem or challenge?

The usecase of walking LogicalPlan to either analyze the plan or rewrite it has come up multiple times recently

Specifically I think both @goldmedal and @cisaacson have asked about this recently and so I think the documentation is not clear enough

Describe the solution you'd like

In order to help people do this, I would like to add some examples of how to walk the tree

I suggest first adding simple doc example in the docs of LogicalPlan itself

The example would do somethig like create a plan using LogicalPlanBuilder

  Project
    Filter
     Scan

And then show how to:

  1. Use TreeNode::apply to recursively find something about the scan (maybe the table name?)
  2. Use the TreeNode::rewrite to recursively rewrite and remove the Filter (or something)

I think those would be relatively short examples to write and would illustrate the key apis

Describe alternatives you've considered

Bonus points for a more full featured example (perhaps linked from the LogicalPlan docs) of how use / walk / manipulate LogicalPlans

Similarly to https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/expr_api.rs, perahaps datafusion-examples/examples/plan_walk.rs or something

Additional context

No response

@alamb alamb added enhancement New feature or request documentation Improvements or additions to documentation help wanted Extra attention is needed labels May 22, 2024
@cisaacson
Copy link
Contributor

@alamb This is a great idea. As I learn more perhaps I can help.

@alamb
Copy link
Contributor Author

alamb commented May 23, 2024

Thank you @cisaacson

@alamb
Copy link
Contributor Author

alamb commented May 24, 2024

Moving out of slack into Github so it might be more easily found

If your usecase is to to get the list of filters and tables that appear in a query, one way to do this is:

  • Get the list of tables by finding all LogicalPlan::TableScan (any Join will have an input from a TableScan)
  • Get the list of filters/predicates, look in LogicalPlan::Filter and potentially the filters on the LogicalPlan::TableScan

So I think it could look something like (untested)

let mut referenced_tables = HashSet::new();
let mut filters = vec![];

// recursively visit all nodes in the tree (including subqueries)
logical_plan.apply_with_subqueries(|plan| {
  match plan {
   // record table names
    LogicalPlan::TableScan(table_scan) => { 
      referenced_tables.insert(table_scan.table_name);
   },
    // record filters
    LogicalPlan::Filter(filter) => {
      filters.push(filter.predicate);
     }
    // ignore other nodes
    _ => {}
     
      Ok(TreeNodeRecursion::Continue)
  }
 })?;

@cisaacson
Copy link
Contributor

This looks very good, pretty much what I implemented. The only question I have remaining is:

If TableScan has filters why would that not catch all filters? What does LogicalPlan::Filter catch that the TableScan.filters would not? For my specific use case I only need filters on the tables that are present in the query.

@alamb
Copy link
Contributor Author

alamb commented May 24, 2024

If TableScan has filters why would that not catch all filters?

Depending on the value of https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.TableProviderFilterPushDown.html sometimes filters would not be pushed down into the provider

Also filters can be duplicated in the table scan and filter (again depending on the value of the FilterPushDown

@cisaacson
Copy link
Contributor

Got it, that makes total sense. I will only care about the accepted pushdown filters, so the TableScan will work for what we need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants