Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Walker to Simplify Dataflow DAG Traversal #1206

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented May 11, 2024

Description

As per title.

@cla-bot cla-bot bot added the cla:yes label May 11, 2024
@plypaul plypaul marked this pull request as ready for review May 11, 2024 01:20
logger = logging.getLogger(__name__)


class DataflowDagWalker(DataflowPlanNodeVisitor, Generic[VisitorOutputT], ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. In my opinion, the need to write another layer of abstraction to make it easier to write logic against a design pattern abstraction is a fireworks display level signal that maybe the pattern - or at least the existing implementation of it - was not the right choice.

As with the generic visitor itself, this makes it a little easier to implement arbitrary things, and ESPECIALLY arbitrary things that turn out to be glorified property accessors. In exchange, it makes it harder to both read the code and understand what's happening in the runtime, because you have to use your human brain to remember that anything hitting a DagWalker means huge chunks of the stack trace can be ignored, except of course when they can't, and good luck distinguishing those scenarios.

An alternative approach to doubling and tripling down on these generic interfaces to make it marginally easier to do things like count the nodes in the graph by type is to parcel out the operations into concrete needs and then develop strongly typed interfaces around them. As an example of what we currently need:

Node level -

  1. Optimization classes. Then nodes can be typed according to whether or not they can reasonably be optimized. At the moment the DataflowPlan only has 3 optimizable nodes, so this can refine things in a type-safe way that doesn't require us to add these nested layers of boilerplate.
  2. Conversion classes (well, class, really, because as a practical matter we only actually care about dataflow -> sql). Every node needs an implementation, so every node gets one.
  3. Recursive node properties - these can just be property methods, no visitors required, and we should move to that model regardless of what happens with the DataflowDagWalker

Plan level -

  1. Node classification by type, which can be done via a single walk in the initializer for the DataflowPlan
  2. Plan level properties, which can be boostrapped by initializer-level node classification and recursive node properties

What are the future operations we're expecting to implement here? How many of these are not better resolved via updates to our type specifications? How many of them truly require heterogenous operations coded to node subtypes vs implementing a "default handler" that applies to 90% of the object types in the hierarchy?

Copy link
Contributor Author

@plypaul plypaul May 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your points are well noted and they are similar to the ones raised in the visitor discussion. This is intended more as an incremental improvement in the current structure. One potential change that I'm down to do in the short term is replace the visitor with the walker if you have the concern about an additional class.

What are the future operations we're expecting to implement here? ...

I was going for the incremental improvement, so these will require an opportunity to think / brainstorm more about upcoming work.

Copy link
Contributor

@tlento tlento May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for the incremental improvement, so these will require an opportunity to think / brainstorm more about upcoming work.

Fair enough! I have a specific set of ideas and I'll be setting to work on making some changes once I'm out of the weeds. I also have a concept for some type structure improvements, would be great to chat about those as well.

This won't matter to me much one way or the other, and if it eases the boilerplate of writing traversals that seems fine.

I will say, I think a DfsWalker that does not do a walk is a baffling construct. I just looked at the link you sent me on top of this to the execution plan update and went through this set of stages of confusion:

Step 1 - it doesn't walk the DAG
Step 2 - wait, maybe it does, let me look at the original cllass
Step 3 - oh wow, it DOES walk the DAG, that's a bug, better tell Paul
Step 4 - no, wait, what's this default recursion thing?
Step 5 - oh, let me check the implementing class
Step 6 - oh ok I had it right the first time

If you like this simplification we can proceed and I'll review this tomorrow. Please do consider taking out the "disable recursion" flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like this simplification we can proceed and I'll review this tomorrow. Please do consider taking out the "disable recursion" flag.

For that class, I was using walk in a very general sense. Maybe traverse is a better term. For traversal, you could specify traversal behavior. Let me see what that looks like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a maximum traversal depth parameter? None means full, etc.? Seems more complicated though.

Base automatically changed from p--smr--04 to main May 16, 2024 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants