Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make source_semantic_models property accessible from a DataflowPlanNode #1218

Merged
merged 5 commits into from
May 24, 2024

Conversation

tlento
Copy link
Contributor

@tlento tlento commented May 16, 2024

This change ultimately adds the source_semantic_models property to
the DataflowPlan and adds a hook for enabling access to it from any
arbitrary DataflowPlanNode.

We currently have two use-cases for this, one in the cloud codebase
that needs the semantic model inputs for a dataflow plan, and the
upcoming predicate pushdown evaluation which needs the semantic model
inputs for a given DataflowPlanNode.

An earlier version of this change added the property directly to the
DataflowPlanNode, which would satisfy both use cases above. The issue
with having this property assigned directly to a DataflowPlanNode
is that the property might be considered both a node-level and graph-level
attribute, so it's not clear where to put the accessor.

The solution we came up with for this was to allow access to a DataflowPlan
DAG object built from the node, which would effectively encapsulate the
subgraph represented by the node and its ancestors. Then we can access
these subgraph properties through the DataflowPlan while making it clear
to the caller that what they are asking for is a subgraph-level, rather
than a node-level attribute.

Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!! 🚀

inside of each node, we make those properties of the DataflowPlan, and this node-level converter makes
such properties easily accessible.
"""
return DataflowPlan(sink_nodes=(self,))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So simple 🤯

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right? It was borderline impossible before @plypaul merged #1205

@@ -188,3 +208,24 @@ def __init__(self, sink_nodes: Sequence[DataflowPlanNode], plan_id: Optional[Dag
@property
def sink_node(self) -> DataflowPlanNode: # noqa: D102
return self._sink_nodes[0]

def __complete_subgraph(self, node: DataflowPlanNode) -> Sequence[DataflowPlanNode]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a staticmethod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that.

Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 🚢 🚢

@tlento tlento force-pushed the use-pushdown-params-for-disabling-time-constraint-pushdown branch from 31ee2df to 3a02ea7 Compare May 23, 2024 23:41
@tlento tlento force-pushed the make-source-semantic-models-available-from-nodes branch from 9ef3244 to 45cf01f Compare May 23, 2024 23:41
Copy link
Contributor Author

tlento commented May 24, 2024

Merge activity

  • May 23, 5:33 PM PDT: @tlento started a stack merge that includes this pull request via Graphite.
  • May 23, 5:47 PM PDT: Graphite rebased this pull request as part of a merge.
  • May 23, 5:52 PM PDT: @tlento merged this pull request with Graphite.

@tlento tlento force-pushed the use-pushdown-params-for-disabling-time-constraint-pushdown branch from 3a02ea7 to 9769994 Compare May 24, 2024 00:40
Base automatically changed from use-pushdown-params-for-disabling-time-constraint-pushdown to main May 24, 2024 00:46
This change ultimately adds the source_semantic_models property to
the DataflowPlan and adds a hook for enabling access to it from any
arbitrary DataflowPlanNode.

We currently have two use-cases for this, one in the cloud codebase
that needs the semantic model inputs for a dataflow plan, and the
upcoming predicate pushdown evaluation which needs the semantic model
inputs for a given DataflowPlanNode.

An earlier version of this change added the property directly to the
DataflowPlanNode, which would satisfy both use cases above. The issue
with having this property assigned directly to a DataflowPlanNode
is that the property might be considered both a node-level and graph-level
attribute, so it's not clear where to put the accessor.

The solution we came up with for this was to allow access to a DataflowPlan
DAG object built from the node, which would effectively encapsulate the
subgraph represented by the node and its ancestors. Then we can access
these subgraph properties through the DataflowPlan while making it clear
to the caller that what they are asking for is a subgraph-level, rather
than a node-level attribute.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants