Enable DataflowPlanOptimizers for query rendering tests #1263

tlento · 2024-06-12T02:16:47Z

The metricflow query rendering tests do snapshot generation and comparison
for standard rendering and optimized rendering. However, these plans only
run the SqlQueryPlanOptimizers - they do not use the DataflowPlanOptimizers.

This means our optimized plans were only partially optimized. With
predicate pushdown being introduced, it would be helpful to see the complete
effect of optimization on the SQL rendered from each query plan.

This change makes that possible by including DataflowPlanOptimizers in
the comparison helper function. For the time being, and to minimize
thrash in snapshot plans, we only include the no-op PredicatePushdownOptimizer.
This will allow us to track the impact of enabling predicate pushdown via
that optimizer through query plan snapshot changes.
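
As a rough illustration of the shape of this change, the sketch below shows a comparison helper that applies dataflow-level optimizers before the SQL-level ones. Every name in it (`DataflowOptimization`, `render_plan`, `render_standard_and_optimized`, the tuple-of-strings plan) is a hypothetical stand-in, not the actual MetricFlow helper or its signature.

```python
from enum import Enum
from typing import Callable, Dict, Sequence, Tuple

# Toy plan representation (assumption): an ordered tuple of step names.
Plan = Tuple[str, ...]


class DataflowOptimization(Enum):
    """Hypothetical flag naming a dataflow-level optimizer."""

    PREDICATE_PUSHDOWN = "predicate_pushdown"


def noop_predicate_pushdown(plan: Plan) -> Plan:
    """Tracking-only stand-in: returns the plan unchanged."""
    return plan


DATAFLOW_OPTIMIZERS: Dict[DataflowOptimization, Callable[[Plan], Plan]] = {
    DataflowOptimization.PREDICATE_PUSHDOWN: noop_predicate_pushdown,
}


def collapse_adjacent_duplicates(plan: Plan) -> Plan:
    """Stand-in for a SQL-level optimizer: drops repeated adjacent steps."""
    collapsed = []
    for step in plan:
        if not collapsed or collapsed[-1] != step:
            collapsed.append(step)
    return tuple(collapsed)


def render_plan(
    plan: Plan,
    dataflow_optimizations: Sequence[DataflowOptimization] = (),
    sql_optimized: bool = False,
) -> str:
    """Apply dataflow optimizers first, then the SQL-level one, then render."""
    for optimization in dataflow_optimizations:
        plan = DATAFLOW_OPTIMIZERS[optimization](plan)
    if sql_optimized:
        plan = collapse_adjacent_duplicates(plan)
    return " -> ".join(plan)


def render_standard_and_optimized(plan: Plan) -> Tuple[str, str]:
    """Produce the two renderings a snapshot comparison would check."""
    standard = render_plan(plan)
    optimized = render_plan(
        plan,
        dataflow_optimizations=(DataflowOptimization.PREDICATE_PUSHDOWN,),
        sql_optimized=True,
    )
    return standard, optimized
```

Because the only dataflow optimizer in this sketch is a no-op, the optimized rendering differs from the standard one only through the SQL-level step, which mirrors the intent of limiting snapshot churn.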

A later change will add the branch combiner and update snapshot rendering
accordingly.

Note the distinct values tests needed a quick hack to keep working, which proved less
silly than a local refactor of the helper method.

Snapshot changes should be limited to ID number updates.

Quick hack to get these working. Note for reviewers: this was less silly than factoring out the common internal logic in this method and splitting the entry point. It's a whole lot of duplication of mf_test_configuration and the like no matter what, so we might as well just jam in this conditional.

github-actions · 2024-06-12T02:17:01Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

tlento · 2024-06-12T02:17:09Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

courtneyholcomb

Thank you for fixing this!!

courtneyholcomb · 2024-06-12T17:18:02Z

tests_metricflow/query_rendering/test_cumulative_metric_rendering.py

+        dimension_specs=(),
+        time_dimension_specs=(
+            TimeDimensionSpec(
+                element_name="ds",


nit: not really related to the PR, but we don't allow time dimensions other than metric_time without entity links, so adding one here would make this more realistic.

courtneyholcomb · 2024-06-12T17:18:24Z

tests_metricflow/query_rendering/test_cumulative_metric_rendering.py

+        time_dimension_specs=(
+            TimeDimensionSpec(
+                element_name="ds",
+                entity_links=(),


same nit here

courtneyholcomb · 2024-06-12T17:18:36Z

tests_metricflow/query_rendering/test_cumulative_metric_rendering.py

+        dimension_specs=(),
+        time_dimension_specs=(
+            TimeDimensionSpec(
+                element_name="ds",


courtneyholcomb · 2024-06-12T17:22:15Z

...ng.py/SqlQueryPlan/BigQuery/test_conversion_metric_with_time_constraint__plan0_optimized.sql

-    COALESCE(subq_27.visit__referrer_id, subq_38.visit__referrer_id) AS visit__referrer_id
-    , MAX(subq_27.visits) AS visits
-    , MAX(subq_38.buys) AS buys
+    COALESCE(subq_31.visit__referrer_id, subq_42.visit__referrer_id) AS visit__referrer_id


confused how we ended up with MORE aliases after adding more optimization, which I would expect to reduce aliases 🤔 but alas, not particularly important!

tlento · 2024-06-12

So it's not aliases, it's subgraph ID numbers. The change comes from a shift in the number of calls to the DataflowPlanBuilder.build_plan() methods.

Basically, we add subquery aliases inside the DataflowToSqlQueryPlanConverter. This happens in like 20 different places in there, because it makes a lot of subqueries. Fair enough.

The thing is, we have another class - the DataflowPlanNodeOutputDatasetResolver - which we use both for creating source nodes and also for doing certain operations inside the DataflowPlanBuilder. This class is a direct subclass of the DataflowToSqlQueryPlanConverter, and when it does its work it basically creates a whole bunch of new subqueries, which means every time we add calls to it we change these subquery numbers.

That's also why we have these huge ID number values - we've basically gone over the input DAG a whole bunch of times, incrementing ID numbers all the way, before we produce the final query output. So even an unoptimized query has high numbers and a bunch of hard-to-explain gaps in the sequence.

Adding more optimizers to the set won't change this going forward, since we will still be generating the same number of plans (although the plan emitted by the optimizer might have totally different ID number ranges, because optimizers actually have their own ), but the update in ID numbers is one of the reasons why I didn't add the branch combiner to this PR.
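
The effect described above can be reproduced with a toy model. This is a minimal sketch, not MetricFlow code: `SequentialIdGenerator`, `convert_plan`, and `render_final_aliases` are hypothetical stand-ins, and the only assumption is that every conversion pass draws its subquery IDs from one shared counter.

```python
import itertools
from typing import List


class SequentialIdGenerator:
    """Hypothetical stand-in for a shared sequential ID source."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def next_id(self, prefix: str) -> str:
        return f"{prefix}_{next(self._counter)}"


def convert_plan(ids: SequentialIdGenerator, node_count: int) -> List[str]:
    """Toy converter: allocates one subquery alias per plan node."""
    return [ids.next_id("subq") for _ in range(node_count)]


def render_final_aliases(extra_resolver_passes: int, node_count: int = 3) -> List[str]:
    """Return the aliases seen in the final rendered query."""
    ids = SequentialIdGenerator()
    # Each resolver pass walks the plan and consumes IDs; its output is discarded.
    for _ in range(extra_resolver_passes):
        convert_plan(ids, node_count)
    # Only this last conversion ends up in the snapshot.
    return convert_plan(ids, node_count)
```

With no extra passes the final aliases start at `subq_0`; with two extra passes over a three-node plan they start at `subq_6`. The rendered SQL has the same structure in both cases, and only the numbers move, which is the kind of diff seen in the snapshot above.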

tlento added 3 commits June 11, 2024 18:45

Update snapshots - ID number changes

96006b6

cla-bot bot added the cla:yes label Jun 12, 2024

tlento mentioned this pull request Jun 12, 2024

Add PredicatePushdownOptimizer in tracking-only mode #1262

Open

tlento added the Skip Changelog and Run Tests With Other SQL Engines (runs the test suite against the SQL engines in our target environment) labels Jun 12, 2024

tlento temporarily deployed to DW_INTEGRATION_TESTS June 12, 2024 02:18 — with GitHub Actions Inactive

courtneyholcomb approved these changes Jun 12, 2024

github-actions bot removed the Run Tests With Other SQL Engines (runs the test suite against the SQL engines in our target environment) label Jun 12, 2024
