Enable DataflowPlanOptimizers for query rendering tests #1263
base: add-tracking-only-pushdown-optimizer
Conversation
The metricflow query rendering tests do snapshot generation and comparison for standard rendering and optimized rendering. However, these plans only run the SqlQueryPlanOptimizers - they do not use the DataflowPlanOptimizers. This means our optimized plans were only partially optimized. Now, with predicate pushdown, it would be helpful to see the complete optimization effect on query plan rendering to SQL.

This change makes that possible by including DataflowPlanOptimizers in the comparison helper function. For the time being, and to minimize thrash in snapshot plans, we only include the no-op PredicatePushdownOptimizer. This will allow us to track the impact of enabling predicate pushdown via that optimizer through query plan snapshot changes.

A later change will add the branch combiner and update snapshot rendering accordingly.
Quick hack to get these working. Note for reviewers: this was less silly than factoring out the common internal logic in this method and splitting the entry point - either way involves a whole lot of duplication of mf_test_configuration and related parameters, so we might as well just jam in this conditional.
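For the curious, the conditional in question might look something like this sketch. All names here (DataflowPlan, render_for_comparison, noop_predicate_pushdown) are illustrative stand-ins, not MetricFlow's actual API:

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-ins for MetricFlow's types; the real DataflowPlan and
# optimizer classes have different shapes.
@dataclass
class DataflowPlan:
    nodes: List[str]


PlanOptimizer = Callable[[DataflowPlan], DataflowPlan]


def noop_predicate_pushdown(plan: DataflowPlan) -> DataflowPlan:
    # No-op for now: returns the plan unchanged, so wiring it in only
    # shifts ID numbering in snapshots, not plan structure.
    return plan


def render_for_comparison(
    plan: DataflowPlan,
    optimize: bool,
    dataflow_optimizers: List[PlanOptimizer],
) -> str:
    # The conditional jammed into the shared helper: dataflow-level
    # optimizers run only on the optimized rendering path.
    if optimize:
        for optimizer in dataflow_optimizers:
            plan = optimizer(plan)
    return "\n".join(plan.nodes)


plan = DataflowPlan(nodes=["READ visits", "FILTER ds >= '2020-01-01'", "AGGREGATE SUM(visits)"])
standard = render_for_comparison(plan, optimize=False, dataflow_optimizers=[noop_predicate_pushdown])
optimized = render_for_comparison(plan, optimize=True, dataflow_optimizers=[noop_predicate_pushdown])
print(standard == optimized)  # True - the pushdown optimizer is a no-op
```

Since the only optimizer wired in is a no-op, both rendering paths produce identical SQL, and snapshot diffs are limited to ID renumbering.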
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Thank you for fixing this!!
dimension_specs=(),
time_dimension_specs=(
    TimeDimensionSpec(
        element_name="ds",
nit: not really related to the PR, but we don't allow non-metric_time time dims without entity links so it would make this more realistic to add one here.
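For illustration, the more realistic spec this nit suggests might look like the sketch below. These are simplified stand-in classes; the real MetricFlow TimeDimensionSpec and EntityReference carry more fields, and "booking" is a made-up entity name:

```python
from dataclasses import dataclass
from typing import Tuple


# Simplified stand-ins for MetricFlow's spec classes.
@dataclass(frozen=True)
class EntityReference:
    element_name: str


@dataclass(frozen=True)
class TimeDimensionSpec:
    element_name: str
    entity_links: Tuple[EntityReference, ...] = ()


# A non-metric_time time dimension is more realistic with an entity link:
spec = TimeDimensionSpec(
    element_name="ds",
    entity_links=(EntityReference("booking"),),  # "booking" is illustrative
)
print(spec.entity_links[0].element_name)  # booking
```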
time_dimension_specs=(
    TimeDimensionSpec(
        element_name="ds",
        entity_links=(),
same nit here
dimension_specs=(),
time_dimension_specs=(
    TimeDimensionSpec(
        element_name="ds",
and here
COALESCE(subq_27.visit__referrer_id, subq_38.visit__referrer_id) AS visit__referrer_id
, MAX(subq_27.visits) AS visits
, MAX(subq_38.buys) AS buys
COALESCE(subq_31.visit__referrer_id, subq_42.visit__referrer_id) AS visit__referrer_id
confused how we ended up with MORE aliases after adding more optimization, which I would expect to reduce aliases 🤔 but alas, not particularly important!
So it's not aliases, it's subquery ID numbers. The reason for the change has to do with the shift in the number of calls to the DataflowPlanBuilder.build_plan() methods.
Basically, we add subquery aliases inside the DataflowToSqlQueryPlanConverter. This happens in like 20 different places in there, because it makes a lot of subqueries. Fair enough.
The thing is, we have another class - the DataflowPlanNodeOutputDatasetResolver - which we use both for creating source nodes and also for doing certain operations inside the DataflowPlanBuilder. This class is a direct subclass of the DataflowToSqlQueryPlanConverter, and when it does its work it basically creates a whole bunch of new subqueries, which means every time we add calls to it we change these subquery numbers.
That's also why we have these huge ID number values - we've basically gone over the input DAG a whole bunch of times, incrementing ID numbers all the way, before we produce the final query output. So an unoptimized query still has high numbers and a bunch of hard to explain gaps in the sequence.
Adding more optimizers to the set won't change this going forward, since we will still be generating the same number of plans (although the plan emitted by the optimizer might have totally different ID number ranges, because optimizers maintain their own ID sequences), but the update in ID numbers is one of the reasons why I didn't add the branch combiner to this PR.
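A toy model of that shared ID sequence may make the gaps concrete. Everything here is illustrative, not MetricFlow's actual ID logic, and the pass sizes are chosen so the output happens to echo the subq_27 in the diff above rather than being derived from real plans:

```python
import itertools

# Shared counter: every conversion pass draws subquery IDs from the same
# sequence, so later passes start at high numbers and the final rendered
# plan shows gaps. (Illustrative only.)
_subquery_ids = itertools.count(start=0)


def convert_plan(num_subqueries: int) -> list:
    # Each pass over a DAG consumes one ID per subquery it creates.
    return [f"subq_{next(_subquery_ids)}" for _ in range(num_subqueries)]


_ = convert_plan(10)  # e.g. source-node dataset resolution burns IDs
_ = convert_plan(17)  # e.g. resolver calls inside the DataflowPlanBuilder
final = convert_plan(3)  # the rendering pass starts well past zero
print(final)  # ['subq_27', 'subq_28', 'subq_29']
```

Adding more resolver calls ahead of the rendering pass shifts every subsequent ID, which is how an alias like subq_27 can become subq_31 in a snapshot without any alias actually being added.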
Enable DataflowPlanOptimizers for query rendering tests
The metricflow query rendering tests do snapshot generation and comparison
for standard rendering and optimized rendering. However, these plans only
run the SqlQueryPlanOptimizers - they do not use the DataflowPlanOptimizers.
This means our optimized plans were only partially optimized. Now with
predicate pushdown it would be helpful to see the complete optimization
effect on query plan rendering to SQL.
This change makes that possible by including DataflowPlanOptimizers in
the comparison helper function. For the time being, and to minimize
thrash in snapshot plans, we only include the no-op PredicatePushdownOptimizer.
This will allow us to track the impact of enabling predicate pushdown via
that optimizer through query plan snapshot changes.
A later change will add the branch combiner and update snapshot rendering
accordingly.
Note the distinct values tests needed a quick hack to keep working, which proved less
silly than a local refactor of the helper method.
Snapshot changes should be limited to ID number updates.