Speed up the recursive dimensions graph CTE query#1385
shangyian merged 11 commits into DataJunction:main
agorajek left a comment
I see you just added local_dimensions. This is super cool. I am going to test it locally as well and then respond again.
        type=str(col.type),
        path=path,
    )
    return None
Yeah, I just had an annoying linter flag this function for not having returns in every case 😱
    paths = dag.union_all(
        select(
            dag.c.node_revision_id,
It's how you reference a column on a SQLAlchemy query or table object
Wow. What a shortcut... Shows how much I know about ORMs :)
        type=str(col.type),
        path=[],
    )
    for col in node.current.columns
Should we skip some numerical columns here, or just let the user hurt themselves?
What do you mean by numerical columns? Never mind, I see what you mean!
    )
    for col in node.current.columns
]
local_dimensions = [
Oh, never mind, here is the filtering I asked about above. We could probably merge these two statements, because the first one may be doing some work that goes unused. But up to you.
Ohhh good catch, let me merge these two.
    key=lambda x: x[0],
) == sorted(
    [
        ("default.users.account_type", ["default.events", "default.users"]),
I am trying to figure out why the default.users item was added here. Any ideas?
Actually, it seems it was added to the expected output in more of the unit tests as well.
agorajek left a comment
This is so much better. I think the stages of building the dimension list are much clearer now.
Summary
The existing dimensions graph query is a recursive CTE that is increasingly slow on even fairly small graphs. I had a closer look at the recursive CTE query we're using, and it looks like it's exhaustively tracing the graph in a way that finds every possible path to a dimension node, rather than just picking up the shortest paths.
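To illustrate the difference between tracing every path and keeping only the shortest one, here is a minimal Python sketch on a small, made-up graph. It is not the query from this PR: the node names, the `GRAPH` dict, and both helper functions are hypothetical, and the real implementation works in SQL rather than in memory.

```python
from collections import deque

# Hypothetical dimension-link graph; node names are illustrative only.
GRAPH = {
    "events": ["users", "devices"],
    "users": ["countries", "accounts"],
    "devices": ["users", "countries"],
    "accounts": ["countries"],
    "countries": [],
}


def all_paths(graph, start):
    """Enumerate every simple path from start (exhaustive traversal)."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        paths.append(path)
        for nxt in graph[path[-1]]:
            if nxt not in path:  # avoid revisiting a node within one path
                stack.append(path + [nxt])
    return paths


def shortest_paths(graph, start):
    """Breadth-first search: keep only the first (shortest) path per node."""
    best = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in best:  # first visit is a shortest path in BFS
                best[nxt] = best[node] + [nxt]
                queue.append(nxt)
    return best


if __name__ == "__main__":
    print(len(all_paths(GRAPH, "events")), "paths when tracing exhaustively")
    print(len(shortest_paths(GRAPH, "events")), "paths when keeping the shortest")
```

Even on this five-node graph the exhaustive traversal produces more paths than there are nodes, and the gap widens quickly as more nodes link to shared dimensions.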
This PR changes things so that the dimensions graph function does the following:
Test Plan
make check passes
make test shows 100% unit test coverage
Deployment Plan