SQL Generation Speed: Store pickled query AST#1280
SQL Generation Speed: Store pickled query AST#1280shangyian merged 8 commits intoDataJunction:mainfrom
Conversation
✅ Deploy Preview for thriving-cassata-78ae72 canceled.
|
f55f194 to
d802006
Compare
c4996cf to
6a617dc
Compare
| dimension_node_names, | ||
| options=[ | ||
| joinedload(Node.current).options( | ||
| selectinload(NodeRevision.columns).options( |
There was a problem hiding this comment.
These extra joins brought in are fairly expensive and not used in SQL buidling
| measure_columns.append(expr) | ||
| expr.set_semantic_type(SemanticType.MEASURE) # type: ignore | ||
| await parent_ast.compile(context) | ||
| dependencies, _ = await parent_ast.extract_dependencies( |
There was a problem hiding this comment.
By extracting dependencies at this stage, we avoid calling another query.compile() call on the final query, which is expensive and without much benefit (all it does is add type inference).
There was a problem hiding this comment.
That one fallout of imprecise function names. Every time I see build or compile I need to go back to the source and remind myself what it means.
There was a problem hiding this comment.
Agreed... compile is kind of like a "hydrate query with metadata" stage, and build actually changes the structure of the query. If you can come up with a better function name that represents the concept happy to rename compile! 😅
| def type(self) -> ColumnType: | ||
| return self.select.type | ||
|
|
||
| async def build( # pylint: disable=R0913,C0415 |
There was a problem hiding this comment.
legacy query build that is now completely unused
6a617dc to
997c305
Compare
|
|
||
| def upgrade(): | ||
| with op.batch_alter_table("noderevision", schema=None) as batch_op: | ||
| batch_op.add_column(sa.Column("query_ast", sa.PickleType(), nullable=True)) |
There was a problem hiding this comment.
Wow, SQL alchemy has an answer for any type these days...
There was a problem hiding this comment.
Yeah, this one is convenient!
| measure_columns.append(expr) | ||
| expr.set_semantic_type(SemanticType.MEASURE) # type: ignore | ||
| await parent_ast.compile(context) | ||
| dependencies, _ = await parent_ast.extract_dependencies( |
There was a problem hiding this comment.
That one fallout of imprecise function names. Every time I see build or compile I need to go back to the source and remind myself what it means.
997c305 to
43f4be8
Compare
Summary
This PR speeds up SQL generation quite significantly by storing and loading pickled query ASTs rather than recompiling a node query every time we need it. The primary changes include:
query_astcolumn on thenoderevisiontable to store compressed pickled ASTs.Our queries are structured in a way where we can easily reuse these compiled query ASTs across many query builds.
Test Plan
Local testing + unit tests.
make checkpassesmake testshows 100% unit test coverageDeployment Plan
We should do some additional testing prior to release.