Remove parsed-AST cache from bulk metric derivation #2055

Merged
shangyian merged 2 commits into DataJunction:main from shangyian:fix-bulk-derive-deepcopy-cache on Apr 24, 2026
@shangyian shangyian (Collaborator) commented Apr 24, 2026

Summary

After the frozen measures storage change in #2050, deployments could fail with errors like:

dj                 | Traceback (most recent call last):
dj                 |   File "/code/datajunction_server/api/deployments.py", line 199, in _run_deployment
dj                 |     execute_result = await deploy(
dj                 |   File "/code/datajunction_server/internal/deployment/deployment.py", line 32, in deploy
dj                 |     return await orchestrator.execute()
dj                 |   File "/code/datajunction_server/internal/deployment/orchestrator.py", line 378, in execute
dj                 |     downstream = await self._execute_deployment_plan(deployment_plan)
dj                 |   File "/code/datajunction_server/internal/deployment/orchestrator.py", line 1206, in _execute_deployment_plan
dj                 |     derived = await self._derive_measures_for_deployed_metrics()
dj                 |   File "/code/datajunction_server/internal/deployment/orchestrator.py", line 1301, in _derive_measures_for_deployed_metrics
dj                 |     await derive_frozen_measures_bulk(self.session, revision_ids)
dj                 |   File "/code/datajunction_server/internal/nodes.py", line 816, in derive_frozen_measures_bulk
dj                 |     measures, derived_sql = await extractor.extract(
dj                 |   File "/code/datajunction_server/sql/decompose.py", line 829, in extract
dj                 |     base_components, derived_ast = self._extract_base(base_ast)
dj                 |   File "/code/datajunction_server/sql/decompose.py", line 993, in _extract_base
dj                 |     func.parent.replace(from_=func, to=result.combiner)  # type: ignore
dj                 | AttributeError: 'NoneType' object has no attribute 'replace'

The cause: derive_frozen_measures_bulk passed a shared query cache into MetricComponentExtractor.extract, which returned deepcopy views of the cached query ASTs. Because Function.__deepcopy__ returns self, the same Function nodes were shared across those copies, and their parent back-pointers went stale. As a result, _extract_base crashed at func.parent.replace(from_=func, to=...) during any deployment that touched metrics.
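
For illustration, here is a minimal sketch of how a `__deepcopy__` that returns `self` can leave parent pointers stale. The `Node` and `Function` classes below are toy stand-ins, not DataJunction's AST classes, and the `replace` method's choice to detach the old child is an assumption made so the failure is easy to see. The exact sequence inside `_extract_base` may differ.

```python
import copy


class Node:
    """Toy AST node with a parent back-pointer (hypothetical, not DJ's class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def replace(self, from_, to):
        # Swap one child for another; assumed here to detach the old child.
        index = next(i for i, c in enumerate(self.children) if c is from_)
        self.children[index] = to
        to.parent = self
        from_.parent = None


class Function(Node):
    def __deepcopy__(self, memo):
        # Mirrors the behaviour described above: deepcopy hands back the same object.
        return self


# One cached tree, two "copies" handed out to two extractions.
cached_root = Node("select", [Function("SUM")])
view_a = copy.deepcopy(cached_root)
view_b = copy.deepcopy(cached_root)

func = view_a.children[0]
shared = func is view_b.children[0]   # the Function was never really copied
stale = func.parent is cached_root    # parent points at the cache, not at view_a

# The first extraction rewrites through the stale pointer and detaches func.
func.parent.replace(from_=func, to=Node("combiner"))

# The second extraction reaches the same func, whose parent is now None.
try:
    view_b.children[0].parent.replace(from_=func, to=Node("combiner"))
    error = None
except AttributeError as exc:
    error = str(exc)
```

In this toy setup the second call fails with the same message as the traceback, because both views hold one shared `Function` object whose parent was never updated to point at either copy.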

The fix drops the cache entirely: extract now always calls parse(query) fresh, which yields an AST with intact parent pointers. Re-parsing per metric is cheap relative to the DB work the bulk path already avoids.
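
A rough sketch of why fresh parsing sidesteps the problem, again with toy classes: `toy_parse` is a hypothetical stand-in for `parse(query)`, and `extract` here is a simplified illustration, not the real `MetricComponentExtractor.extract`. Each call builds its own tree, so every `Function` has a parent inside the tree being rewritten.

```python
class Node:
    """Toy AST node with a parent back-pointer (hypothetical, not DJ's class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def replace(self, from_, to):
        # Swap one child for another and detach the old child (assumed behaviour).
        index = next(i for i, c in enumerate(self.children) if c is from_)
        self.children[index] = to
        to.parent = self
        from_.parent = None


class Function(Node):
    def __deepcopy__(self, memo):
        # Still returns self, but that no longer matters: nothing is deep-copied.
        return self


def toy_parse(query):
    """Hypothetical stand-in for parse(query): builds a brand-new tree per call."""
    return Node("select", [Function(token) for token in query.split()])


def extract(query):
    """Simplified extraction: parse fresh, then rewrite every function in place."""
    root = toy_parse(query)
    for func in list(root.children):
        func.parent.replace(from_=func, to=Node(f"combined({func.name})"))
    return root


# Two extractions of the same query text no longer share any nodes.
first = extract("SUM COUNT")
second = extract("SUM COUNT")
first_names = [child.name for child in first.children]
second_names = [child.name for child in second.children]
```

The trade-off is one extra parse per metric, which the description above judges to be cheap next to the database work the bulk path avoids.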

Test Plan

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

netlify Bot commented Apr 24, 2026

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit 2f83309
🔍 Latest deploy log https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/69ebf525b0887c0008584da7

@shangyian shangyian changed the title from "Fix bulk derive deepcopy cache" to "Remove parsed-AST cache from bulk metric derivation" Apr 24, 2026
@shangyian shangyian marked this pull request as ready for review April 24, 2026 23:21
@shangyian shangyian added the bug Something isn't working label Apr 24, 2026
@shangyian shangyian merged commit 4236308 into DataJunction:main Apr 24, 2026
25 checks passed