Column metadata should include origin for columns#1397
Column metadata should include origin for columns#1397shangyian merged 3 commits intoDataJunction:mainfrom
Conversation
…d include both columns
✅ Deploy Preview for thriving-cassata-78ae72 canceled.
|
| "column": "dispatcher_id", | ||
| "name": "default_DOT_dispatcher_DOT_dispatcher_id", | ||
| "node": "default.dispatcher", | ||
| "node": "default.repair_orders_fact", |
There was a problem hiding this comment.
The reason this is being renamed is that the origin of dispatcher_id is actually default.repair_orders_fact.dispatcher_id. The semantic_entity reflects something separate.
There was a problem hiding this comment.
Pull Request Overview
This PR refactors column metadata handling so that the "node" and "column" fields now accurately reflect the origin information rather than duplicating the semantic entity. Key changes include:
- Updating test cases to reflect the new origin values.
- Modifying query construction and column renaming to use the new semantic metadata.
- Adjusting the assemble_column_metadata logic and its invocations across the codebase.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| datajunction-server/tests/api/sql_v2_test.py | Updated node assignments in test cases. |
| datajunction-server/tests/api/sql_test.py | Updated semantic_type to "dimension" in test assertions. |
| datajunction-server/tests/api/dimension_links_test.py | Revised column and node values for dimension links tests. |
| datajunction-server/datajunction_server/construction/build_v2.py | Modified measures query building to incorporate new origin handling and semantic metadata flags. |
| datajunction-server/datajunction_server/construction/build.py | Adjusted column renaming logic to respect pre-existing semantic_entity values. |
| datajunction-server/datajunction_server/api/sql.py | Updated assembly of column metadata to use semantic metadata. |
| datajunction-server/datajunction_server/api/helpers.py | Revised the assemble_column_metadata function for improved origin column derivation. |
Comments suppressed due to low confidence (2)
datajunction-server/datajunction_server/construction/build_v2.py:257
- [nitpick] Consider renaming the 'preaggregate' parameter or adding a clarifying comment that it is used as the 'use_semantic_metadata' flag in assemble_column_metadata to improve code readability.
assemble_column_metadata(cast(ast.Column, col), preaggregate)
datajunction-server/datajunction_server/api/helpers.py:902
- Consider adding a comment that explains the assumed format of 'semantic_entity' (e.g. 'node.column') used to derive column_name and node_name, for future maintainability.
if use_semantic_metadata and has_semantic_entity:
agorajek
left a comment
There was a problem hiding this comment.
This is a cool change. I was wondering if you ran into an issue and made this fix or just accidentally though of it?
|
@agorajek I did run into issues... will explain separately since it's from a downstream consumer of this metadata. |
Summary
At the moment, the
ColumnMetadataprovided by the measures SQL query includes several fields:node,column, andsemantic_entity. At the moment,nodeandcolumnare just split-up versions ofsemantic_entity, which is not helpful for downstream consumers, since it's providing the exact same metadata that can be derived.Instead, it would be more helpful if
nodeandcolumnactually represented the origin column of the parent node(s), where possible. This PR makes that switch.Test Plan
make checkpassesmake testshows 100% unit test coverageDeployment Plan