Skip to content

Column metadata should include origin for columns#1397

Merged
shangyian merged 3 commits intoDataJunction:mainfrom
shangyian:sqlgen-dim-rename
Jun 4, 2025
Merged

Column metadata should include origin for columns#1397
shangyian merged 3 commits intoDataJunction:mainfrom
shangyian:sqlgen-dim-rename

Conversation

@shangyian
Copy link
Copy Markdown
Collaborator

@shangyian shangyian commented Jun 4, 2025

Summary

At the moment, the ColumnMetadata provided by the measures SQL query includes several fields: node, column, and semantic_entity. At the moment, node and column are just split-up versions of semantic_entity, which is not helpful for downstream consumers, since it's providing the exact same metadata that can be derived.

Instead, it would be more helpful if node and column actually represented the origin column of the parent node(s), where possible. This PR makes that switch.

Test Plan

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

@netlify
Copy link
Copy Markdown

netlify bot commented Jun 4, 2025

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit d32e24e
🔍 Latest deploy log https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/68405848c2ae42000835f6bc

"column": "dispatcher_id",
"name": "default_DOT_dispatcher_DOT_dispatcher_id",
"node": "default.dispatcher",
"node": "default.repair_orders_fact",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this is being renamed is that the origin of dispatcher_id is actually default.repair_orders_fact.dispatcher_id. The semantic_entity reflects something separate.

@shangyian shangyian changed the title Fix an issue where renaming dimensions for measures SQL creates shoul… Column metadata should include semantic metadata on columns Jun 4, 2025
@shangyian shangyian changed the title Column metadata should include semantic metadata on columns Column metadata should include origin for columns Jun 4, 2025
@shangyian shangyian marked this pull request as ready for review June 4, 2025 15:39
@shangyian shangyian requested a review from Copilot June 4, 2025 15:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors column metadata handling so that the "node" and "column" fields now accurately reflect the origin information rather than duplicating the semantic entity. Key changes include:

  • Updating test cases to reflect the new origin values.
  • Modifying query construction and column renaming to use the new semantic metadata.
  • Adjusting the assemble_column_metadata logic and its invocations across the codebase.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
datajunction-server/tests/api/sql_v2_test.py Updated node assignments in test cases.
datajunction-server/tests/api/sql_test.py Updated semantic_type to "dimension" in test assertions.
datajunction-server/tests/api/dimension_links_test.py Revised column and node values for dimension links tests.
datajunction-server/datajunction_server/construction/build_v2.py Modified measures query building to incorporate new origin handling and semantic metadata flags.
datajunction-server/datajunction_server/construction/build.py Adjusted column renaming logic to respect pre-existing semantic_entity values.
datajunction-server/datajunction_server/api/sql.py Updated assembly of column metadata to use semantic metadata.
datajunction-server/datajunction_server/api/helpers.py Revised the assemble_column_metadata function for improved origin column derivation.
Comments suppressed due to low confidence (2)

datajunction-server/datajunction_server/construction/build_v2.py:257

  • [nitpick] Consider renaming the 'preaggregate' parameter or adding a clarifying comment that it is used as the 'use_semantic_metadata' flag in assemble_column_metadata to improve code readability.
assemble_column_metadata(cast(ast.Column, col), preaggregate)

datajunction-server/datajunction_server/api/helpers.py:902

  • Consider adding a comment that explains the assumed format of 'semantic_entity' (e.g. 'node.column') used to derive column_name and node_name, for future maintainability.
if use_semantic_metadata and has_semantic_entity:

Copy link
Copy Markdown
Member

@agorajek agorajek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool change. I was wondering if you ran into an issue and made this fix or just accidentally though of it?

@shangyian
Copy link
Copy Markdown
Collaborator Author

@agorajek I did run into issues... will explain separately since it's from a downstream consumer of this metadata.

@shangyian shangyian merged commit b5c1ecc into DataJunction:main Jun 4, 2025
16 checks passed
@shangyian shangyian deleted the sqlgen-dim-rename branch June 4, 2025 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants