
Unify single-node and bulk deployment validation paths #2037

Merged
shangyian merged 5 commits into DataJunction:main from shangyian:unify-validation on Apr 21, 2026

@shangyian (Collaborator) commented Apr 20, 2026

Summary

Single-node validation and bulk deployment historically ran two separate validators, which could return different verdicts for the same SQL. This branch adds validate_node_data_v2 alongside the legacy validator. It is built on the same validate_node_query primitive that deployment uses and is wired into POST /nodes/{name}/validate/.
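The point of the change is architectural: if both entry points delegate to one primitive, they cannot disagree. The sketch below illustrates that shape under heavy simplification. shared_validate, validate_single, validate_bulk and the Verdict type are hypothetical stand-ins (not DataJunction's API), and "validation" is reduced to checking that every referenced node exists.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    # Hypothetical result shape; the PR does not show the real validator's output type.
    valid: bool
    errors: list = field(default_factory=list)


def shared_validate(refs: set, catalog: set) -> Verdict:
    """Hypothetical stand-in for the shared primitive (validate_node_query in the PR).

    Validation is reduced here to checking that every referenced node exists.
    """
    missing = sorted(refs - catalog)
    return Verdict(valid=not missing, errors=[f"Missing parent: {m}" for m in missing])


def validate_single(name: str, node_refs: dict, catalog: set) -> Verdict:
    # Single-node path (cf. POST /nodes/{name}/validate/): delegates to the shared primitive.
    return shared_validate(node_refs[name], catalog)


def validate_bulk(node_refs: dict, catalog: set) -> dict:
    # Bulk deployment path: the same primitive, applied to every node in the batch.
    return {name: shared_validate(refs, catalog) for name, refs in node_refs.items()}


node_refs = {"default.orders": {"default.raw_orders"}, "default.bad": {"default.nope"}}
catalog = {"default.raw_orders", "default.orders", "default.bad"}

# Because both paths call one primitive, their verdicts agree node by node.
for name in node_refs:
    assert validate_single(name, node_refs, catalog) == validate_bulk(node_refs, catalog)[name]
```

Any divergence between the two paths would now have to come from what they pass into the primitive, not from two independent implementations.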

Both extract_node_graph (deployment) and validate_node_data_v2 (single-node) call the same shared primitives, so candidate extraction and parent/missing classification produce identical results on either path. The deployment path now also emits MissingParent records for unresolved ast.Table references (matching single-node behavior) and filters non-metric resolutions out of derived-metric parent lists.
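The classification step described above can be sketched roughly as follows. This is a minimal illustration under assumptions: the MissingParent fields, the classify_parents function, and the catalog mapping of node name to node type are all hypothetical, and only the two rules stated in the PR are modeled (unresolved references become MissingParent records; a derived metric keeps only metric parents).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissingParent:
    # Hypothetical fields: the PR names a MissingParent record but not its shape.
    node: str
    reference: str


def classify_parents(node, is_derived_metric, candidates, catalog):
    """Split candidate references into resolved parents and MissingParent records.

    catalog: hypothetical mapping of node name -> node type ("source", "metric", ...).
    """
    parents, missing = [], []
    for ref in candidates:
        ref_type = catalog.get(ref)
        if ref_type is None:
            # Unresolved table reference: record it instead of silently dropping it.
            missing.append(MissingParent(node=node, reference=ref))
        elif is_derived_metric and ref_type != "metric":
            # Derived metrics keep only metric parents; other resolutions are filtered out.
            continue
        else:
            parents.append(ref)
    return parents, missing


catalog = {"default.revenue": "metric", "default.cost": "metric", "default.orders": "source"}
parents, missing = classify_parents(
    "default.margin",
    True,
    ["default.revenue", "default.orders", "default.cost", "default.unknown"],
    catalog,
)
```

Here parents keeps default.revenue and default.cost, default.orders is dropped because it is not a metric, and default.unknown becomes a MissingParent. Running the same function from both entry points is what makes the two paths agree.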

Validation changes in v2 relative to the legacy validator:

  • Ambiguous unqualified column references across joins are flagged (e.g., SELECT col FROM a JOIN b ON a.col = b.col, where col exists in both a and b)
  • EXPLODE(array<struct<a, b>>) AS (c1, c2): struct fields are unpacked positionally onto the aliases, in both the LATERAL VIEW and projection forms
  • CROSS JOIN UNNEST(t.arr) AS u(x): UNNEST sees the left-side table in scope; struct unpacking and POSEXPLODE are handled
  • The FROM-less series-generator pattern SELECT … LATERAL VIEW EXPLODE(sequence(1, N)) AS x is supported
  • Multiple LATERAL VIEW EXPLODE(…) AS c clauses in one SELECT no longer collide on the default alias
  • VALUES (NULL, 'a'), ('x', 'b') AS t(c1, c2): each column's type comes from the first row with a non-NULL value in that column, not from a leading NULL
  • arr[1] on a ListType column resolves to the element type
  • from_json(string_col, 'MAP<STRING, STRING>') resolves its return type, so downstream references work
  • col.field struct access on a deep chain resolves correctly
  • When EXPLODE's argument references a missing column, the real cause (Column X not found) surfaces alongside the downstream Unable to infer type noise
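The ambiguity rule in the first bullet can be illustrated with a toy scope resolver. This is a sketch under assumptions, not DataJunction's implementation: resolve_column, the scope mapping of table alias to column names, and the exact error wording for the ambiguous case are hypothetical. The rule itself is the one described: a qualified reference is checked against its table, while an unqualified one must match exactly one table in scope.

```python
def resolve_column(column, scope, table=None):
    """Resolve a column reference against the tables visible in a SELECT.

    scope: hypothetical mapping of table alias -> set of column names.
    Qualified references (table given) are checked directly; unqualified ones
    must match exactly one table, otherwise they are reported as ambiguous.
    """
    if table is not None:
        if column not in scope.get(table, set()):
            raise ValueError(f"Column {table}.{column} not found")
        return table
    owners = sorted(alias for alias, cols in scope.items() if column in cols)
    if not owners:
        raise ValueError(f"Column {column} not found")
    if len(owners) > 1:
        raise ValueError(f"Column {column} is ambiguous; found in: {', '.join(owners)}")
    return owners[0]


# SELECT col FROM a JOIN b ON a.col = b.col, where col exists in both a and b
scope = {"a": {"col", "x"}, "b": {"col", "y"}}
assert resolve_column("col", scope, table="a") == "a"  # qualified: resolves
assert resolve_column("y", scope) == "b"  # unqualified but unique: resolves
try:
    resolve_column("col", scope)  # unqualified and present in both: ambiguous
except ValueError as err:
    print(err)  # Column col is ambiguous; found in: a, b
```

The join condition itself (a.col = b.col) is fine because both sides are qualified; only the bare col in the projection is flagged.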
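Several of the other bullets are type-inference rules. The sketch below models three of them over a deliberately tiny type system: positional struct unpacking for EXPLODE(array<struct<…>>) AS (c1, c2), arr[i] on a list yielding its element type, and VALUES columns typed from their first non-NULL entry. ListType, StructType, explode_aliases, index_type, values_column_types and the Python-to-SQL type map are all hypothetical names chosen for illustration; SQL types are plain strings such as "int".

```python
from dataclasses import dataclass


# Hypothetical minimal type model; SQL types are plain strings like "int".
@dataclass(frozen=True)
class ListType:
    element: object


@dataclass(frozen=True)
class StructType:
    fields: tuple  # ((name, type), ...) in declaration order


def explode_aliases(arg_type, aliases):
    """Bind EXPLODE(...) AS (alias, ...) to output column types."""
    if not isinstance(arg_type, ListType):
        raise TypeError("EXPLODE expects an array argument")
    elem = arg_type.element
    if isinstance(elem, StructType):
        # array<struct<...>>: unpack struct fields onto the aliases by position.
        if len(aliases) != len(elem.fields):
            raise ValueError(f"expected {len(elem.fields)} aliases, got {len(aliases)}")
        return {alias: ftype for alias, (_, ftype) in zip(aliases, elem.fields)}
    if len(aliases) != 1:
        raise ValueError(f"expected 1 alias, got {len(aliases)}")
    return {aliases[0]: elem}


def index_type(arg_type):
    """arr[i] on a list-typed column yields the element type."""
    if not isinstance(arg_type, ListType):
        raise TypeError("subscript requires an array")
    return arg_type.element


# Hypothetical mapping from Python literal types to SQL type names.
_SQL_TYPES = {str: "string", int: "int", float: "double", bool: "boolean"}


def values_column_types(rows, names):
    """Type each VALUES column from its first non-NULL entry (None stands for NULL)."""
    return {
        name: next((_SQL_TYPES[type(row[i])] for row in rows if row[i] is not None), "null")
        for i, name in enumerate(names)
    }


pair = ListType(StructType((("a", "int"), ("b", "string"))))
print(explode_aliases(pair, ["c1", "c2"]))  # {'c1': 'int', 'c2': 'string'}
print(values_column_types([(None, "a"), ("x", "b")], ["c1", "c2"]))  # {'c1': 'string', 'c2': 'string'}
```

In the VALUES example, c1 is typed from 'x' in the second row because the first row's entry is NULL, which is the behavior the bullet describes.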

Test Plan

Two fixture bugs surfaced during this work and were fixed.

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

@netlify

netlify Bot commented Apr 20, 2026

Deploy Preview for thriving-cassata-78ae72 canceled.

Name | Link
🔨 Latest commit | 539363b
🔍 Latest deploy log | https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/69e7d2206553d100080bab4b

@shangyian changed the title from "Unify validation between bulk and single node validators" to "Unify single-node and bulk deployment validation paths" on Apr 21, 2026
@shangyian shangyian marked this pull request as ready for review April 21, 2026 22:44
@shangyian shangyian merged commit 3d71cf5 into DataJunction:main Apr 21, 2026
25 checks passed
