[SPARK-57186][SQL] Handle NullType in ExtractValue to return NULL instead of throwing #56237
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| -- Automatically generated by SQLQueryTestSuite | ||
| -- !query | ||
| SELECT col.a FROM (SELECT null AS col) t | ||
| -- !query analysis | ||
| Project [null AS a#x] | ||
| +- SubqueryAlias t | ||
| +- Project [null AS col#x] | ||
| +- OneRowRelation |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| -- SPARK-57186: multipart field access (col.a) on a NullType base propagates NULL under the | ||
| -- single-pass resolver as well, consistently with the legacy analyzer. Dual-running both analyzers | ||
| -- locks in that consistency (no HYBRID_ANALYZER_EXCEPTION). | ||
| -- The col[0]/col['key'] subscript forms are intentionally not covered here: the single-pass | ||
| -- resolver does not resolve subscript extraction (UnresolvedExtractValue) at all -- a pre-existing | ||
| -- limitation independent of NullType -- so they are exercised only under the legacy analyzer in | ||
| -- extract-value-resolution-edge-cases.sql. | ||
| --SET spark.sql.analyzer.singlePassResolver.dualRunWithLegacy=true | ||
|
|
||
| SELECT col.a FROM (SELECT null AS col) t; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,3 +8,13 @@ SELECT col1.a, a FROM t1 ORDER BY col1.a; | |
| SELECT split(col1, '-')[1] AS a FROM VALUES('a-b') ORDER BY split(col1, '-')[1]; | ||
|
|
||
| DROP TABLE t1; | ||
|
|
||
| -- SPARK-57186: extracting a field/element/key from a NullType base returns NULL instead of | ||
| -- throwing INVALID_EXTRACT_BASE_FIELD_TYPE (SQL NULL propagation; a NullType column can arise e.g. | ||
| -- from schema evolution with missing columns). This applies uniformly to dotted field access | ||
| -- (`col.a`) and the subscript forms (`col[0]`, `col['key']`), and is implemented at the | ||
| -- user-facing resolution sites (ExtractValue.applyOrNull) without changing the shared | ||
| -- ExtractValue.extractValue utility. | ||
| SELECT col.a FROM (SELECT null AS col) t; | ||
| SELECT col[0] FROM (SELECT null AS col) t; | ||
|
Contributor
Nit:
Contributor, Author
Addressed the column-name nit: |
||
| SELECT col['key'] FROM (SELECT null AS col) t; | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| -- Automatically generated by SQLQueryTestSuite | ||
| -- !query | ||
| SELECT col.a FROM (SELECT null AS col) t | ||
| -- !query schema | ||
| struct<a:void> | ||
| -- !query output | ||
| NULL |
Non-blocking: for a NullType base, `col.a` now returns NULL while the equivalent subscript forms `col[0]`/`col['key']` still throw `INVALID_EXTRACT_BASE_FIELD_TYPE`. `col.a` and `col['a']` are equivalent struct-field-access syntaxes elsewhere in Spark, so this asymmetry is worth confirming as intentional. It's largely tied to the analyzer-divergence finding above; a consistent cross-analyzer fix is the natural point to decide whether the subscript forms should follow `col.a` here.
Under the legacy analyzer all three (`col.a`, `col[0]`, `col['key']`) now propagate NULL, so the asymmetry is gone there. They're only dual-run for `col.a` because the single-pass resolver doesn't resolve subscript extraction at all today (normal `a[0]`/`m['k']` fail under single-pass too; `ExtractValueResolver` is unwired). That is a pre-existing limitation independent of `NullType`, so the subscript forms run under the legacy analyzer only.
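
For readers skimming the thread, the behaviour described above can be sketched as a toy model. This is not Spark code: the classes, `extract_value_strict`, and `resolve_extract` are hypothetical stand-ins written for illustration, and the returned strings only label which kind of expression would be produced. The assumption being modelled is the one stated in the test comments: the shared utility still rejects a NullType base, while the user-facing resolution site short-circuits to NULL.

```python
# Toy model only: these classes are hypothetical stand-ins, not Spark's types.
class NullType: pass
class StructType: pass
class ArrayType: pass
class MapType: pass
class StringType: pass


class InvalidExtractBaseFieldType(Exception):
    """Stands in for the INVALID_EXTRACT_BASE_FIELD_TYPE error condition."""


NULL_LITERAL = "NULL"


def extract_value_strict(base_type, extraction):
    """Models the unchanged shared utility: any non-extractable base raises."""
    if isinstance(base_type, StructType):
        return f"struct field {extraction!r}"
    if isinstance(base_type, ArrayType):
        return f"array item {extraction!r}"
    if isinstance(base_type, MapType):
        return f"map value {extraction!r}"
    raise InvalidExtractBaseFieldType(
        f"cannot extract {extraction!r} from {type(base_type).__name__}"
    )


def resolve_extract(base_type, extraction):
    """Models the user-facing resolution site: a NullType base yields NULL."""
    if isinstance(base_type, NullType):
        # NULL propagation: applies to col.a, col[0] and col['key'] alike.
        return NULL_LITERAL
    return extract_value_strict(base_type, extraction)
```

In this sketch `resolve_extract(NullType(), "a")`, `resolve_extract(NullType(), 0)`, and `resolve_extract(NullType(), "key")` all return the NULL marker, while a genuinely invalid base such as `StringType()` still raises through the strict path.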