-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48067][SQL] Fix variant default columns #46312
Conversation
@@ -84,9 +84,16 @@ object ResolveDefaultColumns extends QueryErrorsBase | |||
if (SQLConf.get.enableDefaultColumns) { | |||
val newFields: Seq[StructField] = tableSchema.fields.map { field => | |||
if (field.metadata.contains(CURRENT_DEFAULT_COLUMN_METADATA_KEY)) { | |||
val analyzed: Expression = analyze(field, statementType) | |||
val defaultSql: String = if (field.dataType.isInstanceOf[VariantType]) { | |||
// A variant's SQL/string representation is its JSON string which cannot be directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like we should implement the type coercion instead of doing this approach. Then we can correctly store the variant value in as the column default expression rather than a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, i believe all default values are already stored as string sql expressions (usually literal expressions). I.e. the existing values (as a string sql expression) is retrieved here and is analyzed using analyze
and eventually coerced into the target type here. We could "coerce" the variant string into a variant by wrapping it with parse_json
as mentioned in the PR description. However, this feels inefficient. We would evaluate the default expression to produce a variant, transform it to its string representation (by expr.sql
) and then transform it back to its variant value later. Instead, we could just leave the expression as is and lazily evaluate it to avoid the extra variant-> string and string -> variant conversions.
Regarding your other comment, I believe this doesn't work even if ParseJson
can be constant folded (I quickly tried it) because analyze
attempts to analyze the expression as a variant {"k": "v"}
, which isn't valid sql syntax (there aren't even quotations around it). The root of the problem here is that there isn't a way to represent variant types as a literal
@richardc-db this would work if the Here's an example of marking an expression as constant-foldable: Line 37 in ad63eef
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
Outdated
Show resolved
Hide resolved
Sync with @richardc-db offline. I suggest changing the |
@@ -549,6 +549,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression { | |||
s"${Literal(kv._1, mapType.keyType).sql}, ${Literal(kv._2, mapType.valueType).sql}" | |||
} | |||
s"MAP(${keysAndValues.mkString(", ")})" | |||
case (v: VariantVal, variantType: VariantType) => s"PARSE_JSON('${v.toJson(timeZoneId)}')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some test cases for the change here.
Thanks, merging to master |
### What changes were proposed in this pull request? Changes the literal `sql` representation of a variant value to `parse_json(variant.toJson)`. This is because there is no other representation of a literal variant. This allows variant default columns to work because default columns store a literal string representation in the schema struct fields metadata as the default value. ### Why are the changes needed? previously we could not set a variant default column like ``` create table t( v6 variant default parse_json('{\"k\": \"v\"}') ) ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? added UT ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#46312 from richardc-db/fix_variant_default_cols. Authored-by: Richard Chen <r.chen@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Changes the literal
sql
representation of a variant value toparse_json(variant.toJson)
. This is because there is no other representation of a literal variant.This allows variant default columns to work because default columns store a literal string representation in the schema struct fields metadata as the default value.
Why are the changes needed?
previously we could not set a variant default column like
Does this PR introduce any user-facing change?
no
How was this patch tested?
added UT
Was this patch authored or co-authored using generative AI tooling?
no