[SPARK-45827][SQL] Move data type checks to CreatableRelationProvider #45409
Conversation
```scala
def supportsDataType(
    dt: DataType
): Boolean = {
```

Suggested change:

```scala
def supportsDataType(dt: DataType): Boolean = {
```
```scala
case MapType(k, v, _) =>
  supportsDataType(k) && supportsDataType(v)
```

Suggested change:

```scala
case MapType(k, v, _) => supportsDataType(k) && supportsDataType(v)
```
```scala
case udt: UserDefinedType[_] => supportsDataType(udt.sqlType)
case _: AnsiIntervalType | CalendarIntervalType | VariantType => false
case BinaryType | BooleanType | ByteType | CalendarIntervalType | CharType(_) | DateType |
    DayTimeIntervalType(_, _) | _: DecimalType | DoubleType | FloatType |
```

**Reviewer:** It's confusing to have the interval types in both the true and the false case matches. Shall we stick with the allowlist approach and only list the supported types?

**Author:** Oh, that was an accident. I'll remove the interval type from this list and have it list only the supported types.
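The allowlist style the reviewer suggests can be sketched with a simplified, Spark-free model of the type hierarchy (all type names below are illustrative stand-ins, not Spark's actual classes):

```scala
// Simplified stand-in for Spark's DataType hierarchy (illustrative only).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object CalendarIntervalType extends DataType
case object VariantType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class MapType(keyType: DataType, valueType: DataType) extends DataType
case class StructType(fieldTypes: Seq[DataType]) extends DataType

// Allowlist approach: enumerate only the supported types and recurse into
// nested types; anything not listed (intervals, variant) falls through to false.
def supportsDataType(dt: DataType): Boolean = dt match {
  case IntegerType | StringType => true
  case ArrayType(e) => supportsDataType(e)
  case MapType(k, v) => supportsDataType(k) && supportsDataType(v)
  case StructType(fields) => fields.forall(supportsDataType)
  case _ => false
}
```

With an allowlist there is no second `false` branch to drift out of sync with the `true` branch, which avoids the duplication the reviewer flagged.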
```scala
val e = intercept[AnalysisException] {
  dataSource.planForWriting(SaveMode.ErrorIfExists, df.logicalPlan)
}
assert(e.getMessage.contains("UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE"))
```

**Reviewer:** Let's use `checkError` for it.
**Author:** @cloud-fan Thanks for the review! I've updated with your feedback.
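The `checkError` suggestion replaces substring matching on the message with an assertion on the structured error class. A minimal stand-in (not Spark's actual test helper, whose signature is richer) might look like:

```scala
// Minimal model of an exception carrying a structured error class
// (illustrative; Spark's real AnalysisException has more fields).
case class AnalysisError(errorClass: String, parameters: Map[String, String])
  extends Exception(errorClass)

// Assert on the structured class and parameters, not on message substrings,
// so tests don't break when the rendered message text changes.
def checkError(
    exception: AnalysisError,
    errorClass: String,
    parameters: Map[String, String]): Unit = {
  assert(exception.errorClass == errorClass)
  assert(exception.parameters == parameters)
}

// Hypothetical parameter name; the real error's parameters may differ.
val e = AnalysisError(
  "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
  Map("columnType" -> "\"VARIANT\""))
checkError(e, "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
  Map("columnType" -> "\"VARIANT\""))
```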
```scala
 * Check if the relation supports the given data type.
 *
 * @param dt Data type to check
 * @return True if the data type is supported
```

**Reviewer:** Let's add `@since 4.0.0`.

**Author:** Done.
thanks, merging to master!
### What changes were proposed in this pull request?
#45409 created a default allow-list of types for data sources. The intent was only to prevent creation of the two types that had already been prevented elsewhere in the code, but the match expression matched `StringType`, the object representing the default collation, instead of the `StringType` class, which represents any collation. This PR fixes the issue.

### Why are the changes needed?
Without it, the previous PR would be a breaking change for data sources that write `StringType` with a non-default collation.

### Does this PR introduce _any_ user-facing change?
It reverts the previous unintentional user-facing change.

### How was this patch tested?
Unit test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45463 from cashmand/SPARK-45827-followup.

Authored-by: cashmand <david.cashman@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
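The object-versus-class distinction at the heart of this fix can be reproduced outside Spark. The classes below are simplified stand-ins for Spark's collation-aware `StringType`:

```scala
// Stand-in mirroring Spark's pattern: a StringType class parameterized by
// collation, plus a companion object that is the default-collation instance.
class StringType(val collation: String)
object StringType extends StringType("UTF8_BINARY")

// Buggy match: a stable-identifier pattern compares against the companion
// object, so only the default-collation singleton matches.
def supportsBuggy(dt: Any): Boolean = dt match {
  case StringType => true
  case _ => false
}

// Fixed match: a type pattern matches any StringType, whatever its collation.
def supportsFixed(dt: Any): Boolean = dt match {
  case _: StringType => true
  case _ => false
}
```

A collated string like `new StringType("UTF8_LCASE")` falls through the buggy match but is accepted by the fixed one, which is exactly the breaking change this follow-up reverts.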
### What changes were proposed in this pull request?
In DataSource.scala, there are checks to prevent writing Variant and Interval types to a `CreatableRelationProvider`. This PR unifies the checks in a method on `CreatableRelationProvider` so that data sources can override it in order to specify a different set of supported data types.

### Why are the changes needed?
Allows data sources to specify which types they support, while providing a sensible default for most data sources.

### Does this PR introduce _any_ user-facing change?
The error messages for Variant and Interval are now shared, and are a bit more generic. The intent is otherwise not to have any user-facing change.

### How was this patch tested?
Unit tests added.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#45409 from cashmand/SPARK-45827-CreatableRelationProvider.

Authored-by: cashmand <david.cashman@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
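The shape of the extension point this PR adds can be sketched with a simplified, Spark-free model. The trait and type names below mimic, but are not, Spark's actual API:

```scala
// Stand-ins for a few Spark data types (illustrative only).
sealed trait DataType
case object LongType extends DataType
case object CalendarIntervalType extends DataType
case object VariantType extends DataType

// Simplified model of CreatableRelationProvider: the default rejects the two
// type families DataSource.scala previously checked for, and individual
// sources override the method to declare a different supported set.
trait CreatableRelationProvider {
  def supportsDataType(dt: DataType): Boolean = dt match {
    case CalendarIntervalType | VariantType => false
    case _ => true
  }
}

// A source relying on the sensible default.
object DefaultSource extends CreatableRelationProvider

// Hypothetical source that additionally supports variant values.
object VariantFriendlySource extends CreatableRelationProvider {
  override def supportsDataType(dt: DataType): Boolean = dt match {
    case VariantType => true
    case other => super.supportsDataType(other)
  }
}
```

Centralizing the check in one overridable method is what lets each source widen or narrow the set without touching the shared write-planning code.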