
[SPARK-45827][SQL] Move data type checks to CreatableRelationProvider #45409

Closed

Conversation

cashmand (Contributor) commented Mar 6, 2024

What changes were proposed in this pull request?

In `DataSource.scala`, there are checks that prevent writing Variant and Interval types through a `CreatableRelationProvider`. This PR unifies those checks in a method on `CreatableRelationProvider`, which data sources can override to declare a different set of supported data types.
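
For illustration, a rough sketch of how a data source could override the new hook to accept an extra type. The provider class and its write logic are hypothetical; only `supportsDataType` and the standard `CreatableRelationProvider` members come from Spark:

import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}
import org.apache.spark.sql.types.{DataType, VariantType}

// Hypothetical provider that also accepts Variant columns on write.
class VariantFriendlyProvider extends CreatableRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // Actual write path elided for this sketch.
    ???
  }

  // Accept everything the default allow-list accepts, plus Variant.
  override def supportsDataType(dt: DataType): Boolean =
    dt == VariantType || super.supportsDataType(dt)
}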

Why are the changes needed?

Allows data sources to specify what types they support, while providing a sensible default for most data sources.

Does this PR introduce any user-facing change?

The error messages for Variant and Interval are now shared and slightly more generic. Otherwise, the intent is to introduce no user-facing change.

How was this patch tested?

Unit tests added.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Mar 6, 2024
Comment on lines 185 to 187
  def supportsDataType(
      dt: DataType
  ): Boolean = {

Suggested change:
-  def supportsDataType(
-      dt: DataType
-  ): Boolean = {
+  def supportsDataType(dt: DataType): Boolean = {

Comment on lines 190 to 191
    case MapType(k, v, _) =>
      supportsDataType(k) && supportsDataType(v)

Suggested change:
-    case MapType(k, v, _) =>
-      supportsDataType(k) && supportsDataType(v)
+    case MapType(k, v, _) => supportsDataType(k) && supportsDataType(v)

    case udt: UserDefinedType[_] => supportsDataType(udt.sqlType)
    case _: AnsiIntervalType | CalendarIntervalType | VariantType => false
    case BinaryType | BooleanType | ByteType | CalendarIntervalType | CharType(_) | DateType |
         DayTimeIntervalType(_, _) | _: DecimalType | DoubleType | FloatType |

It's confusing to have the interval types in both the true and false case matches. Shall we stick with the allowlist approach and only list the supported types?

cashmand (author) replied:

Oh, that was an accident. I'll remove from this list. I can have it only list supported types.
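
For reference, a sketch of what an allowlist-only default looks like. This is illustrative, not the exact merged code: the leaf-type list is abbreviated, and `_: StringType` here reflects the follow-up fix discussed later on this page:

import org.apache.spark.sql.types._

def supportsDataType(dt: DataType): Boolean = dt match {
  // Recurse into complex types.
  case ArrayType(e, _) => supportsDataType(e)
  case MapType(k, v, _) => supportsDataType(k) && supportsDataType(v)
  case st: StructType => st.fields.forall(f => supportsDataType(f.dataType))
  case udt: UserDefinedType[_] => supportsDataType(udt.sqlType)
  // Enumerate supported leaf types; anything else (intervals, Variant, ...) falls through to false.
  case BinaryType | BooleanType | ByteType | DateType | _: DecimalType | DoubleType |
       FloatType | IntegerType | LongType | NullType | ShortType | _: StringType |
       TimestampType | TimestampNTZType => true
  case _ => false
}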

val e = intercept[AnalysisException] {
  dataSource.planForWriting(SaveMode.ErrorIfExists, df.logicalPlan)
}
assert(e.getMessage.contains("UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE"))

let's use checkError for it.
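
A sketch of that suggestion, assuming the surrounding test-suite context (`intercept`, `dataSource`, `df`); the `parameters` keys are illustrative and would come from the error-class definition:

val e = intercept[AnalysisException] {
  dataSource.planForWriting(SaveMode.ErrorIfExists, df.logicalPlan)
}
checkError(
  exception = e,
  errorClass = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
  // Illustrative keys; the real ones are defined with the error class.
  parameters = Map("format" -> "...", "columnName" -> "...", "columnType" -> "..."))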

cashmand requested a review from cloud-fan March 7, 2024 05:02
cashmand (Contributor Author) commented Mar 7, 2024

@cloud-fan Thanks for the review! I've updated with your feedback.

* Check if the relation supports the given data type.
*
* @param dt Data type to check
* @return True if the data type is supported

let's add @since 4.0.0

cashmand (author) replied:

Done.
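
With the tag added, the doc comment reads along these lines:

/**
 * Check if the relation supports the given data type.
 *
 * @param dt Data type to check
 * @return True if the data type is supported
 * @since 4.0.0
 */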

yaooqinn changed the title [SPARK-45827] Move data type checks to CreatableRelationProvider → [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider Mar 7, 2024
cloud-fan (Contributor) commented:

thanks, merging to master!

cloud-fan closed this in 0f91642 Mar 7, 2024
dongjoon-hyun pushed a commit that referenced this pull request Mar 11, 2024
### What changes were proposed in this pull request?

#45409 created a default allow-list of types for data sources. The intent was to only prevent creation of the two types that had already been prevented elsewhere in code, but the match expression matched `StringType`, which is an object representing the default collation, instead of the `StringType` class, which represents any collation. This PR fixes the issue.
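
To illustrate the distinction (hypothetical helper names; `StringType` the object is the default-collation singleton, while the class covers every collation):

import org.apache.spark.sql.types.{DataType, StringType}

// Matches only the default-collation singleton object.
def matchesDefaultCollationOnly(dt: DataType): Boolean = dt match {
  case StringType => true
  case _ => false
}

// Matches any StringType instance, regardless of collation.
def matchesAnyCollation(dt: DataType): Boolean = dt match {
  case _: StringType => true
  case _ => false
}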

### Why are the changes needed?

Without it, the previous PR would be a breaking change for data sources that write StringType with a non-default collation.

### Does this PR introduce _any_ user-facing change?

It reverts the previous unintentional user-facing change.

### How was this patch tested?

Unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45463 from cashmand/SPARK-45827-followup.

Authored-by: cashmand <david.cashman@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
jpcorreia99 pushed a commit to jpcorreia99/spark that referenced this pull request Mar 12, 2024
jpcorreia99 pushed a commit to jpcorreia99/spark that referenced this pull request Mar 12, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024