[SPARK-4293][SQL] Make Cast be able to handle complex types. #3150

Closed
wants to merge 6 commits into from

@ueshin ueshin commented Nov 7, 2014

Inserting data whose type includes ArrayType.containsNull == false, MapType.valueContainsNull == false, or StructType.fields.exists(_.nullable == false) into a Hive table fails, because the Cast inserted by the HiveMetastoreCatalog.PreInsertionCasts rule of the Analyzer can't handle these types correctly.

Complex type cast rule proposal:

  • Cast for non-complex types should behave the same as before.
  • Cast for ArrayType can be evaluated if
    • the element type can be cast
    • the nullability rule is not broken
  • Cast for MapType can be evaluated if
    • the key type can be cast
    • the cast key type is not nullable
    • the value type can be cast
    • the nullability rule for the value type is not broken
  • Cast for StructType can be evaluated if
    • the number of fields is the same
    • each field can be cast
    • the nullability rule for each field is not broken
  • The nested structure must be the same.

Nullability rule:

  • If the type being cast is nullable (nullable == true), the target type must also be nullable.
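
The proposal above can be sketched as a small recursive check. This is only an illustration under stated assumptions, not the code in this patch: the type model below is a hypothetical stand-in for Spark's DataType hierarchy, the names CastRules, canCast and isComplex are made up for the example, and any cast between two non-complex types is assumed to be allowed. Map keys carry no nullability flag in this model, so the "key is not nullable" condition holds by construction.

```scala
// Hypothetical stand-in type model (not Spark's real classes).
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
  extends DataType
final case class StructField(name: String, dataType: DataType, nullable: Boolean)
final case class StructType(fields: Seq[StructField]) extends DataType

object CastRules {
  // Nullability rule: a nullable source requires a nullable target.
  def resolvableNullability(from: Boolean, to: Boolean): Boolean = !from || to

  // Assumption for this sketch: only arrays, maps and structs count as complex.
  def isComplex(t: DataType): Boolean = t match {
    case _: ArrayType | _: MapType | _: StructType => true
    case _ => false
  }

  def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
    // Arrays: element type must be castable and the nullability rule must hold.
    case (ArrayType(fromElem, fromNull), ArrayType(toElem, toNull)) =>
      canCast(fromElem, toElem) && resolvableNullability(fromNull, toNull)

    // Maps: key and value types must be castable; value nullability must hold.
    case (MapType(fromKey, fromValue, fromNull), MapType(toKey, toValue, toNull)) =>
      canCast(fromKey, toKey) &&
        canCast(fromValue, toValue) &&
        resolvableNullability(fromNull, toNull)

    // Structs: same number of fields, each field castable and nullability-safe.
    case (StructType(fromFields), StructType(toFields)) =>
      fromFields.size == toFields.size &&
        fromFields.zip(toFields).forall { case (f, t) =>
          canCast(f.dataType, t.dataType) && resolvableNullability(f.nullable, t.nullable)
        }

    // Non-complex to non-complex is assumed castable; mixed nesting is rejected.
    case (f, t) => !isComplex(f) && !isComplex(t)
  }
}
```

With this sketch, CastRules.canCast(ArrayType(IntegerType, false), ArrayType(LongType, true)) holds, while casting ArrayType(IntegerType, true) to ArrayType(IntegerType, false) is rejected because the nullability rule breaks.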

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23041 has started for PR 3150 at commit 287f410.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23041 has finished for PR 3150 at commit 287f410.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23041/

toField.nullable)
}

case _ => false
Contributor

I wonder whether throwing an exception here would be more informative than the plain UnresolvedException thrown during logical plan analysis.

Member Author

Hmm, I think the resolve check should happen during logical plan analysis.

Member Author

Some expressions do check resolved in their dataType method, though.

@chenghao-intel
Contributor

It looks good to me in general, and I like the idea of consolidating the convertible data type checks in one place, but I am a little afraid it could be error-prone for future maintenance or when a new data type is added.
Or can we remove the resolve method?

@ueshin
Member Author

ueshin commented Nov 7, 2014

@chenghao-intel, Thank you for your comments.
If the resolve method is removed, the nullability check is removed with it (e.g. a cast from ArrayType(IntegerType, containsNull = true) to ArrayType(IntegerType, containsNull = false) is clearly invalid), and that will cause unexpected errors. If there is a better way to enforce the nullability check, we can remove the method.
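
To illustrate why that cast is invalid, here is a small self-contained sketch. It models a containsNull = true array as Seq[Option[Int]] and a containsNull = false array as Seq[Int]; both the modelling and the helper name castToNonNull are assumptions made for this example, not Spark code.

```scala
object NullabilityDemo {
  // Hypothetical helper: the conversion only succeeds when no element is
  // null (None). A single null element leaves nothing valid to put in the
  // non-nullable target, so the result is None.
  def castToNonNull(xs: Seq[Option[Int]]): Option[Seq[Int]] =
    if (xs.forall(_.isDefined)) Some(xs.flatten) else None
}
```

Because the source type alone cannot guarantee that every element is present, it seems safer to reject such a cast up front than to discover the null at evaluation time.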

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
@SparkQA

SparkQA commented Nov 8, 2014

Test build #23081 has started for PR 3150 at commit f677c30.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 8, 2014

Test build #23081 has finished for PR 3150 at commit f677c30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23081/

@SparkQA

SparkQA commented Nov 8, 2014

Test build #23096 has started for PR 3150 at commit 8999868.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 8, 2014

Test build #23096 has finished for PR 3150 at commit 8999868.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23096/

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@SparkQA

SparkQA commented Nov 15, 2014

Test build #23408 has started for PR 3150 at commit ba14003.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23408 has finished for PR 3150 at commit ba14003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23408/

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@SparkQA

SparkQA commented Nov 26, 2014

Test build #23899 has started for PR 3150 at commit e935939.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 26, 2014

Test build #23899 has finished for PR 3150 at commit e935939.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23899/

@marmbrus
Contributor

Sorry for the delay and thanks for working on this! Merging to master.

@marmbrus
Contributor

Actually, apache is down :( LGTM, will merge later.

@asfgit asfgit closed this in 3344803 Dec 12, 2014