[SPARK-18950][SQL] Report conflicting fields when merging two StructTypes #16365

Closed
wants to merge 4 commits into from

@bravo-zhang
Contributor

commented Dec 21, 2016

What changes were proposed in this pull request?

Currently, StructType.merge() only reports data types of conflicting fields when merging two incompatible schemas. It would be nice to also report the field names for easier debugging.
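
To illustrate the idea, here is a minimal sketch in plain Scala. It does not use Spark: `DataType`, `StructField`, `StructType`, and `MergeSketch` are simplified stand-ins written for this example, and `RuntimeException` stands in for `SparkException`. The recursive merge of each field is wrapped in `Try`, so a type conflict can be re-thrown with the field name prepended, following the wording used in the diff hunks of this conversation.

```scala
import scala.util.{Failure, Success, Try}

// Simplified stand-ins for Spark's type hierarchy (hypothetical, not Spark's classes).
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
final case class StructField(name: String, dataType: DataType, nullable: Boolean = true)
final case class StructType(fields: Seq[StructField]) extends DataType

object MergeSketch {
  // Merge two types: structs merge field by field, identical leaf types merge
  // to themselves, and any other combination is reported as a conflict.
  def merge(left: DataType, right: DataType): DataType = (left, right) match {
    case (StructType(leftFields), StructType(rightFields)) =>
      val rightByName = rightFields.map(f => f.name -> f).toMap
      val merged = leftFields.map { l =>
        rightByName.get(l.name) match {
          case Some(r) =>
            // Wrap the recursive merge so a failure can carry the field name.
            Try(merge(l.dataType, r.dataType)) match {
              case Success(dt) =>
                l.copy(dataType = dt, nullable = l.nullable || r.nullable)
              case Failure(e) =>
                throw new RuntimeException(
                  s"Failed to merge field '${l.name}': " + e.getMessage)
            }
          case None => l // field exists only on the left side
        }
      }
      // Append fields that exist only on the right side.
      val leftNames = leftFields.map(_.name).toSet
      StructType(merged ++ rightFields.filterNot(f => leftNames.contains(f.name)))
    case (l, r) if l == r => l
    case (l, r) =>
      throw new RuntimeException(s"Failed to merge incompatible data types $l and $r")
  }
}
```

In this sketch, merging two structs whose `longcol` fields are `IntegerType` and `LongType` fails with a message that names `longcol` before the type conflict, instead of reporting only the two types.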

How was this patch tested?

Unit test in DataTypeSuite.
Print exception message when conflict is triggered.

@HyukjinKwon

Member

commented Dec 21, 2016

This is actually the message users face in some cases, isn't it? :)

val df1 = spark.range(10).selectExpr("id as intcol", "cast(id as int) as longcol")
df1.write.parquet("/tmp/a")
val df2 = spark.range(10).selectExpr("id as intcol", "id as longcol")
df2.write.parquet("/tmp/b")
spark.read.option("mergeSchema", true).parquet("/tmp/a", "/tmp/b").show()

Before

...
Caused by: org.apache.spark.SparkException: Failed to merge incompatible data types IntegerType and LongType
  at org.apache.spark.sql.types.StructType$.merge(StructType.scala:515)
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:473)
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:471)
...

After

...
Caused by: org.apache.spark.SparkException: Failed to merge field longcol: Failed to merge incompatible data types IntegerType and LongType
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:480)
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:471)
  at scala.Option.map(Option.scala:146)
...

BTW, it looks like the test in DataTypeSuite was mistakenly left out of the commit.

@bravo-zhang

Contributor Author

commented Dec 21, 2016

Thanks for the review @HyukjinKwon !
Are your Before and After stack traces swapped?
Do you mean the message is confusing? How about correcting it to Failed to merge field longcol: incompatible data types IntegerType and LongType?
For the missing test case, do you mean a merge case that is not a StructType, as in your example?
Or is it something else that I'm missing?

@HyukjinKwon

Member

commented Dec 21, 2016

Doh, yep, I just swapped them back. I simply meant that I support this PR because it improves the user experience with a better message :).

dataType = dataType,
nullable = leftNullable || rightNullable)
case Failure(e) =>
throw new SparkException(s"Failed to merge field $leftName: " + e.getMessage)

@gatorsmile

gatorsmile Jun 13, 2017

Member

$leftName -> '$leftName'

@gatorsmile

Member

commented Jun 13, 2017

ok to test

@gatorsmile

Member

commented Jun 13, 2017

@bravo-zhang Sorry for the late response. Could you please also add a test case for capturing the new error message?

@SparkQA


commented Jun 13, 2017

Test build #77970 has started for PR 16365 at commit 9bc9f8a.

@gatorsmile

Member

commented Jun 13, 2017

retest this please

@SparkQA


commented Jun 13, 2017

Test build #77999 has finished for PR 16365 at commit 9bc9f8a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon

Member

commented Jul 24, 2017

ping @bravo-zhang for adding the test.

@SparkQA


commented Jul 25, 2017

Test build #79921 has finished for PR 16365 at commit 54a898f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@bravo-zhang

Contributor Author

commented Jul 25, 2017

@gatorsmile

Member

commented Jul 31, 2017

retest this please

leftField.copy(
dataType = merge(leftType, rightType),
nullable = leftNullable || rightNullable)
Try {

@gatorsmile
dataType = dataType,
nullable = leftNullable || rightNullable)
case Failure(e) =>
throw new SparkException(s"Failed to merge field '$leftName': " + e.getMessage)

@gatorsmile

gatorsmile Jul 31, 2017

Member

Could we throw an AnalysisException with both sides, left and right? Thanks!

@bravo-zhang

bravo-zhang Jul 31, 2017

Author Contributor

Other exceptions in this class are also SparkExceptions, for example the one thrown for precision conflicts. Should we keep it as SparkException?
For "with both sides, left and right", do you mean just modifying the message a bit to include both the left and right names (though they are the same)?
@gatorsmile your other comments are resolved.

left.merge(right)
}
}.getMessage
assert(message.contains("conflictColumn"))

@gatorsmile

gatorsmile Jul 31, 2017

Member

Could we capture the whole message? It can help us review the error message.
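
As a rough sketch of the difference between the two styles of assertion, the plain-Scala example below compares a `contains` check with an exact match on the full message. The names `MessageAssertionSketch`, `failingMerge`, and `messageOf` are hypothetical helpers for this example, `RuntimeException` stands in for `SparkException`, and the message text is modeled on the wording in the diff hunk above rather than taken from the final patch.

```scala
import scala.util.Try

object MessageAssertionSketch {
  // Hypothetical stand-in for a merge that fails on a conflicting field.
  def failingMerge(): Unit =
    throw new RuntimeException(
      "Failed to merge field 'conflictColumn': " +
        "Failed to merge incompatible data types IntegerType and LongType")

  // Run a block that is expected to throw and return the exception message.
  def messageOf(body: => Unit): String =
    Try(body).failed.get.getMessage

  def main(args: Array[String]): Unit = {
    val message = messageOf(failingMerge())
    // Loose check: passes for any message that mentions the column name.
    assert(message.contains("conflictColumn"))
    // Strict check: pins the exact wording, so the full message is visible in review.
    assert(message ==
      "Failed to merge field 'conflictColumn': " +
        "Failed to merge incompatible data types IntegerType and LongType")
  }
}
```

The strict form makes any later change to the wording show up as a test failure, at the cost of having to update the test whenever the message is deliberately reworded.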

@SparkQA


commented Jul 31, 2017

Test build #80064 has finished for PR 16365 at commit 54a898f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


commented Jul 31, 2017

Test build #80088 has finished for PR 16365 at commit f500a11.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA


commented Jul 31, 2017

Test build #80090 has finished for PR 16365 at commit f4892c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@gatorsmile

Member

commented Aug 1, 2017

LGTM

Thanks! Merging to master.

@asfgit asfgit closed this in 6b186c9 Aug 1, 2017

@bravo-zhang bravo-zhang deleted the bravo-zhang:spark-18950 branch Aug 2, 2017
