-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16126] [SQL] Better Error Message When using DataFrameReader without path
#13837
Conversation
path
path
Test build #61016 has finished for PR 13837 at commit
|
Test build #61044 has started for PR 13837 at commit |
Weird? How to stop this test case run? |
retest this please |
Test build #61074 has finished for PR 13837 at commit
|
@@ -40,7 +40,7 @@ private[sql] class ParquetOptions( | |||
if (!shortParquetCompressionCodecNames.contains(codecName)) { | |||
val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase) | |||
throw new IllegalArgumentException(s"Codec [$codecName] " + | |||
s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.") | |||
s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make it consistent with the output of the other cases. See the code:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CompressionCodecs.scala
Lines 49 to 51 in d6dc12e
case e: ClassNotFoundException => | |
throw new IllegalArgumentException(s"Codec [$codecName] " + | |
s"is not available. Known codecs are ${shortCompressionCodecNames.keys.mkString(", ")}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Available
was intentionally used because Parquet only supports snappy, gzip or lzo whereas Known
was used for text-based ones (Please see #10805 (comment)) as they support compression codecs including other codecs but that lists the known ones.
Test build #68556 has finished for PR 13837 at commit
|
Test build #68567 has finished for PR 13837 at commit
|
Test build #68579 has finished for PR 13837 at commit
|
@@ -2684,8 +2684,7 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume | |||
# It makes sure that we can omit path argument in read.df API and then it calls | |||
# DataFrameWriter.load() without path. | |||
expect_error(read.df(source = "json"), | |||
paste("Error in loadDF : analysis error - Unable to infer schema for JSON at .", | |||
"It must be specified manually")) | |||
paste("Error in loadDF : illegal argument - 'path' is not specified")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall this test is intentionally testing without path argument?
cc @HyukjinKwon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cc'ing me. Yes, I did. It seems the changes are reasonable as it seems this checking applies to the data sources that need path
.
@@ -322,6 +323,9 @@ case class DataSource( | |||
val equality = sparkSession.sessionState.conf.resolver | |||
StructType(schema.filterNot(f => partitionColumns.exists(equality(_, f.name)))) | |||
}.orElse { | |||
if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gatorsmile, would this be better if we explain here text data source is excluded because text datasource always uses a schema consisting of a string field if the schema is not explicitly given?
BTW, should we maybe change text.TextFileFormat
to TextFileFormat
https://github.com/gatorsmile/spark/blob/45110370fb1889f244a6750ef2a18dbc9f1ba9c2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L139 ?
hi - where are we on this one? |
(gentle ping) |
What changes were proposed in this pull request?
When users do not specify the path in
DataFrameReader
APIs, it can get a confusing error message. For example,Error message:
After the fix, the error message will be like:
Another major goal of this PR is to add test cases for the latest changes in #13727.
prevent all column partitioning
How was this patch tested?
Test cases are added.