[SPARK-15143][SPARK-15144][SQL] Add CSV tests with HadoopFsRelationTest and support for nullValue for other types #12921
@@ -447,7 +446,7 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {

     verifyCars(cars, withHeader = true, checkValues = false)
     val results = cars.collect()
-    assert(results(0).toSeq === Array(2012, "Tesla", "S", "null", "null"))
+    assert(results(0).toSeq === Array(2012, "Tesla", "S", null, null))
This is tested against the data below:
year,make,model,comment,blank
"2012","Tesla","S",null,
1997,Ford,E350,"Go get one now they are going fast",
null,Chevy,Volt
Since the header is year, make, model, comment, blank, the first row should produce the values 2012, Tesla, S, null, null because nullValue is set to "null".
Test build #57845 has finished for PR 12921 at commit
Test build #57846 has finished for PR 12921 at commit
Test build #57847 has finished for PR 12921 at commit
DateTimeUtils.millisToDays(DateTimeUtils.stringToTime(datum).getTime)
case _: StringType => UTF8String.fromString(datum)
case _ => throw new RuntimeException(s"Unsupported type: ${castType.typeName}")
if (datum == null || (datum == options.nullValue && nullable)) {
The logic below was simply added, mirroring inferField():
if (datum == null || (datum == options.nullValue && nullable)) {
  null
} else {
  ...
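The guard quoted above can be illustrated with a minimal, self-contained sketch. The type names (IntSketch, BooleanSketch, StringSketch) and the object NullValueSketch are hypothetical stand-ins introduced only for this illustration; they are not Spark's DataType classes, and this is not the actual CSVTypeCast.castTo implementation.

```scala
// Hypothetical, simplified stand-ins for Spark's data types (assumption, not the real classes).
sealed trait SketchType
case object IntSketch extends SketchType
case object BooleanSketch extends SketchType
case object StringSketch extends SketchType

object NullValueSketch {
  // Mirrors the guard from the diff: a datum becomes null when it is null itself,
  // or when it equals nullValue and the field is nullable. The check runs before
  // any type-specific conversion, so it applies to every type uniformly.
  def castTo(datum: String, castType: SketchType, nullable: Boolean, nullValue: String): Any =
    if (datum == null || (datum == nullValue && nullable)) {
      null
    } else {
      castType match {
        case IntSketch     => datum.toInt
        case BooleanSketch => datum.toBoolean
        case StringSketch  => datum
      }
    }
}
```

With nullValue set to "null", a nullable string field containing the token "null" comes back as null, while a non-nullable one keeps the literal string.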
Test build #57856 has finished for PR 12921 at commit
     UTF8String.fromString(""))
   assert(
-    CSVTypeCast.castTo("", StringType, nullable = false, CSVOptions()) ==
+    CSVTypeCast.castTo("", StringType, nullable = false, CSVOptions("nullValue", null)) ==
@falaki I just noticed that this test implies nullValue does not apply to StringType. Is StringType intentionally excluded? I thought nullValue should be applied to all types equally.
Otherwise, nulls for StringType will be lost in the round trip of reading and writing.
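The round-trip concern can be shown with a small sketch. It assumes, purely for illustration, that the writer renders a null field as the nullValue token; RoundTripSketch and its methods are hypothetical names, not Spark APIs.

```scala
object RoundTripSketch {
  val nullValue = "null"

  // Assumed writer behaviour: a null field is written as the nullValue token.
  def write(field: String): String = if (field == null) nullValue else field

  // Reader that applies nullValue to strings: the token is turned back into null.
  def readApplyingNullValue(token: String): String =
    if (token == nullValue) null else token

  // Reader that ignores nullValue for strings: the token stays a literal string,
  // so the original null cannot be recovered.
  def readIgnoringNullValue(token: String): String = token
}
```

Under these assumptions, writing a null and reading it back only restores the null when the reader applies nullValue to string fields; otherwise the value comes back as the four-character string "null".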
Test build #57871 has finished for PR 12921 at commit
Test build #58045 has finished for PR 12921 at commit
Test build #58319 has finished for PR 12921 at commit
Hi @cloud-fan, could you please take a look?
Closing this since another PR has (I think) a better change. I may submit another PR to add some tests in the future.
What changes were proposed in this pull request?
Currently, the nullValue option does not work for some types: BooleanType, TimestampType, DateType and StringType. So there is currently no way to read null for those types. This PR adds that support, just as for the other types.
Also, the CSV data source is not being tested with HadoopFsRelationTest as a HadoopFsRelation. HadoopFsRelationTest includes 50 more tests (e.g. partitioned table tests).
This PR adds two variables, extraReadOptions and extraWriteOptions, in HadoopFsRelationTest so that the child class can supply options for reading and writing. To get the tests in HadoopFsRelationTest to pass, the CSV data source needs to set the options header and inferSchema to true for reading, and header to true for writing.
How was this patch tested?
Unit tests in CSVHadoopFsRelationTest and CSVTypeCastSuite, and edited tests in CSVSuite.
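The extraReadOptions and extraWriteOptions hook from the description could look roughly like the sketch below. RelationTestSketch and CsvRelationTestSketch are hypothetical stand-ins for HadoopFsRelationTest and CSVHadoopFsRelationTest; the member types and defaults are assumptions, since the PR text only names the two variables and the option values.

```scala
// Hypothetical stand-in for the shared base test class.
abstract class RelationTestSketch {
  val dataSourceName: String

  // Assumed default: no extra options, so existing child classes are unaffected.
  val extraReadOptions: Map[String, String] = Map.empty
  val extraWriteOptions: Map[String, String] = Map.empty
}

// Hypothetical stand-in for the CSV child class, using the options named in the description.
class CsvRelationTestSketch extends RelationTestSketch {
  override val dataSourceName = "csv"

  // Reading needs both the header and schema inference.
  override val extraReadOptions = Map("header" -> "true", "inferSchema" -> "true")

  // Writing needs the header so that the read side can find the column names.
  override val extraWriteOptions = Map("header" -> "true")
}
```

Keeping the defaults empty in the base class means only data sources that need special options, such as CSV here, have to override anything.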