[SPARK-15585][SQL] Fix NULL handling along with a spark-csv behaviour #13372
maropu wants to merge 3 commits into apache:master
|
@rxin Could you check whether this satisfies your suggestion? If there is no problem, I'll also set default values for the JSON options in a similar way. |
|
Test build #59551 has finished for PR 13372 at commit
|
|
Whether it should be done here or not, shouldn't we maybe define explicit behaviours when the options are set to |
|
Test build #59559 has finished for PR 13372 at commit
|
|
yea, it is a difficult question. We must define an explicit behaviour for Anyway, yes, we need to document this change. After we get consensus, I'll add this doc. |
|
I think at least we need to handle the other options as well with a clearer message rather than |
|
This is a meaningful suggestion. I think all option validation should be done in a single place and, if there is an invalid option, Spark should throw a clear message so that users can easily understand what went wrong. Anyway, I'm not sure these fixes should be included in this PR; follow-up PRs seem okay to me. |
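To make the "single place" idea concrete, here is a minimal plain-Python sketch. It is not Spark code: the function name `validate_csv_options`, the set of known keys, and the single-character rule are all assumptions chosen for illustration. It only shows the shape of the suggestion, which is to check every option in one function and report all problems in one readable error.

```python
def validate_csv_options(options):
    """Check all options in one place and raise a single descriptive error.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the checks Spark actually performs.
    """
    known = {"sep", "quote", "escape", "header"}   # assumed option names
    single_char = {"sep", "quote", "escape"}       # assumed one-character options
    problems = []
    for key, value in options.items():
        if key not in known:
            problems.append(f"unknown option '{key}'")
        elif key in single_char and value is not None:
            # None is passed through untouched here; what it should mean
            # is exactly the open question in this thread.
            if not (isinstance(value, str) and len(value) == 1):
                problems.append(
                    f"option '{key}' must be a single character, got {value!r}"
                )
    if problems:
        raise ValueError("Invalid CSV options: " + "; ".join(problems))
    return options


try:
    validate_csv_options({"sep": "||", "qoute": "'"})
except ValueError as err:
    print(err)
```

Collecting the problems before raising means a user with two bad options sees both at once instead of fixing them one failure at a time.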
|
Thanks - I'm going to merge this into master/2.0 and make some fixes myself. |
## What changes were proposed in this pull request?
This PR fixes the behaviour of `format("csv").option("quote", null)` to match that of spark-csv.
Also, it explicitly sets default values for CSV options in python.
## How was this patch tested?
Added tests in CSVSuite.
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #13372 from maropu/SPARK-15585.
(cherry picked from commit b7e8d1c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
Actually sorry -- I thought about this more and unfortunately we can't do it this way. The main problem is that we would break an existing option use case, e.g. the following code: `df.option("sep", "|").csv("...")`. In this case, the default sep would still be chosen. I'm going to revert this patch, and then think about a workaround instead. |
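The failure mode described above can be reproduced without Spark. The sketch below uses a hypothetical `SketchReader` class, not the real PySpark reader, and its two `csv_*` methods are invented for illustration. It assumes only that the reader stores options in a dictionary and that keyword arguments to `csv` are written into that same dictionary.

```python
class SketchReader:
    """Hypothetical stand-in for a reader-style builder (not the PySpark class)."""

    def __init__(self):
        self.options = {}

    def option(self, key, value):
        self.options[key] = value
        return self

    def csv_with_explicit_defaults(self, path, sep=","):
        # The keyword default is always written, so it silently
        # overwrites an earlier .option("sep", ...) call.
        self.options["sep"] = sep
        return dict(self.options, path=path)

    def csv_with_none_defaults(self, path, sep=None):
        # Only write the option when the caller actually passed a value.
        if sep is not None:
            self.options["sep"] = sep
        return dict(self.options, path=path)


broken = SketchReader().option("sep", "|").csv_with_explicit_defaults("data.csv")
kept = SketchReader().option("sep", "|").csv_with_none_defaults("data.csv")
print(broken["sep"])  # ","
print(kept["sep"])    # "|"
```

With explicit defaults in the signature, the method cannot tell "the caller passed the default" apart from "the caller passed nothing", which is why the earlier `option("sep", "|")` is lost.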
|
I think the best way is probably to document and ask users to use |
|
okay though, I'm getting on the flight to SF. So, I'll check in a day, thanks! |