[SPARK-21024][SQL] CSV parse mode handles Univocity parser exceptions #18250
maropu wants to merge 2 commits into apache:master from
Test build #77839 has finished for PR 18250 at commit
import scala.util.Try
import scala.util.control.NonFatal

import com.univocity.parsers.common.TextParsingException
It looks like this one can be removed.
  .option("maxColumns", "2")
  .option("mode", "PERMISSIVE")
  .load(path.getAbsolutePath)
checkAnswer(df, Row(0, 1) :: Row(null, null) :: Nil)
Should we maybe also check what is put in the malformed column?
Okay, I'll set the columnNameOfCorruptRecord column.
@maropu, just to make sure, do you mind if I ask you to test this with
Oh, sure. I'll add tests.
@HyukjinKwon I tried to fix this case even in
  .option("columnNameOfCorruptRecord", columnNameOfCorruptRecord)
  .option("wholeFile", wholeFile)
  .load(path.getAbsolutePath)
checkAnswer(df, Row(0, 1, null) :: Row(null, null, "0,1,2,") :: Nil)
@HyukjinKwon Weird behaviour: when we set maxColumns in a Univocity parser, it seems currentParsedContent returns (maxColumns + 1) elements of the input.
Test build #77956 has finished for PR 18250 at commit
I found that if you use the inferSchema option, it throws the same error.
Yeah, I think so. But, to fix the
@gatorsmile @HyukjinKwon ping
I am sorry @maropu. I can't think of a good way to handle this for now. I'll maybe come back after thinking about it more.
OK, thanks!
I think we can come back to this if we find a better way to handle it, so I'll close this for now (we'd better keep this discussion in JIRA).
What changes were proposed in this pull request?
This PR fixes the code so that Univocity parser exceptions are handled according to the CSV parse mode.
The current master cannot skip the illegal records that Univocity parsers:
How was this patch tested?
Added tests in CSVSuite.
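
For readers unfamiliar with CSV parse modes, the intent of this change can be sketched in plain Scala, without Spark or Univocity. Everything below (ToyParsingException, ParseMode, ParseModeSketch, tokenize, parseAll) is a hypothetical illustration and not code from this patch: a toy tokenizer throws when a line exceeds maxColumns, and the caller maps that exception onto PERMISSIVE, DROPMALFORMED, or FAILFAST behaviour instead of letting it escape unconditionally.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for a parser-level failure
// (this is NOT Univocity's TextParsingException).
final case class ToyParsingException(msg: String) extends RuntimeException(msg)

// Hypothetical modes named after Spark's PERMISSIVE, DROPMALFORMED and FAILFAST.
sealed trait ParseMode
case object Permissive extends ParseMode
case object DropMalformed extends ParseMode
case object FailFast extends ParseMode

object ParseModeSketch {
  // One output row: the parsed integer columns plus an optional corrupt record.
  type Result = (Seq[Option[Int]], Option[String])

  // Toy tokenizer: splits on commas and throws once a line has more than
  // maxColumns fields, loosely mimicking a parser that enforces a column limit.
  def tokenize(line: String, maxColumns: Int): Array[String] = {
    val tokens = line.split(",", -1)
    if (tokens.length > maxColumns) {
      throw ToyParsingException(s"more than $maxColumns columns in: $line")
    }
    tokens
  }

  // Parses every line, routing any non-fatal parser failure through the mode.
  def parseAll(lines: Seq[String], maxColumns: Int, mode: ParseMode): Seq[Result] = {
    lines.flatMap { line =>
      val result: Option[Result] =
        try {
          val values = tokenize(line, maxColumns).map(t => Option(t.trim.toInt)).toSeq
          Some((values, None))
        } catch {
          case NonFatal(e) =>
            mode match {
              // Keep the row: null out the columns and store the raw line.
              case Permissive =>
                Some((Seq.fill[Option[Int]](maxColumns)(None), Some(line)))
              // Silently skip the malformed line.
              case DropMalformed => None
              // Surface the failure to the caller.
              case FailFast =>
                throw new RuntimeException(s"Malformed line in FAILFAST mode: $line", e)
            }
        }
      result
    }
  }
}
```

Called with the lines "0,1" and "0,1,2," and maxColumns = 2 (values chosen to mirror the expected rows in the test discussed above), Permissive keeps both lines and stores the second one verbatim as the corrupt record, DropMalformed keeps only the first line, and FailFast rethrows.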