[SPARK-29101][SQL][2.4] Fix count API for csv file when DROPMALFORMED mode is selected by sandeep-katta · Pull Request #25843 · apache/spark

sandeep-katta · 2019-09-19T03:18:01Z

What changes were proposed in this pull request?

Dataset (fruit.csv):
fruit,color,price,quantity
apple,red,1,3
banana,yellow,2,4
orange,orange,3,5
xxx

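This PR fixes an incorrect result from count on a CSV file read in DROPMALFORMED mode when CSV column pruning is disabled. With the dataset above, the malformed row xxx should be dropped, but count returns 4 instead of 3: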

scala> spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)
scala> spark.read.option("header", "true").option("mode", "DROPMALFORMED").csv("fruit.csv").count
res1: Long = 4

This is caused by the change made for SPARK-24645.
The issue that SPARK-24645 addressed is also solved by SPARK-25387.
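
To make the expected result concrete, here is a minimal plain-Scala sketch (no Spark required) of the count that DROPMALFORMED should produce for this dataset. It assumes a record counts as malformed when its number of fields differs from the header's, which is a simplification and not Spark's actual parser logic. The names `DropMalformedSketch`, `isWellFormed`, and `expectedCount` are hypothetical and exist only for this illustration.

```scala
object DropMalformedSketch {
  // Contents of fruit.csv from the description, one string per line.
  val lines: Seq[String] = Seq(
    "fruit,color,price,quantity",
    "apple,red,1,3",
    "banana,yellow,2,4",
    "orange,orange,3,5",
    "xxx"
  )

  // The first line is the header (option "header" = "true").
  val header: Array[String] = lines.head.split(",", -1)

  // Assumption: a record is well formed only if it has as many fields as the header.
  def isWellFormed(line: String): Boolean =
    line.split(",", -1).length == header.length

  // Count the data rows (header excluded) that survive the DROPMALFORMED-style filter.
  val expectedCount: Long = lines.tail.count(isWellFormed).toLong

  def main(args: Array[String]): Unit =
    println(s"expected count = $expectedCount") // the row "xxx" is dropped
}
```

Under that assumption the three fruit rows are kept and `xxx` is dropped, so the sketch yields 3, whereas the session above reports 4.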

Why are the changes needed?

SPARK-24645 caused this regression, so its code change is reverted here; the issue it addressed is also solved by SPARK-25387.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a unit test, and also re-tested the scenario from SPARK-24645.

[Image: SPARK-24645 regression]

dongjoon-hyun · 2019-09-19T03:21:15Z

ok to test

dongjoon-hyun · 2019-09-19T03:21:30Z

Thank you for backporting, @sandeep-katta .

dongjoon-hyun · 2019-09-19T03:21:55Z

cc @HyukjinKwon

SparkQA · 2019-09-19T03:48:19Z

Test build #110964 has finished for PR 25843 at commit c8d8ff5.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-19T03:54:16Z

retest this please

SparkQA · 2019-09-19T04:25:02Z

Test build #110966 has finished for PR 25843 at commit c8d8ff5.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-09-19T04:51:47Z

Retest this please.

SparkQA · 2019-09-19T07:05:02Z

Test build #110969 has finished for PR 25843 at commit c8d8ff5.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-19T07:34:31Z

retest this please

SparkQA · 2019-09-19T11:59:29Z

Test build #110980 has finished for PR 25843 at commit c8d8ff5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Looks OK. It's worth noting this is a backport of #25820

dongjoon-hyun

+1, LGTM. Merged to branch-2.4
Thank you, @sandeep-katta , @HyukjinKwon , @srowen !

… mode is selected

Closes #25843 from sandeep-katta/SPARK-29101_branch2.4.

Authored-by: sandeep katta <sandeep.katta2007@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

fix malformed record issue

c8d8ff5

dongjoon-hyun changed the title ~~[SPARK-29101][SQL] [Backport]Fix count API for csv file when DROPMALFORMED mode is selected~~ [SPARK-29101][SQL][2.4] Fix count API for csv file when DROPMALFORMED mode is selected Sep 19, 2019

dongjoon-hyun added the SQL label Sep 19, 2019

srowen reviewed Sep 19, 2019


dongjoon-hyun approved these changes Sep 19, 2019


dongjoon-hyun closed this Sep 19, 2019

sandeep-katta wants to merge 1 commit into apache:branch-2.4 from sandeep-katta:SPARK-29101_branch2.4
