[SPARK-27971][SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch by HyukjinKwon · Pull Request #24818 · apache/spark

HyukjinKwon · 2019-06-07T01:59:29Z

What changes were proposed in this pull request?

This PR applies the same fix as #24816, but to the vectorized dapply code path in SparkR.

How was this patch tested?

Manually tested.


SparkQA · 2019-06-07T04:59:21Z

Test build #106264 has finished for PR 24818 at commit d848354.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-06-08T09:06:59Z

cc @BryanCutler and @mengxr

viirya · 2019-06-08T16:22:14Z

sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala

+        val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
+        assert(outputTypes == actualDataTypes, "Invalid schema from dapply(): " +
+          s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
+        batch.rowIterator.asScala


My guess is that the original code wanted to check the data types on the first batch only. I think this fix is right.
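To illustrate the difference discussed here, below is a minimal sketch in plain Scala, not the actual Spark code. `Batch`, `source`, `eagerRows`, and `lazyRows` are hypothetical stand-ins for the columnar batch iterator in `objects.scala`, and `batchesRead` exists only to make the laziness observable. The eager variant pulls the first batch while the result iterator is being built; the lazy variant, like the diff above, validates the schema inside `flatMap`, so nothing is read until a consumer asks for rows.

```scala
// Hypothetical stand-in for a columnar batch: column type names plus rows.
final case class Batch(columnTypes: Seq[String], rows: Seq[Seq[Any]])

object LazySchemaCheck {
  // Counts how many batches were pulled from the source (illustration only).
  var batchesRead = 0

  // Wraps the batches in a lazy iterator that records each read.
  def source(batches: Seq[Batch]): Iterator[Batch] =
    batches.iterator.map { b => batchesRead += 1; b }

  private def check(expected: Seq[String], actual: Seq[String]): Unit = {
    val exp = expected.mkString(", ")
    val got = actual.mkString(", ")
    assert(expected == actual, s"Invalid schema: expected $exp, got $got")
  }

  // Eager variant: reads the first batch before any row is requested.
  def eagerRows(expected: Seq[String], batches: Iterator[Batch]): Iterator[Seq[Any]] =
    if (!batches.hasNext) Iterator.empty
    else {
      val first = batches.next()
      check(expected, first.columnTypes)
      first.rows.iterator ++ batches.flatMap(_.rows.iterator)
    }

  // Lazy variant: the check runs per batch, only when rows are consumed.
  def lazyRows(expected: Seq[String], batches: Iterator[Batch]): Iterator[Seq[Any]] =
    batches.flatMap { batch =>
      check(expected, batch.columnTypes)
      batch.rows.iterator
    }
}
```

In this sketch, building the iterator with `lazyRows` leaves `batchesRead` unchanged, while `eagerRows` increments it immediately. The trade-off is that the lazy variant repeats the schema check for every batch rather than only the first one.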

dongjoon-hyun · 2019-06-08T18:17:47Z

Retest this please

SparkQA · 2019-06-08T21:14:55Z

Test build #106309 has finished for PR 24818 at commit d848354.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-06-09T00:06:51Z

Hi, @HyukjinKwon.
#24816 said "Issue fixed in #24734, but that PR might take longer to merge", and #24734 was merged yesterday. Do we still need this?

HyukjinKwon · 2019-06-09T02:13:00Z

Yea, this is a different code path for R.

HyukjinKwon · 2019-06-09T02:38:56Z

Let me merge this one. The R vectorization and Python vectorization code paths are being kept in sync so that the two code paths can possibly be deduplicated later.

felixcheung · 2019-06-09T03:27:17Z

Just curious, is it possible to share more infra between Python and R so we aren't constantly trying to match up with Python afterwards?

HyukjinKwon · 2019-06-09T04:40:57Z

Yea .. I was thinking about that. I just wanted to avoid .. like .. ugly deduplication ... let me probably try it in the near future with some neat deduplication. Let me cc you guys there.

…t eagerly read the first batch

## What changes were proposed in this pull request?

This PR is the same fix as apache#24816 but in vectorized `dapply` in SparkR.

## How was this patch tested?

Manually tested.

Closes apache#24818 from HyukjinKwon/SPARK-27971.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch

d848354

viirya reviewed Jun 8, 2019


HyukjinKwon closed this in 6dcf09b Jun 9, 2019

dongjoon-hyun added the SQL label Feb 5, 2020

HyukjinKwon deleted the SPARK-27971 branch March 3, 2020 01:19

HyukjinKwon wants to merge 1 commit into apache:master from HyukjinKwon:SPARK-27971
