[SPARK-27971][SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch #24818
HyukjinKwon wants to merge 1 commit into apache:master
Test build #106264 has finished for PR 24818 at commit
cc @BryanCutler and @mengxr
val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
assert(outputTypes == actualDataTypes, "Invalid schema from dapply(): " +
  s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
batch.rowIterator.asScala
My guess is that the original code wanted to check the data types on the first batch only. I think this fix is right.
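To make the point in this thread concrete, here is a minimal sketch of the difference between checking the schema up front and checking it inside the iterator. It is not the Spark code: `Batch`, `LazySchemaCheck`, `eager`, and `lazily` are hypothetical names, and the "eager" variant is only one plausible way an up-front check can force the first batch to be read.

```scala
// Hypothetical stand-in for a columnar batch: column type names plus rows.
final case class Batch(types: Seq[String], rows: Seq[Seq[Any]])

object LazySchemaCheck {
  private def check(expected: Seq[String], batch: Batch): Unit =
    assert(expected == batch.types, "Invalid schema: " +
      s"expected ${expected.mkString(", ")}, got ${batch.types.mkString(", ")}")

  // Eager variant (assumed shape of the problem): peeking at the head to
  // validate the schema forces the upstream iterator to produce the first
  // batch before the caller has consumed anything.
  def eager(batches: Iterator[Batch], expected: Seq[String]): Iterator[Seq[Any]] = {
    val buffered = batches.buffered
    if (buffered.hasNext) check(expected, buffered.head)
    buffered.flatMap(_.rows.iterator)
  }

  // Lazy variant: the check runs inside flatMap, so no batch is read until
  // the resulting iterator is consumed. Every batch is validated, not only
  // the first one.
  def lazily(batches: Iterator[Batch], expected: Seq[String]): Iterator[Seq[Any]] =
    batches.flatMap { batch =>
      check(expected, batch)
      batch.rows.iterator
    }
}
```

With a source iterator that counts how many batches it has produced, building the result with `eager` already produces one batch, while `lazily` produces none until the result is iterated.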
Retest this please
Test build #106309 has finished for PR 24818 at commit
Hi, @HyukjinKwon.
Yea, here's a different code path for R.
Let me merge this one. The R and Python vectorization code paths are being matched so that the two can possibly be deduplicated later.
Just curious, is it possible to share more infra between Python and R so we aren't constantly trying to match up with Python afterwards?
Yea, I was thinking about that. I just wanted to avoid ugly deduplication. Let me try it in the near future with some neat deduplication; I'll cc you guys there.
…t eagerly read the first batch

## What changes were proposed in this pull request?

This PR is the same fix as apache#24816 but in vectorized `dapply` in SparkR.

## How was this patch tested?

Manually tested.

Closes apache#24818 from HyukjinKwon/SPARK-27971.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR is the same fix as #24816 but in vectorized `dapply` in SparkR.
How was this patch tested?
Manually tested.