[SPARK-25072][DOC] Update migration guide for behavior change#22369
xuanyuanking wants to merge 1 commit into apache:master
Conversation
Test build #95842 has finished for PR 22369 at commit
BryanCutler left a comment:
Thanks @xuanyuanking! I had some thoughts on slightly different wording.
## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above
- As of version 2.3.1 Arrow functionality, including `pandas_udf` and `toPandas()`/`createDataFrame()` with `spark.sql.execution.arrow.enabled` set to `True`, has been marked as experimental. These are still evolving and not currently recommended for use in production.
- In version 2.3.1 and earlier, it is possible for PySpark to create a Row object by providing more values than the number of columns through the customized Row class. Since Spark 2.3.3, Spark will verify that the number of values is less than or equal to the number of columns in PySpark. See [SPARK-25072](https://issues.apache.org/jira/browse/SPARK-25072) for details.
Maybe say: ...by providing more values than the number of fields through a customized Row class. As of Spark 2.3.3, PySpark will raise a ValueError if the number of values is more than the number of fields. See...
Thanks Bryan, I'll address this after discussion.
@xuanyuanking, no need to rush. Let's wait and discuss a bit more before proposing a change.
Got it, thanks @HyukjinKwon.
As noted in #22140 (comment), this doc change is no longer needed, so I'm closing this PR. Thanks @BryanCutler and @HyukjinKwon!
What changes were proposed in this pull request?
Update the migration guide for the behavior change in PySpark Row creation introduced in #22140.
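The behavior change under discussion can be sketched in plain Python. This is an illustrative mock of the new check only, not PySpark's actual `Row` implementation; the class structure and the error message below are simplified assumptions:

```python
# Illustrative sketch of the validation added in Spark 2.3.3 (SPARK-25072).
# Not PySpark's real Row class; a minimal stand-in for the customized-Row
# pattern, where Row("a", "b") builds a schema and calling it builds rows.

class Row(tuple):
    """Row("name", "age") creates a row factory with two fields."""

    def __new__(cls, *fields):
        return tuple.__new__(cls, fields)

    def __call__(self, *values):
        # In 2.3.1 and earlier, extra values were silently accepted,
        # producing a malformed Row. As of 2.3.3, this raises ValueError.
        if len(values) > len(self):
            raise ValueError(
                "Can not create Row with fields %s, expected %d values "
                "but got %s" % (tuple(self), len(self), values))
        return tuple(values)


Person = Row("name", "age")
Person("Alice", 30)            # fine: two values for two fields

try:
    Person("Alice", 30, "extra")   # three values for two fields
except ValueError as e:
    print("raised:", e)
```

The point of the change is to fail fast at row-creation time rather than let a mismatched row flow downstream into schema inference.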
How was this patch tested?
Existing UT.