[SPARK-25072][DOC] Update migration guide for behavior change#22369
xuanyuanking wants to merge 1 commit into apache:master
Conversation
Test build #95842 has finished for PR 22369 at commit
BryanCutler left a comment:
Thanks @xuanyuanking! I had some thoughts on slightly different wording.
## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above
- As of version 2.3.1 Arrow functionality, including `pandas_udf` and `toPandas()`/`createDataFrame()` with `spark.sql.execution.arrow.enabled` set to `True`, has been marked as experimental. These are still evolving and not currently recommended for use in production.
- In version 2.3.1 and earlier, it is possible for PySpark to create a Row object by providing more values than the number of columns through the customized Row class. Since Spark 2.3.3, Spark will verify that the number of values is less than or equal to the number of columns in PySpark. See [SPARK-25072](https://issues.apache.org/jira/browse/SPARK-25072) for details.
Maybe say: ...by providing more values than the number of fields through a customized Row class. As of Spark 2.3.3, PySpark will raise a ValueError if the number of values is more than the number of fields. See...
Thanks Bryan, I'll address this after discussion.
@xuanyuanking, no need to rush. Let's wait and discuss a bit more before proposing a change.
Got it, thanks @HyukjinKwon.
As noted in #22140 (comment), this doc change is no longer needed, so I'm closing this PR. Thanks @BryanCutler and @HyukjinKwon!
What changes were proposed in this pull request?
Update the migration guide for the behavior change in PySpark Row creation introduced in #22140.
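The behavior change under discussion can be sketched in plain Python. This is an illustrative mock of the new check only, not PySpark's actual `Row` implementation; the class structure and the error message below are simplified assumptions:

```python
# Illustrative sketch of the validation added in Spark 2.3.3 (SPARK-25072).
# Not PySpark's real Row class; a minimal stand-in for the customized-Row
# pattern, where Row("a", "b") builds a schema and calling it builds rows.

class Row(tuple):
    """Row("name", "age") creates a row factory with two fields."""

    def __new__(cls, *fields):
        return tuple.__new__(cls, fields)

    def __call__(self, *values):
        # In 2.3.1 and earlier, extra values were silently accepted,
        # producing a malformed Row. As of 2.3.3, this raises ValueError.
        if len(values) > len(self):
            raise ValueError(
                "Can not create Row with fields %s, expected %d values "
                "but got %s" % (tuple(self), len(self), values))
        return tuple(values)


Person = Row("name", "age")
Person("Alice", 30)            # fine: two values for two fields

try:
    Person("Alice", 30, "extra")   # three values for two fields
except ValueError as e:
    print("raised:", e)
```

The point of the change is to fail fast at row-creation time rather than let a mismatched row flow downstream into schema inference.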
How was this patch tested?
Existing UT.