[SPARK-31853][DOCS] Mention removal of params mixins setter in migration guide

### What changes were proposed in this pull request?
The PySpark Migration Guide needs to mention a breaking change to the PySpark ML API.

### Why are the changes needed?
In SPARK-29093, all setters were removed from the `Params` mixins in `pyspark.ml.param.shared`. Those setters had been part of the public PySpark ML API, hence this is a breaking change.

### Does this PR introduce _any_ user-facing change?
Only documentation.

### How was this patch tested?
Visually.

Closes #28663 from EnricoMi/branch-pyspark-migration-guide-setters.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 4bbe3c2)
Signed-off-by: Sean Owen <srowen@gmail.com>
EnricoMi authored and srowen committed Jun 3, 2020
1 parent 22d5d03 commit 3bb0824
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/pyspark-migration-guide.md
@@ -45,6 +45,8 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.

- As of Spark 3.0, `Row` field names are no longer sorted alphabetically when constructing with named arguments for Python versions 3.6 and above; the order of fields will match the order in which they were entered. To enable sorted fields by default, as in Spark 2.4, set the environment variable `PYSPARK_ROW_FIELD_SORTING_ENABLED` to `true` for both executors and driver. This environment variable must be consistent on all executors and the driver; otherwise, it may cause failures or incorrect answers. For Python versions less than 3.6, the field names will be sorted alphabetically as the only option.
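  A minimal sketch of the new behavior, assuming Python 3.6+ (names and values here are illustrative):

  ```python
  from pyspark.sql import Row

  # Spark 3.0 on Python 3.6+: fields keep the order in which they were entered.
  r = Row(name="Alice", age=11)
  print(r)  # Row(name='Alice', age=11)
  # Spark 2.4 sorted field names alphabetically: Row(age=11, name='Alice')
  ```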

- In Spark 3.0, `pyspark.ml.param.shared.Has*` mixins no longer provide any `set*(self, value)` setter methods; use the respective `self.set(self.*, value)` instead. See [SPARK-29093](https://issues.apache.org/jira/browse/SPARK-29093) for details.
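  A minimal sketch of the migration, assuming a custom transformer that mixes in `HasInputCol` (the class below is illustrative, not from the commit):

  ```python
  from pyspark.ml import Transformer
  from pyspark.ml.param.shared import HasInputCol

  class MyTransformer(Transformer, HasInputCol):
      # Spark 2.4: the HasInputCol mixin itself supplied setInputCol().
      # Spark 3.0: the mixin supplies only the Param and its getter,
      # so define the setter via the generic Params.set():
      def setInputCol(self, value):
          return self.set(self.inputCol, value)

      def _transform(self, dataset):
          return dataset  # no-op, for illustration only

  t = MyTransformer().setInputCol("features")
  print(t.getInputCol())  # 'features' -- getters are still provided by the mixins
  ```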

## Upgrading from PySpark 2.3 to 2.4

- In PySpark, when Arrow optimization is enabled, `toPandas` previously just failed when Arrow optimization could not be used, whereas `createDataFrame` from a Pandas DataFrame allowed falling back to the non-optimized implementation. Now, both `toPandas` and `createDataFrame` from a Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
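  As a sketch, assuming an existing `SparkSession` named `spark` and a DataFrame `df`:

  ```python
  # Enable Arrow and switch off the 2.4 fallback to restore the
  # fail-fast behavior when Arrow optimization cannot be used:
  spark.conf.set("spark.sql.execution.arrow.enabled", "true")
  spark.conf.set("spark.sql.execution.arrow.fallback.enabled", "false")
  pdf = df.toPandas()  # raises instead of silently falling back
  ```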
