From 4fedff1695d8ec6d524420e07779e2c808d652aa Mon Sep 17 00:00:00 2001
From: Yuming Wang
Date: Fri, 1 Dec 2017 17:01:23 +0800
Subject: [PATCH 1/3] migration

---
 docs/sql-programming-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index a1b9c3bbfd059..4528b2e1668f6 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1776,6 +1776,8 @@ options.
  Note that, for DecimalType(38,0)*, the table above intentionally does not cover all other combinations of scales and precisions because currently we only infer decimal type like `BigInteger`/`BigInt`. For example, 1.1 is inferred as double type.
  - In PySpark, now we need Pandas 0.19.2 or upper if you want to use Pandas related functionalities, such as `toPandas`, `createDataFrame` from Pandas DataFrame, etc.
  - In PySpark, the behavior of timestamp values for Pandas related functionalities was changed to respect session timezone. If you want to use the old behavior, you need to set a configuration `spark.sql.execution.pandas.respectSessionTimeZone` to `False`. See [SPARK-22395](https://issues.apache.org/jira/browse/SPARK-22395) for details.
+
+ - Since Spark 2.3, broadcast behaviour changed to broadcast the join side with an explicit broadcast hint first. See [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489) for details.

 ## Upgrading From Spark SQL 2.1 to 2.2

From 76148f582404a9f5272ae55d9994133020832536 Mon Sep 17 00:00:00 2001
From: Yuming Wang
Date: Sat, 2 Dec 2017 08:30:17 +0800
Subject: [PATCH 2/3] Fix review comments

---
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4528b2e1668f6..836c2874713c0 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1777,7 +1777,7 @@ options.
  - In PySpark, now we need Pandas 0.19.2 or upper if you want to use Pandas related functionalities, such as `toPandas`, `createDataFrame` from Pandas DataFrame, etc.
  - In PySpark, the behavior of timestamp values for Pandas related functionalities was changed to respect session timezone. If you want to use the old behavior, you need to set a configuration `spark.sql.execution.pandas.respectSessionTimeZone` to `False`. See [SPARK-22395](https://issues.apache.org/jira/browse/SPARK-22395) for details.

- - Since Spark 2.3, broadcast behaviour changed to broadcast the join side with an explicit broadcast hint first. See [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489) for details.
+ - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489) for details.

 ## Upgrading From Spark SQL 2.1 to 2.2

From 069f8b6036fb3b23a00e9bb539d7128aedeaa084 Mon Sep 17 00:00:00 2001
From: Yuming Wang
Date: Sun, 3 Dec 2017 08:57:22 +0800
Subject: [PATCH 3/3] Remove duplicate

---
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 836c2874713c0..b76be9132dd03 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1777,7 +1777,7 @@ options.
  - In PySpark, now we need Pandas 0.19.2 or upper if you want to use Pandas related functionalities, such as `toPandas`, `createDataFrame` from Pandas DataFrame, etc.
  - In PySpark, the behavior of timestamp values for Pandas related functionalities was changed to respect session timezone. If you want to use the old behavior, you need to set a configuration `spark.sql.execution.pandas.respectSessionTimeZone` to `False`. See [SPARK-22395](https://issues.apache.org/jira/browse/SPARK-22395) for details.

- - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489) for details.
+ - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).

 ## Upgrading From Spark SQL 2.1 to 2.2
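
The migration note added by this series says that, when a broadcast join is applicable, the side named in an explicit broadcast hint is preferred. The toy Python model below illustrates that rule only; it is not Spark's planner code. The function name `choose_broadcast_side` and its parameters are hypothetical, and the fallback to the smaller side when both sides or neither side carry a hint is an assumption of this sketch, not something the patch states.

```python
def choose_broadcast_side(left_hinted: bool, right_hinted: bool,
                          left_size: int, right_size: int) -> str:
    """Toy model of the rule in the migration note (hypothetical helper).

    Returns "left" or "right", the side this sketch would broadcast.
    """
    # Exactly one side carries an explicit broadcast hint:
    # prefer that side, regardless of the estimated sizes.
    if left_hinted != right_hinted:
        return "left" if left_hinted else "right"
    # Both or neither side hinted. ASSUMPTION of this sketch only:
    # fall back to the side with the smaller estimated size.
    return "left" if left_size <= right_size else "right"


# The hinted side wins even though it is far larger.
print(choose_broadcast_side(True, False, left_size=10**9, right_size=10))
```

In this model the hint is the first thing consulted, so size estimates only matter when the hints do not single out one side.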