Sync with PySpark upstream in Apache Spark #1211
Conversation
d77646f to 99f1fa0
@@ -325,7 +325,7 @@ def _compute_stats(data, colname, whis, precision):
     # Computes mean, median, Q1 and Q3 with approx_percentile and precision
     pdf = (data._kdf._sdf
            .agg(*[F.expr('approx_percentile({}, {}, {})'.format(colname, q,
-                                                                1. / precision))
+                                                                int(1. / precision)))
It's because of SPARK-30266
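For context, a minimal standalone sketch of the fix above (the helper name is hypothetical; the expression string is the one built in the diff): the accuracy argument of `approx_percentile` is cast to an integer before being spliced into the SQL expression, since Spark rejects a fractional accuracy here.

```python
def approx_percentile_expr(colname, q, precision):
    """Hypothetical helper mirroring the expression built in the diff above.

    The accuracy argument (1 / precision) is cast to int, matching the fix.
    """
    return 'approx_percentile({}, {}, {})'.format(colname, q, int(1. / precision))

print(approx_percentile_expr('value', 0.5, 0.01))
# approx_percentile(value, 0.5, 100)
```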
@@ -8186,7 +8186,7 @@ def explain(self, extended: bool = False):
         == Optimized Logical Plan ==
         ...
         == Physical Plan ==
-        Scan ExistingRDD[__index_level_0__#...,id#...]
+        ...
Expected:
== Physical Plan ==
Scan ExistingRDD[__index_level_0__#...,id#...]
Got:
== Physical Plan ==
*(1) Scan ExistingRDD[__index_level_0__#9308L,id#9309L]
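The fix replaces the version-dependent plan line with `...`, which the doctest `ELLIPSIS` option treats as a wildcard. A small stdlib-only check of why: Spark 3.0 prefixes the plan line with `*(1) ` (whole-stage codegen), so the old expected line no longer matches even with `ELLIPSIS`, while a bare `...` matches both outputs.

```python
import doctest

checker = doctest.OutputChecker()
got = '*(1) Scan ExistingRDD[__index_level_0__#9308L,id#9309L]\n'

# The old expected line fails even under ELLIPSIS: the literal prefix
# 'Scan ExistingRDD[' must match at the start, but the output now begins
# with '*(1) '.
old_want = 'Scan ExistingRDD[__index_level_0__#...,id#...]\n'
print(checker.check_output(old_want, got, doctest.ELLIPSIS))  # False

# A bare '...' matches any output under ELLIPSIS, so the doctest passes
# on both the Spark 2.x and Spark 3.0 plan strings.
print(checker.check_output('...\n', got, doctest.ELLIPSIS))   # True
```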
        self.assert_eq(kidx.to_series(), pidx.to_series())
        self.assert_eq(kidx.to_series(name='a'), pidx.to_series(name='a'))
    else:
        with self.sql_conf({'spark.sql.execution.arrow.enabled': False}):
Struct type is supported from Spark 3.0. It failed in Spark 2.3 because the fallback isn't available there.
In Spark 3.0, it's supported when Arrow optimization is enabled, but the Arrow and non-Arrow paths produce different results:
E Left:
E b
E 0 4 {'__index_level_0__': 0, 'b': 4}
E 1 5 {'__index_level_0__': 1, 'b': 5}
E 3 6 {'__index_level_0__': 3, 'b': 6}
E 5 3 {'__index_level_0__': 5, 'b': 3}
E 6 2 {'__index_level_0__': 6, 'b': 2}
E 8 1 {'__index_level_0__': 8, 'b': 1}
E 9 0 {'__index_level_0__': 9, 'b': 0}
E 0 {'__index_level_0__': 9, 'b': 0}
E 0 {'__index_level_0__': 9, 'b': 0}
E dtype: object
E object
E
E Right:
E b
E 0 4 (0, 4)
E 1 5 (1, 5)
E 3 6 (3, 6)
E 5 3 (5, 3)
E 6 2 (6, 2)
E 8 1 (8, 1)
E 9 0 (9, 0)
E 0 (9, 0)
E 0 (9, 0)
E dtype: object
E object
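A pure-Python illustration of the mismatch above (no Spark required): with Arrow enabled, the struct value comes back as a Python dict, while the non-Arrow path yields a `Row`, which compares like a tuple. The two can never be equal, which is why the test pins the Arrow conf explicitly.

```python
# The two shapes of the same struct value seen in the failure above.
arrow_value = {'__index_level_0__': 9, 'b': 0}   # Arrow-enabled collection
non_arrow_value = (9, 0)                         # Row compares like a tuple

# A dict never compares equal to a tuple, so the assertion can only pass
# when both sides are collected with the same Arrow setting.
print(arrow_value == non_arrow_value)  # False
```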
Codecov Report
@@ Coverage Diff @@
## master #1211 +/- ##
==========================================
+ Coverage 95.18% 95.18% +<.01%
==========================================
Files 35 35
Lines 7204 7205 +1
==========================================
+ Hits 6857 6858 +1
Misses 347 347
Continue to review full report at Codecov.
This PR syncs Koalas with upstream PySpark to support PySpark 3.0. It's mainly for development purposes.