
Sync with PySpark upstream in Apache Spark #1211

Merged
merged 1 commit into databricks:master from prepare-spark-3.0 on Jan 22, 2020

Conversation

HyukjinKwon (Member)

This PR syncs Koalas to support PySpark 3.0. It is mainly for development purposes.

@@ -325,7 +325,7 @@ def _compute_stats(data, colname, whis, precision):
     # Computes mean, median, Q1 and Q3 with approx_percentile and precision
     pdf = (data._kdf._sdf
            .agg(*[F.expr('approx_percentile({}, {}, {})'.format(colname, q,
-                                                                1. / precision))
+                                                                int(1. / precision)))
HyukjinKwon (Member, Author):
It's because of SPARK-30266
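A minimal, Spark-free sketch of what the one-line change does (the column name, quantile, and precision values below are illustrative, following the `_compute_stats` helper above; the assumption is that SPARK-30266 made Spark 3.0 stricter about a fractional accuracy argument to `approx_percentile`):

```python
# Hypothetical values mirroring the Koalas boxplot helper (not Spark-dependent).
colname, q, precision = "value", 0.5, 0.01

# Before the change: the accuracy is rendered as a float literal.
expr_before = "approx_percentile({}, {}, {})".format(colname, q, 1.0 / precision)

# After the change: the reciprocal of precision is cast to int first,
# so the accuracy is rendered as an integer literal.
expr_after = "approx_percentile({}, {}, {})".format(colname, q, int(1.0 / precision))

print(expr_before)  # approx_percentile(value, 0.5, 100.0)
print(expr_after)   # approx_percentile(value, 0.5, 100)
```

Only the SQL string changes; the computed percentiles are the same either way.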

@@ -8186,7 +8186,7 @@ def explain(self, extended: bool = False):
== Optimized Logical Plan ==
...
== Physical Plan ==
Scan ExistingRDD[__index_level_0__#...,id#...]
...
HyukjinKwon (Member, Author):

Expected:
    == Physical Plan ==
    Scan ExistingRDD[__index_level_0__#...,id#...]
Got:
    == Physical Plan ==
    *(1) Scan ExistingRDD[__index_level_0__#9308L,id#9309L]
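The mismatch can be reproduced with doctest's own output checker, no Spark required (a sketch: the plan strings are taken from the failure above, and the assumption is that the doctest runs with the ELLIPSIS flag, under which `...` matches arbitrary text but a literal prefix like `Scan ExistingRDD` must still match exactly):

```python
import doctest

checker = doctest.OutputChecker()
flags = doctest.ELLIPSIS

# Expected doctest output pins the "Scan ExistingRDD" prefix.
expected = "Scan ExistingRDD[__index_level_0__#...,id#...]\n"

# Spark 2.x output matches; Spark 3.0 prepends a whole-stage-codegen
# marker such as "*(1) ", which breaks the literal prefix.
got_spark2x = "Scan ExistingRDD[__index_level_0__#9308L,id#9309L]\n"
got_spark30 = "*(1) Scan ExistingRDD[__index_level_0__#9308L,id#9309L]\n"

print(checker.check_output(expected, got_spark2x, flags))  # True
print(checker.check_output(expected, got_spark30, flags))  # False

# A leading "..." accepts both forms of the plan line.
relaxed = "...Scan ExistingRDD[__index_level_0__#...,id#...]\n"
print(checker.check_output(relaxed, got_spark30, flags))  # True
```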

            self.assert_eq(kidx.to_series(), pidx.to_series())
            self.assert_eq(kidx.to_series(name='a'), pidx.to_series(name='a'))
        else:
            with self.sql_conf({'spark.sql.execution.arrow.enabled': False}):
HyukjinKwon (Member, Author):

Struct type is supported from Spark 3.0. It failed in Spark 2.3 because the fallback isn't available there.

In Spark 3.0 it is supported when Arrow optimization is enabled, but the Arrow and non-Arrow paths produce different results:

E   Left:
E      b
E   0  4    {'__index_level_0__': 0, 'b': 4}
E   1  5    {'__index_level_0__': 1, 'b': 5}
E   3  6    {'__index_level_0__': 3, 'b': 6}
E   5  3    {'__index_level_0__': 5, 'b': 3}
E   6  2    {'__index_level_0__': 6, 'b': 2}
E   8  1    {'__index_level_0__': 8, 'b': 1}
E   9  0    {'__index_level_0__': 9, 'b': 0}
E      0    {'__index_level_0__': 9, 'b': 0}
E      0    {'__index_level_0__': 9, 'b': 0}
E   dtype: object
E   object
E   
E   Right:
E      b
E   0  4    (0, 4)
E   1  5    (1, 5)
E   3  6    (3, 6)
E   5  3    (5, 3)
E   6  2    (6, 2)
E   8  1    (8, 1)
E   9  0    (9, 0)
E      0    (9, 0)
E      0    (9, 0)
E   dtype: object
E   object
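A Spark-free sketch of why the comparison above fails (struct values copied from the diff; the normalization step at the end is purely illustrative, not what Koalas does): the Arrow-enabled path materializes each struct as a dict, while the non-Arrow path yields a tuple-like Row, and a dict never compares equal to a tuple even when the fields match.

```python
# One struct value from the "Left" (Arrow-enabled) result: a dict.
left = {"__index_level_0__": 0, "b": 4}

# The same struct from the "Right" (non-Arrow) result: a tuple-like Row.
right = (0, 4)

print(left == right)                  # False: dict vs tuple
print(tuple(left.values()) == right)  # True once normalized to a tuple
```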

codecov-io commented Jan 22, 2020

Codecov Report

Merging #1211 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1211      +/-   ##
==========================================
+ Coverage   95.18%   95.18%   +<.01%     
==========================================
  Files          35       35              
  Lines        7204     7205       +1     
==========================================
+ Hits         6857     6858       +1     
  Misses        347      347
Impacted Files               Coverage Δ
databricks/koalas/plot.py    94.28% <ø> (ø) ⬆️
databricks/koalas/frame.py   96.96% <ø> (ø) ⬆️
databricks/koalas/utils.py   95.34% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e69edee...99f1fa0.

@HyukjinKwon HyukjinKwon merged commit 8d838ae into databricks:master Jan 22, 2020
@HyukjinKwon HyukjinKwon deleted the prepare-spark-3.0 branch September 11, 2020 07:52