Sync with PySpark upstream in Apache Spark #1211
Conversation
d77646f to 99f1fa0
@@ -325,7 +325,7 @@ def _compute_stats(data, colname, whis, precision):
     # Computes mean, median, Q1 and Q3 with approx_percentile and precision
     pdf = (data._kdf._sdf
            .agg(*[F.expr('approx_percentile({}, {}, {})'.format(colname, q,
-                                                                1. / precision))
+                                                                int(1. / precision)))
It's because of SPARK-30266
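For context, a minimal standalone sketch of the fix above (the helper name is hypothetical; the expression string is the one built in the diff): the accuracy argument of `approx_percentile` is cast to an integer before being spliced into the SQL expression, since Spark rejects a fractional accuracy here.

```python
def approx_percentile_expr(colname, q, precision):
    """Hypothetical helper mirroring the expression built in the diff above.

    The accuracy argument (1 / precision) is cast to int, matching the fix.
    """
    return 'approx_percentile({}, {}, {})'.format(colname, q, int(1. / precision))

print(approx_percentile_expr('value', 0.5, 0.01))
# approx_percentile(value, 0.5, 100)
```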
@@ -8186,7 +8186,7 @@ def explain(self, extended: bool = False):
         == Optimized Logical Plan ==
         ...
         == Physical Plan ==
-        Scan ExistingRDD[__index_level_0__#...,id#...]
+        ...
Expected:
== Physical Plan ==
Scan ExistingRDD[__index_level_0__#...,id#...]
Got:
== Physical Plan ==
*(1) Scan ExistingRDD[__index_level_0__#9308L,id#9309L]
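The fix replaces the version-dependent plan line with `...`, which the doctest `ELLIPSIS` option treats as a wildcard. A small stdlib-only check of why: Spark 3.0 prefixes the plan line with `*(1) ` (whole-stage codegen), so the old expected line no longer matches even with `ELLIPSIS`, while a bare `...` matches both outputs.

```python
import doctest

checker = doctest.OutputChecker()
got = '*(1) Scan ExistingRDD[__index_level_0__#9308L,id#9309L]\n'

# The old expected line fails even under ELLIPSIS: the literal prefix
# 'Scan ExistingRDD[' must match at the start, but the output now begins
# with '*(1) '.
old_want = 'Scan ExistingRDD[__index_level_0__#...,id#...]\n'
print(checker.check_output(old_want, got, doctest.ELLIPSIS))  # False

# A bare '...' matches any output under ELLIPSIS, so the doctest passes
# on both the Spark 2.x and Spark 3.0 plan strings.
print(checker.check_output('...\n', got, doctest.ELLIPSIS))   # True
```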
        self.assert_eq(kidx.to_series(), pidx.to_series())
        self.assert_eq(kidx.to_series(name='a'), pidx.to_series(name='a'))
    else:
        with self.sql_conf({'spark.sql.execution.arrow.enabled': False}):
Struct type is supported from Spark 3.0. It failed in Spark 2.3 because the fallback isn't available there.
In Spark 3.0, it's supported when Arrow optimization is enabled, but the Arrow and non-Arrow paths produce different results:
E Left:
E b
E 0 4 {'__index_level_0__': 0, 'b': 4}
E 1 5 {'__index_level_0__': 1, 'b': 5}
E 3 6 {'__index_level_0__': 3, 'b': 6}
E 5 3 {'__index_level_0__': 5, 'b': 3}
E 6 2 {'__index_level_0__': 6, 'b': 2}
E 8 1 {'__index_level_0__': 8, 'b': 1}
E 9 0 {'__index_level_0__': 9, 'b': 0}
E 0 {'__index_level_0__': 9, 'b': 0}
E 0 {'__index_level_0__': 9, 'b': 0}
E dtype: object
E object
E
E Right:
E b
E 0 4 (0, 4)
E 1 5 (1, 5)
E 3 6 (3, 6)
E 5 3 (5, 3)
E 6 2 (6, 2)
E 8 1 (8, 1)
E 9 0 (9, 0)
E 0 (9, 0)
E 0 (9, 0)
E dtype: object
E object
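A pure-Python illustration of the mismatch above (no Spark required): with Arrow enabled, the struct value comes back as a Python dict, while the non-Arrow path yields a `Row`, which compares like a tuple. The two can never be equal, which is why the test pins the Arrow conf explicitly.

```python
# The two shapes of the same struct value seen in the failure above.
arrow_value = {'__index_level_0__': 9, 'b': 0}   # Arrow-enabled collection
non_arrow_value = (9, 0)                         # Row compares like a tuple

# A dict never compares equal to a tuple, so the assertion can only pass
# when both sides are collected with the same Arrow setting.
print(arrow_value == non_arrow_value)  # False
```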
Codecov Report
@@ Coverage Diff @@
## master #1211 +/- ##
==========================================
+ Coverage 95.18% 95.18% +<.01%
==========================================
Files 35 35
Lines 7204 7205 +1
==========================================
+ Hits 6857 6858 +1
Misses 347 347
Continue to review full report at Codecov.
This PR syncs Koalas with upstream PySpark to support PySpark 3.0. It's mainly for development purposes.