
[SPARK-39807][PYTHON][PS] Respect Series.concat sort parameter to follow 1.4.3 behavior #37217

Closed
wants to merge 1 commit into from

Conversation

@Yikun (Member) commented Jul 18, 2022

What changes were proposed in this pull request?

Respect the Series.concat sort parameter when num_series == 1, to follow the pandas 1.4.3 behavior.

Why are the changes needed?

In #36711, we followed the pandas 1.4.2 behavior and respected the Series.concat sort parameter, except in the num_series == 1 case.

pandas 1.4.3 fixed pandas-dev/pandas#47127, which also fixes the num_series == 1 case, so this PR follows the pandas 1.4.3 behavior.

Does this PR introduce any user-facing change?

Yes, we already cover this case in:
https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst

In Spark 3.4, the Series.concat sort parameter will be respected to follow pandas 1.4 behaviors.

How was this patch tested?

  • CI passed
  • test_concat_index_axis passed with pandas 1.3.5, 1.4.2, and 1.4.3.
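Since the expected result differs across pandas releases, the test has to gate the single-Series cases on the runner's pandas version. A sketch of that gating pattern (the helper name is illustrative, not Spark's actual test code):

```python
# Illustrative version gate: single-Series concat cases should only be
# compared against pandas >= 1.4.3, where the sort bug is fixed.
def at_least(version: str, minimum: str) -> bool:
    """Compare dotted release versions numerically, e.g. '1.4.10' > '1.4.3'."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(minimum)

print(at_least("1.4.3", "1.4.3"))   # True
print(at_least("1.4.2", "1.4.3"))   # False
print(at_least("1.4.10", "1.4.3"))  # True (numeric, not lexicographic)
```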

@Yikun Yikun marked this pull request as ready for review July 18, 2022 12:54
@Yikun (Member, Author) commented Jul 18, 2022

Ready to go

@@ -334,19 +334,21 @@ def test_concat_index_axis(self):
([psdf.reset_index(), psdf], [pdf.reset_index(), pdf]),
([psdf, psdf[["C", "A"]]], [pdf, pdf[["C", "A"]]]),
([psdf[["C", "A"]], psdf], [pdf[["C", "A"]], pdf]),
# only one Series
([psdf, psdf["C"]], [pdf, pdf["C"]]),
([psdf["C"], psdf], [pdf["C"], pdf]),
Member

nit. I believe we can keep the test coverage to prevent a future regression in Series.concat.

Member Author

Thanks for the review.

Yes, actually I also moved these to L347-L348, which means we will always check all cases with the latest pandas to avoid regressions. I will also bump the infra pandas version to 1.4.3 after all fixes are complete.

For pandas < 1.4.3, these two cases will fail because pandas on Spark only follows the latest pandas behavior, so I just skip them.

If you have any other concerns, feel free to comment. Thanks!

Member

Okay, fair enough.

Member

Perfect. Thanks.

@HyukjinKwon (Member)

Merged to master.
