
[SPARK-39807][PYTHON][PS] Respect Series.concat sort parameter to follow 1.4.3 behavior #37217

Closed
wants to merge 1 commit into from

Conversation

@Yikun (Member) commented Jul 18, 2022

What changes were proposed in this pull request?

Respect the Series.concat sort parameter when num_series == 1, to follow the pandas 1.4.3 behavior.

Why are the changes needed?

In #36711, we followed the pandas 1.4.2 behavior and respected the Series.concat sort parameter, except in the num_series == 1 case.

pandas 1.4.3 fixed pandas-dev/pandas#47127, which also fixes the num_series == 1 case, so this PR follows the pandas 1.4.3 behavior.

Does this PR introduce any user-facing change?

Yes, we already cover this case in:
https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst

In Spark 3.4, the Series.concat sort parameter will be respected to follow pandas 1.4 behaviors.

How was this patch tested?

  • CI passed
  • test_concat_index_axis passed with pandas 1.3.5, 1.4.2, and 1.4.3.
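Since the expected result differs across pandas releases, the test has to gate the single-Series cases on the runner's pandas version. A sketch of that gating pattern (the helper name is illustrative, not Spark's actual test code):

```python
# Illustrative version gate: single-Series concat cases should only be
# compared against pandas >= 1.4.3, where the sort bug is fixed.
def at_least(version: str, minimum: str) -> bool:
    """Compare dotted release versions numerically, e.g. '1.4.10' > '1.4.3'."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(minimum)

print(at_least("1.4.3", "1.4.3"))   # True
print(at_least("1.4.2", "1.4.3"))   # False
print(at_least("1.4.10", "1.4.3"))  # True (numeric, not lexicographic)
```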

@Yikun Yikun marked this pull request as ready for review July 18, 2022 12:54
@Yikun (Member, Author) commented Jul 18, 2022

Ready to go

@@ -334,19 +334,21 @@ def test_concat_index_axis(self):
([psdf.reset_index(), psdf], [pdf.reset_index(), pdf]),
([psdf, psdf[["C", "A"]]], [pdf, pdf[["C", "A"]]]),
([psdf[["C", "A"]], psdf], [pdf[["C", "A"]], pdf]),
# only one Series
([psdf, psdf["C"]], [pdf, pdf["C"]]),
([psdf["C"], psdf], [pdf["C"], pdf]),
Member

nit. I believe we can keep the test coverage to prevent a future regression in Series.concat.

Member Author

Thanks for the review.

Yes, actually I also moved these to L347-L348, which means we will always check all cases with the latest pandas to avoid regressions. I will also bump the infra pandas version to 1.4.3 after all fixes are complete.

For pandas < 1.4.3, these two cases will fail because pandas on Spark only follows the latest pandas behavior, so I just skip them.

If you have any other concerns, feel free to comment. Thanks!

Member

Okay, fair enough.

Member

Perfect. Thanks.

@HyukjinKwon (Member)

Merged to master.
