[SPARK-38821][PYTHON] Skip nsmall/nlarge nan test under pandas 1.4.[0,1,2] #36356

Closed

Yikun
Member

@Yikun Yikun commented Apr 26, 2022

What changes were proposed in this pull request?

Skip the nsmallest/nlargest NaN tests under pandas 1.4.0, 1.4.1, and 1.4.2.

pandas returns wrong results when np.nan appears in the sorting column, starting with pandas-dev/pandas@16d2f59 (v1.4.0).

I confirmed this issue is fixed by:
pandas-dev/pandas@2886388

Why are the changes needed?

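The nsmallest/nlargest NaN tests fail only under pandas 1.4.0 to 1.4.2 because of the upstream bug described above.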

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed

Comment on lines +1818 to +1822
if not (LooseVersion("1.4.0") <= LooseVersion(pd.__version__) <= LooseVersion("1.4.2")):
    self.assert_eq(psdf.nlargest(5, columns="a"), pdf.nlargest(5, columns="a"))
    self.assert_eq(
        psdf.nlargest(5, columns=["a", "b"]), pdf.nlargest(5, columns=["a", "b"])
    )
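
For readers who want to try the version gate outside the test suite, here is a minimal standalone sketch. It is not the code from this PR: `parse_release` and `is_affected` are hypothetical helper names, and it compares plain integer tuples instead of `LooseVersion`, so suffixes such as `rc0` or `.post1` are simply ignored (which can differ from `LooseVersion` ordering at the edges of the range).

```python
import re

# Inclusive range of pandas releases named in this PR.
AFFECTED_MIN = (1, 4, 0)
AFFECTED_MAX = (1, 4, 2)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components padded to three, e.g. '1.4' -> (1, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '1.4'
    return tuple(parts[:3])


def is_affected(version: str) -> bool:
    """True when the version falls inside the inclusive affected range."""
    return AFFECTED_MIN <= parse_release(version) <= AFFECTED_MAX
```

In a test, the guard would then read `if not is_affected(pd.__version__): ...`, assuming `pd` is the imported pandas module.
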
Member Author

If you still think I should compare against real results rather than skip, I'm happy to change it. We would need to change index=np.random.rand(7) to a fixed range and construct an expected result DataFrame.

Because this only fails with pandas 1.4.0 to 1.4.2, I thought skipping was enough.
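
The alternative mentioned above (a fixed index plus a hand-built expected result) could look roughly like the sketch below. It is plain Python rather than pandas so that the expected rows do not depend on the installed pandas version. The helper names and the sample data are hypothetical, and it assumes that rows with a missing sort value rank after all present values, which is how I understand nlargest to behave outside the affected releases.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def expected_nlargest(labelled_values, n):
    """Return the n largest (label, value) pairs, with missing values placed last.

    Assumption: missing values are only returned when n exceeds the
    number of present values, and they keep their original order.
    """
    present = [(lbl, v) for lbl, v in labelled_values if not is_missing(v)]
    missing = [(lbl, v) for lbl, v in labelled_values if is_missing(v)]
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(present, key=lambda pair: pair[1], reverse=True)
    return (ranked + missing)[:n]


# Hypothetical sample: a deterministic index 0..6 instead of np.random.rand(7).
sample = list(enumerate([1.0, 2.0, 3.0, 4.0, 5.0, float("nan"), 7.0]))
top5 = expected_nlargest(sample, 5)
```

The labels in `top5` can then be used to build the expected DataFrame that the pandas-on-Spark result is compared against.
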

Member Author

Yikun commented Apr 26, 2022

cc @itholic @xinrong-databricks @HyukjinKwon

@HyukjinKwon
Member

Merged to master.
