[SPARK-36537][PYTHON] Revisit disabled tests for CategoricalDtype #33817

Closed
wants to merge 4 commits into from

Contributor

@itholic itholic commented Aug 24, 2021

What changes were proposed in this pull request?

This PR proposes to enable the tests that were disabled because of behavior differences with pandas 1.3.

  • The inplace argument of the CategoricalDtype functions is deprecated as of pandas 1.3, and the pandas implementation seems to have a bug. So we manually create the expected results and test against them.
  • Fixed GroupBy.transform, since it didn't work properly for CategoricalDtype.
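As a rough illustration of what "manually creating the expected result" can look like, the sketch below builds the expected value with the non-inplace form of a categorical accessor call instead of relying on `inplace=True`. It uses plain pandas only; the sample data and the variable names `pser` and `expected` are assumptions for illustration and are not taken from the Spark test suite.

```python
import pandas as pd

# Hypothetical sample data (not from the actual Spark tests).
pser = pd.Series(["a", "b", "c"], dtype="category")

# Non-inplace call: returns a new Series and leaves `pser` untouched,
# so the expected result does not depend on the deprecated `inplace` path.
expected = pser.cat.add_categories("d")

print(list(pser.cat.categories))
print(list(expected.cat.categories))
```

A test written this way can compare the pandas-on-Spark result against `expected` without going through the deprecated argument.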

Why are the changes needed?

We should enable as many tests as possible, even if pandas has a bug.

We should also follow the behavior of the latest pandas.

Does this PR introduce any user-facing change?

Yes, GroupBy.transform now follows the behavior of the latest pandas.
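For context, here is a minimal sketch of the kind of call involved, written against plain pandas rather than pandas-on-Spark. The DataFrame, the column names, and the category list are hypothetical, and the exact dtype of the result may vary between pandas versions, so no expected output is shown.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data: "a" is the grouping key, "b" is the column to transform.
pdf = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"]})

# Hypothetical target dtype with one category that does not occur in the data.
dtype = CategoricalDtype(categories=["x", "y", "z"])

# Cast each group's values to the categorical dtype. The result keeps the
# original row order and contains only the non-key column "b".
result = pdf.groupby("a").transform(lambda col: col.astype(dtype))

print(result)
```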

How was this patch tested?

Unit tests.

@SparkQA

SparkQA commented Aug 24, 2021

Test build #142716 has finished for PR 33817 at commit 10f6ff2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47216/

@SparkQA

SparkQA commented Aug 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47216/

@HyukjinKwon HyukjinKwon changed the title from "[SPARK-36537][PYTHON] Take care of other tests disabled for CategoricalDtype." to "[SPARK-36537][PYTHON] Revisit disabled tests for CategoricalDtype" Aug 24, 2021
@SparkQA

SparkQA commented Aug 25, 2021

Test build #142745 has finished for PR 33817 at commit c272351.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47245/

@SparkQA

SparkQA commented Aug 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47245/

python/pyspark/pandas/groupby.py (outdated)
python/pyspark/pandas/tests/test_categorical.py (outdated)
@itholic
Contributor Author

itholic commented Aug 26, 2021

Thanks for the correction, @ueshin

@SparkQA

SparkQA commented Aug 26, 2021

Test build #142777 has finished for PR 33817 at commit c6418e9.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@ueshin ueshin left a comment

LGTM, pending tests.

@SparkQA

SparkQA commented Aug 26, 2021

Test build #142779 has finished for PR 33817 at commit bbb1501.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47277/

@SparkQA

SparkQA commented Aug 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47277/

@SparkQA

SparkQA commented Aug 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47279/

@SparkQA

SparkQA commented Aug 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47279/

@HyukjinKwon
Member

HyukjinKwon commented Aug 26, 2021

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Aug 27, 2021
This PR proposes to enable the tests that were disabled because of behavior differences with pandas 1.3.

- The `inplace` argument of the `CategoricalDtype` functions is deprecated as of pandas 1.3, and the pandas implementation seems to have a bug. So we manually create the expected results and test against them.
- Fixed `GroupBy.transform`, since it didn't work properly for `CategoricalDtype`.

We should enable as many tests as possible, even if pandas has a bug.

We should also follow the behavior of the latest pandas.

Yes, `GroupBy.transform` now follows the behavior of the latest pandas.

Unit tests.

Closes #33817 from itholic/SPARK-36537.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fe48618)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon
Member

Merged to branch-3.2 too.

4 participants