
[SPARK-39611][PYTHON][PS] Fix wrong aliases in __array_ufunc__ #37078

Closed

Conversation


@Yikun Yikun commented Jul 5, 2022

What changes were proposed in this pull request?

This PR fixes the wrong aliases in `__array_ufunc__`.

Why are the changes needed?

When running tests with numpy 1.23.0 (the current latest), we hit a bug: `NotImplementedError: pandas-on-Spark objects currently do not support <ufunc 'divide'>.`

In `__array_ufunc__` we first call `maybe_dispatch_ufunc_to_dunder_op` to try the dunder methods, and then fall back to the PySpark API. `maybe_dispatch_ufunc_to_dunder_op` is adapted from pandas code.
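The dispatch order described above can be sketched with a toy class (this is an illustration of the mechanism, not the actual PySpark implementation; the `ALIASES` table and `Wrapped` class here are hypothetical):

```python
import numpy as np

# Maps numpy ufunc names to the corresponding Python dunder-op suffixes.
ALIASES = {"add": "add", "subtract": "sub", "multiply": "mul", "true_divide": "truediv"}


class Wrapped:
    def __init__(self, value):
        self.value = value

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Step 1: try to translate the ufunc into an equivalent dunder op.
        op = ALIASES.get(ufunc.__name__)
        if method == "__call__" and op is not None and not kwargs and len(inputs) == 2:
            is_left = inputs[0] is self
            other = inputs[1] if is_left else inputs[0]
            prefix = "" if is_left else "r"
            return getattr(self, f"__{prefix}{op}__")(other)
        # Step 2: pandas-on-Spark would fall back to the PySpark API here;
        # this sketch just raises, mirroring the reported error message.
        raise NotImplementedError(f"currently do not support {ufunc}")

    def __add__(self, other):
        return Wrapped(self.value + other)

    def __radd__(self, other):
        return Wrapped(other + self.value)
```

With this dispatch, `np.add(Wrapped(1), 2)` routes through `__add__`, while a ufunc whose name is missing from the alias table falls through to the `NotImplementedError` branch, which is exactly the failure mode this PR fixes for division.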

pandas fixed this bug in pandas-dev/pandas#44822 (comment) (pandas-dev/pandas@206b249) when upgrading to numpy 1.23.0; we need to sync that fix here as well.
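The shape of the fix can be sketched as follows, based on the referenced pandas change (the variable name `aliases` and the exact keys are illustrative; the assumption, taken from the pandas fix, is that numpy 1.23 changed the reported name of the division ufunc):

```python
import numpy as np

# In numpy >= 1.23 the division ufunc reports its name as "divide",
# while older releases reported "true_divide", so an alias table keyed
# only on "true_divide" no longer matches. Keeping both keys works on
# either version:
aliases = {
    "true_divide": "truediv",  # name reported by numpy < 1.23
    "divide": "truediv",       # name reported by numpy >= 1.23
}

# The lookup now succeeds regardless of the installed numpy version:
assert aliases[np.divide.__name__] == "truediv"
```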

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Current CI passed
  • The existing UT `test_series_datetime` already covers this; I also tested it in my local env with numpy 1.23.0:
```shell
pip install "numpy==1.23.0"
python/run-tests --testnames 'pyspark.pandas.tests.test_series_datetime SeriesDateTimeTest.test_arithmetic_op_exceptions'
```

@Yikun Yikun marked this pull request as ready for review July 5, 2022 11:50
@HyukjinKwon
Member

Merged to master, branch-3.3 and branch-3.2.

HyukjinKwon pushed a commit that referenced this pull request Jul 5, 2022

Closes #37078 from Yikun/SPARK-39611.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fb48a14)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Jul 5, 2022

Closes #37078 from Yikun/SPARK-39611.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fb48a14)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Jul 13, 2022
### What changes were proposed in this pull request?
Remove the numpy<1.23.0 version limit in the infra files to support numpy 1.23+ (latest).

### Why are the changes needed?
After the two PRs below were merged:

#37117: Fix annotation: `python/pyspark/pandas/frame.py:9970: error: Need type annotation for "raveled_column_labels"  [var-annotated]`
#37078: Fix wrong aliases in `__array_ufunc__`: `NotImplementedError: pandas-on-Spark objects currently do not support <ufunc 'divide'>`

we can now remove the limit in the infra file to support numpy >= 1.23.0.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed and [numpy 1.23.1](https://github.com/Yikun/spark/runs/7314545823?check_suite_focus=true#step:9:49) installed in CI

Closes #37175 from Yikun/patch-24.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023

Closes apache#37078 from Yikun/SPARK-39611.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit fb48a14)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>