[CI][Python] Nightly test for PySpark 3.2.0 fail with AttributeError on numpy.bool #33697

Closed
raulcd opened this issue Jan 16, 2023 · 5 comments · Fixed by #33714

Comments

@raulcd
Member

raulcd commented Jan 16, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Nightly integration tests with PySpark 3.2.0 are failing in the test-conda-python-3.8-spark-v3.2.0 job with the following error:

ERROR: test_with_key_complex (pyspark.sql.tests.test_pandas_cogrouped_map.CogroupedMapInPandasTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/spark/python/pyspark/sql/tests/test_pandas_cogrouped_map.py", line 160, in test_with_key_complex
    result = self.data1 \
  File "/spark/python/pyspark/sql/pandas/conversion.py", line 168, in toPandas
    pandas_type = PandasConversionMixin._to_corrected_pandas_type(field.dataType)
  File "/spark/python/pyspark/sql/pandas/conversion.py", line 238, in _to_corrected_pandas_type
    return np.bool
  File "/opt/conda/envs/arrow/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'bool'

Component(s)

Continuous Integration, Python

@raulcd
Member Author

raulcd commented Jan 16, 2023

cc @AlenkaF @jorisvandenbossche

@AlenkaF
Member

AlenkaF commented Jan 16, 2023

NumPy removed the deprecated `np.bool` alias in version 1.24 (it had been deprecated since 1.20). The CI build for the nightly integration tests with PySpark 3.2.0 uses NumPy 1.24, but the fix in Apache Spark has already been merged: apache/spark#37817.

Am continuing to look into it ...
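
For anyone hitting the same traceback, here is a minimal sketch of the kind of replacement that avoids the removed alias. It assumes only that NumPy is installed. The function name `to_corrected_bool_type` is hypothetical, a stand-in for the failing branch in PySpark's `_to_corrected_pandas_type`, and is not necessarily what apache/spark#37817 does.

```python
import numpy as np


def to_corrected_bool_type():
    # Hypothetical stand-in for the branch that returned `np.bool`.
    # `np.bool_` is NumPy's boolean scalar type and exists in every release,
    # so it does not depend on the alias that NumPy 1.24 removed.
    return np.bool_


# Cast an integer array using the returned type, as a conversion step might.
flags = np.array([1, 0, 2]).astype(to_corrected_bool_type())
print(flags.dtype, flags.tolist())
```

The builtin `bool` would work equally well as the returned type here, since NumPy maps it to the same boolean dtype.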

@AlenkaF
Member

AlenkaF commented Jan 16, 2023

... ah yes, the version of PySpark in the CI build is 3.2.0 (I could have just looked at the title instead of the CI setup :D).

I guess we should change numpy version till the new release of PySpark is out with the fix?
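
To make the pinning decision concrete, here is a rough sketch of which NumPy releases still expose a `numpy.bool` attribute. The helper `numpy_exposes_np_bool` is hypothetical and not part of the Arrow CI. It relies on two facts: the alias was removed in 1.24, and NumPy 2.0 reintroduced the name as the NumPy boolean scalar type.

```python
def numpy_exposes_np_bool(version: str) -> bool:
    """Return True if this NumPy release has a `numpy.bool` attribute.

    Hypothetical helper for illustration only.
    """
    # Compare on (major, minor); patch and pre-release suffixes are ignored.
    major, minor = (int(part) for part in version.split(".")[:2])
    # Present before 1.24, absent in 1.24 through 1.26, present again from 2.0.
    return (major, minor) < (1, 24) or major >= 2


for candidate in ("1.23.5", "1.24.0", "1.26.4", "2.0.0"):
    print(candidate, numpy_exposes_np_bool(candidate))
```

Under these assumptions, any pin below 1.24 keeps an unpatched PySpark 3.2.0 working.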

@jorisvandenbossche
Member

> I guess we should change numpy version till the new release of PySpark is out with the fix?

That sounds correct

@AlenkaF
Member

AlenkaF commented Jan 16, 2023

Great, will make a PR.

raulcd pushed a commit that referenced this issue Mar 1, 2023
…buteError on numpy.bool (#33714)

### Rationale for this change
Fix for nightly integration tests with PySpark 3.2.0 failure.

### What changes are included in this PR?
NumPy version pin in `docker-compose.yml`.

### Are these changes tested?
Will test on the open PR with the CI.

### Are there any user-facing changes?
No.
* Closes: #33697

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
raulcd added this to the 12.0.0 milestone Mar 1, 2023