Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import Pickler class directly from the pickle module #458

Closed
jvesely opened this issue Nov 3, 2021 · 6 comments · Fixed by #469
Closed

import Pickler class directly from the pickle module #458

jvesely opened this issue Nov 3, 2021 · 6 comments · Fixed by #469

Comments

@jvesely
Copy link
Contributor

jvesely commented Nov 3, 2021

from _pickle import Pickler # noqa: F401

uses _pickle module to import Pickler. However, as per python 3.8 documentation[0] this class is available directly in the pickle module.

from pickle import Pickler

works ok and it is equivalent to _pickle.Pickler

$ python 
Python 3.9.7 (default, Nov  1 2021, 14:08:06) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from _pickle import Pickler as p
>>> import pickle
>>> p is pickle.Pickler
True
>>>

[0] https://docs.python.org/3.8/library/pickle.html#pickle.Pickler

@jvesely
Copy link
Contributor Author

jvesely commented Nov 3, 2021

This also fixes issues with pypy3.8:

    import sys
    
    
    if sys.version_info < (3, 8):
        try:
            import pickle5 as pickle  # noqa: F401
            from pickle5 import Pickler  # noqa: F401
        except ImportError:
            import pickle  # noqa: F401
            from pickle import _Pickler as Pickler  # noqa: F401
    else:
        import pickle  # noqa: F401
>       from _pickle import _Pickler as Pickler  # noqa: F401
E       ModuleNotFoundError: No module named '_pickle'

../pypy38-venv/lib/pypy3.8/site-packages/joblib/externals/cloudpickle/compat.py:13: ModuleNotFoundError

@pierreglaser
Copy link
Member

pierreglaser commented Feb 4, 2022

Hi @jvesely, Is there a rationale for changing from _pickle import Pickler to from pickle import Pickler, other than fixing cloudpickle on pypy 3.8? If not, i'd rather keep this line, which need not be changed for other implementations than pypy, and branch the behavior for pypy in other to restore compatibility between cloudpickle, and pypy >= 3.8

@jvesely
Copy link
Contributor Author

jvesely commented Feb 4, 2022

Hi @jvesely, Is there a rationale for changing from _pickle import Pickler to from pickle import Pickler, other than fixing cloudpickle on pypy 3.8? If not, i'd rather keep this line, which need not be changed for other implementations than pypy, and branch the behavior for pypy in other to restore compatibility between cloudpickle, and pypy >= 3.8

This is not specific to version >= 3.8. Python 3.7 includes pickler.Pickler [0] and the same short script proves that _pickle.Pickler and pickle.Pickler. The same is true for Python3.6 as well.[1]

according to the python3 specs [2]: "Users should always import the standard version,". Using _pickle is just a non-portable way of doing the same thing.

This non-portability goes beyond PyPY. The same docs [2]: "In Python 3.0, the accelerated versions are considered implementation details of the pure Python versions." so the names and availability of any _pickle attributes can change between CPython versions.

[0] https://docs.python.org/3.7/library/pickle.html#pickle.Pickler
[1] https://docs.python.org/3.6/library/pickle.html#pickle.Pickler
[2] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

@bsnresearcher
Copy link

I hope that this problem will be solve.
Scikit-learn package uses cloudpickle inside. And it causes the same above problem.

Perhaps, there are a lot of people want to use pypy because their code like using scikit-learn that scientific calculation code may require a lot of calculation so need speed up.

So solved this will contribute to their work a lot.

@jvesely
Copy link
Contributor Author

jvesely commented Apr 22, 2022

@pierreglaser any ETA on a fix for this? It's rather straightforward. I can open a PR if needed.

@pierreglaser
Copy link
Member

Hey @jvesely, PR welcome, I'll review it.

jvesely added a commit to jvesely/cloudpickle that referenced this issue May 6, 2022
The latter is an implementation detail (see [0]),
and the docs instruct users to "always import the standard version"[0].
Moreover, "pickle.Pickler" is the same as "_pickle.Pickler",
for all recent CPython releases (tested 3.6->3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: cloudpipe#458

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
jvesely added a commit to jvesely/cloudpickle that referenced this issue May 18, 2022
The latter is an implementation detail (see [0]),
and the docs instruct users to "always import the standard version"[0].
Moreover, "pickle.Pickler" is the same as "_pickle.Pickler",
for all recent CPython releases (tested 3.6->3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: cloudpipe#458

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
jvesely added a commit to jvesely/cloudpickle that referenced this issue May 20, 2022
The latter is an implementation detail (see [0]),
and the docs instruct users to "always import the standard version"[0].
Moreover, "pickle.Pickler" is the same as "_pickle.Pickler",
for all recent CPython releases (tested 3.6->3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: cloudpipe#458

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
ogrisel pushed a commit that referenced this issue May 24, 2022
The latter is an implementation detail (see [0]),
and the docs instruct users to "always import the standard version"[0].
Moreover, "pickle.Pickler" is the same as "_pickle.Pickler",
for all recent CPython releases (tested 3.6->3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: #458

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
dongjoon-hyun added a commit to apache/spark that referenced this issue May 3, 2023
### What changes were proposed in this pull request?

This PR aims two goals.
1. Make PySpark support Python 3.8+ with PyPy3
2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage

Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation.

### Why are the changes needed?

Previously, PySpark fails at PyPy3 `Python 3.8` environment.
```
pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'
```

To support Python 3.8 in PyPy3.
- From PyPy3.8, `_pickle` is removed.
  - cloudpipe/cloudpickle#458
- We need this change.
  - cloudpipe/cloudpickle#469

### Does this PR introduce _any_ user-facing change?

This is an additional support.

### How was this patch tested?

Pass the CIs.

Closes #41024 from dongjoon-hyun/SPARK-43348.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
LuciferYang pushed a commit to LuciferYang/spark that referenced this issue May 10, 2023
### What changes were proposed in this pull request?

This PR aims two goals.
1. Make PySpark support Python 3.8+ with PyPy3
2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage

Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation.

### Why are the changes needed?

Previously, PySpark fails at PyPy3 `Python 3.8` environment.
```
pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'
```

To support Python 3.8 in PyPy3.
- From PyPy3.8, `_pickle` is removed.
  - cloudpipe/cloudpickle#458
- We need this change.
  - cloudpipe/cloudpickle#469

### Does this PR introduce _any_ user-facing change?

This is an additional support.

### How was this patch tested?

Pass the CIs.

Closes apache#41024 from dongjoon-hyun/SPARK-43348.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants