import Pickler class directly from the pickle module #458

jvesely · 2021-11-03T06:13:03Z

Line 13 in 7ddb089

from _pickle import Pickler # noqa: F401

uses _pickle module to import Pickler. However, as per python 3.8 documentation[0] this class is available directly in the pickle module.

from pickle import Pickler

works ok and it is equivalent to _pickle.Pickler

$ python 
Python 3.9.7 (default, Nov  1 2021, 14:08:06) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from _pickle import Pickler as p
>>> import pickle
>>> p is pickle.Pickler
True
>>>

[0] https://docs.python.org/3.8/library/pickle.html#pickle.Pickler

The text was updated successfully, but these errors were encountered:

jvesely · 2021-11-03T06:13:49Z

This also fixes issues with pypy3.8:

    import sys
    
    
    if sys.version_info < (3, 8):
        try:
            import pickle5 as pickle  # noqa: F401
            from pickle5 import Pickler  # noqa: F401
        except ImportError:
            import pickle  # noqa: F401
            from pickle import _Pickler as Pickler  # noqa: F401
    else:
        import pickle  # noqa: F401
>       from _pickle import _Pickler as Pickler  # noqa: F401
E       ModuleNotFoundError: No module named '_pickle'

../pypy38-venv/lib/pypy3.8/site-packages/joblib/externals/cloudpickle/compat.py:13: ModuleNotFoundError

pierreglaser · 2022-02-04T17:01:53Z

Hi @jvesely, Is there a rationale for changing from _pickle import Pickler to from pickle import Pickler, other than fixing cloudpickle on pypy 3.8? If not, i'd rather keep this line, which need not be changed for other implementations than pypy, and branch the behavior for pypy in other to restore compatibility between cloudpickle, and pypy >= 3.8

jvesely · 2022-02-04T19:34:06Z

Hi @jvesely, Is there a rationale for changing from _pickle import Pickler to from pickle import Pickler, other than fixing cloudpickle on pypy 3.8? If not, i'd rather keep this line, which need not be changed for other implementations than pypy, and branch the behavior for pypy in other to restore compatibility between cloudpickle, and pypy >= 3.8

This is not specific to version >= 3.8. Python 3.7 includes pickler.Pickler [0] and the same short script proves that _pickle.Pickler and pickle.Pickler. The same is true for Python3.6 as well.[1]

according to the python3 specs [2]: "Users should always import the standard version,". Using _pickle is just a non-portable way of doing the same thing.

This non-portability goes beyond PyPY. The same docs [2]: "In Python 3.0, the accelerated versions are considered implementation details of the pure Python versions." so the names and availability of any _pickle attributes can change between CPython versions.

[0] https://docs.python.org/3.7/library/pickle.html#pickle.Pickler
[1] https://docs.python.org/3.6/library/pickle.html#pickle.Pickler
[2] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

bsnresearcher · 2022-04-22T00:38:19Z

I hope that this problem will be solve.
Scikit-learn package uses cloudpickle inside. And it causes the same above problem.

Perhaps, there are a lot of people want to use pypy because their code like using scikit-learn that scientific calculation code may require a lot of calculation so need speed up.

So solved this will contribute to their work a lot.

jvesely · 2022-04-22T15:10:28Z

@pierreglaser any ETA on a fix for this? It's rather straightforward. I can open a PR if needed.

pierreglaser · 2022-04-22T15:19:41Z

Hey @jvesely, PR welcome, I'll review it.

The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version"[0]. Moreover, "pickle.Pickler" is the same as "_pickle.Pickler", for all recent CPython releases (tested 3.6->3.9). [0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes Fixes: cloudpipe#458 Signed-off-by: Jan Vesely <jano.vesely@gmail.com>

The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version"[0]. Moreover, "pickle.Pickler" is the same as "_pickle.Pickler", for all recent CPython releases (tested 3.6->3.9). [0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes Fixes: #458 Signed-off-by: Jan Vesely <jano.vesely@gmail.com>

### What changes were proposed in this pull request? This PR aims two goals. 1. Make PySpark support Python 3.8+ with PyPy3 2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation. ### Why are the changes needed? Previously, PySpark fails at PyPy3 `Python 3.8` environment. ``` pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13) [PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log) Traceback (most recent call last): File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details __import__(pkg_name) File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module> from pyspark.rdd import RDD, RDDBarrier File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module> from pyspark.java_gateway import local_connect_and_auth File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module> from pyspark.serializers import read_int, write_with_length, UTF8Deserializer File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module> from pyspark import cloudpickle File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module> from pyspark.cloudpickle.cloudpickle import * # noqa File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module> from .compat import pickle File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module> from _pickle import Pickler # noqa: F401 ModuleNotFoundError: No module named '_pickle' ``` To support Python 3.8 in PyPy3. - From PyPy3.8, `_pickle` is removed. - cloudpipe/cloudpickle#458 - We need this change. - cloudpipe/cloudpickle#469 ### Does this PR introduce _any_ user-facing change? This is an additional support. ### How was this patch tested? Pass the CIs. Closes #41024 from dongjoon-hyun/SPARK-43348. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

### What changes were proposed in this pull request? This PR aims two goals. 1. Make PySpark support Python 3.8+ with PyPy3 2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation. ### Why are the changes needed? Previously, PySpark fails at PyPy3 `Python 3.8` environment. ``` pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13) [PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log) Traceback (most recent call last): File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details __import__(pkg_name) File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module> from pyspark.rdd import RDD, RDDBarrier File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module> from pyspark.java_gateway import local_connect_and_auth File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module> from pyspark.serializers import read_int, write_with_length, UTF8Deserializer File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module> from pyspark import cloudpickle File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module> from pyspark.cloudpickle.cloudpickle import * # noqa File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module> from .compat import pickle File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module> from _pickle import Pickler # noqa: F401 ModuleNotFoundError: No module named '_pickle' ``` To support Python 3.8 in PyPy3. - From PyPy3.8, `_pickle` is removed. - cloudpipe/cloudpickle#458 - We need this change. - cloudpipe/cloudpickle#469 ### Does this PR introduce _any_ user-facing change? This is an additional support. ### How was this patch tested? Pass the CIs. Closes apache#41024 from dongjoon-hyun/SPARK-43348. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

jvesely mentioned this issue Nov 3, 2021

cloudpickle should import Pickler class directly from the pickle module joblib/joblib#1238

Closed

jlaasonen mentioned this issue Jan 20, 2022

Update to PyPy 3.8 ElectronicBabylonianLiterature/ebl-api#184

Merged

jvesely mentioned this issue May 6, 2022

compat: Import Pickler from "pickle" instead of "_pickle" #469

Merged

zwelz3 mentioned this issue May 11, 2022

Upstream Forensics jupyrdf/ipyradiant#127

Open

ogrisel closed this as completed in #469 May 24, 2022

dongjoon-hyun mentioned this issue May 3, 2023

[SPARK-43348][PYTHON] Support Python 3.8 in PyPy3 apache/spark#41024

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

import Pickler class directly from the pickle module #458

import Pickler class directly from the pickle module #458

jvesely commented Nov 3, 2021

jvesely commented Nov 3, 2021

pierreglaser commented Feb 4, 2022 •

edited

jvesely commented Feb 4, 2022 •

edited

bsnresearcher commented Apr 22, 2022

jvesely commented Apr 22, 2022

pierreglaser commented Apr 22, 2022

import Pickler class directly from the pickle module #458

import Pickler class directly from the pickle module #458

Comments

jvesely commented Nov 3, 2021

jvesely commented Nov 3, 2021

pierreglaser commented Feb 4, 2022 • edited

jvesely commented Feb 4, 2022 • edited

bsnresearcher commented Apr 22, 2022

jvesely commented Apr 22, 2022

pierreglaser commented Apr 22, 2022

pierreglaser commented Feb 4, 2022 •

edited

jvesely commented Feb 4, 2022 •

edited