New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
import Pickler class directly from the pickle module #458
Comments
This also fixes issues with pypy3.8:
|
Hi @jvesely, Is there a rationale for changing |
This is not specific to version >= 3.8. Python 3.7 includes according to the python3 specs [2]: "Users should always import the standard version,". Using This non-portability goes beyond PyPY. The same docs [2]: "In Python 3.0, the accelerated versions are considered implementation details of the pure Python versions." so the names and availability of any [0] https://docs.python.org/3.7/library/pickle.html#pickle.Pickler |
I hope that this problem will be solve. Perhaps, there are a lot of people want to use pypy because their code like using scikit-learn that scientific calculation code may require a lot of calculation so need speed up. So solved this will contribute to their work a lot. |
@pierreglaser any ETA on a fix for this? It's rather straightforward. I can open a PR if needed. |
Hey @jvesely, PR welcome, I'll review it. |
The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version"[0]. Moreover, "pickle.Pickler" is the same as "_pickle.Pickler", for all recent CPython releases (tested 3.6->3.9). [0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes Fixes: cloudpipe#458 Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version"[0]. Moreover, "pickle.Pickler" is the same as "_pickle.Pickler", for all recent CPython releases (tested 3.6->3.9). [0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes Fixes: cloudpipe#458 Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version"[0]. Moreover, "pickle.Pickler" is the same as "_pickle.Pickler", for all recent CPython releases (tested 3.6->3.9). [0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes Fixes: cloudpipe#458 Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version"[0]. Moreover, "pickle.Pickler" is the same as "_pickle.Pickler", for all recent CPython releases (tested 3.6->3.9). [0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes Fixes: #458 Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
### What changes were proposed in this pull request? This PR aims two goals. 1. Make PySpark support Python 3.8+ with PyPy3 2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation. ### Why are the changes needed? Previously, PySpark fails at PyPy3 `Python 3.8` environment. ``` pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13) [PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log) Traceback (most recent call last): File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details __import__(pkg_name) File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module> from pyspark.rdd import RDD, RDDBarrier File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module> from pyspark.java_gateway import local_connect_and_auth File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module> from pyspark.serializers import read_int, write_with_length, UTF8Deserializer File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module> from pyspark import cloudpickle File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module> from pyspark.cloudpickle.cloudpickle import * # noqa File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module> from .compat import pickle File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module> from _pickle import Pickler # noqa: F401 ModuleNotFoundError: No module named '_pickle' ``` To support Python 3.8 in PyPy3. - From PyPy3.8, `_pickle` is removed. - cloudpipe/cloudpickle#458 - We need this change. - cloudpipe/cloudpickle#469 ### Does this PR introduce _any_ user-facing change? This is an additional support. ### How was this patch tested? Pass the CIs. Closes #41024 from dongjoon-hyun/SPARK-43348. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? This PR aims two goals. 1. Make PySpark support Python 3.8+ with PyPy3 2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation. ### Why are the changes needed? Previously, PySpark fails at PyPy3 `Python 3.8` environment. ``` pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13) [PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log) Traceback (most recent call last): File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details __import__(pkg_name) File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module> from pyspark.rdd import RDD, RDDBarrier File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module> from pyspark.java_gateway import local_connect_and_auth File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module> from pyspark.serializers import read_int, write_with_length, UTF8Deserializer File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module> from pyspark import cloudpickle File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module> from pyspark.cloudpickle.cloudpickle import * # noqa File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module> from .compat import pickle File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module> from _pickle import Pickler # noqa: F401 ModuleNotFoundError: No module named '_pickle' ``` To support Python 3.8 in PyPy3. - From PyPy3.8, `_pickle` is removed. - cloudpipe/cloudpickle#458 - We need this change. - cloudpipe/cloudpickle#469 ### Does this PR introduce _any_ user-facing change? This is an additional support. ### How was this patch tested? Pass the CIs. Closes apache#41024 from dongjoon-hyun/SPARK-43348. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
cloudpickle/cloudpickle/compat.py
Line 13 in 7ddb089
uses _pickle module to import Pickler. However, as per python 3.8 documentation[0] this class is available directly in the pickle module.
works ok and it is equivalent to _pickle.Pickler
[0] https://docs.python.org/3.8/library/pickle.html#pickle.Pickler
The text was updated successfully, but these errors were encountered: