pyjlwrap support for Pickle Serialization #863

Open
dmoliveira opened this issue Nov 14, 2020 · 5 comments

Comments

dmoliveira commented Nov 14, 2020

PyCall works nicely for many use cases between Python and Julia. One use case, however, could be improved, and it is very important for the data science community. For example, I tried PyCall with the PySpark library, and it works very well for the basic use case. But if the user needs to create a UDF (user-defined function), serializing the function fails.
UDFs would help many data scientists reuse Julia code while calling Spark to do the heavy work. Enabling this would broaden the use of Julia in different scenarios.

To solve the current issue with UDFs, PyObject needs to be serializable with pickle. I don't have much of an idea how to solve this, but I have a simple use case that, if fixed, would be a step toward this functionality:

Example:

using PyCall
pickle = pyimport("pickle")
pickle.dumps(x -> x + 1)

Error:

ERROR: PyError ($(Expr(:escape, :(ccall(#= /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("cannot pickle 'PyCall.jlwrap' object")

Stacktrace:
 [1] pyerr_check at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:60 [inlined]
 [2] pyerr_check at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:64 [inlined]
 [3] _handle_error(::String) at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:81
 [4] macro expansion at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:95 [inlined]
 [5] #110 at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:43 [inlined]
 [6] disable_sigint at ./c.jl:446 [inlined]
 [7] __pycall! at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:42 [inlined]
 [8] _pycall!(::PyObject, ::PyObject, ::Tuple{var"#3#4"}, ::Int64, ::Ptr{Nothing}) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:29
 [9] _pycall!(::PyObject, ::PyObject, ::Tuple{var"#3#4"}, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:11
 [10] (::PyObject)(::Function; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:86
 [11] (::PyObject)(::Function) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:86
 [12] top-level scope at REPL[13]:1

Reference to UDF in Python: https://docs.databricks.com/spark/latest/spark-sql/udf-python.html

@stevengj stevengj changed the title PyObject support for Pickle Serialization pyjlwrap support for Pickle Serialization Nov 16, 2020
@stevengj
Member

In other words, you want to serialize Julia objects (wrapped in Python objects) via Pickle.

I guess we could do this by embedding the Julia serialization format (via the Serialization stdlib) in pickle?

@dmoliveira
Author

Exactly, @stevengj. How can we accomplish this? Could you provide some guidance, please?

@stevengj
Member

I think it involves overloading __getstate__ and __setstate__ (https://docs.python.org/3/library/pickle.html#object.__getstate__), but I would have to do a bit of reading on pickle and how it interacts with the C API.
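
For readers unfamiliar with these hooks, here is a minimal pure-Python sketch of how __getstate__ and __setstate__ let an object with an unpicklable member survive a pickle round trip. It is not PyCall code: the Wrapper class is hypothetical, and a threading.Lock merely stands in for a native handle that pickle rejects.

```python
import pickle
import threading


class Wrapper:
    """Hypothetical wrapper: one picklable field plus one unpicklable handle."""

    def __init__(self, payload):
        self.payload = payload
        # Assumption for illustration: a lock stands in for a native handle
        # that pickle cannot serialize.
        self._handle = threading.Lock()

    def __getstate__(self):
        # Hand pickle only the part of the state that can be serialized.
        return {"payload": self.payload}

    def __setstate__(self, state):
        # Restore the picklable field and create a fresh handle on load.
        self.payload = state["payload"]
        self._handle = threading.Lock()


original = Wrapper([1, 2, 3])
restored = pickle.loads(pickle.dumps(original))
```

Without the two hooks, pickle would try to serialize the lock and raise a TypeError, much like the jlwrap error reported above.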

@stevengj
Member

Or rather, we probably want the lower-level __reduce__ interface (https://docs.python.org/3/library/pickle.html#object.__reduce__), which is more error-prone but will give us more control.
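
As a rough illustration of the __reduce__ approach combined with embedding a foreign serialization format in pickle, the pure-Python sketch below stores an opaque byte blob and a reconstructor function. Everything here is an assumption for illustration: ForeignWrapper and rebuild_wrapper are hypothetical names, and JSON stands in for the bytes that Julia's Serialization stdlib would produce. It does not show how PyCall would implement this.

```python
import json
import pickle
import threading


def rebuild_wrapper(blob):
    """Hypothetical reconstructor: pickle calls it on load with the stored bytes."""
    return ForeignWrapper(json.loads(blob.decode("utf-8")))


class ForeignWrapper:
    """Hypothetical wrapper around a value owned by another runtime."""

    def __init__(self, value):
        self.value = value
        # Assumption: a lock stands in for a native pointer pickle cannot handle.
        self._handle = threading.Lock()

    def __reduce__(self):
        # Serialize with the "foreign" format (JSON as a stand-in) and tell
        # pickle to call rebuild_wrapper(blob) when loading.
        blob = json.dumps(self.value).encode("utf-8")
        return (rebuild_wrapper, (blob,))


data = pickle.dumps(ForeignWrapper({"a": 1}))
restored = pickle.loads(data)
```

With this pattern pickle never inspects the instance attributes; it records only a reference to the reconstructor and the byte blob, so the reconstructor must be importable wherever the pickle is loaded.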

@dmoliveira
Author

dmoliveira commented Nov 16, 2020

Great, @stevengj! If we can overcome this, it would be a huge step for the Julia community, and I would be glad to publish an article showing this awesome new feature!
