Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of dill as a module in Alibi causes problems if a user runs joblib #447

Closed
ukclivecox opened this issue Jul 14, 2021 · 7 comments · Fixed by #455
Closed

Use of dill as a module in Alibi causes problems if a user runs joblib #447

ukclivecox opened this issue Jul 14, 2021 · 7 comments · Fixed by #455

Comments

@ukclivecox
Copy link

I ran moviesentiment notebook: https://github.com/SeldonIO/alibi/blob/master/examples/anchor_text_movie.ipynb

After training of sklearn model I added a new line to save model using joblib:

from joblib import dump
dump(skmodel, 'model.joblib') 

Loading that into a new environment with sklearn fails with:

    self._joblib = joblib.load(model_file)
  File "/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/conda/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/opt/conda/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/opt/conda/lib/python3.7/pickle.py", line 1376, in load_global
    klass = self.find_class(module, name)
  File "/opt/conda/lib/python3.7/pickle.py", line 1426, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'dill'

This would seem to be due to dill "magic" where it on load disrupts some core python module handling that means it can cause problems when a user uses joblib.

This is fine if a user knows dill is active but the above import of dill is hidden inside alibi module.

@jklaise
Copy link
Member

jklaise commented Jul 14, 2021

Pretty bad behaviour from dill! A workaround would be importing dill on-demand (inside save, load functions) and then cleaning it up before functions exit, but it's not the best solution. To start with would be good to gather existing reports of dill and pickle (or joblib which is based on pickle) incompatibilities.

@ukclivecox
Copy link
Author

This might adds some background: uqfoundation/dill#383 (comment)

@jklaise
Copy link
Member

jklaise commented Jul 16, 2021

@cliveseldon I can't reproduce this either in Python 3.8 or Python 3.7, regardless whether I load alibi or dill before calling joblib.load or not. Do you have a minimal example with scripts for the saving and loading and the accompanying details of the environments where these are run?

@ukclivecox
Copy link
Author

Did you test by saving in an environment where dill is present and loading in an environment where dill is not present?

@jklaise
Copy link
Member

jklaise commented Jul 16, 2021

I see, I can reproduce the issue now when dill is not present in the loading environment.

@jklaise
Copy link
Member

jklaise commented Jul 16, 2021

The culprit is this function: https://github.com/uqfoundation/dill/blob/4bdc5dfa501c536aa9b7415b13c58947e6409773/dill/__init__.py#L95

Basically, just by importing dill, the pickle dispatch table is populated with types that dill understands how to serialize/deserialize and using pickle or joblib those types are then "intercepted" by dill functions. So if dill is not present in a future environment, loading will fail.

A solution would be to do the following in alibi (and alibi-detect):

import dill
from dill import extend
extend(use_dill=False)

However, we should check if after calling the above, the using the dill public API can still serialize a strict superset of types that pickle can. It's possible the extension side-effect is necessary for dill function. In that case we would need a context manager whenever calling dill.dump/load that temporarily switched to use_dill=True.

Edit: Looking at the following line it might be the case that indeed dill.dump uses the stock pickle Pickler, so if use_dill=False I would not expect dill to work: https://github.com/uqfoundation/dill/blob/4bdc5dfa501c536aa9b7415b13c58947e6409773/dill/_dill.py#L498

@jklaise
Copy link
Member

jklaise commented Jul 21, 2021

Inspecting the source code and the dispatch tables I've confirmed that calling dill.extend(use_dill=False) reverts the pickle behaviour to what's expected from the standard library whilst keeping dill functionality intact as the dispatch table of dill's Pickler is unaffected.

It feels prudent to call dill.extend(use_dill=False) in library usage immediately after importing dill so as not to have unintended side-effects on users, consider doing this for tempo and wherever else we use dill @cliveseldon @adriangonz .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants