Multiprocessing in notebooks evaluated with autograde #24

Open
Feelx234 opened this issue Feb 11, 2021 · 2 comments

@Feelx234
Contributor

Feelx234 commented Feb 11, 2021

I found something weird when multiprocessing is used with autograde. This is not a major concern, just something to keep in mind when moving forward.

Let's say a notebook contains roughly the following:

from multiprocessing import Pool

class Multiplier:
    def __init__(self, val):
        self.val = val

    def mult(self, x):
        return x * self.val

def mult_by3_multiprocessing(x_list):
    cls = Multiplier(3)
    pool = Pool(2)
    # the bound method cls.mult is pickled and sent to the worker processes
    _out = pool.map(cls.mult, x_list)
    pool.close()
    return _out

Now assume we want to test the mult_by3_multiprocessing function.
Calling it from a test fails with a pickling error.
The problem is that pickle can't find the Multiplier class, because it is defined in the state_dict of the notebook rather than in the state_dict of test.py. For context: when using multiprocessing, objects are shipped to other processes internally using pickle.
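To illustrate the point about pickle without involving a process pool, here is a minimal sketch. It assumes the notebook code is executed with exec into a plain dict; the name "notebook_state" is a hypothetical stand-in for that state and is not autograde's actual internals. Pickle stores classes and functions by reference (module name plus qualified name), so it has to be able to import the defining module again.

```python
import pickle

NOTEBOOK_CODE = """
class Multiplier:
    def __init__(self, val):
        self.val = val

    def mult(self, x):
        return x * self.val
"""


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# Hypothetical stand-in for the notebook's state: a plain dict whose
# __name__ does not correspond to any importable module.
state = {"__name__": "notebook_state"}
exec(NOTEBOOK_CODE, state)

bound_method = state["Multiplier"](3).mult

# Calling the method directly is fine ...
result_direct = bound_method(2)

# ... but pickling it fails, because pickle cannot import "notebook_state"
# to look up the Multiplier class by name.
picklable = can_pickle(bound_method)
```

Here result_direct is 6 while picklable is False, which matches the behaviour described above: the code itself is fine, only the by-name lookup during pickling breaks.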

Just something to keep in mind

@0b11001111
Contributor

0b11001111 commented Feb 11, 2021

Your example can be boiled down to

CODE = """

from multiprocessing import Pool

def square(x):
    return x ** 2

def parallel_square(x_list):
    with Pool(3) as pool:
        return pool.map(square, x_list)
        
print('secret:', globals().get('secret'))
print(parallel_square([1, 2, 3]))

"""

secret = 42

if __name__ == '__main__':
    # same as exec(CODE), works as expected but leaks current scope
    exec(CODE, globals())

    # crashes
    exec(CODE, dict())

which gives

secret: 42
[1, 4, 9]

secret: None
Traceback (most recent call last):
  File "/tmp/playground.py", line 24, in <module>
    exec(CODE, dict())
  File "<string>", line 13, in <module>
  File "<string>", line 10, in parallel_square
  File "/usr/lib64/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib64/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib64/python3.9/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function square at 0x7f2044e9bc10>: it's not the same object as __main__.square
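As far as I understand it, the wording of that error ("not the same object as __main__.square") comes from how the function is created: when the globals dict passed to exec has no __name__ entry, the function ends up without a module name, and pickle then falls back to looking it up in __main__. There it finds the other square left behind by the first exec call. A small sketch of the first half of that explanation (the variable names are only for illustration):

```python
SNIPPET = """
def square(x):
    return x ** 2
"""

# Fresh, empty globals: exec adds __builtins__ but no __name__.
fresh_ns = {}
exec(SNIPPET, fresh_ns)

# Globals that carry a __name__, similar to what globals() provides in a script.
named_ns = {"__name__": "some_module"}
exec(SNIPPET, named_ns)

module_in_fresh_ns = fresh_ns["square"].__module__
module_in_named_ns = named_ns["square"].__module__

print(module_in_fresh_ns, module_in_named_ns)
```

In CPython the first value is None and the second is the __name__ taken from the globals dict, which is what pickle uses to decide where to look the function up.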

I found this thread explaining the same(?) class of issues.

Right now, I don't see a way to solve this class of errors in general. Interesting topic though, I'll keep investigating ;)
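For reference, one partial workaround that might be worth trying is to execute the code in the namespace of a synthetic module registered in sys.modules, so that pickle can resolve names by reference without leaking the current scope. This is only a sketch and not a general fix: exec_as_module and the module name "notebook_sandbox" are made up for illustration, and worker processes started with the spawn start method will not have this module registered, so it can only be expected to help where workers are forked from the parent.

```python
import pickle
import sys
import types

CODE = """
def square(x):
    return x ** 2
"""


def exec_as_module(code, module_name):
    """Run code inside a fresh module object registered in sys.modules.

    Hypothetical helper, not part of autograde.
    """
    module = types.ModuleType(module_name)
    sys.modules[module_name] = module  # lets pickle import it by name
    exec(code, module.__dict__)
    return module


secret = 42
nb = exec_as_module(CODE, "notebook_sandbox")

# The function now reports a module that pickle can find again.
restored = pickle.loads(pickle.dumps(nb.square))

# The surrounding scope is not visible inside the sandbox module.
leaked = "secret" in vars(nb)
```

In this sketch the round trip through pickle returns the very same function object, and secret stays out of the executed namespace.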

@0b11001111
Contributor

I'll extend the notebook template with a corresponding note.
