
Make sure subprocess/process/thread/dask can create a runner instance #816

Merged (16 commits), Mar 3, 2022

Conversation

Delaunay
Collaborator

@Delaunay Delaunay commented Mar 1, 2022

Hi there! Thank you for contributing. Feel free to use this pull request template; it helps us review your work at its true value.

Please remove the instructions in italics before posting the pull request :).

Description

Describe the purpose of this pull request and why it should be integrated into the repository.
When your changes modify the current behavior, explain why your solution is better.

If it solves a GitHub issue, be sure to link it.

Changes

Give an overview of the suggested solution.

Checklist

This is simply a reminder of what we are going to look for before merging your code.

Add an x in the boxes that apply.
If you're unsure about any of them, don't hesitate to ask. We're here to help!
You can also fill these out after creating the PR if it's a work in progress (in that case, be sure to publish the PR as a draft)

Tests

  • I added corresponding tests for bug fixes and new features. If possible, the tests fail without the changes
  • All new and existing tests are passing ($ tox -e py38; replace 38 with your Python version if necessary)

Documentation

  • I have updated the relevant documentation related to my changes

Quality

  • I have read the CONTRIBUTING doc
  • My commit messages follow this format
  • My code follows the style guidelines ($ tox -e lint)

Further comments

Please include any additional information or comment that you feel will be helpful to the review of this pull request.

@Delaunay
Collaborator Author

Delaunay commented Mar 1, 2022

#807

Member

@bouthilx bouthilx left a comment

Thanks!

@Delaunay Delaunay force-pushed the issue_807 branch 7 times, most recently from d3dbfe7 to 29ea940 on March 2, 2022 18:16
@Delaunay
Collaborator Author

Delaunay commented Mar 3, 2022

@jbeirer I think I was able to reproduce your issue (#807), but you will receive another exception once that part of the code is reached.
Was your code working before the Runner refactoring?

Could you provide a minimal example that reproduces the issue?
I tried to do so in that test but I might be missing some subtleties.

New Exception

   src/orion/executor/dask_backend.py:32: in get
      return self.future.result(timeout)
  .tox/py/lib/python3.8/site-packages/distributed/client.py:274: in result
      raise exc.with_traceback(tb)
  tests/unittests/client/test_runner.py:547: in run_runner
      runner = new_runner(0.1, n_workers=workers, backend=backend)
  tests/unittests/client/test_runner.py:107: in new_runner
      client = FakeClient(n_workers, backend=backend)
  tests/unittests/client/test_runner.py:53: in __init__
      self.executor = executor_factory.create(backend, n_workers)
  src/orion/core/utils/__init__.py:129: in create
      return constructor(*args, **kwargs)
  src/orion/executor/dask_backend.py:63: in __init__
      client = Client(**self.config)
  .tox/py/lib/python3.8/site-packages/distributed/client.py:933: in __init__
      self.start(timeout=timeout)
  .tox/py/lib/python3.8/site-packages/distributed/client.py:1091: in start
      sync(self.loop, self._start, **kwargs)
  .tox/py/lib/python3.8/site-packages/distributed/utils.py:378: in sync
      raise exc.with_traceback(tb)
  .tox/py/lib/python3.8/site-packages/distributed/utils.py:351: in f
      result = yield future
  .tox/py/lib/python3.8/site-packages/tornado/gen.py:762: in run
      value = future.result()
  .tox/py/lib/python3.8/site-packages/distributed/client.py:1158: in _start
      self.cluster = await LocalCluster(
  .tox/py/lib/python3.8/site-packages/distributed/deploy/spec.py:383: in _
      await self._correct_state()
  .tox/py/lib/python3.8/site-packages/distributed/deploy/spec.py:352: in _correct_state_internal
      await w  # for tornado gen.coroutine support
  .tox/py/lib/python3.8/site-packages/distributed/core.py:297: in _
      await self.start()
  .tox/py/lib/python3.8/site-packages/distributed/nanny.py:334: in start
      response = await self.instantiate()
  .tox/py/lib/python3.8/site-packages/distributed/nanny.py:417: in instantiate
      result = await self.process.start()
  .tox/py/lib/python3.8/site-packages/distributed/nanny.py:687: in start
      await self.process.start()
  .tox/py/lib/python3.8/site-packages/distributed/process.py:32: in _call_and_set_future
      res = func(*args, **kwargs)
  .tox/py/lib/python3.8/site-packages/distributed/process.py:186: in _start
      process.start()
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  >   assert not _current_process._config.get('daemon'), \
             'daemonic processes are not allowed to have children'
  E   AssertionError: daemonic processes are not allowed to have children
  
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/multiprocessing/process.py:118: AssertionError

Original Issue

 ----------------------------- Captured stderr call -----------------------------
  Traceback (most recent call last):
    File "/home/runner/work/orion/orion/tests/unittests/client/test_runner.py", line 557, in run_runner
      runner.run()
    File "/home/runner/work/orion/orion/src/orion/client/runner.py", line 239, in run
      with Protected():
    File "/home/runner/work/orion/orion/src/orion/client/runner.py", line 50, in __enter__
      self.handlers[signal.SIGINT] = signal.signal(signal.SIGINT, self.handler)
    File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/signal.py", line 47, in signal
      handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
  ValueError: signal only works in main thread
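The original failure is a separate CPython restriction: `signal.signal()` may only be called from the main thread, so installing a handler from a worker thread (as `Protected.__enter__` does when the runner executes off the main thread) raises `ValueError`. A minimal sketch of the restriction, independent of Orion:

```python
import signal
import threading

def install_handler(out):
    # signal.signal() is only allowed on the main thread; calling it from
    # a worker thread raises ValueError, as in the traceback above.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        out.append("installed")
    except ValueError as exc:
        out.append(str(exc))

out = []
worker = threading.Thread(target=install_handler, args=(out,))
worker.start()
worker.join()
print(out[0])  # "signal only works in main thread" (wording varies by Python version)
```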

@jbeirer

jbeirer commented Mar 3, 2022

Hi @Delaunay,

thanks for looking into this. Indeed, I did not have this kind of issue in the previous 0.2.1 version. The most minimal reproducer I can come up with for #807 is running two experiments in parallel:

# Orion imports
from orion.client import build_experiment
from orion.executor.base import executor_factory

# Dask imports
from distributed import Client

# Dummy objective
def main():
    return [{"name": "error", "type": "objective", "value": 1}]

if __name__ == '__main__':

    ########################################################################
    ### Assumes cluster, space, storage and algorithm variables are set  ###
    ########################################################################

    # Set number of trials and workers
    nMaxTrials = 1
    nWorkers = 1

    # Initialize client
    client = Client(cluster)
    cluster.scale(nWorkers)

    # Create Dask executor
    executor = executor_factory.create('dask', n_workers=nWorkers, client=client)

    # Encapsulate experiment execution in a method to parallelise experiments
    def exec_experiment(executor, nWorkers, experiment):
        experiment.workon(main, n_workers=nWorkers)

    # Build two experiments to run in parallel
    experimentNames = ['experiment1', 'experiment2']
    exp_futures = []
    for experimentName in experimentNames:
        experiment = build_experiment(
            name=experimentName,
            max_trials=nMaxTrials,
            space=space,
            storage=storage,
            algorithms=algorithm,
            executor=executor,
        )
        future = client.submit(exec_experiment, executor, nWorkers, experiment)
        exp_futures.append(future)

    # Gather results
    for exp_future in exp_futures:
        client.gather(exp_future)

Hope this helps!

Cheers,

Joshua

@Delaunay Delaunay merged commit 6a3f9be into Epistimio:develop Mar 3, 2022
@Delaunay Delaunay deleted the issue_807 branch March 3, 2022 19:16
@Delaunay
Collaborator Author

Delaunay commented Mar 3, 2022

@jbeirer I think I identified the root cause; the issue should be resolved now.
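One common way to make such a signal guard safe off the main thread is to only install and restore handlers when running on the main thread. A minimal sketch of that approach (hypothetical: the `Protected` name mirrors the traceback above, but this is not the code merged in this PR):

```python
import signal
import threading

class Protected:
    """Sketch of a signal-guarding context manager that degrades
    gracefully off the main thread (hypothetical, not the PR change)."""

    def __init__(self):
        self.handlers = {}
        # signal handlers can only be (un)installed from the main thread
        self.is_main = threading.current_thread() is threading.main_thread()

    def handler(self, signum, frame):
        # Placeholder: a real runner would defer or record the interruption.
        pass

    def __enter__(self):
        if self.is_main:
            self.handlers[signal.SIGINT] = signal.signal(signal.SIGINT, self.handler)
        return self

    def __exit__(self, *exc_info):
        if self.is_main:
            # Restore the previous handlers on exit.
            for sig, old in self.handlers.items():
                signal.signal(sig, old)
        return False
```

With this guard, entering the context from a worker thread becomes a no-op instead of raising `ValueError: signal only works in main thread`.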
