
Make sure subprocess/process/thread/dask can create a runner instance #816

Merged (16 commits), Mar 3, 2022

Conversation

Delaunay
Collaborator

@Delaunay Delaunay commented Mar 1, 2022

Hi there! Thank you for contributing. Feel free to use this pull request template; it helps us review your work at its true value.

Please remove the instructions in italics before posting the pull request :).

Description

Describe the purpose of this pull request and why it should be integrated into the repository.
When your changes modify the current behavior, explain why your solution is better.

If it solves a GitHub issue, be sure to link it.

Changes

Give an overview of the suggested solution.

Checklist

This is simply a reminder of what we are going to look for before merging your code.

Add an x in the boxes that apply.
If you're unsure about any of them, don't hesitate to ask. We're here to help!
You can also fill these out after creating the PR if it's a work in progress (in that case, be sure to publish the PR as a draft)

Tests

  • I added corresponding tests for bug fixes and new features. If possible, the tests fail without the changes
  • All new and existing tests are passing ($ tox -e py38; replace 38 with your Python version if necessary)

Documentation

  • I have updated the relevant documentation related to my changes

Quality

  • I have read the CONTRIBUTING doc
  • My commit messages follow this format
  • My code follows the style guidelines ($ tox -e lint)

Further comments

Please include any additional information or comment that you feel will be helpful to the review of this pull request.

@Delaunay
Collaborator Author

Delaunay commented Mar 1, 2022

#807

Member

@bouthilx bouthilx left a comment

Thanks!

@Delaunay Delaunay force-pushed the issue_807 branch 7 times, most recently from d3dbfe7 to 29ea940 on March 2, 2022 18:16
@Delaunay
Collaborator Author

Delaunay commented Mar 3, 2022

@jbeirer I think I was able to reproduce your issue (#807), but you will receive another exception once that part of the code is reached.
Was your code working before the Runner refactoring?

Could you provide a minimal example that reproduces the issue?
I tried to do so in that test but I might be missing some subtleties.

New Exception

   src/orion/executor/dask_backend.py:32: in get
      return self.future.result(timeout)
  .tox/py/lib/python3.8/site-packages/distributed/client.py:274: in result
      raise exc.with_traceback(tb)
  tests/unittests/client/test_runner.py:547: in run_runner
      runner = new_runner(0.1, n_workers=workers, backend=backend)
  tests/unittests/client/test_runner.py:107: in new_runner
      client = FakeClient(n_workers, backend=backend)
  tests/unittests/client/test_runner.py:53: in __init__
      self.executor = executor_factory.create(backend, n_workers)
  src/orion/core/utils/__init__.py:129: in create
      return constructor(*args, **kwargs)
  src/orion/executor/dask_backend.py:63: in __init__
      client = Client(**self.config)
  .tox/py/lib/python3.8/site-packages/distributed/client.py:933: in __init__
      self.start(timeout=timeout)
  .tox/py/lib/python3.8/site-packages/distributed/client.py:1091: in start
      sync(self.loop, self._start, **kwargs)
  .tox/py/lib/python3.8/site-packages/distributed/utils.py:378: in sync
      raise exc.with_traceback(tb)
  .tox/py/lib/python3.8/site-packages/distributed/utils.py:351: in f
      result = yield future
  .tox/py/lib/python3.8/site-packages/tornado/gen.py:762: in run
      value = future.result()
  .tox/py/lib/python3.8/site-packages/distributed/client.py:1158: in _start
      self.cluster = await LocalCluster(
  .tox/py/lib/python3.8/site-packages/distributed/deploy/spec.py:383: in _
      await self._correct_state()
  .tox/py/lib/python3.8/site-packages/distributed/deploy/spec.py:352: in _correct_state_internal
      await w  # for tornado gen.coroutine support
  .tox/py/lib/python3.8/site-packages/distributed/core.py:297: in _
      await self.start()
  .tox/py/lib/python3.8/site-packages/distributed/nanny.py:334: in start
      response = await self.instantiate()
  .tox/py/lib/python3.8/site-packages/distributed/nanny.py:417: in instantiate
      result = await self.process.start()
  .tox/py/lib/python3.8/site-packages/distributed/nanny.py:687: in start
      await self.process.start()
  .tox/py/lib/python3.8/site-packages/distributed/process.py:32: in _call_and_set_future
      res = func(*args, **kwargs)
  .tox/py/lib/python3.8/site-packages/distributed/process.py:186: in _start
      process.start()
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  >   assert not _current_process._config.get('daemon'), \
             'daemonic processes are not allowed to have children'
  E   AssertionError: daemonic processes are not allowed to have children
  
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/multiprocessing/process.py:118: AssertionError

Original Issue

 ----------------------------- Captured stderr call -----------------------------
  Traceback (most recent call last):
    File "/home/runner/work/orion/orion/tests/unittests/client/test_runner.py", line 557, in run_runner
      runner.run()
    File "/home/runner/work/orion/orion/src/orion/client/runner.py", line 239, in run
      with Protected():
    File "/home/runner/work/orion/orion/src/orion/client/runner.py", line 50, in __enter__
      self.handlers[signal.SIGINT] = signal.signal(signal.SIGINT, self.handler)
    File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/signal.py", line 47, in signal
      handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
  ValueError: signal only works in main thread
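The original failure is a separate CPython restriction: `signal.signal()` may only be called from the main thread, so installing a handler from a worker thread (as `Protected.__enter__` does when the runner executes off the main thread) raises `ValueError`. A minimal sketch of the restriction, independent of Orion:

```python
import signal
import threading

def install_handler(out):
    # signal.signal() is only allowed on the main thread; calling it from
    # a worker thread raises ValueError, as in the traceback above.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        out.append("installed")
    except ValueError as exc:
        out.append(str(exc))

out = []
worker = threading.Thread(target=install_handler, args=(out,))
worker.start()
worker.join()
print(out[0])  # "signal only works in main thread" (wording varies by Python version)
```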

@jbeirer

jbeirer commented Mar 3, 2022

Hi @Delaunay,

thanks for looking into this. Indeed, I did not have this kind of issue in the previous 0.2.1 version. The most minimal reproducer I can come up with for #807 is running two experiments in parallel:

# Orion imports
from orion.client import build_experiment
from orion.executor.base import executor_factory

# Dask imports
from distributed import Client

# Dummy objective
def main():
    return [{"name": "error", "type": "objective", "value": 1}]

if __name__ == '__main__':

    ########################################################################
    ### Assumes cluster, space, storage and algorithm variables are set  ###
    ########################################################################

    # Set number of trials and workers
    nMaxTrials = 1
    nWorkers = 1

    # Initialize client
    client = Client(cluster)
    cluster.scale(nWorkers)

    # Create Dask executor
    executor = executor_factory.create('dask', n_workers=nWorkers, client=client)

    # Encapsulate experiment execution in a method to parallelise experiments
    def exec_experiment(executor, nWorkers, experiment):
        experiment.workon(main, n_workers=nWorkers)

    # Build two experiments to run in parallel
    experimentNames = ['experiment1', 'experiment2']
    exp_futures = []
    for experimentName in experimentNames:
        experiment = build_experiment(
            name=experimentName,
            max_trials=nMaxTrials,
            space=space,
            storage=storage,
            algorithms=algorithm,
            executor=executor,
        )
        future = client.submit(exec_experiment, executor, nWorkers, experiment)
        exp_futures.append(future)

    # Gather results
    for exp_future in exp_futures:
        client.gather(exp_future)

Hope this helps!

Cheers,

Joshua

@Delaunay Delaunay merged commit 6a3f9be into Epistimio:develop Mar 3, 2022
@Delaunay Delaunay deleted the issue_807 branch March 3, 2022 19:16
@Delaunay
Collaborator Author

Delaunay commented Mar 3, 2022

@jbeirer I think I identified the root cause; the issue should be resolved now.
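One common way to make such a signal guard safe off the main thread is to only install and restore handlers when running on the main thread. A minimal sketch of that approach (hypothetical: the `Protected` name mirrors the traceback above, but this is not the code merged in this PR):

```python
import signal
import threading

class Protected:
    """Sketch of a signal-guarding context manager that degrades
    gracefully off the main thread (hypothetical, not the PR change)."""

    def __init__(self):
        self.handlers = {}
        # signal handlers can only be (un)installed from the main thread
        self.is_main = threading.current_thread() is threading.main_thread()

    def handler(self, signum, frame):
        # Placeholder: a real runner would defer or record the interruption.
        pass

    def __enter__(self):
        if self.is_main:
            self.handlers[signal.SIGINT] = signal.signal(signal.SIGINT, self.handler)
        return self

    def __exit__(self, *exc_info):
        if self.is_main:
            # Restore the previous handlers on exit.
            for sig, old in self.handlers.items():
                signal.signal(sig, old)
        return False
```

With this guard, entering the context from a worker thread becomes a no-op instead of raising `ValueError: signal only works in main thread`.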
