Ophyd-async devices give bad error when not able to connect #223

Closed
DominicOram opened this issue Oct 29, 2023 · 5 comments · Fixed by #393
Comments

@DominicOram
Contributor

If I have an ophyd-async device where one of the PVs cannot connect, I get the following error when I run the system tests (i.e. through make_all_devices):

Traceback (most recent call last):
  File "/scratch/ffv81422/hyperion/dodal/tests/system_tests/test_i03_system.py", line 23, in <module>
    make_all_devices(i03)
  File "/scratch/ffv81422/hyperion/dodal/src/dodal/utils.py", line 125, in make_all_devices
    devices: dict[str, AnyDevice] = invoke_factories(factories, **kwargs)
  File "/scratch/ffv81422/hyperion/dodal/src/dodal/utils.py", line 150, in invoke_factories
    devices[dependent_name] = factories[dependent_name](**params, **kwargs)
  File "/scratch/ffv81422/hyperion/dodal/src/dodal/beamlines/i03.py", line 160, in smargon
    return device_instantiation(
  File "/scratch/ffv81422/hyperion/dodal/src/dodal/utils.py", line 98, in wrapper
    return func(*args, **kwds)
  File "/scratch/ffv81422/hyperion/dodal/src/dodal/beamlines/beamline_utils.py", line 105, in device_instantiation
    _wait_for_connection(device_instance, sim=fake)
  File "/scratch/ffv81422/hyperion/dodal/src/dodal/beamlines/beamline_utils.py", line 51, in _wait_for_connection
    call_in_bluesky_event_loop(
  File "/scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/bluesky/run_engine.py", line 2649, in call_in_bluesky_event_loop
    return fut.result(timeout=timeout)
  File "/dls_sw/apps/python/miniforge/4.10.0-0/envs/python3.10/lib/python3.10/concurrent/futures/_base.py", line 460, in result
    raise TimeoutError()
concurrent.futures._base.TimeoutError
Task was destroyed but it is pending!
task: <Task pending name='Task-2' coro=<Device.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/device.py:48> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-1' coro=<wait_for_connection() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/utils.py:26> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_chain_future.<locals>._call_set_state() at /dls_sw/apps/python/miniforge/4.10.0-0/envs/python3.10/lib/python3.10/asyncio/futures.py:392]>
Task was destroyed but it is pending!
task: <Task pending name='Task-5' coro=<Device.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/device.py:48> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-27' coro=<Signal.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/_signal/signal.py:59> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-28' coro=<Signal.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/_signal/signal.py:59> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-29' coro=<Signal.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/_signal/signal.py:59> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-30' coro=<Signal.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/_signal/signal.py:59> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-31' coro=<Signal.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/_signal/signal.py:59> wait_for=<Future pending cb=[Task.task_wakeup()]>>
Task was destroyed but it is pending!
task: <Task pending name='Task-32' coro=<Signal.connect() done, defined at /scratch/ffv81422/hyperion/hyperion/.venv/lib/python3.10/site-packages/ophyd_async/core/_device/_signal/signal.py:59> wait_for=<Future pending cb=[Task.task_wakeup()]>>

In this case there was a Motor that had an incorrect PV. The error has two issues:

  • It contains no hint of which signal failed to connect
  • It is followed by a string of "Task was destroyed but it is pending!" warnings that confuse things (less important)

Note that this could be an error internal to ophyd-async.

Acceptance Criteria

  • The error in this case identifies which device and/or signal failed to connect (see the sketch below)
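
For illustration, a possible dodal-side mitigation (a hypothetical sketch, not the actual fix merged in #393) would be for _wait_for_connection in beamline_utils.py to catch the bare timeout and re-raise it with the device name attached. The sim/timeout parameters and the call_in_bluesky_event_loop usage below are inferred from the traceback; the exact signatures may differ.

```python
# Hypothetical sketch only -- not the fix from #393.
# Re-raise dodal's connection timeout with the device name attached so the
# system test output at least says which device failed to connect.
from concurrent.futures import TimeoutError as FuturesTimeoutError

from bluesky.run_engine import call_in_bluesky_event_loop


def _wait_for_connection(device, sim: bool = False, timeout: float = 5.0) -> None:
    try:
        # device.connect() is the ophyd-async coroutine seen in the traceback
        call_in_bluesky_event_loop(device.connect(sim=sim), timeout=timeout)
    except FuturesTimeoutError as e:
        # On Python 3.10 the bare concurrent.futures.TimeoutError carries no
        # context; wrap it so the device name is reported.
        raise TimeoutError(
            f"{device.name} did not connect within {timeout}s"
        ) from e
```

This would name the device but still not the individual signal; per-signal reporting is what DeviceCollector's error handling (discussed below) is meant to provide.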
@DominicOram
Contributor Author

FYI @coretl, this might stop us from using ophyd-async on the beamline as we won't be able to diagnose things

@coretl
Collaborator

coretl commented Nov 1, 2023

This needs @callumforrester and @rosesyrett as well. I don't know what the factories in dodal do with device connection. Ophyd-async's DeviceCollector has some logic for pretty-printing the signals that didn't connect, but it may have some issues that @callumforrester identified, and the generalised version of it was going to make it into dodal at some point...
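
For reference, a minimal sketch of the DeviceCollector behaviour referred to here, assuming the ophyd-async public API at the time of this issue (module paths and the timeout argument may have since moved); the PV prefix and device name are made up for illustration:

```python
# Minimal sketch, assuming ophyd-async's API at the time of this issue.
# The PV prefix and device name below are illustrative only.
import asyncio

from ophyd_async.core import DeviceCollector
from ophyd_async.epics.motion import Motor


async def make_devices():
    async with DeviceCollector(timeout=5.0):
        smargon_omega = Motor("BL03I-MO-SGON-01:OMEGA")
    return smargon_omega


# If a PV cannot be reached, the collector raises NotConnected on exit of the
# context manager and lists the unconnected signals, rather than surfacing a
# bare concurrent.futures.TimeoutError.
asyncio.run(make_devices())
```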

@callumforrester
Contributor

Hmm, doesn't ophyd-async log extra information? @DominicOram are you running pytest with logging enabled?

@DominicOram
Contributor Author

The system tests don't actually run through pytest, and they don't enable any specific logging either. Even if we get additional logs, I don't think that fixes the issue though? I feel like the exception still needs to contain what timed out.

@callumforrester
Contributor

Tom and I discussed this; see the linked issue above.
