Skip to content

Add option to wait for domain shutdown#469

Draft
ben-grande wants to merge 6 commits into
QubesOS:mainfrom
ben-grande:long-shutdown
Draft

Add option to wait for domain shutdown#469
ben-grande wants to merge 6 commits into
QubesOS:mainfrom
ben-grande:long-shutdown

Conversation

@ben-grande
Copy link
Copy Markdown
Contributor

@ben-grande ben-grande commented May 6, 2026

By waiting for the full shutdown, it becomes possible to receive appropriate exception, that would allow adapting the client to react differently. A timeout can be interpreted and the method adjusted to kill the domain.

For: QubesOS/qubes-issues#10835
Requires: QubesOS/qubes-core-admin#807


Adding wait parameter to qubesadmin.tools.qvm_shutdown doesn't work well when specifying multiple domains. I'm thinking of a better way to handle it. I thought of looping the domains to create tasks for each of them with run_in_executor and asyncio.gather at last, but that will require changing the qvm_shutdown a bit.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.84%. Comparing base (506bee2) to head (40f8633).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #469      +/-   ##
==========================================
+ Coverage   76.82%   76.84%   +0.01%     
==========================================
  Files          53       53              
  Lines        9361     9368       +7     
==========================================
+ Hits         7192     7199       +7     
  Misses       2169     2169              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ben-grande
Copy link
Copy Markdown
Contributor Author

ben-grande commented May 8, 2026

6ebacfb

It works, half of it, at least.

This will make some tests fails because it wasn't updated to have wait in the argument. But I am not sure if I will continue with the current method, since it currently blocks exiting the interpreter until the api call ceases (unblocks the thread). It is not "dangerous", but an inconvenience and might break expectations. For more information, please read the big "TODO" on top of "ThreadPoolexecutor".

The future.cancel() is False, as well as future.cancelled(). The future.running() is True. All this means is that it cannot be cancelled.

@ben-grande
Copy link
Copy Markdown
Contributor Author

I tried simper things and had the same problem.

    loop = asyncio.get_event_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        for qube in domains:
            future = loop.run_in_executor(
               executor, partial(qube.shutdown, force=force, wait=args.wait)
            )
            tasks.append(future)
        loop.run_until_complete(asyncio.wait_for(asyncio.gather(*tasks), args.timeout))
async def main(...):

    ...

    loop = asyncio.get_running_loop()
    tasks = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        for qube in domains:
            future = loop.run_in_executor(executor, qube.shutdown)
            tasks.append(future)
        async with asyncio.timeout(args.timeout):
            await asyncio.gather(*tasks)
    return

And change to asyncio.run(main()) on /usr/bin/qvm-shutdown.

Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/qubesadmin/tools/qvm_shutdown.py", line 111, in main
    await asyncio.gather(*tasks)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/qubesadmin/tools/qvm_shutdown.py", line 110, in main
    async with asyncio.timeout(args.timeout):
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/asyncio/timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/qvm-shutdown", line 7, in <module>
    sys.exit(asyncio.run(main()))
             ~~~~~~~~~~~^^^^^^^^
  File "/usr/lib64/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/usr/lib64/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 712, in run_until_complete
    self.run_forever()
    ~~~~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 683, in run_forever
    self._run_once()
    ~~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.13/asyncio/base_events.py", line 2050, in _run_once
    handle._run()
    ~~~~~~~~~~~^^
  File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
    self._context.run(self._callback, *self._args)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/qubesadmin/tools/qvm_shutdown.py", line 102, in main
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/concurrent/futures/_base.py", line 647, in __exit__
    self.shutdown(wait=True)
    ~~~~~~~~~~~~~^^^^^^^^^^^
  File "/usr/lib64/python3.13/concurrent/futures/thread.py", line 239, in shutdown
    t.join()
    ~~~~~~^^
  File "/usr/lib64/python3.13/threading.py", line 1094, in join
    self._handle.join(timeout)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/lib64/python3.13/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

^CException ignored on threading shutdown:
Traceback (most recent call last):
  File "/usr/lib64/python3.13/threading.py", line 1536, in _shutdown
    atexit_call()
  File "/usr/lib64/python3.13/threading.py", line 1507, in <lambda>
    _threading_atexits.append(lambda: func(*arg, **kwargs))
  File "/usr/lib64/python3.13/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/lib64/python3.13/threading.py", line 1094, in join
    self._handle.join(timeout)
KeyboardInterrupt:

@ben-grande
Copy link
Copy Markdown
Contributor Author

@marmarek
Copy link
Copy Markdown
Member

Maybe better to really do QubesOS/qubes-issues#4719 first?

@ben-grande
Copy link
Copy Markdown
Contributor Author

Maybe better to really do QubesOS/qubes-issues#4719 first?

Yes... I read some alternatives, but they are dirty: https://superfastpython.com/threadpoolexecutor-kill-running-tasks/

@ben-grande
Copy link
Copy Markdown
Contributor Author

But I am not sure if I will continue with the current method, since it currently blocks exiting the interpreter until the api call ceases (unblocks the thread).

Technically, this is also the current behavior. With the current code (without this PR), qvm-shutdown --timeout=3 HANGING_QUBE will not cease the connection.

ben-grande added a commit to ben-grande/qubes-manager that referenced this pull request May 12, 2026
The merge request to core-admin already avoids the main problem, of a hanging
shutdown blocking qubesd. This change avoids a hanging shutdown blocking the
interaction with the qube manager

For: QubesOS/qubes-issues#10648
Fixes: QubesOS/qubes-issues#10074
Requires: QubesOS/qubes-core-admin#807
Requires: QubesOS/qubes-core-admin-client#469
By waiting for the full shutdown, it becomes possible to receive
appropriate exception, that would allow adapting the client to react
differently. A timeout can be interpreted and the method adjusted to
kill the domain.

For: QubesOS/qubes-issues#10835
ben-grande added a commit to ben-grande/qubes-desktop-linux-manager that referenced this pull request May 13, 2026
Waiting for the server to send the result of the API calls, allows
dealing with failures/exceptions:

- QubesVM.shutdown -> .shutdown(wait=True)
- QubesVM.run_service -> .run_service_for_stdio

It also does not block the widget from being opened again even if an
action is still running. This allows, for example:

- To create disposable qube and attach device to it (which takes 10s),
  but not block the widget from opening
- To detach and shutdown disposable that is hanging, without hanging the
  widget

For qui/tray/domains.py, the "widget.destroy()" is called earlier, cause
it's not needed after the response is received. Else it hangs until the
"react_to_question" finishes.

For: QubesOS/qubes-issues#10648
For: QubesOS/qubes-issues#10651
For: QubesOS/qubes-issues#10835
Requires: QubesOS/qubes-core-admin#807
Requires: QubesOS/qubes-core-admin-client#469
- Avoids extra calls that might not be available
- Gather returned exception to raise appropriate message and act
  accordingly
@ben-grande
Copy link
Copy Markdown
Contributor Author

ben-grande commented May 13, 2026

The last commit, b544cb0, probably requires adapting the file created in qvm-shutdown to call asyncio.run(main()).

I will try again to not require the main() to be an async function, and pass the loop object... another day.

I tried to not require async functions, by passing the loop from
template_postprocess to qvm_shutdown.main, but failed with:

        RuntimeError: This event loop is already running
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants