
dispVM halts prematurely with custom .desktop file involving `qrexec-client-vm` #3318

Open
joshuathayer opened this Issue Nov 17, 2017 · 7 comments

@joshuathayer

Qubes OS version:

4.0rc2 (dom0 and templates up to date with testing repos)

Affected TemplateVMs:

At least fedora-25


Steps to reproduce the behavior:

I have a situation where I'm trying to open a particular file format in a disposable VM, from a (Whonix-based) AppVM, e.g.:

$ qvm-open-in-vm '$dispvm:work' ./foo.custom-ext

In my work VM I've configured xdg to associate .custom-ext with a Python script. That script processes the given file, and as a side effect there's a call to qrexec-client-vm, e.g.:

...
p = subprocess.Popen(["qrexec-client-vm", "target-vm", "custom.Process"], stdin=subprocess.Pipe)
p.communicate("message")
...
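
(For context, the association is just a custom .desktop file plus an xdg-mime default entry, roughly like the sketch below. The handler path and MIME-type name here are illustrative, not my exact files, and the shared-mime-info XML that defines the type is omitted:)

# ~/.local/share/applications/custom-ext-handler.desktop
[Desktop Entry]
Type=Application
Name=Custom Ext Handler
Exec=/home/user/bin/process-custom-ext.py %f
MimeType=application/x-custom-ext;
NoDisplay=true

$ xdg-mime default custom-ext-handler.desktop application/x-custom-ext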

Expected behavior:

I'd expect:

  • a disposable VM based on my work VM to start
  • my script to start running
  • the RPC call in the script to successfully complete
  • the script to continue processing until it exits
  • finally, the dispVM to shut down

Actual behavior:

My script seems to run as far as the call to qrexec (which completes successfully), and then the entire VM gets shut down. While the machine is shutting down, my script is still running: judging by its writes to STDERR, it manages to emit a varying number of lines before the machine is down.

General notes:

Removing the Popen call from my script allows it to run and complete normally.

Running the command on a non-disposable VM allows the script to run and complete normally, e.g.:

$ qvm-open-in-vm work ./foo.custom-ext

works great.

I suspected the issue may have had something to do with my script's STDIN/OUT/ERR being passed to the spawned child, but the following didn't change behavior at all:

...
fd = os.open("/dev/null", os.O_RDWR)
p = subprocess.Popen(["qrexec-client-vm", "target-vm", "custom.Process"],
      stdin=subprocess.Pipe, stdout=fd, stderr=fd)
...

For debugging, I tried making other calls that involve a fork() in my processing script, and those calls seemed to complete without error and without causing the VM to halt. Only the call to qrexec-client-vm caused the machine to halt.

Also for debugging I tried not using a pipe for stdin (instead using the same /dev/null filehandle), and that also didn't change matters.
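
That is, something along these lines:

...
fd = os.open("/dev/null", os.O_RDWR)
p = subprocess.Popen(["qrexec-client-vm", "target-vm", "custom.Process"],
      stdin=fd, stdout=fd, stderr=fd)
...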

I'm stumped. I'm not sure this is a Qubes bug as much as my not understanding something, but I do expect the behavior I noted above. Thanks for taking a look!



@marmarek


Member

marmarek commented Nov 18, 2017

Just to be sure - have you checked that the script isn't simply crashing? For example, it should be subprocess.PIPE, not subprocess.Pipe. Have you tried running this in a non-DisposableVM first?

andrewdavidwong added this to the Release 4.0 milestone Nov 18, 2017

@joshuathayer


joshuathayer commented Nov 18, 2017

@marmarek, yes, I've run this on a non-dispVM (the "base" VM the dispVM is based on) and it runs without problem.

And right, subprocess.PIPE is correct... sorry for the typo here. It's correct in my script. This code is public (part of this SecureDrop project), so I'll commit the branch I'm working on tomorrow and link to the code in question here.


@joshuathayer


joshuathayer commented Nov 20, 2017

I've come up with a minimal example of this behavior. It seems to be triggered when making qrexec calls to a disposable target VM, where the target VM then makes its own RPC calls back to the source VM in the course of handling the initial RPC call.

In this example we're going to make an RPC call from personal to a disposable VM based on work. On personal, we'll run qrexec-client-vm work bug.PartOne, then the invoked script will call qrexec-client-vm personal bug.PartTwo. We'll see that while that process works fine on a non-disposable VM, it fails when the target VM is disposable.

On work, create /rw/config/bug.PartOne with the following, and chmod 0755 it:

#!/usr/bin/python

import subprocess
import sys
import time

sys.stderr.write("Hello from bug-part-one. Going to qrexec bug.PartTwo now\n")

subprocess.Popen(["qrexec-client-vm","personal","bug.PartTwo"])

i = 0
while i < 10:
  time.sleep(1)
  subprocess.Popen(["qrexec-client-vm","personal","bug.PartTwo"])
  i += 1     

Add the following to /rw/config/rc.local on work:

ln -s /rw/config/bug.PartOne /etc/qubes-rpc/

And also run that command in your running work VM.

On personal, create /rw/config/bug.PartTwo with the following, and chmod 0755 it:

#!/usr/bin/python

import sys

sys.stderr.write("Hello from bug part two!\n")

Although it's not strictly necessary for this demo, add the following to /rw/config/rc.local on personal:

ln -s /rw/config/bug.PartTwo /etc/qubes-rpc/

and also run that command on your running personal VM.

On dom0, create /etc/qubes-rpc/policy/bug.PartOne and /etc/qubes-rpc/policy/bug.PartTwo with the following:

$anyvm $anyvm allow

And allow work to be a dispvm template:

$ qvm-prefs work template-for-dispvms True

Now we're all set up. On personal, run:

$ qrexec-client-vm 'work' bug.PartOne

You should see the expected output:

$ qrexec-client-vm 'work' bug.PartOne
Hello from bug-part-one. Going to qrexec bug.PartTwo now
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
Hello from bug part two!
$ _

OK, that's what we expect. Now run the same thing on a disposable VM using work as a template:

[user@personal ~]$ qrexec-client-vm '$dispvm:work' bug.PartOne
Hello from bug-part-one. Going to qrexec bug.PartTwo now
Hello from bug part two!

The call never returns, and the disposable VM that was created has crashed.

I believe that if bug.PartTwo is invoked from the disposable VM against a different VM than the initial source (devel, for example), the process runs fine.


marmarek added a commit to marmarek/qubes-core-agent-linux that referenced this issue Nov 21, 2017

WIP qrexec: report connection end only to the daemon allocating vchan port

(unconfirmed theory)
Only one side of the connection should report MSG_CONNECTION_TERMINATED
- the one where vchan port was allocated. This is especially important
for VM-VM connection, because when two such connections are established
between the same domains, in opposite directions - both will share the
same vchan port number, so qrexec-daemon will get confused which
connection was terminated.

QubesOS/qubes-issues#3318
@marmarek


Member

marmarek commented Nov 21, 2017

I think this is about qrexec connection cleanup. When you establish two qrexec connections between the same pair of domains (and those are the only connections), but in opposite directions, they will be assigned the same vchan port number. This is normally OK, because the connection direction differs (a different VM acts as the server). But the connection cleanup code gets confused and reports the wrong connection as being closed. The only code using that report in practice is the DispVM cleanup.

I've pushed a preliminary attempt to fix this, but it doesn't work yet.
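
To make the suspected failure mode concrete, here is a toy model of that bookkeeping - purely illustrative (hypothetical names and structures, not the actual qrexec code):

# Toy model of the suspected bookkeeping problem - NOT the actual qrexec code.
# It only illustrates why tracking connections by (domain pair, port),
# without direction, misattributes which connection terminated.

active = {}  # (unordered domain pair, vchan port) -> connection description

def connect(src, dst, port):
    # Direction is lost in the key, so opposite-direction connections
    # between the same two domains collide on the same entry.
    active[(frozenset((src, dst)), port)] = "%s -> %s on port %d" % (src, dst, port)

def report_terminated(src, dst, port):
    # Cleanup cannot tell which of the two directions actually ended.
    print("terminated:", active.pop((frozenset((src, dst)), port), None))

# The repro's two connections are the only ones between these domains,
# so both get assigned the same (lowest free) port number:
connect("personal", "dispvm", 512)   # the initial bug.PartOne call
connect("dispvm", "personal", 512)   # bug.PartTwo calling back - clobbers the entry

report_terminated("dispvm", "personal", 512)
# When bug.PartTwo finishes, the bookkeeping now says the *initial* call is
# gone too, and the only consumer of that report is the DispVM cleanup -
# hence the VM being torn down while the script is still running.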

@joshuathayer


joshuathayer commented Nov 21, 2017

@marmarek, thanks for your prompt attention! I suspected something along those lines, but had no idea where to start looking... I'll be interested to take a close look at your patch. Is there documentation anywhere about setting up a development environment for building and testing these core libraries? I'd rather be submitting patches than reporting bugs ;)


@marmarek


Member

marmarek commented Nov 21, 2017

See here: https://www.qubes-os.org/doc/development-workflow/
For testing I suggest using a cloned template (so as not to brick too many things at once).
There is also (really poor :/) documentation of the qrexec internals: https://www.qubes-os.org/doc/qrexec3/#qrexec-protocol-details

As for the bug - the dom0 counterpart is here: https://github.com/QubesOS/qubes-core-admin-linux/blob/master/qrexec/qrexec-daemon.c#L387-L517 and https://github.com/QubesOS/qubes-core-admin-linux/blob/master/qrexec/qrexec-client.c#L713-L805. Unfortunately, the logic around vchan port allocation is quite convoluted (the second link), mostly because dom0 is special - it does not have its own qrexec-agent/qrexec-daemon pair.

BTW, if you want to install a qrexec service in one VM only, you can put it in /usr/local/etc/qubes-rpc (which is part of /rw) instead of using ln -s in rc.local.
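
For example, in the repro above this would replace the rc.local symlink on work (same script, just a different install location):

$ sudo mkdir -p /usr/local/etc/qubes-rpc
$ sudo cp /rw/config/bug.PartOne /usr/local/etc/qubes-rpc/
$ sudo chmod 0755 /usr/local/etc/qubes-rpc/bug.PartOne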

@joshuathayer


joshuathayer commented Nov 21, 2017

Thanks so much, I'll spend some time with that tomorrow.


joshuathayer referenced this issue in freedomofpress/securedrop-workstation Jan 9, 2018

Open

Work around Qubes' qrexec bug #46

andrewdavidwong added the bug and C: core labels and removed the C: other label Mar 31, 2018
