-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Working socket-based qrexec #3
Conversation
Thanks for doing this! I'll try to review at least parts of it next week, but the amount of changes (and my other responsibilities) may not allow me to submit full review. In the meantime, if you could split code style improvements (including asserts etc) from the actual socket-related changes into separate commits, that would ease review a lot. |
Also, could you upload your PGP key to some keyserver (so the verification script could fetch it)? |
ade3dc9
to
827c0a7
Compare
At some point, I plan on removing |
@marmarek is the code now easier to review? Also, can you make the bot restart? |
Yes, it's much better now :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the comments on the code:
-
Replacement of
qubes-rpc-multiplexer
seems work-in-progress - a bunch of functions gotargument
parameter, which isn't really used. Since this is unrelated to socket-based connection, it should at least be a separate commit (and part of PR where it will be used). -
Socket-based service have no way to get source domain name, service argument (unless separate sockets for each argument is open). See Socket-based qrexec services qubes-issues#3912 (comment)
agent/qrexec-agent.c
Outdated
|
||
strncpy(username, cmdline, cmdline_len); | ||
colon = index(username, ':'); | ||
if (cmdline_len > (1ULL << 20)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either #define
this constant, or add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment still applies.
I will split it up into separate commits, and potentially open a separate PR. Also, I based this on
Correct. There are two possibilities for handling this:
Whichever method we choose, we should provide a C library that implements this (as a wrapper around Additionally, the main reason for socket-based qrexec is performance, and that will not be helped much until we can speed up qrexec policy handling. Ideally, this would be handled by |
As explained in linked comment, I'd go this way. With an option to opt out of it. Reasoning: application specifically written for being qrexec service, should be able (and even encouraged) to easily obtain this information. On the other hand, if connecting directly to some generic application (like ssh-agent), it won't retrieve this information anyway, regardless of the mechanism.
Top commit is signed by Wojtek. But indeed signed tag is missing, as this repository is kind of work-in-progress (see #2). |
eaf0b29
to
868ce55
Compare
Questions:
|
Why exactly? Since user doesn't really make sense for socket-based services (the actual service process is started independently, as some user), I don't see why separate process would be needed (i.e. why couldn't it work without |
@marmarek The issue is |
The |
@marmarek can you review the updated version? The main thing missing is passing parameters to the call. |
I see. But IMO that's minority use case for qrexec services. In that case it would be cleaner to start qrexec-fork-server independently of qrexec-agent (separate systemd service?), possibly even ordering it before qrexec-agent to avoid race conditions on startup. |
Makes sense. That said, we can have the agent spawn the server on-demand.
Does the rest of the code look good? I have tested it locally, and it can
make connections. I have not tested the agent changes, however, mostly
because debugging the agent is extremely difficult.
…On Mon, Mar 25, 2019, 7:55 PM Marek Marczykowski-Górecki < ***@***.***> wrote:
The issue is SO_PEERCRED/SO_PEERSEC/SCM_CRED and friends. Some programs,
such as databases, use them for authentication.
I see. But IMO that's minority use case for qrexec services. In that case
it would be cleaner to start qrexec-fork-server independently of
qrexec-agent (separate systemd service?), possibly even ordering it before
qrexec-agent to avoid race conditions on startup.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGGWB3ssb2vgJr0MtHz9ZEXYZ5Rs-baXks5vaWHngaJpZM4ac9HT>
.
|
This have a great potential of breaking the common use case - starting applications as "user" user. User's qrexec-fork-server needs to be started as a child of X session, inheriting all the X related env variables, being part of logind session. If qrexec-fork-server would be started earlier by qrexec-agent (for any reason, including some buggy edge case handling), then we'll have a problem. |
@marmarek I will implement transmission of the RPC command to the service. I will probably implement this by sending the entire string after |
One by may be not enough, see QubesOS/qubes-issues#4909 |
Can you think of anything else needed before this gets merged? Also, how
should I debug the no fork server case? That needs a way to log in to a VM
without qrexec. What is the safest way to do this? I can use the QEMU
emulated console, but it is slow, and by default disabled after a certain
point in the boot process.
…On Sat, Mar 30, 2019, 3:04 PM Marek Marczykowski-Górecki < ***@***.***> wrote:
One by may be not enough, see QubesOS/qubes-issues#4909
<QubesOS/qubes-issues#4909>
Better use 4-bytes. Otherwise, looks fine.
Also, there will be more arguments there soon: requested target type +
name (like "some.service+argument source-vm name target-vm", or
"some.service+argument source-vm keyword @default
<https://github.com/default>"). This is already the case for dom0
services, but soon will be for VMs too.
Anyway, if you copy the full line there, it should be fine.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGGWB4qoWKEsdymg_MCvnuwPxPWym5hUks5vb7UggaJpZM4ac9HT>
.
|
Generally, similar thing would be useful for dom0 services, but it isn't really necessary for this PR. You can debug no fork server case for different user - for example root. |
d73c2d5
to
5596a68
Compare
Some more context extracted from core file:
|
Full core dump: https://ftp.qubes-os.org/~marmarek/qrexec-client.core.lz4 |
Problem is on lines 617 and 618 of qrexec-daemon.c: they should have been
replaced by a call to close_stdin_fd, but were not. Since local_stdout_fd
== local_stdin_fd, local_stdout_fd is now a junk file descriptor.
Therefore, reading it fails, and we crash trying to shut it down.
…On Fri, Feb 7, 2020, 11:40 PM Marek Marczykowski-Górecki < ***@***.***> wrote:
#0 0x0000754a8f5fd625 in raise () from /lib64/libc.so.6
#1 0x0000754a8f5e68d9 in abort () from /lib64/libc.so.6
#2 0x000055e0fb4e649a in close_stdout_fd () at qrexec-client.c:96
#3 0x000055e0fb4e64c1 in do_exit (code=1) at qrexec-client.c:244
#4 0x000055e0fb4e6f7f in handle_input (data_protocol_version=<optimized out>, vchan=0x754a88000c50) at qrexec-client.c:367
#5 select_loop (vchan=0x754a88000c50, data_protocol_version=<optimized out>, stdin_buf=0x7ffc42f78660) at qrexec-client.c:628
#6 0x000055e0fb4e5822 in main (argc=<optimized out>, argv=<optimized out>) at qrexec-client.c:874
This is run of socket-merge2 branch from my repo (this PR + #21
<#21>).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3?email_source=notifications&email_token=ABQZMB5KTVCQSGV4DXLO2Z3RBYZUFA5CNFSM4GTT2HJ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELFJNEA#issuecomment-583702160>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABQZMBYT42ZPX5YVRBTIULLRBYZUFANCNFSM4GTT2HJQ>
.
|
That looks to be it. With this (close_stdin_fd) modified, test_091_qrexec_service_socket_dom0_send passes. |
The `close_stdin_fd` helper function ensures that `local_stdin_fd` is not closed if it is equal to `local_stdout_fd`, but just calling `close` does not do that. This caused `shutdown` to fail with EBADF, which lead to a crash.
Even when they do not need to send an exit code.
One of the unit tests fails: https://travis-ci.com/QubesOS/qubes-core-qrexec/jobs/285141372#L2152 |
You can run the test with strace to better see what happens exactly: I haven't checked but but I would guess that the |
Indeed, the vchan was not being closed. Thank you @pwmarcz! Fixed, along with many other error paths where it was not being closed. |
daemon/qrexec-client.c
Outdated
@@ -382,6 +382,9 @@ static void handle_input(libvchan_t *vchan, int data_protocol_version) | |||
send_exit_code(vchan, 0); | |||
do_exit(0); | |||
} | |||
} else if (local_pid < 0) { | |||
// socket-based service, so we will never get a SIGCHLD | |||
do_exit(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exactly this code in select_loop() doesn't handle it already:
if (local_stdout_fd == -1 &&
(child_exited || (local_stdin_fd == -1 && local_pid == -1)))
check_child_status(vchan);
?
I was trying to run full tests including latest changes, but it fails during update:
This is just after updating package in dom0. The call that fails is https://github.com/QubesOS/qubes-mgmt-salt/blob/e75e528de0a8039709ff19e1cb43033a020ec9f9/qubessalt/__init__.py#L156-L164: retcode = dispvm.run_service(
'qubes.Filecopy',
localcmd='/usr/lib/qubes/qfile-dom0-agent {}'.format(
salt_config)).wait()
shutil.rmtree(salt_config)
if retcode != 0:
raise qubesadmin.exc.QubesException(
"Failed to copy Salt configuration to {}".
format(dispvm.name)) I don't have stderr of underlying qrexec-client call unfortunately. But in dom0 dmesg I don't see any crash. |
On 2020-02-11 07:29, Marek Marczykowski-Górecki wrote:
I was trying to run full tests including latest changes, but it fails during update:
```
debian-10: ERROR (exception Failed to copy Salt configuration to disp-mgmt-debian-10)
fedora-30: ERROR (exception Failed to copy Salt configuration to disp-mgmt-fedora-30)
whonix-gw-15: ERROR (exception Failed to copy Salt configuration to disp-mgmt-whonix-gw-15)
whonix-ws-15: ERROR (exception Failed to copy Salt configuration to disp-mgmt-whonix-ws-15)
```
This is just after updating package in dom0. The call that fails is https://github.com/QubesOS/qubes-mgmt-salt/blob/e75e528de0a8039709ff19e1cb43033a020ec9f9/qubessalt/__init__.py#L156-L164:
```python
retcode = dispvm.run_service(
'qubes.Filecopy',
localcmd='/usr/lib/qubes/qfile-dom0-agent {}'.format(
salt_config)).wait()
shutil.rmtree(salt_config)
if retcode != 0:
raise qubesadmin.exc.QubesException(
"Failed to copy Salt configuration to {}".
format(dispvm.name))
```
I don't have stderr of underlying qrexec-client call unfortunately. But in dom0 dmesg I don't see any crash.
Would you mind bisecting this? All tests on CI pass, so testing it locally is going to be a pain. I suspect that the vchan gets closed twice.
Would it be possible to get the first three commits merged sooner? In particular, I believe the third one fixes a bug in USB passthrough.
|
@marmarek asked me to write a test for
Apparently at the very end, |
daemon/qrexec-client.c
Outdated
send_exit_code(vchan, WEXITSTATUS(status)); | ||
do_exit(status); | ||
send_exit_code(vchan, status); | ||
close_vchan_and_exit(1, vchan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DemiMarie The local cmd handling issue is here: hardcoded 1 instead of status
as it was previously.
@pwmarcz I think your test may be wrong, if I fix it to status
, then the exit code is 44
, not 0
. On the other hand, I'm not really sure if qrexec-client
exit code should represent local process exit code, instead of the remote one. In case of no local command, it should definitely be remote process exit code. But if there is local command logically it should receive the remote exit code - with an exception that we have no way of doing that... What do you think how it should behave?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to complete the picture - if qrexec-client
should exit with the remote process exit code even if local command is used, then both code and test are wrong - code does not wait for remote exit code and test doesn't send one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My test reflected the status quo on master - currently if the local command exits first, we always exit with code 0, and if the remote command exits first, we return its code. I agree that sounds wrong.
I also think returning the remote exit code sounds good. We cannot do that in the case where we just connect two remote domains, but in this case (with both local cmd and remote cmd) probably it makes sense to wait for the remote end.
buffer_append(stdin_buffer, command->service_descriptor, strlen(command->service_descriptor) + 1); | ||
return 0; | ||
} | ||
switch (errno) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calling connect()
first makes this much harder than it needs to be. This is a pretty complicated bit of control flow: infinite loop (for(;;)
) with changing the use_bare_path
parameter, and possible exits through return
, break
, and goto out
, depending on many possible error codes.
I'm finding it pretty hard to understand all the possible outcomes, and still I think there are some weird corner cases: for instance, if a file is a socket but the listening process died (so we get ECONNREFUSED
), we will fork and try executing it.
Why not first check if a file exists, and then decide what to do with it? Something like:
// Look for `service_name+arg` and `service_name` files in both dirs, true
if (!find_service_file(command->service_descriptor, file_path)) {
fprintf(stderr, "file not found for %s\n", command->service_descriptor);
return 1;
}
if (is_socket(file_path)) {
if (!qubes_connect(file_path) {
perror("connect");
return 1;
}
// prepare socket
return 0;
} else if (is_executable(file_path)) {
return do_fork_exec(file_path, ...);
} else {
fprintf(stderr, "unrecognized service type (not a socket or executable): %s\n", file_path);
return 1;
}
That would simplify the control flow (no loops over the whole code, just possibly a loop over directories in find_service_file
), make sure that the errors (e.g. for connect()
) are always reported, and match what qubes-rpc-multiplexer
does (find a file, then execute it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(As discussed with @marmarek, I'm happy to leave this part until after merge; number 1 priority for me is getting it under unit tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually agree with you. I am not sure if this should go live until this is fixed.
By the way, how would I go about overriding the RPC services path ( |
Yes, some option like this makes sense. But note currently those directories are hardcoded in exec.c in a const array, so you'll need to change that definition. |
An environment variable would be easier to implement. Would separate environment variables for the local and system paths be acceptable? |
Design-wise, it sounds like a variable holding a list of paths would be good (by analogy with For testing, I need to 1. make the program look into my directory, 2. make it NOT look into any of the system-wide directories (since I'm just running a local instance). |
The current version still fails at test_096_qrexec_service_socket_vm_send (timeout). |
I am fine with merging it if you are confident in its security.
…On Sun, Feb 16, 2020, 8:56 PM Marek Marczykowski-Górecki < ***@***.***> wrote:
The current version still fails at test_096_qrexec_service_socket_vm_send
(timeout).
See #3 (comment)
<#3 (comment)>
.
But since it isn't a regression, and there are a few PRs that waits for
this to be merged, that can help to debug it, I'm ok with merging it and
fixing it in a subsequent PR. Let me know @DemiMarie
<https://github.com/DemiMarie> if you want to push something here now.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3?email_source=notifications&email_token=ABQZMB64YZM7325HOEKHWRLRDHVEDA5CNFSM4GTT2HJ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL42NCQ#issuecomment-586786442>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABQZMB3IUNHO4ZBO6HBYCTDRDHVEDANCNFSM4GTT2HJQ>
.
|
That's good news, I've been waiting for that merge because I didn't want to pile on too many changes. Then I will be trying to add the path override and write unit tests for the socket part next. |
This commit includes working socket-based qrexec.
Known limitations: