Working socket-based qrexec #3

Merged: 26 commits, Feb 17, 2020

DemiMarie
Contributor

This commit includes working socket-based qrexec.

Known limitations:

  • Only dom0 → domU calls have been tested.
  • The dom0 part often hangs and must be killed manually.

@marmarek
Member

marmarek commented Jan 31, 2019

Thanks for doing this! I'll try to review at least parts of it next week, but the amount of changes (and my other responsibilities) may not allow me to submit a full review.

In the meantime, if you could split code style improvements (including asserts etc) from the actual socket-related changes into separate commits, that would ease review a lot.

@marmarek
Member

Also, could you upload your PGP key to some keyserver (so the verification script could fetch it)?

@DemiMarie DemiMarie force-pushed the socket-based-qrexec branch 4 times, most recently from ade3dc9 to 827c0a7 Compare February 1, 2019 16:52
@DemiMarie
Contributor Author

At some point, I plan on removing /usr/lib/qubes/qubes-rpc-multiplexer, and logging from the daemon directly.

@DemiMarie
Contributor Author

@marmarek is the code now easier to review? Also, can you make the bot restart?

@marmarek
Member

marmarek commented Feb 4, 2019

is the code now easier to review

Yes, it's much better now :)

Member

@marmarek marmarek left a comment

Besides the comments on the code:

  • Replacement of qubes-rpc-multiplexer seems to be work in progress: a bunch of functions got an argument parameter that isn't really used. Since this is unrelated to the socket-based connection, it should at least be a separate commit (and part of the PR where it will be used).

  • A socket-based service has no way to get the source domain name or the service argument (unless a separate socket is opened for each argument). See Socket-based qrexec services qubes-issues#3912 (comment)

strncpy(username, cmdline, cmdline_len);
colon = index(username, ':');
if (cmdline_len > (1ULL << 20))
Member

Either #define this constant or add a comment.

Member

This comment still applies.

@DemiMarie
Contributor Author

Besides the comments on the code:

  • Replacement of qubes-rpc-multiplexer seems to be work in progress: a bunch of functions got an argument parameter that isn't really used. Since this is unrelated to the socket-based connection, it should at least be a separate commit (and part of the PR where it will be used).

I will split it up into separate commits, and potentially open a separate PR.

Also, I based this on master, which isn’t signed, so I didn’t check the signature on the Git repo (oops!).

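  • A socket-based service has no way to get the source domain name or the service argument (unless a separate socket is opened for each argument).

Correct.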

There are two possibilities for handling this:

  1. Prepend the data to the input. This has the advantage of simplicity, but is incompatible with many services.
  2. Transmit the data out-of-band, such as over a separate file descriptor.

Whichever method we choose, we should provide a C library that implements this (as a wrapper around accept4).

Additionally, the main reason for socket-based qrexec is performance, and that will not be helped much until we can speed up qrexec policy handling. Ideally, this would be handled by qrexec-daemon in dom0.
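
To make option 1 concrete, here is a minimal sketch of what the service side of such a library could look like. It assumes the metadata arrives as a single NUL-terminated string ahead of the payload; the function name `read_prepended_metadata` is hypothetical and not part of qrexec. A library wrapper would presumably call something like this right after accepting the connection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a NUL-terminated metadata string that the
 * agent is assumed to have prepended to the service's input.
 * Reads one byte at a time so that no payload bytes are consumed.
 * Returns the string length (excluding the NUL), or -1 on error,
 * on EOF before the NUL, or if the header does not fit in buf.
 */
ssize_t read_prepended_metadata(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    while (used < cap) {
        ssize_t n = read(fd, buf + used, 1);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* peer closed before the terminator */
        if (buf[used] == '\0')
            return (ssize_t)used;
        used++;
    }
    return -1;                  /* header too long for the buffer */
}
```

Reading byte by byte is slow but keeps the sketch simple: everything after the terminator stays in the socket for the service to read as ordinary input.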

@marmarek
Member

marmarek commented Feb 6, 2019

1. Prepend the data to the input.  This has the advantage of simplicity, but is incompatible with many services.

As explained in the linked comment, I'd go this way, with an option to opt out of it. Reasoning: an application written specifically to be a qrexec service should be able (and even encouraged) to obtain this information easily. On the other hand, when connecting directly to some generic application (like ssh-agent), it won't retrieve this information anyway, regardless of the mechanism.
See load_service_config().

Also, I based this on master, which isn’t signed, so I didn’t check the signature on the Git repo (oops!).

Top commit is signed by Wojtek. But indeed a signed tag is missing, as this repository is kind of work-in-progress (see #2).

@DemiMarie DemiMarie force-pushed the socket-based-qrexec branch 2 times, most recently from eaf0b29 to 868ce55 Compare February 7, 2019 15:01
@DemiMarie
Contributor Author

Questions:

  • Should PAM be invoked every time a new process is spawned? It seems to me that it should only be invoked when qrexec-fork-server first starts.
  • With these changes, qrexec-fork-server is no longer optional: it MUST be running for socket-based qrexec to work. Proposed solution: run one instance for each user, spawned on-demand by qrexec-agent. Is there a better answer?

@marmarek
Member

marmarek commented Feb 7, 2019

* With these changes, `qrexec-fork-server` is no longer optional: it MUST be running for socket-based qrexec to work.

Why exactly? Since a user doesn't really make sense for socket-based services (the actual service process is started independently, as some user), I don't see why a separate process would be needed (i.e. why it couldn't work without qrexec-fork-server too).
This probably means moving setuid/setgid was wrong.

@DemiMarie
Contributor Author

@marmarek The issue is SO_PEERCRED/SO_PEERSEC/SCM_CREDENTIALS and friends. Some programs, such as databases, use them for authentication.

@DemiMarie
Contributor Author

The setuid could still be done, but we would need to do a full PAM call to get SO_PEERSEC correct in general.
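
For readers unfamiliar with the mechanism, the following is a minimal sketch of how a server queries the peer of a connected AF_UNIX socket on Linux. The function name `get_peer_credentials` is made up for this example. The kernel reports the credentials the connecting process had when it called connect() or socketpair(), which is why the uid that the connecting qrexec process runs as matters to such servers.

```c
#define _GNU_SOURCE             /* for struct ucred on glibc */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel who is on the other end of a connected AF_UNIX socket.
 * Linux-specific (SO_PEERCRED). Returns 0 on success, -1 on error.
 */
int get_peer_credentials(int fd, struct ucred *cred)
{
    socklen_t len = sizeof *cred;

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cred, &len) < 0)
        return -1;
    return len == sizeof *cred ? 0 : -1;
}
```

SO_PEERSEC works along the same lines but returns the peer's security label (for example an SELinux context), which a plain setuid does not update.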

@DemiMarie
Contributor Author

@marmarek can you review the updated version? The main thing missing is passing parameters to the call.

@marmarek
Member

The issue is SO_PEERCRED/SO_PEERSEC/SCM_CREDENTIALS and friends. Some programs, such as databases, use them for authentication.

I see. But IMO that's a minority use case for qrexec services. In that case it would be cleaner to start qrexec-fork-server independently of qrexec-agent (separate systemd service?), possibly even ordering it before qrexec-agent to avoid race conditions on startup.

@DemiMarie
Contributor Author

DemiMarie commented Mar 26, 2019 via email

@marmarek
Member

marmarek commented Mar 26, 2019

That said, we can have the agent spawn the server on-demand.

This has great potential to break the common use case: starting applications as the "user" user. The user's qrexec-fork-server needs to be started as a child of the X session, inheriting all the X-related environment variables and being part of the logind session. If qrexec-fork-server were started earlier by qrexec-agent (for any reason, including some buggy edge-case handling), we would have a problem.
It would be unwise to risk breaking the most common case just to handle a minor use case, especially when there is a less risky solution (affecting only those who explicitly enable it).

@DemiMarie
Contributor Author

@marmarek I will implement transmission of the RPC command to the service. I will probably implement this by sending the entire string after user:QUBESRPC , prefixed by a 1-byte length field and NUL-terminated. If the command from dom0 is user:QUBESRPC demi.Pipe+alpha somevm, the application would get "\027demi.Pipe+alpha somevm\0" (octal 027 is 23: the 22 characters plus the terminating NUL). This is simple, easy to implement, and fast.
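
The framing described above can be sketched as follows. This is an illustration only: `frame_rpc_command` is a hypothetical name, and the assumption that the length byte counts the string plus its terminator is inferred from the "\027" example rather than stated anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build a frame of the form: one length byte, the command string, then
 * a NUL terminator. The length byte is assumed to count the string plus
 * its terminator (22 characters + NUL = 23 = octal 027 in the example).
 * Returns the total frame size, or 0 if the command does not fit.
 */
size_t frame_rpc_command(const char *cmd, unsigned char *out, size_t cap)
{
    size_t payload = strlen(cmd) + 1;       /* include the NUL */

    if (payload > 255 || payload + 1 > cap)
        return 0;                           /* one byte cannot describe it */
    out[0] = (unsigned char)payload;
    memcpy(out + 1, cmd, payload);
    return payload + 1;
}
```

Note the hard limit this imposes: a single length byte caps the command, including its terminator, at 255 bytes.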

@marmarek
Member

One byte may not be enough, see QubesOS/qubes-issues#4909
Better to use 4 bytes. Otherwise, looks fine.
Also, there will be more arguments there soon: requested target type + name (like "some.service+argument source-vm name target-vm", or "some.service+argument source-vm keyword @default"). This is already the case for dom0 services, but soon will be for VMs too.
Anyway, if you copy the full line there, it should be fine.

@DemiMarie
Contributor Author

DemiMarie commented Mar 30, 2019 via email

@marmarek
Member

Can you think of anything else needed before this gets merged?

Generally, a similar thing would be useful for dom0 services, but it isn't really necessary for this PR.
Other than that, this needs a documentation update (https://www.qubes-os.org/doc/qrexec3/). And some tests - see for example here. A simple service using socat UNIX-LISTEN should be enough.

You can debug the no-fork-server case with a different user, for example root.

@marmarek
Member

marmarek commented Feb 8, 2020

Some more context extracted from core file:

  • errno = EBADF
  • local_stdout_fd = 4
  • local_stdin_fd = -1

@marmarek
Member

marmarek commented Feb 8, 2020

@DemiMarie
Contributor Author

DemiMarie commented Feb 8, 2020 via email

@marmarek
Member

marmarek commented Feb 8, 2020

That looks to be it. With this (close_stdin_fd) modified, test_091_qrexec_service_socket_dom0_send passes.

The `close_stdin_fd` helper function ensures that `local_stdin_fd` is
not closed if it is equal to `local_stdout_fd`, but just calling `close`
does not do that.  This caused `shutdown` to fail with EBADF, which led
to a crash.
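
The idea behind such a helper can be sketched as below. This is not the code from the PR: the name `close_stdin_sketch` and the use of shutdown(SHUT_WR) for the shared-descriptor case are assumptions made for illustration. The point is that when one socket serves as both stdin and stdout, closing it to signal EOF would also invalidate the stdout side.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * When one socket serves as both stdin and stdout, shut down only the
 * write direction so that reads on the same descriptor keep working.
 * Otherwise the descriptor is simply closed. In both cases the caller's
 * stdin variable is set to -1 to mark it as no longer usable.
 */
void close_stdin_sketch(int *stdin_fd, int stdout_fd)
{
    if (*stdin_fd < 0)
        return;                         /* already closed */
    if (*stdin_fd == stdout_fd)
        shutdown(*stdin_fd, SHUT_WR);   /* signal EOF, keep the fd open */
    else
        close(*stdin_fd);
    *stdin_fd = -1;
}
```

With a plain close() in the shared case, any later call on the stdout descriptor would fail with EBADF, which matches the values seen in the core file (errno = EBADF, local_stdin_fd = -1, local_stdout_fd = 4).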
Even when they do not need to send an exit code.
@marmarek
Member

marmarek commented Feb 8, 2020

One of the unit tests fails: https://travis-ci.com/QubesOS/qubes-core-qrexec/jobs/285141372#L2152
You can easily run them locally in a VM - you need to install core-vchan-socket in that VM first.

@pwmarcz
Contributor

pwmarcz commented Feb 8, 2020

You can run the test with strace to better see what happens exactly: USE_STRACE=1 ./run-tests -k test_run_dom0_command_and_connect_vm

I haven't checked, but I would guess that qrexec-client exits without an explicit libvchan_close (that's necessary for the vchan-socket implementation, see e0bab78), or maybe just exits too early.

@DemiMarie
Contributor Author

Indeed, the vchan was not being closed. Thank you @pwmarcz! Fixed, along with many other error paths where it was not being closed.

@@ -382,6 +382,9 @@ static void handle_input(libvchan_t *vchan, int data_protocol_version)
            send_exit_code(vchan, 0);
            do_exit(0);
        }
    } else if (local_pid < 0) {
        // socket-based service, so we will never get a SIGCHLD
        do_exit(0);
Member

Why exactly doesn't this code in select_loop() handle it already:

        if (local_stdout_fd == -1 &&
            (child_exited || (local_stdin_fd == -1 && local_pid == -1)))
            check_child_status(vchan);

?

@marmarek
Member

I was trying to run full tests including latest changes, but it fails during update:

debian-10: ERROR (exception Failed to copy Salt configuration to disp-mgmt-debian-10)
fedora-30: ERROR (exception Failed to copy Salt configuration to disp-mgmt-fedora-30)
whonix-gw-15: ERROR (exception Failed to copy Salt configuration to disp-mgmt-whonix-gw-15)
whonix-ws-15: ERROR (exception Failed to copy Salt configuration to disp-mgmt-whonix-ws-15)

This is just after updating the package in dom0. The call that fails is https://github.com/QubesOS/qubes-mgmt-salt/blob/e75e528de0a8039709ff19e1cb43033a020ec9f9/qubessalt/__init__.py#L156-L164:

            retcode = dispvm.run_service(
                'qubes.Filecopy',
                localcmd='/usr/lib/qubes/qfile-dom0-agent {}'.format(
                    salt_config)).wait()
            shutil.rmtree(salt_config)
            if retcode != 0:
                raise qubesadmin.exc.QubesException(
                    "Failed to copy Salt configuration to {}".
                    format(dispvm.name))

I don't have stderr of underlying qrexec-client call unfortunately. But in dom0 dmesg I don't see any crash.

@marmarek marmarek mentioned this pull request Feb 11, 2020
@DemiMarie
Contributor Author

DemiMarie commented Feb 11, 2020 via email

@pwmarcz
Contributor

pwmarcz commented Feb 12, 2020

@marmarek asked me to write a test for qrexec-client -d domain -l local_cmd user:remote_cmd and I wrote one. It fails on your branch:

======================================================================
FAIL: test_run_vm_command_from_dom0_with_local_command (qrexec.tests.socket.daemon.TestClient)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/qubes-core-qrexec/qrexec/tests/socket/daemon.py", line 376, in test_run_vm_command_from_dom0_with_local_command
    self.assertEqual(self.client.returncode, 0)
AssertionError: 1 != 0

Apparently at the very end, qrexec-client exits with code 1.

send_exit_code(vchan, WEXITSTATUS(status));
do_exit(status);
send_exit_code(vchan, status);
close_vchan_and_exit(1, vchan);
Member

@marmarek marmarek Feb 14, 2020

@DemiMarie The local cmd handling issue is here: hardcoded 1 instead of status as it was previously.

@pwmarcz I think your test may be wrong: if I fix it to status, then the exit code is 44, not 0. On the other hand, I'm not really sure whether the qrexec-client exit code should represent the local process exit code instead of the remote one. With no local command, it should definitely be the remote process exit code. But if there is a local command, logically it should receive the remote exit code, except that we have no way of doing that... How do you think it should behave?

Member

And to complete the picture: if qrexec-client should exit with the remote process exit code even when a local command is used, then both the code and the test are wrong: the code does not wait for the remote exit code and the test doesn't send one.

Contributor

My test reflected the status quo on master - currently if the local command exits first, we always exit with code 0, and if the remote command exits first, we return its code. I agree that sounds wrong.

I also think returning the remote exit code sounds good. We cannot do that in the case where we just connect two remote domains, but in this case (with both local cmd and remote cmd) probably it makes sense to wait for the remote end.

buffer_append(stdin_buffer, command->service_descriptor, strlen(command->service_descriptor) + 1);
return 0;
}
switch (errno) {
Contributor

@pwmarcz pwmarcz Feb 14, 2020

I think calling connect() first makes this much harder than it needs to be. This is a pretty complicated bit of control flow: infinite loop (for(;;)) with changing the use_bare_path parameter, and possible exits through return, break, and goto out, depending on many possible error codes.

I'm finding it pretty hard to understand all the possible outcomes, and still I think there are some weird corner cases: for instance, if a file is a socket but the listening process died (so we get ECONNREFUSED), we will fork and try executing it.

Why not first check if a file exists, and then decide what to do with it? Something like:

// Look for `service_name+arg` and `service_name` files in both dirs;
// returns true if one was found
if (!find_service_file(command->service_descriptor, file_path)) {
    fprintf(stderr, "file not found for %s\n", command->service_descriptor);
    return 1;
}

if (is_socket(file_path)) {
    if (!qubes_connect(file_path)) {
        perror("connect");
        return 1;
    }
    // prepare socket
    return 0;
} else if (is_executable(file_path)) {
    return do_fork_exec(file_path, ...);
} else {
    fprintf(stderr, "unrecognized service type (not a socket or executable): %s\n", file_path);
    return 1;
}

That would simplify the control flow (no loops over the whole code, just possibly a loop over directories in find_service_file), make sure that the errors (e.g. for connect()) are always reported, and match what qubes-rpc-multiplexer does (find a file, then execute it).

Contributor

(As discussed with @marmarek, I'm happy to leave this part until after the merge; my number one priority is getting it under unit tests.)

Contributor Author

I actually agree with you. I am not sure if this should go live until this is fixed.

@pwmarcz
Contributor

pwmarcz commented Feb 14, 2020

By the way, how would I go about overriding the RPC services path (/etc/qubes-rpc)? For the unit tests, I run all the programs locally, as non-root, and I added config options like --agent-socket=/tmp/.../agent.sock. I would like to add a similar option for service path (or, if that's too cumbersome, an environment variable).

@marmarek
Member

By the way, how would I go about overriding the RPC services path (/etc/qubes-rpc)? For the unit tests, I run all the programs locally, as non-root, and I added config options like --agent-socket=/tmp/.../agent.sock. I would like to add a similar option for service path (or, if that's too cumbersome, an environment variable).

Yes, some option like this makes sense. But note currently those directories are hardcoded in exec.c in a const array, so you'll need to change that definition.

@DemiMarie
Contributor Author

By the way, how would I go about overriding the RPC services path (/etc/qubes-rpc)? For the unit tests, I run all the programs locally, as non-root, and I added config options like --agent-socket=/tmp/.../agent.sock. I would like to add a similar option for service path (or, if that's too cumbersome, an environment variable).

An environment variable would be easier to implement. Would separate environment variables for the local and system paths be acceptable?

@pwmarcz
Contributor

pwmarcz commented Feb 15, 2020

An environment variable would be easier to implement. Would separate environment variables for the local and system paths be acceptable?

Design-wise, a variable holding a list of paths sounds good (by analogy with PATH, LD_LIBRARY_PATH, etc.). But separate variables for local and system paths would also work for me.

For testing, I need to (1) make the program look into my directory and (2) make it NOT look into any of the system-wide directories (since I'm just running a local instance).
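
A PATH-style override along these lines could look like the sketch below. The variable name `QREXEC_SERVICE_PATH_SKETCH` and both function names are invented for this example, and the default list contains only the /etc/qubes-rpc directory mentioned above. When the variable is set, only the listed directories are searched, which covers both requirements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Search a colon-separated list of directories (PATH-style) for a file
 * named `name`. On success, writes "<dir>/<name>" to out and returns
 * true. Empty list entries are skipped.
 */
bool find_in_path_list(const char *list, const char *name,
                       char *out, size_t cap)
{
    while (list != NULL && *list != '\0') {
        size_t dir_len = strcspn(list, ":");

        if (dir_len > 0) {
            int n = snprintf(out, cap, "%.*s/%s", (int)dir_len, list, name);

            if (n > 0 && (size_t)n < cap && access(out, F_OK) == 0)
                return true;
        }
        list += dir_len;
        if (*list == ':')
            list++;             /* step over the separator */
    }
    return false;
}

/*
 * QREXEC_SERVICE_PATH_SKETCH is a made-up variable name. If it is set,
 * it replaces the default list entirely, so system-wide directories are
 * not consulted.
 */
bool find_service_file_sketch(const char *name, char *out, size_t cap)
{
    const char *list = getenv("QREXEC_SERVICE_PATH_SKETCH");

    if (list == NULL)
        list = "/etc/qubes-rpc";
    return find_in_path_list(list, name, out, cap);
}
```

A test harness would then export the variable with its temporary directory before starting the agent or daemon.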

@marmarek
Member

The current version still fails at test_096_qrexec_service_socket_vm_send (timeout).
See #3 (comment).
But since it isn't a regression, and there are a few PRs waiting for this to be merged that can help to debug it, I'm OK with merging it and fixing it in a subsequent PR. Let me know, @DemiMarie, if you want to push something here now.

@DemiMarie
Contributor Author

DemiMarie commented Feb 17, 2020 via email

@pwmarcz
Contributor

pwmarcz commented Feb 17, 2020

That's good news, I've been waiting for that merge because I didn't want to pile on too many changes.

Then I will be trying to add the path override and write unit tests for the socket part next.

@marmarek marmarek merged commit c05c3a1 into QubesOS:master Feb 17, 2020
@DemiMarie DemiMarie deleted the socket-based-qrexec branch December 13, 2021 23:26