New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Epic: Rework remoteConfig() to support command and control capabilities #498
Comments
We had to globally disable remote config when support for TLS was added. |
As I starting point, I wanted to get remoteConfig working for non-tls connections again. To do this I made these changes to a stock v1.0.2:
Building with these changes allowed me to successfully communicate with a scoped process. For specifics, first I started an app that would stick around a while in one shell:
Then in another shell, I created a /tmp/cmdin file with what should be a valid command, launched tcpserver, and then typed "U" and enter in the tcpserver shell to tell tcpserver to send the contents of /tmp/cmdin to the sleep process above.
What shows that it was working, was that the scoped process responded by sending a proper response to that command:
|
Without any changes from the code above, I can see that TLS is no bueno, which is why this ticket was written in the first place:
In a second shell:
What I saw:
|
So, I've dug in to this a bit, and I think I've figured out how to add command/control over tls. It's not totally trivial. I'll try to explain why here. My conclusion as I write this is that I shouldn't do this now just due to time constraints at the moment. Instead I'll try to leave enough breadcrumbs of a description to explain what it would take should we decide to do this later... First, what happened above where I said "TLS is no bueno" is:
My first attempt to fix this failed and it took a bit to figure out why that approach was not viable. What I tried to do is equivalent to the thing the original guy asking the question in this thread tried... https://openssl-users.openssl.narkive.com/l4JsYKS8/ssl-peek-vs-ssl-pending After the poll() returns and says there is incoming data, I tried calling SSL_has_pending() to see if there was ssl data, planning to use SCOPE_SSL_read() to handle this if there was. But... the SSL_has_pending() never returned true even when I could see that the data was tls stuff! After finding the above link and pouring over it a few times, I came to understand that for the state of the ssl subsystem to work correctly, I can't interleave poll()s with SSL_ calls in this way. (poll and select are equivalent for the sake of understanding what's going on here) This reads to me a little like a zen poem but taking this to heart and following it's suggestion helped me get where I needed to be:
After we establish a socket connection using an async (non-blocking) socket, we've always switched the socket to be blocking, then if desired, did the establishTlsConnection() stuff and all later stuff as a blocking socket. This zen poem made me think that I needed to switch things to be non-blocking to have a chance. I then realized that for the sake of investigation at least, I could let the establishTlsConnection() run as a blocking socket as we always have, and then after the TLS connection was established, I could switch the socket to be non-blocking. This got me farther. At this point I have a tls connection working, and just need to figure out how to read the command/control data in a way that doesn't hose the stateful nature of our TLS subsystem. To do this in remoteConfig(), before we poll() we can now call SCOPE_SSL_read() because it won't block. I've seen it return no error, and I've seen it return SSL_ERROR_WANT_READ. If the error is SSL_ERROR_WANT_READ, now it's ok to poll in a way that won't hose up the TLS subsystems' state. Voilá! This is how it would be possible to support command/control over TLS. The recipe is if we're using tls to 1) use non-blocking sockets, and 2) be sure to call SCOPE_SSL_read() before calling poll(). |
To capture the state of my experimental code before I commit anything, I captured diffs from 3f90a84 version, that I'm attaching here. It's absolutely not ready for primetime, but might be something to refer to if my comments above are too cryptic/hard to follow... To rehydrate this, |
Ok. So the current plan is to support command and control on tcp and unix connections, with the limitation that we won't support tls right now. With this agreement, I think the commit here on the feature/498-remote-config branch is ready for prime time and can now be merged if needed. |
I originally saw an issue with tcpserver and tls, during the timeframe of the above comment (Mar 24, 2022). |
The final set of testing I performed:
All tests passed. |
During the review, @iapaddler asked whether we really want to do a ctlDisconnect() in the "no bueno" situation above... #498 (comment) After talking, it does seem like receiving garbage on the ctl channel should not cause us to drop the connection. So I made one more change here... removed the ctlDisconnect() from remoteConfig(). To test this:
The output of tcpserver makes it clear the connection is getting dropped too:
After making this change, I ran the same test, and observed with a similar watch command that appscope did not kill the connection for port 9109. I could also see the difference from the output of tcpserver:
I'm considering this a successful test. ✅ |
This is one of the tasks that was noted in #305. Allow reception of commands from an external source again.
The text was updated successfully, but these errors were encountered: