how to handle ssl certificate rotation scenario from client #2868
There is no client-side support for this, and I'm not sure whether the broker supports it either, but if the broker were to support TLS PHA (post-handshake authentication) we could enable it in the client as well with SSL_CTX_set_post_handshake_auth(): https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_post_handshake_auth.html
@edenhill Thanks for responding. The server has connections.max.reauth.ms: "The broker will disconnect any such connection that is not re-authenticated within the session lifetime and that is then subsequently used for any purpose other than re-authentication." Ideally a client application would be a good citizen and present the rotated certificate to the server before the connection is forcibly disconnected.
From what I understand, Kafka session reauth is for the SASL layer only, not SSL. librdkafka currently does not call SSL_CTX_set_post_handshake_auth; it could, but then it would need to provide a certificate callback to deliver the new certificate to the server, and this has implications for librdkafka daughter clients in other languages.
I think there is a pretty good argument that dynamic SSL rotation is a must-have these days. Many systems out there support this, and companies have introduced short-lived certificates as a way to identify applications. From a Kafka server standpoint, I have not tried this yet, but it does appear that they support dynamic TLS. With this certificate rotation it becomes possible to plug in existing PKI infrastructure as a source for ACLs.
We'll add support for this in the clients when the broker supports it.
The brokers already do support this, though: it was added as part of KIP-226.
I believe that is not enforced for existing connections, just new ones.
We are very much interested in this feature, as our infrastructure rotates the keys/certificates of all hosts every couple of hours. We have a few questions about the behavior of a librdkafka consumer with respect to certificates/keys provided from a file location:
- Does librdkafka store the certificates/keys read from file and re-use them for other connections, or does it always read them afresh from disk when establishing new connections to Kafka brokers?
- More clarity on how the keys/certificates are used when provided in-memory / from file, the assumptions regarding their lifecycle, the errors clients receive when the server fails to authenticate the client, etc., would help us use this correctly in consumers.
- What happens when an already-authenticated client connection is closed by the broker to induce the consumer to re-authenticate? Will the consumer reload the keys/certificates from disk, or re-use the ones from the previously established connection?
librdkafka will load certificates, et al., once per instance at rd_kafka_new() instantiation, and they will be stored in memory on the SSL_CTX; that SSL_CTX is then reused for all broker connections the client instance makes. I think the SSL cert reauth can be done without reconnecting, using the PHA extension.
Does it need to be implemented in librdkafka, or do we have to do it in client code? Moreover, how do we handle this in client code, where we do not get an error while reading messages? Also, does it handle both scenarios, where keys/certs are rotated not only for clients/consumers but also for the brokers?
We'll need to wait for the broker implementation before addressing this in librdkafka.
Currently, we use TLS client auth and nothing else for our Kafka clients. If I've understood this correctly, the certificate validity is checked once by the broker when a connection is first made, and then never again; a connected client whose certificate expires can stay connected. This is what the broker supporting TLS post-handshake auth would fix, right? At the moment, we use short-lived TLS client certificates anyway. They will eventually expire while the client stays connected, and I'm fine with that. But if librdkafka decides to open a new broker connection after the certificate expires, that will of course fail. Then we have to tear down the whole producer/consumer and restart it with up-to-date credentials. Ideally, it'd be great if librdkafka supported passing in a callback to set the client cert/key instead, so that when asked, the application can always respond with an up-to-date credential.
And since "it would be great if..." isn't that productive in open source, I guess the question really should be "would you accept a patch that..." instead :)
@edenhill @adinigam I had a look at what would be involved in asking librdkafka to re-read certificates off disk, and came up with this diff: master...zendesk:ssl_fork. Curious if you have any thoughts? Do you think it's worthwhile turning this approach into a PR? What other things would need to be supported?
Thanks for sharing your suggested solution. I think from a usability perspective it would be easier on the application to have a cert-retrieve callback, so it doesn't have to keep track of certificate lifetimes; this retrieve callback would be called on each (re)connect attempt.
Great work @KJTsanaktsidis.
If we constructed the cert-refresh/retrieve callback (called on each connect attempt) with an error parameter that, if set, indicates that the last connection attempt failed, the application could distinguish between the cases where a refresh is required and where it is not: if the callback is triggered without an error set, the application could simply ignore it and have librdkafka use the current cert.
I think a callback approach is probably more flexible, and avoids awkward questions about which configuration mechanisms do and don't pick up updated certificates. To put my Go glasses on, the callback interface is in fact what would be required to support having confluent-kafka-go accept a Golang stdlib TLS configuration. The only tricky thing about the callback is that it requires a little bit more surgery; at the moment, there's one SSL_CTX shared by all broker connections. I might play around with adding a callback & benchmarking it next time I get a few spare cycles.
IIRC it is problematic to call into Go (from C) from a thread that the Go runtime does not know about, in this case one of the librdkafka broker threads; that's also why the Go client uses the event queue interface. We could use that for this as well, but it would block the broker thread for a pretty long time waiting for the Go event to be handled, and that'll lead to other problems.
@edenhill @adinigam I did a bit of experimentation around what a callback interface for librdkafka fetching certificates might look like, and how that can be integrated with confluent-kafka-go. It turns out OpenSSL can actually do most of the heavy lifting for us: there's SSL_CTX_set_cert_cb, which lets the application supply the client certificate at handshake time.
Basically, we can implement the verify and fetch-client-cert callbacks in terms of Go callbacks. Thoughts? An approach worth pursuing?
So this took a little while, but I put together a PR for how this could work: #3180 |
@KJTsanaktsidis @edenhill I'm looking to implement a similar feature to handle client SSL cert rotation, and am also wondering if using the OpenSSL SSL_CTX_set_cert_cb function is the best approach for this? Thanks!
@vctoriawu well, we're planning to send my patch ☝️ to production next week, I think, so.... I'll let you know? 😂
Hi @KJTsanaktsidis, I'm looking forward to the solution for client SSL cert rotation; when might it land on the master branch? Thanks.
Merging is obviously something that's up to the librdkafka maintainers. However, we deployed the change to production at Zendesk last week and it's been working great. One hitch: TLS added an extra ~40ms of latency to every publish acknowledgement, but that turned out to be related to Nagle's algorithm on the socket. Enabling the socket.nagle.disable option in our producer configs fixed this nicely.
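The fix mentioned above is a one-line producer configuration change (socket.nagle.disable is a standard librdkafka property):

```ini
# Disable Nagle's algorithm (sets TCP_NODELAY) on broker sockets, avoiding
# the extra ~40ms acknowledgement latency observed with TLS above.
socket.nagle.disable=true
```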
Any update on this? One thing I find a bit odd already in the current state is that errors from failed client auth are seemingly marked as non-fatal. Because the library treats them as retryable, you basically get stuck once client certs have expired and librdkafka keeps reconnecting under the hood.
We've run into this, where we had an Azure client secret that expired, and because authentication is non-fatal, the library just kept going. Then I reproduced it with a bad pair of certs for authentication, and again it just treats them as non-fatal and keeps retrying. I wish we could mark authentication errors as fatal.
Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ
Description
How is an application supposed to use librdkafka to handle the SSL certificate rotation scenario?
How to reproduce
Below is the flow of events
IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
librdkafka version (release number or git tag):
1.2
Apache Kafka version:
2.4
librdkafka client configuration:
metadata.broker.list=broker1:9093
security.protocol=ssl
ssl.certificate.pem=<std::string>
ssl.key.password=<std::string>
enable.ssl.certificate.verification=true
The private key is set via config->set_ssl_cert(RdKafka::CERT_PRIVATE_KEY, RdKafka::CERT_ENC_PKCS12, const void *buffer, size_t size, std::string &errstr), and ssl_cert_verify_cb is set to a SecurityVerifyCallback:
class SecurityVerifyCallback : public RdKafka::SslCertificateVerifyCb
{
public:
  bool ssl_cert_verify_cb(std::string const& brokerName, int32_t brokerId, int* error, int depth,
                          const char* buffer, size_t size, std::string& errstr) override;
};
Operating system:
windows
Provide logs (with debug=.. as necessary) from librdkafka
Provide broker log excerpts
Critical issue