-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sinks] Fail open when we can't fetch Kafka config #18853
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code change LGTM. Do you think it's possible to write a test for this?
Thanks for the look!
Theoretically! Though I'll be more confident once I've been able to reproduce this locally. |
Okay, I have reproduced this locally.
Writing an automated test for this will be annoying and I don't think I can get it out today, though I should be able to follow up on it later. |
@philip-stoev or someone else from the QA team can probably help with this! |
@bkirwi can you share how you blocked |
@philip-stoev here's a good guide on the subject: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-demo-acl-authorization.html tl;dr is set
Probably want to adapt one of the existing kafka-auth tests, since ACLs only work in tandem with authentication. |
I gave a Deny ACL to "ALL" against all topics for a user, and yet
which I think is too late. Please stand by. |
Ok so I was able to create a Kafka user that is not allowed to run
However, this does not cause the expected warning path to trigger. In the For the
For the
In other words, with a restrictive ACL, an empty list of Therefore, none of the error paths in the method will trigger. We get to the
is never reached. In fact, from a cursory examination of the code, rdkafka's That said, I have what I think are all the pieces required to construct an mzcompose test, so I will push one tomorrow. |
Yep, that sounds like the error condition I was seeing! My hypothesis is that the broker is passing along an error code that rdkafka is ignoring. (When I make the equivalent call from a JVM client, I get an exception.) But it seems clear that either way this array is empty.
Is the surprise that the log entry is not showing up? I removed those errors since we're failing open anyways. But I can put them back if that would be more clear! In either case, it sounds like your test would fail to create the sink on |
It would be very strange if the actual cluster config was empty, so this will log an error in that case. (Since we fail open, the overall behaviour is unchanged.)
I pushed a change to error when the resulting config list for our cluster is empty, since it seems unlikely that this will ever happen in a non-error case. Let me know if this is what you had in mind! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a test that passes with this patch but fails on main.
tyvm, @philip-stoev! I was hoping you already had a test up your sleeve 😅 |
I did not have a test, but I have gained something -- a deep, simmering hatred for Kafka authentication. |
Motivation
This PR fixes a previously unreported bug.
Kafka allows specifying the partition number and replication factor when creating a topic. When -1 is specified, it will use a broker-wide default value. However, older versions of Kafka don't have the -1 defaulting behaviour... as a workaround, we fetch the config and apply the defaults manually.
If fetching the config fails, we fail the entire ensure-creation request, even though it is actually fairly likely to succeed: either because the topic exists, or because the Kafka version is recent enough to handle these defaulted configs. This PR "fails open", replacing the error with a warning log and continuing on.
Tips for reviewer
This is mostly code movement; you may want to hide whitespace changes with
?w=1
.I'll try and reproduce this locally before merging.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-proto
label.