Filter out false positive errors during cluster maintenance #36
Conversation
98dba24 to b2bd7af (force-pushed)
a57c53d to dc2ed5f (force-pushed)
f5e0955 to 63a3476 (force-pushed)
cluster_describer.go (Outdated)

// Verifies whether the Kafka cluster is available or not.
// False positive healthcheck errors are being ignored during maintenance windows.
func checkClusterAvailability(healthErr error, describer clusterDescriber, arn *string) error {
Definitely not satisfied with this, but just copy-pasting the body of the function across all healthchecks is not cool either.
Modifying the error seems to be the correct way to avoid the repetition, but I'm open to suggestions if I've missed something obvious.
nitpick (non-blocking): Just semantics: from a function named checkClusterAvailability I wouldn't expect it to receive an error as an argument. I would expect it to check whether the cluster is available or not, and to let the code using the function decide what to do in case of a maintenance error. The downside is what you've described. I would rename the function to something like verifyHealthErrorSeverity.
ctx, cancel := context.WithTimeout(context.Background(), clusterDescriptionTimeout)
defer cancel()

cluster, err := describer.DescribeClusterV2(ctx, &kafka.DescribeClusterV2Input{
nitpick (non-blocking): A very theoretical scenario would be for newClusterDescriber() to return an error when created in NewConsumer or NewProducer. When using checkClusterAvailability you only check for an available config.ClusterArn. Again, theoretically, describer could be nil here.
The NewConsumer and NewProducer functions will only attempt to create a describer if ClusterArn is provided. If that fails, they'd return an error, which would terminate the whole application. By doing this, the describer is never going to be nil.
Then I opted for checking ClusterArn just because it's simpler than nil-checking an interface. Do you think it's better to check the describer instead?
Okay, I understand the idea now, my bad. I think it's fine the way it's implemented, keeping in mind that the whole logic is private.
If retrieveClusterState were public and we wanted to be super foolproof, we could check that the describer is not nil before the describer.DescribeClusterV2 call.
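The foolproof variant discussed above can be sketched without the AWS SDK by hiding the DescribeClusterV2 call behind a small interface. Everything here is a simplified stand-in, not the PR's actual types: clusterDescriber, fakeDescriber, and the string-based state are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterDescriber is a simplified stand-in for the PR's interface
// wrapping the MSK DescribeClusterV2 call.
type clusterDescriber interface {
	clusterState(arn string) (string, error)
}

// fakeDescriber returns a fixed state, standing in for the real AWS client.
type fakeDescriber struct{ state string }

func (f fakeDescriber) clusterState(string) (string, error) { return f.state, nil }

// retrieveClusterState nil-checks the describer before using it, the
// defensive guard suggested for a hypothetical public version.
func retrieveClusterState(d clusterDescriber, arn *string) (string, error) {
	if d == nil || arn == nil {
		return "", errors.New("cluster describer is not configured")
	}
	return d.clusterState(*arn)
}

func main() {
	arn := "arn:aws:kafka:eu-west-1:123456789012:cluster/demo" // hypothetical ARN
	state, _ := retrieveClusterState(fakeDescriber{state: "MAINTENANCE"}, &arn)
	fmt.Println(state)

	_, err := retrieveClusterState(nil, &arn)
	fmt.Println(err)
}
```

Note the usual Go caveat: the d == nil guard catches an untyped nil interface, but not an interface holding a typed nil pointer, which is one reason failing fast in the constructors, as the PR does, is the simpler design.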
Description
What
Introduces an AWS SDK connection in order to determine cluster maintenance windows.
Why
JIRA Ticket
Scope and particulars of this PR (Please tick all that apply)
This Pull Request follows the rules described in our Pull Requests Guide