fix undefined log levels #402

Merged
merged 4 commits into from
Apr 12, 2021
@bwagner5 bwagner5 commented Apr 8, 2021

Issue #, if available:
N/A

Description of changes:
Fix undefined log levels:

BEFORE:

2021/04/02 15:44:08 ??? Trying to get token from IMDSv2
2021/04/02 15:44:15 ??? Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Put \"http://169.254.169.254/latest/api/token\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2021/04/02 15:44:18 ??? Request failed. Attempts remaining: 2
2021/04/02 15:44:18 ??? Sleep for 2.973889705s seconds
2021/04/02 15:44:23 ??? Request failed. Attempts remaining: 1
2021/04/02 15:44:23 ??? Sleep for 8.84038228s seconds
2021/04/02 15:44:34 ??? Unable to fetch metadata from IMDS error="Unable to parse metadata response: Unable to get a response from IMDS: Get \"http://169.254.169.254/latest/dynamic/instance-identity/document\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2021/04/02 15:44:34 ??? aws-node-termination-handler arguments: 

AFTER:

build/node-termination-handler --dry-run --node-name hi
2021/04/08 15:37:37 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:37 ERR Unable to fetch metadata from IMDS error="Metadata request received http status code: 403"
2021/04/08 15:37:37 INF aws-node-termination-handler arguments:
	dry-run: true,
	node-name: hi,
	metadata-url: http://169.254.169.254,
	kubernetes-service-host: ,
	kubernetes-service-port: ,
	delete-local-data: true,
	ignore-daemon-sets: true,
	pod-termination-grace-period: -1,
	node-termination-grace-period: 120,
	enable-scheduled-event-draining: false,
	enable-spot-interruption-draining: true,
	enable-sqs-termination-draining: false,
	enable-rebalance-monitoring: false,
	metadata-tries: 3,
	cordon-only: false,
	taint-node: false,
	json-logging: false,
	log-level: INFO,
	webhook-proxy: ,
	webhook-headers: <not-displayed>,
	webhook-url: ,
	webhook-template: <not-displayed>,
	uptime-from-file: ,
	enable-prometheus-server: false,
	prometheus-server-port: 9092,
	aws-region: us-east-1,
	queue-url: ,
	check-asg-tag-before-draining: true,
	managed-asg-tag: aws-node-termination-handler/managed,
	aws-endpoint: ,

2021/04/08 15:37:37 INF Started watching for interruption events
2021/04/08 15:37:37 INF Kubernetes AWS Node Termination Handler has started successfully!
2021/04/08 15:37:37 INF Started watching for event cancellations
2021/04/08 15:37:37 INF Started monitoring for events event_type=SPOT_ITN
2021/04/08 15:37:39 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:39 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:41 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:41 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:43 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:43 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:45 ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1 error="Received an http status code 403"
2021/04/08 15:37:45 WRN There was a problem monitoring for events error="There was a problem checking for spot ITNs: Metadata request received http status code: 403" event_type=SPOT_ITN
2021/04/08 15:37:45 WRN Stopping NTH - Duplicate Error Threshold hit.
panic: There was a problem checking for spot ITNs: Metadata request received http status code: 403
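
For readers wondering where the `???` in the BEFORE output comes from, here is a minimal stdlib-only sketch. It assumes the console writer maps a level name to a three-letter tag and falls back to `???` when an event carries no level, which is what the `log.Log()` calls in the diff appear to produce. `formatLevel` and `formatLine` are hypothetical names for illustration, not the logging library's API.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLevel is a hypothetical stand-in for a console writer's level
// formatter. An empty or unrecognized level falls through to "???".
func formatLevel(level string) string {
	switch strings.ToLower(level) {
	case "debug":
		return "DBG"
	case "info":
		return "INF"
	case "warn":
		return "WRN"
	case "error":
		return "ERR"
	default:
		return "???"
	}
}

// formatLine builds "<timestamp> <LVL> <message>", the shape of the lines above.
func formatLine(timestamp, level, msg string) string {
	return fmt.Sprintf("%s %s %s", timestamp, formatLevel(level), msg)
}

func main() {
	// No level attached to the event: the BEFORE behavior.
	fmt.Println(formatLine("2021/04/02 15:44:08", "", "Trying to get token from IMDSv2"))
	// Explicit level attached: the AFTER behavior.
	fmt.Println(formatLine("2021/04/08 15:37:37", "info", "Started watching for interruption events"))
}
```

Under those assumptions, giving every event an explicit level is enough to replace `???` with a readable tag.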

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@brycahta brycahta left a comment

Looks good! Had some minor comments/questions

cmd/node-termination-handler.go
  		return metadata
  	}
  	err = json.NewDecoder(strings.NewReader(identityDoc)).Decode(&metadata)
  	if err != nil {
- 		log.Log().Msg("Unable to fetch instance identity document from ec2 metadata")
+ 		log.Warn().Msg("Unable to fetch instance identity document from ec2 metadata")
Contributor

log.Err(err) ?

  • Both comments are nits for sure, but I'm thinking that if there's a log inside an if err != nil block, it should be error level for consistency

Contributor Author

I don't think we can make a strict rule like that. We have a fallback for this error, so I think it can stay a Warn.

Contributor

Not sure I agree that all logging within err != nil should be error level, but we should come up with a consistent understanding of what these levels mean

Contributor

Agreed the rule can't be that strict, but if having a fallback doesn't classify as error, then what about:

ERR Unable to retrieve an IMDSv2 token, continuing with IMDSv1

v1 is the fallback in this case? Are there any formal docs for log-levels similar to semantic versioning or something? Agreed we need some kinda consistency

Contributor Author

yeah I changed that one to WARN

Contributor Author

I don't think there's anything formal for log levels... I think we probably log way too much in NTH, so maybe we should consider what actually makes sense to log so that someone can view the logs and be confident things are working and have enough information to debug certain error scenarios.

@@ -186,7 +186,7 @@ func main() {
  			previousErr = err
  		}
  		if duplicateErrCount >= duplicateErrThreshold {
- 			log.Log().Msg("Stopping NTH - Duplicate Error Threshold hit.")
+ 			log.Warn().Msg("Stopping NTH - Duplicate Error Threshold hit.")
Contributor

log.Panic() since we're panicking?

Contributor Author

We could, but IMHO it's clearer when they're separate. I don't think I'd expect a log to panic if I was reading and I didn't know of that weirdness.
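
To make the "keep them separate" option concrete, here is a small stdlib-only sketch of a duplicate-error threshold that logs and then panics as two explicit steps. The names `watch` and `run`, the threshold of 3, and the counting details are assumptions for illustration; only `previousErr`, `duplicateErrCount`, and the threshold comparison come from the hunk above.

```go
package main

import (
	"errors"
	"fmt"
)

// errIMDS mirrors the error text seen in the AFTER output above.
var errIMDS = errors.New("Metadata request received http status code: 403")

// watch walks a sequence of monitoring results and panics once the same error
// has repeated threshold times in a row. The counting is an assumption; the
// real loop is only partly visible in the diff.
func watch(errs []error, threshold int, logf func(string)) {
	var previousErr error
	duplicateErrCount := 0
	for _, err := range errs {
		if err == nil {
			previousErr, duplicateErrCount = nil, 0
			continue
		}
		logf("WRN There was a problem monitoring for events error=" + err.Error())
		if previousErr != nil && err.Error() == previousErr.Error() {
			duplicateErrCount++
		} else {
			duplicateErrCount = 0
			previousErr = err
		}
		if duplicateErrCount >= threshold {
			// Two explicit steps: the log call only logs, and the panic is visible here.
			logf("WRN Stopping - Duplicate Error Threshold hit.")
			panic(err.Error())
		}
	}
}

// run calls watch and converts its panic into a return value for inspection.
func run(errs []error, threshold int) (logs []string, panicked interface{}) {
	defer func() { panicked = recover() }()
	watch(errs, threshold, func(s string) { logs = append(logs, s) })
	return logs, nil
}

func main() {
	logs, p := run([]error{errIMDS, errIMDS, errIMDS, errIMDS}, 3)
	for _, line := range logs {
		fmt.Println(line)
	}
	fmt.Println("panic value:", p)
}
```

The alternative is a logging call that panics by itself, which is shorter but hides the control flow inside the logger; the sketch keeps the `panic` on its own line so a reader does not need to know that behavior.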

@brycahta brycahta left a comment

🚀

3 participants