
Prevent resetting valid agent state db when IMDS fails on startup #3509

Merged
sparrc merged 1 commit into aws:dev from imds-instance-id-state-reset on Dec 8, 2022

Conversation

sparrc (Contributor) commented Dec 7, 2022

Summary

This is a follow-up to #2861.

On the rare occasion that IMDS fails multiple times (this PR also raises that count from 3 to 5), the agent should avoid resetting the agent state db if an existing agent state is present.

This prevents situations like the one logged below. Resetting the state db results in a duplicate container instance registered for the same EC2 instance ID, and it also causes the agent to drop some tasks.

level=warn time=2022-11-23T15:43:00Z msg="Unable to access EC2 Metadata service to determine EC2 ID: RequestError: send request failed\ncaused by: Get \"http://169.254.169.254/latest/meta-data/instance-id\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" module=agent.go
level=warn time=2022-11-23T15:45:47Z msg="Data mismatch; saved InstanceID 'i-030e42d5588188300' does not match current InstanceID ''. Overwriting old datafile" module=agent.go

Implementation details

currentEC2InstanceID will fall back to the instance ID saved in the agent's state db if IMDS fails to return an EC2 instance ID.

Testing

New tests cover the changes: no

Covered by existing functional tests.

Description for the changelog

Bugfix: Prevent resetting valid agent state db when IMDS fails on startup

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sparrc requested a review from a team as a code owner December 7, 2022 19:07
sparrc force-pushed the imds-instance-id-state-reset branch from f8a15f3 to 93b2324 December 7, 2022 19:29
@@ -602,7 +602,10 @@ func TestMetricsDisabled(t *testing.T) {
published <- struct{}{}
}).Return(nil).MinTimes(1)

go cs.Serve()
go func() {
Contributor

Do we need this change as part of this PR? (Is it an intended change?)

Contributor Author

Yes, sorry, I meant to comment on this before sending it for review.

This test failed randomly for me once, and while debugging I couldn't figure anything out without knowing what happened to this Serve() call.

Afterwards I couldn't reproduce it. If the test is just slightly flaky, at least in the future we can see whether this function failed, so I thought we should keep the check in. It's harmless to check this in a unit test anyway.

Contributor

+1

Contributor

+1

sparrc merged commit 690ba00 into aws:dev Dec 8, 2022

5 participants