Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reopen Docker event stream when getting EOF error #2240

Merged
merged 2 commits into from
Oct 18, 2019

Conversation

fenxiong
Copy link
Contributor

@fenxiong fenxiong commented Oct 16, 2019

Summary

Fix #591.

Previously, agent did not reopen docker event stream when it gets EOF/UnexpectedEOF error. When fsouza/go-dockerclient was used, we had this behavior because fsouza/go-dockerclient can't reopen the stream upon EOF/UnexpectedEOF due to some of its own implementation details (see fsouza/go-dockerclient#171) and we did not retry in our layer. When migrated to docker/docker/client, this behavior was maintained.

In the case that Docker is restarted (and container live-restore is enabled), agent's event stream listening goroutine gets an io.UnexpectedEOF and finishes, and we never open the stream again and as a result loss all Docker events until the agent is restarted. So making this changes to reopen Docker event stream when getting EOF error.

Implementation details

Few changes in ContainerEvents.

Testing

Updated the TestContainerEventsEOFError to conform with new behavior.

Manually tested with the steps to reproduce described in #591 and verified that it's fixed.

Description for the changelog

Fixed a bug where when Docker container live restore is enabled, agent stops listening to Docker container event after Docker restarts #2240.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@fenxiong fenxiong requested review from adnxn and a team October 17, 2019 16:42
@shubham2892
Copy link
Contributor

Is error io.UnexpectedEOF only expected when docker restarts, are there any other cases when we get the same error, that we need to handle differently?

@fenxiong
Copy link
Contributor Author

Is error io.UnexpectedEOF only expected when docker restarts, are there any other cases when we get the same error, that we need to handle differently?

It can happen as long as the connection to docker socket is closed. IMO the agent should always attempt to reopen Docker event stream, since we rely on that to detect changes from customer's container. I can't think of a situation where we want the agent to stop listening to Docker events other than the case that the task engine is shut down - in that case the context will be canceled and we will stop listening.

Copy link
Contributor

@adnxn adnxn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm, non blocking but would be nice to test more exhaustively as i mentioned offline.

@fierlion
Copy link
Member

These changes look good to me.

@fenxiong fenxiong merged commit 4320cde into aws:dev Oct 18, 2019
@fenxiong fenxiong deleted the test-socket branch October 18, 2019 17:13
@fenxiong fenxiong added this to the 1.32.1 milestone Oct 22, 2019
@adnxn adnxn modified the milestones: 1.32.1, 1.33.0 Oct 23, 2019
@shubham2892 shubham2892 modified the milestones: 1.33.0, 1.32.1 Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants