Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get container exit code from event if we can't get it from inspect. #2940

Merged
merged 1 commit into from Jul 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
28 changes: 28 additions & 0 deletions agent/dockerclient/dockerapi/docker_client.go
Expand Up @@ -64,6 +64,10 @@ const (
maxHealthCheckOutputLength = 1024
// VolumeDriverType is one of the plugin capabilities see https://docs.docker.com/engine/reference/commandline/plugin_ls/#filtering
VolumeDriverType = "volumedriver"
// dockerContainerDieEvent is the name of the event generated by Docker when a container died.
dockerContainerDieEvent = "die"
// dockerContainerEventExitCodeAttribute is the attribute name to get exit code from Docker event attribute.
dockerContainerEventExitCodeAttribute = "exitCode"
)

// Timelimits for docker operations enforced above docker
Expand Down Expand Up @@ -1015,6 +1019,10 @@ func (dg *dockerGoClient) handleContainerEvents(ctx context.Context,
}

metadata := dg.containerMetadata(ctx, containerID)
// In case when we received a container die event but was not able to inspect the container (e.g. due to timeout),
// we will use the exit code from the event, so that the exit code of the container is still reported and
// available for customer to see from describing task.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if exit code is set here, do we still need the inspect later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, because we need to get the exitAt timestamp https://github.com/aws/amazon-ecs-agent/blob/master/agent/api/container/container.go#L310. this is shown in the metadata, and is only available by inspecting the container (see https://github.com/moby/moby/blob/master/daemon/monitor.go#L48, the exit code is stored in event but exitAt is not). otherwise i would have remove the inspect 😞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. if inspect fails, do we still show the error in task stopped reason even when container exits fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we will show the error in the reason field. see the Testing section in pr description for an example

setExitCodeFromEvent(event, &metadata)

changedContainers <- DockerContainerChangeEvent{
Status: status,
Expand All @@ -1024,6 +1032,26 @@ func (dg *dockerGoClient) handleContainerEvents(ctx context.Context,
}
}

// setExitCodeFromEvent tries to get exit code from event and stores it in metadata, if metadata doesn't
// contain the exit code already.
func setExitCodeFromEvent(event *events.Message, metadata *DockerContainerMetadata) {
// exit code is only available from die event.
if metadata.ExitCode != nil || event.Status != dockerContainerDieEvent {
return
}
exitCode, ok := event.Actor.Attributes[dockerContainerEventExitCodeAttribute]
if !ok {
return
}
code, err := strconv.Atoi(exitCode)
if err != nil {
seelog.Errorf("Received invalid exit code from docker container event. exit code: %s, parse err: %v",
exitCode, err)
return
}
metadata.ExitCode = &code
}

// ListContainers returns a slice of container IDs.
func (dg *dockerGoClient) ListContainers(ctx context.Context, all bool, timeout time.Duration) ListContainersResponse {
ctx, cancel := context.WithTimeout(ctx, timeout)
Expand Down
68 changes: 68 additions & 0 deletions agent/dockerclient/dockerapi/docker_client_test.go
Expand Up @@ -995,6 +995,74 @@ func TestContainerEventsError(t *testing.T) {
}
}

func TestSetExitCodeFromEvent(t *testing.T) {
var (
exitCodeInt = 42
exitCodeStr = "42"
altExitCodeInt = 1
)

defaultEvent := &events.Message{
Status: dockerContainerDieEvent,
Actor: events.Actor{
Attributes: map[string]string{
dockerContainerEventExitCodeAttribute: exitCodeStr,
},
},
}

testCases := []struct {
name string
event *events.Message
metadata DockerContainerMetadata
expectedExitCode *int
}{
{
name: "exit code set from event",
event: defaultEvent,
metadata: DockerContainerMetadata{},
expectedExitCode: &exitCodeInt,
},
{
name: "exit code not set from event when metadata already has it",
event: defaultEvent,
metadata: DockerContainerMetadata{
ExitCode: &altExitCodeInt,
},
expectedExitCode: &altExitCodeInt,
},
{
name: "exit code not set from event when event does not has it",
event: &events.Message{
Status: dockerContainerDieEvent,
Actor: events.Actor{},
},
metadata: DockerContainerMetadata{},
expectedExitCode: nil,
},
{
name: "exit code not set from event when event has invalid exit code",
event: &events.Message{
Status: dockerContainerDieEvent,
Actor: events.Actor{
Attributes: map[string]string{
dockerContainerEventExitCodeAttribute: "invalid",
},
},
},
metadata: DockerContainerMetadata{},
expectedExitCode: nil,
},
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
setExitCodeFromEvent(tc.event, &tc.metadata)
assert.Equal(t, tc.expectedExitCode, tc.metadata.ExitCode)
})
}
}

func TestDockerVersion(t *testing.T) {
mockDockerSDK, client, _, _, _, done := dockerClientSetup(t)
defer done()
Expand Down