We found 3 issues while testing failure scenarios in `containerd`, i.e. if `containerd` goes down (or restarts), how does `nomad-driver-containerd` handle that situation?

Issues:

1. When `containerd-driver` makes a gRPC call to `containerd`, e.g. during the fingerprinting operation, a context timeout must be set. Right now, without the timeout, if `containerd` goes down, that call never returns, leaving `containerd-driver` in a hung state.
2. `handleWait()` throwing a nil pointer exception: sending an empty return to the nomad client results in the nomad client dereferencing a nil pointer, which results in a nil pointer exception.
3. Issue in recovering a task if nomad/nomad-driver-containerd restarts. This might also be related to the networking error we are observing.

This PR addresses (1) and (2). I will open a separate PR for Issue (3).