worker: add Destroy functionality #4784
End of Tuesday notes:
Force-pushed: 2a87dd6 to 5dfa7e8
Here's a useful script to destroy all containers across all containerd namespaces: https://gist.github.com/deniseyu/55c573026a83e3e1b0011201bcf7b3fc 💥
End of Thursday notes (we should probably rename our branch 😭)
Still need to do:
Force-pushed: fbe21a4 to 7251ebf
Force-pushed: 7251ebf to 0dfe3c4
Going to split out |
Force-pushed: d132766 to f65bb43
End of Thursday notes: @pivotal-bin-ju and I finished building out the local integration test tooling and we're able to run and develop the tests in Docker now, woo. We thought about adding a test for Lookup, but decided that since Destroy implicitly calls Lookup, for now there's not much value in adding an integration test for it - this may change if we want to do more interesting things as part of Lookup. Next week we'll work on filling out the rest of the Garden client and container and process interfaces! Maybe once we get the process stuff working (can refer to Alex's spike branch) we'll be able to see a basic task running in the UI.
- still need to actually invoke containerd's Destroy and add integration tests
- add typed validation error for missing inputs
Signed-off-by: Denise Yu <dyu@pivotal.io>
Signed-off-by: Denise Yu <dyu@pivotal.io>
- backend: lookup calls client's `GetContainer`, which wraps containerd's LoadContainer
ps.: this functionality still lacks an integration test.
Signed-off-by: Denise Yu <dyu@pivotal.io>
Co-authored-by: Ciro S. Costa <cscosta@pivotal.io>
need to push in order to test integration tests on linux
SQUASH ME LATER
Signed-off-by: Denise Yu <dyu@pivotal.io>
Signed-off-by: Denise Yu <dyu@pivotal.io>
- add task fakes
- start implementing finding & killing task, a prereq for removing the container
Signed-off-by: Denise Yu <dyu@pivotal.io>
- This enables us to test the "happy path" where a task can be deleted before the context timeout triggers. This means that containerd was able to gracefully shut down a running task with SIGTERM.
Signed-off-by: Bin Ju <bju@pivotal.io>
Reset task kill timeout to 10s
Signed-off-by: Bin Ju <bju@pivotal.io>
Signed-off-by: Denise Yu <dyu@pivotal.io>
Force-pushed: f65bb43 to 932eaf0
	return InputValidationError{}
}

container, err := b.client.GetContainer(ctx, handle)
idk what containerd does to lookup containers but I feel like we should include a timeout for this context as well.
yea, that's true - we can add this in the next PR where the client timeout is configurable, we'll just decorate every context at the start of each method with the timeout.
added it as a to-do item here: #4783 (comment)
Task runs don't have timeouts, but pretty much every other operation does :)
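A rough sketch of the "decorate every context at the start of each method" idea from this thread, using only the standard library. The names `timeoutClient`, `lookup`, `find`, `slowFind` and `timedOut` are hypothetical and do not come from this PR; `find` stands in for whatever containerd call the method wraps.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// timeoutClient is a hypothetical wrapper; it is not part of this PR.
type timeoutClient struct {
	timeout time.Duration
	// find stands in for a containerd lookup call.
	find func(ctx context.Context, handle string) error
}

// lookup derives a bounded context at the start of the method, so the
// wrapped call is told to give up once c.timeout has elapsed.
func (c timeoutClient) lookup(handle string) error {
	ctx, cancel := context.WithTimeout(context.Background(), c.timeout)
	defer cancel()
	return c.find(ctx, handle)
}

// slowFind simulates a lookup that only returns once its context is done.
func slowFind(ctx context.Context, _ string) error {
	<-ctx.Done()
	return ctx.Err()
}

// timedOut reports whether err is (or wraps) a context deadline error.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}
```

With a configurable `timeout` field, every method can apply the same two lines (`WithTimeout` plus `defer cancel()`) before touching the client.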
Signed-off-by: Bin Ju <bju@pivotal.io>
	_, err = task.Delete(ctx) // todo: we're swallowing exitcodes in both these forks, do we care?
	return err
case <-ctx.Done():
	err = task.Kill(ctx, syscall.SIGKILL) // should return GRPC DeadlineExceeded error type, wrapped up
This ctx has already expired, so `task.Kill` may or may not be a noop. Does it make more sense to have another ctx?
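One possible answer to the expired-context concern is to send the SIGKILL on a fresh context with its own short deadline. The sketch below is an illustration under assumptions: `killer`, `ctxAwareTask`, `forceKill` and `expiredContext` are hypothetical names, and the fake rejects calls on a finished context, which is how gRPC-backed clients typically behave.

```go
package main

import (
	"context"
	"syscall"
	"time"
)

// killer is a hypothetical stand-in for the part of a task used here.
type killer interface {
	Kill(ctx context.Context, sig syscall.Signal) error
}

// ctxAwareTask refuses calls made with a finished context and records
// every signal that was actually delivered.
type ctxAwareTask struct{ delivered []syscall.Signal }

func (t *ctxAwareTask) Kill(ctx context.Context, sig syscall.Signal) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	t.delivered = append(t.delivered, sig)
	return nil
}

// forceKill sends SIGKILL on a fresh context, so the request is not
// rejected just because the caller's original context has expired.
func forceKill(t killer, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return t.Kill(ctx, syscall.SIGKILL)
}

// expiredContext returns a context that is already cancelled, mimicking
// the state of ctx inside the `case <-ctx.Done():` branch.
func expiredContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}
```

Calling `Kill` with `expiredContext()` fails without delivering anything, while `forceKill` gets the signal through.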
select {
case <-exitStatus:
	_, err = task.Delete(ctx) // todo: we're swallowing exitcodes in both these forks, do we care?
There is an edge case where the task ran to completion prior to being killed and that result gets swallowed currently.
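To make the swallowed-result edge case concrete, here is a minimal sketch of a wait helper that hands the exit code back to the caller instead of discarding it. `exitResult`, `awaitExit`, `exitedWith`, `example` and `errStillRunning` are hypothetical; `exitResult` only imitates the kind of value a task's wait channel delivers.

```go
package main

import (
	"context"
	"errors"
)

// exitResult is a hypothetical stand-in for the value delivered on a
// task's wait channel: either an exit code or an error.
type exitResult struct {
	code uint32
	err  error
}

var errStillRunning = errors.New("task still running when the context ended")

// awaitExit returns the exit code when the task finishes before ctx ends,
// rather than dropping it on the floor.
func awaitExit(ctx context.Context, exited <-chan exitResult) (uint32, error) {
	select {
	case res := <-exited:
		if res.err != nil {
			return 0, res.err
		}
		return res.code, nil
	case <-ctx.Done():
		return 0, errStillRunning
	}
}

// exitedWith builds a wait channel for a task that has already finished.
func exitedWith(code uint32) <-chan exitResult {
	ch := make(chan exitResult, 1)
	ch <- exitResult{code: code}
	return ch
}

// example covers the edge case: the task ran to completion with code 3
// before anyone tried to kill it, and the code is still reported.
func example() (uint32, error) {
	return awaitExit(context.Background(), exitedWith(3))
}
```

Whether Destroy should surface that code or only log it is a separate design decision; the sketch just shows it does not have to be lost.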
@@ -148,7 +216,20 @@ func (b *Backend) BulkMetrics(handles []string) (metrics map[string]garden.Conta
 //
 // Errors:
 // * Container not found.
-func (b *Backend) Lookup(handle string) (container garden.Container, err error) { return }
+func (b *Backend) Lookup(handle string) (garden.Container, error) {
+	ctx := namespaces.WithNamespace(context.Background(), b.namespace)
Does it make sense for this to be a `context.TODO`, since we intend it to be replaced?
yes
}

return b.client.Destroy(ctx, handle)
}
What about refactoring the logic as below (move `killTasks` into the else branch, so there are fewer return paths):
    task, err := container.Task(ctxWithTimeout, nil)
    if err != nil {
        if !errdefs.IsNotFound(err) { // a real error, not just a missing task
            return ClientError{InnerError: err}
        }
        // do nothing if we could not find the task
    } else {
        // kill the task
        err = killTasks(ctxWithTimeout, task)
        if err != nil {
            return ClientError{InnerError: err}
        }
    }
    // whether the task was missing or has been killed, the container should be destroyed anyway.
    err = b.client.Destroy(ctxWithTimeout, handle)
    if err != nil {
        return ClientError{InnerError: err}
    }
const maxTaskKillWaitTime = 10 * time.Second

if handle == "" {
	return InputValidationError{}
Would it be better if the handle check were on the first line of the function?
}

select {
case <-exitStatus:
Maybe we should not ignore the exitCode. From the comment of the source code:
// ExitStatus encapsulates a process's exit status.
// It is used by `Wait()` to return either a process exit code or an error
It may not return the error immediately after we `SIGTERM` the task.
	_, err = task.Delete(ctx) // todo: we're swallowing exitcodes in both these forks, do we care?
	return err
case <-ctx.Done():
	err = task.Kill(ctx, syscall.SIGKILL) // should return GRPC DeadlineExceeded error type, wrapped up
We may need another wait after we `SIGKILL`. The `task.Kill` call will return the status of the gRPC call.
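The "another wait after SIGKILL" suggestion could be sketched as follows. A nil error from `Kill` only says the request was accepted, so the helper keeps waiting on the exit channel until the task is really gone or a budget runs out. All names here (`task`, `fakeTask`, `killAndWait`, `errNotReaped`) are hypothetical, and the exit channel is a simplified stand-in for a task's wait channel.

```go
package main

import (
	"context"
	"errors"
	"syscall"
	"time"
)

// task is a hypothetical, trimmed-down stand-in for a containerd task.
type task interface {
	Kill(ctx context.Context, sig syscall.Signal) error
}

var errNotReaped = errors.New("task did not exit after SIGKILL")

// killAndWait sends SIGKILL and then blocks until the exit channel fires
// or the wait budget is used up.
func killAndWait(t task, exited <-chan struct{}, wait time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), wait)
	defer cancel()

	if err := t.Kill(ctx, syscall.SIGKILL); err != nil {
		return err
	}
	select {
	case <-exited:
		return nil
	case <-ctx.Done():
		return errNotReaped
	}
}

// fakeTask accepts the signal immediately but only exits after delay,
// which is the gap the second wait is meant to cover.
type fakeTask struct {
	exited chan struct{}
	delay  time.Duration
}

func (f *fakeTask) Kill(context.Context, syscall.Signal) error {
	time.AfterFunc(f.delay, func() { close(f.exited) })
	return nil
}
```

Only after `killAndWait` returns nil would it be safe to go on to delete the task and remove the container.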
func (s *BackendSuite) TestLookupGetContainer() {
	s.client.GetContainerReturns(new(libcontainerdfakes.FakeContainer), nil)
	container, err := s.backend.Lookup("non-existent-handle")
`non-existent-handle` could be changed to some existing container
yes this test is bad right now -- related to the incomplete implementation of `Lookup`. Will be sorted out later 😂
	s.Equal(testNamespace, namespace)
}

func (s *BackendSuite) TestDestroyWaitError() {
TestDestroyWaitError -> TestDestroyTaskWaitError
err := s.backend.Destroy("some-handle")

s.Equal(2, fakeTask.KillCallCount())
For correctness, assert that syscall.SIGKILL was passed to fakeTask.Kill
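A sketch of what that assertion checks: counting `Kill` calls is not enough, the test should also look at which signal each call carried. Counterfeiter-generated fakes normally expose recorded arguments through an `...ArgsForCall(i)` accessor; the example below instead uses a hand-written fake so it runs on its own. `recordingTask`, `escalate`, `lastSignal` and `runEscalation` are hypothetical names, and the SIGTERM-then-SIGKILL order is an assumption about what the two counted calls are.

```go
package main

import (
	"context"
	"syscall"
)

// recordingTask is a hand-written fake that remembers each signal passed
// to Kill, in call order.
type recordingTask struct{ signals []syscall.Signal }

func (r *recordingTask) Kill(_ context.Context, sig syscall.Signal) error {
	r.signals = append(r.signals, sig)
	return nil
}

// escalate mimics the assumed sequence under test: a graceful SIGTERM
// followed by a forced SIGKILL.
func escalate(ctx context.Context, r *recordingTask) {
	_ = r.Kill(ctx, syscall.SIGTERM)
	_ = r.Kill(ctx, syscall.SIGKILL)
}

// lastSignal returns the signal passed to the most recent Kill call.
func lastSignal(r *recordingTask) (syscall.Signal, bool) {
	if len(r.signals) == 0 {
		return 0, false
	}
	return r.signals[len(r.signals)-1], true
}

// runEscalation drives the fake the way a test body would.
func runEscalation() *recordingTask {
	r := &recordingTask{}
	escalate(context.Background(), r)
	return r
}
```

A test would then compare the result of `lastSignal` against `syscall.SIGKILL` in addition to checking the call count.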
	container containerd.Container, err error,
)

// Destroy stops any running tasks on a container and remove the container.
nit: removes
Added some inline comments
thanks @xtreme-sameer-vohra for the inline comments! We implemented your suggestions but we're going to fold them into the top of this branch: #4881 because in the last 2 days we restructured the backend and I'm a coward for merge conflicts 🏋️‍♀️
Implemented changes in branch at #4881
Chatted with Denise and Bin, suggestions for this PR will be incorporated in https://github.com/concourse/concourse/pull/4881/files
Why do we need this PR?
As part of #4783, this PR adds the functionality required to get garden's `Destroy` working with containerd.

Changes proposed in this pull request
- Lookup integration
- add Dockerfile to help integration tests run on OSX (split out to separate story: Create a dev Dockerfile to support running integration tests on any OS #4850)

Contributor Checklist
- Updated documentation (located at https://github.com/concourse/docs)
- Updated release notes (located at https://github.com/concourse/concourse/tree/master/release-notes)

Reviewer Checklist
- Documentation reviewed
- Release notes reviewed
- New config flags added?