PVC clean up job is not stable #301

Closed
sleshchenko opened this issue Mar 5, 2021 · 1 comment · Fixed by #304
sleshchenko commented Mar 5, 2021

There are two issues with the PVC clean up job:

  1. Sometimes the created pod fails with:
Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2021-03-05T08:20:44Z" level=error msg="container_linux.go:366: starting container process caused: process_linux.go:472: container init caused: read init-p: connection reset by peer"

and then continues to fail with:

Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2021-03-05T08:17:20Z" level=warning msg="Timed out while waiting for StopUnit(crio-c7b61b00b6956b61c4dd78c2a311df3bc0d52ae4f81725cfb3c571cb32fbd48b.scope) completion signal from dbus. Continuing..." time="2021-03-05T08:18:06Z" level=error msg="container_linux.go:366: starting container process caused: process_linux.go:472: container init caused: "

The same happens with quay.io/libpod/busybox:1.30.1 and registry.access.redhat.com/ubi8-minimal:8.3-291.
On my RHDPS OpenShift 4.6 or 4.7 it happens pretty often; 7 workspaces are stuck in the finalizing phase:

kc get pod
NAME                                      READY   STATUS                 RESTARTS   AGE
cleanup-workspace0626cde384894b68-c6gkm   0/1     CreateContainerError   0          6m19s
cleanup-workspace434f901860ac4833-hgzm5   0/1     CreateContainerError   0          5m47s
cleanup-workspace4b001b2ad6a54bfc-snzt8   0/1     CreateContainerError   0          6m
cleanup-workspace524f63636daa4156-7cc56   0/1     CreateContainerError   0          5m52s
cleanup-workspace52e229018fcd40dc-tzz65   0/1     CreateContainerError   0          6m9s
cleanup-workspace82a2427ff8174376-jrd5b   0/1     Error                  0          5m37s
cleanup-workspace82a2427ff8174376-xzfcb   0/1     CreateContainerError   0          49s
  2. The second issue: after I updated busybox to ubi, the Jobs were recreated, but the old busybox pods were left on the cluster as zombies without an ownerRef. Maybe we need to explicitly set the propagation policy when we remove the Job (see the sketch after this comment).
    ^ I'm not sure, but I may have had both controllers running (local and on-cluster), and that may have caused this failure.
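
For illustration, a minimal sketch of deleting the cleanup Job with an explicit propagation policy so the Job's pods are garbage collected rather than orphaned. This is not the operator's actual code; it assumes a controller-runtime client, and `deleteCleanupJob` and its arguments are made-up names.

```go
// Illustrative sketch only: assumes the operator deletes the cleanup Job via a
// controller-runtime client; deleteCleanupJob is a hypothetical helper name.
package cleanup

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func deleteCleanupJob(ctx context.Context, cl client.Client, namespace, name string) error {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	// Without an explicit policy, some clients effectively orphan the Job's
	// pods (cf. kubernetes/kubernetes#71801). Background (or Foreground)
	// propagation asks the API server to delete the pods as dependents.
	return client.IgnoreNotFound(
		cl.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)),
	)
}
```

Foreground propagation would instead keep the Job object around (via a finalizer) until its pods are gone, which could be useful if the controller needs to observe that the cleanup pods were actually removed.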

amisevsk commented Mar 5, 2021

Regarding issue 2, I think it may be the propagation policy that's the issue -- I think k8s had a similar issue in the past, judging by kubernetes/kubernetes#71801. I've added this to the linked PR.
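
If it helps when verifying the linked PR, here is a rough check (illustrative names, assuming the same controller-runtime client) that lists pods still carrying the Job controller's job-name label and reports any left without an ownerRef:

```go
// Illustrative helper, not operator code: findOrphanedCleanupPods is a made-up name.
package cleanup

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func findOrphanedCleanupPods(ctx context.Context, cl client.Client, namespace, jobName string) ([]corev1.Pod, error) {
	pods := &corev1.PodList{}
	// The Job controller labels its pods with "job-name", so leftovers from a
	// deleted cleanup Job can still be found by that label.
	if err := cl.List(ctx, pods,
		client.InNamespace(namespace),
		client.MatchingLabels{"job-name": jobName},
	); err != nil {
		return nil, fmt.Errorf("listing cleanup pods: %w", err)
	}
	var orphaned []corev1.Pod
	for _, p := range pods.Items {
		if len(p.OwnerReferences) == 0 {
			orphaned = append(orphaned, p)
		}
	}
	return orphaned, nil
}
```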

@sleshchenko sleshchenko added the sprint/current Is assigned to issues which are planned to work on in the current team sprint label Mar 10, 2021