Description
1. Issue or feature description
The nvidia-container-toolkit-daemonset-hrncd pod goes into CrashLoopBackOff indefinitely after reaching a Completed state.
2. Steps to reproduce the issue
Create a cluster with Ubuntu 20.04, Kubernetes server version 1.23.13, and containerd version 1.6.10.
Install the NVIDIA GPU Operator:
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator
NAME: gpu-operator-1670258596
LAST DEPLOYED: Mon Dec 5 17:43:19 2022
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
nvidia-container-toolkit-daemonset-hrncd goes into CrashLoopBackOff after a Completed state (due to the DaemonSet restarting it).
While investigating this, I can see that the container changes the configuration in /etc/containerd/config.toml and restarts containerd. My understanding is that it should then loop in "Waiting for signal", but instead it proceeds to clean up the configuration. I can see the containerd service being restarted 3 times during the running phase. I don't know what is happening; even after looking at the code I don't understand why it receives a signal (or where that signal comes from).
https://github.com/NVIDIA/nvidia-container-toolkit/blob/a9fb7a4a8807f1fa7e09e63138a928a410bb15a7/tools/container/nvidia-toolkit/run.go#L261
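To illustrate what I expect at that point in run.go: the process should block on a signal before running its cleanup, as in this minimal shell sketch (the trap/wait pattern here is only an analogy for the Go signal handling, not the actual toolkit code):

```shell
#!/bin/sh
# Sketch of the expected "Waiting for signal" behavior:
# configure, then block until SIGTERM/SIGINT, then clean up.
cleanup() {
    echo "received signal, cleaning up containerd config"
}
trap cleanup TERM INT

echo "Waiting for signal"
# For demonstration only: send ourselves SIGTERM so the script exits.
kill -TERM $$ &
wait
```

In the failing pod, cleanup appears to run without any such signal being delivered, which is what I cannot explain.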
Do you have any clue about this? Thank you!
3. Information to attach (optional if deemed irrelevant)
- nvidia-container-toolkit-ctr log:
nvidia-container-toolkit-ctr.txt
- kubernetes pods status:
kubectl get pods --all-namespaces
- Containerd configuration file:
cat /etc/containerd/config.toml
- NVIDIA shared directory:
ls -la /run/nvidia
- NVIDIA packages directory:
ls -la /usr/local/nvidia/toolkit
- NVIDIA driver directory:
ls -la /run/nvidia/driver
- kubelet logs
journalctl -u kubelet > kubelet.logs
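To count how many times containerd was restarted during the toolkit run, the journal can be filtered as well. A sketch (the inline sample log is illustrative; against a real node, pipe `journalctl -u containerd` into the grep instead):

```shell
#!/bin/sh
# Illustrative journal excerpt; on a real node use:
#   journalctl -u containerd --no-pager | grep -c 'Starting containerd'
log='Dec 05 17:43:20 node systemd[1]: Starting containerd...
Dec 05 17:44:01 node systemd[1]: Starting containerd...
Dec 05 17:45:12 node systemd[1]: Starting containerd...'

restarts=$(printf '%s\n' "$log" | grep -c 'Starting containerd')
echo "$restarts"
```

Run against the sample log above, this prints 3, matching the number of restarts I observed.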