
feature discovery worker pod unable to connect to master pod  #512

@gakshat14

Description

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Quick Debug Checklist

  • Are you running on an Ubuntu 18.04 node?
  • Are you running Kubernetes v1.13+?
  • Are you running Docker (>= 18.06) or CRI-O (>= 1.13)?
  • Do you have i2c_core and ipmi_msghandler loaded on the nodes?
  • Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?
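For reference, one way to run through this checklist on a node is sketched below (kubectl version --short is deprecated on newer clients, and the runtime command depends on whether you run Docker or CRI-O):

    # OS and Kubernetes versions
    grep VERSION= /etc/os-release
    kubectl version --short

    # Container runtime version (Docker or CRI-O)
    docker version --format '{{.Server.Version}}'    # or: crio version

    # Kernel modules from the checklist
    lsmod | grep -e i2c_core -e ipmi_msghandler

    # ClusterPolicy CRD applied by the operator
    kubectl describe clusterpolicies --all-namespaces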

2. Issue or feature description

My nvidia-feature-discovery-worker pod is unable to connect to the master pod. It fails with the following error:

[core] grpc: addrConn.createTransport failed to connect to {gpu-operator-1680394617-node-feature-discovery-master:8080 gpu-operator-1680394617-node-feature-discovery-master:8080 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.105.174.41:8080: connect: connection refused". Reconnecting...
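A "connection refused" against the master Service usually means the Service has no ready endpoint behind it. One way to check is sketched below (the release name and namespace come from the log above and the install command below; the label selectors are assumptions based on node-feature-discovery chart defaults):

    # Is the NFD master pod running and ready?
    # (label selector assumed from node-feature-discovery chart defaults)
    kubectl get pods -n gpu-operator -l app.kubernetes.io/name=node-feature-discovery

    # Does the master Service have endpoints behind 10.105.174.41:8080?
    kubectl get svc,endpoints -n gpu-operator gpu-operator-1680394617-node-feature-discovery-master

    # Master pod logs, in case it is crash-looping
    kubectl logs -n gpu-operator -l app.kubernetes.io/name=node-feature-discovery,role=master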

3. Steps to reproduce the issue

Execute the following command:

helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=false --set toolkit.enabled=false --set operator.defaultRuntime=crio --set operator.cleanupCRD=true
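After the install, the rollout can be confirmed with the commands below (the namespace matches the -n flag above):

    # Watch the operator and node-feature-discovery pods come up
    kubectl get pods -n gpu-operator -w

    # Confirm the daemonsets were created
    kubectl get ds -n gpu-operator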

4. Information to attach (optional if deemed irrelevant)

  • kubernetes pods status: kubectl get pods --all-namespaces

  • kubernetes daemonset status: kubectl get ds --all-namespaces

  • If a pod/ds is in an error or pending state: kubectl describe pod -n NAMESPACE POD_NAME

  • If a pod/ds is in an error or pending state: kubectl logs -n NAMESPACE POD_NAME
    worker-pod-log.log

  • Output of running a container on the GPU machine: docker run -it alpine echo foo

  • Docker configuration file: cat /etc/docker/daemon.json

  • Docker runtime configuration: docker info | grep runtime

  • NVIDIA shared directory: ls -la /run/nvidia

  • NVIDIA packages directory: ls -la /usr/local/nvidia/toolkit

  • NVIDIA driver directory: ls -la /run/nvidia/driver

  • kubelet logs: journalctl -u kubelet > kubelet.logs
