This repository has been archived by the owner on Nov 14, 2023. It is now read-only.

Constant "Error fetching info for pid" errors when run alongside docker containers #37

Closed
fotinakis opened this issue Sep 29, 2016 · 9 comments

@fotinakis

Running dd-agent on a host machine that also runs Docker containers, I've found that /var/log/datadog/collector.log fills up with these logs:

2016-09-29 01:46:05 UTC | WARNING | dd.collector | checks.collector(collector.py:774) | GOHAI LOG | Error fetching info for pid 4021: user: unknown userid 9999
Error fetching info for pid 4028: user: unknown userid 9999
Error fetching info for pid 4035: user: unknown userid 9999
Error fetching info for pid 4042: user: unknown userid 9999
Error fetching info for pid 4049: user: unknown userid 9999
Error fetching info for pid 4056: user: unknown userid 9999
Error fetching info for pid 4063: user: unknown userid 9999
Error fetching info for pid 4070: user: unknown userid 9999
...

There is a user with uid 9999 inside the Docker containers, so I assume these errors occur because that user does not exist on the host itself.

@remh
Contributor

remh commented Nov 18, 2016

Thanks for the feedback @fotinakis
We'll work on a fix.

@masci masci added this to the Triage milestone Nov 30, 2016
@masci masci added the bug label Nov 30, 2016
@szymonpk

Is there any workaround for this, or any ETA? I know we could create a user with uid 9999, but that doesn't feel right.

@creatorzim

Also experiencing this issue.

@nzlosh

nzlosh commented Mar 18, 2017

This issue may not be limited to docker. I'm seeing the same type of error with datadog+lxc containers.

Mar 18 06:05:15 127.0.0.1 dd.collector[180994]: WARNING (collector.py:774): GOHAI LOG | Error fetching info for pid 22462: user: unknown userid 998
Error fetching info for pid 24647: user: unknown userid 121
Error fetching info for pid 24649: user: unknown userid 121
Error fetching info for pid 24650: user: unknown userid 121
Error fetching info for pid 24651: user: unknown userid 121
Error fetching info for pid 24652: user: unknown userid 121
Error fetching info for pid 24653: user: unknown userid 121
Error fetching info for pid 24654: user: unknown userid 10000
Error fetching info for pid 24681: user: unknown userid 10000
Error fetching info for pid 24684: user: unknown userid 10000
Error fetching info for pid 24687: user: unknown userid 10000
Error fetching info for pid 24688: user: unknown userid 10000
Error fetching info for pid 24689: user: unknown userid 10000
Error fetching info for pid 24690: user: unknown userid 10000
Error fetching info for pid 24697: user: unknown userid 10000
Error fetching info for pid 24698: user: unknown userid 10000
Error fetching info for pid 24699: user: unknown userid 10000
Error fetching info for pid 24700: user: unknown userid 10000
Error fetching info for pid 24701: user: unknown userid 10000
...
# dpkg -l | egrep 'docker|lxc|datadog'
ii  datadog-agent                      1:5.9.1-1                       amd64        Datadog Monitoring Agent
ii  liblxc1                            2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (library)
ii  lxc                                2.0.7-0ubuntu1~16.04.2          all          Transitional package for lxc1
ii  lxc-common                         2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (common tools)
ii  lxc-templates                      2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (templates)
ii  lxc1                               2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools
ii  python-lxc                         0.1-0ubuntu6                    amd64        Linux container userspace tools (Python 2.x bindings)
ii  python3-lxc                        2.0.7-0ubuntu1~16.04.2          amd64        Linux Containers userspace tools (Python 3.x bindings)

A workaround would be welcome.

@szymonpk

@nzlosh Creating a user with the specific uid helps. However, I would expect a fix from DD.

@parity3

parity3 commented Aug 3, 2018

This bug has apparently been carried over to version 1.6 of datadog-agent. There must be a workaround, but I would expect it to be posted in a comment on this open issue. Will continue googling.

@eraac

eraac commented Feb 6, 2019

Same issue on Kubernetes, using the Datadog Helm chart:

values.yml

```yaml
# Default values for datadog.
image:
  # This chart is compatible with different images, please choose one
  #repository: datadog/agent    # Agent6
  repository: datadog/dogstatsd # Standalone DogStatsD6
  tag: 6.9.0 # Use 6.9.0-jmx to enable jmx fetch collection
  pullPolicy: IfNotPresent
  ## It is possible to specify docker registry credentials
  ## See https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # pullSecrets:
  #   - name: regsecret

# NB! Normally you need to keep Datadog DaemonSet enabled!
# The exceptional case could be a situation when you need to run
# single DataDog pod per every namespace, but you do not need to
# re-create a DaemonSet for every non-default namespace install.
# Note, that StatsD and DogStatsD work over UDP, so you may not
# get guaranteed delivery of the metrics in Datadog-per-namespace setup!
daemonset:
  enabled: false

  ## Bind ports on the hostNetwork. Useful for CNI networking where hostPort might
  ## not be supported. The ports will need to be available on all hosts. Can be
  ## used for custom metrics instead of a service endpoint.
  ## WARNING: Make sure that hosts using this are properly firewalled, otherwise
  ## metrics and traces will be accepted from any host able to connect to this host.
  # useHostNetwork: true

  ## Sets the hostPort to the same value as the container port. Needs to be used
  ## to receive traces in a standard APM set up. Can be used for sending custom metrics.
  ## The ports will need to be available on all hosts.
  ## WARNING: Make sure that hosts using this are properly firewalled, otherwise
  ## metrics and traces will be accepted from any host able to connect to this host.
  # useHostPort: true

  ## Run the agent in the host's PID namespace. This is required for Dogstatsd origin
  ## detection to work. See https://docs.datadoghq.com/developers/dogstatsd/unix_socket/
  useHostPID: true

  ## Annotations to add to the DaemonSet's Pods
  # podAnnotations:
  #   scheduler.alpha.kubernetes.io/tolerations: '[{"key": "example", "value": "foo"}]'

  ## Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
  # tolerations: []

  ## Allow the DaemonSet to schedule on selected nodes
  # Ref: https://kubernetes.io/docs/user-guide/node-selection/
  # nodeSelector: {}

  ## Allow the DaemonSet to schedule using affinity rules
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # affinity: {}

  ## Allow the DaemonSet to perform a rolling update on helm update
  ## ref: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
  # updateStrategy: RollingUpdate

  ## Sets PriorityClassName if defined
  # priorityClassName:

# Apart from DaemonSet, deploy Datadog agent pods and related service for
# applications that want to send custom metrics. Provides DogStatsD service.
#
# HINT: If you want to use datadog.collectEvents, keep deployment.replicas set to 1.
deployment:
  enabled: true
  replicas: 1

  # Affinity for pod assignment
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}

  # Tolerations for pod assignment
  # Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations: []

  # If you're using a NodePort-type service and need a fixed port, set this parameter.
  # dogstatsdNodePort: 8125
  # traceNodePort: 8126

  service:
    type: ClusterIP
    annotations: {}

## Sets PriorityClassName if defined
# priorityClassName:

## deploy the kube-state-metrics deployment
## ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
kubeStateMetrics:
  enabled: false
  rbac:
    create: false

# This is the new cluster agent implementation that handles cluster-wide
# metrics more cleanly, separates concerns for better rbac, and implements
# the external metrics API so you can autoscale HPAs based on datadog
# metrics
clusterAgent:
  containerName: cluster-agent
  image:
    repository: datadog/cluster-agent
    tag: 1.1.0
    pullPolicy: IfNotPresent
  enabled: false
  ## This needs to be at least 32 characters a-zA-z
  ## It is a preshared key between the node agents and the cluster agent
  token: ""
  replicas: 1
  ## Enable the metricsProvider to be able to scale based on metrics in Datadog
  metricsProvider:
    enabled: false
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
  ## Override the agent's liveness probe logic from the default:
  ## In case of issues with the probe, you can disable it with the
  ## following values, to allow easier investigating:
  # livenessProbe:
  #   exec:
  #     command: ["/bin/true"]
  ## Override the cluster-agent's readiness probe logic from the default:
  # readinessProbe:

datadog:
  ## You'll need to set this to your Datadog API key before the agent will run.
  ## ref: https://app.datadoghq.com/account/settings#agent/kubernetes
  ##
  apiKey: xxxxxxxxxx

  ## You can modify the security context used to run the containers by
  ## modifying the label type below:
  # securityContext:
  #   seLinuxOptions:
  #     seLinuxLabel: "spc_t"

  ## Use existing Secret which stores API key instead of creating a new one
  # apiKeyExistingSecret:

  ## If you are using clusterAgent.metricsProvider.enabled = true, you'll need
  ## a datadog app key for read access to the metrics
  # appKey:

  ## Use existing Secret which stores APP key instead of creating a new one
  # appKeyExistingSecret:

  ## Daemonset/Deployment container name
  ## See clusterAgent.containerName if clusterAgent.enabled = true
  ##
  name: datadog

  # The site of the Datadog intake to send Agent data to.
  # Defaults to 'datadoghq.com', set to 'datadoghq.eu' to send data to the EU site.
  # site: datadoghq.com

  # The host of the Datadog intake server to send Agent data to, only set this option
  # if you need the Agent to send data to a custom URL.
  # Overrides the site setting defined in "site".
  # dd_url: https://app.datadoghq.com

  ## Set logging verbosity.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ## Note: For Agent6 (image `datadog/agent`) the valid log levels are
  ## trace, debug, info, warn, error, critical, and off
  ##
  logLevel: INFO

  ## Un-comment this to make each node accept non-local statsd traffic.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  # nonLocalTraffic: true

  ## Enable container runtime socket volume mounting
  useCriSocketVolume: true

  ## Set host tags.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  # tags:

  ## Enables event collection from the kubernetes API
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  collectEvents: false

  ## Enables log collection
  ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
  ##
  # logsEnabled: false
  # logsConfigContainerCollectAll: false

  ## Un-comment this to enable APM and tracing, on port 8126
  ## ref: https://github.com/DataDog/docker-dd-agent#tracing-from-the-host
  ##
  # apmEnabled: true

  ## Un-comment this to enable live process monitoring
  ## ref: https://docs.datadoghq.com/graphing/infrastructure/process/#kubernetes-daemonset
  ##
  # processAgentEnabled: true

  ## The dd-agent supports many environment variables
  ## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#environment-variables
  ##
  # env:
  #   - name:
  #     value:

  ## The dd-agent supports detailed process and container monitoring and
  ## requires control over the volume and volumeMounts for the daemonset
  ## or deployment.
  ## ref: https://docs.datadoghq.com/guides/process/
  ##
  # volumes:
  #   - hostPath:
  #       path: /etc/passwd
  #     name: passwd
  # volumeMounts:
  #   - name: passwd
  #     mountPath: /etc/passwd
  #     readOnly: true

  ## Enable leader election mechanism for event collection
  ##
  # leaderElection: false

  ## Set the lease time for leader election
  ##
  # leaderLeaseDuration: 600

  ## Provide additional check configurations (static and Autodiscovery)
  ## Each key will become a file in /conf.d
  ## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
  ## ref: https://docs.datadoghq.com/agent/autodiscovery/
  ##
  # confd:
  #   redisdb.yaml: |-
  #     init_config:
  #     instances:
  #       - host: "name"
  #         port: "6379"
  #   kubernetes_state.yaml: |-
  #     ad_identifiers:
  #       - kube-state-metrics
  #     init_config:
  #     instances:
  #       - kube_state_url: http://%%host%%:8080/metrics

  ## Provide additional custom checks as python code
  ## Each key will become a file in /checks.d
  ## ref: https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#optional-volumes
  ##
  # checksd:
  #   service.py: |-

  ## Path to the container runtime socket (if different from Docker)
  ## This is supported starting from agent 6.6.0
  # criSocketPath: /var/run/containerd/containerd.sock

  ## Provide a mapping of Kubernetes Labels to Datadog Tags
  # podLabelsAsTags:
  #   app: kube_app
  #   release: helm_release

  ## Provide a mapping of Kubernetes Annotations to Datadog Tags
  # podAnnotationsAsTags:
  #   iam.amazonaws.com/role: kube_iamrole

  ## Override the agent's liveness probe logic from the default:
  ## In case of issues with the probe, you can disable it with the
  ## following values, to allow easier investigating:
  # livenessProbe:
  #   exec:
  #     command: ["/bin/true"]

  ## datadog-agent resource requests and limits
  ## Make sure to keep requests and limits equal to keep the pods in the Guaranteed QoS class
  ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi

rbac:
  ## If true, create & use RBAC resources
  create: false
  ## Ignored if rbac.create is true
  serviceAccountName: default

tolerations: []

kube-state-metrics:
  rbac:
    create: false
  ## Ignored if rbac.create is true
  serviceAccountName: default
```
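The chart's own commented-out `volumes` example hints at a possible mitigation for the uid-lookup failures: mounting the host's /etc/passwd read-only into the agent container, so uid lookups have a passwd file to consult. A sketch adapted from those comments (an assumption on my part, not a verified fix for the crash below):

```yaml
# Sketch: expose the host's passwd database to the agent container.
datadog:
  volumes:
    - hostPath:
        path: /etc/passwd
      name: passwd
  volumeMounts:
    - name: passwd
      mountPath: /etc/passwd
      readOnly: true
```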
Error output:
2019-02-06 12:37:27 UTC | INFO | (log.go:473 in func1) | Config will be read from env variables
2019-02-06 12:37:27 UTC | INFO | (forwarder.go:154 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://6-9-0-app.agent.datadoghq.com" (1 api key(s))
2019-02-06 12:37:28 UTC | INFO | (processes.go:16 in getProcesses) | Error fetching info for pid 1: user: lookup userid 0: no such file or directory
2019-02-06 12:37:28 UTC | ERROR | (gohai.go:40 in getGohaiInfo) | Failed to retrieve filesystem metadata: df failed to collect filesystem data: exit status 1
2019-02-06 12:37:28 UTC | INFO | (serializer.go:246 in SendMetadata) | Sent host metadata payload, size: 1857 bytes.
2019-02-06 12:37:28 UTC | INFO | (udp.go:76 in Listen) | dogstatsd-udp: starting to listen on :8125
2019-02-06 12:37:28 UTC | INFO | (transaction.go:193 in Process) | Successfully posted payload to "https://6-9-0-app.agent.datadoghq.com/intake/?api_key=*************************xxxxx", the agent will only log transaction success every 20 transactions
2019-02-06 12:38:01 UTC | INFO | (main.go:219 in start) | See ya!

And the pod ends up with the status CrashLoopBackOff.

@ringerc

ringerc commented Oct 26, 2023

This causes lots of noise in the Datadog OTel exporter; see open-telemetry/opentelemetry-collector-contrib#14186.

@pgimalac
Contributor

Hello, this repository has been archived. The gohai library now lives in https://github.com/DataDog/datadog-agent/pkg/gohai; feel free to re-open your issue in https://github.com/DataDog/datadog-agent if it is still relevant.

10 participants