Datadog Cluster Agent | Containerized environments
This is the technical documentation for the Datadog Cluster Agent image, available here.
Prerequisites for the Datadog Cluster Agent
Review the RBAC files in the manifests folder to get the full scope of the requirements. These manifests create a Service Account, a Cluster Role with a restricted scope and the actions detailed below, and a Cluster Role Binding.
API Server requirements
- componentstatuses: to produce the control plane service checks.
- datadogtoken: to update and query the most up-to-date version token corresponding to the latest event stored in etcd.
- services: to perform Autodiscovery based off of service activity.
- endpoints: to run cluster level health checks.
- horizontalpodautoscalers: to run the AutoscalersController and serve custom metrics.
To store the event.tokenKey and the event.tokenTimestamp, deploy a ConfigMap named datadogtoken in the same namespace as the Cluster Agent (the namespace can be configured otherwise with DD_KUBE_RESOURCES_NAMESPACE, described below).
For this, run:
kubectl create configmap datadogtoken --from-literal="event.tokenKey"="0"
NB: You can set any resversion here, but make sure it is not set to a value greater than the actual current resourceVersion.
You do not need to set the event.tokenTimestamp; if it is not present, it is set automatically.
Deploying the Datadog Cluster Agent
Communication with the Datadog Node Agent
For the Datadog Cluster Agent to communicate with the Node Agent, you need to share an authentication token between the two agents.
The token must be at least 32 characters long and should only contain upper-case or lower-case letters and numbers.
You can pass the token to both Agents as the DD_CLUSTER_AGENT_AUTH_TOKEN environment variable.
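As a sketch, one way to generate a compliant token locally (the variable name DD_CLUSTER_AGENT_AUTH_TOKEN is the one listed in the environment variables section below; the generation command itself is just an illustration):

```shell
# Generate a random 32-character token containing only letters and digits,
# suitable for DD_CLUSTER_AGENT_AUTH_TOKEN.
TOKEN=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c 32)
echo "$TOKEN"
```

You can then pass it to the containers, for example with `--env DD_CLUSTER_AGENT_AUTH_TOKEN="$TOKEN"` or through the secret described below.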
Besides this token, you need to set DD_CLUSTER_AGENT_ENABLED=true in the manifest of the Datadog Node Agent.
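For illustration, the Node Agent side of this configuration could look like the following env fragment, assuming the token is stored in the datadog-auth-token secret created below (the secret key name is an assumption):

```yaml
env:
  - name: DD_CLUSTER_AGENT_ENABLED
    value: "true"
  - name: DD_CLUSTER_AGENT_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: datadog-auth-token  # secret created from dca-secret.yaml
        key: token                # key name is an assumption
```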
Running the Datadog Cluster Agent with Kubernetes
We strongly recommend using a secret to authenticate communication between the Node Agents and the Datadog Cluster Agent. You must modify the value of the secret in dca-secret.yaml, then create it:
kubectl create -f Dockerfiles/manifests/cluster-agent/dca-secret.yaml
kubectl get secret datadog-auth-token
NAME                 TYPE      DATA      AGE
datadog-auth-token   Opaque    1         16s
If you are running the Datadog Node Agent 6.4.2+, to deploy the Datadog Cluster Agent you need to:
- Deploy the datadog-cluster-agent_service.yaml
- Configure and create the dca-secret.yaml
- Configure (DD_API_KEY and other options) the cluster-agent.yaml and deploy it
- Specify the required options to ensure the communication between the Datadog Cluster Agent and the Datadog Node Agent.
- Apply the new configuration to the Datadog Node Agent DaemonSet and redeploy the DaemonSet.
- Once the Datadog Cluster Agent and the Datadog Node Agents are running, you can run the agent status command to confirm successful communication.
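The deployment steps above could be sketched as follows, assuming the manifests live under Dockerfiles/manifests/cluster-agent/ (the path used earlier for dca-secret.yaml):

```shell
# Sketch of the deployment sequence described above; file names come from
# the steps listed in this section.
kubectl apply -f Dockerfiles/manifests/cluster-agent/datadog-cluster-agent_service.yaml
kubectl apply -f Dockerfiles/manifests/cluster-agent/dca-secret.yaml
kubectl apply -f Dockerfiles/manifests/cluster-agent/cluster-agent.yaml
```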
Refer to the Datadog Cluster Agent troubleshooting section for more information.
Command line interface of the Datadog Cluster Agent
The available commands for the Datadog Cluster Agent are:
datadog-cluster-agent status: Gives an overview of the components of the agent and their health.
datadog-cluster-agent metamap [nodeName]: Queries the local cache of the mapping between the pods living on nodeName and the cluster level metadata they are associated with (endpoints, ...). Not specifying the nodeName runs the mapper on all the nodes of the cluster.
datadog-cluster-agent flare [caseID]: Similarly to the Node Agent, the Cluster Agent can aggregate its logs and configurations into an archive and forward it to the support team, or deflate it for local use.
In order to collect events, you need the following environment variables:
- name: DD_COLLECT_KUBERNETES_EVENTS
  value: "true"
- name: DD_LEADER_ELECTION
  value: "true"
Enabling the leader election will ensure that only one agent collects the events.
Cluster metadata provider
Ensure the Node Agents and the Datadog Cluster Agent can properly communicate: create a service in front of the Datadog Cluster Agent, ensure an auth_token is properly shared between the agents, and confirm the RBAC rules are properly set.
In the Node Agent, set the environment variable DD_CLUSTER_AGENT_ENABLED to true.
The environment variable DD_KUBERNETES_METADATA_TAG_UPDATE_FREQ can be set to specify how often the Node Agents hit the Datadog Cluster Agent.
You can disable the Kubernetes metadata tag collection with
Custom Metrics Server
The Datadog Cluster Agent implements the External Metrics Provider interface (currently in beta), so it can serve custom metrics to Kubernetes for Horizontal Pod Autoscalers. It is referred to throughout the documentation as the Custom Metrics Server, per Kubernetes terminology.
To enable the Custom Metrics Server:
- Set the option enabling the External Metrics Provider to true in the Deployment of the Datadog Cluster Agent.
- Configure the <DD_APP_KEY> as well as the <DD_API_KEY> in the Deployment of the Datadog Cluster Agent.
- Create a service exposing port 443 and register it as an APIService for External Metrics.
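A sketch of such an APIService registration, assuming a service named datadog-custom-metrics-server in the default namespace (both names are placeholders):

```yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true            # assumption: no CA bundle configured
  service:
    name: datadog-custom-metrics-server  # placeholder service name
    namespace: default                   # placeholder namespace
```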
Refer to the dedicated guide to configure the Custom Metrics Server and get more details about this feature.
The following environment variables are supported:
DD_API_KEY: required. Your Datadog API key.
DD_HOSTNAME: hostname to use for the Datadog Cluster Agent.
DD_CLUSTER_AGENT_CMD_PORT: port for the Datadog Cluster Agent to serve, default is
DD_USE_METADATA_MAPPER: enables the cluster level metadata mapping, default is
DD_COLLECT_KUBERNETES_EVENTS: configures the agent to collect Kubernetes events. Defaults to false. See the Event collection section for more details.
DD_LEADER_ELECTION: activates the leader election. You must set it to true to activate this feature. Default value is false.
DD_LEADER_LEASE_DURATION: used only if the leader election is activated. See the details here. Value in seconds, 60 by default.
DD_CLUSTER_AGENT_AUTH_TOKEN: a token of at least 32 characters that needs to be shared between the Node Agent and the Datadog Cluster Agent.
DD_KUBE_RESOURCES_NAMESPACE: configures the namespace where the Cluster Agent creates the configmaps required for the Leader Election, the Event Collection (optional) and the Horizontal Pod Autoscaling.
DD_KUBERNETES_INFORMERS_RESYNC_PERIOD: frequency in seconds to query the API Server to resync the local cache. The default is 5 minutes.
DD_KUBERNETES_INFORMERS_RESTCLIENT_TIMEOUT: timeout in seconds of the client communicating with the API Server. Default is 60 seconds.
DD_METRICS_PORT: change the port for exposing metrics from the Datadog Cluster Agent. The default is port 5000.
DD_EXTERNAL_METRICS_PROVIDER_BATCH_WINDOW: time in seconds to wait in order to batch the metrics from multiple Autoscalers before processing. Defaults to 10 seconds.
DD_EXTERNAL_METRICS_PROVIDER_MAX_AGE: maximum age in seconds of a datapoint before it is considered invalid to be served. Defaults to 120 seconds.
DD_EXTERNAL_METRICS_AGGREGATOR: aggregator for the Datadog metrics. Applies to all Autoscalers processed. Choose among sum, avg, max, or min.
DD_EXTERNAL_METRICS_PROVIDER_BUCKET_SIZE: size of the window in seconds used to query metrics from Datadog. Defaults to 300 seconds.
DD_EXTERNAL_METRICS_PROVIDER_LOCAL_COPY_REFRESH_RATE: rate at which to resync the local cache of processed metrics with the global store. Useful when there are several replicas of the Cluster Agent.
DD_PROXY_HTTP: an http URL to use as a proxy for http requests.
DD_PROXY_HTTPS: an http URL to use as a proxy for https requests.
DD_PROXY_NO_PROXY: a space-separated list of URLs for which no proxy should be used.
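As an illustration, the proxy variables above could be set in the Cluster Agent Deployment like this (the proxy hostname, port, and exclusion list are placeholders):

```yaml
env:
  - name: DD_PROXY_HTTP
    value: "http://proxy.example.com:3128"   # placeholder proxy URL
  - name: DD_PROXY_HTTPS
    value: "http://proxy.example.com:3128"   # placeholder proxy URL
  - name: DD_PROXY_NO_PROXY
    value: "10.0.0.0/8 192.168.0.0/16"       # placeholder space-separated list
```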
How to build it
The Datadog Cluster Agent is designed to be used in a containerized ecosystem.
Start by creating the binary: run inv -e cluster-agent.build from the datadog-agent repository. Then, from the current folder, run inv -e cluster-agent.image-build.