Collect logs and store them in ClickHouse using Vector.
This installs Vector as a StatefulSet (the aggregator) and as a DaemonSet (the agents) to collect logs from each node.
helm repo add vector https://helm.vector.dev
CREATE DATABASE vector;
CREATE TABLE vector.vector_logs
(
`file` String,
`timestamp` DateTime64(3),
`kubernetes_container_id` LowCardinality(String),
`kubernetes_container_image` LowCardinality(String),
`kubernetes_namespace_labels` Map(LowCardinality(String), String),
`kubernetes_node_labels` Map(LowCardinality(String), String),
`kubernetes_container_name` LowCardinality(String),
`kubernetes_pod_annotations` Map(LowCardinality(String), String),
`kubernetes_pod_ip` IPv4,
`kubernetes_pod_ips` Array(IPv4),
`kubernetes_pod_labels` Map(LowCardinality(String), String),
`kubernetes_pod_name` LowCardinality(String),
`kubernetes_pod_namespace` LowCardinality(String),
`kubernetes_pod_node_name` LowCardinality(String),
`kubernetes_pod_owner` LowCardinality(String),
`kubernetes_pod_uid` LowCardinality(String),
`message` String,
`source_type` LowCardinality(String),
`stream` Enum('stdout', 'stderr')
)
ENGINE = MergeTree
ORDER BY (`kubernetes_container_name`, timestamp);
Remember to adapt your ORDER BY key to suit your access patterns.
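As a rough illustration of why the key matters: the query below filters first on kubernetes_container_name and then on timestamp, matching the order of the ORDER BY key above, so ClickHouse can use the primary index to skip unrelated data. The container name 'my-app' is a hypothetical placeholder, not something defined in this guide. If your queries mostly filter on another column (for example kubernetes_pod_namespace), consider leading the key with that column instead.

```sql
-- Sketch: most recent log lines for a single container over the last hour.
-- 'my-app' is a hypothetical container name; substitute one from your cluster.
SELECT
    timestamp,
    kubernetes_pod_name,
    stream,
    message
FROM vector.vector_logs
WHERE kubernetes_container_name = 'my-app'   -- leading ORDER BY column
  AND timestamp >= now() - INTERVAL 1 HOUR   -- second ORDER BY column
ORDER BY timestamp DESC
LIMIT 20;
```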
Download the agent and aggregator values files for the Helm chart.
wget https://raw.githubusercontent.com/ClickHouse/examples/main/observability/logs/kubernetes/vector_to_vector/aggregator.yaml
wget https://raw.githubusercontent.com/ClickHouse/examples/main/observability/logs/kubernetes/vector_to_vector/agent.yaml
The aggregator.yaml provides a full sample aggregator configuration, requiring only minor changes for most cases.
To deploy an aggregator, we make a few key configuration changes to the chart's values.yaml:
- Set the role to "Aggregator":
  role: "Aggregator"
- Modify the customConfig key to use vector as our source. This Vector-specific protocol allows agent instances to forward logs to the aggregator over port 6000. Note also our remap transform, which uses VRL to ensure column names use _ as the delimiter rather than a dot (.).
  customConfig:
    data_dir: /vector-data-dir
    api:
      enabled: true
      address: 127.0.0.1:8686
      playground: false
    sources:
      vector:
        address: 0.0.0.0:6000
        type: vector
        version: "2"
    transforms:
      dots_to_underscores:
        type: remap
        inputs: [vector]
        source: |
          .kubernetes_namespace_labels = .kubernetes.namespace_labels
          .kubernetes_node_labels = .kubernetes.node_labels
          .kubernetes_pod_annotations = .kubernetes.pod_annotations
          .kubernetes_pod_labels = .kubernetes.pod_labels
          .kubernetes_container_image = .kubernetes.container_image
          .kubernetes_container_name = .kubernetes.container_name
          .kubernetes_pod_ip = .kubernetes.pod_ip
          .kubernetes_pod_ips = .kubernetes.pod_ips
          .kubernetes_pod_name = .kubernetes.pod_name
          .kubernetes_pod_namespace = .kubernetes.pod_namespace
          .kubernetes_pod_node_name = .kubernetes.pod_node_name
          .kubernetes_pod_owner = .kubernetes.pod_owner
          .kubernetes_pod_uid = .kubernetes.pod_uid
          del(.kubernetes)
Important: Under the customConfig key, configure the ClickHouse sink. The endpoint must include a protocol prefix (for example https://), and the batch settings below encourage larger batch sizes. Also tune the resources to fit your throughput.
customConfig:
  sinks:
    clickhouse:
      type: clickhouse
      inputs: [dots_to_underscores]
      database: vector
      endpoint: "https://<host>:8443"
      table: vector_logs
      compression: gzip
      auth:
        password: <password>
        strategy: basic
        user: <username>
      batch:
        timeout_secs: 10
        max_events: 10000
        max_bytes: 10485760
      skip_unknown_fields: true
Install the aggregator, which the chart deploys as a StatefulSet.
helm install vector-aggregator vector/vector \
--namespace vector \
--create-namespace \
--values aggregator.yaml
kubectl get pods -n=vector
NAME                  READY   STATUS    RESTARTS   AGE
vector-aggregator-0   1/1     Running   0          39s
The agent.yaml provides a full sample agent configuration.
Vector agents forward logs through the vector sink to the aggregator instance, which receives them on the equivalent vector source. Our key configuration:
- Set the role to "Agent":
  role: "Agent"
- Use the customConfig key to configure the Kubernetes logs input and the vector sink. This represents the actual Vector configuration file.
  customConfig:
    data_dir: /vector-data-dir
    api:
      enabled: true
      address: 127.0.0.1:8686
      playground: false
    sources:
      kubernetes_logs:
        type: kubernetes_logs
    sinks:
      vector:
        type: vector
        inputs: [kubernetes_logs]
        address: vector-aggregator:6000
Install the agent as a DaemonSet. Ensure you modify the resources to fit your environment.
helm install vector-agent vector/vector \
--namespace vector \
--create-namespace \
--values agent.yaml
kubectl get pods -n=vector
NAME                 READY   STATUS    RESTARTS   AGE
vector-agent-2nxgv   1/1     Running   0          75s
vector-agent-4m2vj   1/1     Running   0          75s
vector-agent-6jdg4   1/1     Running   0          75s
vector-agent-74cbd   1/1     Running   0          75s
Confirm that logs are arriving in ClickHouse:
SELECT count()
FROM vector.vector_logs
┌─count()─┐
│ 4695341 │
└─────────┘
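Beyond a raw row count, a per-namespace breakdown can help confirm that every part of the cluster is being collected. The queries below are a minimal sketch using only columns from the vector.vector_logs table defined earlier. The 'app' label key in the second query is an assumption; substitute a label your pods actually carry.

```sql
-- Sketch: which namespaces produce the most log lines, split by output stream.
SELECT
    kubernetes_pod_namespace,
    stream,
    count() AS lines
FROM vector.vector_logs
GROUP BY kubernetes_pod_namespace, stream
ORDER BY lines DESC
LIMIT 10;

-- Sketch: log volume per value of a pod label stored in the Map column.
-- 'app' is a hypothetical label key; pods without it yield an empty string.
SELECT
    kubernetes_pod_labels['app'] AS app,
    count() AS lines
FROM vector.vector_logs
GROUP BY app
ORDER BY lines DESC
LIMIT 10;
```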