Refactor ClickHouse monitor implementation (#3498)
Use a separate container in the clickhouse-server Pod (run as a ReplicaSet)
instead of a CronJob.
If we support multiple replicas (HA ClickHouse), we should run the monitor
only for the first replica.

This implementation also brings the following advantages.
* It reduces the overhead of creating and destroying a new Pod every time the monitor executes
* It avoids reading the K8s logs to check the last state of execution
* It reduces the overall number of Pods

Signed-off-by: Yanjun Zhou <zhouya@vmware.com>
yanjunz97 committed Mar 25, 2022
1 parent 10afc0d commit a2c6a1f
Showing 8 changed files with 132 additions and 756 deletions.
122 changes: 29 additions & 93 deletions build/yamls/flow-visibility.yml
@@ -14,41 +14,12 @@ volumeBindingMode: WaitForFirstConsumer
 ---
 apiVersion: v1
 kind: ServiceAccount
-metadata:
-  labels:
-    app: flow-visibility
-  name: clickhouse-monitor
-  namespace: flow-visibility
----
-apiVersion: v1
-kind: ServiceAccount
 metadata:
   name: grafana
   namespace: flow-visibility
 ---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-metadata:
-  labels:
-    app: flow-visibility
-  name: clickhouse-monitor-role
-  namespace: flow-visibility
-rules:
-- apiGroups:
-  - ""
-  resources:
-  - pods
-  verbs:
-  - list
-- apiGroups:
-  - ""
-  resources:
-  - pods/log
-  verbs:
-  - get
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: Role
 metadata:
   labels:
     app: flow-visibility
@@ -66,22 +37,6 @@ rules:
 ---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-metadata:
-  labels:
-    app: flow-visibility
-  name: clickhouse-monitor-role-binding
-  namespace: flow-visibility
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: Role
-  name: clickhouse-monitor-role
-subjects:
-- kind: ServiceAccount
-  name: clickhouse-monitor
-  namespace: flow-visibility
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
 metadata:
   labels:
     app: flow-visibility
@@ -4929,54 +4884,6 @@ spec:
           name: grafana-dashboard-config-gkkgc9d727
         name: grafana-dashboard-config
 ---
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  labels:
-    app: clickhouse-monitor
-  name: clickhouse-monitor
-  namespace: flow-visibility
-spec:
-  failedJobsHistoryLimit: 1
-  jobTemplate:
-    spec:
-      template:
-        metadata:
-          labels:
-            app: clickhouse-monitor
-        spec:
-          containers:
-          - env:
-            - name: CLICKHOUSE_USERNAME
-              valueFrom:
-                secretKeyRef:
-                  key: username
-                  name: clickhouse-secret
-            - name: CLICKHOUSE_PASSWORD
-              valueFrom:
-                secretKeyRef:
-                  key: password
-                  name: clickhouse-secret
-            - name: DB_URL
-              value: tcp://clickhouse-clickhouse.flow-visibility.svc:9000
-            - name: TABLE_NAME
-              value: default.flows
-            - name: MV_NAMES
-              value: default.flows_pod_view default.flows_node_view default.flows_policy_view
-            - name: NAMESPACE
-              valueFrom:
-                fieldRef:
-                  fieldPath: metadata.namespace
-            - name: MONITOR_LABEL
-              value: app=clickhouse-monitor
-            image: projects.registry.vmware.com/antrea/flow-visibility-clickhouse-monitor:latest
-            imagePullPolicy: IfNotPresent
-            name: clickhouse-monitor
-          restartPolicy: OnFailure
-          serviceAccountName: clickhouse-monitor
-  schedule: '* * * * *'
-  successfulJobsHistoryLimit: 1
----
 apiVersion: clickhouse.altinity.com/v1
 kind: ClickHouseInstallation
 metadata:
@@ -4997,6 +4904,7 @@ spec:
   defaults:
     templates:
       podTemplate: pod-template
+      serviceTemplate: service-template
   templates:
     podTemplates:
     - name: pod-template
@@ -5009,6 +4917,26 @@ spec:
             name: clickhouse-configmap-volume
           - mountPath: /var/lib/clickhouse
             name: clickhouse-storage-volume
+        - env:
+          - name: CLICKHOUSE_USERNAME
+            valueFrom:
+              secretKeyRef:
+                key: username
+                name: clickhouse-secret
+          - name: CLICKHOUSE_PASSWORD
+            valueFrom:
+              secretKeyRef:
+                key: password
+                name: clickhouse-secret
+          - name: DB_URL
+            value: tcp://localhost:9000
+          - name: TABLE_NAME
+            value: default.flows
+          - name: MV_NAMES
+            value: default.flows_pod_view default.flows_node_view default.flows_policy_view
+          image: projects.registry.vmware.com/antrea/flow-visibility-clickhouse-monitor:latest
+          imagePullPolicy: IfNotPresent
+          name: clickhouse-monitor
         volumes:
         - configMap:
             name: clickhouse-mounted-configmap-dkbmg82ctg
@@ -5017,3 +4945,11 @@ spec:
             medium: Memory
             sizeLimit: 8Gi
           name: clickhouse-storage-volume
+    serviceTemplates:
+    - name: service-template
+      spec:
+        ports:
+        - name: http
+          port: 8123
+        - name: tcp
+          port: 9000
115 changes: 28 additions & 87 deletions build/yamls/flow-visibility/base/clickhouse.yml
@@ -26,7 +26,16 @@ spec:
   defaults:
     templates:
       podTemplate: pod-template
+      serviceTemplate: service-template
   templates:
+    serviceTemplates:
+      - name: service-template
+        spec:
+          ports:
+            - name: http
+              port: 8123
+            - name: tcp
+              port: 9000
     podTemplates:
       - name: pod-template
         spec:
@@ -38,6 +47,25 @@ spec:
                   mountPath: /docker-entrypoint-initdb.d
                 - name: clickhouse-storage-volume
                   mountPath: /var/lib/clickhouse
+            - name: clickhouse-monitor
+              image: flow-visibility-clickhouse-monitor
+              env:
+                - name: CLICKHOUSE_USERNAME
+                  valueFrom:
+                    secretKeyRef:
+                      name: clickhouse-secret
+                      key: username
+                - name: CLICKHOUSE_PASSWORD
+                  valueFrom:
+                    secretKeyRef:
+                      name: clickhouse-secret
+                      key: password
+                - name: DB_URL
+                  value: "tcp://localhost:9000"
+                - name: TABLE_NAME
+                  value: "default.flows"
+                - name: MV_NAMES
+                  value: "default.flows_pod_view default.flows_node_view default.flows_policy_view"
           volumes:
             - name: clickhouse-configmap-volume
               configMap:
@@ -46,90 +74,3 @@ spec:
               emptyDir:
                 medium: Memory
                 sizeLimit: 8Gi
----
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  labels:
-    app: flow-visibility
-  name: clickhouse-monitor
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: Role
-metadata:
-  labels:
-    app: flow-visibility
-  name: clickhouse-monitor-role
-rules:
-  - apiGroups:
-      - ""
-    resources:
-      - pods
-    verbs:
-      - list
-  - apiGroups:
-      - ""
-    resources:
-      - pods/log
-    verbs:
-      - get
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: RoleBinding
-metadata:
-  labels:
-    app: flow-visibility
-  name: clickhouse-monitor-role-binding
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: Role
-  name: clickhouse-monitor-role
-subjects:
-  - kind: ServiceAccount
-    name: clickhouse-monitor
----
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  labels:
-    app: clickhouse-monitor
-  name: clickhouse-monitor
-spec:
-  schedule: "* * * * *"
-  successfulJobsHistoryLimit: 1
-  failedJobsHistoryLimit: 1
-  jobTemplate:
-    spec:
-      template:
-        metadata:
-          labels:
-            app: clickhouse-monitor
-        spec:
-          serviceAccountName: clickhouse-monitor
-          containers:
-            - name: clickhouse-monitor
-              image: flow-visibility-clickhouse-monitor
-              env:
-                - name: CLICKHOUSE_USERNAME
-                  valueFrom:
-                    secretKeyRef:
-                      name: clickhouse-secret
-                      key: username
-                - name: CLICKHOUSE_PASSWORD
-                  valueFrom:
-                    secretKeyRef:
-                      name: clickhouse-secret
-                      key: password
-                - name: DB_URL
-                  value: "tcp://clickhouse-clickhouse.flow-visibility.svc:9000"
-                - name: TABLE_NAME
-                  value: "default.flows"
-                - name: MV_NAMES
-                  value: "default.flows_pod_view default.flows_node_view default.flows_policy_view"
-                - name: NAMESPACE
-                  valueFrom:
-                    fieldRef:
-                      fieldPath: metadata.namespace
-                - name: MONITOR_LABEL
-                  value: "app=clickhouse-monitor"
-          restartPolicy: OnFailure
17 changes: 3 additions & 14 deletions build/yamls/flow-visibility/patches/dev/imagePullPolicy.yml
@@ -1,14 +1,3 @@
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  labels:
-    app: clickhouse-monitor
-  name: clickhouse-monitor
-spec:
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          containers:
-            - name: clickhouse-monitor
-              imagePullPolicy: IfNotPresent
+- op: add
+  path: /spec/templates/podTemplates/0/spec/containers/1/imagePullPolicy
+  value: IfNotPresent
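The new patch is a JSON 6902 operation list rather than a strategic-merge document, so Kustomize needs an explicit target to know which object it applies to. The excerpt does not show how the patch is wired up. The fragment below is only a sketch of one way to reference it, assuming the `ClickHouseInstallation` is named `clickhouse` (inferred from the Pod name `chi-clickhouse-clickhouse-0-0-0`) and that the patch file is reachable at the relative path shown. Note that index `1` in the patch path addresses the monitor only as long as it remains the second container in the Pod template.

```yaml
# Hypothetical kustomization.yml entry; the target name and the
# relative path are assumptions, not taken from the commit.
patches:
  - path: imagePullPolicy.yml
    target:
      group: clickhouse.altinity.com
      version: v1
      kind: ClickHouseInstallation
      name: clickhouse
```

Older Kustomize releases express the same wiring with the `patchesJson6902` field.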
36 changes: 26 additions & 10 deletions docs/network-flow-visibility.md
@@ -33,7 +33,7 @@
 - [About Grafana and ClickHouse](#about-grafana-and-clickhouse)
 - [Deployment Steps](#deployment-steps-1)
 - [Credentials Configuration](#credentials-configuration)
-- [ClickHouse Performance Configuration](#clickhouse-performance-configuration)
+- [ClickHouse Configuration](#clickhouse-configuration)
 - [Pre-built Dashboards](#pre-built-dashboards)
 - [Flow Records Dashboard](#flow-records-dashboard)
 - [Pod-to-Pod Flows Dashboard](#pod-to-pod-flows-dashboard)
@@ -615,12 +615,12 @@ The expected results will be like:

 ```bash
 NAME READY STATUS RESTARTS AGE
-pod/chi-clickhouse-clickhouse-0-0-0 1/1 Running 0 1m
+pod/chi-clickhouse-clickhouse-0-0-0 2/2 Running 0 1m
 pod/grafana-5c6c5b74f7-x4v5b 1/1 Running 0 1m

 NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
 service/chi-clickhouse-clickhouse-0-0 ClusterIP None <none> 8123/TCP,9000/TCP,9009/TCP 1m
-service/clickhouse-clickhouse LoadBalancer 10.105.198.192 <pending> 8123:30001/TCP,9000:31044/TCP 1m
+service/clickhouse-clickhouse ClusterIP 10.102.124.56 <none> 8123/TCP,9000/TCP 1m
 service/grafana LoadBalancer 10.97.171.150 <pending> 3000:31171/TCP 1m

 NAME READY UP-TO-DATE AVAILABLE AGE
@@ -632,12 +632,6 @@ replicaset.apps/grafana-5c6c5b74f7 1 1 1 1m
 NAME READY AGE
 statefulset.apps/chi-clickhouse-clickhouse-0-0 1/1 1m

-
-NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
-cronjob.batch/clickhouse-monitor * * * * * False 0 30s 1m
-
-NAME COMPLETIONS DURATION AGE
-job.batch/clickhouse-monitor-27434986 1/1 6s 30s
 ```

 Run the following commands to print the IP of the worker Node and the NodePort
@@ -698,7 +692,28 @@ a new manifest:
 make manifest
 ```

-##### ClickHouse Performance Configuration
+##### ClickHouse Configuration
+
+The ClickHouse database can be accessed through the service `clickhouse-clickhouse`.
+The pod exposes HTTP port at 8123 and TCP port at 9000 by default. The ports are
+specified in [clickhouse.yml][clickhouse_manifest_yaml] as `serviceTemplates`.
+To use other ports, please update the following section accordingly.
+
+```yaml
+serviceTemplates:
+  - name: service-template
+    spec:
+      ports:
+        - name: http
+          port: 8123
+        - name: tcp
+          port: 9000
+```
+
+This service is also used by the Flow Aggregator and Grafana. If you update the
+HTTP port, please update `url` in [datasource_provider.yml][grafana_datasouce_provider_yaml].
+If you update the TCP port, please update `jsonData.port` in [datasource_provider.yml][grafana_datasouce_provider_yaml]
+and `databaseURL` in the [Flow Aggregator Configuration](#configuration-1).

 The ClickHouse throughput depends on two factors - the storage size of the ClickHouse
 and the time interval between the batch commits to the ClickHouse. Larger storage
@@ -1005,4 +1020,5 @@ Visualization Network Policy Dashboard">
 [clickhouse_manifest_yaml]: ../build/yamls/flow-visibility/base/clickhouse.yml
 [flow_aggregator_manifest_yaml]: ../build/yamls/flow-aggregator/base/flow-aggregator.yml
 [grafana_manifest_yaml]: ../build/yamls/flow-visibility/base/grafana.yml
+[grafana_datasouce_provider_yaml]: ../build/yamls/flow-visibility/base/provisioning/datasources/datasource_provider.yml
 [flow_visibility_kustomization_yaml]: ../build/yamls/flow-visibility/base/kustomization.yml
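
The documentation added in this commit tells users to edit `serviceTemplates` in order to use other ports. One caveat is worth keeping in mind: a Kubernetes Service port that omits `targetPort` forwards to the same port number on the Pod, while the sidecar monitor introduced here connects to `tcp://localhost:9000` inside the Pod. The fragment below is a sketch, not part of the commit, of exposing different Service ports while ClickHouse keeps listening on its defaults. The port numbers 8124 and 9001 are arbitrary examples, and the sketch assumes the operator passes `targetPort` through to the generated Service unchanged.

```yaml
serviceTemplates:
  - name: service-template
    spec:
      ports:
        - name: http
          port: 8124        # example Service port (assumption)
          targetPort: 8123  # ClickHouse HTTP port inside the Pod
        - name: tcp
          port: 9001        # example Service port (assumption)
          targetPort: 9000  # ClickHouse native TCP port; the monitor's DB_URL still points here
```

With this layout, only clients that go through the Service (Grafana and the Flow Aggregator) would need their port settings updated.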
