
Support ClickHouse deployment with Persistent Volume #3608

Merged
merged 1 commit into from May 6, 2022

Conversation

yanjunz97
Contributor

This commit supports deploying ClickHouse with a Persistent Volume. It provides examples using a Local PV, an NFS PV, or another customized StorageClass.

Signed-off-by: Yanjun Zhou zhouya@vmware.com

@codecov-commenter

codecov-commenter commented Apr 8, 2022

Codecov Report

Merging #3608 (4409342) into main (8f451f7) will increase coverage by 0.50%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3608      +/-   ##
==========================================
+ Coverage   64.44%   64.95%   +0.50%     
==========================================
  Files         278      278              
  Lines       39513    39640     +127     
==========================================
+ Hits        25463    25747     +284     
+ Misses      12068    11910     -158     
- Partials     1982     1983       +1     
Flag             Coverage Δ
e2e-tests        46.37% <ø> (?)
kind-e2e-tests   52.92% <ø> (+0.57%) ⬆️
unit-tests       43.77% <ø> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/agent/flowexporter/utils.go 0.00% <0.00%> (-76.60%) ⬇️
pkg/agent/openflow/pod_connectivity.go 42.26% <0.00%> (-26.06%) ⬇️
.../flowexporter/connections/conntrack_connections.go 72.54% <0.00%> (-4.42%) ⬇️
...agent/flowexporter/connections/deny_connections.go 80.45% <0.00%> (-3.45%) ⬇️
pkg/ovs/openflow/ofctrl_nxfields.go 62.06% <0.00%> (-3.45%) ⬇️
...g/controller/networkpolicy/store/appliedtogroup.go 87.61% <0.00%> (-2.86%) ⬇️
pkg/agent/openflow/framework.go 87.00% <0.00%> (-2.70%) ⬇️
pkg/apiserver/storage/ram/watch.go 90.66% <0.00%> (-2.67%) ⬇️
pkg/controller/ipam/antrea_ipam_controller.go 76.41% <0.00%> (-2.67%) ⬇️
pkg/agent/openflow/client.go 70.70% <0.00%> (-1.48%) ⬇️
... and 19 more

@@ -0,0 +1,28 @@
- op: add
Contributor

Is this change independent from the rest of this PR (PersistentVolume support)? If yes, could it go into a separate PR?

Contributor Author

It is inspired by the PV support. This change allows us to generate a manifest without the ClickHouse monitor. We expect this case when the user has a large enough PV: the monitor would never be triggered, as the throughput bottleneck would be on the Flow Aggregator side rather than on the ClickHouse storage space side. But overall it makes sense to me to put it in a separate PR.

Contributor Author

@yanjunz97 yanjunz97 Apr 8, 2022

This part has been removed from this PR. Hope it is clearer now. Thanks!

Update: the file has been added back to allow a customized storage size, but the optional monitor is still not part of this PR.

@@ -17,7 +17,10 @@ package main
import (
Contributor

same comment as above; are the monitor changes related to the PersistentVolume support?

Contributor Author

Most of the monitor changes are related to the PV support. With a PV, the user may store ClickHouse data alongside other software, in which case the disk usage percentage does not reflect the actual space used by ClickHouse; thus we need some updates to the disk usage checking logic.

The only unrelated change is that I moved the deletePercentage and threshold parameters from hard-coded values to env variables in the yaml, to make customizing them more convenient. I could take it out of this PR.

Contributor Author

Removed some unrelated changes.

Comment on lines 26 to 36
--mode (dev|release) Choose the configuration variant that you need (default is 'dev').
--keep Debug flag which will preserve the generated kustomization.yml.
--volume (ram|pv) Choose the volume provider that you need (default is 'ram').
--storageclass -sc <name> Provide the StorageClass used to dynamically provision the
Persistent Volume for ClickHouse storage.
--local <path> Create the Persistent Volume for ClickHouse with a provided
local path.
--nfs <hostname:path> Create the Persistent Volume for ClickHouse with a provided
NFS server hostname or IP address and the path exported in the
form of hostname:path.
--no-ch-monitor Generate a manifest without the ClickHouse monitor.
Contributor

I feel like instead of adding all these flags we should work on adding helm support for the flow visibility solution right away (see #3578) as it provides a more natural / standardized way of generating the manifests than our custom generate-manifest scripts. Configuring a PV / PVC would be much easier with helm charts.

Contributor Author

@yanjunz97 yanjunz97 Apr 8, 2022

It sounds good to add Helm support for the flow visibility solution. But as we are migrating from the main repo to the child repo theia, the purpose of opening this PR in the main repo is to include PV support for the first round of CQE testing. I wonder if it is possible to have the PV support with Kustomize first; when we finish the migration, we can add Helm support on the child repo side.

I also noticed you mentioned:

> In a future PR, we will look into the release process for the Helm chart. After that, Helm charts could be added for Antrea components (Flow Aggregator, Flow visibility).

Does this mean it will take more time to support Helm charts for the other Antrea components? Maybe we can do the migration from Kustomize to Helm in another PR, on the Theia side?

Contributor

I think we should definitely have a target to provide a Helm chart for flow visibility in 1.7, just like Antonin is doing for Antrea.

I also think this can be done in a dedicated PR, to avoid conflating too many changes into a single PR.

- ReadWriteOnce
resources:
requests:
storage: 8Gi
Contributor

If we do not specify the request, the PVC will be allowed access to the whole volume, if I recall correctly. Since the volume is created specifically for ClickHouse, I think that might be ok.

The point of this comment is simply to avoid specifying the volume size in too many places.

Contributor Author

Actually, Kubernetes does not enforce storage capacity. The storage size on a PV is a required field, but it is only informational. Similarly, the storage size on a PVC is required but does not limit how much space the Pod can actually use; it is only used to match a PV: a PVC can only be bound to a PV whose capacity is equal to or larger than the value requested in the PVC.
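To illustrate the matching behavior, here is a hedged sketch (all names and the hostPath are hypothetical, not from this PR): a PVC requesting 8Gi can bind a PV advertising 10Gi, while neither number is enforced as a usage limit.

```yaml
# Hypothetical PV: the 10Gi capacity is informational, not enforced.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: clickhouse-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: clickhouse-storage
  hostPath:
    path: /data/clickhouse
---
# Hypothetical PVC: it requests 8Gi and can bind the PV above
# because the PV capacity (10Gi) is equal to or larger than 8Gi.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clickhouse-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: clickhouse-storage
  resources:
    requests:
      storage: 8Gi
```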

docs/network-flow-visibility.md (outdated, resolved)
--nfs)
NFSPATH="$2"
shift 2
;;
Contributor

Do you think we can add something to specify the size? I can think of two ways:

  1. we consider the yaml patches as templates, and we replace the size with the actual value before generating the manifests (we should be careful not to do an in-place modification)
  2. (not sure if feasible) we add vars to the patch files where the size is specified, and point these vars to an env variable with the desired size.

Contributor Author

I updated the shell script using the first way. I think it might be easier to do something similar to the second way when we move to Helm, since we could load this value from values.yaml. Will target Helm chart support in release 1.7.
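A minimal sketch of the first approach, under stated assumptions: the `STORAGE_SIZE_VALUE` placeholder token mirrors the one used in the script, but the template file name and its content here are hypothetical. The key point is substituting into a copy so the template itself is never modified in place.

```shell
# Create a stand-in template containing the placeholder token.
cat > chMonitor-template.yml <<'EOF'
value: STORAGE_SIZE_VALUE
EOF

# Substitute the desired size into a copy; the template stays untouched.
SIZE="8Gi"
sed -E "s/STORAGE_SIZE_VALUE/$SIZE/" chMonitor-template.yml > chMonitor.yml

cat chMonitor.yml
```

This avoids the pitfall called out above: `sed -i` on the checked-in patch file would dirty the working tree, while redirecting into a new file keeps the generation step repeatable.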

plugins/flow-visibility/clickhouse-monitor/main.go (outdated, resolved)
plugins/flow-visibility/clickhouse-monitor/main.go (outdated, resolved)
func getDiskUsage(connect *sql.DB, freeSpace *uint64, totalSpace *uint64) {
// Get free space from ClickHouse system table
if err := wait.PollImmediate(queryRetryInterval, queryTimeout, func() (bool, error) {
if err := connect.QueryRow("SELECT free_space, total_space FROM system.disks").Scan(freeSpace, totalSpace); err != nil {
Contributor

If you can query total_space from ClickHouse, do you need to pass the storage size as an env variable?

Contributor Author

This might already be explained by #3608 (comment). The total_space here is the size of the disk partition where ClickHouse stores its data, and that partition might be shared with other software; thus we need a storage size from the user to serve as the desired space for ClickHouse.
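A minimal sketch of the adjusted check this discussion describes. The function name, the clamp to the partition size, and the exact threshold semantics are assumptions for illustration, not the PR's actual code: usage is measured against the user-provided allocation rather than the whole (possibly shared) partition.

```go
package main

import "fmt"

// shouldTriggerDeletion reports whether the monitor should delete old rows.
// freeSpace and totalSpace come from ClickHouse's system.disks table;
// allocatedSpace is the user-provided storage size for ClickHouse.
// Because the partition may be shared with other software, usage is compared
// against the smaller of the partition size and the allocated size.
func shouldTriggerDeletion(freeSpace, totalSpace, allocatedSpace uint64, threshold float64) bool {
	usedSpace := totalSpace - freeSpace
	limit := allocatedSpace
	if totalSpace < limit {
		limit = totalSpace
	}
	return float64(usedSpace) >= threshold*float64(limit)
}

func main() {
	// 8 GB used out of a 10 GB partition, 8 GB allocated to ClickHouse,
	// 50% threshold: deletion should be triggered.
	fmt.Println(shouldTriggerDeletion(2_000_000_000, 10_000_000_000, 8_000_000_000, 0.5))
}
```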

@yanjunz97 yanjunz97 added this to the Antrea v1.7 release milestone Apr 18, 2022
name: clickhouse-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
Contributor

After deleting our flow visibility deployment, PV users may want to retain the flow data on their disk/volume. Could we achieve that by having an option (or adding instructions in the docs) to change the reclaimPolicy to "Retain"?

Contributor Author

Thanks Yongming for pointing this out! I investigated the reclaimPolicy a bit more and noticed that Delete is only supported by a few dynamically provisioned PV types, which do not include our two examples. Thus, even if I specify Delete here, the reclaimPolicy of the deployed PVs is still Retain. I updated the manifest and docs to make this clear.

docs/network-flow-visibility.md (outdated, resolved)
plugins/flow-visibility/clickhouse-monitor/main.go (outdated, resolved)
@yanjunz97 yanjunz97 force-pushed the clickhouse-pv branch 2 times, most recently from 0928b4e to 1396fd4 Compare April 25, 2022 21:51
@yanjunz97
Contributor Author

Hi @antoninbas Could you take another look at this PR?

@@ -200,3 +255,37 @@ func getDeleteRowNum(connect *sql.DB) (uint64, error) {
deleteRowNum = uint64(float64(count) * deletePercentage)
return deleteRowNum, nil
}

// Parse human readable storage size to number in bytes
func parseSize(sizeString string) (uint64, error) {
Contributor

I am curious: is there no existing function to do that in the K8s libraries?

Contributor Author

Thanks Antonin for reviewing! I took a deeper look into the K8s libraries and found an API in the resource package. Replaced the function.
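For reference, a simplified stdlib-only sketch of what a hand-rolled parser like the one being replaced might look like (the suffix table and error handling here are illustrative, not the PR's code). The replacement mentioned above would instead call `resource.ParseQuantity` from `k8s.io/apimachinery/pkg/api/resource`, whose `Quantity.Value()` returns the size in bytes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts a human-readable storage size such as "8Gi" or "500Mi"
// into a number of bytes, using binary (power-of-two) suffixes.
func parseSize(sizeString string) (uint64, error) {
	suffixes := []struct {
		suffix     string
		multiplier uint64
	}{
		{"Ti", 1 << 40},
		{"Gi", 1 << 30},
		{"Mi", 1 << 20},
		{"Ki", 1 << 10},
		{"", 1}, // bare number of bytes; "" matches last
	}
	for _, s := range suffixes {
		if strings.HasSuffix(sizeString, s.suffix) {
			num := strings.TrimSuffix(sizeString, s.suffix)
			n, err := strconv.ParseUint(num, 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %w", sizeString, err)
			}
			return n * s.multiplier, nil
		}
	}
	return 0, fmt.Errorf("invalid size %q", sizeString)
}

func main() {
	n, _ := parseSize("8Gi")
	fmt.Println(n)
}
```

Using the K8s API avoids maintaining this suffix table and also handles decimal suffixes ("8G") and fractional quantities.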

docs/network-flow-visibility.md (outdated, resolved) ×6
hack/generate-manifest-flow-visibility.sh (outdated, resolved) ×2
plugins/flow-visibility/clickhouse-monitor/main.go (outdated, resolved)
Comment on lines +883 to +922
In both examples, you can set `.spec.capacity.storage` in the PersistentVolume
to your storage size. This value is for informational purposes only, as K8s does
not enforce the capacity of PVs. If you want to limit the storage usage, you
need to rely on your storage system to enforce that. For example, you can create
a Local PV on a partition with limited size. We recommend using dedicated
storage space for ClickHouse if you are going to run the Flow Collector in
production.

As these examples do not use any dynamic provisioner, the reclaim policy
for the PVs is `Retain` by default. After stopping the Grafana Flow Collector,
if you no longer need the data for future use, you may need to manually clean
up the data on the local disk or NFS server.
Contributor

I don't think that these 2 paragraphs should be indented?

Contributor Author

It seems the ordered-list numbering for the steps would be broken if these two paragraphs were unindented.

Contributor

I see, that's my bad

a specific Node. Refer to [createLocalPv.yml][local_pv_yaml] to create the
PV. Please replace `LOCAL_PATH` with the path to store the ClickHouse data
and label the Node used to store the ClickHouse data with
`clickhouse/instance=data`.
Contributor

I think this label was created by you and is not a standard ClickHouse label? If that is indeed the case, I think you should prefix the label name with antrea.io, like we do for other Antrea-specific labels.

Maybe the label should not have a value either, something like `antrea.io/clickhouse-data-node=`. I am not sure if you plan to use the same label key for anything else.

Contributor Author

Yes, this label is only used for the Local PV node affinity. Updated.

antoninbas
antoninbas previously approved these changes May 4, 2022
@antoninbas
Contributor

@yanjunz97 I approved but looks like you have a merge conflict

@yanjunz97
Contributor Author

> @yanjunz97 I approved but looks like you have a merge conflict

Thanks! I have rebased and resolved the conflicts.

@yanjunz97
Contributor Author

/test-e2e
/test-conformance
/test-networkpolicy

Comment on lines 185 to 190
cp $KUSTOMIZATION_DIR/patches/chmonitor/*.yml .

# patch the clickhouse monitor with desired storage size
$KUSTOMIZE edit add base base
sed -i.bak -E "s/STORAGE_SIZE_VALUE/$SIZE/" chMonitor.yml
$KUSTOMIZE edit add patch --path chMonitor.yml --group clickhouse.altinity.com --version v1 --kind ClickHouseInstallation --name clickhouse
Contributor

@wsquan171 could you look at this? Do we need the monitor for e2e tests? That doesn't seem right to me.

Contributor Author

Previously we did not separate the monitor from the ClickHouse deployment, so it is included by default in the e2e tests; thus I kept it in this update. But I'm glad to remove it if @wsquan171 thinks it is safe to do so.

Contributor

We don't really need or test the monitor now. As Yanjun mentioned, it was included before since there was no harm in having it, and it was a hassle to remove it just for e2e. Since the manifest generation now takes the monitor out by default, I don't see the need to introduce more work just to add it back. We'd also want to mention that the monitor is absent in the help message printed in e2e mode.

Contributor Author

Thanks! Removed the monitor-related contents for the e2e mode.

Signed-off-by: Yanjun Zhou <zhouya@vmware.com>
@yanjunz97
Contributor Author

/test-e2e
/test-conformance
/test-networkpolicy
/test-integration

@yanjunz97
Contributor Author

/test-e2e

@yanjunz97
Contributor Author

Hi @antoninbas, this PR can be merged if there are no other comments.

@antoninbas antoninbas merged commit 92dded2 into antrea-io:main May 6, 2022