
Add Helm support for Theia with ClickHouse PV #15

Merged
merged 1 commit into antrea-io:main on May 25, 2022

Conversation

yanjunz97
Contributor

This PR adds a Helm chart for Theia and uses Helm templates instead of Kustomize to generate the Theia manifest.

Signed-off-by: Yanjun Zhou zhouya@vmware.com

build/charts/theia/values.yaml
build/charts/theia/values.yaml
Comment on lines 11 to 29
# -- HTTP port number for the ClickHouse service.
httpPort: 8123
# -- TCP port number for the ClickHouse service.
tcpPort: 9000
Contributor

does it sometimes make sense to disable one of the ports to only use HTTP or TCP?

Contributor Author

I think the Flow Aggregator uses TCP to connect to ClickHouse, while Grafana uses HTTP. Thus there might not be a use case for disabling one of them.

build/charts/theia/values.yaml
@yanjunz97 yanjunz97 marked this pull request as draft May 5, 2022 22:27
@yanjunz97
Contributor Author

This PR is converted to a draft to add PV support and some more customizations.

@yanjunz97 yanjunz97 force-pushed the helm-support branch 7 times, most recently from 9e38a79 to fa14676 Compare May 10, 2022 18:17
@yanjunz97
Contributor Author

Hi @antoninbas, I have added the ClickHouse PV support and some more customizations in this PR. I also have several topics regarding Helm support in Theia to bring up. I would like to hear your opinions on these topics.

  1. Namespace
    It seems Helm has a convention of using --namespace and --create-namespace to deploy the chart in a specific Namespace. In the flow visibility case, this works when I use helm install. But when I try to generate the manifest with helm template, the creation of the Namespace is not included in the generated YAML file even if I use --create-namespace. The issue --create-namespace is ignored for helm template helm/helm#9813 shows the Helm team does not plan to support --create-namespace for helm template.
    I came up with 2 solutions:
  • Use the YAML file without the creation of the flow-visibility Namespace, and add instructions in the docs to create the flow-visibility Namespace before applying flow-visibility.yml. But I'm worried this may be a problem, as it introduces an inconsistency compared to our previous versions.
  • Add a createNamespace value to values.yaml and set it to true when running the manifest generation script. This can produce a manifest with the Namespace, as before. But it seems strange to expose this variable as a value to users.
    I'm not sure which solution to choose, or whether there is a better one.
  2. ClickHouse Operator
    The ClickHouse Operator should be installed before applying flow-visibility.yml, as it defines the ClickHouseInstallation CRD. I see there is a discussion about whether to put it into the crds dir in the issue [Helm chart] Should CRDs be placed in the special crds/ directory? antrea#3665. It seems there is no final conclusion on this topic, and the ClickHouse Operator is slightly different from the CRDs in Antrea:
  • The ClickHouse Operator is provided by a third party instead of being a CRD developed by us.
  • It seems we cannot treat the ClickHouse Operator as a regular templated resource, as it must be installed before we try to deploy the ClickHouse server.
    I'm not sure whether it is better to put the ClickHouse Operator into the crds dir.
  3. Release and documentation
    It is mentioned in Add an Helm chart for Antrea antrea#3578 that we will have a future PR about the Helm chart release. And I see Helm provides chart-releaser-action at https://github.com/helm/chart-releaser-action. Do we plan to use it, or just let users clone our repo first for now?

    There are a lot of instructions on how to modify the YAML file in our docs. IMO these should be removed when we introduce the Helm chart. But before we have the chart in a release, it seems we still need to keep them? Also, after we have a chart in a release, I'm not sure whether it is redundant to add instructions on how to modify the Helm values when we introduce each feature in the flow visibility docs, as we already have an auto-generated Helm doc.
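The createNamespace idea from point 1 above could be sketched as a chart template guarded by a value. This is only an illustration; the file and value names are assumptions, not taken from this PR:

```yaml
# values.yaml (hypothetical)
# createNamespace -- Only meant to be set by the manifest generation
# script; users deploying with `helm install` should rely on
# --namespace/--create-namespace instead.
createNamespace: false

# templates/namespace.yaml (hypothetical)
{{- if .Values.createNamespace }}
apiVersion: v1
kind: Namespace
metadata:
  name: flow-visibility
{{- end }}
```

With this, `helm template --set createNamespace=true` would emit the Namespace into the generated manifest, while a plain `helm install` would leave it out.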

also cc @salv-orlando @dreamtalen @heanlan @wsquan171

@yanjunz97 yanjunz97 marked this pull request as ready for review May 10, 2022 18:25
@antoninbas
Contributor

@yanjunz97

  1. I know that the Nginx ingress controller uses a kustomize step after rendering the manifest with Helm to inject the Namespace resource: https://github.com/kubernetes/ingress-nginx/blob/main/hack/manifest-templates/common/kustomization.yaml. You could consider doing the same thing. Note that in the case of flow visibility, I think we should encourage users to deploy with Helm directly, instead of applying a manifest with kubectl apply -f. We should still provide a reference YAML manifest for users who don't want to use Helm, though.
  2. If you want to include the ClickHouse operator in the Theia Helm chart, you should really put the ClickHouseInstallation under the crds/ directory. Your use case is very different from the Antrea one, given that you actually need to create a CR for that CRD, as part of the Helm deployment. So you should follow Helm's best practices, and ignore the discussion in [Helm chart] Should CRDs be placed in the special crds/ directory? antrea#3665. BTW, if you package everything as one Helm chart, how do you plan to generate 2 distinct manifests for "manual" installation? If you have a single manifest with both the CRD and the corresponding CR, I think using kubectl apply may fail, as the CR may be "created" before the CRD has been registered.
  3. We do plan to release the Antrea Helm chart properly (and not require users to clone the repo), but I am trying to figure out [Helm chart] Should CRDs be placed in the special crds/ directory? antrea#3665 first. We will have a chart repo for all the Antrea charts, including Theia. If Helm becomes your primary installation mechanism for Theia, I think the documentation should reflect that, with instructions on how to modify Helm values to customize the installation.
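On point 2, Helm's special handling of the crds/ directory means the chart layout would look roughly like this (file names are assumptions for illustration). Contents of crds/ are installed by `helm install` before any templates render, but `helm template` skips them unless --include-crds is passed:

```
build/charts/theia/
├── Chart.yaml
├── values.yaml
├── crds/
│   └── clickhouse-operator-install-bundle.yaml   # applied first by `helm install`
└── templates/
    └── clickhouse.yaml   # renders the ClickHouseInstallation CR
```

This ordering is what avoids the "CR created before its CRD is registered" failure mode described above.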

@yanjunz97 yanjunz97 marked this pull request as draft May 10, 2022 19:10
@yanjunz97 yanjunz97 force-pushed the helm-support branch 2 times, most recently from 3829d9c to a6bdc16 Compare May 11, 2022 01:14
Contributor

@salv-orlando salv-orlando left a comment

There seem to be some changes not related to Helm charts - can we have distinct PRs for those?

Makefile
VERSION
@@ -1 +1 @@
v1.7.0-dev
v0.1.0-dev
Contributor

these changes make a lot of sense, but I'd create separate PRs for those

Contributor Author

Thanks! I removed it from this PR.

build/charts/theia/README.md
build/charts/theia/README.md
build/charts/theia/README.md
build/charts/theia/README.md

CREATE MATERIALIZED VIEW flows_pod_view
CREATE MATERIALIZED VIEW IF NOT EXISTS flows_pod_view
Contributor

This also seems like a change unrelated to the Helm charts.

Contributor Author

This is related to the ClickHouse PV support. PV support is also added in this PR because:

  • There were only a few customization variables before we added PV support
  • The PV support contents have already been reviewed on the Antrea side, so adding PV to this PR mainly requires review of the Helm-related contents.

docs/network-flow-visibility.md
docs/network-flow-visibility.md
hack/generate-manifest.sh
@yanjunz97 yanjunz97 force-pushed the helm-support branch 2 times, most recently from b95f5d2 to 4cf09f8 Compare May 11, 2022 18:35
@yanjunz97 yanjunz97 marked this pull request as ready for review May 11, 2022 18:35
@yanjunz97 yanjunz97 force-pushed the helm-support branch 2 times, most recently from c27684b to abd0628 Compare May 11, 2022 19:15
@yanjunz97
Contributor Author

Thanks @antoninbas! This PR is updated and ready for review now. Here is an update on the topics we discussed above.

  1. I have updated the scripts to use Kustomize to inject the Namespace.
  2. For the ClickHouse Operator, I put it into the crds dir. It seems that if we do not add the --include-crds option when running helm template, the contents of the crds dir will not be included in the generated YAML file. Thus we still have two distinct YAML files: one in the crds dir, and the flow-visibility.yml generated by the script. Users can still deploy the flow visibility contents manually the same way as before.
  3. The instructions on how to modify the YAML file will be replaced by Helm instructions in a future PR when we release the Helm chart.

@yanjunz97
Contributor Author

  2. For the ClickHouse Operator, I put it into the crds dir. It seems that if we do not add the --include-crds option when running helm template, the contents of the crds dir will not be included in the generated YAML file. Thus we still have two distinct YAML files: one in the crds dir, and the flow-visibility.yml generated by the script. Users can still deploy the flow visibility contents manually the same way as before.

I kind of think that moving the ClickHouse Operator install bundle into the crds sub dir is a bit hacky. It's not technically just a CRD yml, and if in the future we need to introduce CRDs for Theia, we'll eventually have to use the --include-crds switch. Is the main goal of putting it under crds to have a single helm install command?

Thanks @wsquan171 for pointing this out. Yes, by putting it under crds, we can support a single helm install command for the whole product. As stated in the Helm docs, I think the purpose of the crds dir is mainly to install CRD declarations before the resource is used, which is pretty similar to our case IMO. And I'm worried it would be confusing if users had to use kubectl apply and helm install at the same time to deploy Theia.

But I also understand your concern about what happens if we add our own CRDs. I think those CRDs would be under conditions similar to the ones mentioned in the Antrea issue antrea-io/antrea#3665. Those CRDs are currently kept as templates in the Antrea chart.

Basically there are two strategies to treat the ClickHouse Operator:

  • Remove the ClickHouse Operator from the Theia chart and state it as a prerequisite of Theia.
  • Keep it as is, and revisit the topic when we have our own CRDs and there is an update on the placement of CRDs in the Antrea chart.

I'd also like to have @antoninbas's opinion on this.

@@ -230,6 +258,8 @@ Collector in production.

#### ClickHouse Configuration

##### Service Customization
Contributor

A general comment on all of the original configuration instructions: if we keep them for users who don't want to use Helm, can we state it explicitly in the doc, with something like: for all of the configurations, option 1 - with Helm, edit values.yaml; option 2 - without Helm, edit flow-visibility.yml directly.

Otherwise it seems a bit confusing to me if we recommend using Helm to template the YAML, while at the same time recommending a series of configurations made directly on flow-visibility.yml.

Contributor Author

I made an update to the document; hope it is clearer now.

Makefile
build/charts/theia/templates/grafana/deployment.yaml
build/charts/theia/README.md
@yanjunz97 yanjunz97 force-pushed the helm-support branch 8 times, most recently from 79d3753 to 3074158 Compare May 19, 2022 21:22
@yanjunz97 yanjunz97 changed the title Add Helm support for Theia Add Helm support for Theia with ClickHouse PV May 19, 2022
@yanjunz97
Contributor Author

Hi @antoninbas Could you take another look at this PR?

build/charts/theia/values.yaml
build/charts/theia/values.yaml
build/charts/theia/values.yaml
# one of these unit suffixes SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER,
# YEAR.
ttl: 1 HOUR
persistentVolume:
Contributor

I think this whole section can be a bit confusing, because it conflates some things, like PV type & storage class provisioner, and doesn't clearly distinguish between manual volume provisioning and dynamic volume provisioning.

It seems that there is also no option for users to provide a custom PersistentVolume (if they don't want to use dynamic provisioning).

storage:
  size: "8Gi"
  createPersistentVolume:
    # can be set to "Local" or "NFS"
    type: ""
    local:
      path: "/data"
    nfs:
      host: ""
      path: ""
  # ignored if persistentVolume.type is non-empty
  persistentVolumeClaimSpec: {}
    # storageClassName: ""
    # volumeName: 

By default, memory storage is used.

To use a local volume created by us, one can set:

storage.createPersistentVolume.type = "Local"

To use a custom PersistentVolume, one can set:

storage.persistentVolumeClaimSpec:
  storageClassName: ""
  volumeName: "<my-pv>"

To dynamically provision a PersistentVolume, one can set:

storage.persistentVolumeClaimSpec:
  storageClassName: "<my-storage-class>"

Let me know what you think of that solution

Contributor Author

@yanjunz97 yanjunz97 May 20, 2022

I think PersistentVolumeClaim does not accept a PV name as a specification. As introduced in the K8s doc, I think it means:

  • If storageClassName is specified, the PVC will look for a PV in the cluster with that class
  • If storageClassName is set to "", the PVC will look for a PV with no class

From my experience, if users don't want to use dynamic provisioning, they can set storageClassName to "" or create a StorageClass with kubernetes.io/no-provisioner as the provisioner, and make sure the corresponding PV is created manually.

We can still adopt the solution, with storage.persistentVolumeClaimSpec only expecting storageClassName. Memory storage will be used when storage.createPersistentVolume.type is "" and storage.persistentVolumeClaimSpec is empty. But I'm not sure whether this change is good enough, as it might still not clearly distinguish between manual volume provisioning and dynamic volume provisioning.

Contributor

There are 3 cases for storageClassName: set, omitted, and set to empty.
I do think it is possible to manually bind to a PV with the volumeName field: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reserving-a-persistentvolume
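The reservation mechanism linked above can be sketched as a PVC like the following (the claim name, PV name, and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clickhouse-pvc
spec:
  # An empty storageClassName disables dynamic provisioning;
  # volumeName binds the claim to a specific, pre-created PV.
  storageClassName: ""
  volumeName: my-pv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
```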

Contributor Author

I missed this. Thanks! The storage section is now refactored.

build/charts/theia/values.yaml
build/charts/theia/values.yaml
build/charts/theia/values.yaml
build/charts/theia/values.yaml
build/charts/theia/values.yaml
build/charts/theia/values.yaml
Contributor

@salv-orlando salv-orlando left a comment

Code LGTM.

Due to the operations done in this PR on Grafana's ClickHouse queries - mostly replacing the namespace name with a variable - I suggest merging PR #12, and then rebasing this one.

displayName: Theia
home: https://antrea.io/
version: 0.1.0-dev
appVersion: 0.1.0-dev
Contributor

I think we need to update these before we release?

Contributor Author

Yes. I think we will need to update these in the release branch.

"rawQuery": "SELECT SUM(octetDeltaCount), (sourcePodName, destinationIP) AS pair\nFROM default.flows_pod_view\nWHERE flowEndSeconds >= toDateTime(1642534343) AND flowEndSeconds <= toDateTime(1642536143)\nAND flowType == 3\nAND sourcePodNamespace NOT IN ('kube-system', 'flow-aggregator', 'flow-visibility')\nGROUP BY pair",
"rawSql": "select SUM(octetDeltaCount) as bytes, sourcePodName as source, destinationIP as destination\nFrom flows_pod_view\nWHERE flowType == 3\nAND sourcePodNamespace NOT IN ('kube-system', 'flow-visibility', 'flow-aggregator')\nAND $__timeFilter(flowEndSeconds)\nGROUP BY source, destination\nHAVING bytes != 0",
"rawQuery": "SELECT SUM(octetDeltaCount), (sourcePodName, destinationIP) AS pair\nFROM default.flows_pod_view\nWHERE flowEndSeconds >= toDateTime(1642534343) AND flowEndSeconds <= toDateTime(1642536143)\nAND flowType == 3\nAND sourcePodNamespace NOT IN ('kube-system', 'flow-aggregator', '{{ .Release.Namespace }}')\nGROUP BY pair",
"rawSql": "select SUM(octetDeltaCount) as bytes, sourcePodName as source, destinationIP as destination\nFrom flows_pod_view\nWHERE flowType == 3\nAND sourcePodNamespace NOT IN ('kube-system', '{{ .Release.Namespace }}', 'flow-aggregator')\nAND $__timeFilter(flowEndSeconds)\nGROUP BY source, destination\nHAVING bytes != 0",
Contributor

As PR #12 is making some changes to query, we might want to merge that one, rebase this PR, and then make sure {{ .Release.Namespace }} is used in every query in place of flow-visibility

| clickhouse.storage.createPersistentVolume.type | string | `""` | Type of PersistentVolume. Can be set to "Local" or "NFS". Please set this value to use a PersistentVolume created by Theia. |
| clickhouse.storage.persistentVolumeClaimSpec | string | `nil` | Specification for PersistentVolumeClaim. This is ignored if createPersistentVolume.type is non-empty. To use a custom PersistentVolume, please set storageClassName: "" volumeName: "<my-pv>" To dynamically provision a PersistentVolume, please set storageClassName: "<my-storage-class>" Memory storage is used if both createPersistentVolume.type and persistentVolumeClaimSpec are empty. |
| clickhouse.storage.size | string | `"8Gi"` | ClickHouse storage size. Can be a plain integer or a fixed-point number using one of these quantity suffixes: E, P, T, G, M, K, or the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. |
| clickhouse.ttl | string | `"1 HOUR"` | Time to live for data in ClickHouse. Can be a plain integer with one of these unit suffixes: SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. |
Contributor

Should we support small intervals, such as seconds? It seems hard to do anything if the TTL is set at the second level.

Contributor Author

These are units supported by ClickHouse. I think keeping them may be useful for users who prefer small units, like 3600 SECOND instead of 1 HOUR.
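For context, these unit suffixes map directly onto ClickHouse INTERVAL syntax, so `3600 SECOND` and `1 HOUR` produce equivalent TTL clauses. The table and column names below are assumptions for illustration:

```sql
-- equivalent TTL expressions in ClickHouse
ALTER TABLE flows MODIFY TTL timeInserted + INTERVAL 3600 SECOND;
ALTER TABLE flows MODIFY TTL timeInserted + INTERVAL 1 HOUR;
```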

@@ -14,6 +14,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.

{{- $ttl := split " " .Values.clickhouse.ttl }}
Contributor

For DAY, WEEK, MONTH, QUARTER, YEAR, we do not have any mapping here?

Contributor Author

This map is only used to set merge_with_ttl_timeout. That value only needs to be set when the TTL is below the merge_with_ttl_timeout default of 14400 seconds. The TTL itself does not need the map, as Interval units are supported by ClickHouse.
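The logic described here can be sketched in Python. The unit map and the 14400-second default mirror the description above; this is an illustration, not the chart's actual template code:

```python
# Decide the merge_with_ttl_timeout value from a "<number> <UNIT>" TTL string.
# Only the small units need a seconds mapping: anything DAY or larger always
# exceeds the 14400s default, so the default is kept unchanged.
UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600}

def merge_with_ttl_timeout(ttl: str, default: int = 14400) -> int:
    value, unit = ttl.split(" ")
    if unit not in UNIT_SECONDS:  # DAY, WEEK, MONTH, QUARTER, YEAR
        return default
    seconds = int(value) * UNIT_SECONDS[unit]
    return min(seconds, default)

print(merge_with_ttl_timeout("1 HOUR"))  # 3600
print(merge_with_ttl_timeout("1 DAY"))   # 14400 (default kept)
```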

@yanjunz97
Contributor Author

Code LGTM.

Due to the operations done in this PR on Grafana's ClickHouse queries - mostly replacing the namespace name with a variable - I suggest merging PR #12, and then rebasing this one.

Thanks @salv-orlando! This PR is rebased now. Would you like to take another look?

Contributor

@salv-orlando salv-orlando left a comment

PR LGTM!

Contributor

@antoninbas antoninbas left a comment

some minor comments, otherwise LGTM

$KUSTOMIZE edit set image clickhouse-monitor=projects.registry.vmware.com/antrea/theia-clickhouse-monitor:latest
$KUSTOMIZE edit add patch --path imagePullPolicy.yml --group clickhouse.altinity.com --version v1 --kind ClickHouseInstallation --name clickhouse
fi
HELM_TMP_DIR=$(mktemp -d $THIS_DIR/../build/yamls/chart-values.XXXXXXXX)
Contributor

you don't seem to be using this anywhere?

Contributor Author

Removed the unused tmp dir. Thanks!

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flow-visibility

Contributor

nit: why the empty line?

Contributor Author

Removed.

2> >(grep -v 'This is insecure' >&2)\
> $MANIFEST

# Add flow-visibility Namespace resource by Kustomize
Contributor

Suggested change
# Add flow-visibility Namespace resource by Kustomize
# Add flow-visibility Namespace resource with Kustomize

Contributor Author

Updated.

Comment on lines 28 to 31
secret:
# -- ClickHouse username. It will be stored in a secret.
username: "clickhouse_operator"
# -- ClickHouse password. It will be stored in a secret.
password: "clickhouse_operator_password"
Contributor

I assume that these 2 fields will be stored in the same secret, and not 2 separate secrets?
I suggest changing this to:

# -- Credentials to connect to ClickHouse. They will be stored in a secret.
connectionSecret:
    username: "clickhouse_operator"
    password: "clickhouse_operator_password"

Contributor Author

Yes. They are in one secret. Updated the value.
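Rendering both credentials into one Secret might look roughly like this template (the resource name and value paths are assumptions based on the suggestion above):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-secret
type: Opaque
stringData:
  # both credentials live in the same Secret object
  username: {{ .Values.clickhouse.connectionSecret.username | quote }}
  password: {{ .Values.clickhouse.connectionSecret.password | quote }}
```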

Comment on lines 80 to 82
secret:
# -- Grafana username. It will be stored in a secret.
username: "admin"
# -- Grafana password. It will be stored in a secret.
password: "admin"
Contributor

same comment as above

Contributor Author

Updated.

repository: "projects.registry.vmware.com/antrea/theia-grafana"
pullPolicy: "IfNotPresent"
tag: "8.3.3"
# -- Credentials to login Grafana. They will be stored in a secret.
Contributor

login to Grafana

Contributor Author

Updated. Thanks!

Signed-off-by: Yanjun Zhou <zhouya@vmware.com>
@yanjunz97 yanjunz97 merged commit 275f371 into antrea-io:main May 25, 2022