Refactor ClickHouse monitor implementation #3498
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3498 +/- ##
===========================================
- Coverage 65.59% 54.50% -11.09%
===========================================
Files 268 383 +115
Lines 26780 42043 +15263
===========================================
+ Hits 17567 22917 +5350
- Misses 7314 16776 +9462
- Partials 1899 2350 +451
Flags with carried forward coverage won't be shown.
build/yamls/flow-visibility.yml
@@ -4829,6 +4784,44 @@ spec:
---
apiVersion: apps/v1
kind: Deployment
I thought we were going to use a new container within the same Deployment, and not a new Deployment?
There is value in keeping the number of Pods down.
We are considering deploying the ClickHouse cluster in the future, which may have multiple ClickHouse Pods, but we only need one monitor for the whole ClickHouse cluster. That is why we split the monitor Deployment from the ClickHouse one.
The fact is that at the moment there is a single replica, so there is no reason not to go with the simple solution.
Looking at the ClickHouse operator in more detail, it is very easy to add a container:
antrea/build/yamls/flow-visibility.yml, line 5004 in 55e6797:
containers:
Got it. Thanks Antonin, I updated the code to move the monitor to the ClickHouse pod.
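For reference, a minimal sketch of what the suggested approach looks like with the ClickHouse operator's `podTemplates` (container and image names here are illustrative placeholders, not the exact ones in the Antrea manifest):

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: clickhouse
spec:
  templates:
    podTemplates:
      - name: pod-template
        spec:
          containers:
            # Main ClickHouse server container.
            - name: clickhouse
              image: clickhouse/clickhouse-server:latest
            # The monitor runs as a sidecar in the same Pod, so a
            # single monitor covers the ClickHouse instance without
            # a separate Deployment, keeping the Pod count down.
            - name: clickhouse-monitor
              image: example.io/clickhouse-monitor:latest  # placeholder
```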
// Retry connection to ClickHouse every 5 seconds if it fails.
connRetryInterval = 5 * time.Second
// Retry connection to ClickHouse every 10 seconds if it fails.
connRetryInterval = 10 * time.Second
Just curious why we changed the retry interval this time?
I observed 4 to 5 retries at the beginning of the monitor's run while the ClickHouse server was not ready, so I thought it would make sense to slightly lower the retry frequency.
I've thought about adding some sleep time before the monitor starts, but it does not make sense to sleep first when the monitor is recovering from a crash.
build/yamls/flow-visibility.yml
key: password
name: clickhouse-secret
- name: DB_URL
value: tcp://clickhouse-clickhouse.flow-visibility.svc:9000
There is no need to use the Service DNS name anymore. They are in the same network namespace now, so you can just use tcp://localhost:9000.
Another question, related to @dreamtalen's comment above: I don't see a way to change the port to something other than 9000 for the ClickHouse server in this manifest. How would a user be able to do that?
Updated the DNS.
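Since the monitor now shares the Pod's network namespace with the ClickHouse server, the env entry can point at the loopback address, along the lines of (a sketch based on the snippet above):

```yaml
# localhost works because both containers are in the same Pod.
- name: DB_URL
  value: tcp://localhost:9000
```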
Previously we did not provide a clear way to change the port. I updated the code to allow that, but currently the user needs multiple steps to complete the change.
I'm not sure whether there are use cases where the ports exposed only to the ClickHouse Pod need to be changed, but if the user would like to change the port exposed inside the Pod, they need to update:
- The corresponding port in the `hostTemplates` in `clickhouse.yml`
- The `DB_URL` env used by the monitor in `clickhouse.yml`
- The port used to connect to the database when initializing the ClickHouse server in `create_table.sh`
If the user would like to change the port exposed outside of the Pod, which is the clickhouse-clickhouse Service, they need to update:
- The corresponding port in the `serviceTemplates` in `clickhouse.yml`
- The port used by Grafana defined in `datasouce_provider.yml`
- The `databaseURL` used by the Flow Aggregator defined in `flow-aggregator.conf`, which is introduced in PR Add ClickHouse Client #3196
I think it is not convenient for users to make all these changes just to use a port other than 9000. I would like to hear suggestions from @heanlan about what we can do to make the change easier.
Thanks for the investigation. It looks fine to me for now. @antoninbas previously mentioned we'll support Helm later. With Helm, port values in different places can be configured easily by sharing a common values.yaml.
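As a sketch of how Helm could help here, a single values file could become the one source of truth for the port. The key names below are hypothetical, not from an existing Antrea chart:

```yaml
# values.yaml (hypothetical keys): change the port in one place.
clickhouse:
  httpPort: 8123
  tcpPort: 9000

# Every template then references the same value, e.g.:
#   value: tcp://localhost:{{ .Values.clickhouse.tcpPort }}
```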
Signed-off-by: Yanjun Zhou <zhouya@vmware.com>
After some more discussions with @heanlan, we did not come up with a use case where a user may want to change the port exposed only inside the ClickHouse server container, and I think enabling such a change may introduce some confusion. Thus I only keep the way to change the
port: 8123
- name: tcp
port: 9000
type: LoadBalancer
I don't think this needs to be a LoadBalancer Service? It's not accessed from outside the cluster; the typical deployment at the moment is to deploy all the flow visibility components in the same cluster as the workloads. If the cluster doesn't support LoadBalancer Services, the external IP will always show as pending.
Thanks for addressing this. It makes sense to me to use ClusterIP for now. Updated.
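The resulting Service then looks something like the following sketch, using the Service name, namespace, and ports that appear earlier in this thread:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: clickhouse-clickhouse
  namespace: flow-visibility
spec:
  ports:
    - name: http
      port: 8123
    - name: tcp
      port: 9000
  # ClusterIP: the Service is only reached from inside the cluster,
  # so no external load balancer is needed and nothing shows as
  # "pending" on clusters without LoadBalancer support.
  type: ClusterIP
```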
LGTM
/skip-all can skip tests based on code changes
This PR implements the flow visibility ClickHouse monitor as a long-running Pod instead of the CronJob used before. This implementation brings the following advantages.