[CELEBORN-1552] automatically support prometheus to scrape metrics for helm chart#2673
[CELEBORN-1552] automatically support prometheus to scrape metrics for helm chart#2673lianneli wants to merge 1 commit intoapache:mainfrom
Conversation
|
@RexXiong Could you please take a look at this? |
| {{- range $key, $val := .Values.celeborn }} | ||
| {{- if eq $key "celeborn.metrics.enabled" }} | ||
| {{- $metricsEnabled = $val -}} | ||
| {{- end }} |
There was a problem hiding this comment.
I have a question about helm chart indentation. Is there any guideline or best practice?
There was a problem hiding this comment.
@FMX Thanks for you review. The file _helpers.tpl is go template syntax, some complex logic can be coded in this file, I understand. The param celeborn.metrics.enabled, and others, is written in values.yaml as a single key, but the dot . inside key does not conform to the specification which cannot be used directly by .Values.xxx. So, I imitate the way in configMap.yaml, using the range syntax to loop through, and using if to scratch the value. The indentation and format in this file is not significant, and it is processed through variable in template/xx/service.yaml.

| {{- end }} | ||
| {{- end }} | ||
|
|
||
| {{/* |
There was a problem hiding this comment.
Is this also required for Celeborn Master?
There was a problem hiding this comment.
Is this also required for Celeborn Master?
@RexXiong Not required. Something interesting is although the port set in master service is not for metrics, prometheus still can scrape metrics successfully. This is not proper for worker service, since there is not any port set in it. So I only set the prometheus http port in worker service other than master worker. All the changes have tested in our clusters, containing both dev and prod environment.
Of Course, we can add the prometheus http port in master service for understanding more easily. Should I do that?
There was a problem hiding this comment.
Of Course, we can add the prometheus http port in master service for understanding more easily. Should I do that?
No, but I'm still very interested in why Prometheus can still scrape metrics successfully.
There was a problem hiding this comment.
Of Course, we can add the prometheus http port in master service for understanding more easily. Should I do that?
No, but I'm still very interested in why Prometheus can still scrape metrics successfully.
@RexXiong
Good question, and I made a test.
In k8s, the core role of Service is translating Pod's ip to domain name like "xxx.namespace.svc.cluster.local", whether the port set in Service is a real one. However, Service will not work if there is none port set.
Prometheus would search all the "on working" service, fetch annotations to get real port, then get Pod's ip though K8s Api to finish the metrics scratching.
That's why we need not set another port in Master Service.
There was a problem hiding this comment.
Sounds reasonable. Thanks.
…r helm chart ### What changes were proposed in this pull request? 1. Add Annotations to Master Service and Worker Service for automatically scraping by Prometheus. 2. Add Ports to Worker Service, since it's empty before that prometheus cannot connect to workers. ### Why are the changes needed? Although master and worker provide http interfaces, We still need add annotations manually for prometheus automatically scraping. ### Does this PR introduce _any_ user-facing change? No. Users will not feel any changes to install and use Celeborn. ### How was this patch tested? test locally and in dev environment. Before:  After:  Closes #2673 from lianneli/PR-1552. Authored-by: Lianne Li <lmlianne@outlook.com> Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com> (cherry picked from commit 1ea704f) Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
…r helm chart ### What changes were proposed in this pull request? 1. Add Annotations to Master Service and Worker Service for automatically scraping by Prometheus. 2. Add Ports to Worker Service, since it's empty before that prometheus cannot connect to workers. ### Why are the changes needed? Although master and worker provide http interfaces, We still need add annotations manually for prometheus automatically scraping. ### Does this PR introduce _any_ user-facing change? No. Users will not feel any changes to install and use Celeborn. ### How was this patch tested? test locally and in dev environment. Before:  After:  Closes apache#2673 from lianneli/PR-1552. Authored-by: Lianne Li <lmlianne@outlook.com> Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
What changes were proposed in this pull request?
Why are the changes needed?
Although master and worker provide http interfaces, We still need add annotations manually for prometheus automatically scraping.
Does this PR introduce any user-facing change?
No. Users will not feel any changes to install and use Celeborn.
How was this patch tested?
test locally and in dev environment.
Before:

After:
