Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gateway-api: Expose metric port setting #23104

Closed
wants to merge 1 commit into from

Conversation

sayboras
Copy link
Member

@sayboras sayboras commented Jan 16, 2023

Description

This is to make sure that users can disable and set port number for Gateway API controller accordingly.

Fixes: #23082
Reported-by: Karsten Nielsen karsten.nielsen@ingka.ikea.com
Signed-off-by: Tam Mach tam.mach@cilium.io

Tasks

Testing

Testing was done locally as per below

Installation with default value

docker@minikube:~$ sudo lsof -i :9965   
COMMAND      PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
cilium-op 206673 root   10u  IPv6 2432329      0t0  TCP *:9965 (LISTEN)

Installation with metric port disabled

docker@minikube:~$ sudo lsof -i TCP | grep cilium-op
cilium-op 213975   root    3u  IPv4 2457566      0t0  TCP localhost:9891 (LISTEN)
cilium-op 213975   root    7u  IPv4 2457570      0t0  TCP minikube:44918->minikube:8443 (ESTABLISHED)
cilium-op 213975   root    8u  IPv4 2466675      0t0  TCP localhost:9234 (LISTEN)
cilium-op 213975   root    9u  IPv4 2474711      0t0  TCP minikube:51888->minikube:8443 (ESTABLISHED)
cilium-op 213975   root   10u  IPv6 2477908      0t0  TCP *:37571 (LISTEN)

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 16, 2023
@sayboras sayboras added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Jan 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 16, 2023
@sayboras sayboras added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jan 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 16, 2023
@sayboras sayboras added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. area/servicemesh GH issues or PRs regarding servicemesh and removed release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jan 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 16, 2023
@sayboras sayboras marked this pull request as ready for review January 16, 2023 07:57
@sayboras sayboras requested review from a team as code owners January 16, 2023 07:57
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

install/kubernetes/cilium/values.yaml.tmpl Outdated Show resolved Hide resolved
operator/option/config.go Outdated Show resolved Hide resolved
@sayboras sayboras force-pushed the tam/gw-port-metrics branch 2 times, most recently from ee9c467 to 83cc64d Compare January 16, 2023 11:46
@squeed
Copy link
Contributor

squeed commented Jan 16, 2023

These are hostNetwork ports, which are a somewhat limited resource. Is there a reason the gateway controller uses a separate metrics port despite being in the same process?

FWIW the PR looks fine, just curious.

This is to make sure that users can disable and set port number for
Gateway API controller accordingly.

Fixes: cilium#23082
Reported-by: Karsten Nielsen <karsten.nielsen@ingka.ikea.com>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
@@ -625,6 +625,10 @@ gatewayAPI:
# This will automatically set enable-envoy-config as well.
enabled: false

# -- Metrics port for Gateway API controller.
# If set to 0, metrics will be disabled.
metricsPort: 9965
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think that metricsPort is the better name, all most other components (agent, operator, proxy, relay) use prometheus.port for the name, e.g. gatewayAPI.prometheus.port. Something to consider, I don't have strong feelings about this (and there are other agent metrics like Hubble (not relay) which do not use prometheus.port either).

@@ -217,6 +217,7 @@ data:
enable-envoy-config: "true"
enable-gateway-api-secrets-sync: {{ .Values.gatewayAPI.secretsNamespace.sync | quote }}
gateway-api-secrets-namespace: {{ .Values.gatewayAPI.secretsNamespace.name | quote }}
gateway-api-metrics-addr: ":{{ .Values.gatewayAPI.metricsPort }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this port to the metrics service as well, so they can be discovered via the ServiceMonitor infrastructure (though it seems that for the proxy, the infrastructure might also be incomplete).

https://github.com/cilium/cilium/blob/d914e6cd50965446fca04e71d4e199f3887c5872/install/kubernetes/cilium/templates/cilium-agent/service.yaml

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for @cilium/operator owned files.

Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in service mesh terms, but I also wondered about why it needs to be a separate port.

@gandro
Copy link
Member

gandro commented Jan 18, 2023

Regarding the multiple ports - my understanding here is that the metrics are exposed by a third-party library ("sigs.k8s.io/controller-runtime"), thus we are just exposing that. There is precedent for doing that: The envoy metrics of the Cilium pod are also exposed in a separate port.

The Hubble and Cilium-Agent metrics are also exposed on a different port, but the reason there is not technical, but more use-case driven: Having them on separate ports allows them to be scraped by different consumers (the Hubble metrics will be more interesting to app teams, while the Cilium metrics will be interesting to platform teams).

Having said that, the gateway-api and envoy metrics will likely have the same consumers as the regular cilium-agent metrics, i.e. platform teams. So I think it's worth having a discussion if it makes sense to expose then under a common port. But it would likely require us to implement some kind of gather functionality inside the agent itself, which I don't think I think is a larger undertaking that will not make it into v1.13.

@squeed
Copy link
Contributor

squeed commented Jan 20, 2023

While the controller-runtime supports starting up an independent metrics server, there's no need to use that. You can also access the registry separately via sigs.k8s.io/controller-runtime/pkg/metrics.Registry and attach it to the existing metrics server.

Or just overwrite it with a separate registry.

Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a datapath perspective.

I would personally prefer to fold these metrics into the main Cilium metrics registry, having multiple ports for metrics seems overkill. But I will let the other people already in this deal with that.

@sayboras sayboras marked this pull request as draft January 31, 2023 11:41
@sayboras
Copy link
Member Author

While the controller-runtime supports starting up an independent metrics server, there's no need to use that. You can also access the registry separately via sigs.k8s.io/controller-runtime/pkg/metrics.Registry and attach it to the existing metrics server. Or just overwrite it with a separate registry.

This is good point, I didn't know that Registry is global variable in controller package.

@sayboras
Copy link
Member Author

sayboras commented Feb 1, 2023

Closed in favour of #23501

@sayboras sayboras closed this Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/servicemesh GH issues or PRs regarding servicemesh release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gateway-api: Address already in use :8080
7 participants