
[Feature] Add OpenShift monitoring required RBAC in OLM bundle #366

Merged
sajmera-pensando merged 1 commit into ROCm:release-v1.4.1 from yansun1996:ocp_monitor_rbac_v1.4.1
Oct 24, 2025

@yansun1996
Member

Motivation

The "Collecting metrics with Prometheus" documentation lists the requirements for configuring OpenShift cluster monitoring to scrape metrics from another namespace:

  1. Label the namespace with
    openshift.io/cluster-monitoring: true

  2. Create the RBAC resources that this PR adds.

  3. Create a ServiceMonitor/PodMonitor.

  • Item 1 needs to be documented

  • Item 2 needs to be added into OLM bundle

  • Item 3 can already be created by the released GPU Operator
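
For illustration, here is a minimal sketch of what items 1 and 2 might look like, written as Python that emits JSON manifests. It is not the exact content this PR adds to the OLM bundle. The target namespace `kube-amd-gpu` and the resource name `prometheus-k8s` for the Role and RoleBinding are assumptions chosen for the example. The subject is the `prometheus-k8s` service account in `openshift-monitoring`, which OpenShift cluster monitoring uses for Prometheus.

```python
import json

# Illustrative values; the namespace and resource names used by the
# actual bundle may differ.
TARGET_NAMESPACE = "kube-amd-gpu"            # hypothetical operator namespace
PROMETHEUS_SA = "prometheus-k8s"             # cluster monitoring service account
PROMETHEUS_NAMESPACE = "openshift-monitoring"


def namespace_label_patch():
    """Merge patch that adds the cluster-monitoring label (item 1)."""
    # Label values are strings, so "true" is a string here, not a boolean.
    return {"metadata": {"labels": {"openshift.io/cluster-monitoring": "true"}}}


def prometheus_role(namespace):
    """Role that lets Prometheus discover scrape targets in `namespace` (item 2)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "prometheus-k8s", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["services", "endpoints", "pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def prometheus_role_binding(namespace):
    """RoleBinding that grants the Role to the cluster monitoring Prometheus."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "prometheus-k8s", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "prometheus-k8s",
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": PROMETHEUS_SA,
                "namespace": PROMETHEUS_NAMESPACE,
            }
        ],
    }


def build_manifests(namespace=TARGET_NAMESPACE):
    """Return the Role and RoleBinding for `namespace`, in apply order."""
    return [prometheus_role(namespace), prometheus_role_binding(namespace)]


if __name__ == "__main__":
    # Emit a v1 List so the output can be piped to `oc apply -f -`.
    manifest = {"apiVersion": "v1", "kind": "List", "items": build_manifests()}
    print(json.dumps(manifest, indent=2))
```

Shipping these resources in the OLM bundle means they are created at install time, so only the namespace label is left as a manual, documented step.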

Technical Details

This PR adds documentation for Item 1 and adds the RBAC resources to the OLM bundle for Item 2.

Test Plan

Bring up a fresh OpenShift cluster and follow the updated documentation to install NFD, KMM, and the AMD GPU Operator. Create a DeviceConfig that installs the amdgpu kmod and enables the metrics exporter and ServiceMonitor. Label the target namespace to enable cluster monitoring on it.
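
The labeling step could be scripted roughly as follows. This is a sketch, not part of the PR: it only builds the `oc label` command line, and the namespace `kube-amd-gpu` is a placeholder for whichever namespace the operator is installed in.

```python
import shlex


def label_namespace_command(namespace, cli="oc"):
    """Return the argv list that labels `namespace` for cluster monitoring."""
    return [
        cli,
        "label",
        "namespace",
        namespace,
        "openshift.io/cluster-monitoring=true",
        "--overwrite",  # replace the label value if it is already set
    ]


if __name__ == "__main__":
    # Placeholder namespace; pass the list to subprocess.run to execute it.
    print(shlex.join(label_namespace_command("kube-amd-gpu")))
```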

Test Result

On the OpenShift web console, the Observe page shows an active metrics exporter target, and users can query AMD GPU metrics on the Observe > Metrics page.

Submission Checklist

Signed-off-by: yansun1996 <Yan.Sun3@amd.com>
Collaborator

@sajmera-pensando sajmera-pensando left a comment

lgtm

@sajmera-pensando sajmera-pensando merged commit f7b0c37 into ROCm:release-v1.4.1 Oct 24, 2025
1 check passed
leslie-qiwa pushed a commit to leslie-qiwa/gpu-operator that referenced this pull request Feb 6, 2026
* Create a docker/shell build container

* Bump golang version to 1.23.0 for operator repo

* Upgrade golint to 1.63.4 and fix linter errors

- Upgrade linter to 1.63.4 as it supports go 1.23
- Fix new linter errors reported by golint 1.63.4

* Build openshift helm charts using kmm main branch

- Checkout to the commit in kmm openshift main branch that adds support
  for go 1.23

* Changes to operator build container

* Move to go1.21.13 and KMM release 2.3