feat: Make IAM role name configurable in CSP health monitor for EKS #877

Merged
KaivalyaMDabhadkar merged 2 commits into NVIDIA:main from KaivalyaMDabhadkar:configure-csp-monitor-807
Feb 18, 2026

@KaivalyaMDabhadkar KaivalyaMDabhadkar commented Feb 17, 2026

Summary

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📚 Documentation
  • 🔧 Refactoring
  • 🔨 Build/CI

Component(s) Affected

  • Core Services
  • Documentation/CI
  • Fault Management
  • Health Monitors
  • Janitor
  • Other: ____________

Testing

  • Tests pass locally
  • Manual testing completed
  • No breaking changes (or documented)

Checklist

  • Self-review completed
  • Documentation updated (if needed)
  • Ready for review

Testing

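Completed manual testing in a dev AWS cluster. Verified that the change works as expected with iamRoleName set in values.yaml: the ServiceAccount carries the custom role ARN annotation, the pod is running, and the AWS Health client initializes successfully.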

$ kubectl get serviceaccount csp-health-monitor -n nvsentinel -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::381491907752:role/ta135790-dgxc-csp-health-monitor-role
    meta.helm.sh/release-name: nvsentinel
    meta.helm.sh/release-namespace: nvsentinel
  creationTimestamp: "2026-02-18T06:58:47Z"
  labels:
    app.kubernetes.io/instance: nvsentinel
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: csp-health-monitor
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: csp-health-monitor-0.1.0
  name: csp-health-monitor
  namespace: nvsentinel
  resourceVersion: "181030424"
  uid: 0a8b2685-b88e-4310-a949-4244396aa6be
$ kubectl get pods -n nvsentinel -l app.kubernetes.io/name=csp-health-monitor
NAME                                 READY   STATUS    RESTARTS       AGE
csp-health-monitor-96d485777-4p64l   2/2     Running   4 (105s ago)   2m57s
$ kubectl logs -n nvsentinel -l app.kubernetes.io/name=csp-health-monitor -c csp-health-monitor --tail=50
2026/02/18 07:00:13 INFO Registered datastore provider provider=mongodb
2026/02/18 07:00:13 INFO Registering MongoDB builder factory
{"time":"2026-02-18T07:00:13.047067436Z","level":"INFO","source":{"function":"main.main","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":62},"msg":"Starting csp-health-monitor","module":"csp-health-monitor","version":"v0.7.0","version":"v0.7.0","commit":"3f98a4562f6ab5396359adfd7ccbce3285619bd5","date":"2026-01-20T13:10:26Z"}
{"time":"2026-02-18T07:00:13.04743659Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/commons/pkg/server.NewServer","file":"github.com/nvidia/nvsentinel/commons@v0.0.0/pkg/server/server.go","line":390},"msg":"server initialized","module":"csp-health-monitor","version":"v0.7.0","port":2112,"read_timeout":10000000000,"write_timeout":10000000000}
{"time":"2026-02-18T07:00:13.047473615Z","level":"INFO","source":{"function":"main.run","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":208},"msg":"CSP Health Monitor (Main Container) components started successfully.","module":"csp-health-monitor","version":"v0.7.0"}
{"time":"2026-02-18T07:00:13.047478979Z","level":"INFO","source":{"function":"main.run.func1","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":154},"msg":"Starting metrics server","module":"csp-health-monitor","version":"v0.7.0","port":2112}
{"time":"2026-02-18T07:00:13.04751034Z","level":"INFO","source":{"function":"main.run.func2","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":164},"msg":"Initializing datastore connection...","module":"csp-health-monitor","version":"v0.7.0"}
{"time":"2026-02-18T07:00:13.047542431Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher.pollTillCACertIsMountedSuccessfully","file":"github.com/nvidia/nvsentinel/store-client@v0.0.0/pkg/datastore/providers/mongodb/watcher/watch_store.go","line":724},"msg":"Trying to read CA cert","module":"csp-health-monitor","version":"v0.7.0","path":"/etc/ssl/client-certs/ca.crt"}
{"time":"2026-02-18T07:00:13.047589782Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher.pollTillCACertIsMountedSuccessfully","file":"github.com/nvidia/nvsentinel/store-client@v0.0.0/pkg/datastore/providers/mongodb/watcher/watch_store.go","line":735},"msg":"Successfully read CA cert","module":"csp-health-monitor","version":"v0.7.0"}
{"time":"2026-02-18T07:00:13.047688412Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/commons/pkg/server.(*server).Serve","file":"github.com/nvidia/nvsentinel/commons@v0.0.0/pkg/server/server.go","line":502},"msg":"starting server","module":"csp-health-monitor","version":"v0.7.0","addr":":2112"}
{"time":"2026-02-18T07:00:13.048002762Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher.confirmConnectivityWithDBAndCollection","file":"github.com/nvidia/nvsentinel/store-client@v0.0.0/pkg/datastore/providers/mongodb/watcher/watch_store.go","line":492},"msg":"Trying to ping database to confirm connectivity","module":"csp-health-monitor","version":"v0.7.0","database":"HealthEventsDatabase"}
{"time":"2026-02-18T07:00:32.552719096Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher.confirmConnectivityWithDBAndCollection","file":"github.com/nvidia/nvsentinel/store-client@v0.0.0/pkg/datastore/providers/mongodb/watcher/watch_store.go","line":503},"msg":"Successfully pinged database to confirm connectivity","module":"csp-health-monitor","version":"v0.7.0","database":"HealthEventsDatabase"}
{"time":"2026-02-18T07:00:32.553017473Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher.confirmConnectivityWithDBAndCollection","file":"github.com/nvidia/nvsentinel/store-client@v0.0.0/pkg/datastore/providers/mongodb/watcher/watch_store.go","line":521},"msg":"Confirmed that the collection exists in the database","module":"csp-health-monitor","version":"v0.7.0","collection":"MaintenanceEvents","database":"HealthEventsDatabase"}
{"time":"2026-02-18T07:00:32.553041254Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/pkg/datastore.NewStore","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/pkg/datastore/datastore.go","line":91},"msg":"Database client initialized successfully using store-client","module":"csp-health-monitor","version":"v0.7.0","collection":"MaintenanceEvents"}
{"time":"2026-02-18T07:00:32.55305677Z","level":"INFO","source":{"function":"main.run.func2","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":171},"msg":"Datastore initialized successfully.","module":"csp-health-monitor","version":"v0.7.0"}
{"time":"2026-02-18T07:00:32.553069456Z","level":"INFO","source":{"function":"main.run.func2","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":180},"msg":"Event processor initialized successfully.","module":"csp-health-monitor","version":"v0.7.0"}
{"time":"2026-02-18T07:00:32.553075427Z","level":"INFO","source":{"function":"main.initActiveMonitor","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/cmd/csp-health-monitor/main.go","line":245},"msg":"AWS configuration is enabled.","module":"csp-health-monitor","version":"v0.7.0"}
{"time":"2026-02-18T07:00:32.553262663Z","level":"INFO","source":{"function":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/pkg/csp/aws.NewClient","file":"github.com/nvidia/nvsentinel/health-monitors/csp-health-monitor/pkg/csp/aws/aws.go","line":142},"msg":"Successfully initialized AWS Health client","module":"csp-health-monitor","version":"v0.7.0","region":"us-east-1"}

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for custom AWS IAM role names in CSP Health Monitor Helm chart configuration.
    • Default role naming remains unchanged, with automatic fallback when custom names are not provided.
  • Documentation

    • Updated configuration guides with new IAM role name option.
    • Added examples and IAM requirements for AWS deployments.
    • Included guidance for handling long cluster names.

coderabbitai bot commented Feb 17, 2026

Walkthrough

This change introduces optional custom IAM role name support (iamRoleName) in AWS IRSA configuration for the CSP Health Monitor Helm chart. The serviceaccount template now conditionally assigns the IAM role ARN using either a custom iamRoleName or the default pattern derived from clusterName, with supporting configuration and documentation updates.
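
The fallback described above can be sketched in Python for illustration. This is not the Helm template itself: the function name resolve_role_arn and the account ID argument are assumptions (how the chart obtains the account ID is not shown in this conversation), and the default suffix is taken from the runbook wording in the change summary.

```python
from typing import Optional

# Default suffix as described in the runbook update; appended to the cluster name.
DEFAULT_ROLE_SUFFIX = "-nvsentinel-health-monitor-assume-role-policy"


def resolve_role_arn(account_id: str, cluster_name: str,
                     iam_role_name: Optional[str] = None) -> str:
    """Build the role ARN, preferring a custom role name when one is set."""
    # An empty string counts as "not set", mirroring an iamRoleName: "" default.
    role_name = iam_role_name if iam_role_name else cluster_name + DEFAULT_ROLE_SUFFIX
    return f"arn:aws:iam::{account_id}:role/{role_name}"


if __name__ == "__main__":
    # Placeholder account ID and cluster name, for illustration only.
    print(resolve_role_arn("123456789012", "demo"))
    print(resolve_role_arn("123456789012", "demo", "custom-role"))
```

Treating the empty string as unset keeps existing deployments on the default pattern without any values change.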

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Template & Configuration: distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/serviceaccount.yaml, distros/kubernetes/nvsentinel/charts/csp-health-monitor/values.yaml | Added optional iamRoleName field to values.yaml. Updated serviceaccount template to conditionally set eks.amazonaws.com/role-arn using either custom iamRoleName or default pattern with clusterName suffix. |
| AWS Configuration Documentation: docs/configuration/csp-health-monitor.md, docs/csp-health-monitor-iam.md | Documented new iamRoleName option with default behavior, character limits (64-char IAM role, 19-char cluster name), and examples. Expanded guidance with AWS CLI commands and Helm YAML configuration snippets for custom IAM role setup. |
| Runbook Updates: docs/runbooks/csp-health-monitor-iam.md | Generalized role-name references to use <IAM_ROLE_NAME> placeholder instead of explicit cluster-specific pattern. Added clarification that role name derives from aws.iamRoleName if set, otherwise defaults to <CLUSTER_NAME>-nvsentinel-health-monitor-assume-role-policy. |
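
The 19-character cluster name limit mentioned in the documentation row follows from the 64-character IAM role name limit minus the length of the default suffix. A small sketch of that arithmetic, with hypothetical helper names that are not part of the chart:

```python
IAM_ROLE_NAME_MAX = 64  # IAM role name limit cited in the docs change
DEFAULT_ROLE_SUFFIX = "-nvsentinel-health-monitor-assume-role-policy"


def max_cluster_name_length() -> int:
    """Longest cluster name that still fits the default role name pattern."""
    return IAM_ROLE_NAME_MAX - len(DEFAULT_ROLE_SUFFIX)


def needs_custom_role_name(cluster_name: str) -> bool:
    """True when the default pattern would exceed the IAM role name limit."""
    return len(cluster_name) > max_cluster_name_length()


if __name__ == "__main__":
    print(max_cluster_name_length())
    print(needs_custom_role_name("a-rather-long-cluster-name"))
```

Clusters whose names exceed that budget are the case where setting iamRoleName becomes necessary.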

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A hop and a bound through roles we have found,
Where custom names dance on the AWS ground,
No more truncated names when clusters grow tall,
The iamRoleName fix handles it all! 🎩✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding configurability for IAM role names in CSP health monitor for EKS. It is specific, concise, and directly reflects the primary feature addition across all modified files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/configuration/csp-health-monitor.md`:
- Around line 144-147: The iamRoleName field is marked "(Optional)" but is
placed inside the "### Required Fields" section; update the docs so iamRoleName
is moved out of the required block—either create a "### Optional Fields"
subsection and place iamRoleName: "" under it or relocate that line after the
required fields block—ensure the header names ("### Required Fields" and the
field key iamRoleName) are adjusted accordingly to avoid contradictory
documentation.

@KaivalyaMDabhadkar KaivalyaMDabhadkar enabled auto-merge (squash) February 17, 2026 19:04
@KaivalyaMDabhadkar KaivalyaMDabhadkar merged commit d5784a4 into NVIDIA:main Feb 18, 2026
41 checks passed

Development

Successfully merging this pull request may close these issues.

[Feature]: Make IAM role name configurable in CSP health monitor for EKS
