Skip to content

Add Kubernetes node problem detector #707

Description

@BigBen-7

Description

Deploy node problem detector for early hardware issue detection.

Requirements

  • NPD DaemonSet deployment
  • Custom problem definitions
  • Alert rules for node problems
  • Automatic node cordoning
  • Incident creation integration

Files to Touch

  • infrastructure/k8s/node-problem-detector/daemonset.yaml
  • infrastructure/k8s/node-problem-detector/config.yaml
  • infrastructure/monitoring/node-alerts.yml

Complexity: 250 points

Metadata

Metadata

Assignees

Labels

Stellar WaveIssues in the Stellar wave programinfrastructureInfrastructure and DevOpskubernetesKubernetes configsmonitoringMonitoring and observability

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions