Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow user instance monitoring #835

Merged

Conversation

jaideepr97
Copy link
Collaborator

@jaideepr97 jaideepr97 commented Dec 22, 2022

What type of PR is this?

Uncomment only one /kind line, and delete the rest.
For example, > /kind bug would simply become: /kind bug

/kind bug
/kind chore
/kind cleanup
/kind failing-test
/kind enhancement
/kind documentation
/kind code-refactoring

What does this PR do / why we need it:
This PR allows users to enable workload status monitoring for a given argo-cd instance. Users can enable monitoring for their instance by setting .spec.monitoring.enabled to true, which creates a PrometheusRule for alerts (with 7 opinionated alert rules within it - one for each workload). The rules are configured to fire when a workload has not had the required number of ready replicas for a certain duration of time out of the box. Users are free to delete or make changes to the alert rule and the operator would not overwrite the action.

Have you updated the necessary documentation?
Not yet

  • Documentation update is required by this PR.
  • Documentation has been updated.

Which issue(s) this PR fixes:

Fixes #?

How to test changes / Special notes to the reviewer:

  1. Launch operator locally against an openshift cluster or any cluster that contains the prometheus API
  2. create a namespace and argo-cd instance within that namespace
  3. If using openshift - enable user workload monitoring for your cluster by running oc -n openshift-monitoring edit configmap cluster-monitoring-config and setting data.config.yaml.enableUserWorkload = true (https://docs.openshift.com/container-platform/4.6/monitoring/enabling-monitoring-for-user-defined-projects.html?extIdCarryOver=true&sc_cid=701f2000001Css5AAC#enabling-monitoring-for-user-defined-projects_enabling-monitoring-for-user-defined-projects)
  4. Set .spec.monitoring.enabled=true in CR
  5. Check that prometheusRule is created in the namespace
  6. On OpenShift you can go to Admin > Observe > Alerting > Alerting Rules to ensure that all 7 alerting rules show up in the list (Remove any pre-applied filters0)
  7. Set .spec.applicationSet.image=test-image to create a situation where # desired replicas for appset controller != # ready replicas
  8. Go back to Admin > Observe > Alerting > Alerts and you should see the ApplicationSetControllerNotReady alert in this list (remove all pre-applied filters) with state initially set as "pending" which should change to "firing" after 10m

Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
@jaideepr97 jaideepr97 marked this pull request as ready for review January 11, 2023 23:56
@drpaneas
Copy link

Great work @jaideepr97 💪 🚀 🚀 I've added a couple of comments, but I wouldn't consider any of those as blockers.

  • Some corner test-cases are missing, but in general awesome work for adding unit-tests for the alerts!!!
  • This PR adds a new YAML manifest. Is this auto-generated or you wrote it by hand?
  • This can be done in separate story as well: a doc listing the alerts we support.

@jaideepr97
Copy link
Collaborator Author

@drpaneas Thanks for taking a look :) I don't see any comments on here though, did they get saved?

controllers/argocd/prometheus.go Show resolved Hide resolved
controllers/argocd/prometheus.go Show resolved Hide resolved
controllers/argocd/prometheus.go Show resolved Hide resolved
@drpaneas
Copy link

@drpaneas Thanks for taking a look :) I don't see any comments on here though, did they get saved?

sorry I forgot to submit them. They should be visible now ;)

Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Copy link

@drpaneas drpaneas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
@wtam2018
Copy link
Collaborator

Please add documentation in the same PR.

Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
Copy link
Collaborator

@iam-veeramalla iam-veeramalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks for adding the documentation @jaideepr97

Approved.

@jaideepr97 jaideepr97 merged commit cbe0bbc into argoproj-labs:master Jan 23, 2023
@jaideepr97 jaideepr97 deleted the allow-user-instance-monitoring branch February 18, 2023 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants