Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implementing certificate expiry detail in security dashboard #3000

Merged
merged 15 commits into from
Jun 28, 2024

Conversation

Hardikl
Copy link
Contributor

@Hardikl Hardikl commented Jun 18, 2024

image

@cla-bot cla-bot bot added the cla-signed label Jun 18, 2024
@Hardikl Hardikl linked an issue Jun 18, 2024 that may be closed by this pull request
@Hardikl
Copy link
Contributor Author

Hardikl commented Jun 19, 2024

image image

@Hardikl
Copy link
Contributor Author

Hardikl commented Jun 20, 2024

Updated the changes in .127 system for review.

image

Copy link
Collaborator

@cgrinds cgrinds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other comments shared in chat

grafana/dashboards/cmode/security.json Outdated Show resolved Hide resolved
conf/rest/9.12.0/certificate.yaml Outdated Show resolved Hide resolved
@@ -1900,14 +1900,185 @@
],
"type": "stat"
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this dashboard, root SVMs are excluded by default. Are we sure that root SVM certificates also need to be excluded for certificates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, While checking the security/certificates Rest call, there are no certificates records for root svms. We are good to go here.

Also, I would be adding the scope field, which shows cluster or svm to help customer to see the scope of the certificate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying:
a) that root SVMs do not have certificates or
b) that ONTAP does not return certificates for root SVMs

I think you're saying that root SVM have certificates, but ONTAP is not returning them? If that's the case, we should check the expiry for root SVM certificates some other way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there is little correction on my above note.

  • Out of admin and node svm (which we treat as root svm), admin svm can have certificates
  • But, It's cluster scope and not svm scope, (scope is only in Rest) which means we don't get svm name in certificate in Rest calls.
  • We would do little more work to get those detail in Rest, which we are already doing in existing plugin, and we have this comment as well for that reference.
    // Admin SVM certificate is cluster scoped, but the REST API does not return the SVM name in its response. Add here for ZAPI parity
  • As there are cluster scoped and svm scoped certificates in table, I need to remove the svm filter from query,
    So, even the SVM drop down have limited svms but this table shows certificates from all of them.
    Screenshot from .127 system
image

Just to note, Above the table the stats count is showing those admin svm's certificates only(admin svm is unique in cluster) and not all of them.

- name
- svm
- uuid
instance_labels:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to write a Prometheus alert for expiry time, what should the query be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is label, it's quite difficult to write alert, Let me explore to handle this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to use metric instead of label for this:
This is the alert query for certificates expiring within 1 month:
0 < (certificate_expiry_time{} - time()) < (30*24*3600)

I will add sample warning alert as well for reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good. Also let's add a alert for expired certificates.

@Hardikl
Copy link
Contributor Author

Hardikl commented Jun 21, 2024

Added 2 alerts,
one warning alert for certificates expiring within 1 month and second critical alert for certificates expired.
image

rahulguptajss
rahulguptajss previously approved these changes Jun 21, 2024
@Hardikl
Copy link
Contributor Author

Hardikl commented Jun 24, 2024

image

@Hardikl
Copy link
Contributor Author

Hardikl commented Jun 24, 2024

security_certificate.yml creates 2 metrics, one security_certificate_labels with extra 2 labels and one metric security_certificate_expiry_time which never used in dashboard.

harvest % curl -s http://localhost:13001/metrics | grep -Ev "#|metadata_" | grep security 
security_certificate_labels{certificateExpiryStatus="expired",certificateIssuerType="self_signed",cluster="umeng-aff300-01-02",datacenter="rest",name="umeng-aff300-01-02_172BB302639CAC9B",scope="cluster",serial_number="172BB302639CAC9B",svm="umeng-aff300-01-02",type="server",uuid="806b6fe6-6ef8-11ed-a5cd-00a098d39e12"} 1.0
security_certificate_expiry_time{cluster="umeng-aff300-01-02",datacenter="rest",uuid="806b6fe6-6ef8-11ed-a5cd-00a098d39e12"} 1701160909
harvest % curl -s http://localhost:13001/metrics | grep -Ev "#|metadata_" | grep security  | wc -l
       2

So, to re-use this existing template, only one change is required in dashboard where passing certificateExpiryStatus!=\"\",certificateIssuerType!=\"\" in security_certificate_labels where earlier it was used all extra filters would be removed from the template.

rahulguptajss
rahulguptajss previously approved these changes Jun 24, 2024
@cgrinds cgrinds merged commit e1f5826 into main Jun 28, 2024
6 checks passed
@cgrinds cgrinds deleted the hl_certificate_feat branch June 28, 2024 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dashboard for monitoring certificate expiration
3 participants