-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IoTEdge Check: Check expired production certs #5699
IoTEdge Check: Check expired production certs #5699
Conversation
let certificate_info = CertificateValidity::parse( | ||
"Device CA certificate".to_owned(), | ||
check.device_ca_cert_path.clone().unwrap(), | ||
)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parse call is implemented here:
https://github.com/Azure/iotedge/blob/release/1.1/edgelet/iotedge/src/check/checks/identity_certificate_expiry.rs#L85
Are there any concerns that the parse will fail for certain production certificates? The relevant call the parse function uses is stack_from_pem
. As long as it is a pem it should be fine I think.
https://docs.rs/openssl/0.10.23/openssl/x509/struct.X509.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no concern.
} else if not_after < now { | ||
Ok(CheckResult::Failed( | ||
Context::new(format!( | ||
"The device CA cert has expired at {}. Renew the certificate to restore functionality.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the critical bit I have added. Let me know if I should change the wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eustacea is there any recommended industry best practice on how far in advance someone should renew their production certificates before they expire? If so then I'm wondering whether it would be helpful to include a warning check to indicate the production certificate will expire in X days. Also if you have any feedback on wording.
@and-rewsmith I suggest we continue to reference our docs on production best practices for certificates. If someone hits this error then it seems like they'd benefit from the reminder. E.g. The device CA cert expired at {}. Renew the certificate to restore functionality. See https://aka.ms/iotedge-prod-checklist-certs for best practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a case for production certs expiring within the next 6 months, but can change the time range upon response from @eustacea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@and-rewsmith I'm fine with the new language. I did some brief research and don't believe there is a one-size-fits-all answer to how far in advance customers want to be alerted to production certificate expiry. For example, the docs for Key Vault talk about setting an alert based on either a percentage lifetime remaining or number of days to expiry. Other places where I've seen mention of default values it tends to be 90 days or less (see for example Digicert). I suggest we go with 90 days as the warning threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some industries might have regulations on certs expiry but I'm not aware of a universal recommendation beyond adapting expiry to deployment/solution specific security profile (e.g. from threat modeling). Since we don't have the solution specific detail, would it be possible to specify this value through a config.toml parameter? A default threshold of 90 days sounds reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code change looks fine. The error's wording should be checked by Venkat or Micah.
e60a0f2
|
||
if autogenerated_certs { | ||
if not_after < now { | ||
CheckResult::Failed( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expired autogenerated certs used to be a warning, but I made it a failure.
The iotedge check for device ca was added in this PR a long time ago:
https://github.com/Azure/iotedge/pull/1559/files#diff-5b05b8525be1ca907a0bc313fd71312d95a964b5f60d1903aa0fc2992a247027R1339
The PR approached from the angle that we have a warning of using self-signed certs and now we want to add cert expiry information to it. We should be checking cert expiry in general too though, which is what I have changed in this PR.
I have tested this locally and the check errors when the device ca cert is expired. There are currently no tests for this check.
Azure IoT Edge PR checklist:
This checklist is used to make sure that common guidelines for a pull request are followed.
General Guidelines and Best Practices
Testing Guidelines
Draft PRs
Draft
mode if it is:Note: We use the kodiakhq bot to merge PRs once the necessary checks and approvals are in place. When it merges a PR, kodiakhq converts the PR title to the commit title, PR description to the commit description, and squashes all the commits in the PR to a single commit. The net effect is that entire PR becomes a single commit. Please follow the best practices mentioned here for the PR title and description