Handle zero-byte SSH key files #5306
Sometimes, due to filesystem corruption from a Linux kernel crash or other causes, the system may be left with zero-byte SSH host key files. These zero-byte files are unusable and produce errors like the following on the next boot:
sshd[1454]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
sshd[1454]: Unable to load host key "/etc/ssh/ssh_host_ecdsa_key": invalid format
Therefore, we should check for zero-byte key files and, if any are found, delete and regenerate them to recover from the error.
Fixes: GH-5305
Signed-off-by: Ani Sinha <anisinha@redhat.com>
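The check proposed above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code from this PR or cloud-init's actual implementation: the list of key types, the /etc/ssh layout, and the helper names find_empty_host_keys, regenerate_host_key and repair_empty_host_keys are all assumptions. It treats a private key file that exists but has size 0 as unusable, deletes it together with its .pub file, and regenerates the pair with ssh-keygen.

```python
import subprocess
from pathlib import Path

# Assumption: the key types to check; cloud-init's real defaults may differ.
KEY_TYPES = ("rsa", "ecdsa", "ed25519")


def find_empty_host_keys(ssh_dir="/etc/ssh", key_types=KEY_TYPES):
    """Return (key_type, path) for every host private key that exists but is 0 bytes."""
    empty = []
    for key_type in key_types:
        path = Path(ssh_dir) / f"ssh_host_{key_type}_key"
        # Missing keys are not reported; only present-but-empty files are.
        if path.is_file() and path.stat().st_size == 0:
            empty.append((key_type, path))
    return empty


def regenerate_host_key(key_type, path):
    """Delete an unusable key pair and create a fresh one with ssh-keygen."""
    pub = path.with_name(path.name + ".pub")
    for p in (path, pub):
        p.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    # -q: quiet, -t: key type, -N "": empty passphrase, -f: output file
    subprocess.run(
        ["ssh-keygen", "-q", "-t", key_type, "-N", "", "-f", str(path)],
        check=True,
    )


def repair_empty_host_keys(ssh_dir="/etc/ssh"):
    """Hypothetical entry point: regenerate every zero-byte host key found."""
    repaired = []
    for key_type, path in find_empty_host_keys(ssh_dir):
        regenerate_host_key(key_type, path)
        repaired.append(path)
    return repaired
```

Missing keys are deliberately left alone in this sketch; whether missing and zero-byte keys should be handled the same way is the question raised later in the thread.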
@ani-sinha, thanks for the PR, but I don't think this will work. The ssh module runs once per instance, meaning that after first boot it won't run again unless you've launched a new instance (or the filesystem corruption also prevented the semaphore file from being written). More generally, I'm not sure cloud-init should be concerned with trying to recover from filesystem corruption. If corruption broke your instance, then you have a broken instance. Why would we want to paper over that? In the case of SSH host keys, if one set of keys was generated (but not yet written to disk), then generating a new set on the next boot would change the host key that clients may already have seen, which would incorrectly indicate a security issue. We don't want to signal that either. From the bug you filed:
I hear you and I agree with your points. However, I was wondering what the difference would be between missing host keys and zero-byte host keys. Could we treat them in the same category and generate new keys in both cases?
That is just a test scenario. The real-world scenario would be a kernel crash, for example due to a kernel bug or some other reason. When that happens, the buffer cache is not synced to disk and we end up with zero-byte files.
We still have the problem that the SSH module doesn't run on the second boot. Without new machinery, such as a new module, we have no way to even check these files on the next boot.
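To illustrate why a check placed in the SSH module would never see the corrupted files, here is a minimal sketch of a once-per-instance gate. The directory layout (a per-instance sem directory holding one marker file per module) and the function name run_once_per_instance are assumptions for illustration, loosely modeled on the semaphore files mentioned in the review rather than copied from cloud-init's code.

```python
from pathlib import Path


def run_once_per_instance(sem_root, instance_id, name, func):
    """Run func unless a semaphore for (instance_id, name) already exists.

    Returns True if func ran, False if it was skipped.
    """
    sem = Path(sem_root) / instance_id / "sem" / name
    if sem.exists():
        # Later boots of the same instance stop here, so nothing inside
        # func (for example, a zero-byte host key check) ever runs again.
        return False
    func()
    # Record that the module ran. If a crash loses this write, the module
    # would run again on the next boot, as noted in the review.
    sem.parent.mkdir(parents=True, exist_ok=True)
    sem.write_text("ran\n")
    return True
```

Simulating two boots of the same instance calls func only once; only a new instance ID causes it to run again.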
If your kernel is crashing within 30 seconds of first boot, I'm not sure why regenerating SSH host keys would be a primary concern. This still feels like a very contrived scenario to me, and quite literally any file written by cloud-init (or any other process) could have the same problem. Why are we focusing on host keys? If your concern is troubleshooting or debugging, the serial console should still be usable. Is there something broader that you're trying to accomplish here, or a use case that I'm not understanding?
Yes, I think that if the kernel crashed and left a bunch of corrupted files around, a lot of other things may not function properly, not just SSH or cloud-init. It would be a system-wide issue, not one specific to cloud-init. Making cloud-init resilient may not make the system usable, depending on the extent of the corruption. So I will close this PR, since this problem does not seem worth fixing in cloud-init.