Handle zero byte ssh key files #5306

Closed
wants to merge 1 commit into from

ani-sinha
Contributor

Proposed Commit Message

Sometimes, due to file system corruption from a Linux kernel crash or for some other reason, the system may be left with zero-byte ssh key files. These zero-byte files are of no use and generate errors like the following on the next reboot:
sshd[1454]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
sshd[1454]: Unable to load host key "/etc/ssh/ssh_host_ecdsa_key": invalid format

Therefore, we should check for zero-byte key files and, if any are found, delete and regenerate them in order to recover from the error.

Fixes: GH-5305
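
The check described in the commit message could look roughly like the sketch below. It is not the code from this PR: the function names are hypothetical, and the glob pattern is an assumption about where host keys live, based on the paths in the error messages above.

```python
import glob
import os


def find_empty_key_files(pattern="/etc/ssh/ssh_host_*key*"):
    """Return sorted paths matching `pattern` that are zero-byte regular files.

    The default pattern is an assumption for illustration only.
    """
    empty = []
    for path in sorted(glob.glob(pattern)):
        # Only regular files count; directories and dangling links are skipped.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(path)
    return empty


def remove_empty_key_files(pattern="/etc/ssh/ssh_host_*key*"):
    """Delete zero-byte key files and return the list of removed paths."""
    removed = find_empty_key_files(pattern)
    for path in removed:
        os.unlink(path)
    return removed
```

Once the empty files are gone, the missing keys would still have to be regenerated by a separate step, for example with `ssh-keygen -A`, which creates host keys of the default types only where they do not already exist.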

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Signed-off-by: Ani Sinha <anisinha@redhat.com>
@TheRealFalcon
Member

@ani-sinha, thanks for the PR, but I don't think this will work. The ssh module runs once per instance, meaning that after first boot it won't run again unless you've launched a new instance (or unless the filesystem corruption also prevented the semaphore file from being written).
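
To illustrate the once-per-instance behaviour described above, here is a minimal sketch of semaphore-file gating. It is a simplification and not cloud-init's actual implementation; `run_once_per_instance`, `sem_dir`, and the marker file layout are all hypothetical.

```python
import os


def run_once_per_instance(sem_dir, name, func):
    """Run func() only if no marker file for `name` exists in sem_dir.

    `sem_dir` stands in for a directory that is unique to one instance.
    Returns True if func ran, False if it was skipped.
    """
    os.makedirs(sem_dir, exist_ok=True)
    marker = os.path.join(sem_dir, name)
    if os.path.exists(marker):
        # A previous boot of this instance already ran the module.
        return False
    func()
    # Record completion so later boots of the same instance skip the module.
    with open(marker, "w") as fh:
        fh.write("done\n")
    return True
```

With gating like this, a check for empty key files placed inside the module would never execute on a second boot, because the marker from the first boot is already present. Only a new instance, or a marker that was itself lost, would let the module run again.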

More generally, I'm not sure cloud-init should be concerned with trying to recover from filesystem corruption. If corruption broke your instance, then you have a broken instance. Why would we want to paper over that? In the case of SSH host keys, if one set of keys were generated (but not written to disk yet), then a new set of keys next boot would incorrectly indicate that there's a security issue. We don't want to signal that either.

From the bug you filed:

> after reboot system via sysrq 'b', the system is not accessible via ssh

sysrq b is a bit extreme. Can you instead find a way to reboot that ensures that your filesystems get unmounted cleanly first? This sounds like a problem with the test harness, not cloud-init.

@ani-sinha
Contributor Author

> More generally, I'm not sure cloud-init should be concerned with trying to recover from filesystem corruption. If corruption broke your instance, then you have a broken instance. Why would we want to paper over that? In the case of SSH host keys, if one set of keys were generated (but not written to disk yet), then a new set of keys next boot would incorrectly indicate that there's a security issue. We don't want to signal that either.

I hear you and I agree with your points. However, I was wondering what would be the difference between no certs and zero-byte certs? Could we treat them in the same category and generate new certs in those cases?
Another thing I was thinking is, maybe cloud-init could be made more resilient in addressing this case.

> sysrq b is a bit extreme. Can you instead find a way to reboot that ensures that your filesystems get unmounted cleanly first? This sounds like a problem with the test harness, not cloud-init.

That is just a test case scenario. Actual scenario would be a kernel crash for example due to a bug or for some other reason. When that happens, buffer cache is not sync'd and we end up with zero byte files.
The question then is, is there anything cloud-init can do to recover from this situation? Can we give users some option to regenerate the certs and get going?
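
As background to the buffer-cache point above: a file can end up empty after a crash when its data was still only in memory. A writer that wants to narrow that window can flush and fsync before renaming the file into place. The sketch below shows the general technique only; it is not how cloud-init or ssh-keygen write key files, and `write_file_durably` is a hypothetical helper for POSIX systems.

```python
import os
import tempfile


def write_file_durably(path, data, mode=0o600):
    """Write bytes to `path` using a temp file, fsync, and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            # Ask the kernel to push the file contents to stable storage.
            os.fsync(fh.fileno())
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This reduces the chance of a zero-byte file after a crash, but it does not help with files written by other programs, which is the broader concern raised in this thread.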

@TheRealFalcon
Member

> However, I was wondering what would be the difference between no certs and zero-byte certs? Could we treat them in the same category and generate new certs in those cases?

We still have the problem that the SSH module doesn't run on 2nd boot. Without some new machinery like a new module, we have no way of even checking these files on the next boot.

> Another thing I was thinking is, maybe cloud-init could be made more resilient in addressing this case.
> Actual scenario would be a kernel crash for example due to a bug or for some other reason. When that happens, buffer cache is not sync'd and we end up with zero byte files.

If your kernel is crashing within 30 seconds of first boot, I'm not sure why regenerating SSH host keys would be a primary concern. This still feels like a very contrived scenario to me and quite literally any file written by cloud-init (or any other process) could have the same problem. Why are we focusing on host keys? If your concern is troubleshooting or debugging, the serial console should still be usable.

Is there something broader that you're trying to accomplish here or a use case that I'm not understanding?

@ani-sinha
Contributor Author

> If your kernel is crashing within 30 seconds of first boot, I'm not sure why regenerating SSH host keys would be a primary concern

Yes, I think that if the kernel crashed and left a bunch of corrupted files around, there are a lot of other things that may not function properly, not just ssh or cloud-init. It would be a system-wide issue and not specific to cloud-init. Making cloud-init resilient may not make the system usable, depending on the extent of the corruption. So I will close this PR, since it seems to me that this problem is not worth fixing in cloud-init.

ani-sinha closed this on May 23, 2024
Successfully merging this pull request may close these issues.

[enhancement]: Deal with empty ssh key files