[enhancement]: Deal with empty ssh key files #5305

Closed
ani-sinha opened this issue May 20, 2024 · 3 comments
Labels
enhancement New feature or request new An issue that still needs triage

Comments

@ani-sinha
Contributor

Enhancement

Start an instance running CentOS on an AWS t2.large system. After rebooting the system via sysrq 'b', it is no longer accessible via SSH:

Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: sshd.service: Service RestartSec=42s expired, scheduling restart.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: sshd.service: Scheduled restart job, restart counter is at 14.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: Stopped OpenSSH server daemon.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: Stopped target sshd-keygen.target.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: Stopping sshd-keygen.target.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: Reached target sshd-keygen.target.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: Starting OpenSSH server daemon...
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: Unable to load host key "/etc/ssh/ssh_host_rsa_key": invalid format
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: Unable to load host key "/etc/ssh/ssh_host_ecdsa_key": invalid format
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: Unable to load host key: /etc/ssh/ssh_host_ecdsa_key
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: Unable to load host key "/etc/ssh/ssh_host_ed25519_key": invalid format
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: Unable to load host key: /etc/ssh/ssh_host_ed25519_key
Feb 2 09:22:47 ip-10-22-1-137 sshd[1454]: sshd: no hostkeys available -- exiting.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: sshd.service: Failed with result 'exit-code'.
Feb 2 09:22:47 ip-10-22-1-137 systemd[1]: Failed to start OpenSSH server daemon.

Checking the disk, we found that all SSH host key files are empty after the reboot:

ls -l etc/ssh
-rw-r-----. 1 root ssh_keys      0 Feb  3 07:53 ssh_host_ecdsa_key
-rw-r--r--. 1 root root          0 Feb  3 07:53 ssh_host_ecdsa_key.pub
-rw-r-----. 1 root ssh_keys      0 Feb  3 07:53 ssh_host_ed25519_key
-rw-r--r--. 1 root root          0 Feb  3 07:53 ssh_host_ed25519_key.pub
-rw-r-----. 1 root ssh_keys      0 Feb  3 07:53 ssh_host_rsa_key
-rw-r--r--. 1 root root          0 Feb  3 07:53 ssh_host_rsa_key.pub

Version-Release number of selected component (if applicable):
cloud-init 22.1-8

How reproducible:
100% reproducible with the CentOS image on AWS t2.large.

Steps to Reproduce:
Reproduce with the automated test:
$ os-tests --user ec2-user --keyfile /home/virtqe_s1.pem --platform_profile /home/aws.yaml -p test_reboot_simultaneous

Actual results:
The system cannot be accessed via SSH after it boots up.

Expected results:
The system can be accessed normally.

Additional info:
If 'sudo sync' is run before 'echo b > /proc/sysrq-trigger & echo b > /proc/sysrq-trigger', the test case passes on every run.
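
The 'sudo sync' observation suggests that the key data had not yet reached the disk when the reboot was forced. As a minimal sketch of the general technique (not cloud-init's code, and not how the host keys are actually generated), this is how a Python program can make a single file durable before returning. The helper name `write_durably` is hypothetical.

```python
import os
import tempfile


def write_durably(path: str, data: bytes, mode: int = 0o600) -> None:
    """Write data to path so that it survives an abrupt reboot.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to push the file contents to disk.
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, mode)
        # Atomically move the fully written file into place.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the new directory entry is persisted too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Unlike a system-wide 'sync', this only flushes the one file and its directory entry, so it does not protect other files written during early boot.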

Let's discuss if cloud-init can deal with this issue.

@ani-sinha ani-sinha added enhancement New feature or request new An issue that still needs triage labels May 20, 2024
@ani-sinha
Contributor Author

I am working on a patch. Will post once I get some positive test results.

ani-sinha added a commit to ani-sinha/cloud-init that referenced this issue May 20, 2024
Sometimes, due to file system corruption from a Linux kernel crash or for some
other reason, the system may be left with zero-byte SSH key files. These
zero-byte files are of no use and generate errors like the following upon the
next reboot:
sshd[1454]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
sshd[1454]: Unable to load host key "/etc/ssh/ssh_host_ecdsa_key": invalid format

Therefore, we should check for the presence of zero-byte files and, if any are
found, delete and regenerate the key files in order to recover from the error.

Fixes: canonicalGH-5305
Signed-off-by: Ani Sinha <anisinha@redhat.com>
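
The check described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the referenced commit: the function names and the `ssh_host_*key*` glob pattern are hypothetical, and the regeneration step is left out.

```python
import glob
import os
from typing import List


def find_empty_host_keys(ssh_dir: str = "/etc/ssh") -> List[str]:
    """Return the zero-byte host key files (private and .pub) in ssh_dir."""
    # Assumed naming pattern, based on the file names shown in the report.
    pattern = os.path.join(ssh_dir, "ssh_host_*key*")
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.isfile(path) and os.path.getsize(path) == 0
    )


def remove_empty_host_keys(ssh_dir: str = "/etc/ssh") -> List[str]:
    """Delete zero-byte host key files and return the paths removed."""
    removed = []
    for path in find_empty_host_keys(ssh_dir):
        os.unlink(path)
        removed.append(path)
    return removed
```

Once the empty files are gone, the missing keys still have to be regenerated. Running `ssh-keygen -A`, which creates any host keys that do not exist, is one possible way to do that.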
@blackboxsw
Collaborator

Thank you @ani-sinha for both filing this issue and pursuing an upstream fix. I think the consensus reached on the related PR was that this error condition, where the kernel hasn't synced multiple files to disk because of a reboot in early boot, is beyond the typical use cases that cloud-init needs to cope with, as there are likely many corrupted files beyond just the SSH host keys. Adding a single fix to SSH host key checking is not a complete solution, and it adds logic around a case that is not aligned with the majority of support cases cloud-init should be resolving. I'll close this issue as "not planned" unless there is a significant objection that I missed.

@ani-sinha
Contributor Author

No objection to closing the ticket.

@TheRealFalcon closed this as not planned May 28, 2024