A reason why an instance would go into rescue mode? #267

Closed
rgarrigue opened this issue Apr 20, 2020 · 11 comments
Comments

@rgarrigue
Contributor

Hi there

This is just a question. My AWS EC2 Debian 10 instances won't reboot properly; they end up in rescue mode like this:

[OK] Started Initial cloud-init…ob (metadata service crawler).
[OK] Reached target Network is Online.
[OK] Reached target Cloud-config availability.

You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.

At this point the instance is as good as dead; there's no console in EC2 to troubleshoot or resume the boot.

I've no idea why it happens. Other instances are fine; the difference is that these ones run os-hardening. Would you have any idea why it could happen?

@rndmh3ro
Member

Hey @rgarrigue,

are you sure you only use the os-hardening role and not the ssh-hardening role with ssh_allow_users?

If you're sure, can you tell me what AMI you use and what os-hardening version so I can try to reproduce this?

@rndmh3ro rndmh3ro added the bug label Apr 20, 2020
@rgarrigue
Contributor Author

I use both. As for vars, I have:

os_auth_pw_max_age: 99999
os_auditd_enabled: false

Here are my pinned versions:

- src: dev-sec.os-hardening
  version: 5.2.1

- src: dev-sec.ssh-hardening
  version: 7.0.0

And about the AMI, I reproduced the issue before opening it with this one (us-east-1):

    image_id              = "ami-0dedf6a6502877301"
    image_location        = "136693071363/debian-10-amd64-20191117-80"

I'm using Debian's official AMIs as listed at https://wiki.debian.org/Cloud/AmazonEC2Image/Buster. To be accurate, I'm using Terraform, which looks up the latest *debian-10-amd64* image owned by 136693071363 for each new autoscaling group; the AMI id is pinned for older ASGs, as in the example above.

@AFriemann

Running into the same issue; it's definitely this role. ssh_allow_users is disabled.

I'm trying some more things, e.g. disabling the ufw stuff, and will let you know if I find a solution.

@rgarrigue
Contributor Author

OK, I'm eager to understand why tweaking an SSH-related option would send the OS into rescue mode.

@AFriemann

The odd thing is that we use this without any issues on Amazon Linux 2, yet on Debian the machines become entirely inaccessible. So it's either some problem in the role or an incompatibility with Debian/Buster.

@AFriemann

OK, I've now got this down to vfat being disabled via modprobe.

Whitelisting it

    os_filesystem_whitelist:
    - vfat

makes instances accessible again.
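
For reference, applying the override from a playbook could look roughly like this (an illustrative snippet only, not taken from our setup; adapt hosts and privilege escalation as needed):

    # Illustrative only: keep vfat off the modprobe blacklist by
    # whitelisting it before the role runs.
    - hosts: all
      become: true
      vars:
        os_filesystem_whitelist:
          - vfat
      roles:
        - dev-sec.os-hardening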

@AFriemann

AFriemann commented Apr 24, 2020

The test

- name: check if efi is installed
  stat:
    path: "/sys/firmware/efi"
  register: efi_installed

clearly is not enough. The path doesn't exist, yet we have a vfat device mounted:

/dev/xvda15 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)

I think just checking for efi and then blindly disabling vfat is really unreasonable tbh.

there should be a check for existing mount points that adds used filesystems to the whitelist.
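
Something along these lines might do it (just a sketch of the idea using Ansible facts, not the role's actual implementation):

    # Sketch only: merge the filesystem types of all currently mounted
    # devices (from the ansible_mounts fact) into the whitelist, so nothing
    # that is in use gets blacklisted via modprobe.
    - name: add filesystems of existing mount points to the whitelist
      set_fact:
        os_filesystem_whitelist: "{{ ((os_filesystem_whitelist | default([])) + (ansible_mounts | map(attribute='fstype') | list)) | unique }}"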

@rndmh3ro
Member

Thanks for debugging this @AFriemann.

I think just checking for efi and then blindly disabling vfat is really unreasonable tbh.

Well, that is the commonly accepted way to check this. At least it was; obviously it's not working anymore.

there should be a check for existing mount points that adds used filesystems to the whitelist.

That's a great idea. Do you want to create a PR for this?

@rgarrigue
Contributor Author

Whitelisting vfat fixes it for me. Many thanks @AFriemann. Just out of curiosity, how did you debug it?

@AFriemann

Brute force @rgarrigue :D

Created an AMI without this role applied and checked the mount points, saw vfat, and from there it was pretty clear what the likely culprit was.

I'll try to write a fix this week, @rndmh3ro.

rndmh3ro added a commit that referenced this issue Jul 24, 2020
Only manage moduli when hardening server
@rndmh3ro
Member

rndmh3ro commented Feb 7, 2021

Fixed by #289

@rndmh3ro rndmh3ro closed this as completed Feb 7, 2021
divialth pushed a commit to divialth/ansible-collection-hardening that referenced this issue Aug 3, 2022
Only manage moduli when hardening server