A reason why an instance would go into rescue mode? #267

Closed
rgarrigue opened this issue Apr 20, 2020 · 11 comments
Comments

@rgarrigue
Contributor

Hi there

This is just a question. My AWS EC2 Debian 10 instances won't reboot properly; they end up in rescue mode like this:

[OK] Started Initial cloud-init…ob (metadata service crawler).
[OK] Reached target Network is Online.
[OK] Reached target Cloud-config availability.

You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.

At this point the instance is as good as dead; there's no console in EC2 to troubleshoot or resume the boot.

I've no idea why it happens. Other instances are fine; the difference is that these ones run os-hardening. Would you have any idea why it could happen?

@rndmh3ro
Member

Hey @rgarrigue,

are you sure you only use the os-hardening role and not the ssh-hardening role with ssh_allow_users?

If you're sure, can you tell me what AMI you use and what os-hardening version so I can try to reproduce this?

@rndmh3ro rndmh3ro added the bug label Apr 20, 2020
@rgarrigue
Contributor Author

I use both. As for vars, I have:

os_auth_pw_max_age: 99999
os_auditd_enabled: false

Here are my pinned versions:

- src: dev-sec.os-hardening
  version: 5.2.1

- src: dev-sec.ssh-hardening
  version: 7.0.0

And about the AMI, I reproduced the issue before opening it with this one (us-east-1):

    image_id              = "ami-0dedf6a6502877301"
    image_location        = "136693071363/debian-10-amd64-20191117-80"

I'm using Debian's official AMIs as listed at https://wiki.debian.org/Cloud/AmazonEC2Image/Buster. To be accurate, I'm using Terraform, which looks up the latest *debian-10-amd64* image owned by 136693071363 for each new autoscaling group; the AMI id is pinned for older ASGs, as in the example above.

@AFriemann

Running into the same issue; it's definitely this role. ssh_allow_users is disabled.

I'm trying some more things, e.g. disabling the ufw stuff, and will let you know if I find a solution.

@rgarrigue
Contributor Author

OK, I'm eager to understand why tweaking an SSH-related option would send the OS into rescue mode.

@AFriemann

The odd thing is that we use this without any issues on Amazon Linux 2, yet on Debian the machines become entirely inaccessible. So it's either some problem in the role or an incompatibility with Debian/Buster.

@AFriemann

OK, I've now got this down to vfat being disabled via modprobe.

Whitelisting it

    os_filesystem_whitelist:
    - vfat

makes instances accessible again.
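
For reference, applying the override from a playbook could look roughly like this (an illustrative snippet only, not taken from our setup; adapt hosts and privilege escalation as needed):

    # Illustrative only: keep vfat off the modprobe blacklist by
    # whitelisting it before the role runs.
    - hosts: all
      become: true
      vars:
        os_filesystem_whitelist:
          - vfat
      roles:
        - dev-sec.os-hardening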

@AFriemann

AFriemann commented Apr 24, 2020

The test

- name: check if efi is installed
  stat:
    path: "/sys/firmware/efi"
  register: efi_installed

clearly is not enough. The path doesn't exist, yet we have a vfat device mounted:

/dev/xvda15 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)

I think just checking for efi and then blindly disabling vfat is really unreasonable tbh.

there should be a check for existing mount points that adds used filesystems to the whitelist.
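
Something along these lines might do it (just a sketch of the idea using Ansible facts, not the role's actual implementation):

    # Sketch only: merge the filesystem types of all currently mounted
    # devices (from the ansible_mounts fact) into the whitelist, so nothing
    # that is in use gets blacklisted via modprobe.
    - name: add filesystems of existing mount points to the whitelist
      set_fact:
        os_filesystem_whitelist: "{{ ((os_filesystem_whitelist | default([])) + (ansible_mounts | map(attribute='fstype') | list)) | unique }}"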

@rndmh3ro
Member

Thanks for debugging this @AFriemann.

I think just checking for efi and then blindly disabling vfat is really unreasonable tbh.

Well, that is the commonly accepted way to check this. At least it was; obviously it's not working anymore.

there should be a check for existing mount points that adds used filesystems to the whitelist.

That's a great idea. Do you want to create a PR for this?

@rgarrigue
Contributor Author

Whitelisting vfat fixes it for me. Many thanks @AFriemann. Just out of curiosity, how did you debug it?

@AFriemann

Brute force @rgarrigue :D

Created an AMI without this role applied and checked the mount points, saw vfat, and from there it was pretty clear what the likely culprit was.

I'll try to write a fix this week, @rndmh3ro.

rndmh3ro added a commit that referenced this issue Jul 24, 2020
Only manage moduli when hardening server
@rndmh3ro
Member

rndmh3ro commented Feb 7, 2021

Fixed by #289

@rndmh3ro rndmh3ro closed this as completed Feb 7, 2021
divialth pushed a commit to divialth/ansible-collection-hardening that referenced this issue Aug 3, 2022
Only manage moduli when hardening server