New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS Login implementation is incorrect #2503

Closed
andor44 opened this Issue Sep 18, 2018 · 8 comments

Comments

Projects
None yet
3 participants
@andor44

andor44 commented Sep 18, 2018

Issue Report

Bug

Container Linux Version

1897.0.0 Alpha

Environment

GCE

Expected Behavior

OS Login to work, but only when it's actually enabled for the VM/project.

Actual Behavior

First off, OS Login should only be allowed when it is enabled for the project and/or VM. See here for reference. As far as I can see this is not checked anywhere, it will attempt to enable OS Login unconditionally.

Second, the mechanism to enable it on first boot does not work. On a freshly booted CL Alpha GCE VM I see:

core@instance-1 ~ $ systemctl status oem-gce-enable-oslogin.service
● oem-gce-enable-oslogin.service - Enable GCE OS Login
   Loaded: loaded (/etc/systemd/system/oem-gce-enable-oslogin.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2018-09-18 14:20:22 UTC; 3min 52s ago
  Process: 703 ExecStart=/usr/share/oem/bin/enable-oslogin (code=exited, status=0/SUCCESS)
 Main PID: 703 (code=exited, status=0/SUCCESS)

Sep 18 14:20:22 localhost systemd[1]: Starting Enable GCE OS Login...
Sep 18 14:20:22 localhost enable-oslogin[703]: /etc/ssh/sshd_config is not a symlink to /usr/share/ssh/sshd_config. Not enabling OS Login
Sep 18 14:20:22 localhost systemd[1]: Started Enable GCE OS Login.

It is a symlink though:

core@instance-1 ~ $ readlink -f /etc/ssh/sshd_config
/usr/share/ssh/sshd_config

As dm0 suggested on IRC the oem-gce-enable-oslogin.service unit is probably racing with whatever is creating /usr/share/ssh/sshd_config, though that's just a guess.

Running sudo /usr/share/oem/bin/enable-oslogin manually enables OS Login correctly. After logging in with an OS Login user there's a minor cosmetic hiccup because /etc/profile.d/coreos-profile.sh will try to display the number of failed units resulting in Failed to list units: Access denied. Maybe modify the profile so that this is only ran if the user has sudo access, and run systemctl list-units with sudo?

Reproduction Steps

  1. Create project or VM metadata item enable-oslogin=TRUE (CL doesn't actually care whether it's on at the moment, but this is what gcloud checks to see if it should attempt OS Login-style login)
  2. Start GCE VM with CL image 1897.0.0 Alpha
  3. Attempt to log in with OS Login: gcloud compute ssh <instance-name>
  4. End up with a password prompt
@ajeddeloh

This comment has been minimized.

Show comment
Hide comment
@ajeddeloh

ajeddeloh Sep 18, 2018

I just hit this myself, we hit a systemd race which we fixed last minute, but it looks like it's now racing with systemd tmpfiles and failing because of that. I'm working on a fix now.

ajeddeloh commented Sep 18, 2018

I just hit this myself, we hit a systemd race which we fixed last minute, but it looks like it's now racing with systemd tmpfiles and failing because of that. I'm working on a fix now.

@ajeddeloh

This comment has been minimized.

Show comment
Hide comment
@ajeddeloh

ajeddeloh Sep 18, 2018

As a workaround, you can manually run /usr/share/oem/bin/enable-oslogin

ajeddeloh commented Sep 18, 2018

As a workaround, you can manually run /usr/share/oem/bin/enable-oslogin

@ajeddeloh

This comment has been minimized.

Show comment
Hide comment
@ajeddeloh

ajeddeloh Sep 18, 2018

Fixed via coreos/coreos-overlay#3414 which should be in the next alpha. You'll need to re-provision to pick up the changes (since the OEM partition does not get updates and this is only running on first boot). Thanks for the report!

First off, OS Login should only be allowed when it is enabled for the project and/or VM. See here for reference. As far as I can see this is not checked anywhere, it will attempt to enable OS Login unconditionally.

We don't do any checking because if it's disabled on the cloud side the bits we enabled will be a no-op, which is fine. Enabling it on the VM should not break the non-oslogin paths (if it does, that's a bug, please let us know!). Is there some other concern you have with blanket enabling it by default?

After talking with the GCE folks they are supportive of configuring it at first boot, since it being either enabled or disabled at boot fits well with the "immutable OS" concept. We didn't design it to meant to be toggled on and off after provisioning for CL. We defaulted to "on" so when oslogin is enabled on the cloud console side, it works with existing vms.

ajeddeloh commented Sep 18, 2018

Fixed via coreos/coreos-overlay#3414 which should be in the next alpha. You'll need to re-provision to pick up the changes (since the OEM partition does not get updates and this is only running on first boot). Thanks for the report!

First off, OS Login should only be allowed when it is enabled for the project and/or VM. See here for reference. As far as I can see this is not checked anywhere, it will attempt to enable OS Login unconditionally.

We don't do any checking because if it's disabled on the cloud side the bits we enabled will be a no-op, which is fine. Enabling it on the VM should not break the non-oslogin paths (if it does, that's a bug, please let us know!). Is there some other concern you have with blanket enabling it by default?

After talking with the GCE folks they are supportive of configuring it at first boot, since it being either enabled or disabled at boot fits well with the "immutable OS" concept. We didn't design it to meant to be toggled on and off after provisioning for CL. We defaulted to "on" so when oslogin is enabled on the cloud console side, it works with existing vms.

@andor44

This comment has been minimized.

Show comment
Hide comment
@andor44

andor44 Sep 18, 2018

Thanks for the speedy fix!

As far as the activation goes: I see the reasoning behind it and it makes sense. That being said I could imagine a scenario where it is enabled on the project level, but you might want to opt out a few specific VMs for whatever reason, so you'd have oslogin-enabled=TRUE on project level, but you'd have it overridden as oslogin-enabled=FALSE on specific VMs. This is orthogonal to immutable or not. It'd be as simple as adding

if [ "$(curl -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/metadata/enable-oslogin)" == "FALSE" ]; then
    echo "OS Login is disabled for this VM, not enabling it."
    exit 0
fi

to the top of the enabler script.

andor44 commented Sep 18, 2018

Thanks for the speedy fix!

As far as the activation goes: I see the reasoning behind it and it makes sense. That being said I could imagine a scenario where it is enabled on the project level, but you might want to opt out a few specific VMs for whatever reason, so you'd have oslogin-enabled=TRUE on project level, but you'd have it overridden as oslogin-enabled=FALSE on specific VMs. This is orthogonal to immutable or not. It'd be as simple as adding

if [ "$(curl -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/metadata/enable-oslogin)" == "FALSE" ]; then
    echo "OS Login is disabled for this VM, not enabling it."
    exit 0
fi

to the top of the enabler script.

@ajeddeloh

This comment has been minimized.

Show comment
Hide comment
@ajeddeloh

ajeddeloh Sep 18, 2018

Hrmm. My main concern with that is if someone wanted to turn on oslogin later, they can't without manual intervention. If you opt out for a vm then it's still deactivated, even though the bits are included and activated in the OS; gce will fail login attempts if oslogin is off. When someone tries to log in as if they were using oslogin the os will try the nss/pam modules but gce will hand back responses causing it to fail and prevent the login.

I think I want to keep it as is since having it on doesn't do any harm and users might want to activate it later without reprovisioning (assuming they have a CL provisioned with a initial version new enough).

As a side note, you could add that check as a (really ugly) ExecStartPre=/usr/bin/bash -c '....' in a dropin to the oem-gce-enable-oslogin.service unit if that use case is important to you.

ajeddeloh commented Sep 18, 2018

Hrmm. My main concern with that is if someone wanted to turn on oslogin later, they can't without manual intervention. If you opt out for a vm then it's still deactivated, even though the bits are included and activated in the OS; gce will fail login attempts if oslogin is off. When someone tries to log in as if they were using oslogin the os will try the nss/pam modules but gce will hand back responses causing it to fail and prevent the login.

I think I want to keep it as is since having it on doesn't do any harm and users might want to activate it later without reprovisioning (assuming they have a CL provisioned with a initial version new enough).

As a side note, you could add that check as a (really ugly) ExecStartPre=/usr/bin/bash -c '....' in a dropin to the oem-gce-enable-oslogin.service unit if that use case is important to you.

@lucab

This comment has been minimized.

Show comment
Hide comment
@lucab

lucab Sep 19, 2018

Member

Perhaps we can have coreos-metadata expose COREOS_GCE_ENABLE_OSLOGIN, make enable-oslogin conditional on that (either in the script or in the service unit), and default it to true (as coreos-metadata is not running by default). It wouldn't change the current behavior, but it would make opting-out a simple dropin to run coreos-metadata and source the env-file.

Member

lucab commented Sep 19, 2018

Perhaps we can have coreos-metadata expose COREOS_GCE_ENABLE_OSLOGIN, make enable-oslogin conditional on that (either in the script or in the service unit), and default it to true (as coreos-metadata is not running by default). It wouldn't change the current behavior, but it would make opting-out a simple dropin to run coreos-metadata and source the env-file.

@andor44

This comment has been minimized.

Show comment
Hide comment
@andor44

andor44 Sep 19, 2018

Either way sounds reasonable to me. To be perfectly honest I don't have a horse in this race as we're not disabling OS Login on a per-VM basis, I just figured it might be worth mentioning that it's a valid use-case. I hadn't considered that the authentication server on Google's side would reject the login attempt if it's turned off, so in the end it probably really doesn't make a difference if it's turned on unconditionally on the OS side.

andor44 commented Sep 19, 2018

Either way sounds reasonable to me. To be perfectly honest I don't have a horse in this race as we're not disabling OS Login on a per-VM basis, I just figured it might be worth mentioning that it's a valid use-case. I hadn't considered that the authentication server on Google's side would reject the login attempt if it's turned off, so in the end it probably really doesn't make a difference if it's turned on unconditionally on the OS side.

@ajeddeloh

This comment has been minimized.

Show comment
Hide comment
@ajeddeloh

ajeddeloh Sep 25, 2018

Closing since having it disabled on the cloud side also disables it.

ajeddeloh commented Sep 25, 2018

Closing since having it disabled on the cloud side also disables it.

@ajeddeloh ajeddeloh closed this Sep 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment