no cloud agents: gcp #67

dustymabe · 2018-10-25T22:10:09Z

In #12 we decided that we'd like to try not to ship cloud agents. This ticket documents the investigation and strategy for shipping without a cloud agent on Google Cloud Platform.

See also #41 for a discussion of how to ship cloud specific bits using ignition.

jdoss · 2018-10-30T16:24:06Z

See https://pagure.io/cloud-sig/issue/292#comment-538459 for information about how GCE needs the google-compute-engine-oslogin package installed from the Google Cloud Compute Repo to start the network. This most likely will impact shipping Fedora CoreOS on GCP without this package installed.

ajeddeloh · 2018-10-30T18:45:00Z

GCE wanted google-oslogin added to CL for it to remain a primary-tier OS, so we implemented that. It shouldn't be needed for networking, and the rest of the GCE stuff we can do in a container like we do on CL (although it might be worth revisiting, since we haven't in a while).

oslogin itself, though, needs to be implemented in the OS since it messes with nsswitch, pam, and sshd. We'll need to conditionally enable it on GCE (could be done with Ignition, and 3.0.0 will make it easier to optionally disable it).

dustymabe · 2018-10-31T16:58:06Z

FYI: rpm package reviews for oslogin rpm: https://pagure.io/fedora-server/issue/5#comment-538460

The current discussion in today's meeting was that we would possibly include the oslogin rpm and just conditionally enable it on gce.

ajeddeloh · 2018-11-01T17:59:11Z

A little background on oslogin:
On mutable distros there's this script which the agent uses to toggle oslogin on and off. For CL we decided we didn't want to ship that script (it seems somewhat brittle if a user modifies those files themselves) and instead enable it via a systemd oneshot that runs early on first boot.*

"Normal" Fedora probably wants the google_oslogin_control script. I don't know if we want that for FCOS though (for similar reasons to why we don't ship it in CL). This means we'd need two separate rpms unless dnf/rpm has something like Gentoo's INSTALL_MASK functionality.

*We should be able to do it all with Ignition with spec 3.0.0 (no systemd unit necessary). Trying to do it with the 2.x.y spec is what led me to discover that files, directories, and links are not declarative.

dustymabe · 2018-11-05T13:54:08Z

:(

so how do we implement that functionality without the google_oslogin_control script? are we going to have to continuously manage our version of the implementation? Could we somehow convince google to change the script to be more compatible with what we need?

I guess it's worth asking.. Do we need to ship google_oslogin at all or can we get by without it (which is the topic of this ticket anyway, right?)?

ajeddeloh · 2018-11-06T18:55:27Z

> so how do we implement that functionality without the google_oslogin_control script?

On CL we don't; we say "you shouldn't be toggling host bits other than when provisioning". I don't know if that's the path we want to take for FCOS or not.

There's also the question of what is the default configuration and what does that look like with a managed /etc. If we ship with oslogin disabled but enable it with Ignition by default, that'll show up as a change in /etc. I don't know if we want that or not.

> Do we need to ship google_oslogin

That's something we need to discuss with the GCE folks. For CL they said it was a requirement to be a first tier OS.

dustymabe · 2018-12-12T19:27:39Z

> FYI: rpm package reviews for oslogin rpm: https://pagure.io/fedora-server/issue/5#comment-538460

reviews were approved.. packages should make their way into Fedora soon. Thanks @Conan-Kudo

Conan-Kudo · 2018-12-13T03:43:34Z

Bodhi updates submitted:

Fedora 28: https://bodhi.fedoraproject.org/updates/FEDORA-2018-ba5030068d
Fedora 29: https://bodhi.fedoraproject.org/updates/FEDORA-2018-a8c791535d

smarterclayton · 2019-01-30T02:54:32Z

The biggest blocker I've hit for using this with OpenShift so far is forwarded IPs (set from instance metadata https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google_compute_engine/distro_lib/ip_forwarding_utils.py#L78). We have to use NLB for the front ends of our masters, so unless the route is read from instance metadata and then set, NLB health checks never go green.

E.g. for a forwarding rule the above reads curl -H "Metadata-Flavor:Google" "http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/forwarded-ips/0" and sets:

ip route add to local 35.222.92.223 dev eth0 proto 66 scope host
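As a rough sketch of what an agent-free replacement for this step would need to do, the Python below reads one forwarded IP from the metadata URL quoted above and builds the matching `ip route` command. The function names are made up for illustration, and the defaults (`eth0`, proto 66) are simply copied from the example in this comment; this is not the code from compute-image-packages.

```python
import ipaddress
import urllib.request

# URL pattern copied from the curl example above.
FORWARDED_IP_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/network-interfaces/{nic}/forwarded-ips/{index}"
)


def fetch_forwarded_ip(nic=0, index=0, timeout=5.0):
    """Read one forwarded IP from the metadata server (only works on a GCE VM)."""
    request = urllib.request.Request(
        FORWARDED_IP_URL.format(nic=nic, index=index),
        headers={"Metadata-Flavor": "Google"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode().strip()


def local_route_command(ip, dev="eth0", proto=66):
    """Build the argv for the `ip route add to local ...` call shown above."""
    # Reject anything that is not a plain IP address before it reaches a command line.
    address = ipaddress.ip_address(ip)
    return [
        "ip", "route", "add", "to", "local", str(address),
        "dev", dev, "proto", str(proto), "scope", "host",
    ]
```

On an instance, something like `subprocess.run(local_route_command(fetch_forwarded_ip()), check=True)` run as root would add the route. Watching the metadata for changes and removing stale routes is not covered here.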

bgilbert · 2019-03-15T01:29:52Z

Enabling OS Login requires modifying several monolithic files (nsswitch.conf, /etc/pam.d/sshd, sshd_config) only on GCP, which is inconvenient.

The sshd_config changes are specifically to add an AuthorizedKeysCommand. The current plan for #139 is to implement our own AuthorizedKeysCommand to read authorized_keys.d fragments, and that command could chain to a script which conditionally runs the OS Login AuthorizedKeysCommand when enabled on GCP.
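To make the chaining idea concrete, here is a minimal sketch, assuming a helper that gathers keys from an authorized_keys.d directory and then optionally appends whatever the OS Login lookup returns. The function name, its parameters, and the `oslogin_lookup` callable are all hypothetical; the real design is whatever comes out of #139.

```python
from pathlib import Path


def collect_authorized_keys(fragment_dir, oslogin_enabled=False, oslogin_lookup=None):
    """Return key lines from all fragments, then from OS Login if it is enabled.

    `oslogin_lookup` is a hypothetical zero-argument callable standing in for
    running the OS Login AuthorizedKeysCommand and splitting its output.
    """
    keys = []
    directory = Path(fragment_dir)
    if directory.is_dir():
        # Sort so the result does not depend on directory iteration order.
        for fragment in sorted(directory.iterdir()):
            if not fragment.is_file():
                continue
            for line in fragment.read_text().splitlines():
                line = line.strip()
                # Skip blank lines and comments, as sshd does for authorized_keys.
                if line and not line.startswith("#"):
                    keys.append(line)
    # Chain to OS Login only when it has been switched on for this machine.
    if oslogin_enabled and oslogin_lookup is not None:
        keys.extend(oslogin_lookup())
    return keys
```

A real AuthorizedKeysCommand would print these lines to stdout for sshd; on platforms other than GCP the chained lookup would simply never be enabled.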

cevich · 2019-04-02T15:10:22Z

What about the google-startup-script (and shutdown script) "agents"? I have an external integrated VM management service I depend on that uses it. My service downloads and starts its own agent by injecting runtime-specific details through the google-startup-script service. It can't work through cloud-init or similar; it's specific to that google-startup-script API. Is this use-case covered?

bgilbert · 2019-04-02T16:14:00Z

@cevich We don't plan to support those startup/shutdown scripts. Fedora CoreOS should be configured by passing an Ignition config in userdata. That config can download and install your agent. Or, if you'd prefer to continue using your existing script, the Ignition config can install a ConditionFirstBoot systemd unit which runs the script.
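A minimal sketch of the last option, assuming Ignition spec 3.0.0 and a script that the same config has already placed on disk (the `storage` section for that is omitted). The helper name, unit name, and script path are placeholders, not anything Fedora CoreOS ships.

```python
import json


def first_boot_ignition(script_path, unit_name="run-agent-setup.service"):
    """Build an Ignition config dict with a oneshot unit gated on first boot."""
    unit = "\n".join([
        "[Unit]",
        "Description=Run provisioning script on first boot",
        # Only run on the machine's first boot, as suggested above.
        "ConditionFirstBoot=yes",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        "Type=oneshot",
        "RemainAfterExit=yes",
        f"ExecStart={script_path}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
    return {
        "ignition": {"version": "3.0.0"},
        "systemd": {
            "units": [{"name": unit_name, "enabled": True, "contents": unit}]
        },
    }


def to_userdata(config):
    """Serialize the config to the JSON string passed as instance userdata."""
    return json.dumps(config, indent=2)
```

Calling `to_userdata(first_boot_ignition("/usr/local/bin/agent-setup.sh"))` gives JSON that could be passed as userdata when creating the instance.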

cevich · 2019-04-02T17:29:32Z

You're assuming an open-source service, and that they care about supporting a one-off special case for a non-standard OS on a major cloud platform. As a user of a third-party service like this, that makes my choice effectively:

- Ask them nicely and pray
- Re-implement my entire stack for the sake of a "new OS"

There's nearly zero incentive for third-parties to add the required special-case support when every other OS happily plays along, whether or not agent-services are a good idea. It doesn't even have to be GCE-specific; there are plenty of other third-parties which require host-agents. If the OS makes it difficult to integrate, the OS will simply be placed on the "not supported" list and lose out in the long run. I don't think that's the desire of the community here, whatever the specific philosophy over "agents" is.

bgilbert · 2019-04-02T19:06:36Z

@cevich Fedora CoreOS is continuing the Container Linux philosophy of providing an opinionated, minimal, and reasonably legacy-free OS for running containers. Part of being opinionated is that not everyone will agree with our opinions, and that's okay. If you want flexibility beyond what Fedora CoreOS is prepared to provide, other distros (including Fedora Cloud Base) could be a great choice.

Fedora CoreOS doesn't support cloud-init, whose design has unfixable race conditions. It strongly discourages installing software in the host, in favor of running all user software in containers. It favors immutable infrastructure and reprovisioning rather than configuration management. So existing tooling will already need to be adapted to work well with Fedora CoreOS.

As to this bug, the principle is that provisioning setup for a machine should always be encapsulated into the Ignition config, rather than passed via a platform-specific agent.

cevich · 2019-04-03T14:54:33Z

Thanks for the details and explanation. For me that means I won't use this for testing container run-times and related tooling...ironic as that is. That said, being SO strongly opinionated seems (IMHO) to make this OS overly difficult to use. History provides plenty of examples where difficult-to-use inventions are simply not used and therefore ultimately fail. I think the principles here are "cool", and would like to see it be successful. IMHO, that probably necessitates additional flexibility of opinions.

bgilbert · 2019-04-03T17:54:51Z

> That said, being SO strongly opinionated seems (IMHO) to make this OS overly difficult to use.

I hope you'll give Fedora CoreOS a try when we're a little further along; it's easier to use than you might think. 😃 (We don't have much documentation right now, which is a problem, but we're working on that.)

Fedora CoreOS's opinions are pretty closely aligned with Container Linux, and they've served that community pretty well for several years now. We're trying to make things easier, not harder, honest.

cevich · 2019-04-03T19:10:38Z

I know you are. I'm just thinking of all the "agents" out there which must run with privileges, on the host, and may not be conducive to being written into container images. This will especially be a problem in cases where the software or service is closed-source/proprietary. The sad fact is, many environments are like this, especially in government and health-care. Requiring little bits of "internal malware" if you will...because management always knows best.

Possibly not an issue for Fedora, but as that rolls down into CentOS/elsewhere it will become a monumental obstacle to adoption. As in my case, the user's choice may literally be: "Ask nice and pray" 😞 We have exactly zero control/influence with what third-parties do, especially with cloud APIs that we also have no control over.

darkmuggle · 2019-05-17T17:26:10Z

I recently did a deep dive on the agent for GCE for some work getting OpenShift to run in GCE. The use-case that I needed to solve was the L4 load balancers (aka Network Load Balancers).

I came up with a proof-of-concept [1] which only runs the Network Configuration and the Clock Skew daemon.
podman run -d --privileged=true --net=host quay.io/behoward/gce-container works as expected.

bgilbert · 2020-01-17T22:26:26Z

For the record, the new Go-based GCP agent is here and the new OSLogin repo is here.

Conan-Kudo · 2020-01-17T22:31:24Z

Yeah, I found out they were reworking this so I had stopped my rebase work. I guess it's ready now to be put into Fedora...

Not that I like Go at all for this (I really, really, really don't), but at least this means it's shippable for FCOS.

dustymabe · 2020-10-14T20:03:46Z

I broke the OS Login part out into #648. I'm going to close this ticket since we've got a GCP image now and no agent seems to be going fine. We can start new discussions in new tickets.

dustymabe mentioned this issue Dec 12, 2018

tracker: 'cloud'/'no cloud agents' work #95

Open

56 tasks

dustymabe added the cloud* related to public/private clouds label Dec 13, 2018

dustymabe added the jira for syncing to jira label Jan 9, 2019

bgilbert changed the title ~~no cloud agents: gce~~ no cloud agents: gcp Mar 20, 2019

dustymabe removed the jira for syncing to jira label Sep 5, 2019

jlebon mentioned this issue Mar 9, 2020

Fedora Coreos no ssh connection possible #405

Closed

dustymabe mentioned this issue Oct 14, 2020

support OS Login for GCP #648

Open

dustymabe closed this as completed Oct 14, 2020
