First boot OVS/WPA workaround for cloud-init #157
… via systemd to start all netplan-*.service units at runtime
This way it can be started on first boot by cloud-init via 'systemctl start netplan.target'. This partly fixes (LP: #1870346), but needs fixes to cloud-init and systemd v246 (systemd/systemd#16371) as well.
…get'" This reverts commit d3dab6e4590f4e7d6006996d0b2c3e985337d356.
Hm, why can't we instead change |
@@ -967,6 +967,7 @@ write_networkd_conf(const NetplanNetDefinition* def, const char* rootdir)
if (def->type == NETPLAN_DEF_TYPE_WIFI || def->has_auth) {
g_autofree char* link = g_strjoin(NULL, rootdir ?: "", "/run/systemd/system/systemd-networkd.service.wants/netplan-wpa-", def->id, ".service", NULL);
g_autofree char* link_target = g_strjoin(NULL, rootdir ?: "", "/run/systemd/system/netplan.target.wants/netplan-wpa-", def->id, ".service", NULL);
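For readers less familiar with the C side, the two paths built in this hunk can be sketched in Python. This is only an illustration and not netplan code: the helper name wants_link_path and the example interface ID wlan0 are hypothetical, while the two wants directories are copied from the hunk.

```python
import os

# Directory names as they appear in the hunk above (relative to the root dir).
NETWORKD_WANTS = "run/systemd/system/systemd-networkd.service.wants"
NETPLAN_TARGET_WANTS = "run/systemd/system/netplan.target.wants"


def wants_link_path(rootdir, wants_dir, netdef_id):
    """Build the path of a netplan-wpa-<id>.service symlink (hypothetical helper)."""
    # Mirrors `rootdir ?: ""` in the C code: no rootdir means the real root.
    root = rootdir or "/"
    return os.path.join(root, wants_dir, f"netplan-wpa-{netdef_id}.service")


networkd_link = wants_link_path(None, NETWORKD_WANTS, "wlan0")
target_link = wants_link_path(None, NETPLAN_TARGET_WANTS, "wlan0")
print(networkd_link)
print(target_link)
```

The second path is the one that makes the unit a dependency of netplan.target, so starting that target pulls the WPA service in.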
Since this facility is available, we should possibly also add netplan.target.wants/systemd-networkd.service and netplan.target.wants/systemd-networkd-wait-online.service. This would enable us to have neither networkd nor NM enabled in the image, with cloud-init able to activate one or the other on boot.
That should work. We would need some extra code to activate NetworkManager, as we're not setting up systemd units/dependencies from our NM renderer currently, but it's certainly possible.
This would only work for cloud-init based images, though, as in a "normal" boot sequence netplan.target would never be called/activated.
That is what I did here for testing. But in the real world people expect So it would change the semantics of netplan's CLI |
I'm not terribly happy about the idea of this being a workaround. The primary case for OVS is in deploying cloud-init base images.
What service units and deps are generated from netplan generate?
Instead of generating custom service files (tell me if I'm incorrect about the content being dynamic); could those services read configuration files that netplan generate writes? In the case that netplan generate has no ovs or wpa content, those services are inactive; ConditionPath=/run/somepath/netplan-generate/writes/{wpa,ovs}? |
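The static-unit idea in this comment could look roughly like the sketch below. It is not something netplan ships: the helper render_static_unit, the unit description, the marker path /run/netplan/wpa-wlan0.conf and the ExecStart line are assumptions made for illustration. The systemd directive for this kind of check is ConditionPathExists=.

```python
def render_static_unit(description, condition_path, exec_start):
    """Render a static service unit that is skipped when its config is absent."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        # ConditionPathExists= makes systemd skip the unit (without marking it
        # failed) when the file is missing, e.g. when no WPA config was written.
        f"ConditionPathExists={condition_path}",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        "",
    ])


unit_text = render_static_unit(
    "WPA supplicant for netplan wlan0 (hypothetical)",
    "/run/netplan/wpa-wlan0.conf",  # assumed file written by netplan generate
    "/sbin/wpa_supplicant -c /run/netplan/wpa-wlan0.conf -iwlan0",
)
print(unit_text)
```

A unit like this would have to be installed and enabled ahead of time, which is the trade-off debated in the replies that follow.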
Indeed it isn't great that it needs a workaround to fix cloud-init first boot. But the current design (using systemd service units) was spec'ed and signed off quite some time ago. This is not an OVS-specific implementation detail, as the same approach has been used in netplan to configure wpa_supplicant for many years. I guess we would need @xnox's and/or @vorlonofportland's opinion if we wanted to change the overall architecture here. Generally, netplan creates the following systemd units, using its networkd/OVS backends:
Some examples are
Correct, those service units contain dynamic content. The
@raharper A totally different approach I discussed with @xnox, which would not require any changes to netplan or systemd at all, would be to split up the cloud-init boot sequence into multiple stages, e.g.:
The idea here is to split up the boot sequence into two (or more?) systemd transactions, so we can call "daemon-reload" in between to re-run all the generators and re-calculate all the dependencies. This way all generators would be used in their intended way and would not need any workarounds. I am not sure if/how this is feasible to be implemented in cloud-init. |
I disagree with the usage of the word "workaround". Netplan generate creates many units and dependencies which, when created at systemd-generator time, are correctly loaded as part of the boot. However, that currently precludes adding new units to the initial boot transaction when the netplan YAML is not known ahead of time, for example when cloud-init injects it just in time. Hence today, cloud-init cannot dynamically enable/disable networkd or NetworkManager renderers via netplan, and we have to bake networkd/wait-online units as enabled, hoping that cloud-init will inject netplan and generate will be run before those daemons start.
network.target.wants systemd-networkd.service|socket, systemd-networkd-online, NetworkManager.service & its wait-online, wpa-supplicant "WiFi" units, OVS service units to establish OVS and configure bridges & fakevlans, or cleanup old OVS state (as itself is persistent across reboots).
No. Any and all renderings of netplan require units to be enabled always. So far we have worked around that by baking in networkd or NM as enabled on server/desktop images.
That would be too late for some of them. And not guaranteed to complete before we start executing cloud-config / cloud-final stages of cloud-init at which point WiFi/OVS must already be up, running and configured.
That would lose granularity though. OVS rejecting the setup of one bridge would mean the rest of the bridges are not attempted to be configured. For WPA, we do need a long-running daemon per WiFi network, as it has to continuously renew beacons & roam between APs. And it wouldn't help with starting NetworkManager or networkd.
@raharper an alternative to this is to re-engineer how cloud-init targets work a bit. Instead of booting to default.target, divert the boot to cloud-local.target, fully complete booting to that (no jobs left), then call systemctl daemon-reload, then start default.target with new dependencies calculated. Doing that would allow users to do interesting things with systemd via cloud-config, like changing the default.target from multi-user.target to emergency.target, adding / masking / removing units used in early boot, or "just write fstab" and allow systemd-fstab-generator to process it, mount things, etc.
Or systemd should be fixed to allow dynamically picking up new targets & units always, like upstart did....
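As a rough sketch of the "staged boot" order described here: the function names below are hypothetical, cloud-local.target is the target name proposed in this comment rather than an existing unit, and the sketch does not verify that the first stage has no jobs left before reloading.

```python
import subprocess


def staged_boot_commands(first_stage="cloud-local.target",
                         final_stage="default.target"):
    """Return the systemctl calls for a two-transaction boot, in order."""
    return [
        # Stage 1: boot only as far as the target where cloud-init writes config.
        ["systemctl", "start", first_stage],
        # Re-run all generators and recalculate unit dependencies.
        ["systemctl", "daemon-reload"],
        # Stage 2: continue the boot with the freshly generated units.
        ["systemctl", "start", final_stage],
    ]


def run_staged_boot(run=subprocess.check_call):
    """Execute the stages; `run` is injectable so the order can be inspected."""
    for cmd in staged_boot_commands():
        run(cmd)


for cmd in staged_boot_commands():
    print(" ".join(cmd))
```

Passing a recording function as run (for example a list's append method) shows the sequence without touching systemd.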
It's not a hope; cloud-init runs before those daemons start, feeding them
This is the core issue. Neither networkd nor NetworkManager utilize dynamic
Networkd and NetworkManager expect to be enabled; just like networking.service On service start, the unit can be skipped if there aren't any configuration So that doesn't sound like a workaround (always enabled), but requirement Are the OVS/WPA daemons not designed to read config?
OK
I'm not following; How are OVS/WPA different than networkd/NM where the I'm suggesting to not write configuration for a network into units. Instead
This is certainly an interesting topic; I'm not sure we should depend on this |
Yes. This is the core issue indeed.
It depends. OVS for example is using a database (
IMO this should be the preferred solution here as it solves the problem in a clean way without needing special handling in netplan or backports of systemd features. We should consider if we can wait for this to be implemented, or if we need to integrate some kind of workaround to fix the issue temporarily right now (i.e. the code of this PR). |
cloud-init is calling netplan generator exactly at the right time: after It seems that at some point netplan generate grew emitting units rather
When reviewing the ovs/netplan design 8 months ago, I raised this issue of It's unfortunate that we've an implementation and now we're sorting out how
I'm very much wary of suggesting additional systemd target/unit changes that How do we scope the changes to minimize any impact? |
I opened a discussion / bug report at cloud-init to discuss the "staged boot" targets: And I'd like to reject this PR in favor of this "staged boot".
However, as discussed on IRC and in the cloud-init bug report, changing the boot of all systems to always use multiple transactions is a far bigger change too. It would be interesting if we could still do just-in-time loading of this unit in a more opportunistic manner, such that it is only ever generated mid-flight and loaded when needed, while being a no-op most of the time.
To minimize risks/changes, we could do this:
This would make this PR a no-op for most instances;
Fixed in #162 |
Description
Cloud-init makes use of the netplan generator, but calls netplan generate manually at runtime (during the systemd boot transaction), instead of running it as intended at systemd generator stage, due to restrictions it has regarding fetching of its data source (e.g. netplan YAML config).
This leads to problems at first boot, as the systemd unit dependencies are calculated after the generator stage, but ahead of the boot transaction (e.g. via systemctl daemon-reload), therefore the new service units and their dependencies, which are generated by manually calling netplan generate, are ignored during the first-boot transaction. In subsequent boots (where the cloud-init data source, netplan YAML config and unit files are already in place), everything works as expected.
Systemd v246 introduced a new feature, where target units can be lazy-loaded at runtime, if they are known ahead of a transaction (but still unloaded) and changed on disk in the meantime. Therefore, we always place a new netplan.target systemd unit in /lib/systemd/system/netplan.target, which does almost nothing and isn't loaded by default, but is available at all times on disk. When executing netplan generate we create new symlinks/dependencies as /run/systemd/system/netplan.target.wants/netplan-[ovs|wpa]-*.service, so that cloud-init (or any other service) can call systemctl start netplan.target after having called netplan generate to have the target unit incl. its new dependencies lazy loaded and started, even during a systemd (boot-) transaction.
We cannot statically link the netplan.target, e.g. via /lib/systemd/system/network.target.wants/netplan.target, as my experiments showed that it would then already be loaded (read from disk) when the initial dependencies are calculated (ahead of the boot transaction) and therefore it does not know about the netplan service units, which are generated at a later stage by cloud-init calling netplan generate after it put the YAML config in place.
So in order to get this beast working, we need:
systemctl start netplan.target after calling netplan generate in cloudinit/net/netplan.py:243
Reproducer
systemctl start netplan.target
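The call order that the Description asks cloud-init to follow can be sketched as below. This is not cloud-init's actual code: the function name apply_netplan_first_boot and the injectable run parameter are hypothetical, and the lazy-loading behaviour is assumed to require systemd v246 or newer, as stated in the Description.

```python
import subprocess


def apply_netplan_first_boot(run=subprocess.check_call):
    """Regenerate backend config, then ask systemd to start netplan.target."""
    # Writes backend config, the netplan-*.service units and the
    # netplan.target.wants/ symlinks under /run.
    run(["netplan", "generate"])
    # With systemd v246 or newer this is expected to pick up the new wants/
    # symlinks, even in the middle of the boot transaction.
    run(["systemctl", "start", "netplan.target"])


# Dry run: record the commands instead of executing them.
recorded = []
apply_netplan_first_boot(run=recorded.append)
print(recorded)
```

Keeping the runner injectable lets the order be checked on a machine without netplan or systemd installed.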
Checklist
make check
successfully.make check-coverage
).