Static routes are not per-interface, which breaks some deployments #3143

Closed
ubuntu-server-builder opened this issue May 11, 2023 · 13 comments
Labels
bug (Something isn't working correctly), launchpad (Migrated from Launchpad)

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1758919

Launchpad details
affected_projects = ['maas', 'maas/2.3']
assignee = None
assignee_name = None
date_closed = None
date_created = 2018-03-26T13:49:51.837880+00:00
date_fix_committed = None
date_fix_released = None
id = 1758919
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1758919
milestone = None
owner = mpontillo
owner_name = Mike Pontillo
private = False
status = triaged
submitter = gabor.meszaros
submitter_name = Gábor Mészáros
tags = ['4010', 'cpe-onsite', 'field-critical']
duplicates = [1752332]

Launchpad user Gábor Mészáros(gabor.meszaros) wrote on 2018-03-26T13:49:51.837880+00:00

When juju tries to deploy an lxd container on a MAAS-managed machine, the machine loses all its static routes, because ifdown/ifup is issued and e/n/i (/etc/network/interfaces) has no saved data on the original state.

Machine with no lxd container deployed:
root@4-compute-4:~# ip r
default via 100.68.4.254 dev bond2 onlink
100.68.4.0/24 dev bond2 proto kernel scope link src 100.68.4.1
100.68.5.0/24 via 100.68.4.254 dev bond2
100.68.6.0/24 via 100.68.4.254 dev bond2
100.84.4.0/24 dev bond1 proto kernel scope link src 100.84.4.2
100.84.5.0/24 via 100.84.4.254 dev bond1
100.84.6.0/24 via 100.84.4.254 dev bond1
100.99.4.0/24 dev bond0 proto kernel scope link src 100.99.4.101
100.99.5.0/24 via 100.99.4.254 dev bond0
100.99.6.0/24 via 100.99.4.254 dev bond0
100.107.0.0/24 via 100.99.4.254 dev bond0

After juju deploys a container, the routes disappear:
root@4-management-1:~# ip r
default via 100.68.100.254 dev bond2 onlink
10.177.144.0/24 dev lxdbr0 proto kernel scope link src 10.177.144.1
100.68.100.0/24 dev bond2 proto kernel scope link src 100.68.100.26
100.84.4.0/24 dev br-bond1 proto kernel scope link src 100.84.4.1
100.99.4.0/24 dev br-bond0 proto kernel scope link src 100.99.4.3

After a host reboot the routes are NOT put back in place; they are still gone:
root@4-management-1:~# ip r s
default via 100.68.100.254 dev bond2 onlink
100.68.100.0/24 dev bond2 proto kernel scope link src 100.68.100.26
100.84.4.0/24 dev br-bond1 proto kernel scope link src 100.84.4.1
100.84.5.0/24 via 100.84.4.254 dev br-bond1
100.84.6.0/24 via 100.84.4.254 dev br-bond1
100.99.4.0/24 dev br-bond0 proto kernel scope link src 100.99.4.3

@ubuntu-server-builder added the bug and launchpad labels on May 11, 2023
@ubuntu-server-builder
Collaborator Author

Launchpad user Gábor Mészáros(gabor.meszaros) wrote on 2018-03-26T13:52:43.998253+00:00

Launchpad attachments: eni_original_new

@ubuntu-server-builder
Collaborator Author

Launchpad user Gábor Mészáros(gabor.meszaros) wrote on 2018-03-26T13:53:02.425136+00:00

Attached are the original and the juju-modified interfaces files.

@ubuntu-server-builder
Collaborator Author

Launchpad user Gábor Mészáros(gabor.meszaros) wrote on 2018-03-26T13:55:11.749329+00:00

The routes are attached to the wrong bond (bond2), whereas the gateways are on br-bond0. In MAAS they are also set on the proper subnets.

@ubuntu-server-builder
Collaborator Author

Launchpad user Gábor Mészáros(gabor.meszaros) wrote on 2018-03-26T13:59:09.323381+00:00

On nodes without containers, the configuration is written to /etc/network/interfaces.d/50-cloud-init.cfg. That file is also present on all nodes, but it gets overridden.

@ubuntu-server-builder
Collaborator Author

Launchpad user Ante Karamatić(ivoks) wrote on 2018-03-26T13:59:10.231739+00:00

ifup brings interfaces up serially. In juju's ENI, this means that it would bring up bond0 before br-bond0 and br-bond1. And since layer 3, provided by br-bond1 and br-bond2, would not exist when post-up is run, post-up would fail. Because of '|| true' that would not cause ifup to fail, but it would leave the machine without routes.

I believe MAAS always adds 'post-up' static routes to the last interface (which is a good approach until netplan solves this). This means that juju should do the same: pick up the post-up routes from the bottom of the ENI and place them at the end of the last bridge it creates.

@ubuntu-server-builder
Collaborator Author

Launchpad user Gábor Mészáros(gabor.meszaros) wrote on 2018-03-26T13:59:40.201409+00:00

Launchpad attachments: 50-cloud-init.cfg

@ubuntu-server-builder
Collaborator Author

Launchpad user John A Meinel(jameinel) wrote on 2018-03-26T14:24:25+00:00

Why is adding it to the "last interface" correct? Wouldn't it be more
correct to attach routes to the interface that contains that route?

E.g., in your above scenario, bond0 is getting 100.99.4.3/24, thus things that
use 100.99.4.254 as the gateway should be attached to bond0?


@ubuntu-server-builder
Collaborator Author

Launchpad user John A Meinel(jameinel) wrote on 2018-03-26T14:26:04+00:00

Note that:
https://launchpadlibrarian.net/362140579/50-cloud-init.cfg
is a lie, because it pretends that it is just setting up the "post-up"
routes "after all interfaces", but really all of those are explicitly
attached to bond2 (even though it wasn't indenting them, ifup et al.
don't actually pay attention to indentation in that fashion).
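
A minimal sketch of the point above, assuming a simplified reading of interfaces(5): option lines belong to the most recent `iface` stanza, whatever their indentation. The helper name `options_by_iface` and the sample text are hypothetical and only illustrative; the sample is not the content of the attached 50-cloud-init.cfg.

```python
from collections import defaultdict

# Keywords that start a new stanza in /etc/network/interfaces (simplified).
STANZA_STARTERS = ("auto", "mapping", "source", "allow-")


def options_by_iface(eni_text):
    """Map each iface name to the option lines that follow its stanza.

    Leading whitespace is stripped before parsing, so indentation has
    no influence on which stanza an option line is attributed to.
    """
    owner = None
    options = defaultdict(list)
    for raw in eni_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split()[0]
        if first == "iface":
            owner = line.split()[1]
        elif first.startswith(STANZA_STARTERS):
            owner = None  # a new non-iface stanza ends the current one
        elif owner is not None:
            options[owner].append(line)
    return dict(options)


# Illustrative sample: the post-up line is unindented so that it looks
# "global", but it is still attributed to bond2.
SAMPLE = """\
auto bond0
iface bond0 inet static
    address 100.99.4.101/24

auto bond2
iface bond2 inet static
    address 100.68.4.1/24
post-up route add -net 100.99.5.0 netmask 255.255.255.0 gw 100.99.4.254 || true
"""

result = options_by_iface(SAMPLE)
print(result["bond2"])
```

In this sketch the route to 100.99.5.0/24 ends up among bond2's options, although its gateway sits on bond0's subnet.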


@ubuntu-server-builder
Collaborator Author

Launchpad user John A Meinel(jameinel) wrote on 2018-03-26T14:29:09+00:00

Is this actually Field Critical? Isn't just moving the post-up to a
different section enough to fix the field issue as a workaround?


@ubuntu-server-builder
Collaborator Author

Launchpad user Ante Karamatić(ivoks) wrote on 2018-03-26T14:30:39.729724+00:00

You are right, and as soon as I wrote the comment I realized I was wrong (mixed it with using iptables in post-up).

16:07 < ivoks> so, ideally, cloud-config would be smarter here
16:08 < ivoks> and place those routes where they belong
16:08 < ivoks> well, whoever generates that cloud-init.cfg should be a wee smarter

Routes should be placed on the interfaces that provide access to gateways for those routes.
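
A rough sketch of that placement rule, using Python's standard `ipaddress` module. The function name `attach_routes` is hypothetical and this is not code from MAAS, juju, or cloud-init; the addresses come from the first `ip r` listing in this report.

```python
import ipaddress


def attach_routes(interfaces, routes):
    """Group (destination, gateway) routes under the interface whose
    subnet contains the gateway. Routes with no match go under None."""
    networks = {
        name: ipaddress.ip_interface(cidr).network
        for name, cidr in interfaces.items()
    }
    attached = {}
    for dest, gateway in routes:
        gw = ipaddress.ip_address(gateway)
        owner = next(
            (name for name, net in networks.items() if gw in net), None
        )
        attached.setdefault(owner, []).append((dest, gateway))
    return attached


# Interface addresses and a few static routes from the first listing.
interfaces = {
    "bond0": "100.99.4.101/24",
    "bond1": "100.84.4.2/24",
    "bond2": "100.68.4.1/24",
}
routes = [
    ("100.68.5.0/24", "100.68.4.254"),
    ("100.84.5.0/24", "100.84.4.254"),
    ("100.99.5.0/24", "100.99.4.254"),
    ("100.107.0.0/24", "100.99.4.254"),
]

by_iface = attach_routes(interfaces, routes)
for name in sorted(by_iface, key=str):
    print(name, by_iface[name])
```

With this grouping, taking bond2 down would only affect the route whose gateway is actually reachable through bond2.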

@ubuntu-server-builder
Collaborator Author

Launchpad user Witold Krecicki(wpk) wrote on 2018-03-26T14:39:51.586121+00:00

IMHO that's an obvious MAAS fault in writing the routes always to the last device and not to the device the routes are 'attached' to. In this scenario doing ifdown bond2 (an interface that has absolutely nothing to do with the static routes) would bring the routes down. Moreover, the assumption that the order of the devices in e/n/i will be the order in which the devices are brought up might be incorrect. IMHO this should be fixed in MAAS.

@ubuntu-server-builder
Collaborator Author

Launchpad user Mike Pontillo(mpontillo) wrote on 2018-03-27T04:49:51.691342+00:00

IMHO, this should also be fixed in cloud-init. If the input netplan contains "global" routes, the renderer (or whatever can pre-process the netplan before rendering) should intelligently determine which interfaces have an on-link gateway that matches the global route, and automatically render the route at interface scope instead of "global".

Arguably, if the route's gateway address doesn't match an on-link prefix, it should not be installed at all (the kernel will reject it anyway, unless the onlink flag is supplied, which instructs the kernel to assume the address is on-link even if it doesn't appear to be). But the only useful scenario I can see for supporting the onlink flag is installing a route on an interface that will get its IP address via DHCP.
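
The on-link check described here could be sketched as below. This is a simplification of the kernel's next-hop validation, not a reimplementation of it, and `gateway_is_usable` is a hypothetical helper, not a cloud-init API.

```python
import ipaddress


def gateway_is_usable(iface_cidrs, gateway, onlink=False):
    """Return True if a route via `gateway` is plausible on an interface
    that carries the addresses in `iface_cidrs`.

    With onlink=True the gateway is assumed reachable regardless of the
    configured prefixes, e.g. for an interface that has no static address
    yet because it will be configured via DHCP.
    """
    if onlink:
        return True
    gw = ipaddress.ip_address(gateway)
    return any(
        gw in ipaddress.ip_interface(cidr).network for cidr in iface_cidrs
    )


# Gateway on bond0's subnet, checked against bond0 and bond2 addresses.
on_bond0 = gateway_is_usable(["100.99.4.101/24"], "100.99.4.254")
on_bond2 = gateway_is_usable(["100.68.4.1/24"], "100.99.4.254")
# Interface without a static address: only usable when onlink is requested.
dhcp_plain = gateway_is_usable([], "100.99.4.254")
dhcp_onlink = gateway_is_usable([], "100.99.4.254", onlink=True)
print(on_bond0, on_bond2, dhcp_plain, dhcp_onlink)
```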

@holmanb
Member

holmanb commented Apr 28, 2024

IMHO that's an obvious MAAS fault in writing the routes always to the last device and not to the device the routes are 'attached' to.

Agreed

IMHO, this should also be fixed in cloud-init. If the input netplan contains "global" routes, the renderer (or whatever can pre-process the Netplan before rendering) should intelligently determine which interfaces have an on-link gateway that matches the global route, and automatically render the route at interface scope instead of "global".

Is the expectation that cloud-init should parse configurations and automatically fix them when it thinks they are incorrect? That sounds error-prone at best, and definitely out of scope. I suggest fixing this in MAAS, if it hasn't been already.

Closing

@holmanb closed this as not planned on Apr 28, 2024