Container linux 1662.0.0 alpha doesn't add routes if external to network #2327

Closed
jorgelon opened this Issue Jan 24, 2018 · 11 comments

6 participants
jorgelon commented Jan 24, 2018

Issue Report

Bug

Container Linux Version

1662.0.0 alpha

Environment

Running under VMware ESXi 5.5 with the vmware_raw OEM image and a vmxnet3 interface.
The gateway is on a different network than the IP address.
For example:
VM IP: 10.5.100.23/32
Gateway: 50.255.255.255
The IP address and the routes come from the DHCP server.

Expected Behavior

In CoreOS stable, beta, and previous alpha releases, everything works when the machine boots: the VM gets the correct IP address and route.

Actual Behavior

In CoreOS 1662.0.0 alpha, the VM gets its IP address but fails to set the gateway.

Reproduction Steps

  1. Start a VM with CoreOS alpha 1662.
  2. The VM has no connectivity because the gateway is not set.

Other Information

The systemd-networkd error is "Could not set DHCPv4 route: Network is unreachable".

ajeddeloh commented Jan 25, 2018

This is almost certainly caused by the systemd 235 -> 236 upgrade; I'll review the changes between the two and see if anything sticks out. Can you please provide the output of journalctl --no-pager -u systemd-networkd? The output of ip a and the contents of the relevant lease in /run/systemd/netif/leases/ would also be helpful. Thanks for the report.

jorgelon commented Jan 25, 2018

Jan 25 14:00:26 localhost systemd[1]: Starting Network Service...
Jan 25 14:00:26 localhost systemd-networkd[577]: Enumeration completed
Jan 25 14:00:26 localhost systemd[1]: Started Network Service.
Jan 25 14:00:27 localhost systemd-networkd[577]: eth0: Interface name change detected, eth0 has been renamed to ens192.
Jan 25 14:00:27 localhost systemd-networkd[577]: lo: Configured
Jan 25 14:00:27 localhost systemd-networkd[577]: ens192: IPv6 successfully enabled
Jan 25 14:00:27 localhost systemd-networkd[577]: ens192: Gained carrier
Jan 25 14:00:27 localhost systemd-networkd[577]: ens192: DHCPv4 address <PUBLIC IP>/32 via 15.255.255.1
Jan 25 14:00:27 localhost systemd-networkd[577]: ens192: Could not set DHCPv4 route: Network is unreachable
Jan 25 14:00:27 localhost systemd-networkd[577]: ens192: Failed

ADDRESS=<PUBLIC IP>
NETMASK=255.255.255.255
ROUTER=15.255.255.1
SERVER_ADDRESS=<PUBLIC IP>
T1=21600
T2=37800
LIFETIME=43200
DNS=<DNS1> <DNS2>
ROUTES=169.254.0.0/16,15.255.255.1   << the first route is because I have a metadata server there
CLIENTID=ff2d1aa13300020000ab11948395539b53d96a
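For reference, the ROUTES= line in a lease like this can be read back with a few lines of Python. This is a minimal sketch that assumes the lease is a flat KEY=VALUE file and that ROUTES= holds space-separated "destination/prefix,gateway" entries; that matches the single entry shown here but has not been checked against networkd's source. The address 203.0.113.10 is a placeholder standing in for <PUBLIC IP>.

```python
from __future__ import annotations

from ipaddress import IPv4Address, IPv4Network


def parse_lease(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a networkd lease file into a dict."""
    lease = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        lease[key] = value
    return lease


def parse_routes(value: str) -> list[tuple[IPv4Network, IPv4Address]]:
    """Split a ROUTES= value into (destination, gateway) pairs.

    Assumption: entries are space-separated, each "dest/prefix,gateway".
    """
    routes = []
    for entry in value.split():
        dest, _, gw = entry.partition(",")
        routes.append((IPv4Network(dest), IPv4Address(gw)))
    return routes


# Placeholder lease modelled on the one above (203.0.113.10 replaces <PUBLIC IP>).
sample = """\
ADDRESS=203.0.113.10
NETMASK=255.255.255.255
ROUTER=15.255.255.1
ROUTES=169.254.0.0/16,15.255.255.1
"""

lease = parse_lease(sample)
for dest, gw in parse_routes(lease["ROUTES"]):
    print(f"{dest} via {gw}")  # 169.254.0.0/16 via 15.255.255.1
```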

squeed commented Jan 25, 2018

Oddly enough, at first glance this change should have fixed it: systemd/systemd#5982

ajeddeloh commented Jan 25, 2018

Thanks for the logs. It looks like when you specify static routes (i.e. not from DHCP) you need to set GatewayOnlink to true, or else the route is rejected, but there's no option to do the same for DHCP routes.
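The "Network is unreachable" error follows from the /32 netmask in the lease: a /32 network contains only the host's own address, so the gateway 15.255.255.1 is not directly reachable, and Linux refuses a route via a gateway that is not on-link unless the route carries the onlink flag (which is what GatewayOnlink=true asks for). A simplified sketch of that check using Python's ipaddress module; it ignores any extra on-link prefixes that link-scope routes would add, and 203.0.113.10 is a placeholder for the redacted address.

```python
from ipaddress import IPv4Address, IPv4Interface


def gateway_is_onlink(address: str, gateway: str) -> bool:
    """True if `gateway` falls inside the subnet of the interface `address` (CIDR form)."""
    return IPv4Address(gateway) in IPv4Interface(address).network


# The lease in this issue: a /32 address, so only the host itself is on-link.
print(gateway_is_onlink("203.0.113.10/32", "15.255.255.1"))  # False
# For comparison, a /24 that happens to contain the gateway.
print(gateway_is_onlink("15.255.255.20/24", "15.255.255.1"))  # True
```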

@squeed That looks like it has been in systemd since 234, and this only popped up after switching from 235 to 236.

ajeddeloh commented Jan 25, 2018

I wonder if systemd/systemd#6885 is responsible. It looks like before that change the gateway route was applied first and then the static routes, whereas now the static routes are applied first and the gateway route is applied only if there were no static routes.

@jorgelon can you provide a little more info about what your network looks like? Specifically, which DHCP options is the server sending?

You might also try swapping the order of those routes so that the gateway one comes first. I suspect what's happening is that it starts with no routes, tries to apply the metadata route, but doesn't have the gateway yet, so it can't reach anything and fails.
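The ordering hypothesis can be illustrated with a toy routing table. This is a sketch, not networkd's code: it assumes the pre-236 flow first added a link-scope host route to the gateway (making the gateway on-link even with a /32 address), then the default route, then the static routes, and that the 236 flow applies the static routes first and skips the router option whenever any exist. The Route and RoutingTable names are hypothetical, and 203.0.113.10 is a placeholder address.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network, ip_network


@dataclass(frozen=True)
class Route:
    dest: IPv4Network
    gateway: IPv4Address | None = None  # None means a link-scope (directly connected) route


class RoutingTable:
    """Toy model: a route's gateway must fall inside some directly connected prefix."""

    def __init__(self, address: str):
        self.onlink = [ip_network(address, strict=False)]
        self.routes: list[Route] = []

    def add(self, route: Route) -> None:
        if route.gateway is not None and not any(route.gateway in net for net in self.onlink):
            raise OSError(f"{route.dest} via {route.gateway}: Network is unreachable")
        self.routes.append(route)
        if route.gateway is None:
            self.onlink.append(route.dest)  # link-scope routes make their prefix on-link


GW = IPv4Address("15.255.255.1")
STATIC = [Route(ip_network("169.254.0.0/16"), GW)]


def old_order(table: RoutingTable) -> None:
    # Assumed pre-236 flow: host route to gateway, default route, then static routes.
    table.add(Route(ip_network(f"{GW}/32")))
    table.add(Route(ip_network("0.0.0.0/0"), GW))
    for route in STATIC:
        table.add(route)


def new_order(table: RoutingTable) -> None:
    # Assumed 236 flow: static routes first; router option only if there are none.
    for route in STATIC:
        table.add(route)
    if not STATIC:
        table.add(Route(ip_network(f"{GW}/32")))
        table.add(Route(ip_network("0.0.0.0/0"), GW))


for name, flow in [("old", old_order), ("new", new_order)]:
    table = RoutingTable("203.0.113.10/32")
    try:
        flow(table)
        print(name, "ok:", len(table.routes), "routes")
    except OSError as err:
        print(name, "failed:", err)
```

Under these assumptions the old order installs all three routes, while the new order fails on the 169.254.0.0/16 route with the same "Network is unreachable" message seen in the log.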

lucab commented Jan 26, 2018

A packet capture of the DHCP exchange may also be valuable, to check what is actually offered to networkd.

jorgelon commented Jan 26, 2018

    option subnet-mask 255.255.255.255;
    option routers 15.255.255.1;
    option static-routes 169.254.169.254 15.255.255.1;  # needed because I have a metadata server at 169.254.169.254
    option domain-name-servers 15.5.100.12, 15.5.101.12;

I have tried with GatewayOnlink and it works, but in stable and beta it is not necessary.
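The /16 in the lease's ROUTES=169.254.0.0/16 comes from the classful semantics of the static-routes option (DHCP option 33, RFC 2132): it carries only destination/router pairs, and the client derives the netmask from the destination's address class. 169.254.169.254 is a class B address, hence /16. A minimal sketch of that decoding; the function names are illustrative and not taken from systemd.

```python
from ipaddress import IPv4Address, IPv4Network


def classful_prefixlen(addr: IPv4Address) -> int:
    """Prefix length implied by the address class (A=/8, B=/16, C=/24)."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8   # class A: 0.0.0.0 - 127.255.255.255
    if first_octet < 192:
        return 16  # class B: 128.0.0.0 - 191.255.255.255
    if first_octet < 224:
        return 24  # class C: 192.0.0.0 - 223.255.255.255
    raise ValueError(f"{addr} is not a class A, B or C address")


def decode_static_routes(pairs):
    """Turn option 33 (destination, router) pairs into (network, gateway) routes."""
    routes = []
    for dest, router in pairs:
        dest = IPv4Address(dest)
        net = IPv4Network(f"{dest}/{classful_prefixlen(dest)}", strict=False)
        routes.append((net, IPv4Address(router)))
    return routes


# The pair from the dhcpd config above.
for net, gw in decode_static_routes([("169.254.169.254", "15.255.255.1")]):
    print(f"{net} via {gw}")  # 169.254.0.0/16 via 15.255.255.1
```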

ajeddeloh commented Jan 26, 2018

Ugh, it looks like they're parsing both the classless static routes and the classful static routes into the same list, then assuming it's classless (and thus ignoring the gateway route as required when using the classless routes option, but detrimental when using the classful routes option).
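RFC 3442 says a client MUST ignore the Router option only when the server also sends the Classless Static Routes option (121); the classful Static Routes option (33), which is what this dhcpd config sends, does not override the router. A sketch contrasting that rule with the behaviour described above, where both option types land in one list; the enum and function names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class RouteSource(Enum):
    CLASSLESS = 121  # RFC 3442 Classless Static Routes
    CLASSFUL = 33    # RFC 2132 Static Routes


def honor_router_option(route_sources: list[RouteSource]) -> bool:
    """RFC 3442: ignore the Router option only if option 121 was received."""
    return RouteSource.CLASSLESS not in route_sources


def honor_router_option_buggy(route_sources: list[RouteSource]) -> bool:
    """Both option types merged into one list: any static route suppresses the router."""
    return not route_sources


offer = [RouteSource.CLASSFUL]  # what the dhcpd config in this issue sends
print(honor_router_option(offer))        # True: 15.255.255.1 stays the gateway
print(honor_router_option_buggy(offer))  # False: the gateway route is dropped
```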

ajeddeloh commented Jan 26, 2018

Looks like it's fixed upstream in systemd/systemd@8cdc46e. We'll backport that and it should be in the next alpha.

ajeddeloh commented Jan 26, 2018

Closed via coreos/coreos-overlay#3027. It will be fixed in the next alpha. As a workaround you can probably specify the routes manually in a networkd unit.
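A possible shape for that workaround, rendered by a small Python helper so the values from this issue can be substituted; the output would go in a .network file under /etc/systemd/network/. This is an untested sketch: it assumes the interface is ens192 (from the log above), that [DHCP] UseRoutes=false keeps networkd from installing the DHCP-provided gateway and static routes in this release, and it reuses the GatewayOnlink setting reported to work earlier in the thread. Check systemd.network(5) for the version you run. The helper name render_network_unit is hypothetical.

```python
def render_network_unit(iface: str, gateway: str, extra_routes: list) -> str:
    """Build a systemd-networkd .network unit with explicit on-link routes."""
    lines = [
        "[Match]",
        f"Name={iface}",
        "",
        "[Network]",
        "DHCP=ipv4",
        "",
        "[DHCP]",
        "UseRoutes=false",  # assumption: stops networkd using the DHCP-provided routes
        "",
        "[Route]",  # default route via the off-subnet gateway
        f"Gateway={gateway}",
        "GatewayOnlink=true",
    ]
    for dest in extra_routes:
        # One [Route] section per extra destination, also via the gateway.
        lines += ["", "[Route]", f"Destination={dest}", f"Gateway={gateway}", "GatewayOnlink=true"]
    return "\n".join(lines) + "\n"


unit = render_network_unit("ens192", "15.255.255.1", ["169.254.0.0/16"])
print(unit, end="")
```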

ajeddeloh closed this Jan 26, 2018

bgilbert commented Feb 1, 2018

This will be fixed in the next alpha, and also beta as current alpha is promoted. Both are due shortly. Thanks for the report.
