DHCP with multiple interfaces can cause connectivity issues #212

Closed
nickethier opened this Issue Dec 16, 2014 · 36 comments


nickethier commented Dec 16, 2014

Channel: stable
Install method: iPXE boot
Running version 494.5

In some cases, depending on what order interfaces come up, the route table can get screwed up from multiple interfaces auto-configuring via DHCP. This relates to #210.

This happens before the user-cloudinit-proc-cmdline.service unit can run, so there's no way for me to get a cloud-config onto the box to disable DHCP.

Here are the interfaces on the box:

core@localhost ~ $ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:xx:xx:xx:28:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:xx:xx:xx:28:6b brd ff:ff:ff:ff:ff:ff

And the route table:

core@localhost ~ $ ip route
default via 172.17.3.1 dev eno2  metric 1024
default via 172.17.3.1 dev eno1  metric 1024
172.17.3.0/24 dev eno2  proto kernel  scope link  src 172.17.3.93
172.17.3.0/24 dev eno1  proto kernel  scope link  src 172.17.3.42
172.17.3.1 dev eno2  scope link  metric 1024
172.17.3.1 dev eno1  scope link  metric 1024

So in this case eno2 will work but eno1 will not, due to the routing. But since eno1 is the first interface, traffic will by default attempt to use it.

Example (pinging the ip I fetch my cloud-config from):

core@localhost ~ $ ping 172.31.1.10
PING 172.31.1.10 (172.31.1.10) from 172.17.3.42 eno1: 56(84) bytes of data.
^C
--- 172.31.1.10 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5000ms

core@localhost ~ $ ping -I eno2 172.31.1.10
PING 172.31.1.10 (172.31.1.10) from 172.17.3.93 eno2: 56(84) bytes of data.
64 bytes from 172.31.1.10: icmp_seq=1 ttl=62 time=1.80 ms
64 bytes from 172.31.1.10: icmp_seq=2 ttl=62 time=1.04 ms
64 bytes from 172.31.1.10: icmp_seq=3 ttl=62 time=1.02 ms
64 bytes from 172.31.1.10: icmp_seq=4 ttl=62 time=1.11 ms
64 bytes from 172.31.1.10: icmp_seq=5 ttl=62 time=1.14 ms
64 bytes from 172.31.1.10: icmp_seq=6 ttl=62 time=1.01 ms
^C
--- 172.31.1.10 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 1.010/1.189/1.806/0.281 ms

I've worked around this by blacklisting the MACs of the second interface on my DHCP server for my lab, but this doesn't scale too well. I don't see why you would need more than one interface to be configured via DHCP (iPXE is a good example; it will only auto-configure the first interface*), so it would seem the best resolution would be to only enable DHCP for a single interface (see the sketch after the os-release output below). I'm not familiar enough with networkd yet to know if this is a possibility.

*iPXE ifconf description:

Automatically configure a network interface. iPXE will open the first specified network interface and attempt to automatically configure it. If automatic configuration succeeds, the command will terminate and the network interface will be left open. If automatic configuration fails, the network interface will be closed and iPXE will proceed to the next network interface in the list.
If no network interfaces are explicitly specified, iPXE will try all available network interfaces.
If no configurators are explicitly specified, iPXE will try all available configurators.

os-release:

core@localhost ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=494.5.0
VERSION_ID=494.5.0
BUILD_ID=
PRETTY_NAME="CoreOS 494.5.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
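
To make that concrete, a networkd drop-in that limits DHCP to a single interface might look like the minimal sketch below (untested on CoreOS; networkd applies the first .network file, in lexical filename order, whose [Match] section fits, so a low-sorting name takes precedence over the stock zz-default.network):

    # /etc/systemd/network/00-eno2.network
    # Disable DHCP on eno2 so only eno1 autoconfigures.
    [Match]
    Name=eno2

    [Network]
    DHCP=no
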
hvolpers commented Jan 3, 2015

Hi,

You can ignore the advertised routes for one of your interfaces - I posted a comparable setup in #226.
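
A config along those lines, keeping the DHCP lease on the second interface but ignoring its advertised routes, might look like this (a sketch; UseRoutes= assumes a networkd version that supports that option, see #226 for the actual setup):

    # /etc/systemd/network/10-eno2.network
    [Match]
    Name=eno2

    [Network]
    DHCP=ipv4

    [DHCP]
    # Take the address from DHCP, but ignore the advertised
    # default route so eno1 keeps the only one.
    UseRoutes=false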

nickethier commented Jan 3, 2015

Yes, but you have to get a cloud-config file on the box first. In my case the networking is borked due to the DHCP issue before it can fetch the cloud-config.


hvolpers commented Jan 3, 2015

Sorry, looks like it's too late here :)
Out of curiosity, why do you have two NICs in the same network?

nickethier commented Jan 4, 2015

In my scenario, the default/native VLAN on the ports is set to a "bootstrapping" network which has services for PXE/iPXE and configurations for cloud-config, debootstrap, etc. Not everything is CoreOS (yet =P). It's actually the only network that uses DHCP. Cloud-config takes care of configuring the NICs for the proper networks.


nickethier commented Jan 5, 2015

@crawford Anything I can do here to help or gather more info?

nickethier commented Jan 27, 2015

Has this been addressed at all in later releases? Should I give a new version a try? Is there anything I can do to get some progress on this? It's a decent blocker for enabling multiple interfaces on our machines.

crawford commented Jan 27, 2015

@nickethier Sorry for the delay here. The version you mentioned is running systemd 215. The alpha and beta channels have systemd 218. Can you give one of those a shot?

nickethier commented Jan 29, 2015

@crawford Was able to test this morning. Still seeing the same issue on alpha.

core@localhost ~ $ ip route
default via 172.17.3.1 dev eno1  proto dhcp  src 172.17.3.42  metric 1024
default via 172.17.3.1 dev eno2  proto dhcp  src 172.17.3.93  metric 1024
172.17.3.0/24 dev eno1  proto kernel  scope link  src 172.17.3.42
172.17.3.0/24 dev eno2  proto kernel  scope link  src 172.17.3.93
172.17.3.1 dev eno1  proto dhcp  scope link  src 172.17.3.42  metric 1024
172.17.3.1 dev eno2  proto dhcp  scope link  src 172.17.3.93  metric 1024
core@localhost ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=575.0.0
VERSION_ID=575.0.0
BUILD_ID=
PRETTY_NAME="CoreOS 575.0.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
sramak1396c commented Mar 5, 2015

I seem to have a similar issue, which is driving me crazy.

CoreOS on OpenStack, with a public and a private network.

There are two default routes getting created, one for the private network and one for the public. Something about the order just whacks the network, and I am not able to connect to the instance.

The issue is resolved when I remove the default route for the private network (typically the private network should be non-routable).

But the route always comes back (I guess the systemd-networkd daemon just brings it back up). Not sure if this issue is related to the above.

core@kube-master ~ $ ip route
default via 192.168.1.1 dev eth1 proto static
default via ... dev eth0 proto dhcp src ... metric 1024 -> my public interface
.../23 dev eth0 proto kernel scope link src ...
... dev eth0 proto dhcp scope link src ... metric 1024
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.2
core@kube-master ~ $

jedsmith commented Apr 5, 2015

I'm hitting this too using eth0 and eth1 on Amazon, and it doesn't seem my cloud-config can prevent it. It seems like systemd-networkd assigns the metric=1024 routes that it gets from DHCP before cloud-config can change the units, so if you define a static default route in a network unit, you end up with three default routes across both interfaces, and this causes asymmetric routing depending on the policy. (Out of the box, simply adding eth1 to an EC2 instance can cause asymmetric routing all by itself, it seems.)

Still looking into it, but my plan for stacking up VPN + NAT + services on my border, with each leg of the machine in a different subnet and security group, isn't looking possible without modifying the AMI to ban DHCP earlier in the boot. I agree with @nickethier that DHCP across multiple interfaces probably isn't useful, but then the question arises of how to decide which interface to use in an early, pre-config environment.

A useful medium would be spraying every interface but only configuring one, but again, which one? On Amazon, the answer is fairly clear. Physical gear, where multiple NICs are more likely? Not so much.

JeanMertz commented Apr 21, 2015

We're running into the exact same issue as @sramak1396c: CoreOS + OpenStack with public and private networks.

Resulting in two default routes, causing networking issues.

jedsmith commented Apr 21, 2015

If it helps, I worked around this by writing network units into /etc/systemd/network by hand to override zz-default.network's DHCP for both interfaces outside of cloud-config, and then just didn't configure my network via cloud-config. Just throwing that out there, as it might be an option for long-lived gear. It's ham-fisted because you have to boot once, fix, reboot, but it's something.
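
For illustration, such a hand-written unit might look like this (a sketch with made-up addresses; any filename that sorts before zz-default.network will match first):

    # /etc/systemd/network/00-eth1.network
    [Match]
    Name=eth1

    [Network]
    DHCP=no
    Address=10.0.128.50/17
    # No Gateway= line on purpose, so eth0 keeps the only default route.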

sramak1396c commented Apr 21, 2015

I do something similar. I remove the private network and add it back through setup-network-environment. I cannot find a better solution, but it works for now.

units:
  - name: remove-private-network.service
    command: start
    content: |
      [Unit]
      Description=Remove default Private routing
      Requires=network-online.target
      After=network-online.target

      [Service]
      ExecStart=/usr/bin/route del default gw 192.168.1.1
      SuccessExitStatus=7
  - name: setup-network-environment.service
    command: start
    content: |
      [Unit]
      Description=Setup Network Environment
      Documentation=https://github.com/kelseyhightower/setup-network-environment
      Requires=remove-private-network.service
      After=remove-private-network.service

      [Service]
      ExecStartPre=-/usr/bin/mkdir -p /opt/bin
      ExecStartPre=/usr/bin/wget -N -P /opt/bin https://storage.googleapis.com/k8s/setup-network-environment
      ExecStartPre=/usr/bin/chmod +x /opt/bin/setup-network-environment
      ExecStart=/opt/bin/setup-network-environment
      RemainAfterExit=yes
      Type=oneshot
  - name: etcd.service
    command: start
    content: |
      [Unit]
      Description=etcd
      Requires=setup-network-environment.service
      After=setup-network-environment.service

      [Service]
      EnvironmentFile=/etc/network-environment
      User=etcd
      PermissionsStartOnly=true
      ExecStart=/usr/bin/etcd \
        --name ${DEFAULT_IPV4} \
        --addr ${DEFAULT_IPV4}:4001 \
        --bind-addr 0.0.0.0 \
        --cluster-active-size 1 \
        --data-dir /var/lib/etcd \
        --http-read-timeout 86400 \
        --peer-addr ${DEFAULT_IPV4}:7001 \
        --snapshot true
      Restart=always
      RestartSec=10s


stresler commented Apr 21, 2015

I could be misunderstanding, but I think this is either a misconfigured DHCP server or by design.

DHCP sets a default gateway intentionally; it wouldn't work without that. What is happening here is that you are trying to DHCP two interfaces, which is fine, but DHCP has an option for that.

http://askubuntu.com/questions/152605/how-to-define-default-gateway-with-multiple-dhcp-interfaces

The OS has no way of knowing which interface or subnet you want to be the default, so it sets both.

Alternatively, you can alter the routes with cloud-init after DHCP sets them.
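
On CoreOS, one networkd counterpart to that dhclient trick is a per-interface route metric (a sketch; assumes the running networkd supports RouteMetric= in the [DHCP] section, and the lower metric wins):

    # /etc/systemd/network/10-eth1.network
    # Keep DHCP on both interfaces, but make eth1's default route
    # lose to eth0's (metric 1024) instead of tying with it.
    [Match]
    Name=eth1

    [Network]
    DHCP=ipv4

    [DHCP]
    RouteMetric=2048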

eaoliver commented May 17, 2015

Thanks for the workaround @sramak1396c, but have you noticed that the default route comes back?

I have an AWS instance running with two network interfaces (one on 10.0.0.0/24 and another on 10.0.128.0/17). The 10.0.0.0/24 network is publicly accessible. The 10.0.128.0/17 network is private.

core@node ~ $ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    1024   0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.0.1        0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
10.0.128.0      0.0.0.0         255.255.128.0   U     0      0        0 eth1
10.0.128.1      0.0.0.0         255.255.255.255 UH    1024   0        0 eth1

Then, after a while, CoreOS adds the default route into the private network back to the routing table.

core@node ~ $ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    1024   0        0 eth0
0.0.0.0         10.0.128.1      0.0.0.0         UG    1024   0        0 eth1
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.0.1        0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
10.0.128.0      0.0.0.0         255.255.128.0   U     0      0        0 eth1
10.0.128.1      0.0.0.0         255.255.255.255 UH    1024   0        0 eth1
nickethier commented May 21, 2015

The problem here isn't that DHCP is setting the default route. @stresler is correct in that it's the default and correct behavior.

The problem is that DHCP is on by default for all interfaces. In my original issue this causes problems because it borks my route table before I'm able to pull down a cloud-config to fix the interfaces. My two workarounds are either to repackage CoreOS without the default networkd file, or to configure my DHCP server with a whitelist of MACs that map to a single interface on each machine.

A good fix for this might be to only enable DHCP on the first interface, or maybe use a kernel parameter to designate an interface to use, but I'm not sure if either of these options is supported in systemd-networkd.

crawford commented Oct 20, 2015

Ignition can be used to write the necessary networkd config files. As @nickethier pointed out, the default networkd config will enable DHCP on all of the interfaces.
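
A minimal Ignition config for that might look like the sketch below (shape follows the later v2.0 spec's networkd section; adjust to whatever spec version your image actually ships):

    {
      "ignition": { "version": "2.0.0" },
      "networkd": {
        "units": [{
          "name": "00-eno2.network",
          "contents": "[Match]\nName=eno2\n\n[Network]\nDHCP=no"
        }]
      }
    }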

nickethier commented Nov 12, 2015

@crawford this is still an outstanding issue for me.

nickethier commented Nov 12, 2015

I'll note that I've worked around it but it would be nice not to have to.

crawford commented Nov 12, 2015

@nickethier is it not sufficient to disable DHCP on certain interfaces via Ignition? Keeping DHCP enabled on all interfaces is really the only default network config for CoreOS. Custom environments will need to set up the network to match their requirements.

nickethier commented Nov 12, 2015

The network is borked before Ignition runs, so a remote URL fetch does not work. This is on metal, not a cloud provider.

crawford reopened this Nov 12, 2015

crawford commented Nov 12, 2015

You should have several options here:

  1. You can bake the Ignition config into the OEM partition of the image and then point Ignition at that config. This can be done with both PXE images and disk installs.
  2. It should be possible to configure the network via kernel boot parameters. I'd have to do a bit of testing to be sure about this though.
nickethier commented Nov 13, 2015

I'd like not to have to go with option 1, though I've toyed with it a bit and understand how to do it.

I've poked around networkd and can't seem to find anything that would let me configure via boot parameters, but if there is a way to do this, it would solve all my problems.

crawford commented Nov 13, 2015

Let's move option number two over to #981.

crawford commented Nov 13, 2015

I don't think there is anything the OS can automatically do here. We don't know which interface is being used for what, so enabling DHCP on all of them seems reasonable. If possible, the DHCP servers should be configured as @stresler mentioned, but in the case that you don't control the DHCP servers, you'll need to give CoreOS a hint as to which interface it should use. This can be done by including a higher-precedence network config in the image itself (by modifying the PXE image or as part of the installation process after coreos-install) or adding a cloud-config/ignition config in the OEM partition to write the aforementioned config.

crawford commented Nov 13, 2015

I forgot one more option: using kernel parameters to tell either the kernel or networkd to configure the interfaces.

nickethier commented Nov 13, 2015

I'd +1 using kernel parameters as well.

crawford commented Nov 13, 2015

It should be possible to use the ip kernel parameter to set the address, though I'm having trouble getting that to work in my QEMU setup.
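
For reference, the classic field layout of that parameter (as documented for the kernel and dracut; exact autoconf keywords vary by initrd, and the hostname in the example is made up) is:

    ip=<client-IP>:<server-IP>:<gateway-IP>:<netmask>:<hostname>:<device>:<autoconf>
    # e.g. a static eth0 with autoconfiguration disabled:
    ip=10.99.0.10::10.99.0.1:255.255.255.0:node1:eth0:none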

lgreenlee commented Feb 23, 2016

Etcd2 requires that the members of the cluster all have static IP addresses, especially for test deployments. While this may not be the "normal case", it is a starting task for someone using your platform. This can cause routing problems if a DHCP address is assigned to the NIC. Most other distros do not have this problem, and I have found it undesirable to need to include units or scripts to cycle the interfaces after the system starts up, since testing the cloud-config is tedious, error prone, and can brick images. If there is a way to do this through Ignition, some documentation would be nice.
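
For what it's worth, pinning a static address from cloud-config is a single networkd unit, along these lines (a sketch with made-up addresses, following the CoreOS cloud-config networkd pattern; double-check against the current docs):

    #cloud-config
    coreos:
      units:
        - name: 00-eth0.network
          content: |
            [Match]
            Name=eth0

            [Network]
            DHCP=no
            Address=192.168.1.10/24
            Gateway=192.168.1.1
            DNS=8.8.8.8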

lgreenlee commented Feb 23, 2016

@crawford https://github.com/coreos/docs/blob/master/ignition/what-is-ignition.md#providing-ignition-a-config

"This means that if Ignition is being used, it will not be possible to use other tools which also use this userdata (e.g. coreos-cloudinit)"

So it is either ignition or cloud-config?

crawford commented Feb 24, 2016

So it is either ignition or cloud-config?

That is the official suggestion. In reality, you can do whatever you want with Ignition. There is nothing stopping you from enabling cloudinit from Ignition or passing both coreos.config.url and cloud-config-url to the kernel. If you need both, Ignition is falling short. Is there a particular feature you need?

lgreenlee commented Feb 24, 2016

Thank you. I put in some time putting together a working cloud-config and then found the Ignition docs, which seemed to indicate that I needed to throw this approach away. I'm not sure that my current use case is aligned with your "normal" deployment profile.

FirefighterBlu3 commented Feb 26, 2016

How do I get ignition/initramfs to stop fetching a brand-new DHCP lease when my PXE boot already has one and has passed the parameters on the kernel command line?

e.g.:

[    0.000000] Command line: BOOT_IMAGE=CoreOS/coreos_production_pxe.vmlinuz console=tty0
coreos.autologin rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 LOGLEVEL=1
cloud-config-url=http://xxxxxxxx/pxe-cloud-config.yml quiet
initrd=CoreOS/coreos_production_pxe_image.cpio.gz
ip=10.99.0.10:10.0.0.1:10.99.0.1:255.255.255.0 BOOTIF=01-00-e0-4c-68-28-4c

Forward reference to issue #981.

marcovnyc commented Apr 29, 2016

I am experiencing this issue in AWS. I have two NICs, one in a private subnet and another in a tools subnet; as soon as the secondary interface becomes active I lose connectivity to the instance.

Once I remove the secondary Ethernet interface I am able to connect. The instance has DHCP enabled on both interfaces. I am wondering if this is still an issue.

Cheers

crawford commented Sep 20, 2016

@nickethier I stumbled upon the issue again. Since I last commented, Ignition actually has the ability to read its configuration from a kernel parameter directly. Granted, this config will have to be fairly small since there is a limit to the size of the kernel parameters, but until we get #981 sorted, that's going to be your best bet.

All: As for the underlying issue, we are going to continue keeping DHCP enabled on all interfaces. The network configuration will need to be explicitly configured by Ignition if something other than the default is required. If you are unable to leverage Ignition to do this, then it is a bug and we'd appreciate a new bug report so we can get it fixed.

crawford closed this Sep 20, 2016
