VMware OVA 00-.network file breaks network connectivity #1802

Closed
battlecow opened this Issue Feb 9, 2017 · 9 comments

battlecow commented Feb 9, 2017

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.9.0
VERSION_ID=1235.9.0
BUILD_ID=2017-02-02-0235
PRETTY_NAME="Container Linux by CoreOS 1235.9.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

VMware OVA using vApp options to utilize a base64 cloud-config.

Expected Behavior

Docker containers have consistently working networking and can ping the Docker virtual gateway and beyond. The default cloud-config for the VMware OVA should NOT create a catch-all, highest-priority network config file.

Actual Behavior

Intermittent network failures that cut off connectivity from the container to anything outside it. During the cloud-config bootstrap, the following can be seen in the cloud-init logs:
cloud-init.txt

Reproduction Steps

  1. Deploy the CoreOS OVA image and leave the defaults (adding an SSH key to allow access to the server)
  2. Run the command:
    for ((i=0; i<100; i++)) ; do docker run -it --rm bash bash -c "ping -c 1 google.com &> /dev/null && echo 'Success' || bash" ; done
  3. On failure, from within the current container, attempt to ping anything beyond its own IP, to no avail.

Other Information

During a failure event, running the command networkctl status vethxxxxx shows that the interface is stuck in the Configuring state. The network file listed is /run/systemd/network/00-.network, which was not created or placed there by the user but was generated through the VMware OVA. That file has the following configuration:

[Match]

[Network]
DHCP=true

Because it is prefixed with 00 and has an empty [Match] section, it is the base config for all adapters and conflicts with the settings VMware places in /usr/lib64/systemd/network/yy-vmware.network, which contains:

[Match]
Virtualization=vmware

[Network]
DHCP=yes

[DHCP]
UseMTU=true
UseDomains=true
RequestBroadcast=true

Removing the 00-.network file has so far resolved the issues surrounding this bug.
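
The ordering problem can be sketched as follows. systemd-networkd sorts the .network files from all of its directories together by file name and applies the first one whose [Match] section fits the interface, so a file named 00-.network is considered ahead of yy-vmware.network. The shell function below only illustrates that sort; its name is made up for this example, and it does not handle same-named files overriding each other across directories the way networkd does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .network files found in the given
# directories, ordered by file name only (the directory is ignored).
list_network_files() {
    local dir file
    for dir in "$@"; do
        for file in "$dir"/*.network; do
            [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
            # Emit "basename<TAB>full path" so we can sort on the basename.
            printf '%s\t%s\n' "${file##*/}" "$file"
        done
    done | LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 | cut -f2
}
```

Calling it as list_network_files /usr/lib64/systemd/network /run/systemd/network | head -n 1 on an affected machine would presumably print the 00-.network path, unless some other file sorts even earlier.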

Member

crawford commented Feb 21, 2017

It looks like the issue is that the OVA template is populating both guestinfo.interface.0.dhcp and guestinfo.interface.0.role with default values. DHCP is explicitly enabled, which causes coreos-cloudinit to emit a network config. The fix will be to change that default to "false".

I believe you can work around the issue by setting guestinfo.interface.0.dhcp to "no". DHCP will still be enabled by default on all interfaces, and no network configs should be emitted, which will allow the docker network configs to take effect.
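
To check whether the workaround took effect once the VM has been restarted with the new setting, one could look in /run/systemd/network for catch-all files, meaning .network files whose [Match] section has no keys. A rough sketch follows; the function name is hypothetical and the parsing is deliberately naive (it also reports files that have no [Match] section at all).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every .network file in the given directory
# whose [Match] section contains no keys, i.e. a file that is not
# restricted to particular interfaces.
find_catch_all_network_files() {
    local dir=$1 file
    for file in "$dir"/*.network; do
        [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
        # awk exits 0 if it sees a non-blank, non-comment line inside [Match].
        if ! awk '
            /^\[/ { in_match = ($0 == "[Match]"); next }
            in_match && /^[ \t]*[^#; \t]/ { found = 1 }
            END { exit found ? 0 : 1 }
        ' "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Running find_catch_all_network_files /run/systemd/network should print nothing if no catch-all config was emitted.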

battlecow commented Feb 24, 2017

This seems to correctly remove the injected config without any side effects for us!

xcompass commented Apr 27, 2017

Any plan to fix this in the CoreOS OVA template? Thanks

Member

crawford commented Apr 27, 2017

@xcompass We haven't had a chance to make the change and test it (we don't do a whole lot with VMware internally). Pull requests are welcome!

xcompass commented Apr 28, 2017

I've tested three VMs so far and they all worked as expected. I'll be changing more VMs and doing more testing. I would be happy to make the PR. Could you give me a hint about which repo I should be looking at? I'm having some trouble finding it :)

Member

crawford commented May 4, 2017

The template can be found here: https://github.com/coreos/scripts/blob/master/build_library/template_vmware.ovf#L52. Thanks for digging into this!

xcompass added a commit to xcompass/scripts that referenced this issue May 4, 2017

Change default value of dhcp for vmware template to no
When OVA template is not being used, the default dhcp value yes will
trigger cloud-init to generate a 00-.network file, which will break
network connectivity Intermittently. Please see the details here:
coreos/bugs#1802 (comment)

xcompass commented May 4, 2017

Thanks @crawford. PR is made. coreos/scripts#680

Member

bgilbert commented May 5, 2017

Should be fixed by coreos/scripts#680. Thanks @battlecow for the report and @xcompass for the PR!

bgilbert closed this May 5, 2017

battlecow commented May 5, 2017

Wooo! Great news, thanks for the PR @xcompass!
