VMware OVA 00-.network file breaks network connectivity #1802

Closed
battlecow opened this issue Feb 9, 2017 · 12 comments

Comments

@battlecow battlecow commented Feb 9, 2017

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.9.0
VERSION_ID=1235.9.0
BUILD_ID=2017-02-02-0235
PRETTY_NAME="Container Linux by CoreOS 1235.9.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

VMware OVA, using vApp options to supply a base64-encoded cloud-config.

Expected Behavior

Docker containers consistently have full networking and can ping the Docker virtual gateway and beyond. The default cloud-config for the VMware OVA should NOT create a catch-all, highest-priority network config file.

Actual Behavior

Intermittent network failures, resulting in loss of connectivity to anything outside the container. During the cloud-config bootstrap, the following can be seen in the cloud-init logs:
cloud-init.txt

Reproduction Steps

  1. Deploy the CoreOS OVA image and leave the defaults (adding an SSH key to allow access to the server).
  2. Run the command:
    for ((i=0; i<100; i++)) ; do docker run -it --rm bash bash -c "ping -c 1 google.com &> /dev/null && echo 'Success' || bash" ; done
  3. On failure, from within the current container, try to ping anything beyond the container's own IP; every attempt fails.
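The loop above drops into an interactive shell on the first failure. When you only want to know how often the problem occurs, a non-interactive variant that counts failed runs can be handy. This is a minimal sketch: `count_ping_failures` is a hypothetical helper, and like the original loop it assumes the official `bash` image from Docker Hub and that google.com is reachable from the host.

```shell
#!/bin/sh
# Hypothetical helper: start N short-lived containers and count how many
# could not ping an outside host. Assumes the official "bash" image (as in
# the original reproduction loop) and that the target is reachable.
count_ping_failures() {
  runs=${1:-100}
  target=${2:-google.com}
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # No -it: nothing interactive happens, only the exit status matters.
    if ! docker run --rm bash bash -c "ping -c 1 $target" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures/$runs runs failed"
}
```

Usage: `count_ping_failures 100`. On a host that is not affected, the count should stay at zero; once a failure shows up, you can still start a container by hand and inspect its veth interface as described below.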

Other Information

During a failure event, running the command networkctl status vethxxxxx shows that the interface is stuck in the Configuring state. The network file listed points to /run/systemd/network/00-.network, which was not created or placed there by the user but generated through the VMware OVA. That file has the following configuration:

[Match]

[Network]
DHCP=true

Because it is prefixed with 00 (and its [Match] section is empty), it is the base config for all adapters and conflicts with the settings VMware places in /usr/lib64/systemd/network/yy-vmware.network, which contains:

[Match]
Virtualization=vmware

[Network]
DHCP=yes

[DHCP]
UseMTU=true
UseDomains=true
RequestBroadcast=true

Removing the 00-.network file has so far resolved the issues surrounding this bug.
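For context, systemd-networkd collects .network files from /etc/systemd/network, /run/systemd/network and /usr/lib/systemd/network, sorts them all together by filename, and lets a same-named file in /etc override one in /run, which in turn overrides /usr/lib. For each interface it applies the first file whose [Match] section matches, and an empty [Match] section matches every interface. The sketch below lists the files in that order and flags catch-all ones. The function names are hypothetical, the default directory list (including /usr/lib64, where this issue found yy-vmware.network) is an assumption, and filenames are assumed to contain no whitespace.

```shell
#!/bin/sh
# Print .network files in the order systemd-networkd considers them:
# sorted by filename across all directories, with a file in an
# earlier-listed directory shadowing a same-named file in a later one.
list_network_files() {
  if [ "$#" -eq 0 ]; then
    # Assumed default search path; pass your own directories to override.
    set -- /etc/systemd/network /run/systemd/network \
      /usr/lib/systemd/network /usr/lib64/systemd/network
  fi
  seen=" "
  for dir in "$@"; do
    for f in "$dir"/*.network; do
      [ -f "$f" ] || continue
      name=${f##*/}
      case "$seen" in
        *" $name "*) continue ;;  # shadowed by a higher-priority directory
      esac
      seen="$seen$name "
      printf '%s\t%s\n' "$name" "$f"
    done
  done | LC_ALL=C sort | cut -f2
}

# Succeed if the file has a [Match] section with no settings in it,
# i.e. it matches every interface (docker's veth* included).
has_empty_match() {
  awk '
    /^[ \t]*\[/ {
      in_match = ($0 ~ /^[ \t]*\[Match][ \t]*$/)
      if (in_match) found = 1
      next
    }
    in_match && /^[ \t]*[^#; \t]/ { keys++ }
    END { exit !(found && keys == 0) }
  ' "$1"
}

report_network_files() {
  list_network_files "$@" | while IFS= read -r f; do
    if has_empty_match "$f"; then
      echo "$f  <- empty [Match]: applies to every interface"
    else
      echo "$f"
    fi
  done
}
```

On an affected VM, `report_network_files` would be expected to list /run/systemd/network/00-.network first and flag it, which is consistent with it winning over yy-vmware.network and any later, more specific file.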

Member

@crawford crawford commented Feb 21, 2017

It looks like the issue is that the OVA template is populating both guestinfo.interface.0.dhcp and guestinfo.interface.0.role with default values. DHCP is explicitly enabled, which causes coreos-cloudinit to emit a network config. The fix will be to change that default to "false".

I believe you can work around the issue by setting guestinfo.interface.0.dhcp to "no". DHCP will still be enabled by default on all interfaces, and no network configs should be emitted, which will allow the docker network configs to take effect.
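To confirm the workaround after a reboot, a check along these lines verifies that /run/systemd/network/00-.network is no longer generated and that no veth interface is stuck in the configuring state described earlier in this issue. This is a sketch: the function names are hypothetical, and it assumes `networkctl list --no-legend` prints the columns IDX, LINK, TYPE, OPERATIONAL and SETUP.

```shell
#!/bin/sh
# Read `networkctl list --no-legend` output on stdin and print veth links
# whose SETUP column (5th field, assumed layout) is still "configuring".
stuck_veth_links() {
  awk '$2 ~ /^veth/ && $5 == "configuring" { print $2 }'
}

# Return non-zero if the generated catch-all file exists or any veth link
# is stuck configuring.
check_workaround() {
  status=0
  if [ -e /run/systemd/network/00-.network ]; then
    echo "00-.network is still being generated"
    status=1
  fi
  stuck=$(networkctl list --no-legend | stuck_veth_links)
  if [ -n "$stuck" ]; then
    echo "veth links stuck configuring: $stuck"
    status=1
  fi
  return "$status"
}
```

Running `check_workaround` while a few containers are up (for example during the reproduction loop) exercises both checks.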

Author

@battlecow battlecow commented Feb 24, 2017

This seems to correctly remove the injected config without any side effects for us!

@xcompass xcompass commented Apr 27, 2017

Are there any plans to fix this in the CoreOS OVA template? Thanks!

Member

@crawford crawford commented Apr 27, 2017

@xcompass We haven't had a chance to make the change and test it (we don't do a whole lot with VMware internally). Pull requests are welcome!

@xcompass xcompass commented Apr 28, 2017

I've tested three VMs so far and they all worked as expected. I'll be changing more VMs and doing more testing. I would be happy to make the PR. Could you give me a hint which repo I should be looking at? I'm having some trouble finding it :)

Member

@crawford crawford commented May 4, 2017

The template can be found here: https://github.com/coreos/scripts/blob/master/build_library/template_vmware.ovf#L52. Thanks for digging into this!

xcompass added a commit to xcompass/scripts that referenced this issue May 4, 2017
When the OVA template is not being used, the default dhcp value "yes" will
trigger cloud-init to generate a 00-.network file, which will break
network connectivity intermittently. Please see the details here:
coreos/bugs#1802 (comment)

@xcompass xcompass commented May 4, 2017

Thanks @crawford. PR is made. coreos/scripts#680

Member

@bgilbert bgilbert commented May 5, 2017

Should be fixed by coreos/scripts#680. Thanks @battlecow for the report and @xcompass for the PR!

@bgilbert bgilbert closed this May 5, 2017
Author

@battlecow battlecow commented May 5, 2017

Wooo! Great news, thanks for the PR @xcompass!

@DaveTCode DaveTCode commented Sep 3, 2018

Apologies for reviving an old ticket, but I ran into this last week and I'm curious whether there's more to fix here or whether (the more likely outcome) we were just being daft.

For a certain set of VMs, I had a MOP for creating them in vSphere which involved setting DHCP to no during creation, booting the machine (which had a custom network unit that matched on Name=ens192 and had DHCP=yes), turning the machine off, setting the vApp DHCP option to yes, and then booting the machine again.

I think that MOP was created before this fix as a workaround for the issue (presumably because we didn't get to the bottom of it) although that's not really the key point here.

However, following the MOP with the latest OVA reproduces this issue. Essentially, if I take a CoreOS machine in a working state, turn it off, set DHCP to yes in the vApp options in vSphere, and turn it back on again, the 00-.network file is created with a blank [Match] section. Is that just a fundamentally broken thing to do, one that anyone who knew what they were doing wouldn't dream of, or is there further work required on this ticket to avoid getting into the broken state?

Member

@bgilbert bgilbert commented Sep 5, 2018

@DaveTCode Changing the vApp options after first boot is not really supported.

If you already have a custom network unit that enables DHCP, why do you need to re-enable it in the vApp options at all?

@DaveTCode DaveTCode commented Sep 6, 2018

Fair enough!

We definitely don't need to fiddle with vApp options any more - it's not even clear to me why we ever were.

I just wanted to make sure there wasn't some low-hanging fruit here in documentation fixes/config changes that would make it clearer that we shouldn't have done this (or ideally prevent it). No worries at all if not; you have to fall back to the Flash vSphere app just to edit the vApp options after boot, so I can't imagine many people running into this!
