no cloud agents: vmware #70

Open
dustymabe opened this issue Oct 25, 2018 · 32 comments

Member

@dustymabe dustymabe commented Oct 25, 2018

In #12 we decided that we'd like to try not to ship cloud agents. This ticket will document the investigation and strategy for shipping without a cloud agent on the VMware virtualization platform.

See also #41 for a discussion of how to ship cloud specific bits using ignition.

Member Author

@dustymabe dustymabe commented Oct 31, 2018

This one is mostly unfamiliar territory, and we'd need a working environment to experiment in and determine what is and isn't needed. We might be able to use Packet's ESX servers for this. If that doesn't work, we'd need access to an existing environment.

@dustymabe dustymabe mentioned this issue Dec 12, 2018
4 of 56 tasks complete
@dustymabe dustymabe added the cloud* label Dec 13, 2018
@bgilbert bgilbert self-assigned this Jan 16, 2019
Member

@bgilbert bgilbert commented Feb 1, 2019

In Container Linux:

  • Ignition supports reading configs from VMware Guestinfo, provided directly or via an OVF environment.
  • coreos-metadata has no support for VMware.
  • The kernel includes in-tree modules for VMCI and VMCI sockets, PV SCSI and Ethernet, and ballooning.
  • vmtoolsd from open-vm-tools runs directly in the host. Much of the open-vm-tools functionality (out-of-tree kernel modules, PAM modules, HGFS, etc.) is disabled.
  • CL ships three relevant install images: a vmx + vmdk, an OVA, and a raw image for installing into an empty VMware VM using coreos-install.

It appears that the primary benefits of vmtoolsd are:

  • Power events initiated through the hypervisor will gracefully restart / shut down the guest
  • Quiescing filesystems for snapshotting
  • Heartbeat generation for VMware HA
  • Clock synchronization
  • Collection of guest metrics on behalf of the host
  • Preventing the VMware UI from complaining about the absence of VMware Tools

I think we'll need to address the first point at least, since it seems likely to surprise users if VMware power controls default to hard shutdown/reboot on Fedora CoreOS but not on other distros.

On Fedora CoreOS:

  • The Fedora kernel already enables VMware kernel modules.
  • We could ship either (a) without open-vm-tools, (b) with it running in the host, or (c) with it running in a container. If we ship it, we should probably disable superfluous vmtoolsd modules to prevent them from becoming a compatibility constraint. Since open-vm-tools is open source, we could in principle (d) reimplement the relevant pieces (soft shutdown support at a minimum) in coreos-metadata.
  • Hopefully we don't need to ship both the vmx + vmdk and an OVA. vmx + vmdk is unwieldy because of the two separate files. The OVA doesn't have that problem, and on some VMware products it includes UI support for injecting Ignition configs, but launching an OVA requires users to go through a separate import step.
  • Users do in fact install into an empty VM, so we should preserve that functionality. It'd be good to avoid shipping a separate disk image for it, though. Unlike on CL, this would be possible if the install script learned to modify the platform ID in the installed image.
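
To illustrate the last point, here is a minimal sketch of rewriting a platform ID. It assumes the ID is carried as an `ignition.platform.id=` kernel argument, which the comment above does not state; `set_platform_id` is a hypothetical helper, not part of any existing install script.

```python
def set_platform_id(cmdline: str, platform: str) -> str:
    """Return the kernel command line with exactly one platform ID argument.

    Assumption: the platform ID is passed as `ignition.platform.id=<name>`.
    """
    key = "ignition.platform.id="
    # Drop any existing platform ID argument, keep everything else in order.
    args = [arg for arg in cmdline.split() if not arg.startswith(key)]
    args.append(key + platform)
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical command line of an image built for bare metal.
    print(set_platform_id("ro quiet ignition.platform.id=metal", "vmware"))
```

A real installer would also have to locate and rewrite the bootloader configuration inside the installed image, which this sketch leaves out.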

@redbaron redbaron commented Feb 3, 2019

We use CoreOS on VMware in quite a strict environment, and I can say that without vmtoolsd we wouldn't get an exception from the infra team to run our own image.

Member

@bgilbert bgilbert commented Feb 4, 2019

@redbaron Which vmtoolsd functionality do you depend on? As noted, we'd probably disable unnecessary modules.


@dcode dcode commented Apr 16, 2019

One of the vmtoolsd features not listed that I depend on is reporting the IP addresses associated with a given VM. This lets me use the vSphere and/or ESX APIs to get the associated IPs for follow-on scripted actions. I specifically do this for CI/CD of system deployment scripts.

Member

@cgwalters cgwalters commented Jun 17, 2019

As of 4.1 RHCOS ships open-vm-tools by default; we're not entirely happy about this but it's where we are.


@kpettijohn kpettijohn commented Jun 21, 2019

Previously when running CoreOS on VMware it was very nice to have vmtoolsd expose the IP address of the VM, which could then be used as a Terraform output or passed to other resources.

Basic Terraform example

resource "vsphere_virtual_machine" "linux" {
  name             = "${var.vm_name}"
  resource_pool_id = "${data.vsphere_resource_pool.pool.id}"
  datastore_id     = "${data.vsphere_datastore.datastore.id}"
  folder           = "${var.vm_folder}"

  num_cpus = "${var.vcpu}"
  memory   = "${var.memory}"

  guest_id = "${data.vsphere_virtual_machine.template.guest_id}"

  network_interface {
    network_id = "${data.vsphere_network.network.id}"
  }

  clone {
    template_uuid = "${data.vsphere_virtual_machine.template.id}"
  }

  scsi_type = "lsilogic"

  disk {
    label            = "disk00"
    size             = "${data.vsphere_virtual_machine.template.disks.0.size}"
    thin_provisioned = true
  }

  extra_config {
    guestinfo.hostname                      = "${var.hostname}"
    guestinfo.ignition.config.data.encoding = "base64"
    guestinfo.ignition.config.data          = "${base64encode(file("./ignition.json"))}"
  }
}

output "ip" {
  value = "${vsphere_virtual_machine.linux.guest_ip_addresses}"
}

output "vmware_tools_status" {
  value = "${vsphere_virtual_machine.linux.vmware_tools_status}"
}
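
For readers not using Terraform, the two `guestinfo.ignition.*` entries in `extra_config` above can be reproduced with a few lines of Python. This is only a sketch: `guestinfo_for_ignition` is a hypothetical helper, and the sample config (including its `version` value) is an assumption made for illustration.

```python
import base64
import json


def guestinfo_for_ignition(config: dict) -> dict:
    """Build the VM extra-config entries that carry an Ignition config.

    The key names are the ones used in the Terraform example above.
    """
    raw = json.dumps(config, separators=(",", ":")).encode("utf-8")
    return {
        "guestinfo.ignition.config.data": base64.b64encode(raw).decode("ascii"),
        "guestinfo.ignition.config.data.encoding": "base64",
    }


if __name__ == "__main__":
    # Hypothetical minimal config; a real one would come from ignition.json.
    example = {"ignition": {"version": "3.0.0"}}
    for key, value in guestinfo_for_ignition(example).items():
        print(f"{key}={value}")
```

The resulting key/value pairs can then be set on the VM with whatever tooling is in use.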

@dcode dcode commented Jun 23, 2019

@kpettijohn That and Ansible dynamic inventory are what I'm looking for specifically.

Member

@lucab lucab commented Jun 23, 2019

Network introspection belongs to the "collection of guest metrics on behalf of the host" bucket in the list above.

While I understand its usefulness, it isn't (IMHO) a very pressing requirement for the following reasons:

  • it doesn't strictly require an on-host C daemon: an ad-hoc inventory agent running in a container with host-netns access can perform the same task, in a safer way that is less exposed to distro regressions.
  • the "network config snapshot" topic is somewhat ill-defined for dynamic environments, as there is no single point in time when the network configuration can be declared done (i.e. immutable for the rest of the node's lifetime).
  • node introspection is not strictly needed here. If the configuration is dynamic (i.e. DHCP), the source of truth is in the network environment configuration; otherwise, if the configuration is static, the source of truth is in whatever generated the Ignition config. In both cases, querying the source of truth (from Terraform or Ansible) seems preferable for correctness.
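
A rough sketch of the first point: an inventory agent of this kind could be as small as the following. It assumes iproute2's JSON output (`ip -j addr show`) is available inside the container and that the container shares the host network namespace; `global_addresses` and `collect` are hypothetical names, and reporting the result to an inventory system is left out.

```python
import json
import subprocess


def global_addresses(ip_json: str) -> dict:
    """Map interface name -> global-scope addresses from `ip -j addr` output."""
    result = {}
    for link in json.loads(ip_json):
        addrs = [
            info["local"]
            for info in link.get("addr_info", [])
            if info.get("scope") == "global" and "local" in info
        ]
        if addrs:
            result[link["ifname"]] = addrs
    return result


def collect() -> dict:
    """Query the current network namespace (needs the `ip` binary)."""
    out = subprocess.run(
        ["ip", "-j", "addr", "show"],
        check=True, capture_output=True, text=True,
    ).stdout
    return global_addresses(out)


if __name__ == "__main__":
    # Hand-written sample standing in for real `ip -j addr show` output.
    sample = json.dumps([
        {"ifname": "lo",
         "addr_info": [{"family": "inet", "local": "127.0.0.1", "scope": "host"}]},
        {"ifname": "ens192",
         "addr_info": [{"family": "inet", "local": "10.0.0.5", "scope": "global"}]},
    ])
    print(global_addresses(sample))
```

As the second point notes, any such snapshot is only valid for the moment it was taken, so an agent would have to re-run `collect()` periodically.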

@kpettijohn kpettijohn commented Jun 24, 2019

Thanks for the feedback @lucab. After looking into things further I think I should be able to get by using the toolbox provided by vmware/govmomi for my use case.

Here is a basic usage example that, when built and run on a VM hosted by ESX, registers itself as a Guest Managed instance of VMware Tools and reports the default IP address.

fedora-coreos-vmtools


@Reamer Reamer commented Jul 24, 2019

It would be nice if FCOS also shipped with open-vm-tools.
open-vm-tools is needed for the K8s vSphere storage provider.

@vrutkovs vrutkovs mentioned this issue Nov 15, 2019
4 of 4 tasks complete

@kai-uwe-rommel kai-uwe-rommel commented Jan 24, 2020

Also, open-vm-tools would allow a graceful shutdown or reboot of a FCOS VM from the vSphere client or CLI/API.


@alexlllll alexlllll commented Feb 5, 2020

I just tried the OVA deployment and I don't see any open-vm-tools. So what happened?

Contributor

@LorbusChris LorbusChris commented Feb 6, 2020

Including it in FCOS by default is not really what we want, because, and I'm quoting from the first comment:

In #12 we decided that we'd like to try to not ship cloud agents.

You should be able to install it with:

rpm-ostree install open-vm-tools

@kai-uwe-rommel kai-uwe-rommel commented Feb 6, 2020

@LorbusChris Great, thanks for the information ... BTW, is there documentation about this?

Is there a way to have this "rpm-ostree install open-vm-tools" executed automatically during installation? I did not see anything in the description of the Ignition files that would allow this. Please correct me if I'm wrong. Thanks!


@zotrix zotrix commented Feb 11, 2020

This hack is more useful than "rpm-ostree install open-vm-tools": no reboot is needed, and it can be applied at provisioning time.

systemd:
  units:
    - name: open-vm-tools.service
      enabled: true
      contents: |
        [Unit]
        Description=Open VM Tools
        After=network-online.target
        Wants=network-online.target

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/bin/podman kill open-vm-tools
        ExecStartPre=-/bin/podman rm open-vm-tools
        ExecStartPre=/bin/podman pull open-vm-tools:fc31
        ExecStart=/bin/podman run -e SYSTEMD_IGNORE_CHROOT=1 -v  /proc/:/hostproc/ -v /sys/fs/cgroup:/sys/fs/cgroup -v /run/systemd:/run/systemd --pid=host --net=host --ipc=host --uts=host --rm  --privileged --name open-vm-tools open-vm-tools:fc31

        [Install]
        WantedBy=multi-user.target
@lucab lucab added platform/vmware and removed cloud* labels Feb 18, 2020

@varesa varesa commented Feb 19, 2020

@zotrix Is that container image available somewhere or was that just a proposal?


@dthomastx dthomastx commented Feb 25, 2020

I have been using Terraform's 'wait for IP' functionality, which relies on open-vm-tools, when provisioning RHCOS, and it doesn't work now. It would be nice if it did.


@zotrix zotrix commented Feb 26, 2020

@zotrix Is that container image available somewhere or was that just a proposal?

@varesa It's in a private registry, but the Dockerfile is like the one in this repo:
https://github.com/projectatomic/atomic-system-containers/tree/master/open-vm-tools-centos


@kai-uwe-rommel kai-uwe-rommel commented Mar 13, 2020

I finally came up with this:

systemd:
  units:
    - name: postinstall.service
      enabled: true
      contents: |
        [Unit]
        Description=Post Installation
        After=network-online.target
        Wants=network-online.target

        [Service]
        TimeoutStartSec=0
        ExecStart=/bin/bash -c "/bin/rpm-ostree install open-vm-tools nrpe && reboot || /bin/true"

        [Install]
        WantedBy=multi-user.target

This also makes it easy to install additional packages in the same step.


@straffalli straffalli commented Mar 18, 2020

Hi,

We were using CoreOS as the underlying OS for Kubernetes clusters. We are now trying to move to FCOS and have run into this issue with open-vm-tools: the workaround requires adding a unit just for installation plus a reboot step, which is not very handy.

As already mentioned, this is required for graceful VM shutdown, the vSphere storage provider for K8s, reporting guest metrics, etc.

In #12 we decided that we'd like to try to not ship cloud agents.

If this is a strict decision, is it possible to re-apply an Ignition file after first boot on FCOS (as on CoreOS), so that a tool like Packer can be used to add the open-vm-tools package?

Thanks


@varesa varesa commented Mar 18, 2020

@straffalli Why not run it in a container (a Dockerfile is linked above)? It works fine for us. Just add a unit for that; there's no need to reboot or run Ignition twice.


@remoe remoe commented May 28, 2020

@varesa does shutdown work on your setup?

govc vm.power -s -force <name>


@varesa varesa commented Jun 14, 2020

@remoe

It does work.

esa@desktop $ govc vm.power -k -s k8s-test-master-1
Shutdown guest VirtualMachine:vm-38027... OK

The VM starts shutting down and stops after a minute or so.


@Amos-85 Amos-85 commented Nov 1, 2020

After installing open-vm-tools with rpm-ostree and rebooting the machine, I'm getting these errors in /var/log/vmware-vmtoolsd-root.log:

[2020-11-01T17:49:53.450Z] [ message] [vmsvc] Log caching is enabled with maxCacheEntries=4096.
[2020-11-01T17:49:53.450Z] [ message] [vmsvc] Core dump limit set to -1
[2020-11-01T17:49:53.450Z] [ message] [vmtoolsd] Tools Version: 11.1.5.22735 (build-16724464)
[2020-11-01T17:49:53.699Z] [ message] [vmsvc] Cannot load message catalog for domain 'hgfsServer', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.699Z] [ message] [vmtoolsd] Plugin 'hgfsServer' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vix] QueryVGAuthConfig: vgauth usage is: 1
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'vix', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'vix' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'appInfo', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'appInfo' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'deployPkg', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'deployPkg' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'guestInfo', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'guestInfo' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'powerops', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'powerops' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'timeSync', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'timeSync' initialized.
[2020-11-01T17:49:53.700Z] [ message] [vmsvc] Cannot load message catalog for domain 'vmbackup', language 'C', catalog dir '/usr/share/open-vm-tools'.
[2020-11-01T17:49:53.700Z] [ message] [vmtoolsd] Plugin 'vmbackup' initialized.
[2020-11-01T17:49:53.704Z] [ message] [vix] VixTools_ProcessVixCommand: command 62

The ESXi version is 6.7.

Has anyone faced this before?
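
One way to check whether output like this contains real errors is to tally entries by severity. The sketch below assumes the `[timestamp] [level] [domain] message` layout seen in the excerpt above; `parse` and `level_counts` are hypothetical helpers. For the lines shown here, every entry is at `message` level.

```python
import re
from collections import Counter

# [timestamp] [ level] [domain] text   (the domain part is treated as optional)
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\[\s*(?P<level>\w+)\]\s+"
    r"(?:\[(?P<domain>[^\]]+)\]\s+)?(?P<msg>.*)$"
)


def parse(line: str):
    """Return a dict with ts, level, domain and msg, or None if no match."""
    match = LINE.match(line.strip())
    return match.groupdict() if match else None


def level_counts(lines) -> Counter:
    """Count log entries per severity level, skipping unparsable lines."""
    return Counter(p["level"] for p in map(parse, lines) if p)


if __name__ == "__main__":
    sample = [
        "[2020-11-01T17:49:53.450Z] [ message] [vmsvc] Core dump limit set to -1",
        "[2020-11-01T17:49:53.699Z] [ message] [vmtoolsd] Plugin 'hgfsServer' initialized.",
    ]
    print(level_counts(sample))
```

Against a real file, something like `level_counts(open("/var/log/vmware-vmtoolsd-root.log"))` would give the same tally.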


@remoe remoe commented Nov 1, 2020

@Amos-85, have you tried #503 (comment)?


@Amos-85 Amos-85 commented Nov 1, 2020

@remoe Not yet.
Should it work within a container, or have I misconfigured something in the VM template?


@Amos-85 Amos-85 commented Nov 2, 2020

@remoe
I've run open-vm-tools in a container, as in the solution you mentioned, but I see the same output inside the container in the log /var/log/vmware-vmtoolsd-root.log.

It's a very odd issue.


@remoe remoe commented Nov 2, 2020

@Amos-85 It works with "fedora-coreos-32.20200824.3.0" on ESXi 6.7. I don't have this issue.


@Amos-85 Amos-85 commented Nov 2, 2020

I'm not sure it's related to the issue, but which guest OS version did you choose in the VM template?


@remoe remoe commented Nov 2, 2020

It is the one selected by the official FCOS OVA template.


@Amos-85 Amos-85 commented Nov 3, 2020

I've now succeeded in running open-vm-tools with the container solution @remoe mentioned, but I'm getting another error, related to the Perl package, from open-vm-tools in /var/log/vmware-imc/toolsDeployPkg.log:

[root@localhost log]# cat vmware-imc/toolsDeployPkg.log 
[2020-11-03T13:34:05.591Z] [   debug] ## Starting deploy pkg operation
[2020-11-03T13:34:05.591Z] [   debug] Deploying /var/run/201b5b4e/imcf-j3KP2b
[2020-11-03T13:34:05.591Z] [    info] Initializing deployment module.

[2020-11-03T13:34:05.591Z] [    info] Cleaning old state files.

[2020-11-03T13:34:05.591Z] [    info] EXIT STATE 'INPROGRESS'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Error removing lock '/var/log/.vmware-deploy.INPROGRESS'.(No such file or directory)'.

[2020-11-03T13:34:05.591Z] [    info] EXIT STATE 'Done'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Error removing lock '/var/log/.vmware-deploy.Done'.(No such file or directory)'.

[2020-11-03T13:34:05.591Z] [    info] EXIT STATE 'ERRORED'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Error removing lock '/var/log/.vmware-deploy.ERRORED'.(No such file or directory)'.

[2020-11-03T13:34:05.591Z] [   debug] Setting deploy error: 'Success.'.

[2020-11-03T13:34:05.591Z] [    info] Deploying cabinet file '/var/run/201b5b4e/imcf-j3KP2b'.

[2020-11-03T13:34:05.591Z] [    info] Transitioning from state '(null)' to state 'INPROGRESS'.

[2020-11-03T13:34:05.591Z] [    info] ENTER STATE 'INPROGRESS'.

[2020-11-03T13:34:05.592Z] [    info] Reading cabinet file '/var/run/201b5b4e/imcf-j3KP2b' and will extract it to '/var/run/.vmware-imgcust-dD4kPZE'.

[2020-11-03T13:34:05.592Z] [    info] Flags in the header: 0.

[2020-11-03T13:34:05.592Z] [    info] Original deployment command: '/bin/sh /tmp/.vmware/linux/deploy/scripts/customize.sh /tmp/.vmware/linux/deploy/cust.cfg'.

[2020-11-03T13:34:05.592Z] [    info] Actual deployment command: '/bin/sh /var/run/.vmware-imgcust-dD4kPZE/scripts/customize.sh /var/run/.vmware-imgcust-dD4kPZE/cust.cfg'.

[2020-11-03T13:34:05.592Z] [    info] Extracting package files.

[2020-11-03T13:34:05.610Z] [   debug] Check if cust.cfg exists.

[2020-11-03T13:34:05.610Z] [    info] cust.cfg is found in '/var/run/.vmware-imgcust-dD4kPZE' directory.

[2020-11-03T13:34:05.610Z] [   debug] Command to exec : '/usr/bin/cloud-init'.

[2020-11-03T13:34:05.610Z] [    info] sizeof ProcessInternal is 56

[2020-11-03T13:34:05.610Z] [    info] Returning, pending output from stdout
[2020-11-03T13:34:05.610Z] [    info] Returning, pending output from stderr
[2020-11-03T13:34:05.633Z] [    info] Process exited normally after 0 seconds, returned 127
[2020-11-03T13:34:05.633Z] [    info] No more output from stdout
[2020-11-03T13:34:05.633Z] [    info] No more output from stderr
[2020-11-03T13:34:05.633Z] [    info] Customization command output: ''.

[2020-11-03T13:34:05.633Z] [   error] Customization command failed with exitcode: 127, stderr: ''.

[2020-11-03T13:34:05.633Z] [    info] cloud-init is not installed.

[2020-11-03T13:34:05.633Z] [    info] Executing traditional GOSC workflow.

[2020-11-03T13:34:05.633Z] [   debug] Command to exec : '/bin/sh'.

[2020-11-03T13:34:05.633Z] [    info] sizeof ProcessInternal is 56

[2020-11-03T13:34:05.633Z] [    info] Returning, pending output from stdout
[2020-11-03T13:34:05.633Z] [    info] Returning, pending output from stderr
[2020-11-03T13:34:05.645Z] [    info] Process exited normally after 0 seconds, returned 1
[2020-11-03T13:34:05.645Z] [    info] Saving output from stdout
[2020-11-03T13:34:05.645Z] [    info] No more output from stdout
[2020-11-03T13:34:05.645Z] [    info] No more output from stderr
[2020-11-03T13:34:05.645Z] [    info] Customization command output: 'GOSC_DIR: /var/run/.vmware-imgcust-dD4kPZE/scripts
OS_KERNEL: Linux
ERROR: Guest Customization is not supported on systems not having Perl installed.
'.

[2020-11-03T13:34:05.645Z] [   error] Customization command failed with exitcode: 1, stderr: ''.

[2020-11-03T13:34:05.645Z] [   error] Customization process returned with error.

[2020-11-03T13:34:05.645Z] [   debug] Deployment result = 1.

[2020-11-03T13:34:05.645Z] [    info] Setting 'unknown' error status in vmx.

[2020-11-03T13:34:05.646Z] [    info] Transitioning from state 'INPROGRESS' to state 'ERRORED'.

[2020-11-03T13:34:05.646Z] [    info] ENTER STATE 'ERRORED'.

[2020-11-03T13:34:05.646Z] [    info] EXIT STATE 'INPROGRESS'.

[2020-11-03T13:34:05.646Z] [   debug] Setting deploy error: 'Deployment failed.The forked off process returned error code.'.

[2020-11-03T13:34:05.646Z] [   error] Deployment failed.The forked off process returned error code.

[2020-11-03T13:34:05.646Z] [    info] Launching cleanup.

[2020-11-03T13:34:05.646Z] [   debug] Command to exec : '/bin/rm'.

[2020-11-03T13:34:05.646Z] [    info] sizeof ProcessInternal is 56

[2020-11-03T13:34:05.647Z] [    info] Returning, pending output from stdout
[2020-11-03T13:34:05.647Z] [    info] Returning, pending output from stderr
[2020-11-03T13:34:05.674Z] [    info] Process exited normally after 0 seconds, returned 0
[2020-11-03T13:34:05.674Z] [    info] No more output from stdout
[2020-11-03T13:34:05.674Z] [    info] No more output from stderr
[2020-11-03T13:34:05.674Z] [    info] Customization command output: ''.

[2020-11-03T13:34:05.674Z] [    info] sSkipReboot: 'false', forceSkipReboot 'false'.

[2020-11-03T13:34:05.674Z] [   error] Deploy error: 'Deployment failed.The forked off process returned error code.'.

[2020-11-03T13:34:05.674Z] [   error] Package deploy failed in DeployPkg_DeployPackageFromFile
[2020-11-03T13:34:05.674Z] [   debug] ## Closing log

Only after installing Perl did everything work as expected.
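
Read top to bottom, the log suggests a two-step fallback: deployPkg first tries `/usr/bin/cloud-init`, then falls back to the traditional GOSC scripts, which require Perl. The sketch below is a simplified model of that observed behavior, not the actual open-vm-tools logic; `customization_path` is a hypothetical name.

```python
import shutil


def customization_path(which=shutil.which) -> str:
    """Guess which guest customization path a system could take.

    Simplified model based only on the log above:
    cloud-init first, then the Perl-based GOSC scripts.
    """
    if which("cloud-init"):
        return "cloud-init"
    if which("perl"):
        return "traditional-gosc"
    return "unsupported"


if __name__ == "__main__":
    # Result depends on what is installed where this runs.
    print(customization_path())
```

Running this inside the container (or on the host) before cloning a VM with guest customization gives an early hint that the `ERROR: Guest Customization is not supported` path from the log would be hit.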
