vmware_guest - after cloning nic is not connected to network #45834

Closed
arno1704 opened this issue Sep 19, 2018 · 29 comments
Labels
affects_2.6, bot_closed, bug, cloud, collection:community.vmware, collection, module, needs_collection_redirect, support:community, vmware

Comments

@arno1704

SUMMARY

When cloning a template, the "Connect at power on" checkbox of the NIC is not set in the created VM, so the VM boots without a working network connection.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

vmware_guest

ANSIBLE VERSION
ansible 2.6.4
  config file = /mnt/d/Git/GitLab/Ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /root/.local/lib/python2.7/site-packages/ansible
  executable location = /root/.local/bin/ansible
  python version = 2.7.6 (default, Nov 23 2017, 15:49:48) [GCC 4.8.4]
CONFIGURATION
DEFAULT_HOST_LIST(/mnt/d/Git/GitLab/Ansible/ansible.cfg) = [u'/mnt/d/Git/GitLab/Ansible/testhosts.yml']
OS / ENVIRONMENT

vCenter 6.7 - using distributed switches
ESXi 6.7 and 6.0U3
Template-OS: Windows Server 2016 Core
VMWare Tools: 10304 (10.2.0)

STEPS TO REPRODUCE

Run the playbook attached below:

  tasks:
    - name: Clone Virtual Machine
      vmware_guest:
        hostname: "{{ vcenter_hostname }}"
        username: "{{ vcenter_user }}"
        password: "{{ vcenter_pass }}"
        validate_certs: False
        name: "{{ inventory_hostname }}"
        folder: "WindowsServer"
        datacenter: "{{ datacenter }}"
        esxi_hostname: "{{ esxi_host }}"
        template: "Win2016_Core_Eng_BaseImage"
        state: poweredon
        networks:
        - name: "{{ network }}"
          ip: "{{ ip_address }}"
          netmask: "{{ ip_netmask }}"
          gateway: "{{ ip_gateway }}"
          start_connected: True
          dns_servers:
          - 10.30.0.11
          - 10.30.0.12
        customization:
          autologon: True
          joindomain: fhwn.ac.at
          domainadmin: administrator
          domainadminpassword: '{{ dom_admin_pass }}'
          dns_servers:
          - 10.30.0.11
          - 10.30.0.12
          password: "{{ local_admin_pwd }}"
          runonce:
          - powershell.exe -ExecutionPolicy Unrestricted -File C:\Scripts\ConfigureRemotingForAnsible.ps1 -ForceNewSSLCert -EnableCredSSP
EXPECTED RESULTS

A Cloned VM with working network-connections joining the domain

ACTUAL RESULTS

The VM gets cloned correctly and the NICs are all on the right vSwitch. But the "Connect at PowerOn" checkbox is not set, so when the VM starts it gets no network connection and the task fails.
If I set state: poweredoff, check "Connect at PowerOn" on the NIC and boot the VM, everything works as expected.
It makes no difference whether a NIC is present in the template or not; the result is the same.

changed: [adfsproxy01] => {
    "changed": true,
    "instance": {
        "annotation": "",
        "current_snapshot": null,
        "customvalues": {},
        "guest_consolidation_needed": false,
        "guest_question": null,
        "guest_tools_status": "guestToolsNotRunning",
        "guest_tools_version": "10304",
        "hw_cores_per_socket": 1,
        "hw_datastores": [
            "BigData2_Store2"
        ],
        "hw_esxi_host": "********",
        "hw_eth0": {
            "addresstype": "assigned",
            "ipaddresses": null,
            "label": "Network adapter 1",
            "macaddress": "00:50:56:9c:46:f9",
            "macaddress_dash": "00-50-56-9c-46-f9",
            "portgroup_key": "dvportgroup-642",
            "portgroup_portkey": "250",
            "summary": "DVSwitch: 64 c5 1c 50 95 a9 4a 9f-5a 08 8b 44 54 f9 41 92"
        },
        "hw_files": [
            "[BigData2_Store2] adfsproxy01/adfsproxy01.vmx",
            "[BigData2_Store2] adfsproxy01/adfsproxy01.nvram",
            "[BigData2_Store2] adfsproxy01/adfsproxy01.vmsd",
            "[BigData2_Store2] adfsproxy01/adfsproxy01.vmdk"
        ],
        "hw_folder": "/*****",
        "hw_guest_full_name": null,
        "hw_guest_ha_state": null,
        "hw_guest_id": null,
        "hw_interfaces": [
            "eth0"
        ],
        "hw_is_template": false,
        "hw_memtotal_mb": 2048,
        "hw_name": "*******",
        "hw_power_status": "poweredOff",
        "hw_processor_count": 2,
        "hw_product_uuid": "421c848d-b854-fc5a-d934-6fd930cd5b2d",
        "hw_version": "vmx-11",
        "instance_uuid": "501c1b84-69f0-3797-ae45-cf1d8ebdc6ba",
        "ipv4": null,
        "ipv6": null,
        "module_hw": true,
        "snapshots": []
    },
    "invocation": {
        "module_args": {
            "annotation": null,
            "cdrom": {},
            "cluster": null,
            "customization": {
                "autologon": true,
                "dns_servers": [
                    "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                    "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER"
                ],
                "domainadmin": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                "domainadminpassword": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                "joindomain": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                "runonce": [
                    "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER"
                ]
            },
            "customization_spec": null,
            "customvalues": [],
            "datacenter": "*******",
            "disk": [],
            "esxi_hostname": "*******.server.********",
            "folder": "WindowsServer",
            "force": false,
            "guest_id": null,
            "hardware": {},
            "hostname": "********",
            "is_template": false,
            "linked_clone": false,
            "name": "*******",
            "name_match": "first",
            "networks": [
                {
                    "dns_servers": [
                        "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                        "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER"
                    ],
                    "gateway": "10.22.0.1",
                    "ip": "10.22.0.101",
                    "name": "Dis_DMZ",
                    "netmask": "255.255.0.0",
                    "start_connected": true,
                    "type": "static"
                }
            ],
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "port": 443,
            "resource_pool": null,
            "snapshot_src": null,
            "state": "poweredoff",
            "state_change_timeout": 0,
            "template": "Win2016_Core_Eng_BaseImage",
            "username": "********@vsphere.local",
            "uuid": null,
            "validate_certs": false,
            "vapp_properties": [],
            "wait_for_ip_address": false
        }
    }
}
@ansibot
Contributor

ansibot commented Sep 19, 2018

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

@ansibot
Contributor

ansibot commented Sep 19, 2018

Hi @arno1704,

Thank you for the issue, just so you are aware we have a dedicated Working Group for vmware.
You can find other people interested in this in #ansible-vmware on Freenode IRC
For more information about communities, meetings and agendas see https://github.com/ansible/community


@ansibot added the affects_2.6, bug, cloud, module, needs_triage, support:core, and vmware labels Sep 19, 2018
@jborean93 removed the needs_triage label Sep 20, 2018
@cigamit
Contributor

cigamit commented Sep 20, 2018

Most likely VM customization is failing on your host. You will need to find out why (there should be a log file on the server itself). VMware disables the NIC until customization completes and re-enables it once done. This is all handled by VMware and is not done by the module itself.

@arno1704
Author

Do you have any idea in which log file I could find this info? I found nothing...

@cigamit
Contributor

cigamit commented Sep 20, 2018

A quick google tells me to look at
C:\WINDOWS\TEMP\vmware-imc\toolsDeployPkg.txt
or
C:\WINDOWS\TEMP\customize-guest.log

@arno1704
Author

Thanks for the tip, but these logs are only created after I power on the VM. That's too late, because the error has already happened by then.
If I clone the VM with one existing NIC and without the networks: and customization: sections, everything is OK: the "Connect at power on" checkbox is checked.
If I add the networks: section to the same VM (without the customization: section), the "Connect at power on" checkbox of the NIC is not checked after cloning has finished, so the VM would start with no network available.

So the question remains: if this is a VMware bug, in which log file can I find it? It must be somewhere on the vCenter or the ESXi hosts... But I think Ansible is not setting the checkbox even though I set start_connected to true.

My networks section:
- name: "{{ network }}"
  ip: "{{ ip_address }}"
  netmask: "{{ ip_netmask }}"
  gateway: "{{ ip_gateway }}"
  start_connected: True   # I think this should check the "Connect at power on" checkbox
  dns_servers:
  - 10.30.0.11
  - 10.30.0.12

@michalss

michalss commented Sep 24, 2018

I have the same issue and nothing helps. If I do it manually from vCenter it works fine, so I don't think this is a VMware bug. I also see a lot of other issues with a Linux template: after cloning, it won't boot and gives me the error "you need to load the kernel first". This is very frustrating.. :( Please help.

@ansibot added the support:community label and removed the support:core label Oct 2, 2018
@scottsisco

I too am experiencing this issue. I have an Ubuntu 18.04 template and when I use ansible to clone the template I am left with the "Connect At Power On" box un-checked under settings -> network adapter 1 for the new VM. When I use vCenter to clone from the template this is not a problem and the "Connect At Power On" box remains checked under settings -> network adapter 1 for the new VM.

However, when I use Ansible to clone from a Red Hat 7 template that I use frequently, there are no problems whatsoever.

@cigamit
Contributor

cigamit commented Nov 14, 2018

@scottsisco What vCenter version? (Different versions of vCenter only support specific OSes.) Do you have open-vm-tools and perl installed? Have you checked the VMware Tools log on the new VM?

@arno1704
Author

When I check the VMware Tools log there is nothing in it. But can there even be anything in the logs? The checkbox is unchecked even if I don't power on the VM, so VMware Tools is not running at that point. The error happens before the machine is switched on, right in the clone process.

@cigamit
Contributor

cigamit commented Nov 15, 2018

@arno1704 That comment was for scottsisco's issue, not yours.

For yours, customization will not happen until the VM powers on and VMware Tools loads, so if you aren't turning it on, it will not check the box. That is a requirement of VMware's customization spec, not ours.

@arno1704
Author

Yes, I know that this was for scottsisco,
but it's the same for me. This checkbox gets set before I turn on the machine: if I clone a machine without customization, the checkbox is checked whether I turn it on or not.

@ansibot
Contributor

ansibot commented Nov 24, 2018

cc @ckotte

@aaronk1
Contributor

aaronk1 commented Dec 14, 2018

@arno1704 I've had similar/same issue for a while now. Only workaround I've found is below (these are the manual steps, which of course I automated):

  • Import my OVA to vCenter
  • Power on the VM imported
  • Wait for the machine to reinstall the NIC and boot into Windows fully.
  • Wait for VMware Tools to get an IP (169.x).
  • Shut down the VM cleanly
  • Convert to template
  • Clone away using any method you want (I'm using vmware_guest)

I think the issue in my case is that the VM has no NIC to finish customization with, since the template had its NIC removed when it was inside another hypervisor. Hope that helps.
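
The power-on, shut-down and convert-to-template steps of this workaround could be automated roughly as follows. This is only a sketch, not aaronk1's actual automation: the VM name "imported-ova-vm" and the timeout value are hypothetical, the connection variables are borrowed from the playbook in the issue description, and it is an assumption that wait_for_ip_address returns once VMware Tools reports the 169.x address.

```yaml
# Sketch only. "imported-ova-vm" is a hypothetical name for the VM imported from the OVA.
- name: Power on the imported VM and wait for VMware Tools to report an IP
  vmware_guest:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: False
    name: "imported-ova-vm"
    state: poweredon
    wait_for_ip_address: True

- name: Shut down the guest cleanly and wait for it to power off
  vmware_guest:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: False
    name: "imported-ova-vm"
    state: shutdownguest
    state_change_timeout: 300   # seconds; assumed value, adjust as needed

- name: Convert the VM to a template
  vmware_guest:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: False
    name: "imported-ova-vm"
    is_template: True
```

The OVA import itself is left out, since the comment does not say how it was done.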

@sbonds
Contributor

sbonds commented Jan 15, 2019

As others have said, this appears to be caused by a failure within the VMware tools Guest Customization. However there are two issues with this:

  1. Ansible thinks the deployment succeeded when it actually failed
  2. Finding out why the guest customization failed is really tricky

I can help with the second one by describing what I did to track down a very similar issue.

The process I used:

  1. Enable VMware tools debugging within the source template
  2. Deploy the template to a test VM
  3. Check the debug logs on the test VM

Enable VMware tools debugging

The general process is here: https://kb.vmware.com/s/article/1007873

I was debugging Linux guests, so I used this in /etc/vmware-toolbox/tools.conf (verbatim):

[logging]
log = true

vmtoolsd.level = debug
vmtoolsd.handler = file
vmtoolsd.data = /tmp/vmtoolsd.${USER}.log

vmsvc.level = debug
vmsvc.handler = file
vmsvc.data = /tmp/vmsvc.log
vmusr.level = debug
vmusr.handler = file
vmusr.data = /tmp/vmusr.${USER}.log

toolboxcmd.level = debug
toolboxcmd.handler = file
toolboxcmd.data = /tmp/vmtoolboxcmd.log

To get that in the template I converted it to a VM, booted it, changed the above file, shut it down, then converted it from a VM back to a template.

Deploy the template to a test VM

Easy! Just use your existing Ansible playbook for this.

Check the debug logs on the test VM

OK, there's a lot of data in /tmp now. The important bits for guest customization will be labelled deployPkg. There are lots of places with incorrect or outdated info about the location of the guest customization logs, so presumably VMware changes this a lot. On a working system where guest customization was triggered correctly I saw logs like these:

[Jan 15 09:35:30.303] [ debug] [vmsvc] RpcIn: sending 63 bytes
[Jan 15 09:35:30.360] [ debug] [vmsvc] RpcIn: received 55 bytes, content:"deployPkg.deploy /tmp/vmware-root/34693895/imcf-h0TA52\0a"
...
[Jan 15 09:35:30.362] [ debug] [vmsvc] Rpci: Sending request='deployPkg.update.state 4 0 /var/log/vmware-imc/toolsDeployPkg.log'

In my specific case, I saw the first message but not the second on my nonworking server. I also saw this message only on the nonworking server:

RpcChannel: Unknown Command 'f': Handler not registered

Google only showed me the source code as hits on this specific error. That's when you know it's not going to be a good day.

https://github.com/vmware/open-vm-tools/blob/master/open-vm-tools/lib/rpcChannel/rpcChannel.c

However, I did notice in my case that some of the follow-up logs on my working server related to HgfsChannelGuest_Receive messages and none appeared on my non-working server.

Probably because none of the HGFS VMware tools packages were installed.

So in my case, installing those packages resolved the guest customization issue. However, I hope that the above techniques and methods are useful for more possible cases of brokenness than just my specific HGFS issue.

And it would be REALLY nice if Ansible could tell when the guest customization has failed and report this as a failure.
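
The log check described in this comment could be scripted along these lines. It is a rough sketch under several assumptions: the tasks run against the Linux test VM itself (so it must be reachable, for example after connecting the NIC by hand), the debug log is at /tmp/vmsvc.log as configured above, and the variable name deploypkg_hits is made up for illustration.

```yaml
# Sketch only: count the two deployPkg messages that appeared on the working system.
- name: Count guest customization messages in the VMware Tools debug log
  command: grep -c -F "{{ item }}" /tmp/vmsvc.log
  register: deploypkg_hits        # hypothetical variable name
  changed_when: false
  failed_when: deploypkg_hits.rc > 1   # grep exits 1 when nothing matches, 2 on real errors
  loop:
    - deployPkg.deploy
    - deployPkg.update.state

- name: Fail if either message is missing
  assert:
    that: item.stdout | int > 0
    msg: "No '{{ item.item }}' message found in /tmp/vmsvc.log"
  loop: "{{ deploypkg_hits.results }}"
```

A missing deployPkg.update.state line would match the symptom described for the nonworking server; it does not by itself say why customization stopped.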

@arruko

arruko commented Jan 22, 2019

Installing the govmomi/toolbox Go package into the VM template might be a useful workaround.

For instance, I'm not getting any issues after adding these tasks during VM template creation (before the vmware_guest module runs). I'm testing some VMs on a development pool.

Has anyone tried this approach?

OS / ENVIRONMENT
vCenter 6.7
ESXi 6.7
Template-OS: Centos6/Ubuntu16.04/
VMWare Tools: 10304 (10.2.0)
Ansible: 2.7.5
Go: go1.8.1 linux/amd64

Playbook VM Templates

        - name: Go facts
          set_fact:
            goroot: /usr/local/go
            gopath: "{{ (lookup('env', 'HOME') + '/go') | realpath }}"

        - name: install vmware tools, git
          package:
            name: "{{ item }}"
            state: latest
          with_items:
            - open-vm-tools
            - git

        - name: check govc install
          stat:
            path: "{{ gopath }}/bin/govc"
          register: govc_stat

        - name: go get govc
          command: "{{ goroot }}/bin/go get -u github.com/vmware/govmomi/govc"
          environment:
            GOPATH: "{{ gopath }}"
            GOROOT: "{{ goroot }}"
          when: not govc_stat.stat.exists

        - name: go install govc
          command: "{{ goroot }}/bin/go install github.com/vmware/govmomi/govc"
          environment:
            GOPATH: "{{ gopath }}"
            GOROOT: "{{ goroot }}"
          when: not govc_stat.stat.exists

        - name: go install govc/toolbox
          command: "{{ goroot }}/bin/go install github.com/vmware/govmomi/toolbox"
          environment:
            GOPATH: "{{ gopath }}"
            GOROOT: "{{ goroot }}"
          when: not govc_stat.stat.exists

        - name: remove git
          package:
            name: git
            state: absent

@L1ghtman2k

L1ghtman2k commented Mar 26, 2019

What I have noticed a lot is that things break and NICs don't connect when I run too many clone operations asynchronously. When I go one by one, that doesn't happen nearly as much. I also noticed that, when going one by one, the only machines that fail are the ones showing the warning "A newer version of VMware Tools is available for this virtual machine." I have encountered this on multiple occasions, and what is weird is that the warning might appear on only some of the machines, even though all of them are clones.
I wasn't able to test this much, as I don't know how to turn off that warning other than by editing flags on the VM itself. But the flags solution is useless, since I need the warning to be gone before I clone the VM.

@rjouhann

Look, this might be your issue: https://kb.vmware.com/s/article/59444

@angystardust
Contributor

Thanks for the link @rjouhann but I have the same issue without having NSX-T in my environment

@Tomorrow9
Contributor

Hi @arno1704, the network adapter not being connected might be caused by guest customization in the OS failing at some step. Here are some suggestions:
(1) Collect the following logs in the Windows OS so we can check for failures during customization:

  • %WINDIR%\Temp\vmware-imc*.log
  • %WINDIR%\System32\Sysprep\Panther*
  • %WINDIR%\Panther*
  • %WINDIR%\Panther\Unattendgc*
  • %WINDIR%\Debug*

(2) A new parameter, "wait_for_customization", was added in Ansible 2.8 to check for the customization succeeded event. You can give it a try. Thanks.
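
Applied to the playbook from the issue description, the wait_for_customization suggestion might look like the sketch below. It assumes Ansible 2.8 or later and reuses the variables from the original playbook; the customization section is shortened here for brevity.

```yaml
# Sketch only: the original clone task with wait_for_customization added.
- name: Clone Virtual Machine and wait for guest customization
  vmware_guest:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: False
    name: "{{ inventory_hostname }}"
    folder: "WindowsServer"
    datacenter: "{{ datacenter }}"
    esxi_hostname: "{{ esxi_host }}"
    template: "Win2016_Core_Eng_BaseImage"
    state: poweredon
    wait_for_customization: True   # needs Ansible 2.8 or later
    networks:
    - name: "{{ network }}"
      ip: "{{ ip_address }}"
      netmask: "{{ ip_netmask }}"
      gateway: "{{ ip_gateway }}"
      start_connected: True
    customization:
      password: "{{ local_admin_pwd }}"
```

With this set, a customization that never reports success should surface as a task failure instead of a silently disconnected NIC, which is the behaviour sbonds asked for earlier in the thread.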

@rhydian76
Contributor

I had been trying to resolve the same issue myself, but I have found a solution:

Note that these are my experiences with using Ansible to clone a Windows 2016 Server guest.

As has been mentioned in this post, the issue is to do with VMware customisation and not Ansible itself. I found that the VMware image customisation was failing, and thanks to the post from @Tomorrow9 above, it gave me some useful pointers for troubleshooting.

The source template I was cloning from had already been sysprepped, and as a result, VMware customisation was failing when the machine was spun up from this template, which I found from the logs in %WINDIR%\Temp\vmware-imc*.log. To resolve this, I converted the source template into a VM and booted it, let it run through the setup steps, then when it was done and back up, I shut it down and converted it back into a template so it was effectively a non-sysprepped image. I then re-ran my playbook to create a VM from this template and all was well: the server came up, VMware image customisation ran (which runs sysprep on Windows), and once it had finished and rebooted I had a network-connected VM with the correct NIC configuration.

I have yet to deploy from a Linux template, but that is next, and obviously sysprep won't be part of this deployment.

Thanks for the previous posts above which helped me in finding the cause. I hope this helps some people out as it was massively frustrating me!

@ansibot
Contributor

ansibot commented May 4, 2019

cc @goneri

@aaronk1
Contributor

aaronk1 commented May 4, 2019

Have seen this also. Have you guys tried the following? Seems to be more successful for me:

  • Remove NIC from Template that you're cloning
  • Remove NIC from Windows (ensure that any hidden devices are removed also so if you remove the NIC from the VM before removing from Windows you'll have to show hidden devices to remove from Windows)
  • Run vmware_guest task to add the NIC
  • Use an E1000E NIC for Windows 2016 (vmxnet3 doesn't seem to work reliably for me; I get "device adfadfadfadf not started" in the Kernel-PNP logs)

Still testing, but this is what I'm finding works reliably. I see MAC Address conflicts in vCenter if I don't remove the NIC from vCenter. I see device can not be started in Event Viewer Kernel-PNP log if I don't use E1000E type NIC.
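
For the last two steps, adding the NIC with vmware_guest and selecting the E1000E adapter, a clone task could be sketched like this. It assumes the template no longer has a NIC, as described above, and reuses the variable names from the playbook in the issue description; the customization section is omitted.

```yaml
# Sketch only: clone from a NIC-less template and add an E1000E adapter.
- name: Clone Virtual Machine and add an E1000E NIC
  vmware_guest:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: False
    name: "{{ inventory_hostname }}"
    folder: "WindowsServer"
    datacenter: "{{ datacenter }}"
    esxi_hostname: "{{ esxi_host }}"
    template: "Win2016_Core_Eng_BaseImage"
    state: poweredon
    networks:
    - name: "{{ network }}"
      device_type: e1000e        # instead of the default vmxnet3
      ip: "{{ ip_address }}"
      netmask: "{{ ip_netmask }}"
      gateway: "{{ ip_gateway }}"
      start_connected: True
```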

@ansibot added the collection, collection:community.vmware, and needs_collection_redirect labels Apr 29, 2020
@ansibot
Contributor

ansibot commented Aug 16, 2020

Thank you very much for your interest in Ansible. Ansible has migrated much of the content into separate repositories to allow for more rapid, independent development. We are closing this issue/PR because this content has been moved to one or more collection repositories.

For further information, please see:
https://github.com/ansible/ansibullbot/blob/master/docs/collection_migration.md

@ansibot closed this as completed Aug 16, 2020
@ansible locked and limited conversation to collaborators Sep 13, 2020