win_domain - Unable to issue winrm commands following promotion to Domain Controller #39235

Closed
jtauke opened this issue Apr 24, 2018 · 24 comments · Fixed by #43703
Labels
affects_2.5 This issue/PR affects Ansible v2.5 bug This issue/PR relates to a bug. module This issue/PR relates to a module. support:core This issue/PR relates to code supported by the Ansible Engineering Team. windows Windows community

Comments

@jtauke
Contributor

jtauke commented Apr 24, 2018

ISSUE TYPE
  • Bug Report
COMPONENT NAME

win_domain

ANSIBLE VERSION
ansible 2.5.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/ansible/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
CONFIGURATION

Default configuration in place. Currently utilizing an updated version of the win_domain module that includes a netbios name option (this new version was merged to devel branch recently).

OS / ENVIRONMENT

CentOS 7, managing Windows Server 2016

SUMMARY
STEPS TO REPRODUCE

Shortened copy of the playbook from https://github.com/jborean93/ansible-windows

---
- name: get network adapter information for each host
  hosts: domain_controllers
  gather_facts: no
  tasks:
  - name: make absolutely sure the connection is active
    wait_for_connection:

  - name: get network connection name for private adapter
    win_shell: |
      foreach ($instance in (Get-CimInstance -ClassName Win32_NetworkAdapter -Filter "Netenabled='True'")) {
          $instance_config = Get-CimInstance -ClassName Win32_NetworkAdapterConfiguration -Filter "Index = '$($instance.Index)'"
          if ($instance_config.IPAddress -contains "{{ansible_host}}") {
              $instance.NetConnectionID
          }
      }
    changed_when: false
    register: network_connection_name

  - name: fail if we didn't get a network connection name
    fail:
      msg: Failed to get the network connection name
    when: network_connection_name.stdout_lines|count != 1

  - name: make sure network connection is set to private
    win_shell: |
      $manager = [Activator]::CreateInstance([Type]::GetTypeFromCLSID('DCB00C01-570F-4A9B-8D69-199FDBA5723B'))
      $connections = $manager.GetNetworkConnections()
      $connections | ForEach-Object { $_.GetNetwork().SetCategory(1) }

- name: create Domain Controller
  hosts: domain_controllers
  gather_facts: no
  tasks:
  - name: Install Active Directory Role and Management Tools
    win_feature:
      name: AD-Domain-Services
      state: present
      include_management_tools: True

  - name: set the DNS for the private adapter to localhost
    win_dns_client:
      adapter_names: '{{network_connection_name.stdout_lines[0]}}'
      ipv4_addresses: 127.0.0.1

  - name: ensure domain exists and DC is promoted as domain controllers
    win_domain2:
      dns_domain_name: '{{dns_domain_name}}'
      domain_netbios_name: '{{domain_netbios_name}}'
      safe_mode_password: '{{safe_mode_password}}'
    register: domain_result

  - name: Start netlogon service to ensure connectivity for reboot
    win_service:
      name: netlogon
      state: started

  - name: reboot DC if required
    win_reboot:
    when: domain_result.reboot_required
EXPECTED RESULTS

Create a new AD forest using the variables passed to it. Then reboot.

ACTUAL RESULTS

The server is promoted to a domain controller, but WinRM connectivity is lost before the reboot task can run, which causes the playbook to fail. I've tracked it down to the netlogon service not being started following promotion to a domain controller. If that service is started manually from the console, WinRM functionality is restored. The existing win_domain module suppresses the automatic reboot triggered by Install-ADDSForest. Microsoft doesn't recommend suppressing that reboot, because the server won't respond properly as a DC or a domain member until it has rebooted. I'm currently testing the module with the reboot allowed (to see how it reacts to being rebooted outside of win_reboot).

Anyone else have any other ideas?


@ansibot
Contributor

ansibot commented Apr 24, 2018

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

@ansibot
Contributor

ansibot commented Apr 24, 2018

@ansibot ansibot added affects_2.5 This issue/PR affects Ansible v2.5 bug This issue/PR relates to a bug. module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. support:core This issue/PR relates to code supported by the Ansible Engineering Team. windows Windows community labels Apr 24, 2018
@jhawkesworth
Contributor

Thanks for this. I'd spotted exactly the same behaviour but hadn't found a fix. There is a todo in the module about using an action plugin to handle the reboot (as win_reboot does), but presumably this would also have to switch user, as following the reboot you would need to connect as a domain user.

I guess it would not be too hard to get the module code to start netlogon service, but whether the auth would remain in place long enough for the module to return would have to be tested.

In my case I'm going from a workgroup to a DC, but I guess if you are going from a domain member to a DC the behaviour might be different, so I'm wary of suggesting that starting netlogon would fix all scenarios.

@ansibot ansibot removed the needs_triage Needs a first human triage before being processed. label Apr 24, 2018
@jtauke
Contributor Author

jtauke commented Apr 24, 2018 via email

@jhawkesworth
Contributor

I was proposing a manual start in the win_domain.ps1 module code after it has completed the Install-ADDSForest call. I guess that would be after the domain is created but still before the reboot, so the only way to find out would be to test.

@jtauke
Contributor Author

jtauke commented Apr 25, 2018

That seems to do the trick. I just tested unceremoniously slapping a "Start-Service -Name netlogon" after the last variable. Once that's done the playbook runs as expected.

If(-not $check_mode) {
    $sm_cred = ConvertTo-SecureString $safe_mode_admin_password -AsPlainText -Force

    $install_forest_args = @{
        DomainName=$dns_domain_name;
        SafeModeAdministratorPassword=$sm_cred;
        Confirm=$false;
        SkipPreChecks=$true;
        InstallDNS=$true;
        NoRebootOnCompletion=$true;
    }
    if ($database_path) {
        $install_forest_args.DatabasePath = $database_path
    }
    if ($sysvol_path) {
        $install_forest_args.SysvolPath = $sysvol_path
    }
    if ($domain_netbios_name) {
        $install_forest_args.DomainNetBiosName = $domain_netbios_name
    }

    $iaf = Install-ADDSForest @install_forest_args
    $result.reboot_required = $iaf.RebootRequired
    Start-Service -Name "netlogon"

}

Any thoughts on the best way to implement? Seems a little ham-fisted to just plop the command in there like I did.

@jhawkesworth
Contributor

jhawkesworth commented Apr 26, 2018

@jtauke Nice, thanks for trying it out.

As for an actual implementation....

I'd be inclined to add a module parameter restart_netlogon and default it to true (which leaves the option of reproducing the existing behaviour in case there are times it needs to work like it does now).

Also perhaps I'd do a Get-Service on netlogon and only start it if it was not in running state.

I was also using S2016 so I'd want to feel confident this worked with older Windows versions before merging. In fact I might start with the testing, as I think the module may date back to when S2012R2 was the most recent Windows version. If it turns out to be specific to S2016, I'm not sure I'd just magically restart the service; I'd rather document in the module docs that this has been needed on S2016.

Hope that helps?

Are you OK to create a PR?

If you are, and you get stuck at all feel free to ask on #ansible-windows on IRC (freenode).

@jtauke
Contributor Author

jtauke commented Apr 28, 2018

I’ll try to get a pull request in this weekend. Been swamped lately.

@stintel
Contributor

stintel commented May 16, 2018

I've also encountered this problem. Unfortunately just starting the netlogon service is not enough.

Before running the win_domain module, I can log in with ansible_user = administrator. After running the win_domain module (modified to start netlogon), I need to use ansible_user = hostname\administrator instead. After a reboot, it works again with just ansible_user = administrator. I guess another service needs to be restarted as well.

What about instead adding a reboot option that takes "no", "yes" and "if-required", that could be used to solve both these problems?

@stintel
Contributor

stintel commented May 16, 2018

So I wiped the machine where I ran into this problem and retried, and am unable to reproduce this problem now...

@jtauke did you already start working on a PR? if not I could give it a shot

@jhawkesworth
Contributor

@stintel yes, just restarting netlogon is not enough. It will need a reboot straight afterwards, for which you can use the existing win_reboot module.
The issue I was having was that authentication was being lost while the win_domain module was running, meaning the module would not return control to Ansible cleanly.
I'm not sure about adding a reboot option to the module, partly because win_reboot is more than just a PowerShell module (it is also an action plugin which runs on the controller, so it's not trivial), but mostly because I prefer modules to have a single purpose.

That said, it would be good to document the reboot in the module documentation examples as its effectively mandatory, although keeping the two tasks as separate things retains flexibility if there are scenarios where it makes sense to do something else between running win_domain and rebooting.

@jtauke
Contributor Author

jtauke commented May 16, 2018

Sorry, I've been remiss in getting a pull request put together. Here's the basics of what I've been testing with:

If set to true, the module checks the current state of the netlogon service and starts it if it is not running. This allows Ansible to continue to the next task (most likely a reboot) successfully.

$restart_netlogon = Get-AnsibleParam $parsed_args "restart_netlogon" -type "bool" -default $true
$netlogon = $null
if ($restart_netlogon) {
    # Look up the service; if it cannot be found, $netlogon stays $null
    $netlogon = Get-Service -Name Netlogon -ErrorAction SilentlyContinue
    # The service status enum reports "Running", not "started"
    if ($netlogon -and $netlogon.Status -ne "Running") {
        Start-Service -Name Netlogon
    }
}
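
For illustration, if a restart_netlogon option along the lines proposed above were added to win_domain, a playbook could leave it at its default or set it explicitly. This is only a sketch: restart_netlogon is the parameter proposed in this thread, not an option confirmed to exist in any released version of the module, and the variable names are the ones used in the playbook from the issue description.

```yaml
- name: ensure domain exists and DC is promoted
  win_domain:
    dns_domain_name: '{{ dns_domain_name }}'
    safe_mode_password: '{{ safe_mode_password }}'
    # Hypothetical option proposed in this thread; not a confirmed module parameter.
    restart_netlogon: yes
  register: domain_result

- name: reboot DC if required
  win_reboot:
  when: domain_result.reboot_required
```

The reboot stays a separate task here, in line with the preference voiced earlier for keeping win_domain single-purpose.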

@dagwieers
Contributor

@jborean93 What is your opinion on this? I am not convinced this is something we want to fix in the module, but a proper fix needs Microsoft's attention... Is there an alternative? Maybe async?

@jtauke
Contributor Author

jtauke commented May 16, 2018

I don’t really think this is a Microsoft issue. The default behaviour of Install-ADDSForest is to reboot after promotion, which this module actively prevents (for good reason, so the reboot can be handled by win_reboot).

In the current state, I wouldn’t even consider this module to be functional, since it stops any Ansible commands from running after it does.

@jborean93
Contributor

I’ve been using this module for a while and have never come across this issue, so I need to try to replicate it myself. I have run through this more than 10 times this past week while testing changes to ansible-windows and it never failed once, so I'm not sure what is happening.

@MattMencel

MattMencel commented May 17, 2018

Just jumping in to say I'm seeing this exact same behavior on new 2012R2 instances on Azure. This is how I'm doing it in the playbook I'm using...

- name: Ensure Active Directory domain is setup
  hosts: all
  tasks:
    - name: format data disk
      script: scripts/format-data-disk.ps1
      register: out
    - debug: var=out
    - name: Install AD Services Feature
      win_feature:
        name: AD-Domain-Services
        include_management_tools: yes
        include_sub_features: yes
        state: present
      register: result
    - name: Create New Forest
      win_domain:
        dns_domain_name: mydom.local
        safe_mode_password: mypass
        database_path: F:\NTDS
        sysvol_path: F:\SYSVOL
      register: result
    - name: Reboot after Domain Creation
      win_reboot:
        msg: "Server config in progress; rebooting..."
      when: result.reboot_required

Ansible disconnects during the win_domain task as mentioned previously by others.

@mmidler

mmidler commented May 30, 2018

If you are running Ansible 2.5 or above, I've had luck using the win_scheduled_task module to automatically start the netlogon service when Windows Server 2012 is promoted to a domain controller. EventID='29223' is the event generated when the server is promoted to a DC. You can use Event Viewer on the server to generate the XML query string for the subscription with a custom filter, which is what I did to get a known-good XML query string.

- name: Create Netlogon scheduled task
  win_scheduled_task: 
    name: Start Netlogon
    actions:
    - path: C:\Windows\System32\sc.exe
      arguments: start netlogon
    triggers:
    - type: event
      subscription: "<QueryList><Query Id='0' Path='System'><Select Path='System'>*[System[(EventID='29223') and Security[@UserID='YOUR-ID-HERE']]]</Select></Query></QueryList>"
    username: YOUR-USER
    password: YOUR-PASS
    logon_type: password
    run_level: highest
    state: present

After the server is rebooted you can copy-paste this task and change state to absent to remove the scheduled task from the server.

- name: Remove Netlogon scheduled task
  win_scheduled_task: 
    name: Start Netlogon
    state: absent

@nitzmahone
Member

This might be another case we should explore wrapping in an action to handle the reboots automatically in a way that's "ansible-friendly" (à la win_updates + reboot: yes in 2.5)... There are some wonky connection-related things in win_domain_controller as well that can really only be definitively solved with an action plugin... I'll put this on the "things we should look at for 2.7" list.
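
For reference, the win_updates pattern mentioned above looks roughly like the following in 2.5 and later, where the module takes care of the reboot itself instead of relying on a separate win_reboot task. This is a minimal sketch; the category name is only an example value.

```yaml
- name: install security updates and let the module reboot if needed
  win_updates:
    category_names:
    - SecurityUpdates
    # reboot was added to win_updates in Ansible 2.5
    reboot: yes
```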

@mmidler

mmidler commented Jun 28, 2018

@nitzmahone would you mind elaborating on the connection-related issues with win_domain_controller and what action plugin might solve it? I've tried to use the win_domain_controller module in one of my playbooks to configure 6 Windows Server 2012R2 hosts as domain controllers and 1-2 of the servers always seem to fail. (It is not always the same servers that fail.)

@jseiser

jseiser commented Jul 26, 2018

Running into this same issue.

(venv) ubuntu@ip-172-19-32-33:~/ansible$ ansible --version
ansible 2.6.1
  config file = /home/ubuntu/ansible/ansible.cfg
  configured module search path = [u'/home/ubuntu/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ubuntu/ansible/venv/local/lib/python2.7/site-packages/ansible
  executable location = /home/ubuntu/ansible/venv/bin/ansible
  python version = 2.7.12 (default, Dec  4 2017, 14:50:18) [GCC 5.4.0 20160609]
- name: Verify/Create Domain
  win_domain:
    dns_domain_name: "{{ dns_domain_name }}"
    safe_mode_password: "{{ ansible_password }}"
  register: is_domain

- name: Reboot if Requires
  win_reboot:
  when: is_domain.reboot_required

- name: Ensure Administrator is Set to Not Expire
  win_domain_user:
    name: Administrator
    password_never_expires: yes
  register: extend_domain_admin
  retries: 30
  delay: 15
  until: extend_domain_admin is successful

Nothing works until I forcefully reboot the instance.

Error I get:

TASK [internal/qa_env_dc : Reboot if Requires] **********************************************************************************************************************************************************************************************
fatal: [10.0.136.5]: FAILED! => {"changed": false, "msg": "credssp: HTTPSConnectionPool(host='10.0.136.5', port=5986): Read timed out. (read timeout=30)", "reboot": false}
        to retry, use: --limit @/home/ubuntu/ansible/qa_dc.retry

@jhawkesworth
Contributor

@jseiser I believe win_reboot is not working in Ansible 2.6.1 but should be fixed in 2.6.2, so it's worth trying again with the latest devel or 2.6.2 when it's released.

@jseiser

jseiser commented Aug 1, 2018

@jhawkesworth

Same problem with 2.6.2

TASK [internal/qa_env_dc : Reboot if Required] **********************************************************************************************************************************************************************************************************************************************************************
fatal: [10.0.136.5]: FAILED! => {"changed": false, "msg": "credssp: HTTPSConnectionPool(host='10.0.136.5', port=5986): Read timed out. (read timeout=30)", "reboot": false}
- name: Verify/Create Domain
  win_domain:
    dns_domain_name: "{{ dns_domain_name }}"
    safe_mode_password: "{{ ansible_password }}"
  register: is_domain

- name: Reboot if Required
  win_reboot:
  when: is_domain.reboot_required

It's not just win_reboot; anything that attempts to run after win_domain fails.

Same problem exists if you attempt to promote using win_dsc as well.

- name: Install AD-Domain-Services with sub features and management tools
  win_feature:
    name: AD-Domain-Services
    state: present
    include_management_tools: yes
  register: win_feature_adds

- name: Reboot if Required
  win_reboot:
  when: win_feature_adds.reboot_required

- name: Add xActiveDirectory
  win_psmodule:
    name: xActiveDirectory
    state: present

- name: Configure xADDomain Powershell DSC
  win_dsc:
    resource_name: xADDomain
    DomainName: 'meta.local'
    DomainAdministratorCredential_username: '{{ ansible_user }}'
    DomainAdministratorCredential_password: '{{ ansible_password }}'
    SafemodeAdministratorPassword_username: '{{ ansible_user }}'
    SafemodeAdministratorPassword_password: '{{ ansible_password }}'
  register: is_domain
TASK [internal/qa_env_dc : Configure xADDomain Powershell DSC] ******************************************************************************************************************************************************************************************************************************************************
changed: [10.0.136.5]

TASK [internal/qa_env_dc : Reboot if Required] **********************************************************************************************************************************************************************************************************************************************************************
fatal: [10.0.136.5]: FAILED! => {"changed": false, "msg": "credssp: HTTPSConnectionPool(host='10.0.136.5', port=5986): Read timed out. (read timeout=30)", "reboot": false}

Are there any known workarounds for this? I am stuck on this and haven't been able to find a way around it. The issue is also present when attempting to promote via win_dsc xActiveDirectory.

@jborean93
Contributor

jborean93 commented Aug 6, 2018

I have figured out why I never came across this issue. Because the Netlogon service is not running post promotion, any authentication protocols that rely on the Negotiate protocol (like NTLM, Kerberos, CredSSP) will fail. My test playbooks always ran under Basic auth, which is handled a bit differently from the other protocols. I need to implement a good way to get beyond this as part of the 2.7 action-ify work, but for now, setting ansible_winrm_transport: basic should get you going. As an FYI, Basic auth does need to be explicitly enabled beforehand with Set-Item -Path WSMan:\localhost\Service\Auth\Basic -Value $true
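
A minimal sketch of how that workaround could be laid out in a playbook, assuming the host is initially reachable over some other transport and that setting the transport through play-level vars suits your inventory. The play layout, host group and variable names are borrowed from the playbook in the issue description and are not a tested recipe.

```yaml
- name: enable Basic auth while the original transport still works
  hosts: domain_controllers
  gather_facts: no
  tasks:
  - name: enable Basic auth on the WinRM service
    win_shell: Set-Item -Path WSMan:\localhost\Service\Auth\Basic -Value $true

- name: create Domain Controller over Basic auth
  hosts: domain_controllers
  gather_facts: no
  vars:
    # Basic auth does not go through Negotiate, so it keeps working while Netlogon is stopped
    ansible_winrm_transport: basic
  tasks:
  - name: ensure domain exists and DC is promoted
    win_domain:
      dns_domain_name: '{{ dns_domain_name }}'
      safe_mode_password: '{{ safe_mode_password }}'
    register: domain_result

  - name: reboot DC if required
    win_reboot:
    when: domain_result.reboot_required
```

Basic auth only encodes the credentials and does not encrypt them, so this is best used against an HTTPS WinRM listener.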

@jborean93
Contributor

Here is a PR that should solve this issue: #43703. I am still planning on creating an action plugin to incorporate an automatic reboot in a single task.

@ansible ansible locked and limited conversation to collaborators Jul 22, 2019