xADDomain: Set-TargetResource Throws Terminating Error When DC Promotion in Progress #73

Closed
kwirkykat opened this issue Feb 27, 2016 · 30 comments
Labels
bug The issue is a bug.

Comments

@kwirkykat
Contributor

The first time DSC tries to apply xADDomain, it runs Install-ADDSForest (or Install-ADDSDomain, but I haven't tried that one) appropriately.

If DSC comes back to run this resource again while the domain is still starting up (which happens in the Azure DSC extension), the Get method (which is run by the Test method) catches the resulting ADServerDownException error and then returns nothing.
This triggers the resource to call the Set method again, which calls Install-ADDSForest (or Install-ADDSDomain) again.

This second call to Install-ADDSForest spits out an ugly error:
[ERROR] Verification of prerequisites for Domain Controller promotion failed. The specified argument 'DataBasePath' was not recognized

This causes the entire configuration to fail, and nothing that depends on this resource can run.
But the first call to Install-ADDSForest has already happened, so the machine is eventually promoted to a domain controller successfully.

This issue is breaking a lot of ARM templates with AD in Windows Server 2016 since it seems to take longer for DC promotion to occur.

A quick work-around to this problem is to retry retrieving the domain in the Get method until it is found or the resource hits a timeout.
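
The retry-until-timeout idea could look roughly like the sketch below. This is not the resource's actual code: `Wait-ForResult`, its parameters, and the default timings are hypothetical names and values chosen for illustration, and the domain name in the usage comment is a placeholder. The probe is passed in as a script block so the loop itself does not depend on the ActiveDirectory module.

```powershell
# Minimal sketch (hypothetical helper, not part of xActiveDirectory):
# run a probe repeatedly until it returns something or a timeout expires.
function Wait-ForResult {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true)]
        [scriptblock] $Probe,

        # Assumed defaults; tune for how long promotion takes in your environment.
        [int] $TimeoutSeconds = 600,

        [int] $IntervalSeconds = 15
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    while ($true) {
        try {
            $result = & $Probe
            if ($null -ne $result) {
                return $result
            }
        }
        catch {
            # Terminating errors (for example, the server being down) are treated as "not ready yet".
            Write-Verbose -Message "Probe failed, will retry: $($_.Exception.Message)"
        }

        if ((Get-Date) -ge $deadline) {
            # Timed out: report "not found" to the caller.
            return $null
        }

        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Possible usage inside a Get method (placeholder domain name):
# $domain = Wait-ForResult -Probe { Get-ADDomain -Identity 'contoso.com' -ErrorAction Stop }
```

The probe is always tried at least once, and a `$null` return after the timeout lets the caller decide whether to treat the domain as absent.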

@iainbrighton
Contributor

@kwirkykat I don't know how we fix this, but a timeout is not the right solution. In theory, the LCM should not be restarting/continuing with the configuration until after AD is installed/initialised.

I presume this is specific to the Azure DSC service/extension starting before the domain is actually up-and-running, invoking the LCM sooner than it would start naturally? Perhaps @HemantMahawar or @TravisEz13 can shed some light on how or what is different with the Azure extension?

@TravisEz13
Contributor

After a reboot, the LCM restarts almost immediately. Somehow we need to know what to wait on. Depending on the machine, it can take some time for the directory to actually become available. We need some resource to tell us how to wait.

@kilasuit

I'm getting this on an on-prem machine using Server 2016 TP4. In this instance the LCM seems either not to kick off immediately after the reboot or to error out immediately with the error mentioned above.

However, if left until the LCM does its consistency check, it seems to just kick off again with no issue and the configuration completes as expected.

I believe this may be an issue specific to Server 2016 TP4, and I'm spinning up a 2012 environment at the moment to see if the issue happens there as well.

@PlagueHO
Member

I'm getting some long delays (5+ minutes) promoting Windows Server 2016 TP4 to a DC using xADDomain. It does eventually get there though.

I have not seen this error though:
[ERROR] Verification of prerequisites for Domain Controller promotion failed. The specified argument 'DataBasePath' was not recognized.

@kilasuit

The issue doesn't occur on a 2012R2 VM with WMF5 so I think it is limited to Server 2016 TP4

@iainbrighton
Contributor

@kwirkykat @kilasuit I've tested this today on both 2012 R2 and 2016 TP4. There is no issue with 2012 R2 with WMF4 and event log is clear.

2016 TP4 is another issue entirely. As @PlagueHO mentioned, it does seem to take a lot, lot longer to promote to a DC on TP4. This might be causing the issue? Here are the DesiredStateConfiguration errors logged in the Event Viewer:

Job {5B5B11AB-E778-11E5-8DC8-00155D32768E} : 
This event indicates that a non-terminating error was thrown when DSCEngine was executing Set-TargetResource on MSFT_xADDomain DSC resource. FullyQualifiedErrorId is Test.VerifyDcPromoCore.DCPromo.General.77,Microsoft.DirectoryServices.Deployment.PowerShell.Commands.InstallADDSForestCommand. Error Message is Verification of prerequisites for Domain Controller promotion failed. The specified argument 'NewDomain' was not recognized.
Job {5B5B11AB-E778-11E5-8DC8-00155D32768E} : 
MIResult: 1
Error Message: Verification of prerequisites for Domain Controller promotion failed. The specified argument 'NewDomain' was not recognized.

Message ID: Test.VerifyDcPromoCore.DCPromo.General.77,Microsoft.DirectoryServices.Deployment.PowerShell.Commands.InstallADDSForestCommand
Error Category: 0
Error Code: 1
Error Type: MI
Job {5B5B11AB-E778-11E5-8DC8-00155D32768E} : 
This event indicates that failure happens when LCM is processing the configuration. Error Id is 0x1. Error Detail is The SendConfigurationApply function did not succeed.. Resource Id is [xADDomain]ADDomain and Source Info is C:\Users\Administrator\Documents\TestDomain.ps1::34::9::xADDomain. Error Message is The PowerShell DSC resource '[xADDomain]ADDomain' with SourceInfo 'C:\Users\Administrator\Documents\TestDomain.ps1::34::9::xADDomain' threw one or more non-terminating errors while running the Set-TargetResource functionality. These errors are logged to the ETW channel called Microsoft-Windows-DSC/Operational. Refer to this channel for more details..
Job {5B5B11AB-E778-11E5-8DC8-00155D32768E} : 
MIResult: 1
Error Message: The PowerShell DSC resource '[xADDomain]ADDomain' with SourceInfo 'C:\Users\Administrator\Documents\TestDomain.ps1::34::9::xADDomain' threw one or more non-terminating errors while running the Set-TargetResource functionality. These errors are logged to the ETW channel called Microsoft-Windows-DSC/Operational. Refer to this channel for more details.
Message ID: NonTerminatingErrorFromProvider
Error Category: 7
Error Code: 1
Error Type: MI
Job {5B5B11AB-E778-11E5-8DC8-00155D32768E} : 
Job runs under the following LCM setting. 
ConfigurationMode: ApplyAndMonitor 
ConfigurationModeFrequencyMins: 15 
RefreshMode: PUSH 
RefreshFrequencyMins: 30 
RebootNodeIfNeeded: ForceModuleImport 
DebugMode: True
Job {5B5B11AB-E778-11E5-8DC8-00155D32768E} : 
MIResult: 1
Error Message: The SendConfigurationApply function did not succeed.
Message ID: MI RESULT 1
Error Category: 0
Error Code: 1
Error Type: MI
Job DscTimerConsistencyOperationResult : 
DSC Engine Error : 
     Error Message: NULL 
    Error Code : 1 

You run the existing configuration (or let the consistency check naturally occur) and it then passes. I haven't a clue how to query whether a DC promotion is in progress. Perhaps this should be raised with the server team @TravisEz13 ?

@TravisEz13
Contributor

@iainbrighton I don't think you have to query. Just create a list of errors which you retry for and the rest you abort. I don't think we will find a reliable way to find if the DC promotion is in progress... only if it is complete.
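
As a rough illustration of the "list of errors to retry for" idea, the sketch below classifies an error message against a list of patterns and retries only when it matches. Everything here is an assumption for illustration: the helper names (`Test-RetryableError`, `Invoke-WithRetryList`), the defaults, and the single pattern, which is derived from the two messages quoted in this thread. It also assumes the action raises terminating errors (for example, by being called with `-ErrorAction Stop`), since a `catch` block does not see non-terminating ones.

```powershell
# Minimal sketch (hypothetical helpers, not part of xActiveDirectory).
# Pattern taken from the messages quoted in this thread
# ('DataBasePath' / 'NewDomain' was not recognized).
$promotionInProgressPatterns = @(
    "The specified argument '\w+' was not recognized"
)

function Test-RetryableError {
    param (
        [Parameter(Mandatory = $true)]
        [string] $Message,

        [Parameter(Mandatory = $true)]
        [string[]] $Patterns
    )

    foreach ($pattern in $Patterns) {
        if ($Message -match $pattern) {
            return $true
        }
    }
    return $false
}

function Invoke-WithRetryList {
    param (
        [Parameter(Mandatory = $true)]
        [scriptblock] $Action,

        [Parameter(Mandatory = $true)]
        [string[]] $RetryablePatterns,

        # Assumed defaults for illustration only.
        [int] $MaxAttempts = 5,

        [int] $DelaySeconds = 30
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return (& $Action)
        }
        catch {
            $retryable = Test-RetryableError -Message $_.Exception.Message -Patterns $RetryablePatterns
            if (-not $retryable -or $attempt -eq $MaxAttempts) {
                # Unknown error, or out of attempts: abort by rethrowing.
                throw
            }
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}
```

Whether the retried action should be the promotion itself or just a check for the domain is a separate design question that this sketch does not settle.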

@HemantMahawar

@iainbrighton @kwirkykat @PlagueHO @TravisEz13 I have seen this issue on W2K16 TP4 bits as well. I had started making some changes to the resource, but never finished the work. My thought process was as follows:

  • During Set-TargetResource, after the DC promo command has succeeded and before MachineReboot is set, create a file on the disk (name TBD)
  • In Get-TargetResource (which is called by Test-TargetResource), check whether the above-mentioned file is present. This indicates that Get-TargetResource is running after DC promo was initiated
  • If the file is present, loop for a large number of iterations (or infinitely) with some sleep and see if the domain can be reached
  • Once the domain can be reached, remove the file created by Set-TargetResource

Does this sound like a good approach for this specific problem?
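
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`Set-PromotionMarker`, `Wait-ForDomainIfPromoting`), the marker path, and the attempt and sleep defaults are all hypothetical (the file name is left TBD above), and the domain check is passed in as a script block rather than hard-coded. The sketch uses a bounded attempt count rather than an infinite loop.

```powershell
# Minimal sketch (hypothetical helpers and marker path, not part of xActiveDirectory).
$markerPath = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'DcPromoPending.marker'

# Called from Set-TargetResource after the promotion command succeeds
# and before the reboot is requested.
function Set-PromotionMarker {
    param (
        [Parameter(Mandatory = $true)]
        [string] $Path
    )

    Set-Content -Path $Path -Value (Get-Date -Format 'o')
}

# Called from Get-TargetResource. Returns $true only if a marker was present
# and the domain became reachable; the marker is removed in that case.
function Wait-ForDomainIfPromoting {
    param (
        [Parameter(Mandatory = $true)]
        [string] $MarkerPath,

        [Parameter(Mandatory = $true)]
        [scriptblock] $DomainProbe,

        # Assumed defaults for illustration only.
        [int] $MaxAttempts = 60,

        [int] $SleepSeconds = 30
    )

    if (-not (Test-Path -Path $MarkerPath)) {
        # No promotion was initiated by a previous Set, so there is nothing to wait for.
        return $false
    }

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $reachable = $false
        try {
            $reachable = [bool] (& $DomainProbe)
        }
        catch {
            Write-Verbose -Message "Domain not reachable yet: $($_.Exception.Message)"
        }

        if ($reachable) {
            Remove-Item -Path $MarkerPath -Force
            return $true
        }

        Start-Sleep -Seconds $SleepSeconds
    }

    return $false
}

# Possible usage (placeholder domain name):
# Wait-ForDomainIfPromoting -MarkerPath $markerPath -DomainProbe { Get-ADDomain -Identity 'contoso.com' -ErrorAction Stop }
```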

@kwirkykat
Contributor Author

@HemantMahawar That sounds good to me. @iainbrighton @PlagueHO @TravisEz13 Thoughts?

@PlagueHO
Member

@kwirkykat, @HemantMahawar - sounds good, although looping infinitely might not be a good idea - I prefer a "long number" to be safe.

@slapointe
Contributor

slapointe commented Jun 16, 2016

We also encounter this issue extensively running 2012R2, WMF 5 & DSC Extension on Azure.

On the second reboot, it finds the domain and I get:
VERBOSE: [2016-06-15T19:30:41] [VERBOSE] [SLAPPDEVDC0]: LCM: [ Start Resource ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc]
VERBOSE: [2016-06-15T19:30:41] [VERBOSE] [SLAPPDEVDC0]: LCM: [ Start Test ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc]
VERBOSE: [2016-06-15T19:30:43] [VERBOSE] [SLAPPDEVDC0]:
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] Active Directory domain 'slappdev.cloud' found.
VERBOSE: [2016-06-15T19:30:43] [VERBOSE] [SLAPPDEVDC0]: LCM: [ End Test ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] in 2.2660 seconds.
VERBOSE: [2016-06-15T19:30:43] [VERBOSE] [SLAPPDEVDC0]: LCM: [ Skip Set ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc]
VERBOSE: [2016-06-15T19:30:43] [VERBOSE] [SLAPPDEVDC0]: LCM: [ End Resource ]

On the third reboot of my sequence, it usually doesn't find the domain:
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc]
VERBOSE: [2016-06-15T19:33:00] [VERBOSE] [SLAPPDEVDC0]: LCM: [ Start Test ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc]
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]:
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] Domain '' is NOT present on the current node.
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]: LCM: [ End Test ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] in 1.0310 seconds.
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]: LCM: [ Start Set ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc]
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]:
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] Creating AD forest 'slappdev.cloud' ...
VERBOSE: [2016-06-15T19:33:01] [ERROR] Verification of prerequisites for Domain Controller promotion failed. The specified argument 'DataBasePath' was not recognized.

VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]:
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] AD forest '{0}' created.
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]: LCM: [ End Set ]
[[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc] in 0.6100 seconds.
VERBOSE: [2016-06-15T19:33:01] [ERROR] The PowerShell DSC resource '[cADDomain]FirstDS::[cPrimaryDomainController]primaryDc'
with SourceInfo 'C:\Program
Files\WindowsPowerShell\Modules\cOrckestra\DSCResources\cPrimaryDomainController\cPrimaryDomainController.schema.psm1::47::5::c
ADDomain' threw one or more non-terminating errors while running the Set-TargetResource functionality. These errors are logged
to the ETW channel called Microsoft-Windows-DSC/Operational. Refer to this channel for more details.
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] [SLAPPDEVDC0]: LCM: [ End Set ]
VERBOSE: [2016-06-15T19:33:01] [ERROR] The SendConfigurationApply function did not succeed.
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] Operation 'Invoke CimMethod' complete.
VERBOSE: [2016-06-15T19:33:01] [VERBOSE] Time taken for configuration job to complete is 21.792 seconds

I know the logs show cADDomain as the resource, but it is xADDomain with one more parameter that we added for DNS.

@ghost

ghost commented Aug 9, 2016

Wondering whether this is being addressed? Testing with TP5 repros for me 100%.

@TravisEz13
Contributor

@markreno, what version of the resource were you using?

@ghost

ghost commented Aug 14, 2016

@TravisEz13 Version is 2.12.0.0

kwirkykat added the "bug" and "help wanted" labels on Aug 18, 2016
@TravisEz13
Contributor

@markreno This currently does not look like it has been addressed since 2.12

@kilasuit

@markreno - 2016 TP5 is still preview software, though, and I expect the issue we see there will be fixed at RTM, unless it stems from a change in how the cmdlet that does the underlying domain install completes (which could easily be the case).

@slapointe - Regarding the Azure DSC Extension, it could just as easily be an issue with the extension itself, as I have no issues at all with this on an on-premises 2012R2 Hyper-V VM with WMF4, WMF5 or the WMF5.1 preview.

I think the issue should be renamed to highlight that it is specific to 2016 and the Azure DSC extension, because this doesn't happen on 2012R2 machines elsewhere.

@iainbrighton
Contributor

iainbrighton commented Aug 19, 2016 via email

@kilasuit

Not sure what PR you're alluding to @iainbrighton, as the link came out as a link to this issue.

@iainbrighton
Contributor

iainbrighton commented Aug 19, 2016 via email

kwirkykat added the "in progress" label and removed the "help wanted" label on Aug 19, 2016
@StefanSchoof

If you do not set a DataBasePath, the error message may instead be: Verification of prerequisites for Domain Controller promotion failed. The specified argument 'NewDomain' was not recognized. (Just adding this so that Google finds this issue if people search for the error.)

@ghost

ghost commented Oct 13, 2016

Just tried on 2016 RTM - same failure.

@oradcliffe

oradcliffe commented Nov 7, 2016

Yep, I am getting the same failure as well with 2016. I'm redeploying with 2012R2 Datacenter right now to see if that makes a difference.

Edit - it did make a difference, in that the error I am now getting is that MSFT_xWaitForADDomain isn't finding the domain after x retries. The domain is there when I remote in and check, so I will increase the timers there for a while longer and see if that makes a difference.

@iainbrighton
Contributor

@oradcliffe Just a word of warning - the DomainAdministrator property is only used to attempt to query for the domain - it is not the domain's 'Administrator' password. _The existing local administrator password is used for the domain 'Administrator' password._ Therefore, you'll need to use whatever the local 'Administrator' password was for the xWaitForADDomain domain admin credential.

@oradcliffe

oradcliffe commented Nov 7, 2016

Yep, and for simplicity's sake I am passing the same creds I use to set up the machine to xWaitForADDomain, so they should be good there. I did update the timers and wait, and I am still getting errors on the wait.

@invisibleaxm

Hi there, I am getting the same error on my end. Do you all know of a work-around that we can use in the meantime?

@StefanSchoof

You can use the module from PR #101 at https://github.com/slapointe/xActiveDirectory/tree/Issue73. I have been using this for some time now and it works for me.

@invisibleaxm

@StefanSchoof thanks a lot, I did try it and it is working for me as well, so two thumbs up :)

@ghost

ghost commented Nov 21, 2016

Yeah agreed. PR 101 seems to work for me also. 2.13.0.0 fails.

bill-dall pushed a commit to HPInc/Anyware-Idle-Shutdown that referenced this issue Jan 7, 2017
…erenced in this bug report: dsccommunity/ActiveDirectoryDsc#73

At the time of writing the MSFT master version does not seem to work with Server 2016 domain controllers but the branched version (here) does. So this version is the one that must be added to the DSC configuration .zip file.
@bill-dall

I have the same experience - I was hard-down until I switched to the PR 101 code. I would also up-vote getting this issue resolved somehow.

@bill-dall

@kwirkykat, thanks!
