New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CustomScript] IDNS issue in expressroute VMs #142

Closed
ahmetb opened this Issue May 25, 2016 · 12 comments

Comments

Projects
None yet
4 participants
@ahmetb
Contributor

ahmetb commented May 25, 2016

I am simply adding custom script to run a date command however it stays in transitioning state indefinitely:

$ azure vm extension set ahmetb-dev CustomScriptForLinux Microsoft.OSTCExtensions '1.*' --public-config '{"commandToExecute":"date"}'

it kicks of enable:

2016/05/25 22:01:54 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Change log file to /var/log/azure/Microsoft.OSTCExtensions.CustomScriptForLinux/1.5.2.0/extension.log
2016/05/25 22:01:59 installCommand completed.
2016/05/25 22:01:59 SetHandlerState: Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0, Installed
2016/05/25 22:01:59 Installation completed for Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0
2016/05/25 22:01:59 Found RuntimeSettings for Microsoft.OSTCExtensions.CustomScriptForLinux V 1.5.2.0
2016/05/25 22:01:59 Spawned customscript.py -enable PID 63312
2016/05/25 22:01:59 CustomScript started to handle.
2016/05/25 22:01:59 [CustomScript-0.0]cwd is /var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Change log file to /var/log/azure/Microsoft.OSTCExtensions.CustomScriptForLinux/1.5.2.0/extension.log
2016/05/25 22:01:59 CustomScript started to handle.
2016/05/25 22:01:59 [CustomScript-0.0]cwd is /var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Change log file to /var/log/azure/Microsoft.OSTCExtensions.CustomScriptForLinux/1.5.2.0/extension.log
2016/05/25 22:02:04 enableCommand completed.
2016/05/25 22:02:04 SetHandlerState: Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0, Enabled
2016/05/25 22:02:04 Enable completed for Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0
2016/05/25 22:02:04 Processing completed for Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0

extension logs

sudo cat /var/log/azure/Microsoft.OSTCExtensions.CustomScriptForLinux/1.5.2.0/extension.log
2016/05/25 22:01:54 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]sequence number is 0
2016/05/25 22:01:54 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]setting file path is/var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0/config/0.settings
2016/05/25 22:01:54 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]JSON config: {"runtimeSettings":[{"handlerSettings":{"publicSettings":{"commandToExecute":"date"}}}]}
2016/05/25 22:01:54 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Install,success,0,Install succeeded
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]sequence number is 0
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]setting file path is/var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0/config/0.settings
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]JSON config: {"runtimeSettings":[{"handlerSettings":{"publicSettings":{"commandToExecute":"date"}}}]}
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]set most recent sequence number to 0
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command to execute:date
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Enable,transitioning,0,Launching the script...
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]sequence number is 0
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]setting file path is/var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0/config/0.settings
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]JSON config: {"runtimeSettings":[{"handlerSettings":{"publicSettings":{"commandToExecute":"date"}}}]}
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Will try to download files, number of retries = 10, wait SECONDS between retrievals = 20s
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command to execute:date
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]fileUris value provided is empty or invalid. Continue with executing command...
2016/05/25 22:01:59 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Succeeded to download files, retry count = 0
2016/05/25 22:02:10 ERROR:CalledProcessError.  Error Code is 1
2016/05/25 22:02:10 ERROR:CalledProcessError.  Command string was host $(hostname)
2016/05/25 22:02:10 ERROR:CalledProcessError.  Command result was ;; connection timed out; no servers could be reached
2016/05/25 22:02:10 ERROR:[Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is not ready, retry to check.
2016/05/25 22:02:10 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Sleep 20 seconds
2016/05/25 22:02:41 ERROR:CalledProcessError.  Error Code is 1
2016/05/25 22:02:41 ERROR:CalledProcessError.  Command string was host $(hostname)
2016/05/25 22:02:41 ERROR:CalledProcessError.  Command result was ;; connection timed out; no servers could be reached
2016/05/25 22:02:41 ERROR:[Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is not ready, retry to check.
2016/05/25 22:02:41 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Sleep 20 seconds
2016/05/25 22:03:12 ERROR:CalledProcessError.  Error Code is 1
2016/05/25 22:03:12 ERROR:CalledProcessError.  Command string was host $(hostname)
2016/05/25 22:03:12 ERROR:CalledProcessError.  Command result was ;; connection timed out; no servers could be reached
2016/05/25 22:03:12 ERROR:[Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is not ready, retry to check.
2016/05/25 22:03:12 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Sleep 20 seconds
2016/05/25 22:03:44 ERROR:CalledProcessError.  Error Code is 1
2016/05/25 22:03:44 ERROR:CalledProcessError.  Command string was host $(hostname)
2016/05/25 22:03:44 ERROR:CalledProcessError.  Command result was ;; connection timed out; no servers could be reached
2016/05/25 22:03:44 ERROR:[Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is not ready, retry to check.
2016/05/25 22:03:44 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Sleep 20 seconds
2016/05/25 22:04:15 ERROR:CalledProcessError.  Error Code is 1
2016/05/25 22:04:15 ERROR:CalledProcessError.  Command string was host $(hostname)
2016/05/25 22:04:15 ERROR:CalledProcessError.  Command result was ;; connection timed out; no servers could be reached
2016/05/25 22:04:15 ERROR:[Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is not ready, retry to check.
2016/05/25 22:04:15 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Sleep 20 seconds

... and it goes on

.status file

[{"status": {"status": "transitioning", "formattedMessage": {"lang": "en-US", "message": "Launching the script..."}, "operation": "Enable", "code": "0", "name": "Microsoft.OSTCExtensions.CustomScriptForLinux"}, "version": "1.0", "timestampUTC": "2016-05-25T22:01:59Z"}]#
@boumenot

This comment has been minimized.

Show comment
Hide comment
@boumenot

boumenot May 25, 2016

Member

The code checks for DNS readiness, which doesn't appear to be true in this case, why?

Internal DNS is not ready, retry to check

https://github.com/Azure/azure-linux-extensions/blob/master/CustomScript/customscript.py#L286

Member

boumenot commented May 25, 2016

The code checks for DNS readiness, which doesn't appear to be true in this case, why?

Internal DNS is not ready, retry to check

https://github.com/Azure/azure-linux-extensions/blob/master/CustomScript/customscript.py#L286

@ahmetb

This comment has been minimized.

Show comment
Hide comment
@ahmetb

ahmetb May 25, 2016

Contributor

@boumenot this VM is in expressroute (our dev/test subscription to be precise). maybe it's related.

Contributor

ahmetb commented May 25, 2016

@boumenot this VM is in expressroute (our dev/test subscription to be precise). maybe it's related.

@ahmetb

This comment has been minimized.

Show comment
Hide comment
@ahmetb

ahmetb May 25, 2016

Contributor

@boumenot yes it appears to be an RDFE+ExpressRoute issue. When I create a normal VM on CRP it works just fine:

2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Will try to download files, number of retries = 10, wait SECONDS between retrievals = 20s
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command to execute:date
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]fileUris value provided is empty or invalid. Continue with executing command...
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Succeeded to download files, retry count = 0
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is ready, retry count = 0
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command to execute:date
2016/05/25 22:30:58 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command is finished.
Contributor

ahmetb commented May 25, 2016

@boumenot yes it appears to be an RDFE+ExpressRoute issue. When I create a normal VM on CRP it works just fine:

2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Will try to download files, number of retries = 10, wait SECONDS between retrievals = 20s
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command to execute:date
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]fileUris value provided is empty or invalid. Continue with executing command...
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Succeeded to download files, retry count = 0
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Internal DNS is ready, retry count = 0
2016/05/25 22:30:57 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command to execute:date
2016/05/25 22:30:58 [Microsoft.OSTCExtensions.CustomScriptForLinux-1.0]Command is finished.

@ahmetb ahmetb changed the title from [CustomScript] commandToExecute without fileUris does not complete to [CustomScript] IDNS issue in expressroute VMs May 25, 2016

@boumenot

This comment has been minimized.

Show comment
Hide comment
@boumenot

boumenot May 25, 2016

Member

Glad you figured it out. The code should automatically handle this (if possible). I'm not sure if you can auto-detect an express route VM?!?

Member

boumenot commented May 25, 2016

Glad you figured it out. The code should automatically handle this (if possible). I'm not sure if you can auto-detect an express route VM?!?

@ahmetb

This comment has been minimized.

Show comment
Hide comment
@ahmetb

ahmetb May 25, 2016

Contributor

@boumenot I'm not sure.. I'm leaving this open anyways since this is a bug.

Contributor

ahmetb commented May 25, 2016

@boumenot I'm not sure.. I'm leaving this open anyways since this is a bug.

@pedrams1

This comment has been minimized.

Show comment
Hide comment
@pedrams1

pedrams1 Jul 25, 2016

Is there any update on this? This is a fairly significant defect which is impacting us.

Is there any update on this? This is a fairly significant defect which is impacting us.

@ahmetb

This comment has been minimized.

Show comment
Hide comment
@ahmetb

ahmetb Jul 25, 2016

Contributor

@pedrams1 Is your Virtual Machine also in an ExpressRoute virtual network?

Contributor

ahmetb commented Jul 25, 2016

@pedrams1 Is your Virtual Machine also in an ExpressRoute virtual network?

@pedrams1

This comment has been minimized.

Show comment
Hide comment
@pedrams1

pedrams1 Jul 25, 2016

@ahmetalpbalkan Yes it is. We're also modifying the default DNS servers in the VNet to point to our internal DNS servers not Azure's DNS servers (168.63.129.16).

That being said I think I've found an undocumented way of getting around this by changing the "enableInternalDNSCheck" setting to false:

{ "publisher": "Microsoft.OSTCExtensions", "type": "CustomScriptForLinux", "typeHandlerVersion": "1.5", "autoUpgradeMinorVersion": true, "settings": { "fileUris": [], "commandToExecute": "echo hello world", "enableInternalDNSCheck": false } }

Full disclaimer though, I've only tested this feature by modifying the /var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0/config/0.setting" file and tricking the custom script extension to run again by deleting the "mrseq" file and running "customscript.py enable". I haven't tested passing that attribute all the way down from the ARM template.

pedrams1 commented Jul 25, 2016

@ahmetalpbalkan Yes it is. We're also modifying the default DNS servers in the VNet to point to our internal DNS servers not Azure's DNS servers (168.63.129.16).

That being said I think I've found an undocumented way of getting around this by changing the "enableInternalDNSCheck" setting to false:

{ "publisher": "Microsoft.OSTCExtensions", "type": "CustomScriptForLinux", "typeHandlerVersion": "1.5", "autoUpgradeMinorVersion": true, "settings": { "fileUris": [], "commandToExecute": "echo hello world", "enableInternalDNSCheck": false } }

Full disclaimer though, I've only tested this feature by modifying the /var/lib/waagent/Microsoft.OSTCExtensions.CustomScriptForLinux-1.5.2.0/config/0.setting" file and tricking the custom script extension to run again by deleting the "mrseq" file and running "customscript.py enable". I haven't tested passing that attribute all the way down from the ARM template.

@ahmetb

This comment has been minimized.

Show comment
Hide comment
@ahmetb

ahmetb Jul 25, 2016

Contributor

@pedrams1 point taken. Yes you can specify "enableInternalDNSCheck": false flag in your public settings section of the ARM template and disable the IDNS check.

I believe it was added because the internal DNS was not ready immediately when instances came up and user scripts which reference to other hostnames in the same VNET was not resolvable, that's why this check was added. However we realized it breaks for the users with custom DNS servers or in ExpressRoute.

Most probably we are going to make this check disabled by default and configurable through an optional settings key in the next versions (or remove it altogether once we confirm if the underlying DNS issue is fixed). We're actively investigating. Thanks for giving your input, please let us know if this does not solve your problem.

Contributor

ahmetb commented Jul 25, 2016

@pedrams1 point taken. Yes you can specify "enableInternalDNSCheck": false flag in your public settings section of the ARM template and disable the IDNS check.

I believe it was added because the internal DNS was not ready immediately when instances came up and user scripts which reference to other hostnames in the same VNET was not resolvable, that's why this check was added. However we realized it breaks for the users with custom DNS servers or in ExpressRoute.

Most probably we are going to make this check disabled by default and configurable through an optional settings key in the next versions (or remove it altogether once we confirm if the underlying DNS issue is fixed). We're actively investigating. Thanks for giving your input, please let us know if this does not solve your problem.

@pedrams1

This comment has been minimized.

Show comment
Hide comment
@pedrams1

pedrams1 Jul 25, 2016

@ahmetalpbalkan Thanks for getting back to me so promptly!

Yep this works for us :). I understand the logic, makes a lot of sense. One thing I would suggest though is documenting this feature in the README.md as resolution for this bug. Would save a good deal of time reverse engineering the code :)

pedrams1 commented Jul 25, 2016

@ahmetalpbalkan Thanks for getting back to me so promptly!

Yep this works for us :). I understand the logic, makes a lot of sense. One thing I would suggest though is documenting this feature in the README.md as resolution for this bug. Would save a good deal of time reverse engineering the code :)

@ahmetb

This comment has been minimized.

Show comment
Hide comment
@ahmetb

ahmetb Jul 25, 2016

Contributor

@pedrams1 no worries. We are actively rewriting the extension and this is certainly good input.

Contributor

ahmetb commented Jul 25, 2016

@pedrams1 no worries. We are actively rewriting the extension and this is certainly good input.

@ahmetb ahmetb closed this Aug 1, 2016

kriszentner added a commit to kriszentner/azure-quickstart-templates that referenced this issue Sep 22, 2016

Disabling enableInternalDNSCheck
This seems to be a bug with putting internal URLs in expressroute in fileUris. Filed at: Azure/azure-linux-extensions#142
@swani01

This comment has been minimized.

Show comment
Hide comment

swani01 commented Dec 14, 2017

Please add this setting to the documentation template on the website: https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/virtualmachines/extensions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment