Flex Consumption Deployment Fails (100s Timeout at [Kudu-RemoveWorkersStep]) when VNet Integration and Private Endpoint are Enabled #2635

@nikithamkoshy

Description

Summary

When deploying a Flex Consumption Function App that is configured with both VNet Integration (outbound) and a Private Endpoint (inbound), the deployment consistently fails.

The code package uploads successfully, but the deployment hangs at [Kudu-RemoveWorkersStep] and fails with a 100-second HttpClient.Timeout. The app's logs trace the timeout to FunctionsSyncManager, which fails to sync the new triggers with the Azure management plane.

This happens even when the app's subnet is correctly configured with a NAT Gateway for internet access and a Microsoft.Storage service endpoint.

The only workaround is to temporarily delete the Function App's Private Endpoint, which points to a platform-level networking bug specific to this topology.

Steps to Reproduce

  • Create a Flex Consumption Function App.
  • Create a VNet with an apps subnet.
  • Configure VNet Integration (outbound) on the Function App, linking it to the apps subnet.
  • Configure an inbound Private Endpoint on the Function App, connecting it to a subnet in the same VNet.
  • Create a publicly accessible Storage Account for AzureWebJobsStorage.
  • On the apps subnet, configure the following:
    • Add a Microsoft.Storage Service Endpoint.
    • Attach a NAT Gateway (with a Public IP) to provide outbound internet access.
  • Attempt to deploy any function code using the Azure CLI (from a local file or SAS URL):
    az functionapp deployment source config-zip -g <rg> -n <app-name> --src "my-package.zip"
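To measure how long the CLI call in the last step actually hangs, it can be wrapped in a small Python script. This is a sketch that assumes the Azure CLI (az) is installed and already logged in; it reuses only the flags shown in the command above, and the helper names and the 900-second budget are hypothetical choices, not part of any Azure tooling.

```python
import subprocess
import time
from typing import Optional


def config_zip_argv(rg: str, app: str, package: str) -> list[str]:
    """Build the exact command from the repro steps (no extra flags)."""
    return [
        "az", "functionapp", "deployment", "source", "config-zip",
        "-g", rg, "-n", app, "--src", package,
    ]


def timed_deploy(argv: list[str], budget_s: float = 900.0) -> tuple[Optional[int], float]:
    """Run argv and return (exit code, elapsed seconds).

    The exit code is None if the hypothetical wall-clock budget ran out first.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=budget_s)
        code: Optional[int] = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start
```

For example, timed_deploy(config_zip_argv("<rg>", "<app-name>", "my-package.zip")) runs the repro deployment once and reports the elapsed time, which makes it easy to compare the Private Endpoint and no-Private-Endpoint cases.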

Expected Behavior

The deployment completes successfully. The new function code is deployed, and the new triggers (e.g., A, B, D) are correctly synced and visible in the Azure Portal.

Actual Behavior

The Azure CLI command hangs for several minutes after printing "Deployment endpoint responded with status code 202" and then fails with a "partially successful" message.

The deployment logs show the package upload completes, but the sync times out:

...

{"log_time": "2025-10-27T13:58:25.9867565Z", "id": "...", "message": "[Kudu-UploadPackageStep] completed. Uploaded package to storage successfully.", "type": 0},
{"log_time": "2025-10-27T13:58:26.1048234Z", "id": "...", "message": "[Kudu-RemoveWorkersStep] starting.", "type": 0},
{"log_time": "2025-10-27T14:03:30.1222084Z", "id": "...", "message": "Deployment was successful with Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.", "type": 1}
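How long [Kudu-RemoveWorkersStep] ran before the error can be read straight off these timestamps. The sketch below parses the three entries shown above (the elided "id" values stay as "..."); treating "type": 1 as an error marker is an assumption inferred from this excerpt, not documented Kudu behavior, and the helper names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# The three log entries from this issue, one JSON object per string.
LOG_LINES = [
    '{"log_time": "2025-10-27T13:58:25.9867565Z", "id": "...", '
    '"message": "[Kudu-UploadPackageStep] completed. Uploaded package to storage successfully.", "type": 0}',
    '{"log_time": "2025-10-27T13:58:26.1048234Z", "id": "...", '
    '"message": "[Kudu-RemoveWorkersStep] starting.", "type": 0}',
    '{"log_time": "2025-10-27T14:03:30.1222084Z", "id": "...", '
    '"message": "Deployment was successful with Error: The request was canceled due to the '
    'configured HttpClient.Timeout of 100 seconds elapsing.", "type": 1}',
]


def parse_log_time(ts: str) -> datetime:
    # The logs carry 7 fractional digits (100 ns ticks); datetime keeps only
    # microseconds, so the fraction is truncated to 6 digits before use.
    whole, _, frac = ts.rstrip("Z").partition(".")
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    micros = int((frac + "000000")[:6])
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def time_until_first_error(lines: list[str], step: str) -> timedelta:
    """Elapsed time from the entry that starts `step` to the first "type": 1 entry."""
    entries = [json.loads(line) for line in lines]
    start = next(e for e in entries if e["message"].startswith(step))
    error = next(e for e in entries if e["type"] == 1)  # assumption: 1 == error
    return parse_log_time(error["log_time"]) - parse_log_time(start["log_time"])


stuck = time_until_first_error(LOG_LINES, "[Kudu-RemoveWorkersStep]")
print(f"{stuck.total_seconds():.1f} s")  # 304.0 s
```

The step ran for about 304 seconds before the error was logged, far longer than a single 100-second timeout and consistent with the several-minute hang seen in the CLI.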

The app is left in a broken state:

  • The Azure Portal shows the old functions (A, B, C).
  • The admin API (/admin/functions) shows the new functions (A, B, D).
  • The synctriggers endpoint fails, and the app's internal logs show this exception:
      System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
         at Microsoft.Azure.WebJobs.Script.WebHost.Management.FunctionsSyncManager.TrySyncTriggersAsync
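The mismatch between the portal and the admin API can be checked mechanically by diffing the two function lists. This is a minimal sketch under stated assumptions: the /admin/functions body is taken to be a JSON array whose entries each carry a "name" field, the sample body below is a hypothetical stripped-down response, and both helper names are illustrative only.

```python
import json


def names_from_admin_api(body: str) -> list[str]:
    """Extract function names from an /admin/functions response body.

    Assumption: the body is a JSON array of objects with a "name" field.
    """
    return [entry["name"] for entry in json.loads(body)]


def trigger_drift(portal_names: list[str], host_names: list[str]) -> dict[str, list[str]]:
    """Compare what the portal lists against what the host actually loaded."""
    portal, host = set(portal_names), set(host_names)
    return {
        # Loaded by the host but never synced to the portal.
        "not_synced": sorted(host - portal),
        # Still listed in the portal although the new package no longer has it.
        "stale": sorted(portal - host),
    }


loaded = names_from_admin_api('[{"name": "A"}, {"name": "B"}, {"name": "D"}]')
print(trigger_drift(["A", "B", "C"], loaded))  # {'not_synced': ['D'], 'stale': ['C']}
```

A non-empty result in either bucket after a deployment is a quick signal that the trigger sync did not complete.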

Workaround

The only effective workaround is to temporarily delete the Private Endpoint on the Function App.

  1. Delete the Private Endpoint.
  2. Temporarily allow public access to the app (via Access Restrictions).
  3. Re-run the deployment.
  4. The deployment succeeds in seconds.
  5. Re-add the Private Endpoint and re-enable Access Restrictions.

This strongly indicates the Private Endpoint is causing a routing conflict that breaks the FunctionsSyncManager's outbound call to the public Azure management plane, even when a NAT Gateway is present.
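If the routing hypothesis holds, one thing worth checking from inside the VNet (for example from a VM on the apps subnet) is what the relevant hostnames resolve to, since a Private Endpoint is usually paired with a private DNS zone that maps the app's hostname to a private IP. The sketch below is a hypothetical diagnostic, not part of any Azure tooling: resolves_privately resolves a hostname and reports whether each returned address falls in a private range. Which endpoint FunctionsSyncManager actually calls is not stated in this issue, so the hostname is left as a parameter.

```python
import ipaddress
import socket


def resolves_privately(host: str, port: int = 443) -> dict[str, bool]:
    """Resolve `host` and map each returned address to whether it is private.

    Run this from inside the VNet to see what the app itself would see;
    results from outside the VNet can differ when a private DNS zone is linked.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addrs = sorted({info[4][0] for info in infos})
    return {addr: ipaddress.ip_address(addr).is_private for addr in addrs}
```

For example, resolves_privately("<app-name>.azurewebsites.net") run inside and outside the VNet shows whether the Private Endpoint's DNS changes where traffic for that name is routed.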
