[Issue] "azd up" fails to deploy Azure AI template - "UserError" - "Deployment chat-deployment-xxxxx not found in endpoint mloe-xxxxx, workspace ai-project-xxxxx"
#4037
Closed
1 task done
nitya opened this issue
Jun 26, 2024
· 1 comment
· Fixed by #4043
I am currently using a GitHub Codespace configured with a devcontainer that installs the latest (daily) Azure Developer CLI build using this command:
RUN curl -fsSL https://aka.ms/install-azd.sh | bash -s -- --version daily
Output from azd version
Run azd version and copy and paste the output here:
This is the output I get:
azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)
Describe the bug
Issue: "azd up" completes provisioning but terminates prematurely with error during "deploy"
The application has run correctly in the past. However, in the current instance,
azd up completes provisioning step correctly
it also completes post-provisioning hooks execution
then fails with a "UserError" on the deploy step
The error appears to be timing related
azd deploy complains that a specific chat-deployment endpoint is not available
that deployment is in provisioning state at this point (and gets deployed successfully later)
meanwhile azd process terminates on error (so no post-deployment actions are run)
The problem caused:
When testing the deployed app we get a "Network Error"
By trial and error we determined this was because Traffic Allocation for deployment was 0%
Manually setting it to 100% with "Update traffic" allowed the test to pass on retry
The insight:
Premature termination of azd prevented the traffic allocation setup from being completed.
This agrees with the azd documentation, which indicates that azd should wait for the deployment to enter a terminal provisioning state and then shift traffic to the new deployment.
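To make the expected ordering concrete, here is a minimal Python sketch of "wait for a terminal provisioning state, then shift traffic". It is not azd's implementation (azd is written in Go). The function name, the `get_state` and `set_traffic` callables, and the set of terminal state names are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict

# Assumed terminal provisioning states; adjust to whatever the service reports.
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def wait_then_shift_traffic(
    get_state: Callable[[], str],
    set_traffic: Callable[[Dict[str, int]], None],
    deployment_name: str,
    max_polls: int = 60,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the deployment is terminal; route 100% traffic only on success.

    get_state and set_traffic are hypothetical hooks standing in for the real
    status query and traffic update calls.
    """
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            if state == "Succeeded":
                # Traffic moves only after provisioning has finished.
                set_traffic({deployment_name: 100})
            return state
        sleep(interval_seconds)
    raise TimeoutError(f"{deployment_name} did not reach a terminal state")
```

If the process exits before the loop sees "Succeeded", `set_traffic` is never called, which matches the 0% traffic allocation observed in this report.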
To Reproduce
(This bug was originally seen on Jun 20 - and was reproduced by community on Jun 24. I have re-run the flow on Jun 26 to capture the above screenshots and provide these steps to reproduce)
The bug relates to the azd deploy step on the "Azure-Samples/Contoso-Chat" AZD-enabled template. To capture this issue report, I created this branch on my fork to have a reproducible commit for validation.
These are the steps to reproduce the bug:
Launch GitHub Codespaces on that fork/branch (commit)
"azd auth login" - and complete workflow. You should see: Logged in to Azure.
"azd up" - enter environment name, subscription, location: I used "Sweden Central"
post-provision hooks run successfully (populate data, connections)
deployment begins - then fails with error shown
🚨 | Error seen in the CLI (VSC on Codespaces) - error message shown as snippet for clarity. Note that the CLI exits as a result of this error, returning to cursor prompt in VS Code.
🚨 | At the same time the Azure AI deployments tab shows that the identified endpoint resource was created and the related chat-deployment resource was still in the process of being created (backend Azure) at the time the error message was seen (CLI, local development IDE)
🚨 | If we look at the deployment resource - it shows that the resource is still in the provisioning state at this time. And it has 0% traffic allocation at this point as expected.
🚨 | If we continue to wait for backend process to complete (takes about 10 mins) - you will see that the Provisioning state is now set to "Succeeded" but traffic allocation still remains at 0%
🚨 | If we now test the deployment with a valid input, we get a "Network error"
Scroll down for debug/workaround that validated the issue.
Expected behavior
Expected that "azd up" would successfully deploy the application and shift traffic to new deployment. This would be validated by testing the deployed endpoint with a relevant test input.
Environment
Information on your environment:
* Language name and version | Python version 3.11.9 (GitHub Codespaces devcontainer)
* Dev Container Dockerfile | mcr.microsoft.com/devcontainers/python:3.11-bullseye
* IDE and version : Visual Studio Code 1.90.2 running in browser (GitHub Codespaces)
Additional context
Also took these actions to support debug:
CLI says: "You can view detailed progress in the Azure Portal" - with link specified
Opened Link: Shows "Deployment is in progress" with "Status OK" for requested roles.
Opened RG - Deployments: Monitor status and wait till all resources are created
Opened Azure AI - Hub - Projects - Deployments - Monitored page to correlate status to CLI
Debugged issue by assuming it was related to Traffic Allocation.
🚨 | Manually updated traffic allocation on the deployment to 100%
🚨 | Refreshed deployment to validate that traffic allocation was now updated to 100%
🚨 | Tried the test input again - this time it worked! (Validates that issue was because azd did not get to complete the traffic allocation update)
nitya
changed the title
[Issue] "azd up" fails to deploy Azure AI template - "UserError" - "Deployment chat-deployment-xxxxx not found in endpoint mloe-xxxxx, workspace ai-projject-xxxxx"
[Issue] "azd up" fails to deploy Azure AI template - "UserError" - "Deployment chat-deployment-xxxxx not found in endpoint mloe-xxxxx, workspace ai-project-xxxxx"
Jun 26, 2024
azd is failing with a 404 because, for a short time after a deployment is created for the AI endpoint, the deployment has not yet propagated and cannot be found when azd queries its status.
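One way to handle this kind of propagation delay is to treat "not found" as transient for a bounded grace period while polling. The Python sketch below illustrates the idea only. It is not the change made in #4043, and the `NotFound` exception, the function name, and the timing defaults are hypothetical.

```python
import time
from typing import Callable, Optional


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 from the status query."""


def poll_tolerating_not_found(
    fetch_state: Callable[[], str],
    grace_seconds: float = 120.0,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a deployment's state, ignoring 404s during an initial grace window."""
    start = now()
    while True:
        elapsed = now() - start
        state: Optional[str] = None
        try:
            state = fetch_state()
        except NotFound:
            # Right after creation the deployment may not be visible yet.
            # Past the grace window, a 404 is treated as a real error.
            if elapsed >= grace_seconds:
                raise
        if state in ("Succeeded", "Failed", "Canceled"):
            return state
        if elapsed >= timeout_seconds:
            raise TimeoutError("deployment did not reach a terminal state")
        sleep(interval_seconds)
```

The grace window keeps a genuinely missing deployment from being polled forever, while still absorbing the short period in which a newly created deployment is not yet queryable.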
Tagging @kristenwomack @wbreza for awareness