diff --git a/CHANGELOG.md b/CHANGELOG.md index 2c39949..27ec652 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,20 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres ## [Unreleased] +### Changed +- **Skill + tutorial guidance now require `Cognitive Services OpenAI User` as a prerequisite RBAC role.** + The `agentops-workflow` skill, `tutorial-prompt-agent-quickstart.md`, + `tutorial-end-to-end.md`, and `docs/ci-github-actions.md` now instruct users + to grant the OIDC/CI service principal **both** Foundry User on the Foundry + project **and** Cognitive Services OpenAI User on the underlying Azure AI + Services account that hosts the evaluator model deployment. Foundry + `azure_ai_evaluator` graders impersonate the OIDC principal to call OpenAI; + without the OpenAI User role they fail with a 401 `PermissionDenied` and + every cloud eval metric returns `null`, blocking the first PR run. The skill + now emits the matching `az role assignment create` commands for both roles + (role ids `53ca6127-db72-4b80-b1b0-d745d6d5456d` and + `5e0bd9bd-7b93-4f28-af87-19fc36ad61bd`) before dispatching the workflow. + ### Fixed - **Cloud eval surfaces grader execution errors instead of silent nulls.** When a Foundry `azure_ai_evaluator` grader fails to execute (most diff --git a/docs/ci-github-actions.md b/docs/ci-github-actions.md index 4f6b263..0274ad5 100644 --- a/docs/ci-github-actions.md +++ b/docs/ci-github-actions.md @@ -125,9 +125,23 @@ from GitHub Actions runs. See [Microsoft's WIF docs](https://learn.microsoft.com/azure/active-directory/workload-identities/workload-identity-federation-create-trust?pivots=identity-wif-apps-methods-azp). For Foundry prompt-agent gates, the same app registration / service principal -also needs **Foundry User** on the Foundry project or Foundry resource. 
Azure -`Reader` is not enough because the eval step calls Foundry data-plane APIs such -as `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. +needs **two** Azure RBAC roles before the first workflow run. Both are required +and the eval step fails silently (every metric returns `null`) if only one is +in place: + +- **Foundry User** on the Foundry project or Foundry resource. Azure `Reader` + is not enough because the eval step calls Foundry data-plane APIs such as + `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. +- **Cognitive Services OpenAI User** on the underlying Azure AI Services + account that hosts the evaluator model deployment. Foundry `azure_ai_evaluator` + graders impersonate the OIDC principal to call OpenAI; without this role + they fail with a 401 `PermissionDenied` on + `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` + and every metric returns `null` in the cloud eval report. AgentOps lifts that + error into `results.json` and the orchestrator's "0 usable metric scores" + warning so you can see the cause in CI logs, but the workflow still fails the + gate. The role ids are `53ca6127-db72-4b80-b1b0-d745d6d5456d` (Foundry User) + and `5e0bd9bd-7b93-4f28-af87-19fc36ad61bd` (Cognitive Services OpenAI User). The generated eval and doctor workflows install AgentOps telemetry support. When `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` is set, AgentOps first tries to diff --git a/docs/tutorial-end-to-end.md b/docs/tutorial-end-to-end.md index 951affb..64fc9e6 100644 --- a/docs/tutorial-end-to-end.md +++ b/docs/tutorial-end-to-end.md @@ -428,8 +428,11 @@ this Foundry prompt-agent repo. Create or connect the GitHub repo if needed, create the `dev` environment, wire Azure OIDC, set AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini as a GitHub `dev` environment variable or equivalent Azure DevOps pipeline variable, verify the -OIDC principal has Foundry User access, and show me the plan before changing -GitHub or Azure. 
+OIDC principal has **both** Foundry User access on the dev Foundry project +**and** Cognitive Services OpenAI User access on the underlying Azure AI +Services account that hosts the evaluator model (both are required — without +the OpenAI User role, every cloud eval metric returns null), and show me the +plan before changing GitHub or Azure. ``` That value is not an `agentops init` answer. It tells the Foundry cloud eval @@ -578,10 +581,13 @@ workflows running for this Foundry agent repo. Extend the PR/dev setup if it already exists, wire Azure OIDC for the `qa` and `production` environments, confirm required Actions variables such as -AZURE_OPENAI_DEPLOYMENT, verify the OIDC principals have Foundry User access, -and keep deploy placeholders unless this repo already has an azd deployment -path. Show me the plan before changing GitHub or Azure, and call out anything -that needs owner/admin permission. +AZURE_OPENAI_DEPLOYMENT, verify the OIDC principals have **both** Foundry User +access on each Foundry project **and** Cognitive Services OpenAI User on the +underlying AI Services account hosting the evaluator model (both are required +— without the OpenAI User role, every cloud eval metric returns null), and +keep deploy placeholders unless this repo already has an azd deployment path. +Show me the plan before changing GitHub or Azure, and call out anything that +needs owner/admin permission. ``` Use this moment in the video to connect the four repos: Foundry Toolkit creates diff --git a/docs/tutorial-prompt-agent-quickstart.md b/docs/tutorial-prompt-agent-quickstart.md index b911f22..6aea54a 100644 --- a/docs/tutorial-prompt-agent-quickstart.md +++ b/docs/tutorial-prompt-agent-quickstart.md @@ -718,9 +718,12 @@ This may be a brand-new folder with no Git repo or GitHub remote yet. 
Keep the scope to the PR gate and dev deploy only: create or connect the GitHub repo if needed, wire Azure OIDC and required Actions variables/secrets, create only the `dev` environment, verify the OIDC -principal has Foundry User access on the **dev** Foundry project, and -do not set up `qa`, `production`, scheduled Doctor, or hosted -deployment workflows yet. +principal has **both** Foundry User access on the **dev** Foundry project +**and** Cognitive Services OpenAI User on the underlying Azure AI Services +account that hosts the evaluator model (both roles are required — without +the OpenAI User role, the Foundry cloud graders fail with a 401 and every +metric comes back null), and do not set up `qa`, `production`, scheduled +Doctor, or hosted deployment workflows yet. The dev Foundry project endpoint is in `.azure/dev/.env`; the sandbox endpoint is local-only and must not be added to CI. @@ -738,9 +741,19 @@ it skips: - Set Actions variables `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID`, `AZURE_CLIENT_ID`, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` (the dev endpoint), and `APPLICATIONINSIGHTS_CONNECTION_STRING` if available. -- Verify the OIDC principal has **Foundry User** access on the dev - Foundry project. Reader alone is not enough for the data-plane calls - the prompt-agent staging and eval steps make. +- Verify the OIDC principal has **two** Azure RBAC roles before the first + run. Both are required and the eval step fails silently (every metric + returns `null`) if only one is in place: + - **Foundry User** on the dev Foundry project — Reader alone is not + enough for the data-plane calls the prompt-agent staging and eval steps + make. + - **Cognitive Services OpenAI User** on the underlying Azure AI Services + account that hosts the evaluator model deployment. Foundry + `azure_ai_evaluator` graders impersonate the OIDC principal to call + OpenAI; without this role they fail with a 401 `PermissionDenied`. 
The + AgentOps cloud-results parser lifts that error into `results.json` so + you can see the cause in the artifact, but the workflow still fails + the gate. ## 13. First green PR → merge → dev deploy diff --git a/plugins/agentops/skills/agentops-workflow/SKILL.md b/plugins/agentops/skills/agentops-workflow/SKILL.md index 9528ee6..e53fe76 100644 --- a/plugins/agentops/skills/agentops-workflow/SKILL.md +++ b/plugins/agentops/skills/agentops-workflow/SKILL.md @@ -100,22 +100,40 @@ by discovering the whole Azure subscription. `repo:/:environment:dev`. Do not assume branch or `pull_request` subjects without reading the workflow. 9. Before triggering a Foundry prompt-agent workflow, make sure the OIDC app / - service principal has Foundry data-plane access. It needs **Foundry User** - (role id `53ca6127-db72-4b80-b1b0-d745d6d5456d`, formerly Azure AI User) at - the Foundry project scope, or at the Foundry resource scope if that is the - team's standard. Azure **Reader** is not enough; without this role the eval - step fails on - `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. -10. If the Foundry RBAC assignment is missing, do not run the workflow yet. - Show the exact GitHub OIDC client ID / service principal, desired role, and - target Foundry scope, then ask the user to approve the role assignment or + service principal has **two** RBAC assignments. Both are required; the eval + step fails silently (every metric returns `null`) if only one is in place. + 1. **Foundry User** on the Foundry project (or the Foundry resource scope + if that is the team's standard). Role id + `53ca6127-db72-4b80-b1b0-d745d6d5456d` (formerly Azure AI User). Without + this the candidate-staging step fails on + `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. + 2. **Cognitive Services OpenAI User** on the underlying Azure AI Services + account that hosts the evaluator model deployment + (typically the parent account of the Foundry project). 
Role id + `5e0bd9bd-7b93-4f28-af87-19fc36ad61bd`. Without this the Foundry + `azure_ai_evaluator` graders fail with a 401 `PermissionDenied` on + `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` + and every metric comes back `null` in the cloud eval report. AgentOps now + lifts that error into `results.json` and the orchestrator's "0 usable + metric scores" warning so the cause is visible in CI logs, but the + workflow still fails the gate. Grant this role **before** the first run. + Azure **Reader** is not enough for either step. +10. If either RBAC assignment is missing, do not run the workflow yet. + Show the exact GitHub OIDC client ID / service principal, desired role, + target scope (project for Foundry User, AI Services account for Cognitive + Services OpenAI User), then ask the user to approve the role assignment or get an Azure/Foundry admin to grant it. After assignment, read it back or ask the user to confirm before dispatching the workflow. - When the user approves and you know the Foundry scope, use the role id to - avoid rename drift: + When the user approves and you know the scopes, use the role ids to avoid + rename drift: - `az ad sp show --id --query id -o tsv` - `az role assignment list --assignee --scope --include-inherited` - `az role assignment create --assignee-object-id --assignee-principal-type ServicePrincipal --role 53ca6127-db72-4b80-b1b0-d745d6d5456d --scope ` + - `az role assignment create --assignee-object-id --assignee-principal-type ServicePrincipal --role 5e0bd9bd-7b93-4f28-af87-19fc36ad61bd --scope ` + The AI Services account scope looks like + `/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/` + and can be derived from + `az cognitiveservices account list --resource-group --query "[?kind=='AIServices'].id" -o tsv`. 11. 
Ask before creating or updating GitHub repos, GitHub environments, variables/secrets, Entra app registrations/service principals, federated credentials, managed identities, or Azure RBAC assignments. @@ -304,11 +322,21 @@ Then configure Workload Identity Federation on the Azure side environment** the workflows will run from. See `docs/ci-github-actions.md` for the exact `az` commands. -Also grant the same app registration / service principal **Foundry User** on the -Foundry project or Foundry resource before the first workflow run. The PR gate -uses Foundry data-plane APIs to read prompt agents; Azure `Reader` only proves -ARM access and will still fail the eval step with -`Microsoft.CognitiveServices/accounts/AIServices/agents/read`. +Also grant the same app registration / service principal **two** Azure +RBAC roles before the first workflow run; both are required and the eval +step fails silently (every metric returns `null`) if only one is in place: + +1. **Foundry User** on the Foundry project or Foundry resource. The PR gate + uses Foundry data-plane APIs to read prompt agents; Azure `Reader` only + proves ARM access and will still fail the eval step with + `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. +2. **Cognitive Services OpenAI User** on the underlying Azure AI Services + account that hosts the evaluator model deployment. Without this, Foundry + `azure_ai_evaluator` graders fail with a 401 `PermissionDenied` on the + OpenAI `chat/completions/action` data action and every metric returns + `null` in the cloud eval report. AgentOps surfaces that error in + `results.json` and the orchestrator's "0 usable metric scores" warning, + but the workflow still fails the gate — fix the role before the run. Tell the user that CI evals emit `agentops.eval.*` telemetry and scheduled Doctor runs emit `agentops.agent.finding.*` telemetry when App Insights is @@ -319,7 +347,11 @@ Monitor deep links. 
Already done in Step 2 - the `agentops-azure` service connection handles auth. Make sure the underlying service principal or managed -identity has the **Foundry User** role on the Foundry project or resource. +identity has **both** the **Foundry User** role on the Foundry project (or +Foundry resource) **and** the **Cognitive Services OpenAI User** role on the +underlying Azure AI Services account that hosts the evaluator model. Both +are required; without the OpenAI User role the Foundry graders fail with a +401 `PermissionDenied` and every cloud eval metric returns `null`. ## Step 4 - Use azd for deployment diff --git a/src/agentops/templates/skills/agentops-workflow/SKILL.md b/src/agentops/templates/skills/agentops-workflow/SKILL.md index 9528ee6..e53fe76 100644 --- a/src/agentops/templates/skills/agentops-workflow/SKILL.md +++ b/src/agentops/templates/skills/agentops-workflow/SKILL.md @@ -100,22 +100,40 @@ by discovering the whole Azure subscription. `repo:/:environment:dev`. Do not assume branch or `pull_request` subjects without reading the workflow. 9. Before triggering a Foundry prompt-agent workflow, make sure the OIDC app / - service principal has Foundry data-plane access. It needs **Foundry User** - (role id `53ca6127-db72-4b80-b1b0-d745d6d5456d`, formerly Azure AI User) at - the Foundry project scope, or at the Foundry resource scope if that is the - team's standard. Azure **Reader** is not enough; without this role the eval - step fails on - `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. -10. If the Foundry RBAC assignment is missing, do not run the workflow yet. - Show the exact GitHub OIDC client ID / service principal, desired role, and - target Foundry scope, then ask the user to approve the role assignment or + service principal has **two** RBAC assignments. Both are required; the eval + step fails silently (every metric returns `null`) if only one is in place. + 1. 
**Foundry User** on the Foundry project (or the Foundry resource scope + if that is the team's standard). Role id + `53ca6127-db72-4b80-b1b0-d745d6d5456d` (formerly Azure AI User). Without + this the candidate-staging step fails on + `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. + 2. **Cognitive Services OpenAI User** on the underlying Azure AI Services + account that hosts the evaluator model deployment + (typically the parent account of the Foundry project). Role id + `5e0bd9bd-7b93-4f28-af87-19fc36ad61bd`. Without this the Foundry + `azure_ai_evaluator` graders fail with a 401 `PermissionDenied` on + `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` + and every metric comes back `null` in the cloud eval report. AgentOps now + lifts that error into `results.json` and the orchestrator's "0 usable + metric scores" warning so the cause is visible in CI logs, but the + workflow still fails the gate. Grant this role **before** the first run. + Azure **Reader** is not enough for either step. +10. If either RBAC assignment is missing, do not run the workflow yet. + Show the exact GitHub OIDC client ID / service principal, desired role, + target scope (project for Foundry User, AI Services account for Cognitive + Services OpenAI User), then ask the user to approve the role assignment or get an Azure/Foundry admin to grant it. After assignment, read it back or ask the user to confirm before dispatching the workflow. 
- When the user approves and you know the Foundry scope, use the role id to - avoid rename drift: + When the user approves and you know the scopes, use the role ids to avoid + rename drift: - `az ad sp show --id --query id -o tsv` - `az role assignment list --assignee --scope --include-inherited` - `az role assignment create --assignee-object-id --assignee-principal-type ServicePrincipal --role 53ca6127-db72-4b80-b1b0-d745d6d5456d --scope ` + - `az role assignment create --assignee-object-id --assignee-principal-type ServicePrincipal --role 5e0bd9bd-7b93-4f28-af87-19fc36ad61bd --scope ` + The AI Services account scope looks like + `/subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/` + and can be derived from + `az cognitiveservices account list --resource-group --query "[?kind=='AIServices'].id" -o tsv`. 11. Ask before creating or updating GitHub repos, GitHub environments, variables/secrets, Entra app registrations/service principals, federated credentials, managed identities, or Azure RBAC assignments. @@ -304,11 +322,21 @@ Then configure Workload Identity Federation on the Azure side environment** the workflows will run from. See `docs/ci-github-actions.md` for the exact `az` commands. -Also grant the same app registration / service principal **Foundry User** on the -Foundry project or Foundry resource before the first workflow run. The PR gate -uses Foundry data-plane APIs to read prompt agents; Azure `Reader` only proves -ARM access and will still fail the eval step with -`Microsoft.CognitiveServices/accounts/AIServices/agents/read`. +Also grant the same app registration / service principal **two** Azure +RBAC roles before the first workflow run; both are required and the eval +step fails silently (every metric returns `null`) if only one is in place: + +1. **Foundry User** on the Foundry project or Foundry resource. 
The PR gate + uses Foundry data-plane APIs to read prompt agents; Azure `Reader` only + proves ARM access and will still fail the eval step with + `Microsoft.CognitiveServices/accounts/AIServices/agents/read`. +2. **Cognitive Services OpenAI User** on the underlying Azure AI Services + account that hosts the evaluator model deployment. Without this, Foundry + `azure_ai_evaluator` graders fail with a 401 `PermissionDenied` on the + OpenAI `chat/completions/action` data action and every metric returns + `null` in the cloud eval report. AgentOps surfaces that error in + `results.json` and the orchestrator's "0 usable metric scores" warning, + but the workflow still fails the gate — fix the role before the run. Tell the user that CI evals emit `agentops.eval.*` telemetry and scheduled Doctor runs emit `agentops.agent.finding.*` telemetry when App Insights is @@ -319,7 +347,11 @@ Monitor deep links. Already done in Step 2 - the `agentops-azure` service connection handles auth. Make sure the underlying service principal or managed -identity has the **Foundry User** role on the Foundry project or resource. +identity has **both** the **Foundry User** role on the Foundry project (or +Foundry resource) **and** the **Cognitive Services OpenAI User** role on the +underlying Azure AI Services account that hosts the evaluator model. Both +are required; without the OpenAI User role the Foundry graders fail with a +401 `PermissionDenied` and every cloud eval metric returns `null`. ## Step 4 - Use azd for deployment
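Reviewer note: the two role assignments this change requires can be sketched as a dry run. This is a minimal sketch, not the skill's actual implementation. The role ids and `az role assignment create` flags are taken from the diff above; the variable names, the `role_assignment_cmd` helper, and the placeholder values are hypothetical. The script only prints the commands, so it needs neither `az` nor Azure credentials.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the two role-assignment commands instead of running them.
set -euo pipefail

# Role ids as stated in the change above.
FOUNDRY_USER_ROLE_ID="53ca6127-db72-4b80-b1b0-d745d6d5456d"   # Foundry User
OPENAI_USER_ROLE_ID="5e0bd9bd-7b93-4f28-af87-19fc36ad61bd"    # Cognitive Services OpenAI User

# Hypothetical placeholders; export real values before using the output.
SP_OBJECT_ID="${SP_OBJECT_ID:-SP_OBJECT_ID_PLACEHOLDER}"
FOUNDRY_SCOPE="${FOUNDRY_SCOPE:-FOUNDRY_PROJECT_SCOPE_PLACEHOLDER}"
AI_SERVICES_SCOPE="${AI_SERVICES_SCOPE:-AI_SERVICES_ACCOUNT_SCOPE_PLACEHOLDER}"

# role_assignment_cmd ROLE_ID SCOPE
# Prints (does not execute) the az command for one assignment.
role_assignment_cmd() {
  local role_id="$1" scope="$2"
  printf 'az role assignment create --assignee-object-id %s --assignee-principal-type ServicePrincipal --role %s --scope %s\n' \
    "$SP_OBJECT_ID" "$role_id" "$scope"
}

# Foundry User on the Foundry project (or Foundry resource) scope.
role_assignment_cmd "$FOUNDRY_USER_ROLE_ID" "$FOUNDRY_SCOPE"
# Cognitive Services OpenAI User on the AI Services account scope.
role_assignment_cmd "$OPENAI_USER_ROLE_ID" "$AI_SERVICES_SCOPE"
```

Printing rather than executing matches the skill's rule to show the plan and ask for approval before changing Azure RBAC. Once the scopes are confirmed, the printed lines can be reviewed and run by someone with permission to create role assignments.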