fix(crm-agent): grant_dev_user.sh must assign both Azure AI User AND OpenAI User by carvychen · Pull Request #67 · carvychen/agent-platform

carvychen · 2026-05-10T03:19:32Z

Summary

Reproduced on jiaweichen: after granting only `Cognitive Services OpenAI User` via PR #66's helper and waiting 50+ min for RBAC propagation, requests still failed with 401 PermissionDenied. Adding `Azure AI User` unblocks the path. Cross-checked against admin (which works): admin has both roles on the Foundry account.

Root cause: AF SDK calls `POST /api/projects/<p>/openai/v1/responses` (Foundry "Hubless" Responses API), which requires the data action `Microsoft.CognitiveServices/accounts/AIServices/agents/write`. That data action is granted by `Azure AI User`, not by `Cognitive Services OpenAI User` (which covers only the classic Azure OpenAI inference path `POST /openai/deployments/<d>/chat/completions`).

PR #66 incorrectly attributed admin's success to `Cognitive Services OpenAI User` — admin already had `Azure AI User` granted earlier in the session, both ended up applied together, and the test wasn't sensitive enough to attribute the fix to the right role. Granting only OpenAI User to a fresh principal (jiaweichen) exposed the gap clearly.
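
The role split described in the root cause can be written down as a small lookup. This is a minimal sketch, not code from the repository: the function name `required_role_for_path` is hypothetical, and the two path patterns are taken only from the paths quoted in this PR.

```bash
#!/usr/bin/env bash
# Sketch only: required_role_for_path is a hypothetical helper, not part of
# scripts/grant_dev_user.sh. It encodes the path-to-role split described above.

# required_role_for_path PATH
# Prints the role this PR says is needed for the given data plane path.
required_role_for_path() {
  case "$1" in
    # Foundry "Hubless" Responses API, the path the AF SDK calls
    /api/projects/*/openai/v1/responses)
      echo "Azure AI User" ;;
    # Classic Azure OpenAI inference path
    /openai/deployments/*/chat/completions)
      echo "Cognitive Services OpenAI User" ;;
    *)
      echo "unknown"
      return 1 ;;
  esac
}
```

For example, `required_role_for_path /api/projects/demo/openai/v1/responses` prints `Azure AI User`, which is the role that was missing for jiaweichen.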

Changes

`scripts/grant_dev_user.sh`: iterate over BOTH roles. Each role uses the same idempotent "check existing, create if absent" pattern. Re-running is a safe no-op if both roles are already assigned.
`docs/deployment/llm-foundry-zh.md`: changed "拿到 Cognitive Services OpenAI User 角色" ("get the Cognitive Services OpenAI User role") → "拿到这两个角色" ("get both roles") to match reality.
`docs/deployment/troubleshooting-zh.md`: updated root-cause + fix to name both roles. Added the observed "30+ min stuck on 401 with only one role" symptom so the next victim recognizes it.
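
The "check existing, create if absent" pattern for both roles could look roughly like the sketch below. The real script is not shown on this page, so treat everything here as an assumption: the function names `ensure_role` and `grant_both`, the output strings, and the way the scope is passed in are all hypothetical. Only the two role names and the list-then-create idea come from this PR. The `az role assignment list` and `az role assignment create` calls use the standard `--assignee`, `--role`, and `--scope` flags.

```bash
#!/usr/bin/env bash
# Sketch only: ensure_role and grant_both are hypothetical names; the actual
# scripts/grant_dev_user.sh may be structured differently.

# ensure_role ASSIGNEE SCOPE ROLE
# Lists existing assignments first and creates one only when none is found,
# so re-running the function does not create duplicates.
ensure_role() {
  local assignee="$1" scope="$2" role="$3" count
  count="$(az role assignment list --assignee "$assignee" --role "$role" \
             --scope "$scope" --query 'length(@)' -o tsv)" || return 1
  if [ "${count:-0}" -gt 0 ]; then
    echo "✓ already has: $role"
  else
    az role assignment create --assignee "$assignee" --role "$role" \
      --scope "$scope" >/dev/null || return 1
    echo "+ granted: $role"
  fi
}

# grant_both ASSIGNEE SCOPE
# Applies the same pattern to both roles named in this PR.
grant_both() {
  local assignee="$1" scope="$2" role
  for role in "Azure AI User" "Cognitive Services OpenAI User"; do
    ensure_role "$assignee" "$scope" "$role" || return 1
  done
}
```

A call would pass the user principal and the resource ID of the Foundry account as the scope. Both values are placeholders and depend on the deployment.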

Test plan

`bash -n` syntax clean
Re-running on jiaweichen (both roles already in place from diagnosis) reports "✓ already has" twice and exits 0
Manual confirm jiaweichen REPL works once `Azure AI User` finishes propagating (granted 03:14 UTC; should be ready around 03:24-03:34 UTC)

🤖 Generated with Claude Code

…OpenAI User

Hit it on jiaweichen: granted only `Cognitive Services OpenAI User`, waited 50+ min for propagation, still 401. Adding `Azure AI User` unblocks. Cross-checked with admin (which works): admin has BOTH roles on the Foundry account.

The Foundry "Hubless" data plane path that AF actually uses (`POST /api/projects/<p>/openai/v1/responses`) requires the data action `Microsoft.CognitiveServices/accounts/AIServices/agents/write`, which is granted by `Azure AI User`, NOT by `Cognitive Services OpenAI User` (that one only covers the classic Azure OpenAI inference path `POST /openai/deployments/<d>/chat/completions`).

The original PR #66 incorrectly attributed admin's success to `Cognitive Services OpenAI User`: admin had Azure AI User granted earlier in the session and both ended up working together. Granting only the OpenAI User role to a fresh principal exposes the gap clearly.

Changes:
- scripts/grant_dev_user.sh: grant BOTH roles, idempotently. Each iterates the same "list-then-create" pattern; either or both might already exist for re-runs.
- docs/deployment/llm-foundry-zh.md: text says "两个角色" ("both roles") instead of the single role.
- docs/deployment/troubleshooting-zh.md: updated root-cause + fix to name both roles + the "30+ min stuck" symptom we just observed.

Smoke: re-running on jiaweichen now reports both already assigned (both were applied during diagnosis); test pending RBAC propagation of the late Azure AI User assignment (granted 03:14 UTC).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ched code (#68)

Stripped narrative comments and dev-history docstrings from files heavily edited in PRs #58–#67. Standard applied per file:
- Module docstrings: one short paragraph (purpose + required env / inputs + one link to deeper context). Removed "design rationale", "useful for X", and "same trick Y uses" narrative; that's PR-description material that rots when the codebase moves on.
- Inline comments: kept the WHY (subtle invariants, real workarounds, hidden constraints, e.g. why `_bearer_request_hook` exists, why we refresh the OBO cache 60s early). Dropped WHAT comments that well-named identifiers already tell the reader.

Files trimmed (lines before → after):

    scripts/run_agent_local.py        211 → 187
    scripts/run_mcp_server_local.py    51 →  41
    src/agent/builder.py              187 → 165
    src/shared/auth.py                261 → 233
    src/shared/bootstrap.py           135 → 115
    src/agent/prompts/loader.py        42 →  27
                                      ───   ───
                                      887 → 768  (-119 lines, ~13%)

Drift bonus: bootstrap.py docstring + build_asgi_app docstring referenced `scripts/run_local.py`, which PR #61 renamed to `scripts/run_mcp_server_local.py`. Updated to current name.

Skipped: scripts/grant_dev_user.sh, currently in flight on PR #67's branch; left untouched to avoid merge conflicts; will trim in a follow-up after #67 lands.

84 unit tests pass. No behaviour change: comment edits only (+ stale-name fixes); module imports verified.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…69)

* chore(crm-agent): trim grant_dev_user.sh comments - follow-up to PR #68

  PR #68 skipped this file because PR #67 was modifying it concurrently. Now that #67 has merged, apply the same trim standard:
  - Header: 33 lines of design-rationale narrative ("why this exists", per-role data action explanations, RBAC propagation explanation) collapsed to 9 lines (purpose + Usage + 1-line propagation note + link to llm-foundry-zh.md "本地 REPL" ("local REPL") for the deeper background).
  - Inline `# --- --wait: ... ---` block comment shortened from 5 lines to 2.
  - `--help` output range updated to match the new header (lines 2-10).

  Why the docs link suffices: the troubleshooting + llm-foundry doc already covers the role split (Azure AI User for Hubless paths, Cognitive Services OpenAI User for classic openai paths). Having the same explanation inlined in the script is duplicated knowledge that drifts.

  Smoke: --help renders the trimmed header, no-args fails loud, idempotent re-run on a user with both roles already in place reports two ✓.

  Net: 159 → 132 lines (-27, -17%).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(crm-agent): grant_dev_user.sh --wait probe body must satisfy Foundry's max_output_tokens minimum (16, was 5)

  User caught it: jiaweichen RBAC propagated successfully but --wait reported '400 (unexpected)' and exit 1 because the probe body had max_output_tokens=5, which Foundry rejects as below the minimum of 16.

  Two fixes:
  - Bump probe body to max_output_tokens=16. Comment names the constraint so the next reader doesn't lower it again.
  - Treat 400 the same as 200 in the propagation check. 400 means the request reached the inference layer (auth passed) and was only rejected on body validation, which is still proof the data plane RBAC is effective. Robust against future API tightening that adds new body validation.

  Smoke: ./scripts/grant_dev_user.sh jiaweichen@... --wait now reports '[00:02] 200 ✓ propagated' immediately after a previously-effective grant.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
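
The two `--wait` fixes above can be illustrated with a short sketch. It is not the script's actual code: `classify_probe_status` and `probe_body` are hypothetical names, and the `model` and `input` fields in the body are assumptions about what the probe sends. From the commit message, only three things are known: `max_output_tokens` must be at least 16, 200 and 400 both count as propagated, and 401 is the symptom seen before propagation. Treating every other status as "unexpected" is also an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: classify_probe_status and probe_body are hypothetical helpers,
# not the real --wait implementation in scripts/grant_dev_user.sh.

# Foundry rejects max_output_tokens below 16, so do not lower this value.
PROBE_MAX_OUTPUT_TOKENS=16

# probe_body MODEL
# Prints a minimal JSON request body. The "model" and "input" fields are
# assumed; only the max_output_tokens constraint comes from the commit message.
probe_body() {
  printf '{"model":"%s","input":"ping","max_output_tokens":%d}\n' \
    "$1" "$PROBE_MAX_OUTPUT_TOKENS"
}

# classify_probe_status HTTP_STATUS
# 200 and 400 both mean auth passed (400 is only a body validation failure).
# 401 means the role assignment has not taken effect yet.
classify_probe_status() {
  case "$1" in
    200|400) echo "propagated" ;;
    401)     echo "pending" ;;
    *)       echo "unexpected" ;;
  esac
}
```

A polling loop would call `classify_probe_status` on each response code and stop on `propagated`, keep waiting on `pending`, and exit non-zero on `unexpected`.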

carvychen mentioned this pull request May 10, 2026

chore(crm-agent): trim verbose dev-history comments from recently-touched code #68

Merged

3 tasks

carvychen merged commit 4994348 into main May 10, 2026
3 checks passed

carvychen deleted the fix/grant-dev-user-also-azure-ai-user branch May 10, 2026 03:30

carvychen mentioned this pull request May 10, 2026

chore(crm-agent): trim grant_dev_user.sh comments (follow-up to #68) #69

Merged

4 tasks

carvychen merged 1 commit into main from fix/grant-dev-user-also-azure-ai-user
