A LangGraph FastAPI service with two agentic graphs deployed to Azure Container Apps and registered as a native agent in Azure AI Foundry via the AAAS protocol.
- What this project is
- Local development
- Architecture
- AI Foundry concepts
- Agent tool types explained
- Foundry integration options
- AAAS protocol — Option C implementation
- LangGraph API reference
- AAAS sessions API reference
- Azure deployment
Two LangGraph graphs exposed as a FastAPI service:
| Graph | What it does |
|---|---|
| support_agent | Customer support: clarification loop + escalation human-in-the-loop interrupt |
| code_review | Code review: re-review loop + accept/reject/re_review interrupt |
Both graphs use Azure OpenAI gpt-4o-mini as the LLM backend (Ollama is used for local development only).
The service is registered in Azure AI Foundry as a native container_app kind agent via the AAAS (Azure AI Agent Service) protocol — meaning Foundry manages threads and tracing while LangGraph handles all execution logic.
```bash
cp .env.example .env
# Set OLLAMA_BASE_URL to your local Ollama instance
docker compose up
```

| Service | URL |
|---|---|
| LangGraph API | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
```
┌─────────────────────────────────────────────────────┐
│  AZURE AI FOUNDRY                                   │
│                                                     │
│  Agent registry    Thread storage    Traces/evals   │
│                                                     │
│  langgraph-demo-agent  (container_app kind)         │
│         │                                           │
└─────────┼───────────────────────────────────────────┘
          │  AAAS protocol (sessions/turns)
          │  POST /sessions/{id}/turns
          │  GET  /sessions/{id}/turns/{turn_id}
          ▼
┌─────────────────────────────────────────────────────┐
│  AZURE CONTAINER APPS  (cae-langgraph)              │
│                                                     │
│  langgraph-api  (external, port 8000)               │
│    ├── /runs/*      → LangGraph run management      │
│    └── /sessions/*  → AAAS protocol endpoints       │
│                                                     │
│  redis  (internal, port 6379)                       │
│    └── future persistent checkpointer               │
└─────────────────────────────────────────────────────┘
          │
          │  Azure OpenAI (gpt-4o-mini)
          ▼
┌─────────────────────────────────────────────────────┐
│  AI Hub: safabayar     Model: gpt-4o-mini           │
│  https://safabayar.cognitiveservices.azure.com/     │
└─────────────────────────────────────────────────────┘
```
Foundry is a platform layer that sits in front of your agents. It does not run your agent code — your Container App does that.
| Foundry feature | What it does |
|---|---|
| Agent Registry | Catalog of all agents — name, URL, version, metadata |
| Thread management | Owns conversation history, routes messages to the right agent |
| Multi-agent orchestration | Agent A can call Agent B as a sub-agent |
| Portal playground | Test any registered agent from the UI without writing code |
| Tracing & evaluation | Every turn logged — latency, token cost, quality scores |
| Unified auth | All agents secured via Azure AD, no individual API keys per agent |
| Versioning | Roll back an agent to a previous version at any time |
Whether your agent contains:
- A GPT-4o wrapper
- A LangGraph state machine with an LLM
- A LangGraph graph with no LLM (pure rules/deterministic)
- A legacy REST API
...Foundry treats them identically. It sends a message, expects a response. The internals are invisible to Foundry.
```
┌──────────────────────────────────┐
│  AZURE AI FOUNDRY                │
│                                  │
│  Stores:                         │
│  - Agent registration (URL)      │
│  - Conversation threads          │
│  - Traces                        │
└──────────────┬───────────────────┘
               │ sends user message
               ▼
┌──────────────────────────────────┐
│  YOUR CONTAINER APP              │
│                                  │
│  Runs:                           │
│  - LangGraph graphs              │  ← workflow lives here
│  - LLM calls                     │
│  - State machine logic           │
│  - Business rules                │
└──────────────────────────────────┘
```
The workflow is not in Foundry. Foundry is the front door and bus. Your container is the brain.
Once multiple agents are registered, Foundry routes between them:
```
User: "I have a bug causing customer complaints"
         │
         ▼
  FOUNDRY ORCHESTRATOR
         │
  ┌──────┴──────┐
  ▼             ▼
code_review   support_agent   ← both your LangGraph containers
  agent         agent
  │             │
  └──────┬──────┘
         ▼
  combined response
```
The orchestration — which agent runs, what gets passed between them — is defined either in Foundry's workflow system or in a dedicated orchestrator agent.
A function tool in the Agents API is not server-side execution. It is a contract between the LLM and your client application:
Agent LLM → decides to call invoke_langgraph
→ returns tool_call JSON to YOUR code
→ YOUR code calls the LangGraph API
→ YOUR code sends result back to agent
The portal has no way to configure this because your application is the executor, not Azure. The GUI only shows tools that Azure executes entirely on its side:
| Tool | Who executes |
|---|---|
| Web search | Azure (Bing) |
| Code interpreter | Azure (sandboxed container) |
| Azure AI Search | Azure (your index) |
| Function | Your application code |
To use a function tool, your client must run a polling loop:
```python
import json
import time

import httpx
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

client = AIProjectClient(
    endpoint="https://safabayar.services.ai.azure.com/api/projects/proj-default",
    credential=DefaultAzureCredential(),
    api_version="2025-05-15-preview",
)
LANGGRAPH_URL = "https://langgraph-api.ambitiousglacier-23b2e299.eastus2.azurecontainerapps.io"


def handle_tool_call(name: str, arguments: dict) -> str:
    """Execute a tool call requested by the agent and return the result as a JSON string."""
    if name == "invoke_langgraph":
        r = httpx.post(f"{LANGGRAPH_URL}/runs", json={
            "graph_id": arguments["graph_name"],
            "input": arguments["input"],
        })
        r.raise_for_status()  # surface HTTP errors instead of forwarding an error body
        return json.dumps(r.json())
    return "{}"


thread = client.agents.create_thread()
client.agents.create_message(thread.id, role="user",
                             content="Review my Python code: def add(a,b): return a+b")
run = client.agents.create_run(thread_id=thread.id, agent_id="langgraph-demo-agent")

# Poll until the run reaches a terminal state, answering tool calls along the way.
while run.status in ("queued", "in_progress", "requires_action"):
    time.sleep(1)
    run = client.agents.get_run(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        tool_outputs = []
        for tc in run.required_action.submit_tool_outputs.tool_calls:
            result = handle_tool_call(tc.function.name, json.loads(tc.function.arguments))
            tool_outputs.append({"tool_call_id": tc.id, "output": result})
        run = client.agents.submit_tool_outputs_to_run(
            thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs
        )
```

The openapi tool type is visible in the Foundry portal GUI and lets Foundry call the API directly. No client-side execution loop is needed: the agent handles start_run → poll state → resume on interrupt autonomously.
| | Function tool | OpenAPI tool | AAAS (container_app) |
|---|---|---|---|
| GUI visible | No | Yes | Yes (as a full agent) |
| Execution | Client code | Foundry → your API | Foundry → your API |
| Extra LLM layer | Yes (orchestrator) | Yes (orchestrator) | No |
| Operations exposed | 1 combined | 3 separate | Protocol-native |
| Thread management | Client | Client | Foundry |
| Tracing | None | None | Full Foundry traces |
Foundry agent (gpt-4o-mini) → decides to call OpenAPI tool → LangGraph API (gpt-4o-mini)
- Two LLM layers — redundant for a single-purpose agent
- No native Foundry thread ownership or tracing
- Good if you need Foundry to choose between multiple tools
Register the URL as a Foundry connection. It is visible under Management → Connections, but it gets no Foundry features (no threads, tracing, or portal playground), so clients call the LangGraph API directly.
Foundry (manages threads + traces) → AAAS protocol → LangGraph API (brain)
- Single LLM layer — LangGraph does all the work
- Foundry owns threads: history stored and visible in portal
- Full tracing per turn
- Portal playground works natively
- Foundry can orchestrate this alongside other agents
- Requires implementing 4 AAAS protocol endpoints (see below)
AAAS is a small REST contract. Foundry calls these endpoints on your Container App:
POST /sessions → Foundry creates a conversation session
POST /sessions/{id}/turns → Foundry sends a user message; your agent runs
GET /sessions/{id}/turns/{turn_id} → Foundry polls for the result (async)
DELETE /sessions/{id} → Foundry ends the session
That is the entire protocol. Your LangGraph API maps them like this:
POST /sessions → store session with graph_id + new thread_id
POST /sessions/{id}/turns → start (or resume) a LangGraph run; return run_id as turn_id
GET /sessions/{id}/turns/{turn_id} → check run status; map to AAAS turn status
DELETE /sessions/{id} → clean up session record
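
The mapping above can be sketched with an in-memory store. This is a minimal illustration under stated assumptions, not the service's actual code: `SessionStore`, `FakeRuns`, and `to_turn_status` are hypothetical names, and the run backend is stubbed so the example runs without LangGraph. The status names are the ones documented in the API reference sections of this README.

```python
import uuid

# LangGraph run status -> AAAS turn status (status names taken from this README).
# An interrupted run is reported as a completed turn whose output is the
# interrupt question, so both "interrupted" and "complete" map to "completed".
RUN_TO_TURN_STATUS = {
    "running": "in_progress",
    "interrupted": "completed",
    "complete": "completed",
    "error": "failed",
}


def to_turn_status(run_status: str) -> str:
    return RUN_TO_TURN_STATUS[run_status]


class FakeRuns:
    """Hypothetical stand-in for the LangGraph run manager."""

    def __init__(self):
        self.status = {}  # run_id -> LangGraph run status

    def start(self, graph_id: str, thread_id: str, message: str) -> str:
        run_id = str(uuid.uuid4())
        self.status[run_id] = "running"
        return run_id


class SessionStore:
    """Hypothetical in-memory backing for the four AAAS endpoints."""

    def __init__(self, runs: FakeRuns):
        self.runs = runs
        self.sessions = {}  # session_id -> {"graph_id", "thread_id"}

    def create_session(self, graph_id: str) -> dict:
        # POST /sessions: store graph_id plus a fresh thread_id
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = {"graph_id": graph_id, "thread_id": str(uuid.uuid4())}
        return {"id": session_id, "graph_id": graph_id, "status": "created"}

    def create_turn(self, session_id: str, content: str) -> dict:
        # POST /sessions/{id}/turns: start a run and reuse its run_id as the turn_id
        session = self.sessions[session_id]
        run_id = self.runs.start(session["graph_id"], session["thread_id"], content)
        return {"id": run_id, "session_id": session_id, "status": "in_progress"}

    def get_turn(self, session_id: str, turn_id: str) -> dict:
        # GET /sessions/{id}/turns/{turn_id}: translate the run status
        if session_id not in self.sessions:
            raise KeyError(session_id)
        return {"id": turn_id, "status": to_turn_status(self.runs.status[turn_id])}

    def delete_session(self, session_id: str) -> dict:
        # DELETE /sessions/{id}: drop the session record
        self.sessions.pop(session_id, None)
        return {"deleted": True}
```

Reusing the run_id as the turn_id keeps the store free of a second lookup table; a real implementation would also need to attach the run output to the turn response.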
LangGraph human-in-the-loop interrupts become a two-turn exchange in AAAS:
Turn 1: user sends message
→ LangGraph runs, hits interrupt (e.g. escalation approval needed)
→ turn completes with status="completed", output = the interrupt question
Turn 2: user sends answer (true/false or text)
→ session detects pending interrupt, resumes the run
→ turn completes with status="completed", output = final result
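
The two-turn exchange can be sketched as a small decision function: if the session has a pending interrupt, the incoming message is converted to a resume value; otherwise it starts a new run. This is only an illustration. `parse_resume_value`, `plan_turn`, and the `pending_interrupt` session field are hypothetical names, and the accepted values per interrupt type follow the resume_value table in the LangGraph API reference of this README.

```python
def parse_resume_value(interrupt_type: str, content: str):
    """Convert a user's text reply into the resume value for the given interrupt type."""
    text = content.strip()
    if interrupt_type == "escalation_approval":
        lowered = text.lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError("escalation_approval expects true or false")
    if interrupt_type == "review_decision":
        lowered = text.lower()
        if lowered in ("accept", "reject", "re_review"):
            return lowered
        raise ValueError("review_decision expects accept, reject, or re_review")
    if interrupt_type == "clarification_needed":
        return text  # free-form answer text
    raise ValueError(f"unknown interrupt type: {interrupt_type}")


def plan_turn(session: dict, content: str) -> dict:
    """Decide whether a turn starts a new run or resumes an interrupted one."""
    pending = session.get("pending_interrupt")
    if pending is None:
        return {"action": "start", "input": content}
    # Parse first: if the reply is invalid, the interrupt stays pending.
    value = parse_resume_value(pending["type"], content)
    session["pending_interrupt"] = None
    return {"action": "resume", "run_id": pending["run_id"], "resume_value": value}
```

Parsing before clearing the pending interrupt means a malformed answer leaves the session waiting, so the user can simply reply again.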
After implementing the protocol endpoints:
```python
import json, subprocess, urllib.request

CA_RESOURCE_ID = (
    "/subscriptions/5ec3a6f9-978c-4e02-9d96-135dbc85269e"
    "/resourceGroups/rg-bayarsafa-7080"
    "/providers/Microsoft.App/containerapps/langgraph-api"
)
payload = {
    "name": "langgraph-demo-agent",
    "description": "LangGraph demo — support_agent and code_review graphs",
    "definition": {
        "kind": "container_app",
        "container_app_resource_id": CA_RESOURCE_ID,
        "container_protocol_versions": [
            {"protocol": "AzureAIAgentService", "version": "1.0"}
        ],
    }
}
token = subprocess.check_output(
    ["az", "account", "get-access-token", "--resource", "https://ai.azure.com/",
     "--query", "accessToken", "-o", "tsv"], text=True
).strip()
body = json.dumps(payload).encode()
req = urllib.request.Request(
    "https://safabayar.services.ai.azure.com/api/projects/proj-default"
    "/agents/langgraph-demo-agent/versions?api-version=2025-05-15-preview",
    data=body, method="POST",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.loads(resp.read()), indent=2))
```

- https://ai.azure.com → hub safabayar → project proj-default → Agents
- Open langgraph-demo-agent → Edit
- Change type to Container App
- Select the langgraph-api Container App from the dropdown
- Save
Base URL: https://langgraph-api.ambitiousglacier-23b2e299.eastus2.azurecontainerapps.io
```json
{"status": "ok", "graphs": ["support_agent", "code_review"]}
```

support_agent input:

```json
{
  "graph_id": "support_agent",
  "input": {
    "user_name": "Alice",
    "message": "My login is broken"
  }
}
```

code_review input:

```json
{
  "graph_id": "code_review",
  "input": {
    "code_snippet": "def add(a, b):\n return a + b",
    "language": "python"
  }
}
```

Response:

```json
{"run_id": "...", "thread_id": "...", "graph_id": "...", "status": "running"}
```

```json
{
  "run_id": "...",
  "status": "interrupted",
  "interrupt_payload": {
    "type": "escalation_approval",
    "severity": "high",
    "summary": "User Alice has a HIGH severity issue..."
  }
}
```

Status values: running | interrupted | complete | error

```json
{"resume_value": true}
```

| Interrupt type | resume_value |
|---|---|
| clarification_needed | "<answer text>" |
| escalation_approval | true or false |
| review_decision | "accept", "reject", or "re_review" |

Events: token, node_update, interrupted, complete, error, heartbeat

```json
{"score": 0.9, "comment": "Very helpful", "key": "user_feedback"}
```

Base URL: https://langgraph-api.ambitiousglacier-23b2e299.eastus2.azurecontainerapps.io
These endpoints implement the Azure AI Agent Service protocol for Foundry integration.
```json
{
  "graph_id": "support_agent",
  "metadata": {"user_name": "Alice"}
}
```

Response:

```json
{"id": "<session_id>", "graph_id": "support_agent", "status": "created"}
```

```json
{
  "input": [{"role": "user", "content": "My login is broken"}]
}
```

If the session has a pending interrupt (from a previous turn), the content is used as the resume_value automatically.

Response:

```json
{"id": "<turn_id>", "session_id": "<session_id>", "status": "in_progress"}
```

Response when in progress:

```json
{"id": "<turn_id>", "status": "in_progress"}
```

Response when interrupted (turn completes, interrupt becomes the agent's reply):

```json
{
  "id": "<turn_id>",
  "status": "completed",
  "output": {
    "messages": [
      {
        "role": "assistant",
        "content": "Do you approve escalation to the on-call team? (true/false)"
      }
    ]
  },
  "interrupt_type": "escalation_approval"
}
```

Response when complete:

```json
{
  "id": "<turn_id>",
  "status": "completed",
  "output": {
    "messages": [
      {"role": "assistant", "content": "Ticket TKT-A1B2C3D4 created. Severity: high."}
    ]
  }
}
```

Response on error:
```json
{"id": "<turn_id>", "status": "failed", "error": "..."}
```

```json
{"deleted": true}
```

| Resource | Name | Location |
|---|---|---|
| Resource Group | rg-bayarsafa-7080 | eastus2 |
| ACR | safademo.azurecr.io | eastus2 |
| Container Apps Env | cae-langgraph | eastus2 |
| Container App | langgraph-api | eastus2 |
| Container App | redis (internal) | eastus2 |
| AI Hub | safabayar | eastus2 |
| AI Project | proj-default | - |
| Model | gpt-4o-mini GlobalStandard 10K TPM | eastus2 |
| AI Agent | langgraph-demo-agent kind: container_app | - |

| Service | URL |
|---|---|
| LangGraph API | https://langgraph-api.ambitiousglacier-23b2e299.eastus2.azurecontainerapps.io |
| AI Foundry Portal | https://ai.azure.com → safabayar → proj-default → Agents |
| AI Foundry Agent | https://safabayar.services.ai.azure.com/api/projects/proj-default/agents/langgraph-demo-agent |
```bash
az acr login --name safademo
docker build -t safademo.azurecr.io/langgraph-api:latest ./langgraph-api
docker push safademo.azurecr.io/langgraph-api:latest
az containerapp update \
  --name langgraph-api \
  --resource-group rg-bayarsafa-7080 \
  --image safademo.azurecr.io/langgraph-api:latest
```

Note: az acr build (ACR Tasks) is disabled on this subscription. Always build locally.
See docs/azure-deployment.md for the complete step-by-step guide including all az commands.
State is in-memory. MemorySaver loses all run state on container restart. Redis is deployed internally but not yet wired in. To enable persistence replace MemorySaver with AsyncRedisSaver and set:
REDIS_URL=redis://redis.internal.ambitiousglacier-23b2e299.eastus2.azurecontainerapps.io:6379
Single replica only until Redis checkpointer is integrated — multiple replicas cannot share in-memory state.