@madebygps
Collaborator

Add Keycloak OAuth Authentication for MCP Server

This PR adds OAuth 2.0 authentication to the MCP server using Keycloak as the identity provider, implementing the MCP OAuth specification with Dynamic Client Registration (DCR).

What's Included

Infrastructure

  • Keycloak Container App (keycloak.bicep) - Keycloak 26.0 with pre-configured realm
  • HTTP Route Configuration (http-routes.bicep) - Rule-based routing via Azure Container Apps:
    • /auth/* → Keycloak
    • /* → MCP Server
  • Realm Configuration (keycloak-realm.json) - Pre-configured with:
    • mcp-server audience scope for token validation
    • DCR policies with trusted hosts and max client limits
    • Short-lived tokens (5 min) for security
  • Multi-stage Dockerfile (Dockerfile.keycloak) - Pre-compiles themes for cache consistency

MCP Server

  • OAuth-protected MCP server (keycloak_deployed_mcp.py) - FastMCP with JWT validation
  • Validates tokens against Keycloak's JWKS endpoint
  • Checks issuer, audience, and expiration claims
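A minimal sketch of the three claim checks listed above, assuming the token signature has already been verified against the JWKS. The PR itself relies on FastMCP's JWT verification; `check_claims` is a hypothetical helper written only to illustrate what "checks issuer, audience, and expiration" means in practice.

```python
import time
from typing import Any, Mapping, Optional


def check_claims(
    claims: Mapping[str, Any],
    expected_issuer: str,
    expected_audience: str,
    now: Optional[float] = None,
) -> None:
    """Raise ValueError if issuer, audience, or expiration is unacceptable.

    Assumption: signature verification against the JWKS happened earlier.
    """
    now = time.time() if now is None else now

    # The issuer must match exactly; no prefix or host-only comparison.
    if claims.get("iss") != expected_issuer:
        raise ValueError(f"unexpected issuer: {claims.get('iss')!r}")

    # Per RFC 7519, "aud" may be a single string or a list of strings.
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if expected_audience not in audiences:
        raise ValueError(f"audience {expected_audience!r} not in {audiences!r}")

    # A missing or past "exp" is treated as an expired token.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token is expired or has no exp claim")
```

With 5-minute tokens, the `exp` check is the one most likely to fail in a long-running agent, so callers should be ready to fetch a fresh token.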

Agent

  • LangChain agent example (langchainv1_keycloak.py) - Demonstrates:
    • Dynamic Client Registration (DCR)
    • Token acquisition via client_credentials grant
    • Authenticated MCP tool calls
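The three agent steps above can be sketched as plain request builders. This is not the code from langchainv1_keycloak.py: the function names are hypothetical, the registration path is Keycloak's OpenID Connect client registration endpoint under the realm URL, and the registration fields are an assumption about what a client_credentials-only client needs.

```python
import json
from urllib.parse import urlencode


def dcr_request(realm_url: str, client_name: str) -> tuple[str, bytes]:
    """Step 1: build the Dynamic Client Registration request (RFC 7591 style)."""
    url = f"{realm_url}/clients-registrations/openid-connect"
    body = {
        "client_name": client_name,
        # Assumption: the agent only needs machine-to-machine tokens.
        "grant_types": ["client_credentials"],
    }
    return url, json.dumps(body).encode()


def token_request(realm_url: str, client_id: str, client_secret: str) -> tuple[str, bytes]:
    """Step 2: build the form-encoded client_credentials token request."""
    url = f"{realm_url}/protocol/openid-connect/token"
    form = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    )
    return url, form.encode()


def bearer_headers(access_token: str) -> dict[str, str]:
    """Step 3: headers to attach to every MCP call."""
    return {"Authorization": f"Bearer {access_token}"}
```

The registration response supplies the `client_id` and `client_secret` that feed step 2, and the token response's `access_token` feeds step 3.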

Architecture

┌─────────────┐     1. DCR + Token      ┌─────────────┐
│  LangChain  │ ──────────────────────► │  Keycloak   │
│   Agent     │ ◄────────────────────── │  (direct)   │
│             │                         └─────────────┘
│             │     2. MCP + Bearer
│             │ ──────────────────────► ┌─────────────┐
│             │ ◄────────────────────── │  mcproutes  │ → MCP Server
└─────────────┘                         └─────────────┘

Known Limitations (Demo Trade-offs)

| Item | Current | Production Recommendation | Why |
|------|---------|---------------------------|-----|
| Keycloak mode | start-dev | start with proper config | Dev mode has relaxed security defaults |
| Database | H2 in-memory | PostgreSQL | H2 doesn't persist data across restarts |
| Replicas | 1 (due to H2) | Multiple with shared DB | H2 is in-memory, can't share state |
| Keycloak access | Public (direct URL) | Internal only via routes | Route URL isn't known until after deployment |
| DCR | Open (anonymous) | Require initial access token | Any client can register without auth |

Note: Keycloak must be publicly accessible because its URL is dynamically generated by Azure. Token issuer validation requires a known URL, but the mcproutes URL isn't available until after deployment. Using a custom domain would fix this.
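The issuer constraint in this note can be illustrated with a small sketch. The hostnames are placeholders, `realm_issuer` and `issuer_ok` are hypothetical helpers, and the sketch assumes Keycloak stamps the `iss` claim with the base URL the token request arrived on, which is what the `--hostname-strict=false` setting in Dockerfile.keycloak suggests.

```python
def realm_issuer(keycloak_base_url: str, realm: str) -> str:
    """Issuer string for a realm reached through a given base URL."""
    return f"{keycloak_base_url.rstrip('/')}/realms/{realm}"


def issuer_ok(token_iss: str, configured_issuer: str) -> bool:
    """Issuer validation is an exact string comparison."""
    return token_iss == configured_issuer


# Placeholder hosts: one direct Keycloak URL, one routed through /auth/*.
direct = realm_issuer("https://mcp-demo-kc.example.azurecontainerapps.io", "mcp")
routed = realm_issuer("https://mcproutes.example.azurecontainerapps.io/auth", "mcp")

# A token requested via the direct URL validates only if the MCP server
# was configured with that same direct URL as its expected issuer.
assert issuer_ok(direct, direct)
assert not issuer_ok(direct, routed)
```

Because the comparison is exact, the agent and the MCP server have to agree on one Keycloak URL, and that URL must be known when the server is configured.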


Deployment Instructions

Prerequisites

  • Azure CLI and azd installed
  • Docker running (for local builds)
  • Python 3.11+ with uv package manager

1. Set Environment Variables

# Set the Keycloak admin password (required)
azd env set KEYCLOAK_ADMIN_PASSWORD "YourSecurePassword123!"

# Optional: customize realm name (default: mcp)
azd env set KEYCLOAK_REALM_NAME "mcp"

2. Deploy to Azure

azd up

This will:

  • Create Azure Container Apps environment
  • Deploy Keycloak with pre-configured realm
  • Deploy MCP server with OAuth validation
  • Configure HTTP route-based routing

3. Verify Deployment

Check the outputs:

# Get the URLs
azd env get-value MCP_SERVER_URL
azd env get-value KEYCLOAK_DIRECT_URL
azd env get-value KEYCLOAK_ADMIN_CONSOLE

Visit the Keycloak admin console to verify that the realm is configured:

https://<your-mcproutes-url>/auth/admin

Log in with admin / <your-password>


Testing the LangChain Agent

1. Generate Local Environment File

./infra/write_env.sh

This creates .env with:

  • KEYCLOAK_REALM_URL - Direct Keycloak URL for token requests
  • MCP_SERVER_URL - Route URL for MCP calls
  • Azure OpenAI and Cosmos DB settings

2. Run the Agent

cd agents
uv run langchainv1_keycloak.py

Expected Output

============================================================
LangChain Agent with Keycloak-Protected MCP Server
============================================================

Configuration:
  MCP Server:  https://mcproutes.<env>.azurecontainerapps.io/mcp
  Keycloak:    https://mcp-<name>-kc.<env>.azurecontainerapps.io/realms/mcp
  LLM Host:    azure
  Auth:        Dynamic Client Registration (DCR)

[11:40:48] INFO     📝 Registering client via DCR...
           INFO     ✅ Registered client: caef6f47-0243-474d-b...
           INFO     🔑 Getting access token from Keycloak...
           INFO     ✅ Got access token (expires in 300s)
           INFO     📡 Connecting to MCP server...
           INFO     🔧 Getting available tools...
           INFO     ✅ Found 1 tools: ['add_expense']
           INFO     💬 User query: Add an expense: yesterday I bought a laptop...
           ...
           INFO     📊 Agent Response:
The expense of $1200 for the laptop purchase has been successfully recorded.

- Add Keycloak container with pre-configured MCP realm (DCR enabled)
- Add keycloak_deployed_mcp.py with RemoteAuthProvider + JWTVerifier
- Add separate aca-noauth.bicep for non-authenticated MCP server
- Add LangChain agent example with Keycloak token acquisition
- Configure HTTP routes for multi-container deployment
- Scale Keycloak to 1 replica (fixes theme cache hash mismatch)
- Use direct Keycloak URL for issuer validation

Tested: DCR, token endpoint, MCP auth all working via LangChain agent

- Update langchainv1_keycloak.py to use Dynamic Client Registration
  instead of hardcoded client credentials
- Add register_client_via_dcr() to create clients at runtime
- Remove TEST_CLIENT_ID and TEST_CLIENT_SECRET constants
- Update Dockerfile.keycloak to multi-stage build with kc.sh build
  for consistent theme cache hashes across replicas

Bicep improvements:
- Add @description() decorators to all parameters in aca.bicep,
  aca-noauth.bicep, keycloak.bicep, and http-routes.bicep
- Remove redundant dependsOn in http-routes.bicep (implicit via existing)

Token issuer fix:
- Update write_env.sh and write_env.ps1 to use direct Keycloak URL
  for KEYCLOAK_REALM_URL instead of routed URL
- Fixes 401 Unauthorized errors caused by issuer mismatch between
  token's iss claim and MCP server's expected issuer
- Add MCP_SERVER_URL to env scripts for local agent testing
Contributor

Copilot AI left a comment

Pull request overview

This PR adds OAuth 2.0 authentication to the MCP (Model Context Protocol) server using Keycloak as the identity provider. The implementation follows the MCP OAuth specification with Dynamic Client Registration (DCR), enabling secure access to the expense tracking tools. The PR introduces a comprehensive authentication layer while maintaining backward compatibility through a separate no-auth deployment option.

Key Changes

  • Keycloak OAuth server deployment with pre-configured MCP realm and DCR policies
  • HTTP route-based routing configuration to handle /auth/* (Keycloak) and /* (MCP server) paths
  • JWT token validation in the MCP server using FastMCP's RemoteAuthProvider

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 16 comments.

| File | Description |
|------|-------------|
| servers/keycloak_deployed_mcp.py | New OAuth-protected MCP server with JWT verification via Keycloak |
| servers/Dockerfile | Updated to run keycloak_deployed_mcp instead of deployed_mcp |
| servers/Dockerfile.noauth | New Dockerfile for backward-compatible no-auth MCP server |
| infra/keycloak.bicep | Keycloak container app deployment with admin credentials and realm import |
| infra/Dockerfile.keycloak | Multi-stage Keycloak Docker image with pre-compiled themes and realm configuration |
| infra/keycloak-realm.json | Pre-configured Keycloak realm with MCP audience scope and DCR policies |
| infra/http-routes.bicep | HTTP route configuration for path-based routing between Keycloak and MCP server |
| infra/main.bicep | Orchestrates deployment of Keycloak, MCP servers, and routing infrastructure |
| infra/aca.bicep | Updated MCP server container app with Keycloak authentication environment variables |
| infra/aca-noauth.bicep | New container app deployment for no-auth MCP server variant |
| infra/main.parameters.json | Added Keycloak configuration parameters (admin credentials, realm name, audience) |
| infra/write_env.sh | Extended to include Keycloak and MCP server URLs in generated .env file |
| infra/write_env.ps1 | PowerShell version of environment file generation script |
| scripts/keycloak_setup.sh | Manual Keycloak configuration script for DCR setup and testing |
| agents/langchainv1_keycloak.py | Example LangChain agent demonstrating DCR flow and authenticated MCP calls |
| azure.yaml | Added keycloak and mcpnoauth service definitions for azd deployment |
| .vscode/mcp.json | Added auth-expenses server configuration for VS Code integration |
Comments suppressed due to low confidence (1)

.vscode/mcp.json:37

  • [nitpick] Formatting change: The file was changed from spaces to tabs for indentation. While this is a valid change, ensure this is consistent with the project's code style guidelines and that the entire file (including lines not shown in the diff) uses tabs consistently.
	"servers": {
		"expenses-mcp": {
			"type": "stdio",
			"command": "uv",
			"cwd": "${workspaceFolder}",
			"args": [
				"run",
				"servers/basic_mcp_stdio.py"
			]
		},
		"expenses-mcp-http": {
			"type": "http",
			"url": "http://localhost:8000/mcp"
		},
		"expenses-mcp-debug": {
			"type": "stdio",
			"command": "uv",
			"cwd": "${workspaceFolder}",
			"args": [
				"run",
				"--",
				"python",
				"-m",
				"debugpy",
				"--listen",
				"0.0.0.0:5678",
				"servers/basic_mcp_stdio.py"
			]
		},
	},
	"inputs": []
}

Comment on lines 92 to 120
    data = response.json()
    client_id = data["client_id"]
    client_secret = data["client_secret"]

    logger.info(f"✅ Registered client: {client_id[:20]}...")
    return client_id, client_secret


async def get_keycloak_token(client_id: str, client_secret: str) -> str:
    """Get an access token from Keycloak using client_credentials grant."""
    token_url = f"{KEYCLOAK_REALM_URL}/protocol/openid-connect/token"

    logger.info("🔑 Getting access token from Keycloak...")

    async with httpx.AsyncClient() as client:
        response = await client.post(
            token_url,
            data={
                "grant_type": "client_credentials",
                "client_id": client_id,
                "client_secret": client_secret,
            },
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )

        if response.status_code != 200:
            raise Exception(f"Failed to get token: {response.status_code} - {response.text}")

        token_data = response.json()

Copilot AI Dec 4, 2025

Generic error messages reduce debuggability. The error messages on lines 92 and 120 should be more specific. Consider including additional context such as the URL being called and any relevant response headers to help diagnose authentication issues.
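One way to act on this suggestion is a small formatter that names the endpoint and surfaces the server's reply. This is a sketch only: `describe_http_failure` is a hypothetical helper, not something the PR adds, and the choice to surface the `WWW-Authenticate` header and truncate the body is an assumption about what is useful when debugging 401 responses.

```python
from typing import Mapping, Optional


def describe_http_failure(
    action: str,
    url: str,
    status_code: int,
    body: str,
    headers: Optional[Mapping[str, str]] = None,
    max_body: int = 300,
) -> str:
    """Build an error message that names the endpoint and the server's reply."""
    parts = [f"{action} failed", f"url={url}", f"status={status_code}"]

    # WWW-Authenticate often carries the OAuth error code on 401 responses.
    for name, value in (headers or {}).items():
        if name.lower() == "www-authenticate":
            parts.append(f"www-authenticate={value}")

    # Truncate the body so a large HTML error page does not flood the log.
    snippet = body if len(body) <= max_body else body[:max_body] + "..."
    parts.append(f"body={snippet}")
    return "; ".join(parts)
```

A caller could then write `raise RuntimeError(describe_http_failure("Token request", token_url, response.status_code, response.text, response.headers))`. Request bodies are deliberately left out of the message because they contain the client secret.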

Copilot uses AI. Check for mistakes.
Collaborator Author

addressed in latest commit

Comment on lines 47 to 88
module app 'core/host/container-app-upsert.bicep' = {
  name: '${serviceName}-container-app-module'
  params: {
    name: name
    location: location
    tags: union(tags, { 'azd-service-name': serviceName })
    identityName: mcpNoAuthIdentity.name
    exists: exists
    containerAppsEnvironmentName: containerAppsEnvironmentName
    containerRegistryName: containerRegistryName
    ingressEnabled: true
    env: [
      {
        name: 'AZURE_OPENAI_CHAT_DEPLOYMENT'
        value: openAiDeploymentName
      }
      {
        name: 'AZURE_OPENAI_ENDPOINT'
        value: openAiEndpoint
      }
      {
        name: 'RUNNING_IN_PRODUCTION'
        value: 'true'
      }
      {
        name: 'AZURE_CLIENT_ID'
        value: mcpNoAuthIdentity.properties.clientId
      }
      {
        name: 'AZURE_COSMOSDB_ACCOUNT'
        value: cosmosDbAccount
      }
      {
        name: 'AZURE_COSMOSDB_DATABASE'
        value: cosmosDbDatabase
      }
      {
        name: 'AZURE_COSMOSDB_CONTAINER'
        value: cosmosDbContainer
      }
    ]
    targetPort: 8000

Copilot AI Dec 4, 2025

This module deploys an MCP server (deployed_mcp.py) without any Keycloak or JWT authentication and exposes it via external ingress (ingressEnabled: true) on port 8000. An attacker who can discover the mcpnoauth container app FQDN can call the MCP HTTP endpoint directly without a Bearer token and, via the Cosmos DB environment variables, read and write all expenses data. To avoid this auth bypass, either remove this no-auth service, gate it with the same JWT verification as keycloak_deployed_mcp, or disable external ingress so it is only reachable from a trusted private network.

infra/main.bicep Outdated
Comment on lines 787 to 795
module cosmosDbRoleMcpNoAuth 'core/security/documentdb-sql-role.bicep' = {
  scope: resourceGroup
  name: 'cosmosdb-role-mcpnoauth'
  params: {
    databaseAccountName: cosmosDb.outputs.name
    principalId: mcpnoauth.outputs.identityPrincipalId
    roleDefinitionId: '/${subscription().id}/resourceGroups/${resourceGroup.name}/providers/Microsoft.DocumentDB/databaseAccounts/${cosmosDb.outputs.name}/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002'
  }
}

Copilot AI Dec 4, 2025

Here the cosmosDbRoleMcpNoAuth module grants the unauthenticated mcpnoauth container app a Cosmos DB Data Contributor role, giving it full read/write access to the expenses container. Combined with the externally exposed, no-auth MCP server, any attacker who can reach that app can use its HTTP API to exfiltrate or corrupt all expense data in Cosmos DB. Restrict this role to the Keycloak-protected service only (or remove it entirely) and ensure the no-auth app does not have data-plane permissions unless it is fully authenticated and isolated.

Suggested change
- module cosmosDbRoleMcpNoAuth 'core/security/documentdb-sql-role.bicep' = {
-   scope: resourceGroup
-   name: 'cosmosdb-role-mcpnoauth'
-   params: {
-     databaseAccountName: cosmosDb.outputs.name
-     principalId: mcpnoauth.outputs.identityPrincipalId
-     roleDefinitionId: '/${subscription().id}/resourceGroups/${resourceGroup.name}/providers/Microsoft.DocumentDB/databaseAccounts/${cosmosDb.outputs.name}/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002'
-   }
- }
+ // Removed: Do not assign data-plane permissions to unauthenticated app.

madebygps and others added 8 commits December 4, 2025 12:22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Added optional Keycloak auth to agent (controlled by KEYCLOAK_REALM_URL env var)
- Fixed health endpoint in keycloak_deployed_mcp.py
- Updated agent bicep to pass KEYCLOAK_REALM_URL
- Agent now uses mcproutes URL for MCP server
},
"inputs": []
}
"servers": {
Contributor

What's the whitespace diff from? Pre-commit? (I did add precommit btw if you want to install it)

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.


infra/*.bicep
infra/*.ps1
infra/*.sh
infra/core

Copilot AI Dec 4, 2025

The .dockerignore file excludes infra/keycloak-realm.json but this file is needed by the Keycloak Dockerfile (see line 4 of Dockerfile.keycloak). This will cause the Docker build to fail with a "file not found" error.

To fix this, add an exception to allow the keycloak-realm.json file:

# Exclude most of infra, but allow keycloak files needed for builds
infra/*.bicep
infra/*.ps1
infra/*.sh
infra/core
!infra/keycloak-realm.json
Suggested change
- infra/core
+ infra/core
+ !infra/keycloak-realm.json

Comment on lines 22 to 26
def require_env_var(name: str) -> str:
    value = os.getenv(name)
    if value is None or value.strip() == "":
        logger.error(f"Missing required environment variable: {name}")
        exit(1)

Copilot AI Dec 4, 2025

Using exit(1) instead of sys.exit(1) can cause issues in some contexts. The built-in exit() is added by the site module as a convenience for the interactive interpreter and is not guaranteed to exist (for example, when Python runs with the -S flag), while sys.exit() is the proper way to exit a program.

Change to:

import sys
...
def require_env_var(name: str) -> str:
    value = os.getenv(name)
    if value is None or value.strip() == "":
        logger.error(f"Missing required environment variable: {name}")
        sys.exit(1)
    return value

resources: {
  cpu: json('2.0')
  memory: '4.0Gi'
}

Copilot AI Dec 4, 2025

The Keycloak container app is missing health probes configuration. Container Apps uses health probes to determine if a container is ready to receive traffic and whether it should be restarted.

Keycloak has health endpoints enabled (KC_HEALTH_ENABLED=true on line 106), so you should add probes similar to the server configuration:

template: {
  containers: [
    {
      // ... existing configuration ...
      probes: [
        {
          type: 'Startup'
          httpGet: {
            path: '/health/ready'
            port: 8080
          }
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 60
        }
        {
          type: 'Readiness'
          httpGet: {
            path: '/health/ready'
            port: 8080
          }
          periodSeconds: 10
          failureThreshold: 3
        }
        {
          type: 'Liveness'
          httpGet: {
            path: '/health/live'
            port: 8080
          }
          periodSeconds: 30
          failureThreshold: 3
        }
      ]
    }
  ]
}
Suggested change
- }
+ }
+ probes: [
+   {
+     type: 'Startup'
+     httpGet: {
+       path: '/health/ready'
+       port: 8080
+     }
+     initialDelaySeconds: 30
+     periodSeconds: 10
+     failureThreshold: 60
+   }
+   {
+     type: 'Readiness'
+     httpGet: {
+       path: '/health/ready'
+       port: 8080
+     }
+     periodSeconds: 10
+     failureThreshold: 3
+   }
+   {
+     type: 'Liveness'
+     httpGet: {
+       path: '/health/live'
+       port: 8080
+     }
+     periodSeconds: 30
+     failureThreshold: 3
+   }
+ ]

Comment on lines 3 to 24
# Copy the MCP realm configuration for import
COPY infra/keycloak-realm.json /opt/keycloak/data/import/mcp-realm.json

# Build Keycloak to pre-compile themes (fixes cache hash mismatch across replicas)
RUN /opt/keycloak/bin/kc.sh build

# Production image with pre-built themes
FROM quay.io/keycloak/keycloak:26.0

# Copy built Keycloak with consistent theme cache hashes
COPY --from=builder /opt/keycloak/ /opt/keycloak/

# Expose port 8080
EXPOSE 8080

ENTRYPOINT ["/opt/keycloak/bin/kc.sh"]

# Start in dev mode with H2 database (still uses pre-built themes)
# --proxy-headers=xforwarded tells Keycloak it's behind a reverse proxy that sets X-Forwarded-* headers
# --hostname-strict=false allows dynamic hostname resolution from proxy headers
# --import-realm imports the MCP realm on startup
CMD ["start-dev", "--http-port=8080", "--proxy-headers=xforwarded", "--hostname-strict=false", "--import-realm"]

Copilot AI Dec 4, 2025

The Keycloak image imports infra/keycloak-realm.json (COPY infra/keycloak-realm.json … together with --import-realm), and that realm configuration enables anonymous Dynamic Client Registration with a broad trusted-hosts wildcard ("*.azurecontainerapps.io"). As a result, any internet client can register an OAuth client and obtain mcp-server audience tokens via client_credentials, effectively granting unauthorized access to the MCP server that trusts this realm. To fix this, update the realm configuration to disable anonymous client registration (require an initial access token or authenticated registration only) and restrict trusted hosts to a tight allowlist for your deployment rather than an open wildcard.
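To see why the wildcard is broad, here is a rough illustration using shell-style pattern matching. It is not Keycloak's trusted-hosts implementation, and the hostnames and the `host_allowed` helper are made up for the example; the point is only that a `*.azurecontainerapps.io` pattern covers every Container App, not just the ones in this deployment.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def host_allowed(host: str, patterns: Iterable[str]) -> bool:
    """Return True if the host matches any allowlist pattern (case-insensitive)."""
    return any(fnmatchcase(host.lower(), pattern.lower()) for pattern in patterns)


# Hypothetical hostnames for illustration only.
OUR_AGENT = "mcp-demo-agent.example.eastus.azurecontainerapps.io"
SOMEONE_ELSE = "unrelated-app.otherenv.westus.azurecontainerapps.io"

wildcard = ["*.azurecontainerapps.io"]
tight = [OUR_AGENT]

# The wildcard admits any Container App, including ones we do not control.
assert host_allowed(OUR_AGENT, wildcard)
assert host_allowed(SOMEONE_ELSE, wildcard)

# An exact allowlist admits only the deployment's own hosts.
assert host_allowed(OUR_AGENT, tight)
assert not host_allowed(SOMEONE_ELSE, tight)
```

Tightening the pattern requires knowing the deployment's FQDNs, which are only available after provisioning.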

Contributor

Hm okay well now I see this is a fair security point. Is there a good way with azd where we can do that restriction once we know our exact fqdn?

            raise RuntimeError(f"Token request failed: {response.status_code} - {response.text}")

        token_data = response.json()
        logger.info(f"✅ Got access token (expires in {token_data.get('expires_in', '?')}s)")
Contributor

I'm surprised there isn't a Keycloak Python SDK. Wouldn't the best practice be to cache the token and only refresh when near expiration? Wondering if we should make our own KeyCloakClient that does that.
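A minimal sketch of the caching idea raised here, under stated assumptions: `CachedToken` is a hypothetical class, not part of the PR, and `fetch` stands in for whatever function performs the client_credentials request and returns the token together with its `expires_in` value. The clock is injectable so the refresh logic can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Reuse an access token until it is within `skew_seconds` of expiring."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # returns (access_token, expires_in_seconds)
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh on first use, or when the token is about to expire.
        if self._token is None or now >= self._expires_at - self._skew:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

With the realm's 300-second tokens and the default 30-second skew, a new token would be requested roughly every 270 seconds rather than on every MCP call. An async agent would need an async variant of `get` and a lock around the refresh; both are omitted here.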

"""
LangChain agent that connects to Keycloak-protected MCP server.
This script demonstrates:
Contributor

For agent-framework, you modified the current code, but for langchain, you made a new one?
I think we can put a KeyCloakClient in a separate file that both existing agents could use?

@pamelafox
Contributor

@madebygps Can you add a file section to the README about Deploying with Keycloak authentication?
You can bring things in from the PR description

Contributor

@pamelafox pamelafox left a comment

See comments

@madebygps madebygps merged commit 1f69dee into Azure-Samples:main Dec 5, 2025
2 checks passed