# AWS Data Processing MCP Server on AgentCore Runtime

This notebook demonstrates how to test and deploy the AWS Data Processing MCP server to Amazon Bedrock AgentCore Runtime.

## Prerequisites Setup

**Step 1:** Clone the AWS MCP repository
```bash
git clone https://github.com/awslabs/mcp.git
```

**Step 2:** Copy the AWS Data Processing MCP server to your project root
```bash
cp -r ./mcp/src/aws-dataprocessing-mcp-server ./
```

**Step 3:** Set up your environment variables in a `.env` file:
```
COGNITO_POOL_ID=your_pool_id
COGNITO_REGION=us-east-1
COGNITO_USERNAME=admin
COGNITO_CLIENT_SECRET=your_client_secret
COGNITO_PASSWORD=your_password
AWS_PROFILE=default
AWS_REGION=us-east-1
CUSTOM_TAGS=false
```
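
Before continuing, it can help to confirm that the `.env` file actually contains the keys listed above. The following is a minimal sketch using only the standard library; the helper names (`parse_env_text`, `missing_keys`, `check_env_file`) are hypothetical, and treating every key from the template as required is an assumption of this example rather than something the notebook enforces.

```python
from pathlib import Path

# Keys copied from the .env template above. Treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = [
    "COGNITO_POOL_ID",
    "COGNITO_REGION",
    "COGNITO_USERNAME",
    "COGNITO_CLIENT_SECRET",
    "COGNITO_PASSWORD",
    "AWS_PROFILE",
    "AWS_REGION",
]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read the file at `path` and report which required keys are missing."""
    return missing_keys(parse_env_text(Path(path).read_text()))
```

Calling `check_env_file()` from the project root returns an empty list when every key has a value; any names it returns still need to be filled in.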

**Step 4:** Follow the instructions below to complete the deployment

## AWS Data Processing MCP Server Overview

The AWS Data Processing MCP server provides comprehensive data processing tools and real-time pipeline visibility across AWS Glue, Amazon EMR-EC2, and Amazon Athena. This integration equips AI assistants with 40+ specialized tools organized into 14 handler classes:

### AWS Glue Integration (20+ tools)

#### Data Catalog Management
- **`manage_aws_glue_databases`**: Create, update, delete, and list Glue databases
- **`manage_aws_glue_tables`**: Manage tables with schema definition and partitioning
- **`manage_aws_glue_connections`**: Configure connections to external data sources
- **`manage_aws_glue_partitions`**: Handle table partitions for optimized querying
- **`manage_aws_glue_catalog`**: Import and manage external catalogs

#### ETL Job Orchestration
- **`manage_aws_glue_jobs`**: Create, run, monitor, and manage Glue ETL jobs
- **`manage_aws_glue_crawlers`**: Automated data discovery and cataloging
- **`manage_aws_glue_classifiers`**: Custom data format detection

#### Interactive Development
- **`manage_aws_glue_sessions`**: Interactive Spark and Ray workloads
- **`manage_aws_glue_statements`**: Execute code in interactive sessions

#### Workflow Management
- **`manage_aws_glue_workflows`**: Orchestrate complex ETL activities
- **`manage_aws_glue_triggers`**: Schedule and automate workflow execution

#### Security and Configuration
- **`manage_aws_glue_usage_profiles`**: Resource allocation and cost management
- **`manage_aws_glue_security_configurations`**: Data encryption settings
- **`manage_aws_glue_encryption`**: Catalog encryption management
- **`manage_aws_glue_resource_policies`**: Access control policies

### Amazon EMR Integration (10+ tools)

#### Cluster Management
- **`manage_aws_emr_clusters`**: Create, configure, monitor, and terminate EMR clusters
- **`manage_aws_emr_ec2_instances`**: Manage instance fleets and groups with auto-scaling
- **`manage_aws_emr_ec2_steps`**: Submit and monitor Hadoop, Spark, and other job steps

### Amazon Athena Integration (10+ tools)

#### Query Management
- **`manage_aws_athena_query_executions`**: Execute, monitor, and manage SQL queries
- **`manage_aws_athena_named_queries`**: Create reusable query libraries

#### Data Catalog Operations
- **`manage_aws_athena_data_catalogs`**: Manage multiple catalog types (LAMBDA, GLUE, HIVE, FEDERATED)
- **`manage_aws_athena_databases_and_tables`**: Database and table metadata discovery
- **`manage_aws_athena_workgroups`**: Cost control and access management

### Common Resource Management

#### IAM and S3 Tools
- **`add_inline_policy`**: Create custom IAM policies for data processing services
- **`get_policies_for_role`**: Retrieve role permissions
- **`create_data_processing_role`**: Create specialized IAM roles
- **`get_roles_for_service`**: Find service-specific roles
- **`list_s3_buckets`**: Analyze S3 bucket usage for data processing
- **`upload_to_s3`**: Upload scripts and code to S3
- **`analyze_s3_usage_for_data_processing`**: Usage pattern analysis

### Key Features
- **Read/Write Modes**: Optional `--allow-write` flag for mutating operations
- **Sensitive Data Access**: `--allow-sensitive-data-access` for logs and events
- **Resource Tagging**: MCP-managed resource tracking with optional `CUSTOM_TAGS` override
- **Multi-Service Integration**: Seamless workflows across Glue, EMR, and Athena
- **Natural Language Interface**: Complex data operations through conversational AI

All tools require appropriate AWS permissions and work together to provide end-to-end data processing pipeline management.
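
As a rough illustration of how the two flags select an operating mode, the sketch below parses them with `argparse` in the same way the wrapper later in this notebook does and returns the mode description. The function names `parse_flags` and `describe_mode` are hypothetical; only the flag names come from the server.

```python
import argparse


def parse_flags(argv: list[str]) -> argparse.Namespace:
    """Parse the two mode flags; unknown arguments are ignored."""
    parser = argparse.ArgumentParser(description="AWS Data Processing MCP Server")
    parser.add_argument("--allow-write", action="store_true")
    parser.add_argument("--allow-sensitive-data-access", action="store_true")
    args, _unknown = parser.parse_known_args(argv)
    return args


def describe_mode(argv: list[str]) -> str:
    """Return a human-readable description of the selected mode."""
    args = parse_flags(argv)
    write_mode = "with write access" if args.allow_write else "in read-only mode"
    sensitive = " and sensitive data access" if args.allow_sensitive_data_access else ""
    return f"{write_mode}{sensitive}"
```

With no flags the description is read-only, which matches the default described above; each flag has to be passed explicitly to widen access.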

## Prerequisites

Before running this notebook, ensure you have:

### System Requirements
- Python 3.10 or higher
- AWS CLI configured with valid credentials
- Docker installed (for containerization)

### AWS Permissions
Your AWS credentials must have comprehensive permissions for:

#### Required for All Operations
- Amazon Bedrock AgentCore
- Amazon ECR (for container registry)
- Amazon Cognito (for authentication)
- IAM (for role creation)
- AWS Systems Manager Parameter Store
- AWS Secrets Manager

#### Data Processing Services
- **AWS Glue**: Full access for data catalog, ETL jobs, crawlers, interactive sessions
- **Amazon EMR**: Cluster management, instance operations, step execution
- **Amazon Athena**: Query execution, workgroup management, data catalog operations
- **Amazon S3**: Data storage and script hosting
- **CloudWatch**: Logging and monitoring

**Security Note**: This MCP server requires extensive permissions for comprehensive data processing operations. Use appropriate access controls in production environments.

### Project Structure
- `aws-dataprocessing-mcp-server/` - The MCP server implementation
- `dataprocessing-requirements.txt` - Python dependencies for data processing
- `utils.py` - Helper functions for Cognito and IAM setup

## 1. Install Dependencies

Install all required Python packages using uv (recommended) or pip:

In [1]:
# Configure AWS Profile
import os
os.environ['AWS_PROFILE'] = 'ibc2025'
print(f"AWS Profile set to: {os.environ['AWS_PROFILE']}")

AWS Profile set to: ibc2025


In [2]:
# Check current Python and install packages directly
import sys
print(f"Python executable: {sys.executable}")

# Install packages using the current Python interpreter
import subprocess
try:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "dataprocessing-requirements.txt"])
    print("✓ All packages installed successfully")
except subprocess.CalledProcessError as e:
    print(f"Error installing packages: {e}")

# Verify key packages are available
try:
    import boto3
    print("✓ boto3 available")
except ImportError:
    print("❌ boto3 not available")

try:
    from dotenv import load_dotenv
    print("✓ python-dotenv available")
except ImportError:
    print("❌ python-dotenv not available")

Python executable: /opt/homebrew/opt/python@3.11/bin/python3.11
✓ All packages installed successfully
✓ boto3 available
✓ python-dotenv available

In [4]:
!uv pip install -r dataprocessing-requirements.txt

Audited 10 packages in 31ms


## 2. Local Testing

Before deploying to AgentCore Runtime, let's test the AWS Data Processing MCP server locally.

### 2.1 MCP Server Wrapper

The `mcp-server.py` file creates a FastMCP wrapper around the AWS Data Processing MCP server for AgentCore deployment. This wrapper:

- Imports the 13 handler classes used by the original server implementation
- Registers 32 tools across Glue, EMR, Athena, and common IAM/S3 resources
- Configures the server for HTTP transport on port 8000
- Supports `--allow-write` and `--allow-sensitive-data-access` flags
- Enables stateless HTTP mode for AgentCore compatibility
- Provides comprehensive instructions for data processing workflows

The server exposes tools from these handler classes:
- **Glue Handlers**: Data Catalog, ETL Jobs, Crawlers, Interactive Sessions, Workflows, Commons
- **EMR Handlers**: Clusters, Instances, Steps
- **Athena Handlers**: Queries, Data Catalogs, Workgroups
- **Common Handlers**: IAM and S3 resource management
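
The registration pattern the wrapper relies on, `mcp.tool(name=...)(handler.method)`, is an ordinary decorator applied to a bound method. The sketch below shows that shape with stand-ins only: `ToyRegistry` is not FastMCP and `ToyAthenaHandler` is a hypothetical handler that makes no AWS calls, so it illustrates the mechanics rather than the real server's behavior.

```python
from typing import Callable


class ToyRegistry:
    """Stand-in for FastMCP that only mimics the tool(name=...) decorator shape."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable] = {}

    def tool(self, name: str) -> Callable[[Callable], Callable]:
        def register(func: Callable) -> Callable:
            # Store the callable under its public tool name.
            self.tools[name] = func
            return func

        return register


class ToyAthenaHandler:
    """Hypothetical handler; the real handlers call AWS APIs."""

    def __init__(self, allow_write: bool = False) -> None:
        self.allow_write = allow_write

    def manage_queries(self, operation: str) -> str:
        # Anything other than "list" is treated as mutating in this toy example.
        if operation != "list" and not self.allow_write:
            return "denied: start the server with --allow-write"
        return f"ok: {operation}"


registry = ToyRegistry()
handler = ToyAthenaHandler(allow_write=False)

# Same call shape as the wrapper: decorator factory applied to a bound method.
registry.tool(name="manage_queries")(handler.manage_queries)
```

Because the bound method carries its handler instance, the `allow_write` setting chosen at construction time travels with every registered tool.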

In [5]:
%%writefile mcp-server.py
#!/usr/bin/env python3
import os
import sys
import argparse

# Add the path before any imports
sys.path.insert(0, os.path.abspath("./aws-dataprocessing-mcp-server"))

# Import all handler classes from the original server
from awslabs.aws_dataprocessing_mcp_server.handlers.athena.athena_data_catalog_handler import AthenaDataCatalogHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.athena.athena_query_handler import AthenaQueryHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.athena.athena_workgroup_handler import AthenaWorkGroupHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.commons.common_resource_handler import CommonResourceHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.emr.emr_ec2_cluster_handler import EMREc2ClusterHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.emr.emr_ec2_instance_handler import EMREc2InstanceHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.emr.emr_ec2_steps_handler import EMREc2StepsHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.glue.crawler_handler import CrawlerHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.glue.data_catalog_handler import GlueDataCatalogHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.glue.glue_commons_handler import GlueCommonsHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.glue.glue_etl_handler import GlueEtlJobsHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.glue.interactive_sessions_handler import GlueInteractiveSessionsHandler
from awslabs.aws_dataprocessing_mcp_server.handlers.glue.worklows_handler import GlueWorkflowAndTriggerHandler

# Create a new FastMCP instance with correct parameters
from mcp.server.fastmcp import FastMCP

# Parse command line arguments
parser = argparse.ArgumentParser(description='AWS Data Processing MCP Server')
parser.add_argument('--allow-write', action='store_true', help='Enable write operations')
parser.add_argument('--allow-sensitive-data-access', action='store_true', help='Allow access to sensitive data')
args, unknown = parser.parse_known_args()

# Store flags in environment variables for handlers
if args.allow_write:
    os.environ['ALLOW_WRITE'] = 'true'
if args.allow_sensitive_data_access:
    os.environ['ALLOW_SENSITIVE_DATA_ACCESS'] = 'true'

mcp = FastMCP(
    'awslabs.aws-dataprocessing-mcp-server',
    host="0.0.0.0",
    stateless_http=True,
    instructions="""AWS Data Processing MCP Server provides comprehensive tools for managing AWS data processing services including Glue, EMR, and Athena.

    This server enables you to:
    - Manage AWS Glue Data Catalog with databases, tables, connections, and partitions
    - Create and orchestrate ETL jobs with automated crawlers and interactive sessions
    - Deploy and manage Amazon EMR clusters with instance fleet management
    - Execute and monitor Hadoop, Spark, and other big data processing steps
    - Run SQL queries through Amazon Athena with workgroup management
    - Configure security, encryption, and access policies across all services
    - Manage IAM roles and S3 resources for data processing workflows

    ## Available Tool Categories:
    
    ### AWS Glue Tools (20+ tools)
    - **Data Catalog**: manage_aws_glue_databases, manage_aws_glue_tables, manage_aws_glue_connections, manage_aws_glue_partitions, manage_aws_glue_catalog
    - **ETL Jobs**: manage_aws_glue_jobs (create, run, monitor job runs, bookmarks)
    - **Crawlers**: manage_aws_glue_crawlers, manage_aws_glue_classifiers
    - **Interactive Sessions**: manage_aws_glue_sessions, manage_aws_glue_statements
    - **Workflows**: manage_aws_glue_workflows, manage_aws_glue_triggers
    - **Security & Config**: manage_aws_glue_usage_profiles, manage_aws_glue_security_configurations, manage_aws_glue_encryption, manage_aws_glue_resource_policies
    
    ### Amazon EMR Tools (10+ tools)
    - **Cluster Management**: manage_aws_emr_clusters (create, configure, terminate clusters)
    - **Instance Management**: manage_aws_emr_ec2_instances (fleet and group management)
    - **Step Execution**: manage_aws_emr_ec2_steps (submit and monitor jobs)
    
    ### Amazon Athena Tools (10+ tools)
    - **Query Execution**: manage_aws_athena_query_executions (execute, monitor SQL queries)
    - **Named Queries**: manage_aws_athena_named_queries (reusable query libraries)
    - **Data Catalogs**: manage_aws_athena_data_catalogs (LAMBDA, GLUE, HIVE, FEDERATED)
    - **Discovery**: manage_aws_athena_databases_and_tables (metadata exploration)
    - **Workgroups**: manage_aws_athena_workgroups (cost control, access management)
    
    ### Common Resource Tools
    - **IAM Management**: add_inline_policy, get_policies_for_role, create_data_processing_role, get_roles_for_service
    - **S3 Operations**: list_s3_buckets, upload_to_s3, analyze_s3_usage_for_data_processing

    ## Operation Modes:
    - **Read-Only Mode** (default): Safe exploration and monitoring operations
    - **Write Mode** (--allow-write): Enable resource creation, modification, and deletion
    - **Sensitive Data Access** (--allow-sensitive-data-access): Access logs, events, and sensitive configurations

    ## Common Workflows:
    1. **Data Discovery**: Create crawlers → Generate tables → Query with Athena
    2. **ETL Pipeline**: Design Glue jobs → Configure workflows → Monitor execution
    3. **Big Data Processing**: Launch EMR clusters → Submit steps → Process at scale
    4. **Analytics**: Set up Athena workgroups → Execute queries → Analyze results

    ## Resource Management:
    - All resources are tagged for MCP management (unless CUSTOM_TAGS=true)
    - Only MCP-created resources can be modified or deleted through this server
    - IAM roles and policies are automatically configured for service access

    For more information about AWS Data Processing services, visit:
    - AWS Glue: https://aws.amazon.com/glue/
    - Amazon EMR: https://aws.amazon.com/emr/
    - Amazon Athena: https://aws.amazon.com/athena/
    """,
    dependencies=[
        'pydantic',
        'loguru',
        'boto3',
        'requests',
        'pyyaml',
        'cachetools',
    ],
)

# Initialize handlers with write permissions
allow_write = args.allow_write
allow_sensitive = args.allow_sensitive_data_access

# Athena handlers
athena_data_catalog_handler = AthenaDataCatalogHandler(allow_write=allow_write)
athena_query_handler = AthenaQueryHandler(allow_write=allow_write, allow_sensitive_data_access=allow_sensitive)
athena_workgroup_handler = AthenaWorkGroupHandler(allow_write=allow_write)

# Common resource handler
common_resource_handler = CommonResourceHandler(allow_write=allow_write)

# EMR handlers
emr_cluster_handler = EMREc2ClusterHandler(allow_write=allow_write)
emr_instance_handler = EMREc2InstanceHandler(allow_write=allow_write)
emr_steps_handler = EMREc2StepsHandler(allow_write=allow_write)

# Glue handlers
crawler_handler = CrawlerHandler(allow_write=allow_write)
glue_data_catalog_handler = GlueDataCatalogHandler(allow_write=allow_write)
glue_commons_handler = GlueCommonsHandler(allow_write=allow_write)
glue_etl_handler = GlueEtlJobsHandler(allow_write=allow_write)
glue_sessions_handler = GlueInteractiveSessionsHandler(allow_write=allow_write, allow_sensitive_data_access=allow_sensitive)
glue_workflow_handler = GlueWorkflowAndTriggerHandler(allow_write=allow_write)

# Register all tools from handlers
# Athena tools
mcp.tool(name='manage_aws_athena_data_catalogs')(athena_data_catalog_handler.manage_aws_athena_data_catalogs)
mcp.tool(name='manage_aws_athena_databases_and_tables')(athena_data_catalog_handler.manage_aws_athena_databases_and_tables)
mcp.tool(name='manage_aws_athena_query_executions')(athena_query_handler.manage_aws_athena_query_executions)
mcp.tool(name='manage_aws_athena_named_queries')(athena_query_handler.manage_aws_athena_named_queries)
mcp.tool(name='manage_aws_athena_workgroups')(athena_workgroup_handler.manage_aws_athena_workgroups)

# Common resource tools
mcp.tool(name='add_inline_policy')(common_resource_handler.add_inline_policy)
mcp.tool(name='get_policies_for_role')(common_resource_handler.get_policies_for_role)
mcp.tool(name='create_data_processing_role')(common_resource_handler.create_data_processing_role)
mcp.tool(name='get_roles_for_service')(common_resource_handler.get_roles_for_service)
mcp.tool(name='list_s3_buckets')(common_resource_handler.list_s3_buckets)
mcp.tool(name='upload_to_s3')(common_resource_handler.upload_to_s3)
mcp.tool(name='analyze_s3_usage_for_data_processing')(common_resource_handler.analyze_s3_usage_for_data_processing)

# EMR tools
mcp.tool(name='manage_aws_emr_clusters')(emr_cluster_handler.manage_aws_emr_clusters)
mcp.tool(name='manage_aws_emr_ec2_instances')(emr_instance_handler.manage_aws_emr_ec2_instances)
mcp.tool(name='manage_aws_emr_ec2_steps')(emr_steps_handler.manage_aws_emr_ec2_steps)

# Glue tools
mcp.tool(name='manage_aws_glue_crawlers')(crawler_handler.manage_aws_glue_crawlers)
mcp.tool(name='manage_aws_glue_classifiers')(crawler_handler.manage_aws_glue_classifiers)
mcp.tool(name='manage_aws_glue_crawler_management')(crawler_handler.manage_aws_glue_crawler_management)
mcp.tool(name='manage_aws_glue_databases')(glue_data_catalog_handler.manage_aws_glue_databases)
mcp.tool(name='manage_aws_glue_tables')(glue_data_catalog_handler.manage_aws_glue_tables)
mcp.tool(name='manage_aws_glue_connections')(glue_data_catalog_handler.manage_aws_glue_connections)
mcp.tool(name='manage_aws_glue_partitions')(glue_data_catalog_handler.manage_aws_glue_partitions)
mcp.tool(name='manage_aws_glue_catalog')(glue_data_catalog_handler.manage_aws_glue_catalog)
mcp.tool(name='manage_aws_glue_usage_profiles')(glue_commons_handler.manage_aws_glue_usage_profiles)
mcp.tool(name='manage_aws_glue_security_configurations')(glue_commons_handler.manage_aws_glue_security_configurations)
mcp.tool(name='manage_aws_glue_encryption')(glue_commons_handler.manage_aws_glue_encryption)
mcp.tool(name='manage_aws_glue_resource_policies')(glue_commons_handler.manage_aws_glue_resource_policies)
mcp.tool(name='manage_aws_glue_jobs')(glue_etl_handler.manage_aws_glue_jobs)
mcp.tool(name='manage_aws_glue_sessions')(glue_sessions_handler.manage_aws_glue_sessions)
mcp.tool(name='manage_aws_glue_statements')(glue_sessions_handler.manage_aws_glue_statements)
mcp.tool(name='manage_aws_glue_workflows')(glue_workflow_handler.manage_aws_glue_workflows)
mcp.tool(name='manage_aws_glue_triggers')(glue_workflow_handler.manage_aws_glue_triggers)

if __name__ == "__main__":
    write_mode = "with write access" if args.allow_write else "in read-only mode"
    sensitive_mode = " and sensitive data access" if args.allow_sensitive_data_access else ""
    print(f"Starting AWS Data Processing MCP server {write_mode}{sensitive_mode} on http://0.0.0.0:8000")
    mcp.run(transport="streamable-http")

Overwriting mcp-server.py


### 2.2 Local Test Client

The `mcp-client.py` creates a simple test client that:

- Connects to the local MCP server at `http://0.0.0.0:8000/mcp`
- Lists all available tools
- Provides basic connectivity testing

This client helps verify that your MCP server is running correctly before deployment.
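
The client's grouping step can be tried without a running server. The sketch below is one possible standalone version of that logic; `group_tools` is a hypothetical helper, and unlike the client's separate list comprehensions it assigns each name to the first matching service only.

```python
def group_tools(names: list[str]) -> dict[str, list[str]]:
    """Bucket tool names by the service keyword they contain."""
    groups: dict[str, list[str]] = {"glue": [], "emr": [], "athena": [], "common": []}
    for name in names:
        for service in ("glue", "emr", "athena"):
            if service in name:
                groups[service].append(name)
                break
        else:
            # No service keyword matched, so treat it as a common resource tool.
            groups["common"].append(name)
    return groups


example = group_tools(
    [
        "manage_aws_glue_jobs",
        "manage_aws_emr_clusters",
        "manage_aws_athena_workgroups",
        "list_s3_buckets",
    ]
)
for service, tools in example.items():
    print(f"{service}: {tools}")
```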

In [6]:
%%writefile mcp-client.py
#!/usr/bin/env python3
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def test_server():
    mcp_url = "http://0.0.0.0:8000/mcp"
    
    try:
        async with streamablehttp_client(mcp_url, {}, terminate_on_close=False) as (
            read_stream, write_stream, _
        ):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()
                
                tool_result = await session.list_tools()
                print(f"Found {len(tool_result.tools)} tools:")
                
                # Group tools by service
                glue_tools = [t.name for t in tool_result.tools if 'glue' in t.name]
                emr_tools = [t.name for t in tool_result.tools if 'emr' in t.name]
                athena_tools = [t.name for t in tool_result.tools if 'athena' in t.name]
                common_tools = [t.name for t in tool_result.tools if t.name not in glue_tools + emr_tools + athena_tools]
                
                print(f"\nAWS Glue tools ({len(glue_tools)}):")
                for tool in glue_tools:
                    print(f"  - {tool}")
                    
                print(f"\nAmazon EMR tools ({len(emr_tools)}):")
                for tool in emr_tools:
                    print(f"  - {tool}")
                    
                print(f"\nAmazon Athena tools ({len(athena_tools)}):")
                for tool in athena_tools:
                    print(f"  - {tool}")
                    
                print(f"\nCommon resource tools ({len(common_tools)}):")
                for tool in common_tools:
                    print(f"  - {tool}")
                
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    asyncio.run(test_server())

Overwriting mcp-client.py


### 2.3 Local Testing Instructions

To test your AWS Data Processing MCP server locally:

1. **Terminal 1**: Start the MCP server
   ```bash
   python mcp-server.py --allow-write --allow-sensitive-data-access
   ```
   Expected output: `Starting AWS Data Processing MCP server with write access and sensitive data access on http://0.0.0.0:8000`
   
2. **Terminal 2**: Run the test client
   ```bash
   python mcp-client.py
   ```
   Expected output: `Found 32 tools:` (one per `mcp.tool(...)` registration in `mcp-server.py`), followed by categorized tool lists

**Note**: Local testing requires AWS credentials with appropriate permissions for Glue, EMR, and Athena services. The server will start but tools may fail without proper AWS access.

## 3. Amazon Cognito Authentication Setup

This deployment configures AgentCore Runtime with a custom JWT authorizer. We'll use Amazon Cognito to issue the bearer tokens used to access the deployed MCP server.

The `utils.py` file contains helper functions:
- `get_cognito_pool_info()`: Retrieves configuration from an existing Cognito User Pool
- `setup_cognito_user_pool()`: Creates a new User Pool if needed
- `create_agentcore_role()`: Creates the necessary IAM role with proper permissions
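
Two small pieces of this setup can be illustrated without calling AWS: how the OpenID discovery URL is derived from the pool ID and region (the format matches the discovery URL printed by the cell below), and how to inspect a bearer token's claims. This is a sketch, not the contents of `utils.py`; both function names are hypothetical, and `decode_claims` deliberately does not verify the signature, so it is suitable for debugging only.

```python
import base64
import json


def discovery_url(pool_id: str, region: str) -> str:
    """Build the Cognito OpenID configuration URL for a user pool."""
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{pool_id}/.well-known/openid-configuration"
    )


def decode_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Checking the `exp` and `scope` claims this way is a quick first step when a deployed endpoint rejects a token, since Cognito access tokens expire after a limited lifetime.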

In [7]:
import os
from dotenv import load_dotenv
from utils import get_cognito_pool_info, create_agentcore_role

load_dotenv()

pool_id = os.getenv('COGNITO_POOL_ID', 'us-east-1_XXXXX')
region = os.getenv('COGNITO_REGION', 'us-east-1')
    
print(f"Get Cognito user pool info for pool id: {pool_id} in region: {region}")

print("Setting up Amazon Cognito user pool...")
cognito_config = get_cognito_pool_info(pool_id, region)
print("Cognito setup completed ✓")

Get Cognito user pool info for pool id: eu-central-1_PaVtjk8dt in region: eu-central-1
Setting up Amazon Cognito user pool...
Using client ID: 4rit5a00iqft9ak8sl5hb28sr
Making OAuth request to: https://mcp-registry-241533163649-mcp-gateway-registry.auth.eu-central-1.amazoncognito.com/oauth2/token
✅ Successfully obtained bearer token via OAuth client credentials
Pool id: eu-central-1_PaVtjk8dt
Discovery URL: https://cognito-idp.eu-central-1.amazonaws.com/eu-central-1_PaVtjk8dt/.well-known/openid-configuration
Client ID: 4rit5a00iqft9ak8sl5hb28sr
Bearer Token: eyJraWQiOiJYZ1wvblZQWmtZeEtLM3ZicHBPRVQ2cUVYZUZLdE12QkVcLzVyWjJGK3ZoZWs9IiwiYWxnIjoiUlMyNTYifQ.eyJzdWIiOiI0cml0NWEwMGlxZnQ5YWs4c2w1aGIyOHNyIiwidG9rZW5fdXNlIjoiYWNjZXNzIiwic2NvcGUiOiJtY3AtcmVnaXN0cnlcL3JlYWQgbWNwLXJlZ2lzdHJ5XC93cml0ZSIsImF1dGhfdGltZSI6MTc1NzUzOTQ2NywiaXNzIjoiaHR0cHM6XC9cL2NvZ25pdG8taWRwLmV1LWNlbnRyYWwtMS5hbWF6b25hd3MuY29tXC9ldS1jZW50cmFsLTFfUGFWdGprOGR0IiwiZXhwIjoxNzU3NTQzMDY3LCJpYXQiOjE3NTc1Mzk0NjcsInZlcnNpb24iOj

tool_name = "dataprocessing_mcp_server"  # Fixed: Remove 'aws_' prefix to meet validation rules
additional_managed_policies = [
    'AWSGlueConsoleFullAccess',  # Fixed: Use console access instead of service role
    'AmazonS3FullAccess', 
    'AmazonEMRFullAccessPolicy_v2',
    'AmazonAthenaFullAccess'
]
print(f"Creating IAM role for {tool_name}...")
agentcore_iam_role = create_agentcore_role(
    agent_name=tool_name,
    managed_policies=additional_managed_policies
)
print(f"IAM role created ✓")
print(f"Role ARN: {agentcore_iam_role['Role']['Arn']}")

In [8]:
tool_name = "dataproc_mcp_ibcv2"
additional_managed_policies = [
    'AWSGlueConsoleFullAccess',
    'AmazonS3FullAccess', 
    'AmazonEMRFullAccessPolicy_v2',
    'AmazonAthenaFullAccess'
]
print(f"Creating IAM role for {tool_name}...")
agentcore_iam_role = create_agentcore_role(
    agent_name=tool_name,
    managed_policies=additional_managed_policies
)
print(f"IAM role created ✓")
print(f"Role ARN: {agentcore_iam_role['Role']['Arn']}")

Creating IAM role for dataproc_mcp_ibcv2...
attaching inline role policy agentcore-dataproc_mcp_ibcv2-role
Attaching 4 managed policies...
Attaching managed policy: AWSGlueConsoleFullAccess
✅ Successfully attached AWSGlueConsoleFullAccess
Attaching managed policy: AmazonS3FullAccess
✅ Successfully attached AmazonS3FullAccess
Attaching managed policy: AmazonEMRFullAccessPolicy_v2
✅ Successfully attached AmazonEMRFullAccessPolicy_v2
Attaching managed policy: AmazonAthenaFullAccess
✅ Successfully attached AmazonAthenaFullAccess
IAM role created ✓
Role ARN: arn:aws:iam::241533163649:role/agentcore-dataproc_mcp_ibcv2-role


In [9]:
import boto3
import json

iam_client = boto3.client('iam')
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()["Account"]
region = boto3.Session().region_name

role_name = 'agentcore-dataproc_mcp_ibcv2-role'  # Updated for new tool name

# Define the correct trust policy for bedrock-agentcore
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AssumeRolePolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock-agentcore.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": account_id
                },
                "ArnLike": {
                    "aws:SourceArn": f"arn:aws:bedrock-agentcore:{region}:{account_id}:*"
                }
            }
        }
    ]
}

print(f"Updating trust policy for role: {role_name}")
print(f"Account ID: {account_id}")
print(f"Region: {region}")

try:
    # Update the assume role policy
    iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(trust_policy)
    )
    print("✅ Trust policy updated successfully")

    # Verify the update
    role_info = iam_client.get_role(RoleName=role_name)
    print("\n✅ Role verification:")
    print(f"  - Role ARN: {role_info['Role']['Arn']}")
    print(f"  - Trust policy updated: Yes")

    # Wait for propagation
    import time
    print("\n⏳ Waiting 30 seconds for IAM changes to propagate...")
    time.sleep(30)
    print("✅ IAM propagation wait completed")

    print("\n✅ Role is now ready. You can retry the launch operation.")

except Exception as e:
    print(f"❌ Error updating trust policy: {e}")

Updating trust policy for role: agentcore-dataproc_mcp_ibcv2-role
Account ID: 241533163649
Region: eu-central-1
✅ Trust policy updated successfully

✅ Role verification:
  - Role ARN: arn:aws:iam::241533163649:role/agentcore-dataproc_mcp_ibcv2-role
  - Trust policy updated: Yes

⏳ Waiting 30 seconds for IAM changes to propagate...
✅ IAM propagation wait completed

✅ Role is now ready. You can retry the launch operation.


In [10]:
import boto3
import json

iam_client = boto3.client('iam')
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()["Account"]
region = boto3.Session().region_name

role_name = 'agentcore-dataproc_mcp_ibcv2-role'  # Updated for new tool name

# Define the corrected ECR policy
ecr_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ECRAuthToken",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ECRImageAccess",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchCheckLayerAvailability"
            ],
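            # Repository name assumed to follow the bedrock-agentcore-<agent_name>
            # pattern, so it must match the current tool name (dataproc_mcp_ibcv2)
            "Resource": [
                f"arn:aws:ecr:{region}:{account_id}:repository/bedrock-agentcore-dataproc_mcp_ibcv2",
                f"arn:aws:ecr:{region}:{account_id}:repository/bedrock-agentcore-dataproc_mcp_ibcv2/*"
            ]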
        }
    ]
}

print(f"Adding ECR permissions to role: {role_name}")
print(f"Account ID: {account_id}")
print(f"Region: {region}")

try:
    # Add the ECR policy as an inline policy
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName='ECRAccessPolicy',
        PolicyDocument=json.dumps(ecr_policy)
    )
    print("✅ ECR permissions added successfully")

    # List all policies attached to the role
    inline_policies = iam_client.list_role_policies(RoleName=role_name)
    print(f"\n📋 Inline policies attached to role:")
    for policy in inline_policies['PolicyNames']:
        print(f"  - {policy}")

    # Wait for propagation
    import time
    print("\n⏳ Waiting 20 seconds for IAM changes to propagate...")
    time.sleep(20)
    print("✅ IAM propagation wait completed")

    print("\n✅ ECR permissions are now properly configured. You can retry the launch operation.")

except Exception as e:
    print(f"❌ Error adding ECR permissions: {e}")

Adding ECR permissions to role: agentcore-dataproc_mcp_ibcv2-role
Account ID: 241533163649
Region: eu-central-1
✅ ECR permissions added successfully

📋 Inline policies attached to role:
  - AgentCorePolicy
  - ECRAccessPolicy

⏳ Waiting 20 seconds for IAM changes to propagate...
✅ IAM propagation wait completed

✅ ECR permissions are now properly configured. You can retry the launch operation.


## 5. AgentCore Runtime Configuration

Configure the AgentCore Runtime deployment using the Bedrock AgentCore Starter Toolkit:

### Configuration Parameters
- **Entrypoint**: `mcp-server.py` (our FastMCP wrapper with handler registration)
- **Execution Role**: The IAM role created above with data processing permissions
- **Requirements**: `dataprocessing-requirements.txt` with all dependencies
- **Protocol**: MCP (Model Context Protocol)
- **Authentication**: Custom JWT authorizer with Cognito

### Auto-Generated Resources
- Dockerfile optimized for the MCP server with data processing dependencies
- Amazon ECR repository for container storage
- AgentCore Runtime configuration with environment variables

The configuration validates that all required files exist before proceeding.
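
The two inputs that are easiest to get wrong here are the authorizer dictionary and the list of required files. The sketch below builds the same `customJWTAuthorizer` structure used in the next cell and reports missing files instead of raising on the first one; `build_auth_config` and `find_missing_files` are hypothetical helper names, not part of the starter toolkit.

```python
import os


def build_auth_config(client_id: str, discovery_url: str) -> dict:
    """Return the custom JWT authorizer block in the shape the next cell uses."""
    return {
        "customJWTAuthorizer": {
            "allowedClients": [client_id],
            "discoveryUrl": discovery_url,
        }
    }


def find_missing_files(paths: list[str], base_dir: str = ".") -> list[str]:
    """Return every path that does not exist under base_dir."""
    return [p for p in paths if not os.path.exists(os.path.join(base_dir, p))]
```

Collecting all missing files before failing gives a single, complete error message, which is convenient when both the entrypoint and the requirements file were generated in earlier cells.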

In [11]:
import os
import time

from bedrock_agentcore_starter_toolkit import Runtime
from boto3.session import Session

boto_session = Session()
region = boto_session.region_name
print(f"Using AWS region: {region}")

required_files = ['mcp-server.py', 'dataprocessing-requirements.txt']
for file in required_files:
    if not os.path.exists(file):
        raise FileNotFoundError(f"Required file {file} not found")
print("All required files found ✓")

agentcore_runtime = Runtime()

auth_config = {
    "customJWTAuthorizer": {
        "allowedClients": [
            cognito_config['client_id']
        ],
        "discoveryUrl": cognito_config['discovery_url'],
    }
}

print("Configuring AgentCore Runtime...")
response = agentcore_runtime.configure(
    entrypoint="mcp-server.py",
    execution_role=agentcore_iam_role['Role']['Arn'],
    auto_create_ecr=True,
    requirements_file="dataprocessing-requirements.txt",
    region=region,
    authorizer_configuration=auth_config,
    protocol="MCP",
    agent_name=tool_name
)
print("Configuration completed ✓")

Entrypoint parsed: file=/Users/dohtem/Downloads/claude/ibc2025-acme-corp-bedrockagentcore-chatbot/aws-mcp-server-agentcore/mcp-server.py, bedrock_agentcore_name=mcp-server
Configuring BedrockAgentCore agent: dataproc_mcp_ibcv2


Using AWS region: eu-central-1
All required files found ✓
Configuring AgentCore Runtime...


Generated Dockerfile: /Users/dohtem/Downloads/claude/ibc2025-acme-corp-bedrockagentcore-chatbot/aws-mcp-server-agentcore/Dockerfile
Generated .dockerignore: /Users/dohtem/Downloads/claude/ibc2025-acme-corp-bedrockagentcore-chatbot/aws-mcp-server-agentcore/.dockerignore
Changing default agent from 'dataproc_mcp_ibc' to 'dataproc_mcp_ibcv2'
Bedrock AgentCore configured: /Users/dohtem/Downloads/claude/ibc2025-acme-corp-bedrockagentcore-chatbot/aws-mcp-server-agentcore/.bedrock_agentcore.yaml


Configuration completed ✓


## 6. Deployment to AgentCore Runtime

Launch the MCP server to AgentCore Runtime. The launch runs the following steps and returns the identifiers listed under Expected Outputs.

### Build and Deploy Steps
1. **Container Build**: Creates Docker image from the generated Dockerfile with data processing dependencies
2. **ECR Push**: Uploads the container to Amazon ECR
3. **Runtime Creation**: Creates the AgentCore Runtime using the execution role configured above
4. **Service Registration**: Registers the MCP server endpoint, exposing the 40+ data processing tools described in the overview

### Expected Outputs
- Agent ARN: Unique identifier for the deployed runtime
- Agent ID: Short identifier for management operations
- ECR URI: Container image location
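
The Agent ARN embeds the region, account ID, and Agent ID, so the other values can be recovered from it. The sketch below splits an ARN using the general `arn:partition:service:region:account-id:resource` layout; `parse_runtime_arn` is a hypothetical helper, and the sample ARN is the one printed by this notebook's launch output.

```python
def parse_runtime_arn(arn: str) -> dict:
    """Split an AgentCore Runtime ARN into its components (hypothetical helper)."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    # For runtimes the resource part looks like "runtime/<agent-id>".
    resource_type, _, agent_id = resource.partition("/")
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account_id": account_id,
        "resource_type": resource_type,
        "agent_id": agent_id,
    }


sample_arn = (
    "arn:aws:bedrock-agentcore:eu-central-1:241533163649:"
    "runtime/dataproc_mcp_ibcv2-1MQhok269k"
)
parsed = parse_runtime_arn(sample_arn)
print(parsed["region"], parsed["agent_id"])
```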

**Note**: This process typically takes 8-12 minutes due to the comprehensive dependencies and handler registrations.

In [12]:
print("Launching MCP server to AgentCore Runtime...")
print("This may take several minutes due to data processing dependencies...")
launch_result = agentcore_runtime.launch()
print("Launch completed ✓")
print(f"Agent ARN: {launch_result.agent_arn}")
print(f"Agent ID: {launch_result.agent_id}")

🚀 CodeBuild mode: building in cloud (RECOMMENDED - DEFAULT)
   • Build ARM64 containers in the cloud with CodeBuild
   • No local Docker required
💡 Available deployment modes:
   • runtime.launch()                           → CodeBuild (current)
   • runtime.launch(local=True)                 → Local development
   • runtime.launch(local_build=True)           → Local build + cloud deploy (NEW)
Starting CodeBuild ARM64 deployment for agent 'dataproc_mcp_ibcv2' to account 241533163649 (eu-central-1)
Setting up AWS resources (ECR repository, execution roles)...
Getting or creating ECR repository for agent: dataproc_mcp_ibcv2


Launching MCP server to AgentCore Runtime...
This may take several minutes due to data processing dependencies...
Repository doesn't exist, creating new ECR repository: bedrock-agentcore-dataproc_mcp_ibcv2


✅ ECR repository available: 241533163649.dkr.ecr.eu-central-1.amazonaws.com/bedrock-agentcore-dataproc_mcp_ibcv2
Using execution role from config: arn:aws:iam::241533163649:role/agentcore-dataproc_mcp_ibcv2-role
✅ Execution role validation passed: arn:aws:iam::241533163649:role/agentcore-dataproc_mcp_ibcv2-role
Preparing CodeBuild project and uploading source...
Getting or creating CodeBuild execution role for agent: dataproc_mcp_ibcv2
Role name: AmazonBedrockAgentCoreSDKCodeBuild-eu-central-1-183e5f3ed3
CodeBuild role doesn't exist, creating new role: AmazonBedrockAgentCoreSDKCodeBuild-eu-central-1-183e5f3ed3
Creating IAM role: AmazonBedrockAgentCoreSDKCodeBuild-eu-central-1-183e5f3ed3
✓ Role created: arn:aws:iam::241533163649:role/AmazonBedrockAgentCoreSDKCodeBuild-eu-central-1-183e5f3ed3
Attaching inline policy: CodeBuildExecutionPolicy to role: AmazonBedrockAgentCoreSDKCodeBuild-eu-central-1-183e5f3ed3
✓ Policy attached: CodeBuildExecutionPolicy
Waiting for IAM role propaga

Launch completed ✓
Agent ARN: arn:aws:bedrock-agentcore:eu-central-1:241533163649:runtime/dataproc_mcp_ibcv2-1MQhok269k
Agent ID: dataproc_mcp_ibcv2-1MQhok269k


## 7. Runtime Status Monitoring

Monitor the AgentCore Runtime deployment status:

### Status States
- **CREATING**: Runtime is being deployed
- **READY**: Runtime is operational and ready to serve requests
- **CREATE_FAILED**: Deployment failed
- **UPDATE_FAILED**: Update operation failed
- **DELETE_FAILED**: Deletion operation failed

The monitoring loop checks status every 10 seconds until reaching a terminal state. Only proceed to testing when status is **READY**.
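
The loop in the next cell polls indefinitely. One possible variation, sketched here under the assumption that an upper bound on waiting is desirable, wraps the same logic in a function with a timeout. `wait_for_terminal_status` and its parameters are hypothetical names; the demonstration uses a fake status source so it runs without a deployed runtime.

```python
import time

TERMINAL_STATES = {"READY", "CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def wait_for_terminal_status(get_status, interval=10.0, timeout=900.0, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal state or `timeout` seconds pass."""
    waited = 0.0
    status = get_status()
    while status not in TERMINAL_STATES:
        if waited >= timeout:
            raise TimeoutError(f"Runtime still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        status = get_status()
    return status


# Demonstration with a fake status sequence and a no-op sleep.
fake_statuses = iter(["CREATING", "CREATING", "READY"])
final_status = wait_for_terminal_status(
    lambda: next(fake_statuses), sleep=lambda seconds: None
)
print(final_status)
```

In the notebook, the status source would be `lambda: agentcore_runtime.status().endpoint['status']`, matching the call used in the monitoring cell.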

In [13]:
status_response = agentcore_runtime.status()
status = status_response.endpoint['status']
print(f"Initial status: {status}")

end_status = ['READY', 'CREATE_FAILED', 'DELETE_FAILED', 'UPDATE_FAILED']
while status not in end_status:
    print(f"Status: {status} - waiting...")
    time.sleep(10)
    status_response = agentcore_runtime.status()
    status = status_response.endpoint['status']

if status == 'READY':
    print("✓ AgentCore Runtime is READY!")
else:
    print(f"⚠ AgentCore Runtime status: {status}")

Retrieved Bedrock AgentCore status for: dataproc_mcp_ibcv2


Initial status: READY
✓ AgentCore Runtime is READY!


In [14]:
import boto3
import json

ssm_client = boto3.client('ssm', region_name=region)
secrets_client = boto3.client('secretsmanager', region_name=region)

# Store Cognito credentials in Secrets Manager with the actual path used in the client
try:
    cognito_credentials_response = secrets_client.create_secret(
        Name='mcp/aws_dataprocessing_server-ibc/cognito/credentials',
        Description='Cognito credentials for AWS Data Processing MCP server',
        SecretString=json.dumps(cognito_config)
    )
    print("✓ Cognito credentials stored in Secrets Manager")
except secrets_client.exceptions.ResourceExistsException:
    secrets_client.update_secret(
        SecretId='mcp/aws_dataprocessing_server-ibc/cognito/credentials',
        SecretString=json.dumps(cognito_config)
    )
    print("✓ Cognito credentials updated in Secrets Manager")

# Store the actual agent ARN in Parameter Store
agent_arn_response = ssm_client.put_parameter(
    Name='/mcp/aws_dataprocessing_server-ibc/runtime/agent_arn',
    Value=launch_result.agent_arn,
    Type='String',
    Description='Agent ARN for AWS Data Processing MCP server',
    Overwrite=True
)
print("✓ Agent ARN stored in Parameter Store")

print("\nConfiguration stored successfully!")
print(f"Agent ARN: {launch_result.agent_arn}")
print("Parameter path: /mcp/aws_dataprocessing_server-ibc/runtime/agent_arn")
print("Secret path: mcp/aws_dataprocessing_server-ibc/cognito/credentials")

✓ Cognito credentials updated in Secrets Manager
✓ Agent ARN stored in Parameter Store

Configuration stored successfully!
Agent ARN: arn:aws:bedrock-agentcore:eu-central-1:241533163649:runtime/dataproc_mcp_ibcv2-1MQhok269k
Parameter path: /mcp/aws_dataprocessing_server-ibc/runtime/agent_arn
Secret path: mcp/aws_dataprocessing_server-ibc/cognito/credentials
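
The stored secret includes a bearer token, and Cognito tokens are time-limited, so a token saved earlier may have expired by the time a client reads it back. The sketch below reads the `exp` claim from a JWT payload without verifying the signature, which is only suitable as a local freshness check. Both function names are hypothetical, and the demo token is an unsigned, JWT-shaped string built purely for illustration.

```python
import base64
import json
import time


def jwt_seconds_remaining(token: str, now=None) -> float:
    """Return seconds until the token's `exp` claim (signature is NOT verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["exp"] - (time.time() if now is None else now)


def make_demo_token(exp: int) -> str:
    """Build an unsigned, JWT-shaped string for demonstration only."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{encode({'alg': 'none'})}.{encode({'exp': exp})}.signature"


demo_token = make_demo_token(exp=1_700_003_600)
remaining = jwt_seconds_remaining(demo_token, now=1_700_000_000)
print(f"Token valid for another {remaining} seconds")
```

A negative result indicates an expired token, in which case the credentials would need to be refreshed and stored again before running the remote client.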


In [15]:
%%writefile mcp_client_remote.py
import asyncio
import boto3
import json
import sys
from boto3.session import Session

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    boto_session = Session()
    region = boto_session.region_name
    
    print(f"Using AWS region: {region}")
    
    try:
        ssm_client = boto3.client('ssm', region_name=region)
        agent_arn_response = ssm_client.get_parameter(Name='/mcp/aws_dataprocessing_server-ibc/runtime/agent_arn')
        agent_arn = agent_arn_response['Parameter']['Value']
        print(f"Retrieved Agent ARN: {agent_arn}")

        secrets_client = boto3.client('secretsmanager', region_name=region)
        response = secrets_client.get_secret_value(SecretId='mcp/aws_dataprocessing_server-ibc/cognito/credentials')
        secret_value = response['SecretString']
        parsed_secret = json.loads(secret_value)
        bearer_token = parsed_secret['bearer_token']
        print("✓ Retrieved bearer token from Secrets Manager")
        
    except Exception as e:
        print(f"Error retrieving credentials: {e}")
        sys.exit(1)
    
    if not agent_arn or not bearer_token:
        print("Error: AGENT_ARN or BEARER_TOKEN not retrieved properly")
        sys.exit(1)
    
    encoded_arn = agent_arn.replace(':', '%3A').replace('/', '%2F')
    mcp_url = f"https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{encoded_arn}/invocations?qualifier=DEFAULT"
    headers = {
        "authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream"
    }
    
    print(f"\nConnecting to: {mcp_url}")

    try:
        async with streamablehttp_client(mcp_url, headers, terminate_on_close=False) as (
            read_stream,
            write_stream,
            _,
        ):
            async with ClientSession(read_stream, write_stream) as session:
                print("\n🔄 Initializing MCP session...")
                await session.initialize()
                
                tool_result = await session.list_tools()
                
                # Group tools by service
                glue_tools = [tool for tool in tool_result.tools if 'glue' in tool.name]
                emr_tools = [tool for tool in tool_result.tools if 'emr' in tool.name]
                athena_tools = [tool for tool in tool_result.tools if 'athena' in tool.name]
                common_tools = [tool for tool in tool_result.tools if tool not in glue_tools + emr_tools + athena_tools]
                
                print("\n📋 Available MCP Tools:")
                print("=" * 60)

                print(f"\n🔧 AWS Glue Tools ({len(glue_tools)} tools):")
                for tool in glue_tools:
                    print(f"   • {tool.name}")

                print(f"\n🚀 Amazon EMR Tools ({len(emr_tools)} tools):")
                for tool in emr_tools:
                    print(f"   • {tool.name}")

                print(f"\n📊 Amazon Athena Tools ({len(athena_tools)} tools):")
                for tool in athena_tools:
                    print(f"   • {tool.name}")

                print(f"\n🛠️  Common Resource Tools ({len(common_tools)} tools):")
                for tool in common_tools:
                    print(f"   • {tool.name}")

                print("\n✅ Successfully connected to MCP server!")
                print(f"Found {len(tool_result.tools)} AWS Data Processing tools available.")
                print("Server is ready for comprehensive data processing workflows across Glue, EMR, and Athena.")
                
    except Exception as e:
        print(f"❌ Error connecting to MCP server: {e}")
        sys.exit(1)

if __name__ == "__main__":
    asyncio.run(main())

Overwriting mcp_client_remote.py
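
The client percent-encodes the ARN with two chained `replace` calls. An alternative sketch using the standard library's `urllib.parse.quote` is shown below; `build_invocation_url` is a hypothetical helper name, and the URL layout is copied from the client script above.

```python
from urllib.parse import quote


def build_invocation_url(agent_arn: str, region: str, qualifier: str = "DEFAULT") -> str:
    """Build the AgentCore Runtime invocation URL for an agent ARN (hypothetical helper)."""
    # safe="" makes quote() percent-encode ':' and '/' as well.
    encoded_arn = quote(agent_arn, safe="")
    return (
        f"https://bedrock-agentcore.{region}.amazonaws.com"
        f"/runtimes/{encoded_arn}/invocations?qualifier={qualifier}"
    )


url = build_invocation_url(
    "arn:aws:bedrock-agentcore:eu-central-1:241533163649:runtime/dataproc_mcp_ibcv2-1MQhok269k",
    "eu-central-1",
)
print(url)
```

Using `quote` would also cover any other reserved characters, although the ARNs in this notebook contain only `:` and `/`.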


In [16]:
print("Testing deployed MCP server...")
print("=" * 60)
!uv run python3 mcp_client_remote.py

Testing deployed MCP server...
Using AWS region: eu-central-1
Retrieved Agent ARN: arn:aws:bedrock-agentcore:eu-central-1:241533163649:runtime/dataproc_mcp_ibcv2-1MQhok269k
✓ Retrieved bearer token from Secrets Manager

Connecting to: https://bedrock-agentcore.eu-central-1.amazonaws.com/runtimes/arn%3Aaws%3Abedrock-agentcore%3Aeu-central-1%3A241533163649%3Aruntime%2Fdataproc_mcp_ibcv2-1MQhok269k/invocations?qualifier=DEFAULT

🔄 Initializing MCP session...
^C
Error in sys.excepthook:
Traceback (most recent call last):
  File "/Users/dohtem/Downloads/claude/ibc2025-acme-corp-bedrockagentcore-chatbot/aws-mcp-server-agentcore/.venv/lib/python3.10/site-packages/exceptiongroup/_formatting.py", line 71, in exceptiongroup_excepthook
    sys.stderr.write("".join(traceback.format_exception(etype, value, tb)))
KeyboardInterrupt

Original exception was:
Traceback (most recent call last):
  File "/Users/dohtem/Downloads/claude/ibc2025-acme-corp-bedrockagentcore-chatbot/aws-mcp-server-agentcore