- Overview
- Architecture
- Quick Start
- Infrastructure Deployment
- Directory Structure
- How It Works
- MCP Servers
- Available Tools
- Data Flow and Routing
- Security
- Extension Patterns
- Configuration
- Testing
- Code Quality
- Monitoring and Debugging
- Contributing
- License
A multi-agent computer vision framework that combines Strands Agents and Model Context Protocol (MCP) with AWS services to perform image analysis, video understanding, object segmentation, background removal, and semantic image search.
The system uses an orchestrator agent that delegates tasks to specialist agents, each backed by dedicated MCP servers exposing domain-specific tools. This separation of concerns allows each component to scale, evolve, and be tested independently.
Key capabilities:
- Image analysis and description via Amazon Bedrock (Claude, Nova)
- Object detection and label recognition via Amazon Rekognition
- Object segmentation via Meta's Segment Anything Model (SAM)
- Background removal via rembg
- Video analysis via Amazon Nova
- Semantic image search via Amazon OpenSearch Serverless with Titan embeddings
- Streamlit web UI with media upload and conversational interaction
Prerequisites:
- Python 3.11+
- AWS account with access to Bedrock, S3, Rekognition, and (optionally) OpenSearch Serverless
- AWS credentials configured (via environment variables or assumed role)
- A VPC with subnets and security groups (required for OpenSearch Serverless VPC endpoint)

Deploy the infrastructure with CloudFormation:

    aws cloudformation deploy \
      --template-file cfn.yaml \
      --stack-name cv-mcp-server \
      --capabilities CAPABILITY_NAMED_IAM \
      --parameter-overrides \
        CollectionName=collection \
        VPCId=vpc-xxxxxxxx \
        SubnetIds=subnet-aaa,subnet-bbb \
        SecurityGroupIds=sg-xxxxxxxx

Retrieve the generated credentials from Secrets Manager and export them:

    eval "$(aws secretsmanager get-secret-value \
      --secret-id cv-mcp-server-unix-credentials \
      --query SecretString --output text)"

Or create a .env file manually:

    AWS_ACCESS_KEY_ID=your_access_key
    AWS_SECRET_ACCESS_KEY=your_secret_key
    AWS_SESSION_TOKEN=your_session_token
    AWS_REGION=us-east-1
    BUCKET_NAME=your-cv-bucket
    OPENSEARCH_ENDPOINT=your-endpoint.aoss.amazonaws.com

Create a virtual environment and install the dependencies:

    python3.11 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Start the Streamlit application:

    streamlit run application/app.py

The `cfn.yaml` CloudFormation template provisions:
- S3 bucket with encryption at rest (AES-256), versioning, public access block, a bucket policy enforcing HTTPS-only access (`DenyInsecureTransport`), server access logging, and a lifecycle policy that expires uploaded media after 30 days
- S3 access logs bucket with 90-day retention for audit trail
- IAM role with scoped permissions for S3, Bedrock, Rekognition, OpenSearch, and CloudWatch Logs
- IAM user with assume-role capability for local development
- OpenSearch Serverless collection (vector search) with encryption, VPC-restricted network policy, and data access policies
- OpenSearch Serverless VPC endpoint for private network access
- Secrets Manager secrets containing ready-to-use credential export commands for Unix/macOS and Windows
The IAM policies follow least-privilege principles:
- S3 actions scoped to the specific bucket
- Bedrock `InvokeModel` scoped to specific model ARNs in the deployment region only, enforced via `aws:RequestedRegion` condition
- OpenSearch `APIAccessAll` and `UpdateCollection` scoped to the specific collection ARN; only `BatchGetCollection` and `ListCollections` use `Resource: "*"` (required by AWS)
- CloudWatch Logs scoped to the `/cv-mcp-server/*` log group prefix
- Rekognition `DetectLabels` (requires `Resource: "*"` per AWS documentation)

Template parameters:

| Parameter | Required | Description |
|---|---|---|
| `CollectionName` | No | OpenSearch collection name (default: `collection`) |
| `VPCId` | Yes | VPC ID for OpenSearch VPC endpoint |
| `SubnetIds` | Yes | Subnet IDs for OpenSearch VPC endpoint |
| `SecurityGroupIds` | Yes | Security group IDs for OpenSearch VPC endpoint |

├── application/
│ ├── app.py # Streamlit UI with media upload
│ ├── chat.py # Multi-agent orchestration
│ ├── info.py # Model configuration
│ ├── prompts/
│ │ ├── __init__.py
│ │ ├── cv_agent.py # CV specialist prompt
│ │ └── interaction_agent.py # Orchestrator prompt
│ ├── aws_cv_mcp_server/
│ │ ├── __init__.py
│ │ ├── server.py # CV MCP server entry
│ │ ├── cv_tools.py # CV tool implementations
│ │ ├── bedrock_utils.py # Bedrock invocation utils
│ │ ├── connections.py # AWS client management
│ │ ├── models.py # Pydantic response models
│ │ └── scanner.py # Image scanning utils
│ └── image-opensearch-server/
│ ├── src/
│ │ ├── server.py # OpenSearch MCP server
│ │ └── util.py # Client factories
│ ├── test_client.py # MCP client tests
│ ├── test_aoss_connection.py
│ ├── requirements.txt
│ └── README.md
├── tests/
│ ├── __init__.py
│ ├── run_tests.py
│ ├── test_config.py
│ ├── test_cv_tools.py
│ └── test_cv_integration.py
├── assets/
│ ├── test_image.png
│ ├── test_image_pexels.jpg
│ ├── test_video.mp4
│ └── AmazonEmber_Lt.ttf
├── docs/
│ ├── architecture.png
│ └── architecture.xml
├── cfn.yaml # CloudFormation template
├── requirements.txt # Pinned Python dependencies
├── pyproject.toml
├── .pylintrc
├── .python-version
├── .env # Credentials (not committed)
├── .gitignore
├── LICENSE
└── README.md
The framework uses a three-tier agent architecture:
- The Interaction Agent (orchestrator) receives user queries, determines which specialist to invoke, and coordinates multi-step workflows.
- The CV Agent handles S3-based image and video operations: cropping, label detection, description, background removal, SAM segmentation, and video analysis.
- The Image OpenSearch Agent handles URL-based images: generating descriptions, creating multimodal embeddings, ingesting into OpenSearch, and running similarity searches.
Agents are built with Strands Agents and
communicate with tools via MCP. Each agent's behavior
is defined by a system prompt in
application/prompts/.
MCP clients are opened once per request and distributed to specialist agents via closures, avoiding Streamlit session-state threading issues:
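
    from strands import tool

    # cv_agent_impl and image_opensearch_agent_impl are defined elsewhere in chat.py.
    def create_specialist_tools_with_clients(cv_client, opensearch_client):
        """Build specialist tools that close over the per-request MCP clients."""

        @tool
        def cv_agent(query: str) -> str:
            """Delegate S3-based image and video tasks to the CV Agent."""
            return cv_agent_impl(query, cv_client)

        @tool
        def image_opensearch_agent(query: str) -> str:
            """Delegate URL-based image tasks to the Image OpenSearch Agent."""
            return image_opensearch_agent_impl(query, opensearch_client)

        return [cv_agent, image_opensearch_agent]

Background agent threads store image references in a thread-safe global list. After the agent response completes, the main Streamlit thread retrieves them for display. This avoids Streamlit's "missing ScriptRunContext" errors.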
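
The hand-off between background agent threads and the main Streamlit thread can be pictured as a lock-protected list. The following is a minimal sketch of that idea, not the code in `chat.py`; the names `add_image_ref` and `drain_image_refs` are hypothetical.

```python
import threading
from typing import List

# Hypothetical module-level store; the real names in chat.py may differ.
_image_refs: List[str] = []
_image_refs_lock = threading.Lock()


def add_image_ref(ref: str) -> None:
    """Record an image reference; safe to call from a background agent thread."""
    with _image_refs_lock:
        _image_refs.append(ref)


def drain_image_refs() -> List[str]:
    """Return all pending references and clear the store (main thread)."""
    with _image_refs_lock:
        refs = list(_image_refs)
        _image_refs.clear()
    return refs
```

Draining under the same lock means a reference is either returned by this call or left for the next one, so nothing is displayed twice or lost.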

The CV MCP server runs as a subprocess via stdio transport:

    .venv/bin/python -m application.aws_cv_mcp_server.server

Exposes tools for image/video processing backed by S3, Bedrock, and Rekognition. Includes filename sanitization to prevent path traversal (CWE-22), generic error messages to prevent information disclosure (CWE-209), and rate limiting on expensive operations.
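
As an illustration of the filename sanitization mentioned here, the sketch below reduces a user-supplied name to a safe basename before it is used in an S3 key. It is an assumption about how such a check could look, not the project's implementation; `sanitize_filename`, `build_s3_key`, and the character allowlist are hypothetical, while the `mcp/` prefix is the one this README names for uploaded media.

```python
import os
import re

# Conservative allowlist: letters, digits, dot, underscore, hyphen.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(filename: str) -> str:
    """Reduce a user-supplied name to a safe basename."""
    # Normalize Windows separators so basename() also strips "..\" prefixes.
    name = os.path.basename(filename.replace("\\", "/"))
    # Replace every character outside the allowlist.
    name = _UNSAFE_CHARS.sub("_", name)
    # Drop leading dots so ".." or hidden-file names cannot survive.
    name = name.lstrip(".")
    if not name:
        raise ValueError("filename is empty after sanitization")
    return name


def build_s3_key(filename: str, prefix: str = "mcp/") -> str:
    """Build an S3 key that always stays under the given prefix."""
    return prefix + sanitize_filename(filename)
```

With this sketch, `build_s3_key("../../etc/passwd")` yields a key under `mcp/` because only the final path component is kept.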

The Image OpenSearch MCP server runs as a subprocess with its own virtual environment:

    python -m src.server

Exposes tools for image description, embedding generation, and OpenSearch vector search. Includes URL validation to prevent SSRF (CWE-918): it blocks private IPs, loopback, link-local, reserved ranges, and the AWS metadata service. HTTP redirects are disabled. Bulk ingest operations are rate-limited.
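
A URL check of this kind can be sketched with the standard library alone. The code below is a simplified assumption of how such validation might work, not the server's actual code; `is_public_ip` and `validate_url` are hypothetical names.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_ip(address: str) -> bool:
    """Return True only for addresses outside the blocked ranges."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254 (metadata service)
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def validate_url(url: str) -> bool:
    """Accept only http(s) URLs whose host resolves to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    # Every resolved address must be public, not just the first one.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)
```

Resolving first and fetching later leaves a window for DNS rebinding, which is one reason a check like this is usually combined with disabled redirects, as described here.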

CV MCP server tools:

| Tool | Description | AWS Service |
|---|---|---|
| `describe_image` | Analyze image content | Bedrock (Claude) |
| `detect_labels` | Detect objects with bounding boxes | Rekognition |
| `crop_bounding_box` | Extract a region from an image | S3 + Pillow |
| `remove_background` | Remove image background | rembg (ONNX) |
| `segment_anything` | Segment all objects (SAM) | SAM + PyTorch |
| `analyze_video` | Analyze video content | Bedrock (Nova) |


Image OpenSearch MCP server tools:

| Tool | Description | AWS Service |
|---|---|---|
| `generate_image_description` | Describe image from URL | Bedrock (Claude) |
| `generate_multimodal_embedding` | Create vector embedding | Bedrock (Titan) |
| `ingest_image_to_opensearch` | Describe, embed, index | Bedrock + OpenSearch |
| `query_images_by_text` | Search images by text | Bedrock + OpenSearch |
| `query_images_by_image` | Find similar images | Bedrock + OpenSearch |
| `bulk_ingest_images` | Batch ingest images | Bedrock + OpenSearch |

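
For the two query tools, a similarity search against a vector index is typically expressed as an OpenSearch k-NN query. The sketch below only builds the request body; the field name `embedding` and the helper `build_knn_query` are assumptions for illustration, since this README does not show the index mapping.

```python
from typing import Any, Dict, List


def build_knn_query(
    embedding: List[float],
    k: int = 5,
    vector_field: str = "embedding",  # hypothetical field name
) -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for the given query vector."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": embedding,
                    "k": k,
                }
            }
        },
    }
```

The same body would serve a text query and an image query; only the way the query vector is produced differs.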

UI display tools:

| Tool | Description |
|---|---|
| `ui_show_image` | Display a single image with caption |
| `ui_show_images` | Display multiple images in a grid |

The orchestrator routes requests based on input type:
- Input contains `http://` or `https://` → Image OpenSearch Agent
- Input references uploaded/S3 files → CV Agent
- Display requests follow a two-step process: analyze content first, then display with AI-generated caption
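
The first two rules can be written down as a plain function for clarity. In the framework itself the decision is presumably made by the orchestrator's prompt, not by code like this; `choose_specialist` is a hypothetical helper that returns the specialist tool names used in `create_specialist_tools_with_clients()`.

```python
import re

# Matches http:// or https:// followed by at least one non-space character.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def choose_specialist(user_input: str) -> str:
    """Pick a specialist by input type: URLs go to the OpenSearch agent."""
    if URL_PATTERN.search(user_input):
        return "image_opensearch_agent"
    # Everything else is treated as a reference to uploaded/S3 media.
    return "cv_agent"
```

A deterministic pre-check like this could also serve as a cheap guard in tests that verify the orchestrator's routing behavior.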

Example workflows:

    User: "Crop all people from photo.jpg"
      → Orchestrator detects S3 file → CV Agent
      → detect_labels("photo.jpg") → "Person" boxes
      → crop_bounding_box("photo.jpg", bbox)
      → ui_show_images(["cropped_person_abc123.jpg"])

    User: "Index this image: https://example.com/cat.jpg"
      → Orchestrator detects URL → OpenSearch Agent
      → ingest_image_to_opensearch(url, "my-index")
      → Returns: description, embedding, document ID

The codebase includes the following security hardening measures:
- SSRF prevention (CWE-918): all URL-fetching functions validate URLs against private IP ranges, loopback, link-local, reserved addresses, and the AWS metadata endpoint (`169.254.169.254`). HTTP redirects are disabled (`follow_redirects=False`).
- Path traversal prevention (CWE-22): S3 key construction uses `os.path.basename()` and regex sanitization to strip traversal sequences.
- File upload validation (CWE-434): `app.py` checks file magic bytes against expected image/video signatures before accepting uploads.
- Video format allowlist: video file extensions are validated against `ALLOWED_VIDEO_FORMATS` before being sent to the Bedrock API.
- Generic error messages (CWE-209): all tool functions return generic error messages to callers. Detailed errors including S3 paths, bucket names, and stack traces are logged server-side only.
- No credential logging: AWS credentials are never logged, even at debug level.
- Expensive operation throttling: `analyze_video` (5 calls/min), `segment_anything` (3 calls/min), and `bulk_ingest_images` (3 calls/min) are rate-limited via token-bucket limiters to prevent cost abuse.
- Prompt injection protection: system prompts in both agents include explicit rules to reject embedded instructions from user content.
- Tool consent gating: `BYPASS_TOOL_CONSENT` is only enabled when the `ENVIRONMENT` variable is explicitly set to a non-production value. Defaults to production (consent required).
- SAM model hash verification: downloaded SAM model weights are verified against known SHA256 hashes before loading. Files with mismatched hashes are deleted and rejected.
- S3 transport encryption: bucket policy denies all requests where `aws:SecureTransport` is `false`.
- S3 access logging: all bucket access is logged to a dedicated access logs bucket with 90-day retention.
- S3 lifecycle policy: uploaded media under the `mcp/` prefix expires after 30 days; noncurrent versions expire after 7 days.
- OpenSearch VPC restriction: the OpenSearch Serverless collection is accessible only via a VPC endpoint; public access is disabled.
- IAM least privilege: Bedrock access is restricted to the deployment region via `aws:RequestedRegion` condition. CloudWatch Logs access is scoped to `/cv-mcp-server/*`.
- Pinned dependencies: all Python dependencies in `requirements.txt` are pinned to exact versions to prevent supply-chain attacks.
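
The throttling item in this list names token-bucket limiters. A minimal sketch of that technique follows; it is not the project's limiter, and the `TokenBucket` class with its injectable `clock` parameter is a hypothetical design chosen so the behavior can be tested without sleeping.

```python
import threading
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_min` tokens/minute."""

    def __init__(
        self,
        rate_per_min: float,
        capacity: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = float(rate_per_min if capacity is None else capacity)
        self.refill_per_sec = rate_per_min / 60.0
        self.tokens = self.capacity  # start with a full bucket
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when throttled."""
        with self.lock:
            now = self.clock()
            elapsed = now - self.updated
            self.updated = now
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(
                self.capacity, self.tokens + elapsed * self.refill_per_sec
            )
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False
```

Under these assumptions, `TokenBucket(rate_per_min=3)` would admit three `segment_anything` calls in a burst and then roughly one more every 20 seconds.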

To add a new tool to the CV MCP server:

- Implement in `application/aws_cv_mcp_server/cv_tools.py`:

      from typing import Any, Dict

      async def new_tool(param: str) -> Dict[str, Any]:
          """Tool description for agent understanding."""
          result = ...  # replace with the actual implementation
          return {"status": "success", "result": result}

- Register in `application/aws_cv_mcp_server/server.py`:

      from typing import Dict

      from .cv_tools import new_tool

      @mcp.tool(name='new_tool')
      async def mcp_new_tool(param: str) -> Dict:
          """Expose new_tool over MCP."""
          return await new_tool(param)

- Update the CV agent prompt in `application/prompts/cv_agent.py` to reference the new tool.

To add a new specialist agent:

- Create `application/prompts/new_agent.py` with `SYSTEM_PROMPT`
- Implement `new_agent_impl()` in `chat.py` following the `cv_agent_impl()` pattern
- Add to `create_specialist_tools_with_clients()` as a new `@tool`
- Update the orchestrator prompt in `interaction_agent.py` with delegation instructions

Environment variables:

| Variable | Required | Description |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | Yes | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | Yes | AWS secret key |
| `AWS_SESSION_TOKEN` | No | Session token (assumed roles) |
| `AWS_REGION` | Yes | AWS region (default: `us-east-1`) |
| `BUCKET_NAME` | Yes | S3 bucket for media storage |
| `OPENSEARCH_ENDPOINT` | No | OpenSearch Serverless endpoint |
| `ENVIRONMENT` | No | Set to non-production to enable tool consent bypass |

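
One way to read these variables in application code is a small settings object that fails fast when a required value is missing. This is a sketch under assumptions, not the project's configuration code; `Settings` and `load_settings` are hypothetical names, and the AWS credential variables are left out because the AWS SDK reads them from the environment on its own.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    aws_region: str
    bucket_name: str
    opensearch_endpoint: Optional[str]
    environment: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying documented defaults."""
    bucket = env.get("BUCKET_NAME")
    if not bucket:
        raise RuntimeError("BUCKET_NAME must be set")
    return Settings(
        aws_region=env.get("AWS_REGION", "us-east-1"),
        bucket_name=bucket,
        # Optional: an empty value is treated the same as unset.
        opensearch_endpoint=env.get("OPENSEARCH_ENDPOINT") or None,
        # Defaults to production, where tool consent is required.
        environment=env.get("ENVIRONMENT", "production"),
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary.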
Models are configured in application/info.py with
regional availability. The default is Claude 3.7
Sonnet. Available options in the Streamlit sidebar:
- Claude 4 Sonnet (with interleaved thinking)
- Claude 3.7 Sonnet (with extended thinking)
- Claude 3.5 Sonnet
- Claude 3.5 Haiku
Video analysis uses Amazon Nova models (Lite, Pro, Premier).

    # Run all tests
    AWS_REGION=us-east-1 BUCKET_NAME=test-bucket \
      python -m pytest tests/ -v

    # Or use the test runner
    python tests/run_tests.py

Tests mock AWS services for fast, offline execution. Test assets are in `assets/`. The `AWS_REGION` and `BUCKET_NAME` environment variables must be set for test collection to succeed.
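
To illustrate what mocking an AWS service looks like, here is a hedged sketch of an offline test around Rekognition's `detect_labels` call. The helper `person_boxes` is hypothetical and is not taken from `cv_tools.py`; only the request and response shapes follow the Rekognition API.

```python
from unittest.mock import MagicMock


def person_boxes(rekognition_client, bucket: str, key: str) -> list:
    """Hypothetical helper: bounding boxes of all detected 'Person' instances."""
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [
        instance["BoundingBox"]
        for label in response["Labels"]
        if label["Name"] == "Person"
        for instance in label.get("Instances", [])
    ]


def test_person_boxes_uses_mocked_client():
    box = {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.4}
    client = MagicMock()  # stands in for boto3.client("rekognition")
    client.detect_labels.return_value = {
        "Labels": [
            {"Name": "Person", "Instances": [{"BoundingBox": box}]},
            {"Name": "Tree", "Instances": []},
        ]
    }

    assert person_boxes(client, "test-bucket", "photo.jpg") == [box]
    client.detect_labels.assert_called_once_with(
        Image={"S3Object": {"Bucket": "test-bucket", "Name": "photo.jpg"}}
    )
```

Because the client is injected, the test needs neither credentials nor network access.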

Run pylint with the project configuration:

    python -m pylint application/ tests/ --rcfile=.pylintrc

Target: maintain score >= 9.5/10. The project follows PEP 8 with project-specific adjustments defined in `.pylintrc`.

All application modules use Python's `logging` module with structured output to stderr:

    import logging
    import sys

    logging.basicConfig(
        level=logging.INFO,
        format="%(filename)s:%(lineno)d | %(message)s",
        handlers=[logging.StreamHandler(sys.stderr)],
    )

Enable debug-level logging for detailed tracing:

    logging.basicConfig(level=logging.DEBUG)

Common issues:
- MCP connection failures: check virtual environment paths in `chat.py` and verify AWS credentials
- Image display issues: verify S3 bucket name and permissions
- Model errors: confirm Bedrock model access is enabled in your region
- Rate limit errors: expensive operations (`analyze_video`, `segment_anything`, `bulk_ingest_images`) are throttled; wait and retry

When contributing:

- Run tests: `python -m pytest tests/ -v`
- Run linting: `python -m pylint application/ tests/ --rcfile=.pylintrc`
- Follow PEP 8, add docstrings to public functions, use type hints
- Update this README when adding features
- Update agent prompts when changing tool behavior

This project is licensed under MIT-0. See LICENSE for details.
