# Agentic Use Cases with Mistral

This notebook explores agentic use cases and multi-agent workflow patterns using Mistral models on AWS. You'll build a security investigation system that starts with a single agent call and grows into a coordinated multi-agent workflow with specialized roles and a triage step that routes each investigation.

## Overview

This notebook demonstrates how to build intelligent security analysis systems using the **Strands Agents** framework with Mistral's **Pixtral Large** model on Amazon Bedrock.

### What You'll Build:

#### 1. Single-Agent Security Analysis
- Analyze CloudWatch logs for security incidents
- Review AWS SecurityHub compliance findings
- Detect performance anomalies in metrics data
- Correlate events across multiple data sources
- Provide actionable remediation recommendations

#### 2. Multi-Agent Workflow Orchestration
- **Triage Agent**: Routes investigations to appropriate specialists
- **Log Analysis Agent**: Focuses on CloudWatch security events
- **Compliance Agent**: Reviews SecurityHub findings
- **Metrics Agent**: Detects performance anomalies
- **Remediation Agent**: Synthesizes findings into action plans

### Key Capabilities:

- **Tool Functions**: Agents query data through Python functions decorated with `@tool`
- **Sequential Orchestration**: Agents build upon each other's findings
- **Conditional Routing**: Workflow adapts based on investigation type
- **State Management**: Track findings across multiple agent executions
- **Synthesis**: Final agent combines insights from all specialists

This notebook uses tool functions to query dummy data (CloudWatch logs, SecurityHub findings, and time-series metrics) to simulate real-world security analysis workflows.

## Part 1: Single-Agent Security Analysis with Pixtral Large and Strands Agents

This section uses Pixtral Large on Amazon Bedrock with the **Strands Agents** framework to create a single security analysis agent that can investigate security incidents, detect anomalies, and provide actionable insights.

### What You'll Learn in Part 1:
- Creating realistic dummy data for CloudWatch logs and SecurityHub findings
- Setting up Strands Agents with Pixtral Large (`us.mistral.pixtral-large-2502-v1:0`)
- Implementing tool functions with the `@tool` decorator
- Running single-agent security investigations
- Tracking execution time and knowing where to look for Bedrock token usage

### What's Coming in Part 2:
After mastering single-agent analysis, you'll learn to build **multi-agent workflows** with specialized agents, conditional routing, and orchestrated synthesis of findings.

### Setup and Imports

In [None]:
import boto3
import json
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from typing import Dict, List, Any
import warnings
warnings.filterwarnings('ignore')

# Strands Agents imports
from strands import Agent
from strands.models import BedrockModel
from strands.tools import tool

print("✅ Setup complete!")

### Create Dummy CloudWatch Log Data

We'll create realistic CloudWatch log entries with various events including failed authentications, application errors, and security incidents.

In [None]:
# Create dummy CloudWatch log entries
base_time = datetime.now() - timedelta(hours=2)

cloudwatch_logs = [
    {
        "timestamp": (base_time + timedelta(minutes=5)).isoformat(),
        "level": "ERROR",
        "message": "Failed authentication attempt for user: admin",
        "source_ip": "185.220.101.47",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "request_id": "req-001",
        "service": "auth-service",
        "error_code": "AUTH_FAILED",
        "attempt_count": 3
    },
    {
        "timestamp": (base_time + timedelta(minutes=6)).isoformat(),
        "level": "ERROR",
        "message": "Failed authentication attempt for user: admin",
        "source_ip": "185.220.101.47",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "request_id": "req-002",
        "service": "auth-service",
        "error_code": "AUTH_FAILED",
        "attempt_count": 4
    },
    {
        "timestamp": (base_time + timedelta(minutes=7)).isoformat(),
        "level": "CRITICAL",
        "message": "Account locked due to multiple failed authentication attempts",
        "source_ip": "185.220.101.47",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "request_id": "req-003",
        "service": "auth-service",
        "error_code": "ACCOUNT_LOCKED",
        "username": "admin"
    },
    {
        "timestamp": (base_time + timedelta(minutes=15)).isoformat(),
        "level": "ERROR",
        "message": "Database connection timeout",
        "source_ip": "10.0.1.45",
        "user_agent": "Internal-Service/1.0",
        "request_id": "req-004",
        "service": "api-gateway",
        "error_code": "DB_TIMEOUT",
        "latency_ms": 30000
    },
    {
        "timestamp": (base_time + timedelta(minutes=18)).isoformat(),
        "level": "WARNING",
        "message": "API rate limit exceeded",
        "source_ip": "203.0.113.42",
        "user_agent": "python-requests/2.28.0",
        "request_id": "req-005",
        "service": "api-gateway",
        "error_code": "RATE_LIMIT_EXCEEDED",
        "requests_count": 1502
    },
    {
        "timestamp": (base_time + timedelta(minutes=25)).isoformat(),
        "level": "CRITICAL",
        "message": "SQL injection attempt detected",
        "source_ip": "198.51.100.89",
        "user_agent": "curl/7.68.0",
        "request_id": "req-006",
        "service": "api-gateway",
        "error_code": "SECURITY_VIOLATION",
        "payload": "SELECT * FROM users WHERE id='1' OR '1'='1'--"
    },
    {
        "timestamp": (base_time + timedelta(minutes=32)).isoformat(),
        "level": "ERROR",
        "message": "S3 access denied",
        "source_ip": "10.0.2.67",
        "user_agent": "aws-sdk-python/1.26.0",
        "request_id": "req-007",
        "service": "data-processor",
        "error_code": "S3_ACCESS_DENIED",
        "bucket": "sensitive-data-bucket"
    },
    {
        "timestamp": (base_time + timedelta(minutes=45)).isoformat(),
        "level": "ERROR",
        "message": "Failed authentication attempt for user: root",
        "source_ip": "185.220.101.47",
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "request_id": "req-008",
        "service": "auth-service",
        "error_code": "AUTH_FAILED",
        "attempt_count": 1
    },
    {
        "timestamp": (base_time + timedelta(minutes=50)).isoformat(),
        "level": "WARNING",
        "message": "Unusual spike in 404 errors",
        "source_ip": "192.0.2.156",
        "user_agent": "Mozilla/5.0 (compatible; SomeBot/1.0)",
        "request_id": "req-009",
        "service": "web-frontend",
        "error_code": "NOT_FOUND",
        "error_count": 237
    },
    {
        "timestamp": (base_time + timedelta(minutes=60)).isoformat(),
        "level": "CRITICAL",
        "message": "Cryptocurrency mining script detected in uploaded file",
        "source_ip": "198.51.100.123",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "request_id": "req-010",
        "service": "file-upload-service",
        "error_code": "MALWARE_DETECTED",
        "file_hash": "a5f3d8b2e9c1a7f4b6e8d2c9a1f5b3e7"
    }
]

# Convert to DataFrame for easy viewing
df_logs = pd.DataFrame(cloudwatch_logs)
print(f"📊 Created {len(df_logs)} CloudWatch log entries\n")
print(df_logs[['timestamp', 'level', 'service', 'error_code', 'source_ip']].to_string(index=False))
print(f"\n✅ Dummy CloudWatch logs created successfully!")
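
Before any agent is involved, a quick count per source IP already shows which addresses deserve attention. The sketch below is illustrative only: `sample_logs` is a hypothetical stand-in that mirrors a few of the entries above, not the notebook's full `cloudwatch_logs` list.

```python
from collections import Counter

# Hypothetical stand-in mirroring a few of the log entries above
sample_logs = [
    {"level": "ERROR", "source_ip": "185.220.101.47", "service": "auth-service"},
    {"level": "ERROR", "source_ip": "185.220.101.47", "service": "auth-service"},
    {"level": "CRITICAL", "source_ip": "185.220.101.47", "service": "auth-service"},
    {"level": "WARNING", "source_ip": "203.0.113.42", "service": "api-gateway"},
    {"level": "CRITICAL", "source_ip": "198.51.100.89", "service": "api-gateway"},
]

# Count only ERROR and CRITICAL events per source IP
serious_levels = {"ERROR", "CRITICAL"}
events_per_ip = Counter(
    log["source_ip"] for log in sample_logs if log["level"] in serious_levels
)

# Most active IPs first
for ip, count in events_per_ip.most_common():
    print(f"{ip}: {count} serious event(s)")
```

The same pattern applies to the full list; an IP with several serious events is a natural first argument for an IP analysis tool.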

### Create Dummy SecurityHub Findings

Create realistic AWS SecurityHub findings with various severity levels and security issues.

In [None]:
# Create dummy SecurityHub findings
security_hub_findings = [
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/IAM.1/finding/001",
        "title": "IAM.1 - IAM policies should not allow full '*:*' administrative privileges",
        "severity": "CRITICAL",
        "status": "ACTIVE",
        "resource_type": "AwsIamRole",
        "resource_id": "arn:aws:iam::123456789012:role/PowerUserRole",
        "description": "The IAM role PowerUserRole has a policy that grants full administrative privileges",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=5)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=10)).isoformat(),
        "recommendation": "Apply least privilege principles and restrict permissions"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/S3.1/finding/002",
        "title": "S3.1 - S3 Block Public Access setting should be enabled",
        "severity": "HIGH",
        "status": "ACTIVE",
        "resource_type": "AwsS3Bucket",
        "resource_id": "arn:aws:s3:::sensitive-data-bucket",
        "description": "S3 bucket sensitive-data-bucket does not have Block Public Access enabled",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=3)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=15)).isoformat(),
        "recommendation": "Enable S3 Block Public Access at the bucket level"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/EC2.2/finding/003",
        "title": "EC2.2 - VPC default security group should not allow inbound and outbound traffic",
        "severity": "HIGH",
        "status": "ACTIVE",
        "resource_type": "AwsEc2SecurityGroup",
        "resource_id": "sg-0abc123def456",
        "description": "Default security group allows unrestricted inbound traffic on port 22",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=7)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=20)).isoformat(),
        "recommendation": "Remove all inbound and outbound rules from the default security group"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/RDS.1/finding/004",
        "title": "RDS.1 - RDS snapshots should be private",
        "severity": "CRITICAL",
        "status": "ACTIVE",
        "resource_type": "AwsRdsDbSnapshot",
        "resource_id": "arn:aws:rds:us-west-2:123456789012:snapshot:prod-db-snapshot-001",
        "description": "RDS snapshot prod-db-snapshot-001 is publicly accessible",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(hours=12)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=25)).isoformat(),
        "recommendation": "Make the RDS snapshot private immediately"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/cis-aws-foundations-benchmark/v/1.4.0/2.1.5/finding/005",
        "title": "CIS 2.1.5 - Ensure that S3 buckets have server-side encryption enabled",
        "severity": "MEDIUM",
        "status": "ACTIVE",
        "resource_type": "AwsS3Bucket",
        "resource_id": "arn:aws:s3:::backup-bucket-prod",
        "description": "S3 bucket backup-bucket-prod does not have default encryption enabled",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=2)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=30)).isoformat(),
        "recommendation": "Enable default server-side encryption for the S3 bucket"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/CloudTrail.1/finding/006",
        "title": "CloudTrail.1 - CloudTrail should be enabled and configured with at least one multi-Region trail",
        "severity": "HIGH",
        "status": "RESOLVED",
        "resource_type": "AwsAccount",
        "resource_id": "arn:aws:iam::123456789012:root",
        "description": "CloudTrail is not enabled in this account",
        "compliance_status": "PASSED",
        "first_observed": (base_time - timedelta(days=10)).isoformat(),
        "last_observed": (base_time - timedelta(days=1)).isoformat(),
        "recommendation": "CloudTrail has been enabled - no action needed"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/Lambda.1/finding/007",
        "title": "Lambda.1 - Lambda functions should prohibit public access",
        "severity": "CRITICAL",
        "status": "ACTIVE",
        "resource_type": "AwsLambdaFunction",
        "resource_id": "arn:aws:lambda:us-west-2:123456789012:function:DataProcessingFunction",
        "description": "Lambda function DataProcessingFunction allows public invocation",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(hours=6)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=40)).isoformat(),
        "recommendation": "Remove the public access policy from the Lambda function"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/GuardDuty.1/finding/008",
        "title": "GuardDuty.1 - GuardDuty should be enabled",
        "severity": "HIGH",
        "status": "ACTIVE",
        "resource_type": "AwsAccount",
        "resource_id": "arn:aws:iam::123456789012:root",
        "description": "GuardDuty is not enabled in us-west-2 region",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=30)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=45)).isoformat(),
        "recommendation": "Enable AWS GuardDuty for threat detection"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/KMS.1/finding/009",
        "title": "KMS.1 - IAM customer managed policies should not allow decryption actions on all KMS keys",
        "severity": "MEDIUM",
        "status": "ACTIVE",
        "resource_type": "AwsIamPolicy",
        "resource_id": "arn:aws:iam::123456789012:policy/DeveloperPolicy",
        "description": "IAM policy allows kms:Decrypt on all KMS keys",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=15)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=50)).isoformat(),
        "recommendation": "Restrict KMS decrypt permissions to specific keys"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/EC2.8/finding/010",
        "title": "EC2.8 - EC2 instances should use IMDSv2",
        "severity": "HIGH",
        "status": "ACTIVE",
        "resource_type": "AwsEc2Instance",
        "resource_id": "i-0abc123def456789",
        "description": "EC2 instance is using IMDSv1 which is vulnerable to SSRF attacks",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=1)).isoformat(),
        "last_observed": (base_time + timedelta(minutes=55)).isoformat(),
        "recommendation": "Configure instance to require IMDSv2"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/SecretsManager.1/finding/011",
        "title": "SecretsManager.1 - Secrets Manager secrets should have automatic rotation enabled",
        "severity": "MEDIUM",
        "status": "ACTIVE",
        "resource_type": "AwsSecretsManagerSecret",
        "resource_id": "arn:aws:secretsmanager:us-west-2:123456789012:secret:prod/database/password",
        "description": "Secret prod/database/password does not have automatic rotation enabled",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=20)).isoformat(),
        "last_observed": (base_time + timedelta(hours=1)).isoformat(),
        "recommendation": "Enable automatic rotation for this secret"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/aws-foundational-security-best-practices/v/1.0.0/ELB.1/finding/012",
        "title": "ELB.1 - Application Load Balancer should be configured to redirect HTTP to HTTPS",
        "severity": "MEDIUM",
        "status": "ACTIVE",
        "resource_type": "AwsElbv2LoadBalancer",
        "resource_id": "arn:aws:elasticloadbalancing:us-west-2:123456789012:loadbalancer/app/prod-alb/abc123",
        "description": "Load balancer accepts HTTP traffic without redirecting to HTTPS",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=4)).isoformat(),
        "last_observed": (base_time + timedelta(hours=1, minutes=5)).isoformat(),
        "recommendation": "Configure HTTP to HTTPS redirect rule"
    },
    {
        "finding_id": "arn:aws:securityhub:us-west-2:123456789012:subscription/pci-dss/v/3.2.1/PCI.IAM.7/finding/013",
        "title": "PCI.IAM.7 - IAM user credentials unused for 90 days should be disabled",
        "severity": "LOW",
        "status": "ACTIVE",
        "resource_type": "AwsIamUser",
        "resource_id": "arn:aws:iam::123456789012:user/john.doe",
        "description": "IAM user john.doe has not been used for 127 days",
        "compliance_status": "FAILED",
        "first_observed": (base_time - timedelta(days=37)).isoformat(),
        "last_observed": (base_time + timedelta(hours=1, minutes=10)).isoformat(),
        "recommendation": "Disable or remove inactive IAM user"
    }
]

# Convert to DataFrame
df_findings = pd.DataFrame(security_hub_findings)
print(f"🔒 Created {len(df_findings)} SecurityHub findings\n")
print(df_findings[['title', 'severity', 'status', 'resource_type']].to_string(index=False))

# Show severity distribution
print(f"\n📊 Severity Distribution:")
print(df_findings['severity'].value_counts().to_string())
print(f"\n✅ Dummy SecurityHub findings created successfully!")
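
Note that `value_counts()` orders by frequency, not urgency. When findings need to be worked through in priority order, an explicit severity ranking helps. The following is a minimal sketch: `SEVERITY_RANK` and `sample_findings` are assumptions introduced for illustration, not part of the notebook's data.

```python
# Hypothetical ranking: a lower number means more urgent
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

# Hypothetical stand-in for a few of the findings above
sample_findings = [
    {"title": "S3.1", "severity": "HIGH", "status": "ACTIVE"},
    {"title": "PCI.IAM.7", "severity": "LOW", "status": "ACTIVE"},
    {"title": "IAM.1", "severity": "CRITICAL", "status": "ACTIVE"},
    {"title": "CloudTrail.1", "severity": "HIGH", "status": "RESOLVED"},
    {"title": "KMS.1", "severity": "MEDIUM", "status": "ACTIVE"},
]

# Drop resolved findings, then sort the rest from most to least urgent
active = [f for f in sample_findings if f["status"] == "ACTIVE"]
prioritized = sorted(active, key=lambda f: SEVERITY_RANK[f["severity"]])

for finding in prioritized:
    print(f"{finding['severity']:<8} {finding['title']}")
```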

### Create Time-Series Metrics Data

Generate realistic time-series data for error rates and latency metrics showing normal patterns and anomalies.

In [None]:
# Generate time-series metrics data (last 2 hours, 5-minute intervals)
# '5min' replaces the deprecated '5T' frequency alias in recent pandas versions
time_points = pd.date_range(end=datetime.now(), periods=24, freq='5min')

# Error rate (requests per minute) - baseline ~5, anomaly spike to 45
np.random.seed(42)
error_rates = []
for i, t in enumerate(time_points):
    if i < 10:  # Normal period
        error_rates.append(max(0, np.random.normal(5, 1.5)))
    elif i < 13:  # Anomaly spike (corresponds to attack time)
        error_rates.append(max(0, np.random.normal(45, 8)))
    else:  # Recovery period
        error_rates.append(max(0, np.random.normal(7, 2)))

# Latency (milliseconds) - baseline ~150ms, anomaly spike to ~2500ms
latencies = []
for i, t in enumerate(time_points):
    if i < 10:  # Normal period
        latencies.append(max(50, np.random.normal(150, 30)))
    elif i < 13:  # Anomaly spike
        latencies.append(max(500, np.random.normal(2500, 400)))
    elif i < 16:  # Recovery period
        latencies.append(max(200, np.random.normal(800, 150)))
    else:  # Back to normal
        latencies.append(max(50, np.random.normal(180, 35)))

# Create DataFrame
df_metrics = pd.DataFrame({
    'timestamp': time_points,
    'error_rate': error_rates,
    'latency_ms': latencies,
    'service': 'api-gateway'
})

print(f"📈 Created time-series metrics data\n")
print(df_metrics.to_string(index=False))

# Show statistics
print(f"\n📊 Metrics Summary:")
print(f"Error Rate - Mean: {df_metrics['error_rate'].mean():.2f}, Max: {df_metrics['error_rate'].max():.2f}")
print(f"Latency - Mean: {df_metrics['latency_ms'].mean():.2f}ms, Max: {df_metrics['latency_ms'].max():.2f}ms")
print(f"\n✅ Time-series metrics created successfully!")
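
The generation loops above follow one pattern: a sequence of segments, each with its own mean and spread. One way to express that more compactly is a segment table, sketched below. This is an illustration under assumptions: `piecewise_normal` is a hypothetical helper, and it uses the standard library's `random` module instead of NumPy, so its values will not match the ones printed above.

```python
import random

def piecewise_normal(segments, floor=0.0, seed=42):
    """Generate values segment by segment.

    segments: list of (count, mean, std) tuples, produced in order.
    floor: lower bound applied to every value.
    """
    rng = random.Random(seed)  # local generator keeps the output reproducible
    values = []
    for count, mean, std in segments:
        values.extend(max(floor, rng.gauss(mean, std)) for _ in range(count))
    return values

# Same shape as the error-rate series: baseline, spike, recovery (24 points)
SEGMENTS = [(10, 5, 1.5), (3, 45, 8), (11, 7, 2)]
error_rate_sketch = piecewise_normal(SEGMENTS)

print(f"points: {len(error_rate_sketch)}")
print(f"spike mean: {sum(error_rate_sketch[10:13]) / 3:.1f}")
```

Changing the anomaly window then means editing one tuple rather than the loop's index conditions.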

### Implement Tool Functions

Create tool functions that the agent can use to query our dummy data.

In [None]:
# Tool function definitions with @tool decorator

@tool
def query_cloudwatch_logs(severity: str | None = None, service: str | None = None, source_ip: str | None = None, limit: int = 10) -> str:
    """
    Query CloudWatch logs with optional filters.
    
    Args:
        severity: Filter by log level (ERROR, WARNING, CRITICAL)
        service: Filter by service name
        source_ip: Filter by source IP address
        limit: Maximum number of results to return
    
    Returns:
        JSON string containing matching log entries
    """
    filtered_logs = cloudwatch_logs.copy()
    
    if severity:
        filtered_logs = [log for log in filtered_logs if log['level'] == severity.upper()]
    if service:
        filtered_logs = [log for log in filtered_logs if log['service'] == service]
    if source_ip:
        filtered_logs = [log for log in filtered_logs if log['source_ip'] == source_ip]
    
    result = filtered_logs[:limit]
    return json.dumps(result, indent=2)


@tool
def query_security_hub_findings(severity: str | None = None, status: str = "ACTIVE", resource_type: str | None = None, limit: int = 10) -> str:
    """
    Query SecurityHub findings with optional filters.
    
    Args:
        severity: Filter by severity (CRITICAL, HIGH, MEDIUM, LOW)
        status: Filter by status (ACTIVE, RESOLVED)
        resource_type: Filter by AWS resource type
        limit: Maximum number of results to return
    
    Returns:
        JSON string containing matching findings
    """
    filtered_findings = security_hub_findings.copy()
    
    if severity:
        filtered_findings = [f for f in filtered_findings if f['severity'] == severity.upper()]
    if status:
        filtered_findings = [f for f in filtered_findings if f['status'] == status.upper()]
    if resource_type:
        filtered_findings = [f for f in filtered_findings if f['resource_type'] == resource_type]
    
    result = filtered_findings[:limit]
    return json.dumps(result, indent=2)


@tool
def get_metrics_for_timerange(start_time: str | None = None, end_time: str | None = None, service: str = "api-gateway") -> str:
    """
    Get CloudWatch metrics for a specific time range.
    
    Args:
        start_time: Start time in ISO format (optional)
        end_time: End time in ISO format (optional)
        service: Service name to query metrics for
    
    Returns:
        JSON string containing time-series metrics data
    """
    filtered_metrics = df_metrics[df_metrics['service'] == service].copy()
    
    if start_time:
        filtered_metrics = filtered_metrics[filtered_metrics['timestamp'] >= pd.to_datetime(start_time)]
    if end_time:
        filtered_metrics = filtered_metrics[filtered_metrics['timestamp'] <= pd.to_datetime(end_time)]
    
    # Convert to dict for JSON serialization
    result = filtered_metrics.to_dict(orient='records')
    # Convert timestamps to strings
    for record in result:
        record['timestamp'] = record['timestamp'].isoformat()
    
    return json.dumps(result, indent=2)


@tool
def analyze_ip_address(ip_address: str) -> str:
    """
    Analyze an IP address to determine if it's suspicious and provide context.
    
    Args:
        ip_address: IP address to analyze
    
    Returns:
        JSON string with IP analysis results
    """
    # Collect all log entries for this IP, then count them
    ip_logs = [log for log in cloudwatch_logs if log['source_ip'] == ip_address]
    occurrences = len(ip_logs)
    
    # Determine if suspicious (simplified heuristic)
    error_count = sum(1 for log in ip_logs if log['level'] in ['ERROR', 'CRITICAL'])
    is_suspicious = error_count >= 2 or occurrences >= 3
    
    # Check if it's a known malicious IP range (simplified check)
    is_tor_exit = ip_address.startswith('185.220.')
    
    result = {
        "ip_address": ip_address,
        "total_requests": occurrences,
        "error_count": error_count,
        "is_suspicious": is_suspicious,
        "is_known_malicious": is_tor_exit,
        "risk_level": "HIGH" if is_suspicious and is_tor_exit else "MEDIUM" if is_suspicious else "LOW",
        "services_accessed": list(set(log['service'] for log in ip_logs)),
        "recent_activity": ip_logs[-3:] if ip_logs else []
    }
    
    return json.dumps(result, indent=2)


# List the tool functions available to the agent
print("🔧 Tool Functions Implemented:\n")
print("1. query_cloudwatch_logs(severity, service, source_ip, limit)")
print("2. query_security_hub_findings(severity, status, resource_type, limit)")
print("3. get_metrics_for_timerange(start_time, end_time, service)")
print("4. analyze_ip_address(ip_address)")
print("\n✅ All tool functions ready!")
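
The two query tools share the same filter-then-truncate logic. If more data sources are added, that logic could live in one place. The helper below is a plain-Python sketch of the idea; `filter_records` is a hypothetical name, and it deliberately leaves out the `@tool` decorator and the `.upper()` normalization the tools apply.

```python
import json

def filter_records(records, limit=10, **criteria):
    """Return records whose fields equal every non-None criterion, capped at `limit`."""
    # Ignore filters the caller left unset
    active = {key: value for key, value in criteria.items() if value is not None}
    matches = [
        record for record in records
        if all(record.get(key) == value for key, value in active.items())
    ]
    return matches[:limit]

# Hypothetical sample records
sample = [
    {"level": "ERROR", "service": "auth-service"},
    {"level": "CRITICAL", "service": "auth-service"},
    {"level": "ERROR", "service": "api-gateway"},
]

result = filter_records(sample, level="ERROR", service=None)
print(json.dumps(result, indent=2))
```

Each tool would then reduce to a thin wrapper that normalizes its arguments, calls the helper, and serializes the result with `json.dumps`.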

### Example: Query Metrics Tool Directly

Before setting up agents, let's see how the metrics tool works by calling it directly to detect anomalies.

In [None]:
# Test the metrics tool directly
print("Testing get_metrics_for_timerange tool...\n")

# Get all metrics for api-gateway
metrics_json = get_metrics_for_timerange(service="api-gateway")
metrics_data = json.loads(metrics_json)

print(f"Retrieved {len(metrics_data)} data points\n")

# Find the anomaly spike
print("Looking for anomalies...\n")
for point in metrics_data:
    if point['error_rate'] > 40 or point['latency_ms'] > 2000:
        print(f"🚨 ANOMALY DETECTED:")
        print(f"   Timestamp: {point['timestamp']}")
        print(f"   Error Rate: {point['error_rate']:.2f} req/min (baseline ~5)")
        print(f"   Latency: {point['latency_ms']:.2f}ms (baseline ~150ms)")
        print()

print("✅ The metrics show clear anomaly spikes that agents can detect!")
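
The fixed thresholds above work because the baseline is known in advance. When it is not, a threshold derived from the data itself is an alternative. The sketch below uses the median and the median absolute deviation (MAD), a different technique from the notebook's fixed cutoffs; `find_anomalies`, the cutoff of 5, and the sample values are assumptions chosen for illustration.

```python
from statistics import median

def find_anomalies(values, threshold=5.0):
    """Return indices whose distance from the median exceeds threshold * MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread in the data, nothing to compare against
    return [i for i, v in enumerate(values) if abs(v - center) / mad > threshold]

# Hypothetical error rates: a stable baseline with a two-point spike
error_rates = [5.1, 4.8, 5.6, 4.9, 5.3, 44.0, 47.5, 5.0, 5.2, 4.7]
anomalies = find_anomalies(error_rates)

print(anomalies)  # [5, 6]
```

Because the median and MAD are barely affected by the spike itself, the baseline estimate stays close to 5 even though two of the ten points are extreme.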

### Set Up Bedrock Model with Pixtral Large

Configure the Pixtral Large model for use with Strands Agents.

In [None]:
# Create BedrockModel instance with Pixtral Large
model_id = "us.mistral.pixtral-large-2502-v1:0"

bedrock_model = BedrockModel(
    model_id=model_id,
    streaming=False
)

# Bedrock runtime client for direct API calls, e.g. when inspecting usage for cost estimates
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

print(f"✅ BedrockModel configured with {model_id}")
print(f"📋 Tools available for agent: query_cloudwatch_logs, query_security_hub_findings, get_metrics_for_timerange, analyze_ip_address")

### Create Security Agent Function

Now we'll create the main agent execution function that:
- Initializes a Strands Agent with our Pixtral Large model
- Configures the agent with our security analysis tool functions
- Executes security investigations and tracks execution time
- Returns comprehensive analysis with remediation recommendations

The agent will automatically:
1. Understand the user's security query
2. Call the appropriate tool functions to gather data
3. Analyze the results across multiple data sources
4. Provide actionable security insights

In [None]:
# Agent execution function with Strands Agents and execution-time tracking
def run_security_agent(user_query: str):
    """
    Run the security analysis agent with Pixtral Large using Strands Agents.
    
    Args:
        user_query: The investigation query from the user
    
    Returns:
        Dict with the agent response and the execution time in seconds
    """
    # Define system prompt
    system_prompt = """You are a security analyst AI assistant with access to CloudWatch logs, SecurityHub findings, and metrics data.

Your role is to:
1. Investigate security incidents by analyzing logs and findings
2. Identify patterns and anomalies in the data
3. Correlate events across different data sources
4. Provide clear, actionable security recommendations
5. Explain your reasoning step-by-step

When investigating:
- Start by querying relevant logs and findings
- Look for suspicious IPs, repeated failures, and security violations
- Check metrics for anomalies during incident timeframes
- Analyze the severity and impact of findings
- Provide a comprehensive summary with remediation steps

Be thorough but concise. Use the available tools to gather evidence before drawing conclusions."""
    
    # Create tools list
    tools = [
        query_cloudwatch_logs,
        query_security_hub_findings,
        get_metrics_for_timerange,
        analyze_ip_address
    ]
    
    # Create agent
    agent = Agent(
        model=bedrock_model,
        tools=tools,
        system_prompt=system_prompt
    )
    
    print(f"🤖 Starting investigation with Pixtral Large...")
    print(f"📝 Query: {user_query}\n")
    
    # Track wall-clock time for the run
    import time
    start_time = time.time()
    
    # Run agent
    response = agent(user_query)
    
    end_time = time.time()
    
    print(f"\n{'='*80}")
    print(f"💬 Agent Response:")
    print(f"{'='*80}")
    print(response)
    
    # Note: this function reports wall-clock time only, not token usage or cost.
    # Recent Strands versions attach usage metrics to the returned result
    # (e.g. response.metrics.accumulated_usage); verify this against your installed version.
    
    print(f"\n{'='*80}")
    print(f"⏱️  Execution Time: {end_time - start_time:.2f}s")
    print(f"💰 Note: For detailed token usage and costs, check CloudWatch Logs or Bedrock metrics")
    print(f"{'='*80}")
    
    return {
        "response": response,
        "execution_time": end_time - start_time
    }

print("✅ Agent execution function ready!")
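
Once token counts are available from wherever you record usage (for example Bedrock invocation metrics), turning them into a cost estimate is simple arithmetic. The sketch below is illustrative: `estimate_cost` is a hypothetical helper, and both the prices and the token counts are placeholders, not actual Bedrock pricing or measured usage.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate the cost of a run from token counts and per-1,000-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost

# Placeholder numbers only; look up current pricing for your model and region
cost = estimate_cost(
    input_tokens=12_000,
    output_tokens=1_500,
    input_price_per_1k=0.002,
    output_price_per_1k=0.006,
)
print(f"Estimated cost: ${cost:.4f}")
```

Agent runs that call several tools send the growing conversation back to the model on each turn, so input tokens usually dominate the total.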

## Part 2: Multi-Agent Security Investigation Workflow

Rather than sending every query to one general-purpose agent, this part builds a **multi-agent workflow** in which a triage agent routes each investigation to specialized agents and a remediation agent synthesizes their findings. The patterns shown here (routing, shared state, and synthesis) carry over to larger agentic systems.

### Why Multi-Agent Workflows?

**Single-Agent Limitations:**
- One agent tries to do everything
- Generic prompts lead to unfocused analysis
- No specialization or deep expertise
- Difficult to manage complexity

**Multi-Agent Benefits:**
- **Specialization**: Each agent focuses on a specific domain (logs, compliance, metrics)
- **Modularity**: Easy to add, remove, or modify individual agents
- **Scalability**: Distribute work across multiple focused agents
- **Quality**: Specialized prompts and tools yield better results

### Workflow Capabilities

This workflow demonstrates:

1. **Intelligent Routing** - Triage agent classifies investigation type and determines which specialists to invoke
2. **Sequential Execution** - Each agent builds upon previous findings in a coordinated manner
3. **Conditional Logic** - Workflow adapts based on investigation type (log_analysis, compliance_review, metrics_analysis, full_audit)
4. **State Management** - Tracks all findings across the workflow for comprehensive synthesis
5. **Synthesis** - Final remediation agent combines insights from all specialists into prioritized action plans

### Workflow Architecture

Our security workflow consists of 5 specialized agents:
- **Triage Agent**: Analyzes the query and determines which specialized agents to invoke
- **Log Analysis Agent**: Examines CloudWatch logs for security events (tools: `query_cloudwatch_logs`, `analyze_ip_address`)
- **Compliance Agent**: Reviews SecurityHub findings and compliance violations (tool: `query_security_hub_findings`)
- **Metrics Agent**: Detects anomalies in performance metrics (tool: `get_metrics_for_timerange`)
- **Remediation Agent**: Synthesizes all findings and provides prioritized action plans
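
The conditional routing described above can be pictured as a lookup from investigation type to an ordered list of agents. The following is a minimal sketch, not the notebook's actual orchestration code: the `ROUTES` table, the agent labels, and the fallback to a full audit for unrecognized types are all assumptions.

```python
# Hypothetical routing table: investigation type -> specialists to run, in order
ROUTES = {
    "log_analysis": ["log_analysis"],
    "compliance_review": ["compliance"],
    "metrics_analysis": ["metrics"],
    "full_audit": ["log_analysis", "compliance", "metrics"],
}

def plan_agents(investigation_type):
    """Return the ordered agent sequence for an investigation type."""
    # Unknown types fall back to the full audit so nothing is silently skipped
    specialists = ROUTES.get(investigation_type, ROUTES["full_audit"])
    # Triage always runs first and remediation always runs last
    return ["triage", *specialists, "remediation"]

print(plan_agents("compliance_review"))
print(plan_agents("full_audit"))
```

Keeping the routing in a table makes it easy to add a specialist or a new investigation type without touching the control flow.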

### Workflow State Management

Define the state object that tracks the investigation workflow.

In [None]:
# Workflow State Management
from typing import List, Dict, Any
from dataclasses import dataclass, field

@dataclass
class SecurityWorkflowState:
    """State object that tracks the security investigation workflow"""
    query: str
    investigation_type: str = ""  # e.g. log_analysis, compliance_review, metrics_analysis, full_audit
    
    # Agent results
    log_findings: List[Dict] = field(default_factory=list)
    compliance_findings: List[Dict] = field(default_factory=list)
    metrics_analysis: Dict = field(default_factory=dict)
    threat_intelligence: Dict = field(default_factory=dict)
    
    # Final output
    remediation_plan: str = ""
    priority_actions: List[str] = field(default_factory=list)
    
    # Workflow tracking
    agents_invoked: List[str] = field(default_factory=list)
    execution_time: float = 0.0

print("✅ Workflow state management ready!")
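
Because the list and dict fields use `field(default_factory=...)`, every state instance gets its own containers, so findings from one investigation do not leak into another. The sketch below illustrates this with a trimmed-down stand-in class (`DemoState` is hypothetical and mirrors only three of the fields above) and shows `dataclasses.asdict` as one possible way to serialize a state for logging.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class DemoState:
    """Trimmed-down stand-in for SecurityWorkflowState (illustration only)."""
    query: str
    log_findings: List[Dict] = field(default_factory=list)
    agents_invoked: List[str] = field(default_factory=list)

first = DemoState(query="Check failed logins")
second = DemoState(query="Review compliance findings")

first.agents_invoked.append("triage")
first.log_findings.append({"analysis": "example finding"})

# Each instance owns its own lists, so `second` is unaffected.
print(second.agents_invoked)  # → []

# asdict() converts the dataclass, including nested containers, to plain dicts.
print(json.dumps(asdict(first), indent=2))
```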

### Specialized Security Agents

Create specialized agents for different aspects of security analysis.

In [None]:
# Specialized Security Agents

def create_triage_agent():
    """Agent that determines investigation type and routing"""
    system_prompt = """You are a security triage specialist. Analyze the user's query and determine:
    1. Investigation type (log_analysis, compliance_review, metrics_analysis, threat_investigation, full_audit)
    2. Which specialized agents should be invoked
    3. The order of agent execution
    
    Respond with a brief classification and reasoning."""
    
    tools = []  # Triage agent doesn't need tools, just classification
    agent = Agent(model=bedrock_model, tools=tools, system_prompt=system_prompt)
    return agent

def create_log_analysis_agent():
    """Agent specialized in CloudWatch log analysis"""
    system_prompt = """You are a log analysis specialist. Your role:
    1. Query CloudWatch logs for security events
    2. Identify suspicious patterns (failed auth, SQL injection, malware, etc.)
    3. Extract relevant IPs and services
    4. Provide severity assessment
    
    Be thorough and identify ALL security-relevant events."""
    
    tools = [query_cloudwatch_logs, analyze_ip_address]
    agent = Agent(model=bedrock_model, tools=tools, system_prompt=system_prompt)
    return agent

def create_compliance_agent():
    """Agent specialized in SecurityHub compliance review"""
    system_prompt = """You are a compliance and security findings specialist. Your role:
    1. Query SecurityHub findings
    2. Prioritize by severity and impact
    3. Identify compliance violations
    4. Assess risk levels
    
    Focus on CRITICAL and HIGH severity findings first."""
    
    tools = [query_security_hub_findings]
    agent = Agent(model=bedrock_model, tools=tools, system_prompt=system_prompt)
    return agent

def create_metrics_agent():
    """Agent specialized in metrics anomaly detection"""
    system_prompt = """You are a metrics analysis specialist. Your role:
    1. Analyze time-series metrics for anomalies
    2. Identify performance degradation patterns
    3. Correlate metrics spikes with security events
    4. Detect potential DDoS or resource exhaustion attacks
    
    Look for statistically significant deviations from baseline."""
    
    tools = [get_metrics_for_timerange]
    agent = Agent(model=bedrock_model, tools=tools, system_prompt=system_prompt)
    return agent

def create_remediation_agent():
    """Agent that synthesizes findings and creates action plans"""
    system_prompt = """You are a security remediation specialist. Your role:
    1. Synthesize findings from all agents
    2. Create a prioritized remediation plan
    3. Provide specific, actionable steps
    4. Estimate impact and urgency
    
    Structure your output as:
    - Executive Summary
    - Priority 1 Actions (immediate)
    - Priority 2 Actions (short-term)
    - Priority 3 Actions (long-term)
    - Recommended monitoring"""
    
    tools = []  # Remediation agent synthesizes, doesn't need tools
    agent = Agent(model=bedrock_model, tools=tools, system_prompt=system_prompt)
    return agent

print("✅ Specialized security agents created!")

### Workflow Orchestration

The orchestrator manages the sequential execution of specialized agents based on the investigation type.

In [None]:
# Security Investigation Workflow Orchestrator

def run_security_workflow(user_query: str) -> SecurityWorkflowState:
    """
    Orchestrates a multi-agent security investigation workflow.
    
    Workflow Steps:
    1. Triage - Classify the investigation type
    2. Routing - Determine which agents to invoke
    3. Specialized Analysis - Run appropriate agents in sequence
    4. Synthesis - Combine findings into actionable insights
    
    Args:
        user_query: The security investigation request
        
    Returns:
        SecurityWorkflowState with complete investigation results
    """
    import time
    start_time = time.time()
    
    # Initialize state
    state = SecurityWorkflowState(query=user_query)
    
    print("="*80)
    print("🔍 SECURITY INVESTIGATION WORKFLOW")
    print("="*80)
    print(f"Query: {user_query}\n")
    
    # STEP 1: Triage
    print("\n📋 STEP 1: Triage Analysis")
    print("-"*80)
    triage_agent = create_triage_agent()
    triage_response = triage_agent(user_query)
    state.agents_invoked.append("triage")
    
    # Show abbreviated triage response
    triage_str = str(triage_response)
    if len(triage_str) > 200:
        print(f"Triage Result: {triage_str[:200]}... [truncated]")
    else:
        print(f"Triage Result: {triage_str}")
    
    # Determine investigation type from triage response
    triage_lower = triage_str.lower()
    if "full" in triage_lower or "comprehensive" in triage_lower or "audit" in triage_lower:
        state.investigation_type = "full_audit"
    elif "compliance" in triage_lower or "securityhub" in triage_lower or "findings" in triage_lower:
        state.investigation_type = "compliance_review"
    elif "metrics" in triage_lower or "performance" in triage_lower or "anomaly" in triage_lower:
        state.investigation_type = "metrics_analysis"
    elif "log" in triage_lower or "authentication" in triage_lower or "attack" in triage_lower:
        state.investigation_type = "log_analysis"
    else:
        state.investigation_type = "full_audit"  # Default to comprehensive
    
    print(f"\n🎯 Investigation Type: {state.investigation_type}")
    
    # STEPS 2-4: Execute specialized agents based on investigation type
    if state.investigation_type in ["log_analysis", "full_audit"]:
        print("\n\n🔎 STEP 2: Log Analysis")
        print("-"*80)
        log_agent = create_log_analysis_agent()
        log_query = "Analyze all CloudWatch logs for security events, suspicious IPs, and attack patterns."
        log_response = log_agent(log_query)
        state.agents_invoked.append("log_analysis")
        
        # Print a completion notice; the full analysis is stored in state
        log_str = str(log_response)
        print(f"✓ Log analysis completed ({len(log_str)} chars)")
        state.log_findings.append({"analysis": log_str})
    
    if state.investigation_type in ["compliance_review", "full_audit"]:
        print("\n\n🛡️  STEP 3: Compliance Review")
        print("-"*80)
        compliance_agent = create_compliance_agent()
        compliance_query = "Review all CRITICAL and HIGH severity SecurityHub findings and assess compliance risks."
        compliance_response = compliance_agent(compliance_query)
        state.agents_invoked.append("compliance")
        
        # Print a completion notice; the full analysis is stored in state
        compliance_str = str(compliance_response)
        print(f"✓ Compliance review completed ({len(compliance_str)} chars)")
        state.compliance_findings.append({"analysis": compliance_str})
    
    if state.investigation_type in ["metrics_analysis", "full_audit"]:
        print("\n\n📊 STEP 4: Metrics Analysis")
        print("-"*80)
        metrics_agent = create_metrics_agent()
        metrics_query = "Analyze metrics for api-gateway service to detect anomalies and correlate with security events."
        metrics_response = metrics_agent(metrics_query)
        state.agents_invoked.append("metrics")
        
        # Print a completion notice; the full analysis is stored in state
        metrics_str = str(metrics_response)
        print(f"✓ Metrics analysis completed ({len(metrics_str)} chars)")
        state.metrics_analysis = {"analysis": metrics_str}
    
    # STEP 5: Remediation & Synthesis
    print("\n\n💡 STEP 5: Remediation Planning")
    print("-"*80)
    remediation_agent = create_remediation_agent()
    
    # Build context from all agent findings
    synthesis_context = f"""Based on the following security investigation findings:

USER QUERY: {user_query}

LOG ANALYSIS:
{state.log_findings if state.log_findings else "Not performed"}

COMPLIANCE REVIEW:
{state.compliance_findings if state.compliance_findings else "Not performed"}

METRICS ANALYSIS:
{state.metrics_analysis if state.metrics_analysis else "Not performed"}

Provide a comprehensive remediation plan with prioritized actions."""
    
    remediation_response = remediation_agent(synthesis_context)
    state.agents_invoked.append("remediation")
    state.remediation_plan = str(remediation_response)
    
    # Calculate execution time
    state.execution_time = time.time() - start_time
    
    # Final output - formatted nicely
    print("\n\n" + "="*80)
    print("📝 FINAL REMEDIATION PLAN")
    print("="*80)
    
    # Format the remediation plan for better readability
    plan_lines = state.remediation_plan.split('\n')
    for line in plan_lines:
        if line.strip():
            # Add extra spacing for section headers
            if any(keyword in line for keyword in ['Executive Summary', 'Priority 1', 'Priority 2', 'Priority 3', 'Recommended']):
                print(f"\n{line}")
            else:
                print(line)
    
    print("\n" + "="*80)
    print("📊 WORKFLOW SUMMARY")
    print("="*80)
    print(f"Investigation Type: {state.investigation_type}")
    print(f"Agents Invoked: {', '.join(state.agents_invoked)}")
    print(f"Total Execution Time: {state.execution_time:.2f}s")
    print("="*80)
    
    return state

print("✅ Workflow orchestrator ready!")
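
The orchestrator classifies the triage reply with substring checks, so a reply containing "successfully" matches `"full"` and one containing "logic" matches `"log"`. One possible hardening, sketched below, is to look for the exact investigation-type labels as whole tokens and fall back to a default otherwise. `parse_investigation_type` and `KNOWN_TYPES` are hypothetical names, not part of the workflow above, and the approach assumes the triage agent mentions one of the labels from its system prompt.

```python
import re

# Labels the orchestrator routes on.
KNOWN_TYPES = ("full_audit", "compliance_review", "metrics_analysis", "log_analysis")

def parse_investigation_type(triage_text: str, default: str = "full_audit") -> str:
    """Return the known label mentioned earliest in the text, or `default`."""
    text = triage_text.lower()
    matches = []
    for label in KNOWN_TYPES:
        # Accept "log_analysis", "log analysis" and "log-analysis" as the same label.
        pattern = r"\b" + label.replace("_", r"[_\s-]") + r"\b"
        found = re.search(pattern, text)
        if found:
            matches.append((found.start(), label))
    # Prefer the label that appears first; otherwise use the default.
    return min(matches)[1] if matches else default

print(parse_investigation_type("Type: log_analysis. Classified successfully."))
# → log_analysis
```

Asking the triage agent to put the label alone on the first line of its reply would make this parsing simpler still, at the cost of a stricter prompt.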

#### Test 1: Triage Agent

See how the triage agent classifies different types of security queries.

In [None]:
# Test Triage Agent
print("Testing Triage Agent...\n")
triage_agent = create_triage_agent()

# Test query
test_query = "Check for failed authentication attempts in our logs"
print(f"Query: {test_query}\n")

response = triage_agent(test_query)
print(f"\nTriage Classification:\n{response}")

#### Test 2: Log Analysis Agent

See how the log analysis agent uses tools to investigate CloudWatch logs.

In [None]:
# Test Log Analysis Agent
print("Testing Log Analysis Agent...\n")
log_agent = create_log_analysis_agent()

# Test query
test_query = "Find all CRITICAL security events"
print(f"Query: {test_query}\n")

response = log_agent(test_query)
print(f"\nLog Analysis Result:\n{response}")

#### Test 3: Compliance Agent

See how the compliance agent reviews SecurityHub findings.

In [None]:
# Test Compliance Agent
print("Testing Compliance Agent...\n")
compliance_agent = create_compliance_agent()

# Test query
test_query = "Show me all CRITICAL severity findings"
print(f"Query: {test_query}\n")

response = compliance_agent(test_query)
print(f"\nCompliance Review Result:\n{response}")

#### Test 4: Metrics Analysis Agent

See how the metrics agent detects anomalies in time-series data.

In [None]:
# Test Metrics Agent
print("Testing Metrics Analysis Agent...\n")
metrics_agent = create_metrics_agent()

# Test query
test_query = "Analyze the api-gateway metrics for anomalies in the last 2 hours"
print(f"Query: {test_query}\n")

response = metrics_agent(test_query)
print(f"\nMetrics Analysis Result:\n{response}")
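
The metrics agent's prompt asks for "statistically significant deviations from baseline". One common way to make that phrase concrete is a z-score check against a baseline window, sketched below. This is an illustration of the idea only; it is not what the agent or the `get_metrics_for_timerange` tool does internally, and `zscore_anomalies` plus the sample latency values are made up for the example.

```python
from statistics import mean, stdev
from typing import List

def zscore_anomalies(values: List[float], baseline_size: int, threshold: float = 3.0) -> List[int]:
    """Return indices of points more than `threshold` standard deviations
    away from the mean of the first `baseline_size` values."""
    baseline = values[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline window
    if sigma == 0:
        # A perfectly flat baseline: any value that differs from it is flagged.
        return [i for i, v in enumerate(values) if v != mu]
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Made-up latency samples (ms): six baseline points, then a spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 103]
print(zscore_anomalies(latency_ms, baseline_size=6))
# → [6]
```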

### Full Workflow Example: Comprehensive Security Audit

Now let's run the complete workflow that orchestrates all agents together.

In [None]:
# Workflow Example 1: Comprehensive Security Audit
workflow_query_1 = """Perform a comprehensive security audit of our infrastructure. 
Analyze logs for attacks, review all compliance findings, check for performance anomalies, 
and provide a prioritized remediation plan."""

workflow_result_1 = run_security_workflow(workflow_query_1)
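
The returned state can be inspected after the run, for example to log which agents ran and how long the investigation took. The sketch below assumes a state object with the fields defined earlier. `summarize_state` is a hypothetical helper, and a `SimpleNamespace` with placeholder values stands in for `workflow_result_1` so the snippet runs without calling Bedrock.

```python
from types import SimpleNamespace

def summarize_state(state) -> dict:
    """Collect the headline facts from a finished workflow state."""
    return {
        "investigation_type": state.investigation_type,
        "agents_invoked": list(state.agents_invoked),
        "execution_time_s": round(state.execution_time, 2),
        "plan_chars": len(state.remediation_plan),
    }

# Placeholder values only; a real run would pass workflow_result_1 instead.
fake_result = SimpleNamespace(
    investigation_type="full_audit",
    agents_invoked=["triage", "log_analysis", "compliance", "metrics", "remediation"],
    execution_time=42.318,
    remediation_plan="Executive Summary\n(placeholder text)",
)
print(summarize_state(fake_result))
```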