# Genomics Pipeline Supervisor Agent
In this notebook we create a Supervisor Agent that orchestrates 4 specialized genomics agents using the Strands framework:
1. **Data Discovery Agent** - Finds genomics data files (FASTQ pairs) for samples
2. **QC Agent** - Analyzes FASTQC quality control reports
3. **Workflow Orchestrator Agent** - Manages HealthOmics variant calling workflows
4. **Interpretation and Reporting Agent** - Provides cancer variant analysis with CIViC database integration

#### Install required dependencies

In [None]:
from IPython.display import HTML

HTML("""
<style>
/* Make all JupyterLab output areas scrollable */
.jp-OutputArea-output {
    max-height: 300px;
    overflow-y: auto;
}
</style>
""")

In [None]:
!pip install pandas --upgrade --quiet
!pip install -r requirements.txt --quiet
!pip install uv --quiet

#### Import required libraries

We will also import the functions that create each of the four specialized agents defined in [`data_discovery_agent.py`](data_discovery_agent.py), [`qc_agent.py`](qc_agent.py), [`workflow_orchestrator_agent.py`](workflow_orchestrator_agent.py), and [`interpretation_and_reporting_agent.py`](interpretation_and_reporting_agent.py).

In [None]:
import boto3
import logging
from strands import Agent, tool
from strands.models import BedrockModel

# Import agent creation functions
from data_discovery_agent import create_data_discovery_agent
from qc_agent import create_qc_agent
from workflow_orchestrator_agent import create_healthomics_agent
from interpretation_and_reporting_agent import create_cancer_analysis_agent

# Import MCP client helper
from mcp_clients import setup_mcp_clients

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

## Agent Creation
In this section we create the supervisor agent and wrap the four specialized agents as tools.

### Agents as Tools with Strands Agents

"Agents as Tools" is an architectural pattern in AI systems where specialized AI agents are wrapped as callable functions (tools) that can be used by other agents. This creates a hierarchical structure where:

1. A primary "supervisor" agent handles user interaction and determines which specialized agent to call
2. Specialized "sub-agents" perform domain-specific tasks when called by the supervisor

This approach mimics human team dynamics, where a manager coordinates specialists, each bringing unique expertise to solve complex problems.
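
To make the hierarchy concrete, here is a minimal sketch of the pattern in plain Python. It does not use the Strands API: `ToySupervisor`, `qc_specialist`, and `discovery_specialist` are hypothetical stand-ins, and the keyword matching only imitates the routing decision that a real supervisor delegates to its language model.

```python
from typing import Callable, Dict

# Stand-in "specialized agents": plain callables that take a query and return text.
def qc_specialist(query: str) -> str:
    return f"[qc] analyzed: {query}"

def discovery_specialist(query: str) -> str:
    return f"[discovery] searched: {query}"

class ToySupervisor:
    """Routes a query to one specialist by keyword (a real supervisor lets the model choose)."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools

    def __call__(self, query: str) -> str:
        # Pick the first specialist whose keyword appears in the query.
        for keyword, tool in self.tools.items():
            if keyword in query.lower():
                return tool(query)
        return "No specialist matched the query."

toy_supervisor = ToySupervisor({"quality": qc_specialist, "find": discovery_specialist})
print(toy_supervisor("Find the FASTQ files for patient P001"))
```

The caller only ever talks to the supervisor; the specialists stay hidden behind a uniform "query in, text out" interface, which is the same shape the tool wrappers in this notebook use.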

### Initialize MCP Tools
First, we'll initialize the HealthOmics MCP tools that will be shared across agents. This is more efficient than each agent starting its own instance of the MCP server, and it is safe in this case because the MCP server is stateless.
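
Because the shared clients have to stay open across several notebook cells, they cannot live inside a single `with` block. One possible way to keep track of them so they can be released together later is `contextlib.ExitStack` from the standard library. The sketch below assumes only that the clients are context managers; `FakeMCPClient` is a hypothetical stand-in, not the real MCP client class.

```python
from contextlib import ExitStack

class FakeMCPClient:
    """Hypothetical stand-in for an MCP client that is used as a context manager."""

    def __init__(self, name: str):
        self.name = name
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.active = False
        return False  # do not suppress exceptions

# Open both clients and remember them on one stack.
stack = ExitStack()
client_a = stack.enter_context(FakeMCPClient("healthomics"))
client_b = stack.enter_context(FakeMCPClient("aws-api"))
print(client_a.active, client_b.active)  # both clients are open here

# Later, for example in a final notebook cell, release both clients at once.
stack.close()
print(client_a.active, client_b.active)  # both clients are closed here
```

`ExitStack.close()` calls each `__exit__` in reverse order of entry, so a single call in the last cell cleans up every client that was opened.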

In [None]:
# Initialize MCP clients and get tools
print("Initializing MCP clients...")
healthomics_client, aws_api_client = setup_mcp_clients()

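# Enter the context managers manually so the clients stay active across notebook cells.
# Call __exit__(None, None, None) on each client when you are finished with the notebook.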
healthomics_client.__enter__()
aws_api_client.__enter__()

# Get tools from the clients
healthomics_tools = healthomics_client.list_tools_sync()
aws_tools = aws_api_client.list_tools_sync()

print(f"✓ Loaded {len(healthomics_tools)} HealthOmics tools")
print(f"✓ Loaded {len(aws_tools)} AWS API tools")

### Create Specialized Agents
Now we'll instantiate each of the four specialized agents, connecting them to their MCP servers as needed. The agents are defined in:
- [`interpretation_and_reporting_agent.py`](interpretation_and_reporting_agent.py)
- [`data_discovery_agent.py`](data_discovery_agent.py)
- [`qc_agent.py`](qc_agent.py)
- [`workflow_orchestrator_agent.py`](workflow_orchestrator_agent.py)

Each module provides its own `create_*` function that instantiates its agent. All four agents are built with the Strands SDK and follow patterns we have already shown. Feel free to take a look at their definitions.

In [None]:
# Create the 4 specialized agents with appropriate tools
print("Creating specialized agents...")

data_discovery_agent = create_data_discovery_agent(healthomics_tools + aws_tools)
print("✓ Data Discovery Agent created")

qc_agent = create_qc_agent(aws_tools)
print("✓ QC Agent created")

workflow_orchestrator_agent = create_healthomics_agent(healthomics_tools + aws_tools)
print("✓ Workflow Orchestrator Agent created")

interpretation_agent = create_cancer_analysis_agent(aws_tools)
print("✓ Interpretation and Reporting Agent created")

print("\nAll specialized agents ready!")

### Wrap Agents as Tools
Now we'll wrap each specialized agent as a tool that the supervisor can call. The supervisor autonomously decides which tool to use based on the current conversation, each tool's description (its docstring), and the supervisor's system prompt.
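
To see why the docstring matters, it helps to look at what metadata a plain Python function exposes. The sketch below uses the standard `inspect` module to collect a name, description, and parameter types. It is only an illustration of the idea: `describe_tool` and `example_qc_tool` are hypothetical, and this is not a description of how the Strands `@tool` decorator is implemented.

```python
import inspect
from typing import Callable, Dict

def describe_tool(func: Callable) -> Dict[str, object]:
    """Build a minimal, hypothetical tool description from a function's metadata."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            # Use the annotation's class name when available, else its string form.
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }

def example_qc_tool(query: str) -> str:
    """Analyze a FASTQC report. Use when data quality must be assessed."""
    return f"analyzed: {query}"

spec = describe_tool(example_qc_tool)
print(spec["name"])
print(spec["description"])
print(spec["parameters"])
```

A vague or missing docstring would leave the `description` field empty, which is why each wrapper in this notebook states both what the agent does and when to use it.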

In [None]:
# 1: Data Discovery Agent Tool
@tool
def data_discovery_tool(query: str) -> str:
    """
    Find genomics data files (FASTQ pairs) for a given sample ID using AWS HealthOmics.
    Use this agent when you need to locate sequencing data files.

    Args:
        query: Request to find genomics data files (e.g., "Find FASTQ or BAM files for sample ABC123").

    Returns:
        Information about the located data files including S3 URIs
    """
    try:
        response = data_discovery_agent(query)
        return str(response)
    except Exception as e:
        return f"Error in data discovery: {str(e)}"

# 2: QC Agent Tool
@tool
def quality_control_tool(query: str) -> str:
    """
    Analyze FASTQC quality control reports and provide recommendations for genomics data preprocessing.
    Use this agent when you need to assess data quality or get QC recommendations.

    Args:
        query: Request to analyze quality control data (e.g., "Analyze QC report at s3://bucket/qc.zip"). Ensure the query contains an S3 URI.

    Returns:
        Quality control analysis results and recommendations
    """
    try:
        response = qc_agent(query)
        return str(response)
    except Exception as e:
        return f"Error in quality control: {str(e)}"

# 3: Workflow Orchestrator Agent Tool
@tool
def workflow_orchestrator_tool(query: str) -> str:
    """
    Orchestrate AWS HealthOmics workflows. Identify which workflows to run, start workflows, monitor their status,
    and retrieve results. Use this agent when you need to run genomics analysis pipelines.

    Args:
        query: Request to find, create or manage workflows (e.g., "Find workflow to call variants for sample xyz").

    Returns:
        Workflow execution status and results
    """
    try:
        response = workflow_orchestrator_agent(query)
        return str(response)
    except Exception as e:
        return f"Error in workflow orchestration: {str(e)}"

# 4: Interpretation and Reporting Agent Tool
@tool
def interpretation_reporting_tool(query: str) -> str:
    """
    Provide comprehensive cancer variant analysis with CIViC database integration.
    Offers evidence-based clinical interpretations and therapeutic recommendations.
    Use this agent when you need to interpret variant calling results.

    Args:
        query: Request for variant interpretation (e.g., "Analyze variants in MAF file at s3://bucket/variants.maf")

    Returns:
        Clinical interpretation and therapeutic recommendations
    """
    try:
        response = interpretation_agent(query)
        return str(response)
    except Exception as e:
        return f"Error in interpretation: {str(e)}"

print("Agent tools created successfully!")
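
The four wrappers above share the same body: call the agent, convert the response to a string, and turn any exception into an error message. If that repetition becomes a maintenance burden, one option is a small helper like the sketch below. `make_agent_caller`, `echo_agent`, and `failing_agent` are hypothetical names introduced for illustration and are not part of the notebook or the Strands SDK.

```python
from typing import Callable

def make_agent_caller(agent: Callable[[str], object], label: str) -> Callable[[str], str]:
    """Return a function that calls `agent` and converts failures into error strings."""
    def call_agent(query: str) -> str:
        try:
            return str(agent(query))
        except Exception as exc:  # report the failure to the caller instead of raising
            return f"Error in {label}: {exc}"
    return call_agent

# Hypothetical stand-in agents used only to exercise the helper.
def echo_agent(query: str) -> str:
    return f"echo: {query}"

def failing_agent(query: str) -> str:
    raise RuntimeError("service unavailable")

safe_echo = make_agent_caller(echo_agent, "echo")
safe_fail = make_agent_caller(failing_agent, "data discovery")
print(safe_echo("Find FASTQ files"))
print(safe_fail("Find FASTQ files"))
```

Each `@tool` function would still keep its own docstring, since the supervisor relies on it to choose a tool; only the body would delegate to the helper. Returning the error as text lets the supervisor read the failure and decide what to do next instead of aborting the whole conversation.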

### Create the Supervisor Agent
Finally, we'll create the supervisor agent that coordinates all specialized agents.

In [None]:
# Define supervisor agent configuration
supervisor_instruction = """You are a Genomics Pipeline Supervisor Agent that coordinates specialized agents to help users with genomics analysis workflows.

You have access to 4 specialized agents:
1. **Data Discovery Agent** - Finds genomics data files (FASTQ pairs) for samples
2. **Quality Control Agent** - Analyzes FASTQC quality control reports and provides recommendations
3. **Workflow Orchestrator Agent** - Finds or creates AWS HealthOmics workflows, appropriate IAM roles, and reference data to run the appropriate workflows
4. **Interpretation and Reporting Agent** - Provides cancer variant analysis with CIViC database integration

Your role is to:
- Understand user queries and determine which agent(s) to engage
- Route questions to the appropriate specialized agent
- Coordinate multi-step workflows that require multiple agents
- Synthesize responses from multiple agents when needed
- Provide clear, consolidated answers to the user

Common workflows:
- **Data Discovery → QC → Workflow → Interpretation**: Find data, check quality, run the variant calling workflow if quality is good, then interpret the MAF file and generate a report
- **Workflow → Interpretation**: Run variant calling, then interpret results
- **Single Agent**: Route simple queries to the appropriate specialist

When responding:
1. Briefly acknowledge the user's request
2. Explain which agent(s) you're engaging and why
3. Present the results from each agent clearly
4. Provide a concise summary with actionable next steps

Be helpful, clear, and efficient in coordinating the genomics analysis pipeline.
"""

# Define the model for supervisor
supervisor_model = BedrockModel(
    model_id="global.anthropic.claude-sonnet-4-20250514-v1:0",
    temperature=0.1,
    max_tokens=4096,
    cache_tools="default"
)

# Create the supervisor agent
supervisor = Agent(
    name="Genomics Pipeline Supervisor",
    model=supervisor_model,
    system_prompt=supervisor_instruction,
    tools=[
        data_discovery_tool,
        quality_control_tool,
        workflow_orchestrator_tool,
        interpretation_reporting_tool
    ]
)

print("✓ Supervisor Agent created successfully!")
print("\nThe supervisor is ready to coordinate genomics analysis workflows.")
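
The first common workflow in the system prompt is a chain with a quality gate: discovery, then QC, then the workflow and interpretation steps only if QC passes. In this notebook the supervisor's model decides that ordering at run time. The sketch below is a deterministic stand-in for the same control flow, using hypothetical stub functions in place of the agent tools, and is meant only to make the intended sequence explicit.

```python
from typing import Callable, Dict, List

def run_pipeline(sample_id: str,
                 steps: Dict[str, Callable[[str], str]],
                 quality_ok: Callable[[str], bool]) -> List[str]:
    """Run discovery and QC, then continue to workflow and interpretation only if QC passes."""
    log = []
    data = steps["discovery"](sample_id)
    log.append(data)
    qc_report = steps["qc"](data)
    log.append(qc_report)
    if not quality_ok(qc_report):
        log.append("Stopped: quality check failed")
        return log
    variants = steps["workflow"](data)
    log.append(variants)
    log.append(steps["interpretation"](variants))
    return log

# Hypothetical stub steps; each real step would be one of the agent tools.
stub_steps = {
    "discovery": lambda s: f"fastq for {s}",
    "qc": lambda d: f"QC PASS ({d})",
    "workflow": lambda d: f"maf from {d}",
    "interpretation": lambda v: f"report on {v}",
}

for line in run_pipeline("P001", stub_steps, quality_ok=lambda report: "PASS" in report):
    print(line)
```

The trade-off is flexibility: a hard-coded chain is predictable, while the supervisor can skip, repeat, or reorder steps depending on the user's request.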

## Test the Supervisor Agent
Now let's test the supervisor with different types of queries. In the agent responses you will see it select various tools to help it formulate its response.

### Sample Questions
Here are some example questions you can ask the supervisor agent.

In [None]:
session = boto3.Session()
region = session.region_name
print(f"The current AWS region is: {region}")

# Sample question bank

# Data Discovery Questions
data_discovery_q1 = "Find the FASTQ files for patient P001"
data_discovery_q2 = "Which reference genomes are available to me?"

# Quality Control Questions
qc_q1 = "Analyze the quality control report for the sample of patient P001"
qc_q2 = "Is the data quality good enough to proceed with variant calling?"

# Workflow Questions
workflow_q1 = "Start a variant calling workflow for the FASTQ files I just found"
workflow_q2 = "Check the status of workflow run 1234567"
workflow_q3 = "List all my recent workflow runs"

# Interpretation Questions
interpretation_q1 = f"Analyze the variants in the MAF file at s3://aws-genomics-static-{region}/omics-data/tumor-normal/maf/test_civic.mafs"
interpretation_q2 = "Generate a clinical report for the variant analysis"

# Multi-step Questions
multi_step_q1 = "Find data for sample XYZ, check its quality, and if good, start variant calling"
multi_step_q2 = "Run variant calling on sample ABC and then interpret the results"

# Choose a question to test
test_query = data_discovery_q1  # Change this to test different questions

print(f"Testing supervisor with: {test_query}")
print("=" * 80)

### Execute Query
Run the supervisor agent with your chosen query.

In [None]:
try:
    # Run the supervisor agent
    response = supervisor(test_query)
    
    print("\n" + "=" * 80)
    print("SUPERVISOR RESPONSE:")
    print("=" * 80)
    print(response)
    
except Exception as e:
    print(f"Error during execution: {e}")
    import traceback
    traceback.print_exc()

### Interactive Mode
Use the cells below to have an interactive conversation with the supervisor.
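
The interactive cells in this section repeat the same steps: print the query, print a separator, call the supervisor, and print the reply. A small helper such as the sketch below could replace that boilerplate. `ask` and `fake_supervisor` are hypothetical names; the stand-in exists only so the example runs without a model, and in the notebook you would pass `supervisor` instead.

```python
from typing import Callable

def ask(agent: Callable[[str], object], query: str, width: int = 80) -> str:
    """Print the query, call the agent, then print and return its reply as text."""
    print(f"User: {query}")
    print("\n" + "=" * width)
    reply = str(agent(query))
    print(f"\nSupervisor: {reply}")
    return reply

# Hypothetical stand-in so the sketch runs without a model.
def fake_supervisor(query: str) -> str:
    return f"received {len(query.split())} words"

answer = ask(fake_supervisor, "Find the FASTQ files for patient P001")
```

Returning the reply as a string also makes it easy to keep a transcript, for example by appending each `answer` to a list.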

In [None]:
# Interactive query - modify and run multiple times
user_query = """
Generate a workflow that says hello to the user. Check the syntax
of the workflow and make any required fixes or improvements.
Finally, show me the workflow definition."""  # Change this to ask different questions

print(f"User: {user_query}")
print("\n" + "=" * 80)

response = supervisor(user_query)
print(f"\nSupervisor: {response}")

In [None]:
# Interactive query - modify and run multiple times
user_query = "How can I optimize the genomics-ai-workshop-mutect2 workflow?"  # Change this to ask different questions

print(f"User: {user_query}")
print("\n" + "=" * 80)

response = supervisor(user_query)
print(f"\nSupervisor: {response}")

In [None]:
# Interactive query - modify and run multiple times
user_query = "I want to run a mutect2 workflow on patient P001 BAM files, then interpret the generated MAF to look for therapeutic implications"  # Change this to ask different questions

print(f"User: {user_query}")
print("\n" + "=" * 80)

response = supervisor(user_query)
print(f"\nSupervisor: {response}")

In [None]:
# Interactive query - modify and run multiple times
user_query = "Find the FASTQ files for patient P001"  # Change this to ask different questions

print(f"User: {user_query}")

print("\n" + "=" * 80)

response = supervisor(user_query)
print(f"\nSupervisor: {response}")