## Create Genomics Store Supervisor Agent

**Updated Version - With Quality-Based Filtering**

In this notebook we create the Genomics Store Supervisor Agent, which queries **genomicsvariantstore** and **genomicsannotationstore** directly through AWS HealthOmics APIs and Athena. This version applies quality filtering to every variant analysis and runs real HealthOmics store queries with dynamic SQL generation.

### Key Features:
- Direct queries to genomicsvariantstore and genomicsannotationstore
- Dynamic SQL query generation using AI
- Proper VEP and ClinVar annotation handling
- **Quality-based filtering**: All variant analyses automatically apply:
  - `v.qual > 30 AND contains(v.filters, 'PASS')` for standard quality
  - `v.qual > 50 AND contains(v.filters, 'PASS')` for high quality
- No DynamoDB pipeline tracking dependency
- Real genomic variant analysis functions
- No LIMIT clauses - complete result sets
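
The two quality tiers above can be sketched as a small helper that returns the matching SQL predicate. This is only an illustration: the function name, the tier names `standard` and `high`, and the table name in the usage line are assumptions, not part of the notebook's modules.

```python
# Hypothetical helper: maps a quality tier to the WHERE-clause fragment
# listed under "Quality-based filtering". Tier names are assumptions.
QUALITY_THRESHOLDS = {"standard": 30, "high": 50}


def quality_filter(tier: str = "standard", alias: str = "v") -> str:
    """Return the SQL predicate for the requested quality tier."""
    if tier not in QUALITY_THRESHOLDS:
        raise ValueError(f"unknown quality tier: {tier!r}")
    if not alias.isidentifier():
        # Guard against injecting arbitrary SQL through the table alias.
        raise ValueError(f"invalid table alias: {alias!r}")
    threshold = QUALITY_THRESHOLDS[tier]
    return f"{alias}.qual > {threshold} AND contains({alias}.filters, 'PASS')"


# Usage: the table name here is a placeholder, not the real store table.
example_sql = f"SELECT COUNT(*) FROM variant_table v WHERE {quality_filter('high')}"
print(example_sql)
```

Keeping the predicate in one place means every generated query applies the same thresholds.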

#### Install required packages for genomics store analysis

In [None]:
!pip install --upgrade boto3
!pip install pandas numpy

#### Import required libraries and initialize AWS clients

In [None]:
import boto3
import json
import time
from datetime import datetime
import pandas as pd
import numpy as np

# Import genomics store modules
# (assumes the notebook runs from the directory that contains tools/)
from tools.genomics_store_interpreters import (
    genomics_store_agent_name,
    genomics_store_agent_description,
    genomics_store_agent_instruction,
    genomics_store_agent_tools,
    model
)

from tools.genomics_store_functions import (
    REGION,
    ACCOUNT_ID,
    VARIANT_STORE_NAME,
    ANNOTATION_STORE_NAME,
    LAKE_FORMATION_DATABASE,
    get_stores_information,
    get_available_samples_from_variant_store,
    execute_dynamic_query
)

# Initialize AWS clients
session = boto3.Session()
region = session.region_name or 'us-east-1'
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()['Account']

print(f"AWS Account: {account_id}")
print(f"Region: {region}")
print(f"Variant Store: {VARIANT_STORE_NAME}")
print(f"Annotation Store: {ANNOTATION_STORE_NAME}")
print(f"Database: {LAKE_FORMATION_DATABASE}")
print("\n✅ No DynamoDB dependency - using direct store queries")
print("✅ Quality-based filtering enabled")
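
The implementation of `execute_dynamic_query` lives in `genomics_store_functions` and is not shown in this notebook. As a rough sketch of what a direct store query through Athena might involve, the function below starts a query, polls until it finishes, and pages through the full result set. The function name and its parameters are assumptions; the client is passed in so the sketch can be exercised without AWS credentials.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, timeout_seconds=300.0):
    """Run `sql` through an Athena client and return rows as a list of dicts."""
    start = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query reaches a terminal state or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {query_id} still {state}")
        time.sleep(poll_seconds)

    # Follow NextToken so the complete result set is returned.
    header, rows, token = None, [], None
    while True:
        kwargs = {"QueryExecutionId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if header is None:
                header = values  # first row of a SELECT result holds column names
            else:
                rows.append(dict(zip(header, values)))
        token = page.get("NextToken")
        if not token:
            return rows
```

With a real client this might be called as `run_athena_query(boto3.client('athena'), sql, LAKE_FORMATION_DATABASE, 's3://<your-results-bucket>/athena/')`, where the S3 location is a placeholder you would replace.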

#### Initialize Bedrock AgentCore client

In [None]:
# Initialize Bedrock AgentCore client
bedrock_agent_client = boto3.client('bedrock-agent', region_name=region)
bedrock_runtime_client = boto3.client('bedrock-agent-runtime', region_name=region)

print("✅ Bedrock AgentCore clients initialized")

In [None]:
from magic_helper import register_cell_magic

### Preparing your agent for deployment on AgentCore Runtime

In [None]:
%%writefile ./main.py


from strands import Agent, tool
import argparse
import json
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands.models import BedrockModel
from bedrock_agentcore_starter_toolkit import Runtime
from boto3.session import Session

# Import everything from genomics_store_interpreters
from genomics_store_interpreters import *

boto_session = Session()
# Define the model
bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0", 
    region_name=boto_session.region_name,  # `region` is not defined inside main.py
    temperature=0.1,
    streaming=False
)

# Define orchestrator agent configuration below
agent_name = "vcf-agent-direct"
agent_description = "VCF direct agent for clinical insights discovery"
agent_instruction = """You are an advanced genomics analysis assistant specialized in clinical variant interpretation using AWS HealthOmics genomicsvariantstore and genomicsannotationstore.

Your primary focus is on clinically actionable genomic analysis with these specialized capabilities:

CORE CLINICAL TOOLS:
1. **Gene-Specific Analysis** (query_variants_by_gene): For targeted gene panels, cancer genes (BRCA1/2, TP53), pharmacogenes (CYP2D6, CYP2C19)
2. **Chromosomal Analysis** (query_variants_by_chromosome): For chromosomal abnormalities, CNV analysis, specific genomic regions
3. **Population Frequency Analysis** (analyze_allele_frequencies): For rare disease variants, population genetics, allele frequency comparisons with 1000 Genomes
4. **Sample Comparison** (compare_sample_variants): For family studies, cohort analysis, population stratification
5. **Dynamic Queries** (execute_dynamic_genomics_query): For complex clinical questions requiring custom SQL

CLINICAL DECISION SUPPORT:
- Always apply quality filtering (qual > 30 AND PASS filters)
- Prioritize variants by clinical significance and functional impact
- Include 1000 Genomes frequency data for population context
- Provide actionable clinical interpretations
- Cite the source dataset, such as ClinVar for 'Pathogenic' or 'Benign' classifications and VEP for 'High' or 'Low' impact variants

TOOL SELECTION STRATEGY:
1. **For specific genes**: Use query_variants_by_gene (e.g., "BRCA1 variants", "CYP2D6 pharmacogenomics")
2. **For chromosomal regions**: Use query_variants_by_chromosome (e.g., "chromosome 17 variants", "chr13:32000000-33000000")
3. **For frequency analysis**: Use analyze_allele_frequencies (e.g., "rare variants", "population frequencies")
4. **For cohort studies**: Use compare_sample_variants (e.g., "compare samples", "family analysis")
5. **For complex queries**: Use execute_dynamic_genomics_query

EXECUTION FLOW:
1. Understand the user query
2. Select the MOST APPROPRIATE single tool
3. Call the tool with proper parameters
4. Present the results immediately
5. DO NOT call multiple tools unless the user explicitly requests a detailed analysis

Remember: Focus on clinically actionable insights that can inform patient care, genetic counseling, and treatment decisions.
"""

# Create the direct agent with genomics tools
try:
    direct_agent = Agent(
        model=bedrock_model,
        system_prompt=agent_instruction,
        tools=genomics_store_agent_tools
    )
    print("✅ Direct agent created successfully")
except Exception as e:
    print(f"❌ Error creating direct agent: {e}")
    direct_agent = None

app = BedrockAgentCoreApp()

@app.entrypoint
async def strands_agent_bedrock_streaming(payload):
    """
    Invoke the agent with streaming capabilities
    """
    user_input = payload.get("prompt")
    
    if direct_agent is None:
        yield "Error: Direct agent not initialized"
        return
    
    try:
        tool_name = None
        async for event in direct_agent.stream_async(user_input):
            if (
                "current_tool_use" in event
                and event["current_tool_use"].get("name") != tool_name
            ):
                tool_name = event["current_tool_use"]["name"]
                yield f"\n\n🔧 Using tool: {tool_name}\n\n"
            
            if "data" in event:
                yield event["data"]
                
    except Exception as e:
        error_response = {"error": str(e), "type": "stream_error"}
        print(f"Streaming error: {error_response}")
        yield error_response

if __name__ == "__main__":
    app.run()
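
The event handling in the entrypoint above can be tried locally without Bedrock. The sketch below reproduces the same filtering logic against a stand-in event stream; `render_stream` and `fake_events` are hypothetical names, and the event dictionaries only mimic the two keys the entrypoint reads (`current_tool_use` and `data`).

```python
import asyncio


async def render_stream(events):
    """Yield text chunks, announcing each tool once when it first appears."""
    tool_name = None
    async for event in events:
        tool_use = event.get("current_tool_use")
        if tool_use and tool_use.get("name") != tool_name:
            tool_name = tool_use["name"]
            yield f"\n\n🔧 Using tool: {tool_name}\n\n"
        if "data" in event:
            yield event["data"]


async def fake_events():
    # Hypothetical events shaped like the ones the entrypoint inspects.
    yield {"current_tool_use": {"name": "query_variants_by_gene"}}
    yield {"current_tool_use": {"name": "query_variants_by_gene"}}  # repeat, not re-announced
    yield {"data": "Found "}
    yield {"data": "3 variants."}


async def collect():
    return [chunk async for chunk in render_stream(fake_events())]


chunks = asyncio.run(collect())
print("".join(chunks))
```

Inside a notebook, where an event loop is already running, `await collect()` would be used in place of `asyncio.run(collect())`.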


##### 1. Create infrastructure

### Create Memory

In [None]:
!python ./scripts/agentcore_memory.py create --name genomicsapp

In [None]:
# Deploy the agent to Bedrock AgentCore
deployment_name="genomics_store_agent_6"

iam = boto3.client('iam')
agentcore_iam_role = iam.get_role(RoleName='genomics-vep-pipeline-agent-role')['Role']['Arn']
agentcore_iam_role

In [None]:
# Try running in the terminal
# $agentcore configure --entrypoint agent/main.py -er arn:aws:iam::066466276140:role/genomics-vep-pipeline-agent-role --name genomicsapp_vcf_agent_supervisor --requirements-file runtime_requirements.txt
# $agentcore launch --auto-update-on-conflict
# $streamlit run app.py --server.port 8501 --theme.primaryColor "#667eea"

## Or

### Configure AgentCore Runtime deployment

During the configure step, a Dockerfile is generated from your application code.

In [None]:
import boto3
import os
from bedrock_agentcore_starter_toolkit import Runtime
from boto3.session import Session
from botocore.config import Config
boto_session = Session()
region = boto_session.region_name

client = boto3.client(service_name='bedrock-runtime', 
                      region_name='us-east-1'
                      )

agentcore_runtime = Runtime()

response = agentcore_runtime.configure(
    entrypoint="main.py",  # the file written by the %%writefile cell above
    execution_role=agentcore_iam_role,
    auto_create_ecr=True,
    requirements_file="runtime_requirements.txt",
    region=region,
    agent_name="vcf_agent_supervisor",
    disable_otel=True
)

## Launching agent to AgentCore Runtime

Now that the Dockerfile exists, launch the agent to the AgentCore Runtime. This step creates the Amazon ECR repository and the AgentCore Runtime.

In [None]:
launch_result = agentcore_runtime.launch(
     auto_update_on_conflict=True
)
launch_result

#### Interactive Genomics Chat Interface

In [None]:
invoke_response = agentcore_runtime.invoke({"prompt": "How many patients are in the present cohort?"})

## Integrate AgentCore Runtime in the Streamlit app

In [None]:
!pip install -r ../streamlit_requirements.txt

In [None]:
!streamlit run ../app.py --server.port 8501 --theme.primaryColor "#667eea"

Open your SageMaker link, https://\<sagemaker-notebook-instance-name\>.notebook.\<region_name\>.sagemaker.aws/proxy/8501/, in a web browser to chat with your agent.
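
If it helps, the proxy link can be assembled from the pattern above with a small helper. The function name and the example instance name are placeholders; only the URL pattern comes from this notebook.

```python
def sagemaker_proxy_url(instance_name: str, region_name: str, port: int = 8501) -> str:
    """Build the SageMaker notebook proxy URL for a locally served app."""
    return f"https://{instance_name}.notebook.{region_name}.sagemaker.aws/proxy/{port}/"


# "my-notebook" is a placeholder instance name.
print(sagemaker_proxy_url("my-notebook", "us-east-1"))
```

The default port matches the `--server.port 8501` value passed to Streamlit.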