# SV-Agent Chat Interface Demo

This notebook demonstrates the interactive chat capabilities of sv-agent for structural variant analysis using GATK-SV.

## Overview

sv-agent provides:
- **CWL workflow execution** for running GATK-SV analysis
- **WDL to CWL conversion** for GATK-SV workflows
- **Expert guidance** on structural variant analysis
- **Interactive chat** with optional LLM enhancement

## Setup

First, let's import the necessary modules and initialize sv-agent:

In [None]:
# Import required modules
from sv_agent import SVAgent
from sv_agent.chat import SVAgentChat
from sv_agent.llm import OllamaProvider, detect_available_provider
import json
from pathlib import Path

In [None]:
# Initialize the agent
agent = SVAgent()
print("SV-Agent initialized successfully!")

## 1. Basic Chat Interface

Let's start with the rule-based chat system, which works without any LLM:

In [None]:
# Create chat with rule-based system (no LLM required)
chat_rules = SVAgentChat(agent, llm_provider="none")
print("Chat initialized with rule-based system")

### Example 1: Understanding sv-agent's capabilities

In [None]:
# Ask about capabilities
response = chat_rules.chat("What can you do?")
print("Q: What can you do?")
print("\nA:", response)

### Example 2: Running CWL workflows

In [None]:
# Ask about running workflows
response = chat_rules.chat("How do I run GATK-SV analysis?")
print("Q: How do I run GATK-SV analysis?")
print("\nA:", response)

### Example 3: Module-specific information

In [None]:
# Ask about specific modules
response = chat_rules.chat("Explain Module00a")
print("Q: Explain Module00a")
print("\nA:", response)
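The examples in this notebook repeat the same ask-and-print pattern, so a small helper can cut the boilerplate. The sketch below is not part of sv-agent: `format_answer`, `ask`, and `EchoChat` are hypothetical names. The only assumption about the chat object is that it exposes a `chat(question)` method returning a string, as in the cells above. `EchoChat` is a stand-in so the block runs without sv-agent installed.

```python
from typing import Optional


def format_answer(response: str, max_chars: Optional[int] = None) -> str:
    """Return the response, adding an ellipsis only if it had to be cut."""
    if max_chars is not None and len(response) > max_chars:
        return response[:max_chars] + "..."
    return response


def ask(chat, question: str, max_chars: Optional[int] = None) -> str:
    """Send one question to any object with a .chat(str) -> str method and print the exchange."""
    response = chat.chat(question)
    print(f"Q: {question}")
    print(f"\nA: {format_answer(response, max_chars)}")
    return response


class EchoChat:
    """Hypothetical stand-in for SVAgentChat; it only echoes the question."""

    def chat(self, question: str) -> str:
        return f"(stub answer to: {question})"


ask(EchoChat(), "Explain Module00a")
```

With sv-agent installed, the same call would presumably read `ask(chat_rules, "Explain Module00a")`.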

## 2. LLM-Enhanced Chat (with Ollama/Gemma)

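If Ollama is installed and a Gemma model is available, the chat interface can give more natural, conversational answers. If Ollama is not reachable, the cell below falls back to the rule-based chat: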

In [None]:
# Check if Ollama is available
ollama = OllamaProvider(model="gemma:2b")
if ollama.is_available():
    print("✅ Ollama is available")
    models = ollama.list_models()
    print(f"Available models: {models}")
    
    # Create LLM-enhanced chat
    chat_llm = SVAgentChat(agent, llm_provider=ollama)
    print("\nChat initialized with Gemma LLM")
else:
    print("❌ Ollama not available - using rule-based system")
    chat_llm = chat_rules
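If the availability check fails, it can help to query the Ollama server directly. The sketch below uses only the standard library and Ollama's `/api/tags` endpoint on its default port, 11434. It is independent of `OllamaProvider.is_available()`, whose implementation this notebook does not show. The function names `parse_model_names` and `list_local_ollama_models` are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_local_ollama_models(base_url: str = "http://localhost:11434",
                             timeout: float = 2.0) -> Optional[List[str]]:
    """Return the installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers malformed JSON.
        return None


models = list_local_ollama_models()
if models is None:
    print("Ollama server not reachable")
else:
    print("Installed models:", models)
```

An empty list means the server is running but no model has been pulled yet, which is a different problem from the server being down.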

### Example 4: Natural language questions

In [None]:
# Ask a complex question
question = "I have 100 samples from families with autism. What's the best way to run sv-agent for this cohort?"
response = chat_llm.chat(question)
print(f"Q: {question}")
print(f"\nA: {response}")

## 3. Best Practices and Recommendations

In [None]:
# Ask about coverage requirements
response = chat_rules.chat("What coverage do I need for reliable SV detection?")
print("Q: What coverage do I need for reliable SV detection?")
print("\nA:", response)

## 4. Troubleshooting Common Issues

In [None]:
# Ask about troubleshooting
response = chat_rules.chat("I'm getting very few SV calls. What should I check?")
print("Q: I'm getting very few SV calls. What should I check?")
print("\nA:", response)

## 5. Workflow Conversion Examples

In [None]:
# Ask about conversion
response = chat_rules.chat("How do I convert Module00a to CWL?")
print("Q: How do I convert Module00a to CWL?")
print("\nA:", response)

## 6. Practical Workflow Execution

The cell below builds an example CWL input object for a single sample. The file paths are illustrative; replace them with the locations of your own data:

In [None]:
# Create a sample input configuration
sample_inputs = {
    "bam_file": {
        "class": "File",
        "path": "/data/samples/NA12878.bam",
        "secondaryFiles": [
            {"class": "File", "path": "/data/samples/NA12878.bam.bai"}
        ]
    },
    "reference": {
        "class": "File",
        "path": "/data/reference/hg38.fa",
        "secondaryFiles": [
            {"class": "File", "path": "/data/reference/hg38.fa.fai"}
        ]
    },
    "sample_id": "NA12878",
    "sex": "female"
}

print("Sample input configuration:")
print(json.dumps(sample_inputs, indent=2))
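To pass a configuration like this to a workflow run, it has to be saved to a file. CWL input objects can be written as YAML or JSON, so the dictionary can be serialized with the standard `json` module. The sketch below assumes a reduced stand-in dictionary, a hypothetical `write_inputs` helper, and a hypothetical file name; whether `sv-agent run` accepts a `.json` inputs file in place of `inputs.yaml` is an assumption to verify.

```python
import json
import tempfile
from pathlib import Path

# Reduced stand-in for the `sample_inputs` dictionary built above
inputs = {
    "bam_file": {"class": "File", "path": "/data/samples/NA12878.bam"},
    "sample_id": "NA12878",
}


def write_inputs(inputs: dict, path: Path) -> Path:
    """Serialize a CWL input object to a JSON file and return its path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(inputs, indent=2) + "\n")
    return path


# A temporary directory keeps the example free of side effects
out_dir = Path(tempfile.mkdtemp())
inputs_path = write_inputs(inputs, out_dir / "NA12878.inputs.json")
print(f"Wrote {inputs_path}")

# Round-trip check: the file parses back to the same object
assert json.loads(inputs_path.read_text()) == inputs
```

Writing one file per sample keeps each run reproducible and makes it easy to see which inputs produced which outputs.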

In [None]:
# Ask about running with this configuration
response = chat_rules.chat("I have a BAM file and reference. What's the complete command to run Module00a?")
print("Q: I have a BAM file and reference. What's the complete command to run Module00a?")
print("\nA:", response)

## 7. Batch Processing Guidance

In [None]:
# Ask about batch processing
questions = [
    "How many samples should I include in a batch?",
    "Can I mix different sequencing platforms?",
    "How do I handle family samples?"
]

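for q in questions:
    print(f"Q: {q}")
    response = chat_rules.chat(q)
    # Show a 300-character preview; add an ellipsis only if the answer was cut
    preview = (response[:300] + "...") if len(response) > 300 else response
    print(f"A: {preview}\n")
    print("=" * 80 + "\n")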

## 8. Understanding SV Types

In [None]:
# Get information about different SV types
sv_types = ["DEL", "DUP", "INV", "INS", "BND"]

for sv_type in sv_types:
    response = chat_rules.chat(f"What is a {sv_type}?")
    print(f"Q: What is a {sv_type}?")
    print(f"A: {response}\n")
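These abbreviations are the values that appear in the `SVTYPE` key of a VCF record's INFO column. As a rough illustration, the sketch below tallies SV types from VCF text lines using only the standard library. The three records are made up for the example, `svtype_of` and `tally_sv_types` are hypothetical helper names, and real callsets may contain other `SVTYPE` values, which are counted as-is.

```python
from collections import Counter
from typing import Iterable, Optional

# Full names for the abbreviations queried above
SV_TYPE_NAMES = {
    "DEL": "deletion",
    "DUP": "duplication",
    "INV": "inversion",
    "INS": "insertion",
    "BND": "breakend",
}


def svtype_of(vcf_line: str) -> Optional[str]:
    """Return the SVTYPE value from a VCF data line, or None if absent."""
    if vcf_line.startswith("#"):
        return None  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None  # not a complete VCF record
    for entry in fields[7].split(";"):  # INFO is the 8th column
        key, _, value = entry.partition("=")
        if key == "SVTYPE":
            return value
    return None


def tally_sv_types(lines: Iterable[str]) -> Counter:
    """Count records per SV type, skipping lines without an SVTYPE."""
    return Counter(t for t in map(svtype_of, lines) if t is not None)


# Made-up records for illustration only
example_lines = [
    "##fileformat=VCFv4.2",
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=12000",
    "chr1\t50000\tsv2\tN\t<DUP>\t.\tPASS\tEND=58000;SVTYPE=DUP",
    "chr2\t7000\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=7600",
]

counts = tally_sv_types(example_lines)
for sv_type, n in counts.most_common():
    print(f"{sv_type} ({SV_TYPE_NAMES.get(sv_type, 'other')}): {n}")
```

A tally like this is a quick sanity check when a run produces unexpectedly few calls of one type.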

## 9. Performance and Resource Planning

In [None]:
# Ask about computational requirements
response = chat_rules.chat("What computational resources do I need for 500 samples?")
print("Q: What computational resources do I need for 500 samples?")
print("\nA:", response)

## 10. Interactive Session Example

The cell below simulates an interactive session by sending a scripted list of questions to the chat, one at a time:

In [None]:
# Simulate an interactive session
conversation = [
    "I'm new to structural variant analysis. Where should I start?",
    "I have 50 whole genome samples at 30x coverage. Is that enough?",
    "Should I run all modules at once or one by one?",
    "How long will the analysis take?",
    "What output files will I get?"
]

print("=== Interactive SV-Agent Session ===")
print()

for question in conversation:
    print(f"User: {question}")
    response = chat_rules.chat(question)
    # Truncate long answers to 400 characters for display
    shown = (response[:400] + "...") if len(response) > 400 else response
    print(f"\nSV-Agent: {shown}")
    print("\n" + "-" * 80 + "\n")

## Summary

This notebook demonstrated:

1. **Basic chat interface** - Works without any LLM
2. **LLM-enhanced chat** - Natural conversations with Ollama/Gemma
3. **Capability queries** - Understanding what sv-agent can do
4. **Workflow execution** - Running CWL workflows for SV analysis
5. **Workflow conversion** - Converting GATK-SV WDL modules to CWL
6. **Module information** - Details about GATK-SV modules
7. **Best practices** - Coverage, batch size, sample selection
8. **SV types** - What DEL, DUP, INV, INS, and BND mean
9. **Troubleshooting** - Common issues and solutions
10. **Resource planning** - Computational requirements

### Key Commands

- **Convert WDL to CWL**: `sv-agent convert -o output_dir -m ModuleName`
- **Run CWL workflow**: `sv-agent run workflow.cwl inputs.yaml`
- **Interactive chat**: `sv-agent chat`
- **Single question**: `sv-agent ask "your question"`

### Next Steps

1. Convert your GATK-SV WDL workflows to CWL
2. Prepare input YAML files for your samples
3. Run the analysis with sv-agent
4. Use the chat interface for guidance along the way

In [None]:
# Get a final summary of available modules
from sv_agent.knowledge import SVKnowledgeBase

kb = SVKnowledgeBase()
print("Available GATK-SV Modules:")
print("=" * 50)
for module_id, info in kb.modules.items():
    print(f"{module_id}: {info['name']} - {info['purpose']}")