# 🚀 BIA-Ghostcoder Quick Start Demo

> **A quick hands-on introduction to the intelligent bioinformatics analysis agent**

## 📋 How to Use

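1. **Configure API keys**: Enter your OpenAI API key and Tavily API key in the configuration cells below
2. **Run in order**: Execute all cells in sequence (or use "Run All")
3. **Inspect the results**: The AI automatically generates and executes single-cell quality control (QC) analysis code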

## 🎯 What This Demo Does

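- 📊 **Automatic data loading**: Uses the PBMC 3k example dataset
- 🤖 **Intelligent code generation**: The AI writes analysis code from a plain-language task description
- 🔄 **Automatic execution**: Runs the generated code in an isolated Docker environment
- 📈 **Result display**: Shows the generated code and the analysis results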

## ⚙️ Requirements

- Python 3.12+
- Docker (for code execution)
- A valid OpenAI API key (or a key for an OpenAI-compatible endpoint)

---

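**🚨 Important**: Make sure your API key is configured correctly in the second code cell (LLM configuration) before running the rest of the notebook!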


In [None]:
import os 
import scanpy as sc

from langchain_openai import ChatOpenAI

from ghostcoder import GhostCoder
from ghostcoder.utils import *
from ghostcoder.graph import create_ghostcoder_agent, create_coder_agent, create_crawler_agent, create_rag_agent

In [None]:
# 🤖 Configure the LLM models
# Fill in your API settings
openai_api_key = "your_openai_api_key_here"        # 👈 Required: OpenAI API key
openai_api_base = "https://api.openai.com/v1"      # 👈 API base URL (any OpenAI-compatible endpoint)
openai_chat_model = "gpt-4o"                       # 👈 Chat model
openai_code_model = "gpt-4o"                       # 👈 Code generation model

def call_chatllm_openai(api_key, api_base, model_name):
    """Create an OpenAI-compatible LLM instance."""
    llm = ChatOpenAI(
        openai_api_key=api_key,
        openai_api_base=api_base,
        model=model_name,
        temperature=0,   # minimize randomness in the generated code
        max_retries=3,   # retry failed API requests
    )
    return llm

# Initialize the models
chat_model = call_chatllm_openai(openai_api_key, openai_api_base, openai_chat_model)
code_model = call_chatllm_openai(openai_api_key, openai_api_base, openai_code_model)

print("✅ LLM models configured")
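
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export the values as environment variables before starting Jupyter, is to read them at runtime and fall back to the default base URL used above. `load_openai_settings` and `OpenAISettings` are hypothetical names, and `OPENAI_API_KEY` / `OPENAI_API_BASE` are simply conventional variable names chosen for this sketch, not something BIA-Ghostcoder requires.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PLACEHOLDER_KEY = "your_openai_api_key_here"

@dataclass(frozen=True)
class OpenAISettings:
    api_key: str
    api_base: str

def load_openai_settings(env: Optional[Mapping[str, str]] = None) -> OpenAISettings:
    """Read the OpenAI-compatible key and base URL from environment variables.

    Hypothetical helper: falls back to the default base URL used above and
    refuses to continue while the key is missing or still the placeholder.
    """
    env = os.environ if env is None else env
    settings = OpenAISettings(
        api_key=env.get("OPENAI_API_KEY", PLACEHOLDER_KEY),
        api_base=env.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
    )
    if not settings.api_key or settings.api_key == PLACEHOLDER_KEY:
        raise ValueError("Set OPENAI_API_KEY before creating the models.")
    return settings
```

With this helper, the model setup becomes `s = load_openai_settings()` followed by `call_chatllm_openai(s.api_key, s.api_base, openai_chat_model)`, and the key never appears in the notebook file.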

In [None]:
# 🌐 Configure Tavily search (optional)
# Tavily provides web search so the agent can look up current bioinformatics methods
tavily_api = "your_tavily_api_key_here"  # 👈 Optional: Tavily API key
os.environ["TAVILY_API_KEY"] = tavily_api

print("✅ Tavily configured" if tavily_api != "your_tavily_api_key_here" else "⚠️  Tavily not configured (optional)")

In [None]:
# 📊 Load the example data
# Uses scanpy's built-in PBMC 3k dataset (about 2,700 cells)
print("📥 Downloading the PBMC 3k example data...")
adata = sc.datasets.pbmc3k()
print(f"✅ Data loaded: {adata.n_obs} cells, {adata.n_vars} genes")

In [None]:
# 🤖 Create the BIA-Ghostcoder agent
print("🔧 Initializing the BIA-Ghostcoder agent...")

agent = GhostCoder(
    chat_model=chat_model,
    code_model=code_model,
    max_retry=3,
    name="demo_agent",
    debug=False,
)

print("✅ Agent created!")

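# 🎨 Visualize the workflow graph (optional)
print("📊 Drawing the agent workflow graph...")
try:
    agent.draw_graph()
    print("✅ Workflow graph rendered")
except Exception as e:
    print(f"⚠️  Failed to render the workflow graph: {e}")
    print("💡 This does not affect how the agent runs")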
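
The `max_retry = 3` argument presumably bounds how many times the agent may try again when generated code fails. The general pattern behind such a parameter is a generate, execute, repair loop, sketched below with stand-in functions. This is a hypothetical illustration of the idea, not GhostCoder's implementation: `run_with_retries`, `fake_generate`, and `fake_execute` are made-up names, the stubs replace a real LLM call and a sandboxed (e.g. Docker) run, and here `max_retry` counts total attempts, which may differ from how GhostCoder counts.

```python
from typing import Callable, Optional, Tuple

def run_with_retries(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_retry: int = 3,
) -> Tuple[str, str, int]:
    """Hypothetical generate -> execute -> repair loop (not GhostCoder's code).

    generate(task, last_error) returns source code; execute(code) returns (ok, output).
    Each failure's output is fed into the next generate() call.
    Returns (code, output, attempts_used); raises RuntimeError if all attempts fail.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_retry + 1):
        code = generate(task, last_error)
        ok, output = execute(code)
        if ok:
            return code, output, attempt
        last_error = output  # pass the error back so the next attempt can fix it
    raise RuntimeError(f"all {max_retry} attempts failed; last error: {last_error}")


# Stand-ins for an LLM and a sandbox, for demonstration only.
def fake_generate(task: str, last_error: Optional[str]) -> str:
    # The first attempt produces broken code; once an error is reported, "fix" it.
    return "result = 'fixed'" if last_error else "raise ValueError('boom')"

def fake_execute(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    try:
        exec(code, namespace)  # a real agent would run this in an isolated container
        return True, str(namespace.get("result"))
    except Exception as exc:
        return False, repr(exc)

code, output, attempts = run_with_retries("QC task", fake_generate, fake_execute)
print(attempts, output)  # → 2 fixed
```

Feeding the error text back into the next generation step is what lets a later attempt correct the earlier failure; without it, each retry would simply repeat the same request.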

In [None]:
# 📝 Define the QC task for the agent (a natural-language prompt)
task = "Perform quality control on the data. First, label the genes of interest: mitochondrial genes (names beginning with 'MT-'), ribosomal genes (names beginning with 'RPS' or 'RPL'), and hemoglobin genes (matched with the regular expression ^HB[^P]). Next, use Scanpy's calculate_qc_metrics() function to compute common QC metrics for each cell, including total counts, the number of genes detected, and the percentage of total counts coming from specific gene groups (e.g., mitochondrial genes), and apply a log1p transformation to these metrics to reduce the skew of their distributions. Then visualize the per-cell QC metrics with violin plots and scatter plots to assess the overall quality of the data. Finally, based on the visualizations, filter out cells with fewer than 100 detected genes and genes detected in fewer than 3 cells, so that only high-quality data goes into downstream analyses."
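
To make the first two steps of this task concrete, here is a minimal NumPy-only sketch of the gene-group labeling and per-cell metrics it asks for. The metric names mirror the columns that `sc.pp.calculate_qc_metrics` adds to `adata.obs` (`total_counts`, `n_genes_by_counts`, `pct_counts_<group>`, and the `log1p_` variants), but this is an illustrative re-implementation on a small dense matrix, not scanpy's code; `flag_gene_groups` and `qc_metrics` are hypothetical names. It can be handy for spot-checking the agent's numbers on a few cells.

```python
import re
import numpy as np

# Patterns taken from the task text above.
GENE_GROUPS = {
    "mt": re.compile(r"^MT-"),          # mitochondrial
    "ribo": re.compile(r"^(RPS|RPL)"),  # ribosomal
    "hb": re.compile(r"^HB[^P]"),       # hemoglobin (excludes HBP*)
}

def flag_gene_groups(gene_names):
    """Boolean mask per gene group, analogous to adata.var['mt'] etc."""
    return {
        group: np.array([pattern.match(name) is not None for name in gene_names], dtype=bool)
        for group, pattern in GENE_GROUPS.items()
    }

def qc_metrics(counts, gene_names):
    """Per-cell QC metrics for a dense (cells x genes) count matrix.

    Names mirror calculate_qc_metrics' obs columns; illustrative only.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    metrics = {"total_counts": total, "n_genes_by_counts": n_genes}
    for group, mask in flag_gene_groups(gene_names).items():
        group_total = counts[:, mask].sum(axis=1)
        # Percent of each cell's counts from this group; 0 for cells with no counts.
        metrics[f"pct_counts_{group}"] = np.divide(
            100.0 * group_total, total, out=np.zeros_like(total), where=total > 0
        )
    # log1p variants, as calculate_qc_metrics adds with log1p=True
    metrics["log1p_total_counts"] = np.log1p(total)
    metrics["log1p_n_genes_by_counts"] = np.log1p(n_genes)
    return metrics

genes = ["MT-CO1", "RPL10", "HBB", "HBP1", "CD3E"]
counts = np.array([[5, 2, 1, 0, 2],
                   [0, 0, 0, 3, 1]])
m = qc_metrics(counts, genes)
print(m["pct_counts_mt"], m["pct_counts_hb"])
```

Note that `HBP1` is deliberately excluded by `^HB[^P]`, so the second cell's hemoglobin percentage is 0 even though it has counts in that gene.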

In [None]:
# 🚀 Run the BIA-Ghostcoder analysis
print("🔬 Starting the bioinformatics analysis...")
print("=" * 50)

try:
    # Wrap the input data for the agent
    input_wrap = create_input_wrapper([adata])

    # Run the analysis task
    generated_code, execution_result = agent.Run(
        task=task,
        input_wrap=input_wrap,
        task_id="demo_qc_analysis",
        use_reg=True,  # enable RAG retrieval
    )
    
    print("\n🎉 Analysis complete!")
    print("=" * 50)

    print("\n📝 Generated code:")
    print("-" * 30)
    print(generated_code)

    print("\n📊 Execution result:")
    print("-" * 30)
    print(execution_result)
    
except Exception as e:
    print(f"❌ Execution failed: {e}")
    print("💡 Check your API configuration, network connection, and that Docker is running")
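
Once the agent returns, it is worth checking its output against the thresholds stated in the task. The NumPy sketch below applies the task's two filters in the same order and with the same inclusive thresholds as calling `sc.pp.filter_cells(adata, min_genes=100)` and then `sc.pp.filter_genes(adata, min_cells=3)`: cells first, then genes counted over the surviving cells. `qc_filter` is a hypothetical helper for small dense matrices, not part of scanpy or BIA-Ghostcoder.

```python
import numpy as np

def qc_filter(counts, min_genes=100, min_cells=3):
    """Drop cells with < min_genes detected genes, then genes detected in < min_cells cells.

    Hypothetical helper; returns (filtered_counts, kept_cell_mask, kept_gene_mask).
    """
    counts = np.asarray(counts)
    # Step 1: keep cells that express at least `min_genes` genes.
    cell_mask = (counts > 0).sum(axis=1) >= min_genes
    kept_cells = counts[cell_mask]
    # Step 2: among the remaining cells, keep genes detected in at least `min_cells` cells.
    gene_mask = (kept_cells > 0).sum(axis=0) >= min_cells
    return kept_cells[:, gene_mask], cell_mask, gene_mask

# Toy example with lowered thresholds so the effect is visible.
toy = np.array([
    [1, 1, 0, 2],  # 3 genes detected
    [0, 1, 0, 1],  # 2 genes detected
    [3, 0, 0, 1],  # 2 genes detected
    [0, 0, 0, 1],  # 1 gene detected -> removed
])
filtered, cells_kept, genes_kept = qc_filter(toy, min_genes=2, min_cells=2)
print(filtered.shape)  # → (3, 3)
```

On the PBMC 3k data you could compare the shape returned by `qc_filter(adata.X.toarray())` on an unfiltered copy with the shape of the data the generated code produces; because the dense conversion is memory-hungry, scanpy's own `filter_cells` / `filter_genes` remain the better choice for the full matrix.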