# ov.Agent Ten-Task Regression Suite

This notebook instantiates **ov.Agent** once and then reuses it across ten composite tasks that span end-to-end processing, multi-batch integration, trajectory inference, multiome alignment, tumor microenvironment analytics, and spatial modeling. Each task pairs a carefully worded ov.Agent prompt with lightweight validation hooks so you can quickly benchmark whether the agent still covers the relevant skills.

## Notebook structure

1. Bootstrap ov.Agent with the same diagnostics used in `Tutorials-llm/t_ov_agent_pbmc3k.ipynb`.
2. Load the [`ov_agent_tutorial_data_sources.md`](../../OvIntelligence/ov_agent_tutorial_data_sources.md) manifest so every task can remind you which local `.h5ad` files or remote links must be staged.
3. Define ten task specs (prompt + datasets + keyword checks) and a helper `run_task` wrapper that:
   - prints data reminders via `ensure_data(...)`
   - calls `agent.chat(prompt)`
   - records simple keyword-based health checks in `TASK_LOG`
4. Optionally loop through all specs to stress-test ov.Agent end to end, then summarize pass/fail status in a scoreboard.

In [None]:
import sys, omicverse as ov
print(sys.executable)
print('omicverse', getattr(ov, '__version__', 'unknown'), ov.__file__)


In [None]:
import os
from pathlib import Path
import json
from datetime import datetime
from IPython.display import Markdown, display
import scanpy as sc
import omicverse as ov

print('OmicVerse version:', getattr(ov, '__version__', 'unknown'))
print('Supported models:', ov.list_supported_models())

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', '')
ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY', '')
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY', '')

if not (OPENAI_API_KEY or ANTHROPIC_API_KEY or GEMINI_API_KEY):
    print('Warning: set OPENAI_API_KEY / ANTHROPIC_API_KEY / GEMINI_API_KEY before running live ov.Agent requests.')

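# model_id names an OpenAI model, so the key fallback below only works when OPENAI_API_KEY is set.
# If you rely on another provider's key, pick a matching name from ov.list_supported_models().
model_id = 'gpt-5'
api_key = OPENAI_API_KEY or ANTHROPIC_API_KEY or GEMINI_API_KEY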
agent = ov.Agent(model=model_id, api_key=api_key)
agent


## Data source manifest helper

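`ensure_data(...)` looks up tutorial references in the table parsed from `OvIntelligence/ov_agent_tutorial_data_sources.md`, so each task can remind you which `.h5ad` or model files to prepare ahead of time. The path is resolved relative to the current working directory; if the file is missing, the parser returns an empty mapping and every reference is reported as not found in the manifest.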
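To make the expected manifest layout concrete, here is a minimal sketch that parses one made-up table row with the same rules as the helper in the next cell. The row contents (file name, size, source hint) and the section title are hypothetical placeholders, not entries from the real manifest.

```python
# Hypothetical manifest excerpt; the real file may differ in wording and columns.
SAMPLE_MANIFEST = """\
## Single-cell tutorials
| Tutorial | Files | Size | Source |
| --- | --- | --- | --- |
| `Tutorials-single/t_preprocess.ipynb` | `example.h5ad` | ~10 MB | see tutorial |
"""

def parse_rows(text):
    """Map each backticked tutorial reference to its files, size, hint, and section."""
    mapping, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith('## '):
            # Remember the most recent level-2 heading as the row's section.
            section = line[3:].strip()
        elif line.startswith('| `Tutorial'):
            parts = [p.strip() for p in line.strip('|').split('|')]
            if len(parts) >= 4:
                mapping[parts[0].strip('`')] = {
                    'files': parts[1], 'size': parts[2],
                    'hint': parts[3], 'section': section,
                }
    return mapping

rows = parse_rows(SAMPLE_MANIFEST)
print(rows)
```

Only rows whose first cell starts with a backticked `Tutorial...` reference are picked up, so the header and separator rows are skipped. A cell that itself contains a literal `|` would be split incorrectly by this approach.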

In [None]:
DATA_SOURCE_PATH = Path('OvIntelligence/ov_agent_tutorial_data_sources.md')

def _parse_data_source_table():
    mapping = {}
    if not DATA_SOURCE_PATH.exists():
        return mapping
    current_section = None
    for raw_line in DATA_SOURCE_PATH.read_text().splitlines():
        line = raw_line.strip()
        if line.startswith('## '):
            current_section = line[3:].strip()
            continue
        if not line.startswith('| `Tutorial'):
            continue
        parts = [p.strip() for p in line.strip('|').split('|')]
        if len(parts) < 4:
            continue
        tutorial_ref = parts[0].strip('`')
        mapping[tutorial_ref] = {
            'files': parts[1],
            'size': parts[2],
            'hint': parts[3],
            'section': current_section,
        }
    return mapping

DATA_SOURCES = _parse_data_source_table()
print(f'Loaded {len(DATA_SOURCES)} tutorial rows from {DATA_SOURCE_PATH}')


In [None]:
def ensure_data(tutorial_refs):
    """Print local-file reminders for the given tutorial references."""
    refs = tutorial_refs
    if isinstance(refs, str):
        refs = [refs]
    display(Markdown('**Data reminders**'))
    for ref in refs:
        info = DATA_SOURCES.get(ref)
        if info:
            display(Markdown(
                f'- `{ref}` (section: {info.get("section", "n/a")}):\n'
                f'  - Files/models: {info.get("files", "see tutorial")}\n'
                f'  - Approx. size: {info.get("size", "n/a")}\n'
                f'  - Source hint: {info.get("hint", "see tutorial")}'
            ))
        else:
            display(Markdown(f'- `{ref}`: not found in manifest — consult the tutorial README.'))


## Task catalog

Each entry specifies a natural-language ov.Agent prompt, the tutorial references whose data are required, and lightweight keyword checks. The prompts intentionally call out the tutorials so the agent can leverage progressive skill disclosure.
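Because every spec is a plain dict, a malformed entry (a missing `datasets` list, a check without keywords) only surfaces when the task runs. The following is a minimal sketch of a pre-flight validator. `validate_spec`, `REQUIRED_KEYS`, and the two sample specs are hypothetical names used for illustration and are not part of OmicVerse.

```python
# Expected top-level keys and their types, mirroring the catalog entries.
REQUIRED_KEYS = {'name': str, 'datasets': list, 'prompt': str, 'checks': list}

def validate_spec(spec):
    """Return a list of human-readable problems; an empty list means the spec looks sane."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in spec:
            problems.append(f'missing key: {key}')
        elif not isinstance(spec[key], expected):
            problems.append(f'{key} should be {expected.__name__}')
    checks = spec.get('checks')
    for i, check in enumerate(checks if isinstance(checks, list) else []):
        if not isinstance(check, dict) or not check.get('keywords'):
            # all([]) is True, so a check without keywords would always pass.
            problems.append(f'check {i} has no keywords')
    return problems

good = {
    'name': 'demo',
    'datasets': ['Tutorials-single/t_preprocess.ipynb'],
    'prompt': 'Describe QC steps.',
    'checks': [{'description': 'Mentions QC', 'keywords': ['QC']}],
}
bad = {'name': 'demo', 'prompt': 'x', 'checks': [{'description': 'empty'}]}

print(validate_spec(good))
print(validate_spec(bad))
```

In the notebook, this could be applied once after the catalog is defined, for example by collecting `validate_spec(spec)` for every entry in `TASK_SPECS` before any request is sent.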

In [None]:
TASK_SPECS = [
    {
        'name': 'PBMC 5k/8k FASTQ → QC → cluster-stability benchmarking',
        'datasets': [
            'Tutorials-single/t_alignment_1k.ipynb',
            'Tutorials-single/t_preprocess.ipynb',
            'Tutorials-single/t_cluster.ipynb',
        ],
        'prompt': """You are ov.Agent orchestrating an end-to-end PBMC 5k/8k workflow.
1. Start from raw 10x FASTQs and reuse the kb-python alignment recipe from `Tutorials-single/t_alignment_1k.ipynb` to output `adata.h5ad` count matrices.
2. Run the PBMC3k-style QC + HVG + scaling pipeline from `Tutorials-single/t_preprocess.ipynb`, citing the concrete Scanpy calls and recommended thresholds.
3. Stress-test Leiden, Louvain, Gaussian mixture, and LDA clustering heads at multiple resolutions as shown in `Tutorials-single/t_cluster.ipynb`, summarize UMAP drift scores for each head, and recommend the most stable resolution.
4. Return the sequence of code/CLI blocks plus a markdown table that lists resolution, cluster head, UMAP drift (0–1), and stability notes.""",
        'checks': [
            {'description': 'Mentions kb-python alignment', 'keywords': ['kb-python', 'FASTQ']},
            {'description': 'Reports UMAP drift or stability table', 'keywords': ['UMAP', 'drift']},
        ],
    },
    {
        'name': 'Pancreas multi-study merge with SIMBA embeddings',
        'datasets': [
            'Tutorials-single/t_single_batch.ipynb',
            'Tutorials-single/t_simba.ipynb',
        ],
        'prompt': """Fuse the Baron, Segerstolpe, and Muraro pancreas scRNA-seq cohorts.
- Follow the donor harmonization guidance from `Tutorials-single/t_single_batch.ipynb` to align preprocessing choices (normalization, HVGs, regressions).
- Apply the SIMBA graph-embedding pipeline from `Tutorials-single/t_simba.ipynb`, explicitly showing the commands that build the heterogeneous graph, train the embedding, and project to 2D.
- Provide before/after UMAPs (describe filenames), kBET, and silhouette metrics while commenting on endocrine vs. exocrine separation.
- Deliver a concise explanation of how to export the learned embeddings for downstream classifiers.""",
        'checks': [
            {'description': 'References SIMBA graph embedding', 'keywords': ['SIMBA', 'embedding']},
            {'description': 'Includes kBET or silhouette metrics', 'keywords': ['kBET', 'silhouette']},
        ],
    },
    {
        'name': 'Paul15 hematopoietic trajectories with MetaTiME diagnostics',
        'datasets': [
            'Tutorials-single/t_traj.ipynb',
            'Tutorials-single/t_metatime.ipynb',
        ],
        'prompt': """Using Paul15-like hematopoietic data:
1. Run diffusion maps, PAGA, and Palantir/VIA as described in `Tutorials-single/t_traj.ipynb` to recover megakaryocyte vs. lymphoid branches.
2. Identify root/terminal states, compute pseudotime, and extract branch marker tables for at least two fates.
3. Invoke the MetaTiME checklist from `Tutorials-single/t_metatime.ipynb` to score cycling programs and flag any branch-specific oscillations.
4. Summarize outputs as (a) code blocks, (b) descriptions of saved plots, and (c) a markdown summary of pseudotime milestones.""",
        'checks': [
            {'description': 'Mentions diffusion/Palantir', 'keywords': ['diffusion', 'Palantir']},
            {'description': 'Cites MetaTiME diagnostics', 'keywords': ['MetaTiME', 'cycling']},
        ],
    },
    {
        'name': 'PBMC Multiome 10k GLUE + MOFA factor discovery',
        'datasets': [
            'Tutorials-single/t_mofa_glue.ipynb',
            'Tutorials-single/t_mofa.ipynb',
        ],
        'prompt': """Perform cross-modal alignment for PBMC Multiome 10k.
- Pair RNA and ATAC embeddings with GLUE (per `Tutorials-single/t_mofa_glue.ipynb`) and report the path to the paired cell metadata.
- Train MOFA on the matched matrices (see `Tutorials-single/t_mofa.ipynb` and `t_mofa_glue`) and differentiate shared, RNA-only, and ATAC-only factors with variance explained tables.
- Highlight at least one IFN-response and one chromatin-accessibility-specific factor, including top marker genes/peaks.
- Provide code snippets plus interpretation bullets for every factor category.""",
        'checks': [
            {'description': 'GLUE pairing mentioned', 'keywords': ['GLUE', 'pair']},
            {'description': 'MOFA factor summaries provided', 'keywords': ['MOFA', 'factor']},
        ],
    },
    {
        'name': 'PBMC 5k scATAC label transfer via GLUE embeddings',
        'datasets': [
            'Tutorials-single/t_anno_trans.ipynb',
        ],
        'prompt': """Transfer PBMC RNA annotations onto PBMC 5k scATAC cells.
- Load the GLUE-derived RNA/ATAC embeddings (`data/analysis_lymph/rna-emb.h5ad` and `data/analysis_lymph/atac-emb.h5ad`).
- Build the cross-modal KNN graph exactly as outlined in `Tutorials-single/t_anno_trans.ipynb` and migrate labels with confidence scores.
- Report per-cluster agreement, flag potential mismatches, and describe how to visualize transferred labels on an ATAC UMAP.
- Return both the python commands and a markdown table of cluster vs. confidence.""",
        'checks': [
            {'description': 'Mentions cross-modal KNN transfer', 'keywords': ['KNN', 'transfer']},
            {'description': 'Reports confidence per cluster', 'keywords': ['confidence', 'cluster']},
        ],
    },
    {
        'name': 'Tumor microenvironment ligand–receptor diagnostics (CellPhoneDBViz)',
        'datasets': [
            'Tutorials-single/t_cellphonedb.ipynb',
        ],
        'prompt': """For a treated vs. untreated tumor microenvironment dataset:
- Follow `Tutorials-single/t_cellphonedb.ipynb` to format metadata, run CellPhoneDB, and visualize ligand–receptor usage.
- Emphasize exhausted T cells vs. M2 macrophages, reporting top LR pairs per condition along with effect sizes or p-values.
- Describe how to generate both heatmaps and chord diagrams using CellPhoneDBViz utilities, including filenames for saved plots.
- Provide guidance on interpreting shifts between conditions.""",
        'checks': [
            {'description': 'CellPhoneDB workflow described', 'keywords': ['CellPhoneDB', 'ligand']},
            {'description': 'Mentions visualization artifacts', 'keywords': ['heatmap', 'chord']},
        ],
    },
    {
        'name': 'MetaTiME-driven immune microenvironment annotation',
        'datasets': [
            'Tutorials-single/t_metatime.ipynb',
        ],
        'prompt': """Annotate tumor-infiltrating immune cells with MetaTiME.
- Optionally remove malignant cells using inferCNV outputs, then recompute neighbors in SCVI space per `Tutorials-single/t_metatime.ipynb`.
- Run MetaTiME scoring, rank meta-components per cluster, and interpret dominant immune states (e.g., IFN-high, cycling, antigen-presenting).
- Output code for preprocessing + scoring and finish with a markdown report that lists cluster → top meta-component plus interpretation.""",
        'checks': [
            {'description': 'Mentions MetaTiME scoring', 'keywords': ['MetaTiME', 'meta-component']},
            {'description': 'References SCVI neighbors/inferCNV filtering', 'keywords': ['SCVI', 'inferCNV']},
        ],
    },
    {
        'name': 'CEFCON driver regulator discovery',
        'datasets': [
            'Tutorials-single/t_cellfate.ipynb',
        ],
        'prompt': """Discover lineage-specific driver regulators with CEFCON.
- Use the preprocessing + prior network setup described in `Tutorials-single/t_cellfate.ipynb` for Nestorowa/Paul15 hematopoiesis.
- Load the NicheNet prior graph, run `ov.single.pyCEFCON`, and export regulon tables for at least two branches (erythroid vs. granulocyte, for example).
- Provide tips on parameter tuning (walk length, regularization) and show how to visualize regulator activity heatmaps.
- Summarize key regulators per branch in markdown.""",
        'checks': [
            {'description': 'Mentions CEFCON run + prior network', 'keywords': ['CEFCON', 'NicheNet']},
            {'description': 'Reports regulon or regulator tables', 'keywords': ['regulon', 'regulator']},
        ],
    },
    {
        'name': 'Precision oncology prioritization (inferCNV + scDrug)',
        'datasets': [
            'Tutorials-single/t_scdrug.ipynb',
        ],
        'prompt': """Combine inferCNV-based malignant calling with scDrug predictions.
- Following `Tutorials-single/t_scdrug.ipynb`, run infercnvpy to separate malignant vs. normal cells and mention the CNV heatmap artifact.
- Feed malignant clones into scDrug, compute predicted IC50 values, and rank at least five compounds per clone.
- Provide code for exporting the ranking table, plus guidance on cross-referencing copy-number context when interpreting drug hits.""",
        'checks': [
            {'description': 'inferCNV workflow described', 'keywords': ['inferCNV', 'CNV']},
            {'description': 'scDrug IC50 ranking reported', 'keywords': ['scDrug', 'IC50']},
        ],
    },
    {
        'name': 'Spatial pseudo-time + alignment (SpaceFlow + STAligner)',
        'datasets': [
            'Tutorials-space/t_spaceflow.ipynb',
            'Tutorials-space/t_staligner.ipynb',
        ],
        'prompt': """Build spatial pseudo-time for Visium DLPFC slice 151676 and align adjacent slices.
- Run SpaceFlow exactly as in `Tutorials-space/t_spaceflow.ipynb` to compute embeddings, domains, and pseudo-spatiotemporal maps (pSM), referencing the `151676_filtered_feature_bc_matrix.h5` input.
- Feed at least two consecutive slices plus their embeddings into STAligner per `Tutorials-space/t_staligner.ipynb`, enabling the triplet-loss GAT to align cortical layers.
- Report output filenames for aligned embeddings and highlight conserved layers across slices in markdown.""",
        'checks': [
            {'description': 'SpaceFlow pseudo-time mentioned', 'keywords': ['SpaceFlow', 'pSM']},
            {'description': 'STAligner alignment described', 'keywords': ['STAligner', 'alignment']},
        ],
    },
]
print(f'{len(TASK_SPECS)} task specs loaded.')


## Task runner utilities

`run_task(...)` displays the task heading, prints data reminders, calls ov.Agent, and stores keyword-check results in `TASK_LOG` for later summarization. Dict responses are serialized with `json.dumps` and everything else is converted with `str`; adjust that branch inside `run_task` if your deployment returns a different structured type.
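The response handling and keyword matching can be sketched in isolation as follows. `response_to_text` and `keywords_present` are hypothetical helpers, and the structured response is invented for the example. The sketch makes no claim about what `agent.chat` actually returns.

```python
import json

def response_to_text(response):
    """Flatten a response into text for keyword checks (hypothetical helper)."""
    if response is None:
        return ''
    if isinstance(response, (dict, list)):
        # ensure_ascii=False keeps non-ASCII characters searchable as written;
        # default=str avoids a TypeError on values json cannot serialize.
        return json.dumps(response, indent=2, ensure_ascii=False, default=str)
    return str(response)

def keywords_present(text, keywords):
    """Case-insensitive substring test; every keyword must appear."""
    lower = text.lower()
    return all(kw.lower() in lower for kw in keywords)

# Invented structured response standing in for a real agent reply.
structured = {'summary': 'Ran kb-python on the FASTQ files', 'steps': ['align', 'count']}
text = response_to_text(structured)

print(keywords_present(text, ['kb-python', 'FASTQ']))
print(keywords_present(text, ['kb-python', 'drift']))
print(keywords_present(response_to_text(None), ['UMAP']))
```

Matching is by substring, so a keyword such as `pair` is also satisfied by `paired`. A check passes only when all of its keywords appear.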

In [None]:
TASK_LOG = []

def evaluate_response_text(response_text, checks):
    text = response_text or ''
    lower = text.lower()
    results = []
    for check in checks:
        keywords = [kw.lower() for kw in check.get('keywords', [])]
        passed = all(kw in lower for kw in keywords)
        results.append({
            'description': check.get('description', ''),
            'keywords': keywords,
            'passed': passed,
        })
    return results


def run_task(spec):
    display(Markdown(f"## Task: {spec['name']}"))
    ensure_data(spec.get('datasets', []))
    prompt = spec['prompt']
    response = agent.chat(prompt)
    if isinstance(response, dict):
        response_text = json.dumps(response, indent=2)
    else:
        response_text = str(response)
    checks = evaluate_response_text(response_text, spec.get('checks', []))
    TASK_LOG.append({
        'name': spec['name'],
        'timestamp': datetime.utcnow().isoformat() + 'Z',
        'checks': checks,
        'response_preview': response_text[:2000],
    })
    display(Markdown('**Response preview**'))
    display(Markdown(response_text[:2000]))
    display(Markdown('**Check results**'))
    for chk in checks:
        status = '✅' if chk['passed'] else '❌'
        display(Markdown(f"- {status} {chk['description']} (keywords: {', '.join(chk['keywords'])})"))
    return response


## Execute the full suite

Set `RUN_ALL = True` to launch all ten prompts sequentially. You can also call `run_task(TASK_SPECS[i])` manually for spot checks.
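A single failed request (for example a network error or rate limit) aborts a plain `for` loop and leaves the remaining tasks unrun. A more forgiving driver might look like the sketch below. `run_suite` and the stub runner are hypothetical; the stub stands in for `run_task` so the example runs without an API key.

```python
def run_suite(specs, runner, stop_on_error=False):
    """Call runner(spec) for each spec and collect failures instead of aborting."""
    failures = []
    for index, spec in enumerate(specs):
        try:
            runner(spec)
        except Exception as exc:  # broad on purpose: any failure should be recorded
            failures.append({
                'index': index,
                'name': spec.get('name', ''),
                'error': repr(exc),
            })
            if stop_on_error:
                break
    return failures

# Stub runner that fails for one task to simulate a flaky request.
def fake_runner(spec):
    if spec['name'] == 'flaky':
        raise RuntimeError('simulated timeout')

demo_specs = [{'name': 'ok-1'}, {'name': 'flaky'}, {'name': 'ok-2'}]
failures = run_suite(demo_specs, fake_runner)
print(failures)
```

In the notebook the equivalent call would be `run_suite(TASK_SPECS, run_task)`. Tasks that raise never reach `TASK_LOG`, so the returned list is the only record of them.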

In [None]:
RUN_ALL = False

if RUN_ALL:
    for spec in TASK_SPECS:
        run_task(spec)


### Task 1: PBMC 5k/8k FASTQ → QC → cluster-stability benchmarking
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[0])

### Task 2: Pancreas multi-study merge with SIMBA embeddings
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[1])

### Task 3: Paul15 hematopoietic trajectories with MetaTiME diagnostics
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[2])

### Task 4: PBMC Multiome 10k GLUE + MOFA factor discovery
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[3])

### Task 5: PBMC 5k scATAC label transfer via GLUE embeddings
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[4])

### Task 6: Tumor microenvironment ligand–receptor diagnostics (CellPhoneDBViz)
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[5])

### Task 7: MetaTiME-driven immune microenvironment annotation
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[6])

### Task 8: CEFCON driver regulator discovery
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[7])

### Task 9: Precision oncology prioritization (inferCNV + scDrug)
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[8])

### Task 10: Spatial pseudo-time + alignment (SpaceFlow + STAligner)
Run this cell to execute only this scenario. The helper below will print data reminders and log the ov.Agent response.

In [None]:
run_task(TASK_SPECS[9])

## Scoreboard

After running one or more tasks, execute the cell below to summarize keyword checks and preview truncated responses.
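If a rendered table is easier to scan than the `pprint` output, the same counts can be formatted as markdown. This is a minimal sketch: `scoreboard_markdown` and the sample log are hypothetical, and the PASS/FAIL rule (every check must pass, and there must be at least one check) is an assumption rather than something the notebook defines.

```python
def scoreboard_markdown(task_log):
    """Render one table row per logged run with a pass count and overall status."""
    lines = ['| Task | Passed | Total | Status |', '| --- | --- | --- | --- |']
    for entry in task_log:
        passed = sum(1 for chk in entry['checks'] if chk['passed'])
        total = len(entry['checks'])
        # A run with zero checks is reported as FAIL rather than vacuously passing.
        status = 'PASS' if total and passed == total else 'FAIL'
        lines.append(f"| {entry['name']} | {passed} | {total} | {status} |")
    return '\n'.join(lines)

# Invented log entries with the same shape as the items appended to TASK_LOG.
sample_log = [
    {'name': 'demo A', 'checks': [{'passed': True}, {'passed': True}]},
    {'name': 'demo B', 'checks': [{'passed': True}, {'passed': False}]},
]
table = scoreboard_markdown(sample_log)
print(table)
```

In the notebook, `display(Markdown(scoreboard_markdown(TASK_LOG)))` would render the table inline. Rerunning a task appends a second entry to `TASK_LOG`, so the same task can appear more than once.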

In [None]:
from pprint import pprint

if not TASK_LOG:
    print('No tasks have been executed yet.')
else:
    summary = []
    for entry in TASK_LOG:
        summary.append({
            'name': entry['name'],
            'checks_passed': sum(1 for chk in entry['checks'] if chk['passed']),
            'checks_total': len(entry['checks']),
        })
    pprint(summary)
