# ov.Agent with Skills: Natural‑Language Single‑Cell Analysis (PBMC3k)

This tutorial demonstrates how to analyze PBMC3k using `ov.Agent` with project Skills. The agent auto‑discovers skills under `.claude/skills` and injects their guidance into its planning, which steers the generated code toward the workflow those skills recommend.


## Prerequisites

- omicverse installed in this environment
- A provider API key set in the environment (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`)
- Skills: discovered automatically from both the installed package and `.claude/skills/` in your current working directory

> Tip: `print(ov.list_supported_models())` shows supported models and required env vars.
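
Before starting, it can help to confirm which provider keys are actually visible to the Python process. The helper below is a minimal sketch using only the standard library; `configured_providers` is a hypothetical name, and the variable list simply mirrors the examples given in the prerequisites.

```python
import os

# Env var names taken from the prerequisites list; extend as needed.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def configured_providers(env=None):
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


print("Configured provider keys:", configured_providers() or "none")
```

An empty result means no live request can succeed until one of the keys is exported in the shell that launches the notebook.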


In [None]:

import os
from pathlib import Path
import scanpy as sc
import omicverse as ov

print('OmicVerse version:', getattr(ov, '__version__', 'unknown'))
print(ov.list_supported_models())

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', '')

if not OPENAI_API_KEY:
    print('Warning: set OPENAI_API_KEY (or relevant provider key) before running live requests.')

# Nice plotting defaults
sc.settings.set_figure_params(dpi=100)


## Load PBMC dataset (with offline fallback)

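Loads a local `.h5ad` file if `PBMC3K_PATH` is set and the file exists; otherwise attempts `scanpy.datasets.pbmc3k()` and, if that fails, falls back to `pbmc68k_reduced`. Note that `pbmc68k_reduced` is a small, already preprocessed subset, so the QC and preprocessing prompts later in this tutorial may behave differently on it.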


In [None]:
adata = None
local_path = os.environ.get('PBMC3K_PATH')
if local_path and os.path.exists(local_path):
    adata = sc.read_h5ad(local_path)
    print('Loaded local PBMC3k from:', local_path)
else:
    try:
        adata = sc.datasets.pbmc3k()
        print('Loaded Scanpy pbmc3k dataset')
    except Exception as e:
        print('pbmc3k not available:', e)
        try:
            adata = sc.datasets.pbmc68k_reduced()
            print('Loaded fallback pbmc68k_reduced dataset')
        except Exception as e2:
            raise RuntimeError('Could not load a PBMC dataset. Set PBMC3K_PATH to a local .h5ad file.') from e2

adata


## Initialize ov.Agent (skills auto‑loaded)

Pick a supported model and ensure the correct env var is set. The agent will auto‑load project skills and include them in its planning.


In [None]:

# Choose a supported model (ensure matching env var is set)
model_id = 'gpt-5'  # see ov.list_supported_models()
# Uses the first key that is set; make sure it belongs to the provider of model_id
api_key = OPENAI_API_KEY or os.getenv('ANTHROPIC_API_KEY') or os.getenv('GEMINI_API_KEY')
agent = ov.Agent(model=model_id, api_key=api_key)
agent
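
Because the cell above takes the first key it finds, a key from one provider can end up paired with a model from another. One way to guard against that is to pick the key from the model name. The sketch below is illustrative only: `key_for_model` and the prefix table are hypothetical and are not part of omicverse, so confirm the real model-to-variable requirements with `ov.list_supported_models()`.

```python
import os

# Hypothetical prefix -> env var mapping, for illustration only; confirm the
# actual requirements with ov.list_supported_models().
KEY_FOR_PREFIX = {
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def key_for_model(model_id, env=None):
    """Return the API key whose provider matches the model name prefix."""
    env = os.environ if env is None else env
    for prefix, var in KEY_FOR_PREFIX.items():
        if model_id.lower().startswith(prefix):
            value = env.get(var, "")
            if not value:
                raise RuntimeError(f"{var} must be set to use model {model_id!r}")
            return value
    raise ValueError(f"No known provider prefix for model {model_id!r}")
```

With this in place, the agent could be created as `ov.Agent(model=model_id, api_key=key_for_model(model_id))`, which fails early with a clear message instead of sending a mismatched key.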


## Project Skills Preview

Skills are loaded from the installed package and your current project (`.claude/skills/`). Below we list discovered skills and show basic routing.


In [None]:
from omicverse.utils.skill_registry import SkillRouter
# Import the registry builder if it exists, otherwise handle gracefully
try:
    from omicverse.utils.skill_registry import build_multi_path_skill_registry
except ImportError:
    # Fallback: define a simple registry builder or skip this section
    print("Warning: build_multi_path_skill_registry not found")
    build_multi_path_skill_registry = None

if build_multi_path_skill_registry:
    pkg_root = Path(ov.__file__).resolve().parents[1]
    reg = build_multi_path_skill_registry(pkg_root, Path.cwd())
    print(f'Discovered skills: {len(reg.skills)}')
    print('\n'.join(sorted(reg.skills.keys())[:10]))  # show first 10 slugs

    router = SkillRouter(reg)
    matches = router.route('single preprocessing pbmc3k hvg pca neighbors umap', top_k=3)
    for m in matches:
        print(f"match: {m.skill.slug} score={m.score:.3f} — {m.skill.name}")
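
If the registry builder is unavailable, the project skills can still be inspected directly on disk. The following is a rough sketch that assumes each skill lives in its own folder containing a `SKILL.md` file (an assumption about the layout, not something the registry API guarantees); `find_skill_dirs` is a hypothetical helper.

```python
from pathlib import Path


def find_skill_dirs(root):
    """Return {slug: path to SKILL.md} for folders under root/.claude/skills.

    Assumes one folder per skill, each holding a SKILL.md file.
    """
    skills_root = Path(root) / ".claude" / "skills"
    if not skills_root.is_dir():
        return {}
    return {p.parent.name: p for p in sorted(skills_root.glob("*/SKILL.md"))}


# List whatever is present in the current project (prints nothing if none)
for slug, path in find_skill_dirs(Path.cwd()).items():
    print(slug, "->", path)
```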

## Natural‑language pipeline (skill‑guided)

We’ll drive a typical workflow with natural‑language prompts. At each step the agent incorporates guidance from the most relevant skill(s) (e.g., `single-preprocessing`):

1. Quality control (filter cells/genes)
2. Preprocess and HVG selection
3. Clustering (Leiden)
4. Compute UMAP and visualize


In [None]:
# 1) Quality control, guided by single-preprocessing skill
adata = agent.run('quality control with nUMI>500, mito<0.2', adata)

# 2) Preprocessing + HVGs
adata = agent.run('preprocess with 2000 highly variable genes using shiftlog|pearson', adata)

# 3) Clustering
adata = agent.run('leiden clustering resolution=1.0', adata)

# 4) UMAP + visualization (agent may also handle plotting)
adata = agent.run('compute umap and plot colored by leiden', adata)

adata
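
The four calls above share one pattern: each prompt receives the previous step's result. A small wrapper can make that explicit and report which prompt failed. This is a sketch only; `run_steps` is a hypothetical helper, it relies solely on the `agent.run(prompt, adata)` call shown above, and treating a `None` result as a failure is an assumption rather than documented behavior.

```python
def run_steps(agent, adata, prompts):
    """Run prompts in order, passing each result on to the next step."""
    for i, prompt in enumerate(prompts, start=1):
        print(f"[{i}/{len(prompts)}] {prompt}")
        try:
            result = agent.run(prompt, adata)
        except Exception as exc:
            # Re-raise with the step number so the failing prompt is obvious
            raise RuntimeError(f"Step {i} failed: {prompt!r}") from exc
        if result is None:
            # Assumption: a missing return value means the step did not succeed
            raise RuntimeError(f"Step {i} returned no data: {prompt!r}")
        adata = result
    return adata


# Usage with the prompts from this tutorial (uncomment to run):
# adata = run_steps(agent, adata, [
#     'quality control with nUMI>500, mito<0.2',
#     'preprocess with 2000 highly variable genes using shiftlog|pearson',
#     'leiden clustering resolution=1.0',
#     'compute umap and plot colored by leiden',
# ])
```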

## Manual visualization (optional)

If the generated code did not produce a plot, draw the UMAP manually here.


In [None]:
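# Cluster only if the agent has not already added a 'leiden' column;
# rerunning sc.tl.leiden would overwrite the agent's clustering.
if 'leiden' not in adata.obs:
    sc.tl.leiden(adata)

# Compute the embedding if it is missing, then plot the UMAP
if 'X_umap' not in adata.obsm:
    sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'], wspace=0.4)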

## (Optional) Create a skill from docs links

Use `ov.agent.seeker` to scaffold a new skill from documentation links (requires network).


In [None]:
# Example: build a quick skill from a documentation link (uncomment to run)
# info = ov.agent.seeker(
#     ['https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html'],
#     name='hvg-guidance', target='output', package=False
# )
# info

## Next steps

- Adjust QC thresholds or clustering resolutions and re‑prompt the agent.
- Add cell‑type annotation via prompts (see Tutorials‑single for annotation notebooks).
- Customize `.claude/skills/` in your project to steer analysis with your lab’s SOP.
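
As a starting point for the last item, a project skill can be scaffolded by hand. The sketch below assumes a skill is a folder under `.claude/skills/` holding a `SKILL.md` with a small front-matter header; the `name` and `description` fields are an assumption about the expected format, and `scaffold_skill` is a hypothetical helper, so compare the output against a skill shipped with the package before relying on it.

```python
from pathlib import Path


def scaffold_skill(project_root, slug, description, body):
    """Write <project_root>/.claude/skills/<slug>/SKILL.md and return its path.

    The front-matter fields used here are assumed, not taken from omicverse.
    """
    skill_dir = Path(project_root) / ".claude" / "skills" / slug
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    text = (
        "---\n"
        f"name: {slug}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{body.strip()}\n"
    )
    skill_file.write_text(text, encoding="utf-8")
    return skill_file


# Example (uncomment to create the file in the current project):
# scaffold_skill(Path.cwd(), 'lab-qc-sop',
#                'QC thresholds used in our lab',
#                'Filter cells with nUMI <= 500 or mito fraction >= 0.2.')
```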
