# SpatialAgent

An AI agent for spatial biology.

## Setup

In [None]:
# If you are working from a directory outside the SpatialAgent repo root,
# uncomment and adjust the following lines:
#
# import sys
# sys.path.insert(0, "path/to/SpatialAgent/")
#
# from spatialagent.agent import SpatialAgent, make_llm
# llm = make_llm("claude-sonnet-4-5-20250929")
# agent = SpatialAgent(llm=llm, data_path="path/to/SpatialAgent/data", save_path="path/to/save/results/")

from spatialagent.agent import SpatialAgent, make_llm

# Initialize (supports Azure OpenAI, OpenAI, Claude, AWS Bedrock, Gemini)
llm = make_llm("claude-sonnet-4-5-20250929")
agent = SpatialAgent(llm=llm, save_path="./experiments/")

In [2]:
result = agent.run(
    """Design a 50-gene panel for mouse prostate cancer models that captures tumor state, immune process, and tissue context."""
)

[1m<user query>[0m
Design a 50-gene panel for mouse prostate cancer models that captures tumor state, immune process, and tissue context.
[1m</user query>[0m

[1m<skill>[0m retrieved panel_design [1m</skill>[0m

[1m<skill-tools>[0m query_tissue_expression; validate_genes_expression; query_pubmed; query_celltype_genesets; search_czi_datasets; extract_czi_markers; search_panglao [1m</skill-tools>[0m

[1m<tool>[0m selected query_tissue_expression; inspect_tool_code; execute_bash; validate_genes_expression; query_celltype_genesets; search_czi_datasets; execute_python; query_pubmed; search_panglao; extract_czi_markers; web_search; summarize_celltypes; search_semantic_scholar; search_cellmarker2; aggregate_gene_voting; annotate_cell_types; fetch_supplementary_from_doi; annotate_tissue_niches; query_disease_genes; summarize_tissue_regions; extract_url_content [1m</tool>[0m

I'll design a comprehensive 50-gene panel for mouse prostate cancer models that captures tumor state, im

INFO:gget.utils:Performing Enrichr analysis using database PanglaoDB_Augmented_2021.


[94m<observation>[0m
Output:

=== Defining Canonical Markers ===
Saved 57 canonical markers from literature

=== Step 2: Query Cell Type Gene Sets ===
Cell type gene sets relevant to prostate (from PanglaoDB):

  Pericytes:
    P-value: 8.98e-06
    Marker genes: ACTA2, PECAM1, VIM

  Monocytes:
    P-value: 1.31e-05
    Marker genes: PTPRC, PECAM1, CD68

  Hematopoietic Stem Cells:
    P-value: 1.36e-05
    Marker genes: PTPRC, PECAM1, CD68

  Osteoclast Precursor Cells:
    P-value: 4.19e-04
    Marker genes: PTPRC, CD68

  Stromal Cells:
    P-value: 5.63e-04
    Marker genes: ACTA2, PECAM1

  Microglia:
    P-value: 9.11e-04
    Marker genes: PTPRC, PECAM1

  Macrophages:
    P-value: 1.51e-03
    Marker genes: PTPRC, CD68

  Fibroblasts:
    P-value: 1.95e-03
    Marker genes: ACTA2, VIM

  Airway Smooth Muscle Cells:
    P-value: 2.93e-02
    Marker genes: ACTA2

  Pulmonary Vascular Smooth Muscle Cells:
    P-value: 2.93e-02
    Marker genes: ACTA2

  Trophoblast Stem Cells:
 

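The enrichment listing above is easier to act on with an explicit significance cutoff. The minimal sketch below copies the P-values exactly as printed (the truncated Trophoblast Stem Cells entry is left out) and keeps cell types below a threshold of 1e-3. The threshold and the helper `significant_cell_types` are illustrative assumptions, not part of SpatialAgent.

```python
# P-values copied from the PanglaoDB enrichment output above
# (the truncated "Trophoblast Stem Cells" entry is omitted).
enrichment = {
    "Pericytes": 8.98e-06,
    "Monocytes": 1.31e-05,
    "Hematopoietic Stem Cells": 1.36e-05,
    "Osteoclast Precursor Cells": 4.19e-04,
    "Stromal Cells": 5.63e-04,
    "Microglia": 9.11e-04,
    "Macrophages": 1.51e-03,
    "Fibroblasts": 1.95e-03,
    "Airway Smooth Muscle Cells": 2.93e-02,
    "Pulmonary Vascular Smooth Muscle Cells": 2.93e-02,
}


def significant_cell_types(pvals, alpha=1e-3):
    """Return cell types with P-value below alpha, most significant first."""
    hits = [(name, p) for name, p in pvals.items() if p < alpha]
    return [name for name, _ in sorted(hits, key=lambda item: item[1])]


print(significant_cell_types(enrichment))
```

Several surviving terms (Microglia, Osteoclast Precursor Cells) are driven by broadly shared markers such as PTPRC and PECAM1, so these hits are better read as evidence for broad immune, vascular, and stromal compartments than for prostate-specific cell types.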

[94m<observation>[0m
Output:

=== Step 3: Search CZI Reference Datasets ===

Trying query: Mus musculus prostate normal
CZI CELLxGENE Search Results for 'Mus musculus prostate normal':

Dataset 1:
  dataset_id: a13bda79-9134-46c9-9ed1-a2858be9aafe
  dataset_title: Mouse Male
  collection_name: Single-cell roadmap of human gonadal development
  organism: Mus musculus
  tissue: gonad
  disease: normal
  similarity_score: 0.778

Dataset 2:
  dataset_id: d77ec7d6-ef2e-49d6-9e79-05b7f8881484
  dataset_title: Tabula Sapiens - Prostate
  collection_name: Tabula Sapiens
  organism: Homo sapiens
  tissue: prostate gland
  disease: normal
  similarity_score: 0.773

Dataset 3:
  dataset_id: b47eaa46-508d-4817-8a75-5bece9ea30f9
  dataset_title: Whole dataset: Normalized subset 4
  collection_name: A single-cell transcriptional timelapse of mouse embryonic development, from gastrula to pup
  organism: Mus musculus
  tissue: embryo
  disease: normal
  similarity_score: 0.765
CZI CELLxGENE Search R
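A high similarity score does not guarantee that a reference matches both species and tissue: among the hits shown, the top result is a mouse gonad dataset and the only prostate dataset is human (Tabula Sapiens). The following is a minimal sketch of an explicit metadata filter over those three hits. The field names mirror the printed output; the helper `matching_references` and its matching rule are illustrative assumptions, not SpatialAgent's search logic.

```python
# Top CZI CELLxGENE hits copied from the search output above.
datasets = [
    {"dataset_id": "a13bda79-9134-46c9-9ed1-a2858be9aafe",
     "organism": "Mus musculus", "tissue": "gonad", "similarity_score": 0.778},
    {"dataset_id": "d77ec7d6-ef2e-49d6-9e79-05b7f8881484",
     "organism": "Homo sapiens", "tissue": "prostate gland", "similarity_score": 0.773},
    {"dataset_id": "b47eaa46-508d-4817-8a75-5bece9ea30f9",
     "organism": "Mus musculus", "tissue": "embryo", "similarity_score": 0.765},
]


def matching_references(hits, organism, tissue_keyword):
    """Keep hits whose organism matches exactly and whose tissue contains the keyword."""
    return [
        h for h in hits
        if h["organism"] == organism and tissue_keyword.lower() in h["tissue"].lower()
    ]


mouse_prostate = matching_references(datasets, "Mus musculus", "prostate")
human_prostate = matching_references(datasets, "Homo sapiens", "prostate")
print(len(mouse_prostate), len(human_prostate))
```

Among the hits shown, no dataset is both mouse and prostate, so a prostate reference from this search would have to be the human dataset with genes mapped across species.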

INFO:gget.utils:Fetching the tissue expression atlas of AR from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of NKX3-1 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of KRT8 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of KRT5 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of CD3E from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of CD8A from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of CD68 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of ACTA2 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of PECAM1 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of PTEN from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of MKI67 from human ARCHS4 data.


[94m<observation>[0m
Output:

=== Step 11: Validate Key Genes Expression in Prostate ===

Validating expression of key markers in prostate tissue:

Ar:
Tissue expression for Ar (top 5 tissues by median TPM):
  System.Connective Tissue.Adipose tissue.ADIPOSE: 10.75 TPM
  System.Digestive System.Liver.LIVER: 10.36 TPM
  System.Muscular System.Skeletal muscle.SKELETAL MUSCLE: 9.96 TPM
  System.Digestive System.Liver.HEPATOCYTE: 9.68 TPM
  System.Cardiovascular System.Heart.VENTRICLE: 9.40 TPM
Tissue expression for Nkx3-1 (top 5 tissues by median TPM):
  System.Urogenital/Reproductive System.Ovary.OOCYTE: 8.35 TPM
  System.Urogenital/Reproductive Syst
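The log above fetches ARCHS4 expression for human-style symbols (AR, NKX3-1), while the validation output reports mouse-style symbols (Ar, Nkx3-1). A naive way to move between the two conventions is to change capitalization only. The helper `human_to_mouse_style` below is a hypothetical sketch that does exactly that and nothing more: it is not an ortholog mapping and will be wrong for genes whose mouse ortholog has a different name. A curated homology resource (for example MGI or Ensembl homology tables) is the safer source for a real panel.

```python
def human_to_mouse_style(symbol: str) -> str:
    """Naive case conversion: first character upper-case, the rest lower-case.

    This only changes the naming convention (e.g. NKX3-1 -> Nkx3-1); it does
    not check that a mouse ortholog with that name actually exists.
    """
    return symbol[:1].upper() + symbol[1:].lower()


# Human symbols queried against ARCHS4 in the log above.
human_genes = ["AR", "NKX3-1", "KRT8", "KRT5", "CD3E", "CD8A",
               "CD68", "ACTA2", "PECAM1", "PTEN", "MKI67"]
mouse_style = [human_to_mouse_style(g) for g in human_genes]
print(mouse_style)
```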

[1m</conclude>[0m


Cost Summary (claude-sonnet-4-5-20250929)
Total calls:    15
Input tokens:   287,408
Output tokens:  8,462
Total tokens:   295,870
Total cost:     $0.9892
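The totals in the cost summary can be checked by hand. Assuming prices of $3 per million input tokens and $15 per million output tokens for this model (an assumption here; provider pricing can change and may differ for long-context or cached requests), the reported input and output counts reproduce both the token total and the dollar figure:

```python
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens (assumed price)
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens (assumed price)


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a run from token counts and per-million prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


input_tokens, output_tokens = 287_408, 8_462            # from the summary above
print(input_tokens + output_tokens)                     # 295870
print(round(run_cost(input_tokens, output_tokens), 4))  # 0.9892
```

Under these prices, input tokens account for roughly 87% of the cost of this run.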



In [None]:
result = agent.run(
    """
    I have a MERFISH mouse liver dataset at './data/example_merfish.h5ad'.
    
    Please:
    1. Load and preprocess the data (normalize, find variable genes)
    2. Annotate cell types
    3. Run UTAG spatial clustering to identify tissue regions
    4. Generate a summary report of the tissue composition
    """,
    config={"thread_id": "annotation_demo"}
)
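To make the requested steps concrete, here is a minimal numpy sketch on a synthetic cells × genes matrix. It shows library-size normalization followed by log1p (the same idea as scanpy's `normalize_total` then `log1p`), a plain variance ranking as a simplified stand-in for variable-gene selection, and the neighborhood-averaging step that UTAG applies before clustering. The synthetic data, the radius of 20 units, the 20 top genes, and the inclusion of each cell in its own neighborhood are illustrative assumptions. This is not the code the agent runs, and UTAG itself goes on to cluster the aggregated matrix (for example with Leiden).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a MERFISH dataset: 200 cells x 50 genes of counts
# plus 2-D cell centroids (real data would come from the .h5ad file).
counts = rng.poisson(lam=2.0, size=(200, 50)).astype(float)
coords = rng.uniform(0, 100, size=(200, 2))


def normalize_log1p(x, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    totals = x.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(x / totals * target_sum)


def top_variable_genes(x, n_top=20):
    """Indices of the n_top genes with the highest variance (simplified HVG)."""
    return np.argsort(x.var(axis=0))[::-1][:n_top]


def neighborhood_average(x, xy, radius=20.0):
    """UTAG-style aggregation: average each cell's profile over all cells
    within `radius` (self included) using a row-normalized adjacency matrix."""
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    adj = (dists <= radius).astype(float)  # diagonal is 1, so self is included
    adj /= adj.sum(axis=1, keepdims=True)  # each row sums to 1 (neighbor mean)
    return adj @ x


x_norm = normalize_log1p(counts)
hvg = top_variable_genes(x_norm, n_top=20)
x_agg = neighborhood_average(x_norm[:, hvg], coords)
print(x_norm.shape, hvg.shape, x_agg.shape)  # (200, 50) (20,) (200, 20)
```

Clustering `x_agg` instead of the per-cell matrix is what lets UTAG group cells by their local microenvironment, which is why its clusters correspond to tissue regions rather than individual cell types.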

In [None]:
# Continue the same conversation by reusing the same thread_id

result = agent.run(
    """What are the most interesting cell types, and how do they change across different conditions?""",
    config={"thread_id": "annotation_demo"}
)
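Reusing `thread_id` is what lets the follow-up question build on the earlier analysis. The toy class below illustrates the general idea of thread-keyed conversation memory: messages are stored per thread and the accumulated history is what a later call would send as context. `ToyThreadMemory` is hypothetical and is not part of SpatialAgent; it does not reflect how SpatialAgent actually persists state.

```python
from collections import defaultdict


class ToyThreadMemory:
    """Hypothetical illustration of per-thread conversation history."""

    def __init__(self):
        self._threads = defaultdict(list)  # thread_id -> list of (role, text)

    def run(self, prompt: str, config: dict) -> list:
        history = self._threads[config["thread_id"]]
        history.append(("user", prompt))
        # A real agent would send the full history to the LLM here;
        # this toy version just returns a copy of what would be sent.
        return list(history)


memory = ToyThreadMemory()
memory.run("Annotate cell types", config={"thread_id": "annotation_demo"})
context = memory.run("Which cell types change across conditions?",
                     config={"thread_id": "annotation_demo"})
fresh = memory.run("Hello", config={"thread_id": "other_thread"})
print(len(context), len(fresh))  # 2 1
```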