# Panel Design Task Test

Test the panel design workflow with 3 iterations:
1. Retrieve 3 CZI reference datasets
2. For each iteration: extract cell types, search PanglaoDB, search CellMarker2, score gene importance
3. Aggregate results from all 3 iterations
4. Save final panel to CSV

**No spatial data required** - uses database queries only.

## Expected Output Files
- `czi_reference_celltype_{1,2,3}.csv` - Cell types from CZI
- `pangdb_celltype_{1,2,3}.csv` - PanglaoDB markers
- `cellmarker_celltype_{1,2,3}.csv` - CellMarker2 markers
- `iter{1,2,3}_importance_score.csv` - Per-iteration scores
- `final_gene_panel.csv` - Final aggregated panel
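
The file list above can be expanded into concrete paths, which is handy for checking a run directory. The following is a minimal sketch, not part of the agent itself; the helper name `expected_output_files` and the example directory are hypothetical.

```python
import os

def expected_output_files(save_path, n_iterations=3):
    """Return the expected output paths listed above for a given save directory."""
    per_iteration = [
        "czi_reference_celltype_{i}.csv",
        "pangdb_celltype_{i}.csv",
        "cellmarker_celltype_{i}.csv",
        "iter{i}_importance_score.csv",
    ]
    paths = [
        os.path.join(save_path, template.format(i=i))
        for i in range(1, n_iterations + 1)
        for template in per_iteration
    ]
    # The aggregated panel is written once, after all iterations.
    paths.append(os.path.join(save_path, "final_gene_panel.csv"))
    return paths

# Example: list which expected files are absent from a (hypothetical) directory.
missing = [p for p in expected_output_files("experiments/example_run") if not os.path.exists(p)]
print(f"{len(missing)} expected files missing")
```

With three iterations this yields 13 paths: four per iteration plus the final panel.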

## 1. Setup

In [1]:
import os, sys, shutil

# Working directory is project root
work_dir = "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx"
os.chdir(work_dir)
sys.path.insert(0, work_dir)

# Create fresh test directory
test_save_path = os.path.join(work_dir, "experiments", "test_panel_design_new")
if os.path.exists(test_save_path):
    shutil.rmtree(test_save_path)
os.makedirs(test_save_path)

print(f"Working directory: {os.getcwd()}")
print(f"Output directory: {test_save_path}")

Working directory: /gnet/is1/p01/shares/regevlab/hanchen/Agent_dev/spatialagent_dev_agx
Output directory: /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/experiments/test_panel_design_new


## 2. Initialize Agent

In [2]:
from spatialagent.agent import SpatialAgent, make_llm

# llm = make_llm("claude-sonnet-4-5-20250929")
# llm = make_llm("gpt-4.1")
# llm = make_llm("gemini-3-pro-preview")
llm = make_llm("gemini-2.5-pro")

agent = SpatialAgent(
    llm=llm,
    save_path=test_save_path,
    data_path="./data",
)

print(f"Agent ready with {len(agent.tool_registry.tools)} tools")
print(f"Skills available: {agent.skill_manager.list_skills()}")

Auto-loading tools from tool modules...
  Loaded: download_czi_reference (database)
  Loaded: extract_czi_markers (database)
  Loaded: query_celltype_genesets (database)
  Loaded: query_tissue_expression (database)
  Loaded: search_cellmarker2 (database)
  Loaded: search_czi_datasets (database)
  Loaded: search_panglao (database)
  Loaded: validate_genes_expression (database)
  Loaded: extract_pdf_content (literature)
  Loaded: extract_url_content (literature)
  Loaded: fetch_supplementary_from_doi (literature)
  Loaded: query_arxiv (literature)
  Loaded: query_pubmed (literature)
  Loaded: query_scholar (literature)
  Loaded: search_google (literature)
  Loaded: aggregate_gene_voting (analytics)
  Loaded: harmony_transfer_labels (analytics)
  Loaded: infer_cell_cell_interactions (analytics)
  Loaded: infer_dynamics (analytics)
  Loaded: preprocess_spatial_data (analytics)
  Loaded: run_utag_clustering (analytics)
  Loaded: summarize_celltypes (analytics)
  Loaded: summarize_conditions

## 3. Run Panel Design Task

Design a gene panel for human liver tissue with explicit requirements.

In [3]:
# Panel design parameters
TISSUE = "liver"
SPECIES = "human"
PANEL_SIZE = 50  # Target number of genes in final panel

# Simple query - skill should provide the 3-iteration workflow details
query = f"""
Design a gene panel of {PANEL_SIZE} genes for {SPECIES} {TISSUE} spatial transcriptomics.
Save results to '{test_save_path}'.
"""

print("Panel Design Parameters:")
print(f"  Tissue: {TISSUE}")
print(f"  Species: {SPECIES}")
print(f"  Panel size: {PANEL_SIZE} genes")
# print(f"Expected: Skill 'panel_design' should guide the 3-iteration workflow")
print("=" * 60)

result = agent.run(query, config={"thread_id": "panel_design"})

Panel Design Parameters:
  Tissue: liver
  Species: human
  Panel size: 50 genes
Expected: Skill 'panel_design' should guide the 3-iteration workflow
<user query>
Design a gene panel of 50 genes for human liver spatial transcriptomics.
Save results to '/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/experiments/test_panel_design_new'.
</user query>

<tool> retrieved search_cellmarker2; preprocess_spatial_data; search_panglao; validate_genes_expression; aggregate_gene_voting; query_celltype_genesets; query_tissue_expression; download_czi_reference; annotate_tissue_niches; finalize_gene_panel </tool>

<skill> retrieved panel_design </skill>

I will design a 50-gene panel for human liver spatial transcriptomics by following the provided single-pass workflow.

First, I will perform a literature search using `query_pubmed` to identify well-established marker genes from recent single-cell RNA-seq studies of the human liver. This will fo

INFO:gget.utils:Performing Enrichr analysis using database PanglaoDB_Augmented_2021.


<observation>
Output:
Cell type gene sets relevant to liver (from PanglaoDB):

  Follicular Cells:
    P-value: 4.43e-04
    Marker genes: ALB, CYP3A4

  Hepatoblasts:
    P-value: 4.76e-04
    Marker genes: KRT19, ALB

  Hepatic Stellate Cells:
    P-value: 6.37e-04
    Marker genes: ACTA2, ALB

  Cholangiocytes:
    P-value: 6.57e-04
    Marker genes: KRT19, ALB

  Epsilon Cells:
    P-value: 6.96e-04
    Marker genes: ALB, CYP3A4

  Proximal Tubule Cells:
    P-value: 9.81e-04
    Marker genes: ALB, CYP3A4

  Hepatocytes:
    P-value: 1.77e-03
    Marker genes: ALB, CYP3A4

  Airway Smooth Muscle Cells:
    P-value: 2.93e-02
    Marker genes: ACTA2

  Pulmonary Vascular Smooth Muscle Cells:
    P-value: 2.93e-02
    Marker genes: ACTA2

  Trophoblast Stem Cells:
    P-value: 2.93e-02
    Marker genes: KRT19

</observation>


The `query_celltype_genesets` tool provided a good starting point with markers for several key liver cell types like Hepatocytes (ALB, CYP3A4)

Scoring genes:   0%|          | 0/13 [00:00<?, ?it/s]

<observation>
Error executing code:
IndexError: single positional indexer is out-of-bounds

Traceback (most recent call last):
  File "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/spatialagent/tool/coding.py", line 193, in execute
    result = eval(code, self.namespace)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1
    score_gene_importance_result = score_gene_importance({
                                 ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/spatialagent/tool/coding.py", line 196, in execute
    exec(code, self.namespace)
  File "<string>", line 1, in <module>
  File "/home/wangh256/miniforge3/envs/spatial_agent/lib/python3.12/site-packages/langchain_core/tools/base.py", line 605, in invoke
    return self.run(tool_input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wang

Scoring genes:   0%|          | 0/13 [00:00<?, ?it/s]

<observation>
Error executing code:
IndexError: single positional indexer is out-of-bounds

Traceback (most recent call last):
  File "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/spatialagent/tool/coding.py", line 193, in execute
    result = eval(code, self.namespace)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1
    score_gene_importance_result = score_gene_importance({
                                 ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/spatialagent/tool/coding.py", line 196, in execute
    exec(code, self.namespace)
  File "<string>", line 1, in <module>
  File "/home/wangh256/miniforge3/envs/spatial_agent/lib/python3.12/site-packages/langchain_core/tools/base.py", line 605, in invoke
    return self.run(tool_input, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wang

INFO:gget.utils:Fetching the tissue expression atlas of FAM178B from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of TRBC1 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of GATA4 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of KIFC3 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of ABHD17A from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of RIPOR2 from human ARCHS4 data.
ERROR:gget.utils:Gene 'RIPOR2' did not return any tissue expression results. 
If the gene is an Ensembl ID, please set argument 'ensembl=True' (for terminal, add flag: [--ensembl]).
INFO:gget.utils:Fetching the tissue expression atlas of CD247 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of BTG1 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of EAF2 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expre

<observation>
Error executing code:
TypeError: string indices must be integers, not 'str'

Traceback (most recent call last):
  File "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/spatialagent/tool/coding.py", line 193, in execute
    result = eval(code, self.namespace)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1
    import pandas as pd
    ^^^^^^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/spatialagent/tool/coding.py", line 196, in execute
    exec(code, self.namespace)
  File "<string>", line 58, in <module>
TypeError: string indices must be integers, not 'str'

</observation>
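
Each traceback above shows a `SyntaxError` raised by `eval`, followed by the real error raised by `exec`. That shape is consistent with an eval-then-exec fallback: expressions are evaluated for their value, and anything that is not a valid expression is executed as statements. The sketch below illustrates that pattern only; it is an assumption about the design, not the actual contents of `coding.py`, and the function name `execute` is borrowed from the traceback for readability.

```python
def execute(code, namespace):
    """Evaluate `code` as an expression; fall back to exec for statements."""
    try:
        return eval(code, namespace)
    except SyntaxError:
        # Assignments, imports, and multi-line scripts are not expressions,
        # so eval rejects them and they are run with exec instead.
        exec(code, namespace)
        return None

ns = {}
print(execute("1 + 2", ns))    # an expression: eval returns its value
execute("x = [1, 2, 3]", ns)   # a statement: handled by exec
print(ns["x"])

try:
    # Indexing a string with a string key fails inside exec.
    execute("s = 'abc'; s['key']", ns)
except TypeError as err:
    # The SyntaxError from eval is kept as the chained context, which is why
    # the log prints "During handling of the above exception, ...".
    print(type(err.__context__).__name__, "->", type(err).__name__)
```

When reading such tracebacks, the `SyntaxError` block is therefore noise from the fallback; the error that needs fixing is the second one.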


The script failed with a `TypeError: string indices must be integers, not 'str'`. This happened when trying to process the `validation_result`. The `validate_genes_expression` tool returns a dictionary, but my cod

INFO:gget.utils:Fetching the tissue expression atlas of FAM178B from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of TRBC1 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of GATA4 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of KIFC3 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of ABHD17A from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of RIPOR2 from human ARCHS4 data.
ERROR:gget.utils:Gene 'RIPOR2' did not return any tissue expression results. 
If the gene is an Ensembl ID, please set argument 'ensembl=True' (for terminal, add flag: [--ensembl]).
INFO:gget.utils:Fetching the tissue expression atlas of CD247 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of BTG1 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expression atlas of EAF2 from human ARCHS4 data.
INFO:gget.utils:Fetching the tissue expre

<observation>
Output:
Found 1015 unique candidate genes.
--- Raw Validation Result ---
Gene expression validation for 'liver':

✓ Expressed (675/1015): TRBC1 (TPM: 2.9), GATA4 (TPM: 8.4), KIFC3 (TPM: 9.7), ABHD17A (TPM: 9.7), CD247 (TPM: 3.8), BTG1 (TPM: 10.9), ASGR1 (TPM: 11.3), AZGP1 (TPM: 12.8), SEC61G (TPM: 9.6), CLEC1B (TPM: 5.3), IL1R1 (TPM: 11.0), CMTM6 (TPM: 10.8), NR5A2 (TPM: 9.2), PPP1R14B (TPM: 10.1), SPCS1 (TPM: 10.1), CLEC4G (TPM: 4.2), CHST3 (TPM: 9.5), EPSTI1 (TPM: 7.0), SELPLG (TPM: 6.0), C3AR1 (TPM: 6.4), SNX2 (TPM: 9.9), IGF2 (TPM: 12.1), APOA2 (TPM: 14.1), CPA3 (TPM: 2.4), MAGI3 (TPM: 8.3), SMIM1 (TPM: 4.7), ANGPTL2 (TPM: 10.2), CYP3A4 (TPM: 11.5), ALOX5AP (TPM: 4.8), KDR (TPM: 8.4), CLMN (TPM: 10.0), CD300LG (TPM: 2.4), C1QA (TPM: 9.5), SLC28A3 (TPM: 3.9), PLEK (TPM: 7.0), IGHA2 (TPM: 3.6), GZMH (TPM: 1.2), LRAT (TPM: 5.9), FABP1 (TPM: 12.9), ITGA5 (TPM: 13.2), ERG (TPM: 6.8), CD53 (TPM: 7.8), ADGRL4 (TPM: 7.8), EDEM1 (TPM: 10.8), APOC4 (TPM: 7.9), HGF (TPM

Summarizing gene reasons:   0%|          | 0/50 [00:00<?, ?it/s]

<observation>
Output:
Successfully created '/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/experiments/test_panel_design_new/iter1_importance_score.csv' with all required columns.
Successfully created final gene panel with 50 genes. Saved to /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/experiments/test_panel_design_new/final_gene_panel.csv

</observation>
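
The raw validation result printed earlier is plain text of the form `GENE (TPM: value)`, so indexing it like a dictionary raises the `TypeError` seen above. One way to consume it is to parse the text. The following is a minimal sketch that assumes only the format visible in the log; the helper `parse_expressed_genes` is hypothetical and not part of the tool.

```python
import re

# Matches entries such as "TRBC1 (TPM: 2.9)"; the "(675/1015)" counter does
# not match because it lacks the "TPM:" label.
GENE_TPM = re.compile(r"([A-Za-z0-9_.\-]+) \(TPM: ([0-9]+(?:\.[0-9]+)?)\)")

def parse_expressed_genes(report):
    """Map gene symbol -> TPM value for every complete entry in the report text."""
    return {gene: float(tpm) for gene, tpm in GENE_TPM.findall(report)}

sample = "✓ Expressed (675/1015): TRBC1 (TPM: 2.9), GATA4 (TPM: 8.4), KIFC3 (TPM: 9.7)"
expressed = parse_expressed_genes(sample)
print(expressed)
```

Incomplete trailing entries, such as a line cut off after `(TPM`, are simply skipped.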


I have successfully generated the final 50-gene panel for human liver spatial transcriptomics. After encountering and resolving several tool-related errors by manually creating the necessary intermediate files with the correct format, the `finalize_gene_panel` tool executed successfully.

The final panel is saved at `/home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/experiments/test_panel_design_new/final_gene_panel.csv`.

I will now display the contents of the final gene panel to conclude the task.

<act>
#!BASH
cat /home/wangh256/hanchen/Agent_dev/spatialagent_dev_agx/experiments/test

</conclude>



💰 Cost Summary (gemini-2.5-pro)
Total calls:    22
Input tokens:   515,951
Output tokens:  82,059
Total tokens:   598,010
Total cost:     $1.4655
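
The totals above can be checked by hand. The sketch below reproduces them under assumed per-token prices of $1.25 per million input tokens and $10.00 per million output tokens; these rates are an assumption and are not stated in the log, and any tiered pricing is ignored.

```python
# Assumed prices in USD per million tokens (not taken from the log).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00

# Token counts copied from the cost summary above.
input_tokens = 515_951
output_tokens = 82_059

total_tokens = input_tokens + output_tokens
cost = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

print(f"Total tokens: {total_tokens:,}")   # 598,010
print(f"Total cost:   ${cost:.4f}")        # $1.4655
```

Under these assumed rates, output tokens account for more than half of the cost despite being under 14% of the tokens.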



In [None]:
result = agent.run("What are the top 10 most important genes?", config={"thread_id": "panel_design"})

## 4. Verify Results

In [None]:
import os
import pandas as pd

print("Generated files:")
print("=" * 50)

files_found = []
for root, dirs, files in os.walk(test_save_path):
    level = root.replace(test_save_path, '').count(os.sep)
    indent = '  ' * level
    print(f"{indent}{os.path.basename(root)}/")
    for file in files:
        filepath = os.path.join(root, file)
        size = os.path.getsize(filepath)
        print(f"{indent}  {file} ({size:,} bytes)")
        files_found.append(filepath)

if not files_found:
    print("  (no files generated)")

In [None]:
# Check if final panel CSV was created
panel_path = os.path.join(test_save_path, "final_gene_panel.csv")

if os.path.exists(panel_path):
    print("Final gene panel found!")
    print("=" * 50)
    df = pd.read_csv(panel_path)
    print(f"Total genes: {len(df)}")
    print(f"Columns: {df.columns.tolist()}")
    print(f"\nFirst 10 genes:")
    print(df.head(10))
    
    if 'cell type' in df.columns:
        print("\nUnique cell types covered:")
        all_cell_types = set()
        # Skip missing values so that NaN entries do not break .split()
        for ct_str in df['cell type'].dropna().astype(str):
            all_cell_types.update(ct.strip() for ct in ct_str.split(';') if ct.strip())
        print(f"  {len(all_cell_types)} cell types")
else:
    print("WARNING: Final gene panel not found at expected path")
    print("Checking for intermediate files...")
    
    # Check for iteration files
    for i in range(1, 4):
        iter_file = os.path.join(test_save_path, f"iter{i}_importance_score.csv")
        if os.path.exists(iter_file):
            print(f"\nIteration {i} importance scores found:")
            df = pd.read_csv(iter_file)
            print(f"  {len(df)} genes scored")
        else:
            print(f"\nIteration {i} importance scores: NOT FOUND")

## 5. Verify Skill Retrieval

In [None]:
if hasattr(agent, '_selected_skill') and agent._selected_skill:
    print("Skill was retrieved and used")
    print(f"Skill preview: {agent._selected_skill[:300]}...")
else:
    print("No skill was used (agent used general planning)")

## 6. Test Summary

In [None]:
print("\n" + "=" * 60)
print("PANEL DESIGN TEST SUMMARY")
print("=" * 60)

# Check results
panel_path = os.path.join(test_save_path, "final_gene_panel.csv")

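checks = {
    # Truthiness check, consistent with the skill retrieval cell above
    "Skill retrieved": bool(getattr(agent, '_selected_skill', None)),
    # files_found is populated by the file listing cell in section 4
    "Files generated": len(files_found) > 0,
}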

# Check for iteration files
for i in range(1, 4):
    czi_file = os.path.join(test_save_path, f"czi_reference_celltype_{i}.csv")
    pangdb_file = os.path.join(test_save_path, f"pangdb_celltype_{i}.csv")
    cellmarker_file = os.path.join(test_save_path, f"cellmarker_celltype_{i}.csv")
    iter_score_file = os.path.join(test_save_path, f"iter{i}_importance_score.csv")
    
    checks[f"Iteration {i} - CZI data"] = os.path.exists(czi_file)
    checks[f"Iteration {i} - PanglaoDB"] = os.path.exists(pangdb_file)
    checks[f"Iteration {i} - CellMarker"] = os.path.exists(cellmarker_file)
    checks[f"Iteration {i} - Scores"] = os.path.exists(iter_score_file)

checks["Final panel exists"] = os.path.exists(panel_path)

if os.path.exists(panel_path):
    df = pd.read_csv(panel_path)
    checks["Panel has genes"] = len(df) >= 20
    checks["Has Gene column"] = 'Gene' in df.columns
    checks["Has Importance Score column"] = 'Importance Score' in df.columns
    checks["Has cell type column"] = 'cell type' in df.columns
    checks["Has Reason column"] = 'Reason' in df.columns

for check, passed in checks.items():
    status = "PASS" if passed else "FAIL"
    print(f"  [{status}] {check}")

all_passed = all(checks.values())
print("\n" + ("ALL TESTS PASSED" if all_passed else "SOME TESTS FAILED"))

## Summary

This test verifies the 3-iteration panel design workflow:

1. **Skill retrieval** - `panel_design` skill is selected and guides the agent
2. **3 CZI datasets** - `retrieve_czi_data` returns 3 reference datasets
3. **Per-iteration processing** (x3):
   - `read_czi_data` extracts cell types
   - `search_panglao` finds PanglaoDB markers
   - `search_cellmarker2` finds CellMarker2 markers
   - `score_gene_importance` uses LLM to score genes
4. **Final aggregation** - `finalize_gene_panel` combines all iterations
5. **Output quality** - Final panel has Gene, Importance Score, cell type, Reason columns
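
The final aggregation step can be pictured as follows. This is a hypothetical sketch of one plausible strategy (average each gene's score over the iterations in which it appears, then keep the top genes); the notebook does not show how `finalize_gene_panel` actually combines iterations, and the function name and sample scores here are invented for illustration. The column names `Gene` and `Importance Score` are those checked in the test summary.

```python
from collections import defaultdict
from statistics import mean

def aggregate_iterations(iterations, panel_size):
    """Rank genes by mean importance score across iterations (illustrative only)."""
    scores = defaultdict(list)
    for rows in iterations:
        for row in rows:
            scores[row["Gene"]].append(float(row["Importance Score"]))
    # Genes missing from an iteration are not penalized; only observed scores are averaged.
    ranked = sorted(scores, key=lambda gene: mean(scores[gene]), reverse=True)
    return ranked[:panel_size]

# Rows shaped like csv.DictReader output from the per-iteration score files.
iter1 = [{"Gene": "ALB", "Importance Score": "0.9"}, {"Gene": "KRT19", "Importance Score": "0.6"}]
iter2 = [{"Gene": "ALB", "Importance Score": "0.7"}, {"Gene": "ACTA2", "Importance Score": "0.5"}]

print(aggregate_iterations([iter1, iter2], panel_size=2))
```

Other strategies, such as counting how many iterations selected a gene, would fit the same interface.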