# CyteOnto

Semantic Cell Type Annotation Comparison Using Large Language Models and Cell Ontology

## Quick tutorial

This notebook shows how to use CyteOnto to compare cell type annotations semantically. Follow the steps below to get started.

### Prerequisites

Before you begin, ensure you have the following:

- Python 3.12+
- UV package manager (recommended)

Navigate to the `CyteOnto` directory and install the required packages:

```bash
uv sync
```

### 1. Set API Keys as Environment Variables
 
```bash
LLM_API_KEY=your_api_key_here               # Key for your LLM provider, e.g. OpenAI (groq, openrouter, google, xai, deepinfra, etc. also work)
EMBEDDING_MODEL_API_KEY=your_api_key_here   # Can be the same key if the embedding model comes from the same provider

# Optional: for higher rate limits
NCBI_API_KEY=your_ncbi_api_key_here         # Used for PubMed tool calls
```
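
Before running the later cells, it can help to confirm that the keys are actually visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `missing_vars` is hypothetical and not part of CyteOnto, and the only variable names used are the ones listed above.

```python
import os

# Variable names taken from the step above; NCBI_API_KEY is optional.
REQUIRED_VARS = ("LLM_API_KEY", "EMBEDDING_MODEL_API_KEY")
OPTIONAL_VARS = ("NCBI_API_KEY",)


def missing_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not environ.get(name)]


missing = missing_vars(REQUIRED_VARS)
if missing:
    print("Missing required variables:", ", ".join(missing))
if missing_vars(OPTIONAL_VARS):
    print("NCBI_API_KEY is not set (optional).")
```

If a required variable is reported as missing, the `os.getenv("LLM_API_KEY")` call in the agent setup would return `None`.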

### 2. Download precomputed embeddings

List the available precomputed embeddings, then download the set you need using the provided scripts.

```bash
uv run python scripts/show_embeddings.py
## ⬇️  moonshot-ai_kimi-k2 (Recommended)
##     Name: Moonshot AI Kimi-K2 (descriptions) + Qwen3-Embedding-8B (embeddings)
##     Status: Not downloaded
## ⬇️  deepseek_v3
##     Name: DeepSeek V3 (descriptions) + Qwen3-Embedding-8B (embeddings)
##     Status: Not downloaded
 
uv run python scripts/download_embedding.py moonshot-ai_kimi-k2
```

In [1]:
# Path management, only for running `cyteonto` from notebooks directory
import sys
sys.path.append("..")

### 3. Setup LLM Agent
Set up a Pydantic AI agent as follows. The example uses DeepInfra's OpenAI-compatible endpoint and reads the key from `LLM_API_KEY`:


In [2]:
import os
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
 
# Initialize your LLM agent
model = OpenAIModel(
    "moonshotai/Kimi-K2-Instruct",
    provider=OpenAIProvider(
        base_url="https://api.deepinfra.com/v1/openai",
        api_key=os.getenv("LLM_API_KEY"),
    ),
)
agent = Agent(model)

### 4. Initialize CyteOnto with LLM and Embedding model


In [None]:
import cyteonto

cyto = cyteonto.CyteOnto(
    base_agent=agent,
    embedding_model="Qwen/Qwen3-Embedding-8B", 
    embedding_provider="deepinfra"
)

### 5. Compare Cell Type Annotations

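Prepare your labels as lists. Labels are paired by position (see `pair_index` in the results), so keep each algorithm's list in the same order as the author labels.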

In [23]:
author_labels = ["animal stem cell", "BFU-E", "CFU-M", "neutrophilic granuloblast"]
algorithm1_labels = ["stem cell", "blast forming unit erythroid", "erythroid stem cell", "spermatogonium"]
algorithm2_labels = ["neuronal receptor cell", "stem cell", "smooth muscle cell", "ovum"]
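
Because the labels are paired by position, a quick length check before the comparison can catch misaligned lists early. This is a minimal sketch: `check_aligned` is a hypothetical helper, not a CyteOnto function, and whether `compare_batch` performs its own validation is not stated in this notebook.

```python
def check_aligned(author_labels, algo_comparison_data):
    """Raise ValueError if any algorithm's list differs in length from the author list."""
    expected = len(author_labels)
    for name, labels in algo_comparison_data:
        if len(labels) != expected:
            raise ValueError(
                f"{name}: expected {expected} labels, got {len(labels)}"
            )


# Same example labels as in the cell above.
author_labels = ["animal stem cell", "BFU-E", "CFU-M", "neutrophilic granuloblast"]
algorithm1_labels = ["stem cell", "blast forming unit erythroid", "erythroid stem cell", "spermatogonium"]
algorithm2_labels = ["neuronal receptor cell", "stem cell", "smooth muscle cell", "ovum"]

check_aligned(
    author_labels,
    [("algorithm1", algorithm1_labels), ("algorithm2", algorithm2_labels)],
)
print("All label lists are aligned.")
```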

In [None]:
# Perform the batch comparison
# This may take a while because embeddings for new labels are generated
results_df = await cyto.compare_batch(
    study_name="sample_study",              # Save and cache all the results to this directory. Serves as a unique run id.
    author_labels=author_labels,
    algo_comparison_data=[
        ("algorithm1", algorithm1_labels),
        ("algorithm2", algorithm2_labels)
    ],
)
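
The top-level `await` above works in a notebook, but a plain Python script needs an event loop. The sketch below shows the usual `asyncio.run` pattern. To stay runnable without API keys it calls `compare_batch_stub`, a hypothetical stand-in that only mimics the call shape; in a real script you would await `cyto.compare_batch(...)` instead.

```python
import asyncio


async def compare_batch_stub(study_name, author_labels, algo_comparison_data):
    """Hypothetical stand-in for `cyto.compare_batch`: one row per label pair."""
    await asyncio.sleep(0)  # placeholder for the real asynchronous work
    return [
        {
            "study_name": study_name,
            "algorithm": name,
            "pair_index": i,
            "author_label": author,
            "algorithm_label": label,
        }
        for name, labels in algo_comparison_data
        for i, (author, label) in enumerate(zip(author_labels, labels))
    ]


async def main():
    # In a real script, replace the stub with: await cyto.compare_batch(...)
    return await compare_batch_stub(
        study_name="sample_study",
        author_labels=["BFU-E", "CFU-M"],
        algo_comparison_data=[
            ("algorithm1", ["blast forming unit erythroid", "erythroid stem cell"]),
        ],
    )


rows = asyncio.run(main())
print(f"{len(rows)} label pairs compared")
```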

In [27]:
# Show results
results_df

| | study_name | algorithm | pair_index | author_label | algorithm_label | author_ontology_id | author_embedding_similarity | algorithm_ontology_id | algorithm_embedding_similarity | ontology_hierarchy_similarity | similarity_method |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sample_study | algorithm1 | 0 | animal stem cell | stem cell | CL:0000034 | 0.9169 | CL:0000723 | 0.8233 | 0.9868 | ontology_hierarchy |
| 1 | sample_study | algorithm1 | 1 | BFU-E | blast forming unit erythroid | CL:0001066 | 0.8798 | CL:0001066 | 0.8651 | 1.0 | ontology_hierarchy |
| 2 | sample_study | algorithm1 | 2 | CFU-M | erythroid stem cell | CL:0000049 | 0.7403 | CL:0000038 | 0.8797 | 0.9121 | ontology_hierarchy |
| 3 | sample_study | algorithm1 | 3 | neutrophilic granuloblast | spermatogonium | CL:0000042 | 0.9121 | CL:0000020 | 0.9056 | 0.7566 | ontology_hierarchy |
| 4 | sample_study | algorithm2 | 0 | animal stem cell | neuronal receptor cell | CL:0000034 | 0.9169 | CL:0000006 | 0.9087 | 0.8564 | ontology_hierarchy |
| 5 | sample_study | algorithm2 | 1 | BFU-E | stem cell | CL:0001066 | 0.8798 | CL:0000037 | 0.8089 | 0.9234 | ontology_hierarchy |
| 6 | sample_study | algorithm2 | 2 | CFU-M | smooth muscle cell | CL:0000049 | 0.7403 | CL:0000027 | 0.9222 | 0.8703 | ontology_hierarchy |
| 7 | sample_study | algorithm2 | 3 | neutrophilic granuloblast | ovum | CL:0000042 | 0.9121 | CL:0000025 | 0.9099 | 0.8222 | ontology_hierarchy |
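
One way to read these results is to average `ontology_hierarchy_similarity` per algorithm. The sketch below does this with the standard library on the eight scores shown above; `mean_similarity_by_algorithm` is a hypothetical helper, not part of CyteOnto. Assuming `results_df` is a pandas DataFrame, `results_df.to_dict("records")` would produce rows in the same shape.

```python
from collections import defaultdict
from statistics import mean

# Scores copied from the results shown above (other columns omitted).
rows = [
    {"algorithm": "algorithm1", "ontology_hierarchy_similarity": 0.9868},
    {"algorithm": "algorithm1", "ontology_hierarchy_similarity": 1.0},
    {"algorithm": "algorithm1", "ontology_hierarchy_similarity": 0.9121},
    {"algorithm": "algorithm1", "ontology_hierarchy_similarity": 0.7566},
    {"algorithm": "algorithm2", "ontology_hierarchy_similarity": 0.8564},
    {"algorithm": "algorithm2", "ontology_hierarchy_similarity": 0.9234},
    {"algorithm": "algorithm2", "ontology_hierarchy_similarity": 0.8703},
    {"algorithm": "algorithm2", "ontology_hierarchy_similarity": 0.8222},
]


def mean_similarity_by_algorithm(rows, column="ontology_hierarchy_similarity"):
    """Group rows by their 'algorithm' value and average the given score column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["algorithm"]].append(row[column])
    return {algo: mean(values) for algo, values in grouped.items()}


summary = mean_similarity_by_algorithm(rows)
for algo, score in summary.items():
    print(f"{algo}: {score:.4f}")
```

A plain mean is only one possible summary; with a handful of labels, inspecting the per-pair scores directly is often just as informative.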
