# CASSIA Analysis Tutorial

This Python notebook demonstrates a complete workflow using CASSIA to annotate cell types in single-cell RNA sequencing data. We'll analyze a large intestine dataset containing six distinct populations:

1. monocyte
2. plasma cells
3. cd8-positive, alpha-beta t cell
4. transit amplifying cell of large intestine
5. intestinal enteroendocrine cell
6. intestinal crypt stem cell

## Setup and Environment Preparation

First, import CASSIA, check the installed version, and load the example marker sets that ship with the package:

In [1]:
import CASSIA
print(CASSIA.__file__)
print(CASSIA.__version__)

processed_markers = CASSIA.loadmarker(marker_type="processed")
unprocessed_markers = CASSIA.loadmarker(marker_type="unprocessed")
subcluster_results = CASSIA.loadmarker(marker_type="subcluster_results")

# List available marker sets
available_markers = CASSIA.list_available_markers()
print(available_markers) 

d:\anaconda3\envs\cassia_env\lib\site-packages\CASSIA\__init__.py
0.2.21
['processed', 'subcluster_results', 'unprocessed']


In [None]:
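import os

# A custom provider is specified by its base URL (DeepSeek here)
custom_base_url = "https://api.deepseek.com"
custom_model = "deepseek-chat"

# Replace the placeholder with your own key; avoid saving real keys in a shared notebook
CASSIA.set_api_key("your-deepseek-api-key", provider=custom_base_url)
print("API key set:", os.getenv("CUSTERMIZED_API_KEY") is not None)

# The OpenRouter examples in this tutorial also need an OpenRouter key
CASSIA.set_api_key("your-openrouter-key", provider="openrouter")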
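Rather than typing a key into the notebook, you can read it from an environment variable, which keeps it out of saved outputs and version control. A minimal sketch, assuming you export the key in your shell before starting Jupyter; the variable name DEEPSEEK_API_KEY and the helper require_env are hypothetical, not part of CASSIA:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage (DEEPSEEK_API_KEY is an assumed variable name):
# CASSIA.set_api_key(require_env("DEEPSEEK_API_KEY"), provider=custom_base_url)
```

Failing early with a clear message is usually easier to debug than an authentication error surfacing halfway through a batch run.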

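## Fast Mode

Fast mode runs the whole CASSIA pipeline with a single call: annotation, quality scoring, report generation, and annotation boost for clusters that score below score_threshold. Results are written to a timestamped folder with 01_annotation_results, 02_reports, and 03_boost_analysis subfolders.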

In [2]:
# Run the CASSIA pipeline in fast mode
CASSIA.runCASSIA_pipeline(
    output_file_name = "FastAnalysisResults",
    tissue = "large intestine",
    species = "human",
    marker_path = unprocessed_markers,
    max_workers = 6,  # Matches the number of clusters in dataset
    annotation_model = "google/gemini-2.5-flash-preview-05-20",  # alternative: openai/gpt-4o-2024-11-20
    annotation_provider = "openrouter",
    score_model = "google/gemini-2.5-flash-preview-05-20",
    score_provider = "openrouter",
    score_threshold = 98,
    annotationboost_model="google/gemini-2.5-flash-preview-05-20",
    annotationboost_provider="openrouter",
    ranking_method="p_val_adj"
)


Created main folder: CASSIA_large_intestine_human_20250622_151232
Created subfolder: CASSIA_large_intestine_human_20250622_151232\01_annotation_results
Created subfolder: CASSIA_large_intestine_human_20250622_151232\02_reports
Created subfolder: CASSIA_large_intestine_human_20250622_151232\03_boost_analysis

=== Starting cell type analysis ===
Processing input dataframe to get top markers

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal crypt stem cell...

Analyzing intestinal enteroendocrine cell...

Analyzing monocyte...

Analyzing plasma cell...

Analyzing transit amplifying cell of large intestine...
Analysis for plasma cell completed.
Analysis for cd8-positive, alpha-beta t cell completed.
Analysis for intestinal crypt stem cell completed.
Analysis for transit amplifying cell of large intestine completed.
Analysis for monocyte completed.
Analysis for intestinal enteroendocrine cell completed.
All analyses completed. Results saved to 'FastAnalysisResults'.
Two CS

In [15]:
# Same fast-mode run, using the custom DeepSeek endpoint configured above
CASSIA.runCASSIA_pipeline(
    output_file_name = "FastAnalysisResults",
    tissue = "large intestine",
    species = "human",
    marker_path = unprocessed_markers,
    max_workers = 6,  # Matches the number of clusters in dataset
    annotation_model = custom_model,
    annotation_provider = custom_base_url,
    score_model = custom_model,
    score_provider = custom_base_url,
    score_threshold = 98,
    annotationboost_model = custom_model,
    annotationboost_provider = custom_base_url,
    ranking_method = "p_val_adj"
)

Created main folder: CASSIA_large_intestine_human_20250611_000057
Created subfolder: CASSIA_large_intestine_human_20250611_000057\01_annotation_results
Created subfolder: CASSIA_large_intestine_human_20250611_000057\02_reports
Created subfolder: CASSIA_large_intestine_human_20250611_000057\03_boost_analysis

=== Starting cell type analysis ===
Processing input dataframe to get top markers

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal crypt stem cell...

Analyzing intestinal enteroendocrine cell...

Analyzing monocyte...

Analyzing plasma cell...

Analyzing transit amplifying cell of large intestine...
Analysis for cd8-positive, alpha-beta t cell completed.
Analysis for intestinal enteroendocrine cell completed.
Analysis for intestinal crypt stem cell completed.
Analysis for plasma cell completed.
Analysis for transit amplifying cell of large intestine completed.
Analysis for monocyte completed.
All analyses completed. Results saved to 'FastAnalysisResults'.
Two CS

KeyboardInterrupt: 

In [None]:
# Run the CASSIA pipeline in fast mode
CASSIA.runCASSIA_pipeline(
    output_file_name = "FastAnalysisResults",
    tissue = "large intestine",
    species = "human",
    marker_path = unprocessed_markers,
    max_workers = 6,  # Matches the number of clusters in dataset
    annotation_model = "google/gemini-2.5-flash-preview-05-20",  # alternative: openai/gpt-4o-2024-11-20
    annotation_provider = "openrouter",
    score_model = "google/gemini-2.5-flash-preview-05-20",
    score_provider = "openrouter",
    score_threshold = 98,
    annotationboost_model="google/gemini-2.5-flash-preview-05-20",
    annotationboost_provider="openrouter"
)


Created main folder: CASSIA_large_intestine_human_20250603_004726
Created subfolder: CASSIA_large_intestine_human_20250603_004726\01_annotation_results
Created subfolder: CASSIA_large_intestine_human_20250603_004726\02_reports
Created subfolder: CASSIA_large_intestine_human_20250603_004726\03_boost_analysis

=== Starting cell type analysis ===
Processing input dataframe to get top markers

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal crypt stem cell...

Analyzing intestinal enteroendocrine cell...

Analyzing monocyte...

Analyzing plasma cell...

Analyzing transit amplifying cell of large intestine...
Analysis for intestinal crypt stem cell completed.

Analysis for monocyte completed.

Analysis for transit amplifying cell of large intestine completed.

Analysis for plasma cell completed.

Analysis for intestinal enteroendocrine cell completed.

Analysis for cd8-positive, alpha-beta t cell completed.

All analyses completed. Results saved to 'FastAnalysisResults'.


### Step 2: Detailed Batch Analysis

In [3]:
output_name="intestine_detailed"

In [4]:

# Run batch analysis
CASSIA.runCASSIA_batch( 
    marker = unprocessed_markers,
    output_name = output_name,
    model = "google/gemini-2.5-flash-preview-05-20",
    tissue = "large intestine",
    species = "human",
    max_workers = 6,  # Matching cluster count
    n_genes = 50,
    additional_info = None,
    provider = "openrouter")

Processing input dataframe to get top markers

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal crypt stem cell...

Analyzing intestinal enteroendocrine cell...

Analyzing monocyte...

Analyzing plasma cell...

Analyzing transit amplifying cell of large intestine...
Analysis for intestinal crypt stem cell completed.
Analysis for plasma cell completed.
Analysis for intestinal enteroendocrine cell completed.
Analysis for transit amplifying cell of large intestine completed.
Analysis for cd8-positive, alpha-beta t cell completed.
Error decoding JSON: Invalid control character at: line 6 column 151 (char 715)
monocyte generated an exception: Expecting value: line 1 column 1 (char 0)
Retrying analysis for monocyte (attempt 2/2)...
Error decoding JSON: Invalid control character at: line 6 column 151 (char 715)
monocyte failed after 2 attempts with error: Expecting value: line 1 column 1 (char 0)
monocyte failed: Expecting value: line 1 column 1 (char 0)
All analyses complet

### Step 3: Quality Scoring

In [5]:
# Run quality scoring
CASSIA.runCASSIA_score_batch(
    input_file = output_name + "_full.csv",
    output_file = output_name + "_scored.csv",
    max_workers = 6,
    model = "google/gemini-2.5-flash-preview-05-20",
    provider = "openrouter"
)

# Generate quality report
CASSIA.runCASSIA_generate_score_report(
    csv_path = output_name + "_scored.csv",
    index_name = output_name + "_report"  # CASSIA appears to append ".html" itself (note ".html.html" in the log below)
)

Starting scoring process with 6 workers using openrouter (google/gemini-2.5-flash-preview-05-20)...
Processed row 6: Score = 98
Processed row 3: Score = 98
Processed row 2: Score = 98
Processed row 5: Score = 98
Processed row 4: Score = 95
Processed row 1: Score = 98

Scoring completed!

Summary:
Total rows: 6
Successfully scored: 6
Failed/Skipped: 0
Report saved to .\report_cd8-positive alpha-beta t cell.html
Report saved to .\report_intestinal crypt stem cell.html
Report saved to .\report_intestinal enteroendocrine cell.html
Report saved to .\report_monocyte.html
Report saved to .\report_plasma cell.html
Report saved to .\report_transit amplifying cell of large intestine.html
Index page saved to .\intestine_detailed_report.html.html
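To decide which clusters deserve a closer look, for example with the annotation boost step below, you can scan the scored CSV for rows under a threshold. A minimal sketch using only the standard library; the column names "True Cell Type" and "Score" are assumptions about the layout of intestine_detailed_scored.csv, so check the header of your file and pass the right names. The helper low_scoring_clusters is hypothetical:

```python
import csv


def low_scoring_clusters(csv_path, threshold=98, cluster_col="True Cell Type", score_col="Score"):
    """Return (cluster, score) pairs whose score is below `threshold`, lowest score first.

    The default column names are assumptions; rows with a missing or
    non-numeric score are skipped.
    """
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the assumed columns are not in the header
        missing = {cluster_col, score_col} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"Columns not found in {csv_path}: {sorted(missing)}")
        for row in reader:
            try:
                score = float(row[score_col])
            except (TypeError, ValueError):
                continue  # skip empty or unparsable scores
            if score < threshold:
                flagged.append((row[cluster_col], score))
    return sorted(flagged, key=lambda pair: pair[1])


# Hypothetical usage:
# low_scoring_clusters(output_name + "_scored.csv", threshold=98)
```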


### Optional Step: Uncertainty Quantification

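This can be useful for quantifying the uncertainty of the annotation and potentially improving its accuracy: the batch annotation is repeated several times, and the agreement between runs is then summarized as a similarity score.

Note: This step can be costly, since multiple iterations are performed.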


In [None]:
# Run multiple iterations
iteration_results = CASSIA.runCASSIA_batch_n_times(
    n=3,  # number of repeated batch runs
    marker=unprocessed_markers,
    output_name=output_name + "_Uncertainty",
    model="google/gemini-2.5-flash-preview-05-20",
    provider="openrouter",
    tissue="large intestine",
    species="human",
    max_workers=6,
    batch_max_workers=3  # Conservative setting for API rate limits
)


Starting batch run 1/3
Starting batch run 2/3
Processing input dataframe to get top markers
Starting batch run 3/3
Processing input dataframe to get top markers
Processing input dataframe to get top markers

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal crypt stem cell...

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal enteroendocrine cell...

Analyzing intestinal crypt stem cell...

Analyzing monocyte...

Analyzing cd8-positive, alpha-beta t cell...

Analyzing intestinal enteroendocrine cell...

Analyzing plasma cell...

Analyzing intestinal crypt stem cell...

Analyzing monocyte...

Analyzing transit amplifying cell of large intestine...

Analyzing intestinal enteroendocrine cell...

Analyzing plasma cell...

Analyzing monocyte...

Analyzing transit amplifying cell of large intestine...

Analyzing plasma cell...

Analyzing transit amplifying cell of large intestine...
Analysis for transit amplifying cell of large intestine completed.
Analysis 

In [4]:

# Calculate similarity scores
similarity_scores = CASSIA.runCASSIA_similarity_score_batch(
    marker=unprocessed_markers,
    file_pattern=output_name + "_Uncertainty_*_full.csv",  # matches the files from the repeated runs above
    output_name="intestine_uncertainty",
    max_workers=6,
    model="google/gemini-2.5-flash-preview-05-20",
    provider="openrouter",
    main_weight=0.5,
    sub_weight=0.5
)

Results saved to intestine_uncertainty.csv
Similarity analysis completed: intestine_uncertainty
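The main_weight and sub_weight arguments suggest that the similarity score blends agreement on the main cell type with agreement on the subtype. The sketch below illustrates that idea with a simple majority-vote consistency score for a single cluster. It is a hypothetical illustration, not CASSIA's implementation (which also calls an LLM, as the model and provider arguments indicate), and consistency_score is not a CASSIA function:

```python
from collections import Counter


def consistency_score(predictions, main_weight=0.5, sub_weight=0.5):
    """Weighted agreement of repeated annotations for a single cluster.

    predictions: one (main_type, sub_type) tuple per run.
    Returns (consensus_main, consensus_sub, score), where
    score = main_weight * fraction of runs matching the most common main type
          + sub_weight * fraction of runs matching the most common subtype.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    n = len(predictions)
    # Normalize case and whitespace so "Monocyte" and "monocyte " count as one label
    mains = Counter(main.strip().lower() for main, _ in predictions)
    subs = Counter(sub.strip().lower() for _, sub in predictions)
    consensus_main, main_count = mains.most_common(1)[0]
    consensus_sub, sub_count = subs.most_common(1)[0]
    score = main_weight * main_count / n + sub_weight * sub_count / n
    return consensus_main, consensus_sub, score


# Toy labels for illustration only, not results from this dataset
runs = [
    ("monocyte", "classical monocyte"),
    ("Monocyte", "classical monocyte"),
    ("macrophage", "resident macrophage"),
]
print(consistency_score(runs))
```

In the toy example two of three runs agree on both levels, so the score is 2/3; a score near 1 means the runs are stable, while a low score flags a cluster worth revisiting.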


### Optional Step: Annotation Boost on Selected Cluster

The monocyte cluster is sometimes annotated as a mixed population of immune cells and neuronal/glial cells.

Here we use the annotation boost agent to test these hypotheses in more detail.

In [5]:
output_name="intestine_detailed"

# Run annotation boost on the monocyte cluster
CASSIA.runCASSIA_annotationboost(
    full_result_path = output_name + "_full.csv",
    marker = unprocessed_markers,
    output_name = "monocyte_annotationboost",
    cluster_name = "monocyte",
    major_cluster_info = "Human Large Intestine",
    num_iterations = 5,
    model = "google/gemini-2.5-flash-preview-05-20",
    provider = "openrouter",
    conversation_history_mode="final"
)

Using summarization agent to process conversation history for cluster monocyte
Conversation history summarized using openrouter (11957 -> 1766 characters)
Using conversation history for cluster monocyte (mode: final, 1766 characters)
Iteration 1 completed.
Iteration 2 completed.
Iteration 3 completed.
Iteration 4 completed.
Final annotation completed in iteration 5.
Raw conversation text saved to monocyte_annotationboost_raw_conversation.txt
Raw conversation text saved to monocyte_annotationboost_raw_conversation.txt
HTML report saved to monocyte_annotationboost_summary.html
Summary report saved to monocyte_annotationboost_summary.html
Summary report saved to monocyte_annotationboost_summary.html


{'status': 'success',
 'raw_text_path': 'monocyte_annotationboost_raw_conversation.txt',
 'summary_report_path': 'monocyte_annotationboost_summary.html',
 'execution_time': 79.11194586753845,
 'analysis_text': 'Evaluation\nThe data continues to strongly support a neural crest-derived glial cell. The high expression of SOX10, PLP1, S100B, NGFR, CDH19, L1CAM, and MYOT are all consistent with this lineage. The persistent and high expression of MPZ, PMP22, and PRX, coupled with the absence of MBP (from previous data) and KROX20 (not in list), strongly points towards **non-myelinating Schwann cells**. The presence of neuronal-associated markers (SYP, SNAP25) with `pct.1` values less than 1.00, alongside high SPP1, SFRP5, WNT6, GFRA2, and LGR5, suggests a highly interactive, potentially reactive, or progenitor-like non-myelinating Schwann cell population. The low expression of CD44 and PROM1 argues against a broad progenitor state, but LGR5 is still high. The evidence is converging towards a

### Optional Step: Retrieval-Augmented Generation (RAG)

This is particularly useful if you are working on a very specific and detailed annotation task. It can significantly improve the granularity and accuracy of the annotation: it automatically extracts marker information and generates a report that can serve as additional information for the default CASSIA pipeline.

Install the package:


In [None]:
!pip install cassia-rag


In [None]:
from cassia_rag import run_complete_analysis
import os

Set up the API keys if you have not done so.


In [None]:
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"
    

Run the wrapper function to trigger the multi-agent pipeline.

In [None]:
run_complete_analysis(
    tissue_type="Liver",  # tissue you are analyzing
    target_species="Tiger",  # species you are analyzing
    reference_species="Human",  # either Human or mouse; for any other species, use Human
    model_choice="claude",  # either claude or gpt; claude is highly recommended
    compare=True,  # set to True to compare the target species with the reference species
    db_path="~/Canonical_Marker (1).csv",  # path to the marker database
    max_workers=8
)

All outputs (intermediate and final) are saved as .txt files in a folder named "TissueType_Species"; in this example, that is the Liver_Tiger folder.
The final output is summary_clean.txt, and its content can later be used as additional information in the CASSIA pipeline (the additional_info argument of runCASSIA_batch).

The folder also contains several intermediate outputs. Using the tutorial input as an example, these files are:
1. liver_tiger_marker_analysis.txt: marker analysis and interpretation from the database
2. final_ontology.txt: ontology related to the tissue type and target species
3. cell_type_patterns_claude.txt: cell type pattern analysis
4. summary.txt: raw summary file
5. additional_considerations.txt: additional considerations when the target species differs from the reference species
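Because the final summary is plain text, it can be passed to the additional_info argument of runCASSIA_batch used in Step 2. A minimal sketch; the helper load_rag_summary is hypothetical, the folder name follows the "TissueType_Species" convention described above, and liver_markers in the commented usage stands for your own marker table:

```python
from pathlib import Path


def load_rag_summary(folder, filename="summary_clean.txt"):
    """Read the cleaned cassia-rag summary so it can be used as `additional_info`."""
    path = Path(folder).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; did run_complete_analysis finish?")
    return path.read_text(encoding="utf-8").strip()


# Hypothetical usage (liver_markers is a placeholder for your own marker table):
# extra = load_rag_summary("Liver_Tiger")
# CASSIA.runCASSIA_batch(marker=liver_markers, output_name="liver_tiger_rag",
#                        model="google/gemini-2.5-flash-preview-05-20",
#                        tissue="liver", species="tiger", max_workers=6, n_genes=50,
#                        additional_info=extra, provider="openrouter")
```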

### Optional Step: Compare the Subtypes Using Multiple LLMs

This agent can be used after you finish the default CASSIA pipeline if you are still unsure about a cell type; comparing candidate subtypes side by side can yield a more confident subtype annotation. Here we use the plasma cell cluster as an example to determine whether it looks more like a general plasma cell or one of the immunoglobulin-specific subtypes (IgA-, IgG-, or IgM-secreting).

In [3]:
# The markers here are copied from CASSIA's previous results for the plasma cell cluster.
marker = "IGLL5, IGLV6-57, JCHAIN, FAM92B, IGLC3, IGLC2, IGHV3-7, IGKC, TNFRSF17, IGHG1, AC026369.3, IGHV3-23, IGKV4-1, IGKV1-5, IGHA1, IGLV3-1, IGLV2-11, MYL2, MZB1, IGHG3, IGHV3-74, IGHM, ANKRD36BP2, AMPD1, IGKV3-20, IGHA2, DERL3, AC104699.1, LINC02362, AL391056.1, LILRB4, CCL3, BMP6, UBE2QL1, LINC00309, AL133467.1, GPRC5D, FCRL5, DNAAF1, AP002852.1, AC007569.1, CXorf21, RNU1-85P, U62317.4, TXNDC5, LINC02384, CCR10, BFSP2, APOBEC3A, AC106897.1"

CASSIA.compareCelltypes(
    tissue = "large intestine",
    celltypes = ["Plasma Cells", "IgA-secreting Plasma Cells", "IgG-secreting Plasma Cells", "IgM-secreting Plasma Cells"],
    marker_set = marker,
    species = "human",
    output_file = "plasma_cell_subtype.html"
)

Model: anthropic/claude-3.7-sonnet
Extracted scores: {'Plasma Cells': {'score': '95', 'reasoning': "The marker set strongly indicates plasma cells. Multiple immunoglobulin genes are highly ranked, including IGLL5, IGLV6-57, IGLC3, IGLC2, IGHV3-7, IGKC, IGHV3-23, IGKV4-1, IGKV1-5, IGLV3-1, and IGLV2-11. These represent various light chain (kappa and lambda) and heavy chain components that are characteristic of antibody-producing plasma cells.\n\nAdditionally, JCHAIN (J chain protein) is highly ranked (#3), which is crucial for multimerization of IgA and IgM in plasma cells. MZB1 (Marginal Zone B and B1 Cell-Specific Protein) is a key marker of plasma cell differentiation. TNFRSF17 (also known as BCMA) is a receptor highly expressed in plasma cells that promotes their survival.\n\nOther supportive markers include DERL3 (involved in ER-associated degradation in secretory cells), MZB1 (required for IgM assembly), and TXNDC5 (protein disulfide isomerase involved in antibody folding). The pr
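Instead of copying the marker string by hand, it can be built from a marker table. A minimal sketch, assuming a Seurat FindAllMarkers-style CSV with cluster, gene, p_val_adj and avg_log2FC columns (check your own file). Genes are ranked by adjusted p-value, breaking ties by larger fold change, in the spirit of the ranking_method="p_val_adj" setting used in fast mode; whether CASSIA ranks in exactly this way is an assumption, and marker_string is a hypothetical helper:

```python
import csv


def marker_string(csv_path, cluster, n_genes=50, fc_col="avg_log2FC"):
    """Return the top `n_genes` up-regulated markers of one cluster as a comma-separated string."""
    ranked = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row["cluster"] != str(cluster):
                continue
            log2fc = float(row[fc_col])
            if log2fc <= 0:  # keep up-regulated genes only
                continue
            # Tuple sorts by adjusted p-value, then by larger fold change
            ranked.append((float(row["p_val_adj"]), -log2fc, row["gene"]))
    ranked.sort()
    return ", ".join(gene for _, _, gene in ranked[:n_genes])


# Hypothetical usage ("plasma_markers.csv" is a placeholder file name):
# marker = marker_string("plasma_markers.csv", "plasma cell")
```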

### Optional Step: Subclustering


This agent can be used to study a subclustered population, such as a T cell or fibroblast cluster. We recommend running the default CASSIA pipeline first; then, for a target cluster, use the Seurat pipeline to subcluster it and obtain FindAllMarkers results to use here. Here we present results for the cd8-positive, alpha-beta t cell cluster as an example; this cluster is a CD8 population mixed with other cell types.

In [7]:
# The R code below (commented out) shows how the subcluster markers were generated with Seurat;
# the same steps can also be done in Scanpy.

# library(Seurat)
# library(dplyr)    # for %>% and filter()
# library(ggplot2)  # for ggtitle()
# large <- readRDS("/Users/xie227/Downloads/seurat_object.rds")
# # Extract CD8+ T cells
# cd8_cells <- subset(large, cell_ontology_class == "cd8-positive, alpha-beta t cell")
# # Normalize and identify variable features
# cd8_cells <- NormalizeData(cd8_cells)
# cd8_cells <- FindVariableFeatures(cd8_cells, selection.method = "vst", nfeatures = 2000)
# # Scale data and run PCA
# all.genes <- rownames(cd8_cells)
# cd8_cells <- ScaleData(cd8_cells, features = all.genes)
# cd8_cells <- RunPCA(cd8_cells, features = VariableFeatures(object = cd8_cells), npcs = 30)
# # Run clustering (adjust resolution and dims as needed based on the elbow plot)
# cd8_cells <- FindNeighbors(cd8_cells, dims = 1:20)
# cd8_cells <- FindClusters(cd8_cells, resolution = 0.3)
# # Run UMAP
# cd8_cells <- RunUMAP(cd8_cells, dims = 1:20)
# # Create visualization plots
# p1 <- DimPlot(cd8_cells, reduction = "umap", label = TRUE) +
#   ggtitle("CD8+ T Cell Subclusters")
# # Find markers for each subcluster
# cd8_markers <- FindAllMarkers(cd8_cells,
#                               only.pos = TRUE,
#                               min.pct = 0.1,
#                               logfc.threshold = 0.25)
# cd8_markers <- cd8_markers %>% filter(p_val_adj < 0.05)
# write.csv(cd8_markers, "cd8_subcluster_markers.csv")

CASSIA.runCASSIA_subclusters(marker = subcluster_results,
    major_cluster_info = "cd8 t cell",
    output_name = "subclustering_results",
    model = "google/gemini-2.5-flash-preview-05-20",
    provider = "openrouter")

Processing input dataframe to get top 50 markers
Analysis text is already in structured format (list of 6 dictionaries)
[{'Key marker': 'IL7R, KLRB1, CD2, CD8A, CD8B', 'Explanation': 'IL7R (CD127) is a key marker for naive and memory T cells, indicating their ability to respond to IL-7 for survival and proliferation. KLRB1 (CD161) is expressed on a subset of T cells, including NK T cells and some CD8+ T cells, often associated with innate-like functions. CD2 is a pan-T cell marker. CD8A and CD8B are coreceptors defining CD8+ T cells. The presence of IL7R and KLRB1, along with general CD8 markers, suggests a less differentiated or innate-like CD8 T cell population.', 'Most likely top2 cell types': ['Naive CD8+ T cells', 'Innate-like CD8+ T cells (e.g., MAIT cells or NK-like CD8+ T cells)']}, {'Key marker': 'HAVCR2, TIGIT, KLRC2, KLRC3, IKZF2', 'Explanation': 'HAVCR2 (TIM-3) and TIGIT are well-known immune checkpoint receptors, often upregulated on exhausted or dysfunctional T cells in c

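We recommend computing the CS (consensus similarity) score for the subclustering results to get a more confident answer: run the subcluster annotation several times, then score the agreement between runs.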

In [6]:
CASSIA.runCASSIA_n_subcluster(
    n=5, 
    marker=subcluster_results,
    major_cluster_info="cd8 t cell", 
    base_output_name="subclustering_results_n",
    model="google/gemini-2.5-flash-preview-05-20",
    temperature=0,
    provider="openrouter",
    max_workers=5,
    n_genes=50
)


Processing input dataframe to get top 50 markers
Processing input dataframe to get top 50 markers
Processing input dataframe to get top 50 markers
Processing input dataframe to get top 50 markers
Processing input dataframe to get top 50 markers
Analysis text is already in structured format (list of 6 dictionaries)
Results for iteration 3 have been written to subclustering_results_n_3.csv
Analysis text is already in structured format (list of 6 dictionaries)
Results for iteration 5 have been written to subclustering_results_n_5.csv
Analysis text is already in structured format (list of 6 dictionaries)
Results for iteration 2 have been written to subclustering_results_n_2.csv
Analysis text is already in structured format (list of 6 dictionaries)
Analysis text is already in structured format (list of 6 dictionaries)
Results for iteration 4 have been written to subclustering_results_n_4.csv
Results for iteration 1 have been written to subclustering_results_n_1.csv


In [2]:
# Calculate similarity scores
CASSIA.runCASSIA_similarity_score_batch(
    marker = subcluster_results,
    file_pattern = "subclustering_results_n_*.csv",
    output_name = "subclustering_uncertainty",
    max_workers = 6,
    model = "google/gemini-2.5-flash-preview-05-20",
    provider = "openrouter",
    main_weight = 0.5,
    sub_weight = 0.5
)

Organizing batch results...
Processing cell type results...

Processing cell type: 0
Number of predictions: 5
Number of valid predictions: 5
Starting the process of cell type batch variance analysis with openrouter...

Processing cell type: 1
Number of predictions: 5
Number of valid predictions: 5
Starting the process of cell type batch variance analysis with openrouter...

Processing cell type: 2
Number of predictions: 5
Number of valid predictions: 5
Starting the process of cell type batch variance analysis with openrouter...

Processing cell type: 3
Number of predictions: 5
Number of valid predictions: 5
Starting the process of cell type batch variance analysis with openrouter...

Processing cell type: 4
Number of predictions: 5
Number of valid predictions: 5
Starting the process of cell type batch variance analysis with openrouter...

Processing cell type: 5
Number of predictions: 5
Number of valid predictions: 5
Starting the process of cell type batch variance analysis with openro

### Optional Step: Annotation Boost with Additional Task

This can be used to study a specific question about a cluster, such as inferring its state. Here we use the cd8-positive, alpha-beta t cell cluster as an example. Note that the performance of this agent has not been benchmarked, so please interpret the results with caution.

In [11]:
# Only OpenRouter is currently supported as the provider for this function.

CASSIA.runCASSIA_annotationboost_additional_task(
    full_result_path = output_name + "_full.csv",
    marker = unprocessed_markers,
    output_name = "T_cell_state",
    cluster_name = "cd8-positive, alpha-beta t cell",
    major_cluster_info = "Human Large Intestine",
    num_iterations = 5,
    model = "anthropic/claude-3.5-sonnet",
    additional_task = "infer the state of this T cell cluster"
)

Using summarization agent to process conversation history for cluster cd8-positive, alpha-beta t cell
Conversation history summarized using openrouter (10978 -> 1331 characters)
Using conversation history for cluster cd8-positive, alpha-beta t cell (mode: final, 1331 characters)
Iteration 1 completed.
Iteration 2 completed.
Final annotation completed in iteration 3.
Raw conversation text saved to T_cell_state_raw_conversation.txt
Raw conversation text saved to T_cell_state_raw_conversation.txt
HTML report saved to T_cell_state_summary.html
Summary report saved to T_cell_state_summary.html
Summary report saved to T_cell_state_summary.html


{'status': 'success',
 'raw_text_path': 'T_cell_state_raw_conversation.txt',
 'summary_report_path': 'T_cell_state_summary.html',
 'execution_time': 51.82601761817932,
 'analysis_text': 'Based on all available data, I am now ready to make a final determination.\n\nFINAL ANALYSIS COMPLETED\n\nGeneral Cell Type: Natural Killer (NK) Cells\nSpecific Subtype: CD56bright tissue-resident NK cells\nConfidence Level: High\n\nKey Evidence:\n1. NK Cell Identity (Very Strong Evidence):\n- High NCAM1/CD56 expression (log2FC = 7.13, 81% positive)\n- NK receptor repertoire (KLRC2, KLRK1, KLRC4, KLRD1)\n- Absence of T cell markers (CD3D/E negative)\n- Strong IL2RB expression (characteristic of NK cells)\n\n2. CD56bright Subtype Classification:\n- High CD27 expression (log2FC = 3.91, 81% positive)\n- Strong chemokine production (XCL1, XCL2, CCL4, CCL5)\n- High NCAM1/CD56 expression\n- Cytokine-producing profile\n\n3. Tissue Residency/State Analysis:\n- CXCR6 positive (log2FC = 2.08)\n- Activated cytoto