# Pandasaurus CxG Extension Walkthrough 

## Overview
Welcome to this Jupyter notebook walkthrough for pandasaurus_cxg! This library provides tools for enriching AnnData objects with inferred ontology relationships between their cell type annotations and for analyzing those annotations, helping you gain deeper insights into your single-cell RNA sequencing (scRNA-seq) data.

In this notebook, we will explore two main classes: `AnndataEnricher` and `AnndataAnalyzer`. Let's dive in and see how these classes can help us in our scRNA-seq analysis.

Now, let's get started with an example workflow that demonstrates the capabilities of these classes. We'll load example datasets and run enrichment and analysis steps to gain a better understanding of our scRNA-seq data.

## Test Data
The following files are used in the walkthrough. Please download them manually to a folder of your choice. Ensure that you adjust the file paths used in the examples to match your local file paths.
- [Time-resolved Systems Immunology Reveals a Late Juncture Linked to Fatal COVID-19: Adaptive Cells](https://cellxgene.cziscience.com/collections/db14ce52-5dd6-4649-a9e9-7fb2572d0605)
- [Integrated Single-nucleus and Single-cell RNA-seq of the Adult Human Kidney](https://cellxgene.cziscience.com/collections/36b8480d-114e-42fe-b6a9-bdf79a7eb1fc)

## AnndataEnricher Walkthrough

### Initialization
Let's import the necessary module and initialize our AnndataEnricher.

In [1]:
from pandasaurus_cxg.anndata_enricher import AnndataEnricher

In [2]:
# Using Time-resolved Systems Immunology Reveals a Late Juncture Linked to Fatal COVID-19: Adaptive Cells dataset
ade = AnndataEnricher.from_file_path("test/data/test_covid.h5ad")

AnnData `obs` field details

In [3]:
ade._anndata.obs

Available slims for the minimal and full slim enrichment methods

In [4]:
ade.slim_list

Available contexts for the contextual enrichment method

In [5]:
ade._AnndataEnricher__context_list

### Enrichment

Show all cells that are instances of "memory T cell" - before enrichment

In [6]:
ade._anndata.obs[ade._anndata.obs["cell_type_ontology_term_id"] == "CL:0000813"]

#### Simple enrichment
Returns a DataFrame enriched with inferred relationships between terms in the seed list. Both subject and object terms are members of the seed list.

In [7]:
simple_enrichment = ade.simple_enrichment()
simple_enrichment
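To make "inferred relationships between terms in the seed list" concrete, here is a minimal sketch over a toy, hand-written subClassOf hierarchy (not taken from the Cell Ontology or Ubergraph). It uses labels in place of CL IDs, and the `s`/`p`/`o` column names are an assumption modeled on the `s` column used in this notebook. The function `simple_enrichment_sketch` is hypothetical and is not how pandasaurus_cxg computes its result.

```python
import pandas as pd

# Toy subClassOf edges (child -> direct parents); illustrative only,
# NOT taken from the Cell Ontology or Ubergraph.
SUBCLASS_OF = {
    "CD4-positive memory T cell": {"memory T cell"},
    "CD8-positive memory T cell": {"memory T cell"},
    "memory T cell": {"T cell"},
    "T cell": set(),
}


def ancestors(term, graph):
    """Return all transitive superclasses of `term` in `graph`."""
    seen = set()
    stack = list(graph.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def simple_enrichment_sketch(seed, graph):
    """Hypothetical: inferred subClassOf pairs whose subject AND object are seed terms."""
    seed = set(seed)
    rows = [
        {"s": s, "p": "subClassOf", "o": o}
        for s in sorted(seed)
        for o in sorted(ancestors(s, graph) & seed)  # keep only objects in the seed
    ]
    return pd.DataFrame(rows, columns=["s", "p", "o"])


seed = ["CD4-positive memory T cell", "CD8-positive memory T cell", "memory T cell"]
print(simple_enrichment_sketch(seed, SUBCLASS_OF))
```

"T cell" never appears as an object here because it is not in the seed list, which matches the description that both subject and object come from the seed.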

Show all cells that are instances of "memory T cell" - after enrichment, which also includes cells annotated with any inferred subclass of "memory T cell"

In [8]:
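# Collect the IDs of all terms inferred to be subclasses of "memory T cell"
t_cell_instances = simple_enrichment.loc[simple_enrichment["o_label"] == "memory T cell", "s"].tolist()
# Include the "memory T cell" term itself (CL:0000813)
t_cell_instances.append("CL:0000813")
ade._anndata.obs[ade._anndata.obs["cell_type_ontology_term_id"].isin(t_cell_instances)]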
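The cell above follows a reusable pattern: a term's instances are the cells annotated with the term itself or with any subject whose inferred object is that term. Here is a minimal sketch of that pattern as a standalone helper on toy tables. `cells_of_type` is a hypothetical helper, not a pandasaurus_cxg function, and the `TOY:` identifiers are placeholders rather than real CL terms.

```python
import pandas as pd

# Toy enrichment table shaped like the notebook's output ("s" and "o_label" columns).
# The "TOY:" IDs are placeholders, not real Cell Ontology terms.
enrichment = pd.DataFrame({
    "s": ["TOY:0001", "TOY:0002", "TOY:0003"],
    "o_label": ["memory T cell", "memory T cell", "B cell"],
})

# Toy obs table with one row per cell.
obs = pd.DataFrame(
    {"cell_type_ontology_term_id": ["CL:0000813", "TOY:0001", "TOY:0003", "TOY:0002"]},
    index=["cell_a", "cell_b", "cell_c", "cell_d"],
)


def cells_of_type(obs, enrichment, label, term_id):
    """Hypothetical helper: rows of `obs` annotated with `term_id` or any
    subclass of it listed in `enrichment`."""
    subclass_ids = enrichment.loc[enrichment["o_label"] == label, "s"].tolist()
    return obs[obs["cell_type_ontology_term_id"].isin(subclass_ids + [term_id])]


memory_t_cells = cells_of_type(obs, enrichment, "memory T cell", "CL:0000813")
print(memory_t_cells.index.tolist())
```

Only `cell_c` is dropped, because its placeholder type is a subclass of "B cell" rather than "memory T cell".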

### Discussion
**Background**
>The enrichment allows us to find all cells that are instances of any CL term in the enrichment table. For any set of non-overlapping enrichment terms, we can add a new obs field with the corresponding annotations.
>
>Before implementing: add an example to the Jupyter notebook and discuss with Evan and Mary whether it is worth adding as a method.
>
>User chooses a set of one or more terms from the enrichment table and a name for a new obs field.
> - Test whether any of these classes are subclasses of each other.
> - If not, add a new obs field pair (name and ID, following the standard CxG naming convention); otherwise, fail with a warning indicating which classes are subclasses of each other.

**Question**

Should users add a new obs field based on the enrichment results through a pandasaurus_cxg method, or is it more suitable for them to manipulate the AnnData object directly, as in the cell below?

In [9]:
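obs_df = ade._anndata.obs  # a reference, so the new column is written back to the AnnData object
new_value = "memory T cell"
new_obs_field = "parent"
# Start with an empty annotation, then label every cell whose type is memory T cell or one of its subclasses
obs_df[new_obs_field] = ""
condition = obs_df["cell_type_ontology_term_id"].isin(t_cell_instances)
obs_df.loc[condition, new_obs_field] = new_value
# Show the ontology term ID columns alongside the new field for the labeled cells
filtered_columns = [col for col in obs_df.columns if "ontology_term_id" in col] + [new_obs_field]
obs_df.loc[obs_df[new_obs_field] == new_value, filtered_columns]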
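The manual approach above does not check whether the chosen terms overlap. As a point of comparison for the question above, here is a minimal sketch of the method proposed in the Background section. `add_obs_field_from_terms` is hypothetical and not part of pandasaurus_cxg. It assumes the enrichment table has `s` and `o` ID columns in which every row is a subClassOf relationship. It names the new field pair after the `cell_type` / `cell_type_ontology_term_id` pairing in CxG obs. Unlike the cell above, it returns a modified copy rather than editing `obs` in place.

```python
import warnings

import pandas as pd


def add_obs_field_from_terms(obs, enrichment, terms, field_name):
    """Hypothetical sketch of the proposed method (not a pandasaurus_cxg API).

    obs        -- obs DataFrame with a "cell_type_ontology_term_id" column
    enrichment -- inferred subClassOf pairs with "s" (subclass) and "o" (superclass) IDs
    terms      -- dict mapping each chosen term ID to its label
    field_name -- name of the new label field; the ID field gets "_ontology_term_id"
    """
    pairs = set(zip(enrichment["s"], enrichment["o"]))
    # Fail with a warning if any chosen term is a subclass of another chosen term.
    clashes = [(a, b) for a in terms for b in terms if (a, b) in pairs]
    if clashes:
        warnings.warn(f"Chosen terms are subclasses of each other: {clashes}")
        return obs
    obs = obs.copy()
    id_field = f"{field_name}_ontology_term_id"
    obs[field_name] = ""
    obs[id_field] = ""
    for term_id, label in terms.items():
        # A cell matches if its type is the term itself or an inferred subclass of it.
        members = {s for s, o in pairs if o == term_id} | {term_id}
        hit = obs["cell_type_ontology_term_id"].isin(members)
        obs.loc[hit, field_name] = label
        obs.loc[hit, id_field] = term_id
    return obs


# Toy usage; "TOY:" IDs are placeholders, not real Cell Ontology terms.
enrichment = pd.DataFrame({
    "s": ["TOY:0001", "TOY:0002", "TOY:B1"],
    "o": ["CL:0000813", "CL:0000813", "TOY:B"],
})
obs = pd.DataFrame({"cell_type_ontology_term_id": ["CL:0000813", "TOY:0001", "TOY:B1", "TOY:X"]})
result = add_obs_field_from_terms(
    obs, enrichment, {"CL:0000813": "memory T cell", "TOY:B": "B cell"}, "parent"
)
print(result)
```

If a cell type falls under two unrelated chosen terms (possible with multiple inheritance), this sketch lets the later term overwrite the earlier one. A real method would need an explicit policy for that case.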

#### Minimal slim enrichment
Returns a DataFrame enriched with inferred relationships between terms in the seed list and in an extended seed list. The extended seed list consists of terms from the seed list and terms from the given slim lists, i.e. classes tagged with the specified ‘subset’ axiom.

In [10]:
ade.minimal_slim_enrichment(["blood_and_immune_upper_slim"])
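Here is a minimal sketch of the extended seed idea on a toy hierarchy (illustrative only, not taken from CL). Slim terms are added to the seed, and inferred subClassOf pairs are kept only when both ends are in the extended list. The slim contents and the function name `slim_enrichment_sketch` are hypothetical, and this is not how pandasaurus_cxg queries Ubergraph.

```python
# Toy subClassOf graph (child -> direct parents); illustrative only, not taken from CL.
GRAPH = {
    "memory T cell": {"T cell"},
    "T cell": {"lymphocyte"},
    "B cell": {"lymphocyte"},
    "lymphocyte": {"leukocyte"},
    "leukocyte": set(),
}


def ancestors(term, graph):
    """Return all transitive superclasses of `term`."""
    seen, stack = set(), list(graph.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def slim_enrichment_sketch(seed, slim, graph):
    """Hypothetical: inferred (subject, object) pairs within seed + slim terms."""
    extended = set(seed) | set(slim)
    return sorted((s, o) for s in extended for o in ancestors(s, graph) & extended)


seed = ["memory T cell", "B cell"]
slim = ["lymphocyte"]  # hypothetical slim containing one upper-level term
pairs = slim_enrichment_sketch(seed, slim, GRAPH)
print(pairs)
```

Both seed terms now map to the slim term "lymphocyte", while intermediate classes such as "T cell" do not appear because they are neither seed nor slim terms.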

#### Full slim enrichment
Returns a DataFrame enriched with inferred relationships between terms in the seed list and in an extended seed list. The extended seed list consists of terms from the seed list, terms from the given slim lists (i.e. classes tagged with the specified ‘subset’ axiom), and terms inferred via transitive subClassOf queries.

In [11]:
ade.full_slim_enrichment(["blood_and_immune_upper_slim"])
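A "transitive subClassOf query" follows subClassOf edges through any number of steps. Here is a minimal sketch that computes the transitive closure of a toy hierarchy (illustrative only, not taken from CL) to show which extra pairs such a query exposes. It does not show how pandasaurus_cxg actually issues these queries against Ubergraph.

```python
# Toy direct subClassOf edges (subclass, superclass); illustrative only.
EDGES = {
    ("memory T cell", "T cell"),
    ("T cell", "lymphocyte"),
    ("lymphocyte", "leukocyte"),
}


def transitive_closure(edges):
    """Repeatedly join (a, b) with (b, d) to add (a, d) until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


closure = transitive_closure(EDGES)
for pair in sorted(closure - EDGES):
    print(pair)  # pairs that only a transitive query reveals
```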

#### Contextual enrichment
Returns a DataFrame enriched with inferred relationships between terms in the seed list and in an extended seed list. The extended seed list consists of terms from the seed list and all terms that satisfy a given set of existential restrictions in Ubergraph (e.g. part_of some 'kidney').

In [12]:
ade.contextual_slim_enrichment()
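An existential restriction such as part_of some 'kidney' holds for any term X that has a part_of relationship to kidney. Here is a minimal sketch of collecting such terms from a toy list of (subject, predicate, object) triples. It treats part_of as transitive, as the Relation Ontology declares it. The triples and the `satisfies_some` helper are hypothetical, and this is not how pandasaurus_cxg queries Ubergraph.

```python
# Toy (subject, predicate, object) triples; illustrative only, not from Ubergraph.
TRIPLES = [
    ("kidney cell", "part_of", "kidney"),
    ("glomerulus", "part_of", "kidney"),
    ("podocyte", "part_of", "glomerulus"),
    ("T cell", "subClassOf", "lymphocyte"),
]


def satisfies_some(triples, predicate, filler):
    """Hypothetical: terms X for which `X predicate some filler` holds,
    following `predicate` transitively."""
    matches, frontier = set(), {filler}
    while frontier:
        # Subjects that point at the current frontier and have not been seen yet.
        frontier = {s for s, p, o in triples if p == predicate and o in frontier} - matches
        matches |= frontier
    return matches


print(sorted(satisfies_some(TRIPLES, "part_of", "kidney")))
```

"podocyte" is reached only through "glomerulus", which is why the transitive step matters. The subClassOf triple is ignored because it uses a different predicate.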

### Secondary Example

In [13]:
# Using Integrated Single-nucleus and Single-cell RNA-seq of the Adult Human Kidney dataset
ade = AnndataEnricher.from_file_path("test/data/human_kidney.h5ad")
print("Contexts of the dataset from the tissue field:", ade._AnndataEnricher__context_list)
ade.contextual_slim_enrichment()

## AnndataAnalyzer Walkthrough

### Initialization
Let's import the necessary module and initialize our AnndataAnalyzer.

In [14]:
from pandasaurus_cxg.anndata_analyzer import AnndataAnalyzer

In [15]:
# Temporarily using a placeholder schema for free-text cell type annotations
ada = AnndataAnalyzer("test/data/test_covid.h5ad", "pandasaurus_cxg/schema/schema.json")

### Analyzer

#### Co-annotation report
Generates a co-annotation report based on the provided schema.

In [16]:
ada.co_annotation_report()
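The report's exact format is defined by pandasaurus_cxg and the schema. As a rough, hypothetical illustration of what "co-annotation" means, the sketch below compares two obs annotation fields cell by cell and labels each co-occurring pair of values by how their cell sets relate. The field names, toy data, `co_annotation_sketch` function and relation names (`matches`, `subcluster_of`, `supercluster_of`, `overlaps`) are all illustrative assumptions and may not match the library's output.

```python
import pandas as pd


def co_annotation_sketch(obs, field1, field2):
    """Hypothetical: relate each co-occurring pair of values from two obs fields."""
    rows = []
    for (v1, v2), n_both in obs.groupby([field1, field2], observed=True).size().items():
        n1 = (obs[field1] == v1).sum()  # cells carrying v1
        n2 = (obs[field2] == v2).sum()  # cells carrying v2
        if n_both == n1 == n2:
            relation = "matches"  # identical cell sets
        elif n_both == n1:
            relation = "subcluster_of"  # every v1 cell is also a v2 cell
        elif n_both == n2:
            relation = "supercluster_of"  # every v2 cell is also a v1 cell
        else:
            relation = "overlaps"
        rows.append({field1: v1, "relation": relation, field2: v2})
    return pd.DataFrame(rows)


# Toy obs table: author clusters vs. cell type labels.
obs = pd.DataFrame({
    "cluster": ["c1", "c1", "c2", "c2", "c3", "c3"],
    "cell_type": ["T cell", "T cell", "B cell", "B cell", "B cell", "NK cell"],
})
print(co_annotation_sketch(obs, "cluster", "cell_type"))
```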

### Secondary Example

In [17]:
ada = AnndataAnalyzer("test/data/human_kidney.h5ad", "pandasaurus_cxg/schema/schema.json")
ada.co_annotation_report()