# **Exploring average gene expression with the atlasapprox API**

When investigating cell atlases, a common approach is to explore gene expression patterns across different cell types and organs. This tutorial demonstrates how to use the `atlasapprox` Python API to query and analyze average gene expression data for any species in our database. We'll walk through several use cases that show different ways to access and compare gene expression profiles.

## **Initialize the API**

Import the `atlasapprox` package in Python and create an API object:

```python
import atlasapprox

api = atlasapprox.API()
```

For detailed initial setup instructions, refer to the [Quick Start Tutorial](link to quick_start).

## **Querying average gene expression data**

A convenient way to query and fetch gene expression data is the `average` method of the `atlasapprox` API. It retrieves the average expression of selected genes within a specific organ of an organism.

The following example demonstrates how to retrieve the average expression of four genes (*TP53*, *KRAS*, *EGFR*, *ALK*) in the human lung:

```python
avg_gene_expr_lung = api.average(
    organism="h_sapiens",
    organ="lung",
    features=["TP53", "KRAS", "EGFR", "ALK"],
    measurement_type="gene_expression",
)

# Display the result (display() is available by default in Jupyter)
display(avg_gene_expr_lung)
```

| Gene | neutrophil | basophil | monocyte | macrophage | dendritic | B | plasma | T | NK | plasmacytoid | ... | capillary | CAP2 | lymphatic | fibroblast | alveolar fibroblast | smooth muscle | vascular smooth muscle | pericyte | mesothelial | ionocyte |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 0.054815 | 0.119978 | 0.327787 | 0.132754 | 0.238697 | 0.178123 | 0.038301 | 0.202786 | 0.239634 | 0.074571 | ... | 0.219169 | 0.126632 | 0.254856 | 0.175867 | 0.16092 | 0.110756 | 0.193365 | 0.252695 | 0.152536 | 0.227391 |
| KRAS | 1.529643 | 0.436303 | 0.977728 | 0.489622 | 0.443576 | 0.562167 | 0.243355 | 0.82648 | 0.76758 | 0.467747 | ... | 1.357422 | 1.346849 | 0.794639 | 0.249971 | 0.388602 | 0.582125 | 0.251247 | 0.708356 | 0.319821 | 0.714599 |
| EGFR | 0.016721 | 0.024823 | 0.028325 | 0.011413 | 0.000949 | 0.138468 | 0.031597 | 0.064145 | 0.051909 | 0.0 | ... | 0.579046 | 0.736705 | 0.085129 | 0.666291 | 0.720418 | 0.670286 | 0.528496 | 0.980535 | 0.2257 | 2.810568 |
| ALK | 0.0 | 0.0 | 0.001285 | 0.013633 | 0.0 | 0.0 | 0.0 | 0.002077 | 0.0 | 0.0 | ... | 0.001188 | 0.0 | 0.0 | 0.000689 | 0.004787 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


#### Output
The function returns a *Pandas DataFrame* where:

* Each row represents a gene.
* Each column corresponds to a cell type.
* The values indicate the average gene expression, measured in counts per ten thousand (cptt).
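Because the result is an ordinary pandas DataFrame, standard pandas methods work on it directly. The sketch below copies six cell-type columns from the lung output above into a small DataFrame so that it runs without network access; in a live session you would call the same methods on `avg_gene_expr_lung`. Keep in mind that `idxmax` here only considers the columns included in this excerpt, not every lung cell type.

```python
import pandas as pd

# Excerpt of the human lung output above (six of the cell-type columns only).
lung_subset = pd.DataFrame(
    {
        "neutrophil": [0.054815, 1.529643, 0.016721, 0.0],
        "monocyte": [0.327787, 0.977728, 0.028325, 0.001285],
        "macrophage": [0.132754, 0.489622, 0.011413, 0.013633],
        "capillary": [0.219169, 1.357422, 0.579046, 0.001188],
        "pericyte": [0.252695, 0.708356, 0.980535, 0.0],
        "ionocyte": [0.227391, 0.714599, 2.810568, 0.0],
    },
    index=["TP53", "KRAS", "EGFR", "ALK"],
)

# Rows are genes and columns are cell types, so axis=1 scans across cell types
# and returns, for each gene, the column label holding its highest value.
top_cell_type = lung_subset.idxmax(axis=1)
print(top_cell_type)

# Pick one gene (a row) and rank its cell types from highest to lowest.
egfr_ranking = lung_subset.loc["EGFR"].sort_values(ascending=False)
print(egfr_ranking)
```

Selecting a row with `.loc[gene]` gives a Series indexed by cell type, which is convenient for ranking or plotting one gene at a time.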

## **Querying average gene expression for multiple organs**

For comprehensive analysis, you may want to explore gene expression across multiple organs within the same species. 

This example shows the average gene expression for four specified genes (*TP53*, *KRAS*, *EGFR*, *ALK*) across three human organs (*bladder*, *blood*, and *colon*).

```python
# Specify the organs to query
organ_list = ["bladder", "blood", "colon"]

# Loop through organ_list, query each organ, and display the results
for organ in organ_list:
    avg_gene_expr = api.average(
        organism="h_sapiens",
        organ=organ,
        features=["TP53", "KRAS", "EGFR", "ALK"],
    )

    print(f"Average gene expression in human {organ}:")
    display(avg_gene_expr)
```

Average gene expression in human bladder:

| Gene | mast | macrophage | B | plasma | T | NK | plasmacytoid | urothelial | venous | capillary | lymphatic | fibroblast | smooth muscle | pericyte |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 0.051055 | 0.18824 | 0.327816 | 0.053807 | 0.147462 | 0.314548 | 0.398251 | 0.162376 | 0.339704 | 0.217849 | 0.104213 | 0.111125 | 0.162751 | 0.112216 |
| KRAS | 0.564742 | 0.690973 | 1.319512 | 0.357356 | 1.065008 | 1.569626 | 0.354131 | 0.537687 | 0.719438 | 0.811878 | 0.722906 | 0.393044 | 0.612407 | 0.582806 |
| EGFR | 0.014139 | 0.0208 | 0.011188 | 0.006583 | 0.007657 | 0.006421 | 0.0 | 0.290386 | 0.041818 | 0.076526 | 0.0 | 0.897405 | 0.48993 | 0.349536 |
| ALK | 0.001072 | 0.007006 | 0.0 | 0.0 | 0.003232 | 0.0 | 0.0 | 0.000645 | 0.0 | 0.0 | 0.0 | 0.000287 | 0.001065 | 0.003441 |

Average gene expression in human blood:

| Gene | HSC | neutrophil | basophil | myeloid | monocyte | macrophage | dendritic | erythrocyte | B | plasma | T | NK | plasmacytoid | platelet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 0.429484 | 0.019245 | 0.550442 | 0.757884 | 0.28239 | 0.40935 | 0.153117 | 0.004213 | 0.287588 | 0.174535 | 0.205015 | 0.241251 | 0.401704 | 0.060797 |
| KRAS | 0.701804 | 1.378338 | 1.040511 | 0.776177 | 0.804196 | 0.684039 | 1.118797 | 0.02061 | 0.494324 | 0.654123 | 0.790222 | 0.788223 | 0.67032 | 0.370046 |
| EGFR | 0.0 | 0.00019 | 0.0 | 0.0 | 0.000307 | 0.0 | 0.0 | 0.000325 | 0.0 | 0.0 | 0.0 | 1e-05 | 0.0 | 0.0 |
| ALK | 0.0 | 0.0 | 0.0 | 0.0 | 0.001128 | 0.008228 | 0.0 | 0.0 | 0.0 | 0.003425 | 0.001432 | 0.000239 | 0.0 | 0.0 |

Average gene expression in human colon:

| Gene | neutrophil | mast | monocyte | B | plasma | T | goblet | brush | crypt | transit amp | enterocyte | paneth | venous | capillary | fibroblast | enteroendocrine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TP53 | 0.111315 | 0.033383 | 0.085653 | 0.185189 | 0.025554 | 0.06861 | 0.063521 | 0.013328 | 0.267211 | 0.449279 | 0.089426 | 0.076705 | 0.239154 | 0.0 | 0.13657 | 0.236432 |
| KRAS | 0.864672 | 0.984021 | 0.556534 | 2.100426 | 0.726135 | 0.985572 | 0.522061 | 0.13228 | 0.424796 | 0.55747 | 0.619195 | 0.907783 | 0.504579 | 1.104401 | 0.388044 | 1.0116 |
| EGFR | 0.058211 | 0.0 | 0.0177 | 0.016101 | 0.00332 | 0.012897 | 0.183984 | 0.011498 | 0.225115 | 0.284618 | 0.174868 | 0.074162 | 0.0 | 0.111962 | 1.088221 | 0.146555 |
| ALK | 0.0 | 0.0 | 0.035261 | 0.0 | 0.001314 | 0.00154 | 0.0008 | 0.0 | 0.0 | 0.0 | 0.000854 | 0.0 | 0.0 | 0.0 | 0.002482 | 0.0 |


#### Output
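* Each call returns one *Pandas DataFrame*, so the loop displays one DataFrame per queried organ. The columns differ between organs because each organ has its own set of cell types.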
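The loop above only displays each result, so nothing is kept for a side-by-side comparison afterwards. One option is to store the results in a dictionary keyed by organ and join them with `pandas.concat`. The sketch below does that with a hypothetical helper, `combine_organs`; the small DataFrames are excerpts of the printed tables standing in for live API results.

```python
import pandas as pd


def combine_organs(frames):
    """Hypothetical helper: join per-organ results (genes x cell types) column-wise.

    `frames` maps organ name -> DataFrame. The result has a two-level column
    index (organ, cell_type), so identical cell-type names from different
    organs stay distinct.
    """
    return pd.concat(frames, axis=1, names=["organ", "cell_type"])


# In a live session the dictionary could be filled with the same call as above:
# frames = {organ: api.average(organism="h_sapiens", organ=organ,
#                              features=["TP53", "KRAS", "EGFR", "ALK"])
#           for organ in organ_list}
# Here, small excerpts of the printed tables stand in for the API results.
genes = ["TP53", "KRAS"]
frames = {
    "bladder": pd.DataFrame({"mast": [0.051055, 0.564742],
                             "macrophage": [0.18824, 0.690973]}, index=genes),
    "blood": pd.DataFrame({"monocyte": [0.28239, 0.804196],
                           "macrophage": [0.40935, 0.684039]}, index=genes),
    "colon": pd.DataFrame({"monocyte": [0.085653, 0.556534],
                           "T": [0.06861, 0.985572]}, index=genes),
}

combined = combine_organs(frames)

# Compare one cell type across every organ in which it appears.
macrophages = combined.xs("macrophage", axis=1, level="cell_type")
print(macrophages)
```

Because the rows (genes) are identical in every organ, the concatenation aligns cleanly; `xs` then returns only the organs that actually contain the requested cell type.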

## **Querying average gene expression of the marker genes**

Marker genes are crucial for identifying and visualizing specific cell types within an organism and its organs.

The following example shows how to get the average expression of the top five marker genes for *neutrophils* in the *human lung*. First, call the `markers` method of the API to get the top five marker genes for neutrophils in the human lung:

```python
# Get the top five markers for neutrophils in the human lung
markers_in_human_lung_neu = api.markers(
    organism="h_sapiens",
    organ="lung",
    cell_type="neutrophil",
    number=5,
)
display(markers_in_human_lung_neu)
```

```text
['CXCR2', 'FCGR3B', 'IL1R2', 'G0S2', 'MTND5P32']
```

Then, call the `average` method to get the average expression of these five marker genes:

```python
# Get the average expression of the marker genes across all lung cell types
avg_gene_expr_markers = api.average(
    organism="h_sapiens",
    organ="lung",
    features=markers_in_human_lung_neu,
)

display(avg_gene_expr_markers)
```

| Gene | neutrophil | basophil | monocyte | macrophage | dendritic | B | plasma | T | NK | plasmacytoid | ... | capillary | CAP2 | lymphatic | fibroblast | alveolar fibroblast | smooth muscle | vascular smooth muscle | pericyte | mesothelial | ionocyte |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CXCR2 | 12.413691 | 0.01319 | 0.020326 | 0.068983 | 0.096474 | 0.0 | 0.0 | 0.016979 | 0.246515 | 0.188133 | ... | 0.006449 | 0.0 | 0.0 | 0.0 | 0.005169 | 0.0 | 0.0 | 0.0 | 0.015745 | 0.0 |
| FCGR3B | 11.70941 | 0.0 | 0.020036 | 0.011601 | 1.3e-05 | 0.0 | 0.0 | 0.029604 | 0.023175 | 0.188133 | ... | 0.006533 | 0.005917 | 0.0 | 0.004314 | 0.0 | 0.002679 | 0.0 | 0.012478 | 0.0 | 0.0 |
| IL1R2 | 62.680073 | 0.008473 | 1.464313 | 0.055198 | 1.171577 | 0.0 | 0.034122 | 0.094902 | 0.068963 | 0.0 | ... | 0.060423 | 0.037151 | 0.0 | 0.01131 | 0.011099 | 0.019712 | 0.020363 | 0.046019 | 0.0 | 0.22693 |
| G0S2 | 128.728485 | 0.227576 | 1.875759 | 0.382874 | 2.597629 | 0.096599 | 0.349806 | 0.100881 | 0.013899 | 0.465268 | ... | 0.066712 | 0.0204 | 0.117118 | 0.497634 | 1.780495 | 0.358832 | 1.845514 | 0.266992 | 0.256909 | 0.0 |
| MTND5P32 | 7.489127 | 0.00929 | 0.060642 | 0.060736 | 0.002854 | 0.0 | 0.0 | 0.020875 | 0.0 | 0.0 | ... | 0.002961 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.03654 | 0.0 |


#### Output
* The function returns a *Pandas DataFrame* with the average expression of the top five marker genes for neutrophils in the human lung across all available cell types.
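A quick sanity check on a marker list is to compare expression in the target cell type with the highest value in any other cell type. The sketch below does this with a hypothetical helper, `marker_specificity`, applied to a seven-column excerpt of the table above. The pseudocount of `1e-3` is an arbitrary choice to avoid dividing by zero; it is not an atlasapprox setting, and this ratio is not necessarily how the `markers` method ranks genes.

```python
import pandas as pd


def marker_specificity(avg_expr, cell_type, pseudocount=1e-3):
    """Hypothetical helper: per gene, expression in `cell_type` divided by
    the highest expression in any other cell type.

    `avg_expr` is a genes x cell types DataFrame shaped like the output of
    `api.average`. `pseudocount` keeps the ratio finite when no other cell
    type expresses the gene.
    """
    target = avg_expr[cell_type]
    best_other = avg_expr.drop(columns=cell_type).max(axis=1)
    return (target + pseudocount) / (best_other + pseudocount)


# Excerpt of the marker table above (seven of the cell-type columns only).
markers_subset = pd.DataFrame(
    {
        "neutrophil": [12.413691, 11.70941, 62.680073, 128.728485, 7.489127],
        "monocyte": [0.020326, 0.020036, 1.464313, 1.875759, 0.060642],
        "dendritic": [0.096474, 1.3e-05, 1.171577, 2.597629, 0.002854],
        "NK": [0.246515, 0.023175, 0.068963, 0.013899, 0.0],
        "alveolar fibroblast": [0.005169, 0.0, 0.011099, 1.780495, 0.0],
        "vascular smooth muscle": [0.0, 0.0, 0.020363, 1.845514, 0.0],
        "ionocyte": [0.0, 0.0, 0.22693, 0.0, 0.0],
    },
    index=["CXCR2", "FCGR3B", "IL1R2", "G0S2", "MTND5P32"],
)

ratios = marker_specificity(markers_subset, "neutrophil")
print(ratios.round(1))
```

With the real result you would pass `avg_gene_expr_markers` instead of the excerpt, so every cell type in the lung is considered.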

## **Start from scratch**

The API provides several functions that list the available organisms, organs, cell types, and genes. If you're starting from scratch, the following steps will help you explore what the API offers.

### 1. Get available organisms

The following example demonstrates how to retrieve a list of available organisms from the API. 

```python
organisms = api.organisms()

display(organisms)
```

```text
{'gene_expression': ['a_queenslandica',
  'a_thaliana',
  'c_elegans',
  'c_gigas',
  'c_hemisphaerica',
  'd_melanogaster',
  'd_rerio',
  'f_vesca',
  'h_miamia',
  'h_sapiens',
  'i_pulchra',
  'l_minuta',
  'm_leidyi',
  'm_murinus',
  'm_musculus',
  'n_vectensis',
  'o_sativa',
  'p_crozieri',
  'p_dumerilii',
  's_lacustris',
  's_mansoni',
  's_mediterranea',
  's_pistillata',
  's_purpuratus',
  't_adhaerens',
  't_aestivum',
  'x_laevis',
  'z_mays']}
```
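The returned object is a dictionary keyed by measurement type, and the organism IDs in the list above appear to follow a pattern of lowercase genus initial plus species epithet (for example `h_sapiens`, `d_rerio`). Assuming that pattern holds, a small helper can turn a scientific name into an ID before checking availability. `to_organism_id` is a hypothetical helper, not part of atlasapprox, and the dictionary below is an excerpt of the output above.

```python
def to_organism_id(scientific_name):
    """Hypothetical helper: turn 'Homo sapiens' into 'h_sapiens'.

    Assumes the ID pattern visible in the organism list (lowercase genus
    initial, underscore, lowercase species epithet). Not part of atlasapprox.
    """
    genus, species = scientific_name.split()[:2]
    return f"{genus[0].lower()}_{species.lower()}"


# Excerpt of the `api.organisms()` output above; the keys are measurement types.
organisms_excerpt = {"gene_expression": ["d_rerio", "h_sapiens", "m_musculus", "x_laevis"]}
available = set(organisms_excerpt["gene_expression"])

for name in ["Homo sapiens", "Danio rerio", "Gallus gallus"]:
    organism_id = to_organism_id(name)
    status = "available" if organism_id in available else "NOT available"
    print(f"{name} -> {organism_id}: {status}")
```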

### 2. Get available organs within your organism of interest

The following example takes an organism (*human*) as a parameter and returns the list of organs available for it.

```python
# List all available organs for human
human_organs = api.organs(organism="h_sapiens")

display(human_organs)
```

```text
['bladder',
 'blood',
 'colon',
 'eye',
 'fat',
 'gut',
 'heart',
 'kidney',
 'liver',
 'lung',
 'lymphnode',
 'mammary',
 'marrow',
 'muscle',
 'pancreas',
 'prostate',
 'salivary',
 'skin',
 'spleen',
 'thymus',
 'tongue',
 'trachea',
 'uterus']
```
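Before running a multi-organ loop like the one earlier in this tutorial, it can help to drop organs the atlas does not cover so that every query targets an existing organ. A minimal sketch, using a hypothetical `split_available` helper and an excerpt of the list above in place of a live `api.organs` call:

```python
def split_available(requested, available):
    """Hypothetical helper: split requested organs into found and missing lists."""
    available_set = set(available)
    found = [organ for organ in requested if organ in available_set]
    missing = [organ for organ in requested if organ not in available_set]
    return found, missing


# Excerpt of the human organ list above (a live session would use `human_organs`).
organs_excerpt = ["bladder", "blood", "colon", "liver", "lung"]

found, missing = split_available(["bladder", "brain", "colon"], organs_excerpt)
print("Query these organs:", found)
print("Not in the list:", missing)
```

The `found` list can then be used as `organ_list` in the loop, preserving the order in which the organs were requested.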

### 3. Get available cell types

This function returns the list of cell types available for the chosen organism (*human*) and organ (*lung*).

```python
celltypes_human_lung = api.celltypes(
    organism="h_sapiens",
    organ="lung",
    measurement_type="gene_expression",
)

display(celltypes_human_lung)
```

```text
['neutrophil',
 'basophil',
 'monocyte',
 'macrophage',
 'dendritic',
 'B',
 'plasma',
 'T',
 'NK',
 'plasmacytoid',
 'goblet',
 'AT1',
 'AT2',
 'club',
 'ciliated',
 'basal',
 'serous',
 'mucous',
 'arterial',
 'venous',
 'capillary',
 'CAP2',
 'lymphatic',
 'fibroblast',
 'alveolar fibroblast',
 'smooth muscle',
 'vascular smooth muscle',
 'pericyte',
 'mesothelial',
 'ionocyte']
```

You can also check if your cell type of interest is included:

```python
aim_celltype = "NK"

if aim_celltype in celltypes_human_lung:
    print(f"{aim_celltype} cell is available.")
else:
    print(f"{aim_celltype} cell is NOT available.")
```

```text
NK cell is available.
```
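Some cell type names are multi-word (for example `alveolar fibroblast` or `vascular smooth muscle`), so an exact membership test can miss related types. A case-insensitive substring search lists every matching name. This is sketched here with a hypothetical `find_celltypes` helper on an excerpt of the lung list above.

```python
def find_celltypes(celltypes, keyword):
    """Hypothetical helper: cell types whose name contains `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [name for name in celltypes if keyword in name.lower()]


# Excerpt of the human lung cell types above.
lung_celltypes_excerpt = [
    "NK", "fibroblast", "alveolar fibroblast",
    "smooth muscle", "vascular smooth muscle", "pericyte",
]

print(find_celltypes(lung_celltypes_excerpt, "fibroblast"))
print(find_celltypes(lung_celltypes_excerpt, "Smooth Muscle"))
```

In a live session you would pass `celltypes_human_lung` directly.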


### 4. Check available genes within your organism of interest

The following example takes an organism (*human*) as a parameter, retrieves the list of available genes, and checks whether a gene of interest is among them.

```python
organism = "h_sapiens"
aim_gene = "MTRNR2L12"

# api.features() returns a pandas Index; convert it to a list for searching
human_genes = api.features(organism=organism).tolist()

# Case-insensitive search: compare lowercase versions of both names
if aim_gene.lower() in [element.lower() for element in human_genes]:
    print(f"{aim_gene} gene is available in {organism}.")
else:
    print(f"{aim_gene} gene is NOT available in {organism}.")
```

```text
MTRNR2L12 gene is available in h_sapiens.
```
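The check above only reports whether a gene exists. If you typed the name in a different case, you may also want the exact spelling the atlas uses before passing it to `average`; this tutorial does not say whether `average` accepts other casings, so using the atlas's own spelling avoids relying on either behavior. The hypothetical helper below builds a lowercase-to-original lookup once, instead of lowercasing the whole list on every query, and returns `None` when the gene is absent. The short list stands in for `api.features(organism='h_sapiens').tolist()`.

```python
def resolve_gene_name(query, features):
    """Hypothetical helper: return the gene name as spelled in `features`, or None."""
    lookup = {name.lower(): name for name in features}
    return lookup.get(query.lower())


# Excerpt standing in for `api.features(organism='h_sapiens').tolist()`.
genes_excerpt = ["TP53", "KRAS", "EGFR", "ALK", "MTRNR2L12"]

print(resolve_gene_name("mtrnr2l12", genes_excerpt))
print(resolve_gene_name("NOT_A_GENE", genes_excerpt))
```

If two feature names differ only by case, the later one wins in this lookup. For many queries, build `lookup` once outside the function and reuse it.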




## **Conclusion**


This tutorial covered the basic usage of `average` in *atlasapprox*. Thank you for using the *atlasapprox* API. For more detailed information, please refer to the [official documentation](https://atlasapprox.readthedocs.io/en/latest/python/index.html).