# CRI iAtlas notebooks
## Exploring the pseudobulk single-cell RNA-seq data available in iAtlas

Repo: https://github.com/CRI-iAtlas/iatlas-notebooks/ 

Notebook: query_iatlas_single_cell_datasets.ipynb 

Date: September 13, 2024 

Author: Carolina Heimann

---

notebook repo: https://github.com/CRI-iAtlas/iatlas-notebooks

landing page: https://www.cri-iatlas.org/

portal: https://isb-cgc.shinyapps.io/iatlas/

email: support@cri-iatlas.org

---

## Getting started

```r
# Install any missing packages, then load them quietly.
packages <- c("magrittr", "dplyr", "tidyr", "ggplot2", "iatlasGraphQLClient")

invisible(lapply(packages, function(x) {
  if (!requireNamespace(x, quietly = TRUE)) {
    install.packages(x)
  }
  suppressPackageStartupMessages(library(x, character.only = TRUE))
}))
```

# Exploring the single-cell datasets and features


The iAtlas single-cell RNA-seq data are stored in a database that can be queried with functions from the `iatlasGraphQLClient` package. The available data include clinical annotation, pseudobulk gene expression, and immune features.

As a first step, let's take a look at the available datasets and features.

## Datasets available

```r
# Single-cell datasets available in the iAtlas database
sc_datasets <- iatlasGraphQLClient::query_datasets(types = "scrna")
sc_datasets
```

| display | name | type |
|---|---|---|
| Bi 2021 - ccRCC - PD-1 | Bi_2021 | scrna |
| Krishna 2021 - ccRCC, PD-1 | Krishna_2021 | scrna |
| Li 2022 - ccRCC | Li_2022 | scrna |
| MSK - SCLC | MSK | scrna |
| Shiao 2024 - BRCA, PD-1 | Shiao_2024 | scrna |
| Vanderbilt - colon polyps | Vanderbilt | scrna |
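The `name` column is the identifier that the query functions in this notebook take (as `cohorts`, or as `datasets` for `query_tags()`), while `display` is a human-readable label. As a minimal sketch, assuming you want every ccRCC dataset, the names can be selected by matching the label. The example rebuilds the table above as a toy data frame (`toy_datasets`, a hypothetical name) so it runs without a connection to the iAtlas API.

```r
# Toy copy of the sc_datasets table above, so this runs offline;
# in a live session, use sc_datasets directly.
toy_datasets <- data.frame(
  display = c("Bi 2021 - ccRCC - PD-1", "Krishna 2021 - ccRCC, PD-1",
              "Li 2022 - ccRCC", "MSK - SCLC",
              "Shiao 2024 - BRCA, PD-1", "Vanderbilt - colon polyps"),
  name    = c("Bi_2021", "Krishna_2021", "Li_2022", "MSK",
              "Shiao_2024", "Vanderbilt"),
  type    = "scrna",
  stringsAsFactors = FALSE
)

# `name` is the identifier passed to the query functions; `display` is a label.
# Keep only the clear cell renal cell carcinoma (ccRCC) datasets.
is_ccrcc <- grepl("ccRCC", toy_datasets$display, fixed = TRUE)
ccrcc_cohorts <- toy_datasets$name[is_ccrcc]
ccrcc_cohorts
# -> c("Bi_2021", "Krishna_2021", "Li_2022")
```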


## Immune Features

```r
# Immune features available for the Bi 2021 dataset
features_df <- iatlasGraphQLClient::query_features(cohorts = "Bi_2021")
head(features_df)
```

| name | display | class | order | unit | method_tag |
|---|---|---|---|---|---|
| age_at_diagnosis | Age At Diagnosis | Clinical |  | Year |  |
| Module3_IFN_score | IFN-gamma Response | Core Expression Signature | 1 | Score | ExpSig |
| LIexpression_score | Lymphocyte Infiltration | Core Expression Signature | 4 | Score | ExpSig |
| CSF1_response | Macrophage Regulation | Core Expression Signature | 3 | Score | ExpSig |
| Module11_Prolif_score | Proliferation | Auxiliary Expression Signature | 1 | Score | ExpSig |
| TGFB_score_21050467 | TGF-beta Response | Core Expression Signature | 2 | Score | ExpSig |
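The query functions expect the internal `name` of a feature (for example `Module3_IFN_score`), not its `display` label. A small helper can translate one into the other; `feature_names_for()` below is hypothetical and not part of `iatlasGraphQLClient`, and it runs on a toy copy of three rows from the table above.

```r
# Toy copy of a few rows of features_df (see the table above)
toy_features_df <- data.frame(
  name    = c("Module3_IFN_score", "LIexpression_score", "CSF1_response"),
  display = c("IFN-gamma Response", "Lymphocyte Infiltration",
              "Macrophage Regulation"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of iatlasGraphQLClient): translate display
# labels into internal feature names, failing loudly on unknown labels.
feature_names_for <- function(labels, features_df) {
  idx <- match(labels, features_df$display)
  if (anyNA(idx)) {
    stop("Unknown feature label(s): ",
         paste(labels[is.na(idx)], collapse = ", "))
  }
  features_df$name[idx]
}

feature_names_for("IFN-gamma Response", toy_features_df)
# -> "Module3_IFN_score"
```

With the real table, `feature_names_for("IFN-gamma Response", features_df)` would give the value to pass as `features =`.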


## Clinical Annotation

```r
# Clinical annotation available for the Bi 2021 dataset
clinical_options <- iatlasGraphQLClient::query_tags(datasets = "Bi_2021")
head(clinical_options)
```

| tag_name | tag_long_display | tag_short_display | tag_characteristics | tag_color | tag_order | tag_type |
|---|---|---|---|---|---|---|
| Biopsy_Site | Biopsy Site | Biopsy Site | Site where sample was collected from. |  | 18 | parent_group |
| Cancer_Tissue | Cancer Tissue | Cancer Tissue | Original tumor tissue. |  | 14 | parent_group |
| Clinical_Benefit | Clinical Benefit | Clinical Benefit | Patients have clinical benefit when mRECIST response is different than Progressive Disease. |  | 4 | parent_group |
| Clinical_Stage | Clinical Stage | Clinical Stage | Clinical stage of cancer. |  | 17 | parent_group |
| FFPE | FFPE Samples | FFPE Samples | Indicates whether the sample is FFPE or not. |  | 20 | parent_group |
| ICI_Pathway | ICI Pathway | ICI Pathway | Pathway that is being targeted by the ICI treatment. |  | 6 | parent_group |
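The `parent_tags` argument of `query_tag_samples_parents()` takes tag names whose `tag_type` is `"parent_group"`. All six rows above are parent tags; whether `query_tags()` can also return child tags (such as the `group`-type `partial_response_response` tag that appears in the clinical annotation output at the end of this section) is an assumption here, and the filter below is harmless if it does not. The toy table is a hypothetical stand-in for `clinical_options`.

```r
# Toy rows: three parent tags from the table above, plus the child tag
# "partial_response_response" (tag_type "group") seen in the clinical
# annotation output at the end of this section.
toy_clinical_options <- data.frame(
  tag_name = c("Biopsy_Site", "Clinical_Benefit", "ICI_Pathway",
               "partial_response_response"),
  tag_type = c("parent_group", "parent_group", "parent_group", "group"),
  stringsAsFactors = FALSE
)

# Keep only the parent tags, i.e. candidates for the `parent_tags`
# argument of query_tag_samples_parents().
is_parent <- toy_clinical_options$tag_type == "parent_group"
parent_tag_names <- toy_clinical_options$tag_name[is_parent]
parent_tag_names
# -> c("Biopsy_Site", "Clinical_Benefit", "ICI_Pathway")
```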


## Gene Expression

```r
# Genes with expression data for the Bi 2021 dataset
# (expression values are queried in the next section)
genes_df <- iatlasGraphQLClient::query_genes(cohorts = "Bi_2021")
head(genes_df)
```

| hgnc | entrez | description | friendly_name | io_landscape_name | gene_family | gene_function | immune_checkpoint | pathway | super_category |
|---|---|---|---|---|---|---|---|---|---|
| ABCB5 | 340273 | A protein highly expressed by melanoma cell, and it is also in the family of ABC transporter and P-glycoprotein family. |  | ABCB5 |  |  |  | ABC-family proteins mediated transport |  |
| ABCC1 | 4363 | MRP1 is a membrane transporter and it allows tumor cells to transport many chemotherapeutic compounds out of cancer cells. |  | MRP1 |  |  |  | ABC-family proteins mediated transport |  |
| ACKR3 | 57007 | CXCR7 is the receptor for chemokines CXCL11 and CXCL12. |  | CXCR7 |  |  |  | Chemokine signaling pathway |  |
| ACP3 | 55 | A enzyme produced by the prostate and generally is elevated in men with prostate cancer. |  | ACPP |  |  |  | Innate Immune System |  |
| ADAM17 | 6868 | Belongs to metallopeptidase family and help the maturation of TNF. |  | ADAM17 |  |  |  | Metallopeptidase |  |
| ADAM9 | 8754 | Belongs to metallopeptidase family. |  | ADAM9 |  |  |  | Metallopeptidase |  |
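Pseudobulk expression is queried by Entrez ID, while genes are usually referred to by HGNC symbol. Because `genes_df` carries both `hgnc` and `entrez`, the IDs can be looked up instead of typed by hand. A minimal sketch with a named vector as the lookup table; the toy data frame is hypothetical, combining ABCB5 from the table above with the four genes queried later in this notebook and the Entrez IDs used there.

```r
# Toy copy of genes_df (hgnc and entrez columns only)
toy_genes_df <- data.frame(
  hgnc   = c("ABCB5", "ADORA2A", "CTLA4", "EDNRB", "TLR4"),
  entrez = c(340273L, 135L, 1493L, 1910L, 7099L),
  stringsAsFactors = FALSE
)

# Named vector used as a lookup table: symbol -> Entrez ID
entrez_lookup <- setNames(toy_genes_df$entrez, toy_genes_df$hgnc)

wanted <- c("ADORA2A", "CTLA4", "EDNRB", "TLR4")
genes_entrez <- unname(entrez_lookup[wanted])

# A symbol missing from the cohort's gene list would come back as NA
stopifnot(!anyNA(genes_entrez))
genes_entrez
# -> c(135L, 1493L, 1910L, 7099L)
```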


# Querying the CRI iAtlas database

Each type of data has a query function in `iatlasGraphQLClient`, as summarized below:

- For pseudobulk immune features: `iatlasGraphQLClient::query_pseudobulk_feature_values()`
- For pseudobulk gene expression: `iatlasGraphQLClient::query_pseudobulk_expression()`
- For clinical annotation: `iatlasGraphQLClient::query_tag_samples_parents()`

To illustrate how to retrieve single-cell RNA-seq data from the iAtlas database, we will query the data that match the following parameters:

- *Dataset:* Bi 2021 - ccRCC - PD-1 (`Bi_2021`)

- *Immune feature:* IFN-gamma Response (`Module3_IFN_score`)

- *Gene expression:* ADORA2A, CTLA4, EDNRB, TLR4 (Entrez IDs 135, 1493, 1910, 7099)

- *Clinical annotation:* Response, Gender

```r
# For immune features, use names from features_df$name
features <- iatlasGraphQLClient::query_pseudobulk_feature_values(
  cohorts  = "Bi_2021",
  features = "Module3_IFN_score"
)

# For gene expression, the iAtlas database is queried by gene Entrez ID:
# ADORA2A = 135, CTLA4 = 1493, EDNRB = 1910, TLR4 = 7099
genes_entrez <- c(135, 1493, 1910, 7099)

genes <- iatlasGraphQLClient::query_pseudobulk_expression(
  cohorts = "Bi_2021",
  entrez  = genes_entrez
)

# The CRI iAtlas database also has precomputed statistics of frequency and
# average expression for genes in each cell type
cell_stats <- iatlasGraphQLClient::query_cell_stats(entrez = genes_entrez) %>%
  dplyr::filter(dataset_name == "Bi_2021")

# For clinical annotation, use names from clinical_options$tag_name
# that have tag_type == "parent_group"
clinical_annotation <- iatlasGraphQLClient::query_tag_samples_parents(
  cohorts     = "Bi_2021",
  parent_tags = c("Response", "Gender")
)

# All tables are returned in long format
head(features)
head(genes)
head(cell_stats)
head(clinical_annotation)
```

| feature_name | feature_display | feature_order | feature_class | sample_name | cell_type | value |
|---|---|---|---|---|---|---|
| Module3_IFN_score | IFN-gamma Response | 1 | Core Expression Signature | Li_ccRCC_PD45815_5739STDY8351225 | NK | 0.0 |
| Module3_IFN_score | IFN-gamma Response | 1 | Core Expression Signature | Li_ccRCC_PD45814_5739STDY8351240 | mast cell | 0.0 |
| Module3_IFN_score | IFN-gamma Response | 1 | Core Expression Signature | Li_ccRCC_PD47512_5739STDY9266993 | mast cell | 0.0 |
| Module3_IFN_score | IFN-gamma Response | 1 | Core Expression Signature | Li_ccRCC_PD45814_5739STDY8351216 | B cell | 0.0 |
| Module3_IFN_score | IFN-gamma Response | 1 | Core Expression Signature | Bi_ccRCC_P906 | B cell | 0.9800566 |
| Module3_IFN_score | IFN-gamma Response | 1 | Core Expression Signature | Li_ccRCC_PD47512_5739STDY9266994 | plasma cell | 0.0 |
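Each row of `features` is one pseudobulk value for a (sample, cell type) pair, so cell types can be compared by aggregating over samples. A base R sketch using a toy copy (`toy_features`, a hypothetical name) of the six rows above; with the full query result you may also want to restrict to the samples of interest before averaging.

```r
# Toy copy of the six rows of `features` shown above
toy_features <- data.frame(
  sample_name = c("Li_ccRCC_PD45815_5739STDY8351225",
                  "Li_ccRCC_PD45814_5739STDY8351240",
                  "Li_ccRCC_PD47512_5739STDY9266993",
                  "Li_ccRCC_PD45814_5739STDY8351216",
                  "Bi_ccRCC_P906",
                  "Li_ccRCC_PD47512_5739STDY9266994"),
  cell_type = c("NK", "mast cell", "mast cell", "B cell", "B cell",
                "plasma cell"),
  value = c(0, 0, 0, 0, 0.9800566, 0),
  stringsAsFactors = FALSE
)

# Mean IFN-gamma Response score per cell type, averaged over samples
ifn_by_cell_type <- tapply(toy_features$value, toy_features$cell_type, mean)
ifn_by_cell_type[["B cell"]]
# -> 0.4900283
```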


| gene_entrez | gene_hgnc | sample_name | cell_type | single_cell_seq_sum |
|---|---|---|---|---|
| 7099 | TLR4 | Bi_ccRCC_P55 | B cell | 0.0 |
| 7099 | TLR4 | Bi_ccRCC_P55 | Dendritic cell | 0.0 |
| 7099 | TLR4 | Bi_ccRCC_P55 | macrophage | 0.249042146 |
| 7099 | TLR4 | Bi_ccRCC_P55 | monocyte | 0.141509434 |
| 7099 | TLR4 | Bi_ccRCC_P55 | myeloid cell | 0.085714286 |
| 7099 | TLR4 | Bi_ccRCC_P55 | NK | 0.001618123 |
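Because `genes` is in long format (one row per gene, sample, and cell type), a common next step is to pivot it to wide format with one column per gene. A base R sketch on a toy copy of the six TLR4 rows above; it assumes each (sample, cell type, gene) combination appears only once, so `sum()` simply returns the stored value.

```r
# Toy copy of the six TLR4 rows of `genes` shown above
toy_genes <- data.frame(
  gene_hgnc   = "TLR4",
  sample_name = "Bi_ccRCC_P55",
  cell_type   = c("B cell", "Dendritic cell", "macrophage", "monocyte",
                  "myeloid cell", "NK"),
  single_cell_seq_sum = c(0, 0, 0.249042146, 0.141509434, 0.085714286,
                          0.001618123),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per sample and cell type, one column per gene.
# Combinations absent from the long table become NA.
row_id <- paste(toy_genes$sample_name, toy_genes$cell_type, sep = " | ")
expr_wide <- tapply(toy_genes$single_cell_seq_sum,
                    list(row_id, toy_genes$gene_hgnc), sum)
expr_wide["Bi_ccRCC_P55 | macrophage", "TLR4"]
```

With tidyr loaded, `tidyr::pivot_wider(genes, id_cols = c(sample_name, cell_type), names_from = gene_hgnc, values_from = single_cell_seq_sum)` is the tidyverse equivalent and returns a data frame rather than a matrix.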


| type | count | avg_expr | perc_expr | dataset_name | gene_entrez |
|---|---|---|---|---|---|
| fibroblast | 0 | 0.0 | 0.0 | Bi_2021 | 1493 |
| tumor | 72 | 0.953229 | 0.0089552239 | Bi_2021 | 7099 |
| T cell | 0 | 0.0 | 0.0 | Bi_2021 | 135 |
| plasma cell | 3 | 0.2347807 | 0.0064794816 | Bi_2021 | 1493 |
| fibroblast | 8 | 1.9022334 | 0.0879120879 | Bi_2021 | 1910 |
| macrophage | 1 | 0.7027845 | 0.0001979022 | Bi_2021 | 135 |
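`cell_stats` has one row per gene and cell type, which makes it easy to ask, for each gene, which cell type has the highest average expression. A base R sketch using split-apply-combine on a toy copy (hypothetical name `toy_cell_stats`, a subset of the columns) of the six rows above; with the full table the answer may of course differ.

```r
# Toy copy of the six rows of `cell_stats` shown above (subset of columns)
toy_cell_stats <- data.frame(
  type        = c("fibroblast", "tumor", "T cell", "plasma cell",
                  "fibroblast", "macrophage"),
  count       = c(0L, 72L, 0L, 3L, 8L, 1L),
  avg_expr    = c(0, 0.953229, 0, 0.2347807, 1.9022334, 0.7027845),
  gene_entrez = c(1493L, 7099L, 135L, 1493L, 1910L, 135L),
  stringsAsFactors = FALSE
)

# Split by gene, keep the row with the highest avg_expr, then recombine
by_gene <- split(toy_cell_stats, toy_cell_stats$gene_entrez)
top_rows <- lapply(by_gene, function(d) d[which.max(d$avg_expr), ])
top_cell_type <- do.call(rbind, top_rows)
top_cell_type[, c("gene_entrez", "type", "avg_expr")]
```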


| sample_name | parent_tag_name | parent_tag_long_display | parent_tag_short_display | parent_tag_characteristics | parent_tag_color | parent_tag_order | parent_tag_type | tag_name | tag_long_display | tag_short_display | tag_characteristics | tag_color | tag_order | tag_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bi_ccRCC_P912 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. |  | 1 | parent_group | na_response | Not available | Not available | Response information not available | #868A88 | 5 | group |
| Bi_ccRCC_P90 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. |  | 1 | parent_group | na_response | Not available | Not available | Response information not available | #868A88 | 5 | group |
| Bi_ccRCC_P76 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. |  | 1 | parent_group | na_response | Not available | Not available | Response information not available | #868A88 | 5 | group |
| Bi_ccRCC_P916 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. |  | 1 | parent_group | na_response | Not available | Not available | Response information not available | #868A88 | 5 | group |
| Bi_ccRCC_P55 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. |  | 1 | parent_group | partial_response_response | Partial Response | Partial Response | Partial Response (PR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #0072B2 | 2 | group |
| Bi_ccRCC_P915 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. |  | 1 | parent_group | partial_response_response | Partial Response | Partial Response | Partial Response (PR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #0072B2 | 2 | group |
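All four tables share `sample_name`, so expression or feature values can be annotated with clinical information by joining on it. Since two parent tags (Response and Gender) were requested, `clinical_annotation` can hold more than one row per sample, one per parent tag; keeping a single parent tag before joining avoids duplicating expression rows. A base R sketch with `merge()` on toy copies (hypothetical names) of the TLR4 rows and the clinical rows shown above.

```r
# Toy copies of rows shown above: TLR4 expression for sample P55 and the
# Response annotation for six samples
toy_genes <- data.frame(
  sample_name = "Bi_ccRCC_P55",
  gene_hgnc   = "TLR4",
  cell_type   = c("B cell", "Dendritic cell", "macrophage", "monocyte",
                  "myeloid cell", "NK"),
  single_cell_seq_sum = c(0, 0, 0.249042146, 0.141509434, 0.085714286,
                          0.001618123),
  stringsAsFactors = FALSE
)
toy_clinical <- data.frame(
  sample_name = c("Bi_ccRCC_P912", "Bi_ccRCC_P90", "Bi_ccRCC_P76",
                  "Bi_ccRCC_P916", "Bi_ccRCC_P55", "Bi_ccRCC_P915"),
  parent_tag_name   = "Response",
  tag_short_display = rep(c("Not available", "Partial Response"), c(4, 2)),
  stringsAsFactors = FALSE
)

# Keep one parent tag so each sample contributes a single annotation row
response <- toy_clinical[toy_clinical$parent_tag_name == "Response",
                         c("sample_name", "tag_short_display")]
names(response)[names(response) == "tag_short_display"] <- "response"

# Inner join on sample_name: each expression row gains the sample's response
genes_with_response <- merge(toy_genes, response, by = "sample_name")
genes_with_response
```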
