# CRI iAtlas notebooks
## Exploring the Immune Checkpoint Inhibition data available in iAtlas in Python.

Repo: https://github.com/CRI-iAtlas/iatlas-notebooks/ 

Notebook: ici_query_iatlas_data_python.ipynb 

Date: July 1, 2024 

Author: Carolina Heimann

---

notebook repo: https://github.com/CRI-iAtlas/iatlas-notebooks

landing page: https://www.cri-iatlas.org/

portal: https://isb-cgc.shinyapps.io/iatlas/

email: support@cri-iatlas.org

---

The CRI iAtlas data is stored in a database that can be queried using the R package [`iatlasGraphQLClient`](https://github.com/CRI-iAtlas/iatlasGraphQLClient).

To work with the data in Python, you can either run the queries in R and export the results for further analysis in Python (for more information, check our notebooks on [Exploring the Immune Checkpoint Inhibition data available in iAtlas in R](https://github.com/CRI-iAtlas/iatlas-notebooks/blob/main/ici_query_iatlas_data.ipynb) and [Querying TCGA features and expression in R](https://github.com/CRI-iAtlas/iatlas-notebooks/blob/main/querying_TCGA_features_and_expression.ipynb)), or use a library that provides interoperability between Python and R.

In this notebook, we will use the `rpy2` library to query the CRI iAtlas database and explore the datasets with bulk RNAseq data from studies of response to Immune Checkpoint Inhibitors (ICI).

## Getting started

In [36]:
#we'll need to use the R package that queries the iAtlas database, the libraries below are necessary to call R code
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter
from rpy2.robjects import pandas2ri


#libraries for data manipulation
import pandas as pd

We need to run a few functions to call the interface between R and Python and for output formatting. The code below will be used anytime we need to get a table from the CRI iAtlas database.

In [89]:
#Adding the reference to querying the production version of the API
ro.globalenv["API_URL"] = "https://api.cri-iatlas.org/api"

#Import the R package that queries the iAtlas database
iatlasGraphQLClient = importr('iatlasGraphQLClient')

#Helper that converts the R result of a query into a pandas dataframe
def get_dataframe_from_query(query_result):
    #the pandas conversion rules apply only inside this block
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.rpy2py(query_result)

# Exploring the ICI datasets and features


The iAtlas ICI data is stored in a database that can be queried with functions from the `iatlasGraphQLClient` package. 
We have clinical data, immune features, scores of predictors of response to immunotherapy, and quantile normalized gene expression.

You can get more information in iAtlas on [immune features and predictors of response to ICI](https://isb-cgc.shinyapps.io/iatlas/?module=datainfo), and on our annotation of [immunomodulator](https://isb-cgc.shinyapps.io/iatlas/?module=immunomodulators) genes. More information about these datasets is available in the [iAtlas - ICI Datasets Overview](https://isb-cgc.shinyapps.io/iatlas/?module=ioresponse_overview) module.

As a first step, let's take a look at the available datasets and features.

In [87]:
iatlas_datasets = get_dataframe_from_query(iatlasGraphQLClient.query_datasets())
iatlas_datasets

Unnamed: 0,display,name,type
1,"Chen 2016 - SKCM, Anti-CTLA4",Chen_CanDisc_2016,ici
2,"Choueiri 2016 - KIRC, PD-1",Choueiri_CCR_2016,ici
3,GTEX,GTEX,other
4,"Gide 2019 - SKCM, PD-1 +/- CTLA4",Gide_Cell_2019,ici
5,"Hugo 2016 - SKCM, PD-1",HugoLo_IPRES_2016,ici
6,"IMVigor210 - BLCA, PD-L1",IMVigor210,ici
7,"IMmotion150 - KIRC, PD-L1",IMmotion150,ici
8,"Kim 2018 - STAD, PD-1",Kim_NatMed_2018,ici
9,"Liu 2019 - SKCM, PD-1",Liu_NatMed_2019,ici
10,"Melero 2019 - GBM, Anti-PD-1",Melero_GBM_2019,ici


In [41]:
iatlas_datasets["type"].value_counts()

type
ici         15
scrna        6
analysis     2
other        1
Name: count, dtype: int64

In this notebook, we will explore the datasets of type `ici`, as those contain bulk RNAseq data from studies of response to Immune Checkpoint Inhibitors (ICI).

The datasets of type `scrna` contain single-cell RNAseq data with annotation of immune cells. A few of them have ICI treatments.

Finally, the other main type of dataset in CRI iAtlas is `analysis`. These datasets are derived from two cancer genomics efforts: The Cancer Genome Atlas (TCGA) and the Pan-Cancer Analysis of Whole Genomes (PCAWG).

In [43]:
ici_datasets = iatlas_datasets[iatlas_datasets["type"] == "ici"]
ici_datasets

Unnamed: 0,display,name,type
2,"Chen 2016 - SKCM, Anti-CTLA4",Chen_CanDisc_2016,ici
3,"Choueiri 2016 - KIRC, PD-1",Choueiri_CCR_2016,ici
5,"Gide 2019 - SKCM, PD-1 +/- CTLA4",Gide_Cell_2019,ici
6,"Hugo 2016 - SKCM, PD-1",HugoLo_IPRES_2016,ici
7,"IMVigor210 - BLCA, PD-L1",IMVigor210,ici
8,"IMmotion150 - KIRC, PD-L1",IMmotion150,ici
9,"Kim 2018 - STAD, PD-1",Kim_NatMed_2018,ici
12,"Liu 2019 - SKCM, PD-1",Liu_NatMed_2019,ici
14,"Melero 2019 - GBM, Anti-PD-1",Melero_GBM_2019,ici
15,"Miao 2018 - KIRC, PD-1 +/- CTLA4, PD-L1",Miao_Science_2018,ici


The display name of each dataset references the publication associated with the data, and also summarises the tumor type and ICI target involved in the study.
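
When a separate tumor type or target column is useful, the display names can be split into their parts. The sketch below is a minimal illustration, assuming every ICI display name follows the pattern `<publication> - <tumor type>, <target>` seen in the table above. `parse_display_name` is a hypothetical helper and is not part of `iatlasGraphQLClient`.

```python
def parse_display_name(display):
    """Split an ICI dataset display name into (publication, tumor_type, target).

    Assumes the pattern '<publication> - <tumor type>, <target>'.
    """
    publication, separator, rest = display.partition(" - ")
    if not separator:
        raise ValueError(f"Unexpected display name format: {display!r}")
    # Split on the first comma only, because some targets contain commas.
    tumor_type, _, target = rest.partition(",")
    return publication.strip(), tumor_type.strip(), target.strip()


for name in ["Chen 2016 - SKCM, Anti-CTLA4",
             "Miao 2018 - KIRC, PD-1 +/- CTLA4, PD-L1"]:
    print(parse_display_name(name))
```

Names that do not follow the pattern, such as `GTEX`, raise a `ValueError` rather than being split silently.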

## Immune Features

In [95]:
#immune features of all ICI samples.
with (ro.default_converter + pandas2ri.converter).context(): #we need this snippet of code when we want to send a parameter to the query
    features_df = (iatlasGraphQLClient.query_features(cohorts = ici_datasets["name"]))
features_df.head()

Unnamed: 0,name,display,class,order,unit,method_tag
1,B_cells_Aggregate2,B Cells,Immune Cell Proportion - Common Lymphoid and M...,3,Fraction,CIBERSORT
2,B_cells_Aggregate3,B Cells,Immune Cell Proportion - Differentiated Lympho...,4,Fraction,CIBERSORT
3,B_cells_memory,B Cells Memory,Immune Cell Proportion - Original,9,Fraction,CIBERSORT
4,B_cells_naive,B Cells Naive,Immune Cell Proportion - Original,8,Fraction,CIBERSORT
5,BIOCARTA_CTLA4_V_Bindea_Th1_Cells,CTLA4 vs Th1,Predictor of Response to Immune Checkpoint Tre...,-2147483648,Fraction,NA_character_
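
The value `-2147483648` in the `order` column is how R's integer missing value (`NA_integer_`) appears once it reaches Python, and `NA_character_` is the matching placeholder for missing strings. The sketch below shows one possible way to turn both into regular pandas missing values. It is a minimal example on a small frame copied from the output above, and it assumes the string placeholder is stored as the literal text `NA_character_`; in a live `rpy2` session it may instead be an rpy2 NA object and need a different check. `clean_r_missing` is a hypothetical helper name.

```python
import pandas as pd

# R stores NA_integer_ as the smallest 32-bit integer.
R_NA_INTEGER = -2147483648
# Assumption: missing strings appear as this literal text after conversion.
R_NA_STRING = "NA_character_"


def clean_r_missing(df):
    """Return a copy of df in which R missing-value placeholders become NaN."""
    placeholders = df.isin([R_NA_INTEGER, R_NA_STRING])
    return df.mask(placeholders)


# Two rows copied from the features table shown above.
example = pd.DataFrame({
    "name": ["B_cells_memory", "BIOCARTA_CTLA4_V_Bindea_Th1_Cells"],
    "order": [9, -2147483648],
    "method_tag": ["CIBERSORT", "NA_character_"],
})

cleaned = clean_r_missing(example)
print(cleaned.isna().sum())
```

Cleaning these placeholders before sorting or aggregating avoids treating the sentinel integer as a real ordering value.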


## Clinical Annotation

In [97]:
#clinical annotation that is available for the ici datasets
with (ro.default_converter + pandas2ri.converter).context():
    clinical_options = iatlasGraphQLClient.query_tags(datasets = ici_datasets["name"])
clinical_options.head()

Unnamed: 0,tag_name,tag_long_display,tag_short_display,tag_characteristics,tag_color,tag_order,tag_type
1,Biopsy_Site,Biopsy Site,Biopsy Site,Site where sample was collected from.,-2147483648,18,parent_group
2,Cancer_Tissue,Cancer Tissue,Cancer Tissue,Original tumor tissue.,-2147483648,14,parent_group
3,Clinical_Benefit,Clinical Benefit,Clinical Benefit,Patients have clinical benefit when mRECIST re...,-2147483648,4,parent_group
4,Clinical_Stage,Clinical Stage,Clinical Stage,Clinical stage of cancer.,-2147483648,17,parent_group
5,FFPE,FFPE Samples,FFPE Samples,Indicates whether the sample is FFPE or not.,-2147483648,20,parent_group


## Gene Expression

In [99]:
#genes with expression data available for the samples in the ICI datasets (we will query expression values in the next section)
with (ro.default_converter + pandas2ri.converter).context():
    genes_df = iatlasGraphQLClient.query_genes(cohorts = ici_datasets["name"])
genes_df.head()

Unnamed: 0,hgnc,entrez,description,friendly_name,io_landscape_name,gene_family,gene_function,immune_checkpoint,pathway,super_category
1,ABCB5,340273,"A protein highly expressed by melanoma cell, a...",NA_character_,ABCB5,NA_character_,NA_character_,NA_character_,ABC-family proteins mediated transport,NA_character_
2,ABCC1,4363,MRP1 is a membrane transporter and it allows t...,NA_character_,MRP1,NA_character_,NA_character_,NA_character_,ABC-family proteins mediated transport,NA_character_
3,ACKR3,57007,CXCR7 is the receptor for chemokines CXCL11 an...,NA_character_,CXCR7,NA_character_,NA_character_,NA_character_,Chemokine signaling pathway,NA_character_
4,ACP3,55,A enzyme produced by the prostate and generall...,NA_character_,ACPP,NA_character_,NA_character_,NA_character_,Innate Immune System,NA_character_
5,ADAM17,6868,Belongs to metallopeptidase family and help th...,NA_character_,ADAM17,NA_character_,NA_character_,NA_character_,Metallopeptidase,NA_character_


## Annotation of response to immunotherapy

One key annotation in the ICI datasets is how a patient responds to immunotherapy. The response to therapy with Immune Checkpoint Inhibitors is originally annotated following the mRECIST guidelines, and has 4 different levels.

In [103]:
with (ro.default_converter + pandas2ri.converter).context():
    df = iatlasGraphQLClient.query_tags(parent_tags = "Response")
df

Unnamed: 0,tag_name,tag_long_display,tag_short_display,tag_characteristics,tag_color,tag_order,tag_type
1,complete_response_response,Complete Response,Complete Response,Complete Response (CR) following modified Resp...,#009E73,1,group
2,na_response,Not available,Not available,Response information not available,#868A88,5,group
3,partial_response_response,Partial Response,Partial Response,Partial Response (PR) following modified Respo...,#0072B2,2,group
4,progressive_disease_response,Progressive Disease,Progressive Disease,Progressive Disease (PD) following modified Re...,#D55E00,4,group
5,stable_disease_response,Stable Disease,Stable Disease,Stable Disease (SD) following modified Respons...,#F0E442,3,group


The mRECIST annotation is used to derive three other annotations: Responder, Clinical Benefit, and Progression. Each one groups the mRECIST levels into two categories.

In [101]:
outcome_variables  = ["Responder", "Clinical_Benefit", "Progression"]
get_dataframe_from_query(
    iatlasGraphQLClient.query_tags_with_parent_tags(parent_tags = outcome_variables)
    )[["parent_tag_name", "parent_tag_characteristics", "tag_name", "tag_short_display", "tag_characteristics"]]

Unnamed: 0,parent_tag_name,parent_tag_characteristics,tag_name,tag_short_display,tag_characteristics
1,Clinical_Benefit,Patients have clinical benefit when mRECIST re...,false_clinical_benefit,No Clinical Benefit,Patient with mRECIST of Progressive Disease
2,Progression,Progressors are defined as patients with mRECI...,false_progression,Non-Progressor,"Patient with mRECIST of Complete Response, Par..."
3,Responder,Responders are defined as patients with mRECIS...,false_responder,Non-Responder,Patient with mRECIST of Progressive Disease or...
4,Clinical_Benefit,Patients have clinical benefit when mRECIST re...,na_clinical_benefit,Not available,Clinical Benefit information not available
5,Progression,Progressors are defined as patients with mRECI...,na_progression,Not available,Progression information not available
6,Responder,Responders are defined as patients with mRECIS...,na_responder,Not available,Responder information not available
7,Clinical_Benefit,Patients have clinical benefit when mRECIST re...,true_clinical_benefit,Clinical Benefit,"Patient with mRECIST of Complete Response, Par..."
8,Progression,Progressors are defined as patients with mRECI...,true_progression,Progressor,Patient with mRECIST of Progressive Disease
9,Responder,Responders are defined as patients with mRECIS...,true_responder,Responder,Patient with mRECIST of Partial Response or Co...
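
The grouping described in this table can be written as a small lookup. The sketch below is an illustration only: several descriptions above are truncated, so the exact groupings are an assumption (Responder as Complete or Partial Response, Progression as Progressive Disease, and Clinical Benefit as any known level other than Progressive Disease). `derive_outcomes` is a hypothetical helper, not part of `iatlasGraphQLClient`; the tag names are the ones shown in the tables above.

```python
# mRECIST tags grouped under each binary outcome. The groupings are an
# assumption reconstructed from the (partly truncated) descriptions above.
RESPONDER_TAGS = {"complete_response_response", "partial_response_response"}
PROGRESSION_TAGS = {"progressive_disease_response"}
KNOWN_TAGS = RESPONDER_TAGS | PROGRESSION_TAGS | {"stable_disease_response"}


def derive_outcomes(response_tag):
    """Map one mRECIST tag to Responder, Clinical_Benefit and Progression tags."""
    if response_tag not in KNOWN_TAGS:
        # Covers na_response and any unexpected value.
        return {
            "Responder": "na_responder",
            "Clinical_Benefit": "na_clinical_benefit",
            "Progression": "na_progression",
        }
    is_responder = response_tag in RESPONDER_TAGS
    is_progressor = response_tag in PROGRESSION_TAGS
    return {
        "Responder": f"{str(is_responder).lower()}_responder",
        "Clinical_Benefit": f"{str(not is_progressor).lower()}_clinical_benefit",
        "Progression": f"{str(is_progressor).lower()}_progression",
    }


print(derive_outcomes("stable_disease_response"))
```

Under these assumptions, Stable Disease is the level where the three annotations disagree most visibly: the patient is a non-responder but still counts as having clinical benefit.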


## Samples and treatments available

Now, let's take a closer look at the ICI datasets that we have available. First, we will query our database and organize the results to see the TCGA Study and the drug administered in each study:

In [119]:
#Treatment information
overview_treatment = ["TCGA_Study", "ICI_Rx"] #name of the groups of interest. Check the clinical_options df for more options

#Organize a dataframe with all patients IDs and samples IDs
with (ro.default_converter + pandas2ri.converter).context(): 
  all_ici_patients = iatlasGraphQLClient.query_dataset_samples(datasets = ici_datasets["name"]).merge( #add patient id info
    get_dataframe_from_query(iatlasGraphQLClient.query_sample_patients()))

#now we add the treatment information
with (ro.default_converter + pandas2ri.converter).context():
  #query the values of TCGA_Study and ICI_Rx for each sample
  treatment_df = iatlasGraphQLClient.query_tag_samples_parents(parent_tags = overview_treatment).merge( 
    all_ici_patients
  )

In [125]:
# We can summarise the information in this dataframe
summary = treatment_df.groupby(['dataset_display', 'parent_tag_name', 'tag_name']).agg(
    n_samples=pd.NamedAgg(column='sample_name', aggfunc='size'),
    n_patients=pd.NamedAgg(column='patient_name', aggfunc='nunique')
).reset_index()
summary


Unnamed: 0,dataset_display,parent_tag_name,tag_name,n_samples,n_patients
0,"Choueiri 2016 - KIRC, PD-1",ICI_Rx,nivolumab,32,16
1,"Gide 2019 - SKCM, PD-1 +/- CTLA4",ICI_Rx,ipilimumab_pembrolizumab,32,26
2,"Gide 2019 - SKCM, PD-1 +/- CTLA4",ICI_Rx,nivolumab,22,9
3,"IMVigor210 - BLCA, PD-L1",ICI_Rx,atezolizumab,348,348
4,"IMVigor210 - BLCA, PD-L1",TCGA_Study,BLCA,348,348
5,"IMmotion150 - KIRC, PD-L1",ICI_Rx,atezolizumab,174,174
6,"Liu 2019 - SKCM, PD-1",ICI_Rx,nivolumab,102,51
7,"Melero 2019 - GBM, Anti-PD-1",ICI_Rx,nivolumab,120,30
8,"Miao 2018 - KIRC, PD-1 +/- CTLA4, PD-L1",ICI_Rx,atezolizumab,2,2
9,"Miao 2018 - KIRC, PD-1 +/- CTLA4, PD-L1",ICI_Rx,nivolumab,22,11


Some of these datasets have more than one sample per patient: in those studies, some patients had samples collected before (`pre_sample_treatment`), during (`on_sample_treatment`), or after (`post_sample_treatment`) ICI therapy.

In [133]:
with (ro.default_converter + pandas2ri.converter).context():
    timepoint = all_ici_patients.merge(
        iatlasGraphQLClient.query_tag_samples_parents(parent_tags = "Sample_Treatment"),
    ).groupby(['dataset_display', 'parent_tag_name', 'tag_name']).agg(
        n_samples=pd.NamedAgg(column='sample_name', aggfunc='size'),
        n_patients=pd.NamedAgg(column='patient_name', aggfunc='nunique')
    ).reset_index()

timepoint

Unnamed: 0,dataset_display,parent_tag_name,tag_name,n_samples,n_patients
0,"Chen 2016 - SKCM, Anti-CTLA4",Sample_Treatment,on_sample_treatment,15,15
1,"Chen 2016 - SKCM, Anti-CTLA4",Sample_Treatment,post_sample_treatment,7,7
2,"Chen 2016 - SKCM, Anti-CTLA4",Sample_Treatment,pre_sample_treatment,32,31
3,"Choueiri 2016 - KIRC, PD-1",Sample_Treatment,pre_sample_treatment,16,16
4,"Gide 2019 - SKCM, PD-1 +/- CTLA4",Sample_Treatment,on_sample_treatment,18,18
5,"Gide 2019 - SKCM, PD-1 +/- CTLA4",Sample_Treatment,pre_sample_treatment,73,73
6,"Hugo 2016 - SKCM, PD-1",Sample_Treatment,on_sample_treatment,1,1
7,"Hugo 2016 - SKCM, PD-1",Sample_Treatment,pre_sample_treatment,26,26
8,"IMVigor210 - BLCA, PD-L1",Sample_Treatment,pre_sample_treatment,348,348
9,"IMmotion150 - KIRC, PD-L1",Sample_Treatment,pre_sample_treatment,263,263
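
For paired analyses it can be useful to list the patients that have both a pre-treatment and an on-treatment sample. The sketch below shows one way to do this with pandas. It assumes a table with the `patient_name`, `sample_name` and `tag_name` columns used above; the rows are hypothetical placeholders, not real iAtlas identifiers.

```python
import pandas as pd

# Hypothetical rows with the same column names as the merged table above.
samples = pd.DataFrame({
    "patient_name": ["P1", "P1", "P2", "P3", "P3"],
    "sample_name": ["P1-pre", "P1-on", "P2-pre", "P3-pre", "P3-on"],
    "tag_name": [
        "pre_sample_treatment", "on_sample_treatment",
        "pre_sample_treatment",
        "pre_sample_treatment", "on_sample_treatment",
    ],
})

# Collect the set of timepoints observed for each patient.
timepoints_per_patient = samples.groupby("patient_name")["tag_name"].apply(set)

# Keep patients whose timepoints include both required tags.
required = {"pre_sample_treatment", "on_sample_treatment"}
paired_patients = sorted(
    patient
    for patient, tags in timepoints_per_patient.items()
    if required <= tags
)

# Samples that belong to paired patients only.
paired_samples = samples[samples["patient_name"].isin(paired_patients)]
print(paired_patients)
```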


# Querying the CRI iAtlas database

Each type of data has a query function in `iatlasGraphQLClient`, as summarized below:

- For immune features: `iatlasGraphQLClient.query_feature_values()`
- For gene expression data: `iatlasGraphQLClient.query_gene_expression()`
- For clinical annotation: `iatlasGraphQLClient.query_tag_samples_parents()`

To illustrate how to get the ICI data from the iAtlas database, we will get the data that follows the parameters listed below:

- *Dataset:* Hugo 2016 - SKCM, PD-1

- *Features:* IMPRES (Auslander et al, 2018), IPRES (Vincent lab analysis of Hugo et al data, unpublished), Cytolytic Score (Roufas et al, 2018), CTLA4/Th1 (Nishimura, 2004; Bindea et al., 2013)

- *Gene expression:* ADORA2A, CTLA4, EDNRB, TLR4

- *Clinical Annotation:* Response, Gender.

In [141]:
with (ro.default_converter + pandas2ri.converter).context():
    #For immune features, use names from features_df["name"]
    features = iatlasGraphQLClient.query_feature_values(cohorts = "HugoLo_IPRES_2016", 
                                                        features = ["IMPRES", 
                                                                    "Vincent_IPRES_NonResponder", 
                                                                    "Cytolytic_Score", 
                                                                    "BIOCARTA_CTLA4_V_Bindea_Th1_Cells"]) 

    #For gene expression, we need the gene Entrez ID to query the iAtlas database
    genes_entrez = [135,
                    1493,
                    1910,
                    7099]

    genes = iatlasGraphQLClient.query_gene_expression(cohorts = "HugoLo_IPRES_2016", 
                                                    entrez = genes_entrez)

    #For clinical annotation, use names from clinical_options["tag_name"] that have tag_type == "parent_group"

    clinical_annotation  = iatlasGraphQLClient.query_tag_samples_parents(cohorts = "HugoLo_IPRES_2016", 
                                                        parent_tags = ["Response", "Gender"]) 
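
The Entrez IDs in the cell above are typed by hand. Since the gene table from the Gene Expression section has both `hgnc` and `entrez` columns, the IDs can also be looked up from gene symbols. The sketch below is a minimal illustration on a few rows copied from the outputs in this notebook; `entrez_for_symbols` is a hypothetical helper, not part of `iatlasGraphQLClient`.

```python
import pandas as pd

# A few symbol/ID pairs copied from the outputs shown in this notebook.
genes_table = pd.DataFrame({
    "hgnc": ["ABCB5", "ABCC1", "ACKR3", "ACP3", "ADAM17", "TLR4"],
    "entrez": [340273, 4363, 57007, 55, 6868, 7099],
})


def entrez_for_symbols(genes_table, symbols):
    """Return the Entrez IDs for the given HGNC symbols, in the same order."""
    lookup = genes_table.set_index("hgnc")["entrez"]
    missing = [symbol for symbol in symbols if symbol not in lookup.index]
    if missing:
        # Fail loudly instead of silently dropping genes from the query.
        raise KeyError(f"No Entrez ID found for: {missing}")
    return [int(lookup[symbol]) for symbol in symbols]


print(entrez_for_symbols(genes_table, ["ABCC1", "TLR4"]))
```

In the notebook itself, the same function could be called with `genes_df` in place of the small example table.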


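All three tables (`features`, `genes` and `clinical_annotation`) are returned in long format, with one row per sample and variable.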

In [142]:
#All the tables are in the long format.
features.head()


Unnamed: 0,sample,feature_name,feature_display,feature_value,feature_order,feature_class
1,HugoLo_IPRES_2016-Pt01-ar-279,BIOCARTA_CTLA4_V_Bindea_Th1_Cells,CTLA4 vs Th1,1.001012,-2147483648,Predictor of Response to Immune Checkpoint Tre...
2,HugoLo_IPRES_2016-Pt02-ar-280,BIOCARTA_CTLA4_V_Bindea_Th1_Cells,CTLA4 vs Th1,1.190649,-2147483648,Predictor of Response to Immune Checkpoint Tre...
3,HugoLo_IPRES_2016-Pt04-ar-281,BIOCARTA_CTLA4_V_Bindea_Th1_Cells,CTLA4 vs Th1,1.179757,-2147483648,Predictor of Response to Immune Checkpoint Tre...
4,HugoLo_IPRES_2016-Pt05-ar-282,BIOCARTA_CTLA4_V_Bindea_Th1_Cells,CTLA4 vs Th1,1.171726,-2147483648,Predictor of Response to Immune Checkpoint Tre...
5,HugoLo_IPRES_2016-Pt06-ar-283,BIOCARTA_CTLA4_V_Bindea_Th1_Cells,CTLA4 vs Th1,1.290074,-2147483648,Predictor of Response to Immune Checkpoint Tre...


In [143]:
genes.head()

Unnamed: 0,sample,entrez,hgnc,rna_seq_expr
1,HugoLo_IPRES_2016-Pt20-ar-294,7099,TLR4,1315.887
2,HugoLo_IPRES_2016-Pt22-ar-295,7099,TLR4,4520.116
3,HugoLo_IPRES_2016-Pt05-ar-282,7099,TLR4,394.303
4,HugoLo_IPRES_2016-Pt14-ar-290,7099,TLR4,2374.693
5,HugoLo_IPRES_2016-Pt06-ar-283,7099,TLR4,3383.161


In [144]:
clinical_annotation.head()

Unnamed: 0,sample_name,parent_tag_name,parent_tag_long_display,parent_tag_short_display,parent_tag_characteristics,parent_tag_color,parent_tag_order,parent_tag_type,tag_name,tag_long_display,tag_short_display,tag_characteristics,tag_color,tag_order,tag_type
1,HugoLo_IPRES_2016-Pt08-ar-285,Response,mRECIST Response,mRECIST Response,Response to treatment following modified Respo...,-2147483648,1,parent_group,complete_response_response,Complete Response,Complete Response,Complete Response (CR) following modified Resp...,#009E73,1,group
2,HugoLo_IPRES_2016-Pt27-ar-298,Response,mRECIST Response,mRECIST Response,Response to treatment following modified Respo...,-2147483648,1,parent_group,complete_response_response,Complete Response,Complete Response,Complete Response (CR) following modified Resp...,#009E73,1,group
3,HugoLo_IPRES_2016-Pt13-ar-289,Response,mRECIST Response,mRECIST Response,Response to treatment following modified Respo...,-2147483648,1,parent_group,complete_response_response,Complete Response,Complete Response,Complete Response (CR) following modified Resp...,#009E73,1,group
4,HugoLo_IPRES_2016-Pt09-ar-286,Response,mRECIST Response,mRECIST Response,Response to treatment following modified Respo...,-2147483648,1,parent_group,complete_response_response,Complete Response,Complete Response,Complete Response (CR) following modified Resp...,#009E73,1,group
5,HugoLo_IPRES_2016-Pt05-ar-282,Response,mRECIST Response,mRECIST Response,Response to treatment following modified Respo...,-2147483648,1,parent_group,partial_response_response,Partial Response,Partial Response,Partial Response (PR) following modified Respo...,#0072B2,2,group
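
For many analyses, one row per sample is more convenient than the long format. The sketch below shows one possible way to pivot the three tables and join them on the sample name. It uses small frames with the same column names as the outputs above and values copied from them; it assumes each sample has at most one value per feature, gene and parent tag, since `pivot` raises an error on duplicates.

```python
import pandas as pd

# Small frames mirroring the long-format outputs above.
features = pd.DataFrame({
    "sample": ["HugoLo_IPRES_2016-Pt05-ar-282", "HugoLo_IPRES_2016-Pt06-ar-283"],
    "feature_name": ["BIOCARTA_CTLA4_V_Bindea_Th1_Cells"] * 2,
    "feature_value": [1.171726, 1.290074],
})
genes = pd.DataFrame({
    "sample": ["HugoLo_IPRES_2016-Pt05-ar-282", "HugoLo_IPRES_2016-Pt06-ar-283"],
    "hgnc": ["TLR4", "TLR4"],
    "rna_seq_expr": [394.303, 3383.161],
})
clinical_annotation = pd.DataFrame({
    "sample_name": ["HugoLo_IPRES_2016-Pt05-ar-282"],
    "parent_tag_name": ["Response"],
    "tag_short_display": ["Partial Response"],
})

# One column per feature, gene and clinical variable, indexed by sample.
features_wide = features.pivot(
    index="sample", columns="feature_name", values="feature_value")
genes_wide = genes.pivot(
    index="sample", columns="hgnc", values="rna_seq_expr")
clinical_wide = clinical_annotation.pivot(
    index="sample_name", columns="parent_tag_name", values="tag_short_display")

# Left joins on the index keep every sample that has feature values.
wide = features_wide.join(genes_wide).join(clinical_wide)
print(wide)
```

Samples without a clinical annotation in the example, such as the second one here, keep their feature and expression values and get a missing value in the `Response` column.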
