# CRI iAtlas notebooks
## Exploring the Immune Checkpoint Inhibition data available in iAtlas.

Repo: https://github.com/CRI-iAtlas/iatlas-notebooks/ 

Notebook: ici_query_iatlas_data.ipynb 

Date: September 19, 2022 

Author: Carolina Heimann

---

notebook repo: https://github.com/CRI-iAtlas/iatlas-notebooks

landing page: https://www.cri-iatlas.org/

portal: https://isb-cgc.shinyapps.io/iatlas/

email: support@cri-iatlas.org

---

# Contents

- [Getting started](#Getting-started)
- [Exploring the ICI datasets and features](#Exploring-the-ICI-datasets-and-features)
    - [Datasets available](#Datasets-available)
    - [Immune Features](#Immune-Features)
    - [Clinical Annotation](#Clinical-Annotation)
    - [Gene Expression](#Gene-Expression)
    - [Annotation of response to immunotherapy](#Annotation-of-response-to-immunotherapy)
    - [Samples and treatments available](#Samples-and-treatments-available)
- [Querying the CRI iAtlas database](#Querying-the-CRI-iAtlas-database)

## Getting started

```r
# Install (if needed) and load the libraries used in this notebook.
packages <- c("magrittr", "dplyr", "tidyr", "ggplot2", "iatlasGraphQLClient")

for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}

# Helper functions:
# git clone the notebook repo to get this file,
# or see: https://github.com/CRI-iAtlas/iatlas-notebooks/blob/main/functions/notebook_functions.R
source('functions/notebook_functions.R')
```



# Exploring the ICI datasets and features


The iAtlas ICI data is stored in a database that can be queried with functions from the `iatlasGraphQLClient` package. 
It includes clinical data, immune features, scores of predictors of response to immunotherapy, and quantile-normalized gene expression.

You can find more information in iAtlas on [immune features and predictors of response to ICI](https://isb-cgc.shinyapps.io/iatlas/?module=datainfo), and on our annotation of [immunomodulator](https://isb-cgc.shinyapps.io/iatlas/?module=immunomodulators) genes. More details about these datasets are available in the [iAtlas - ICI Datasets Overview](https://isb-cgc.shinyapps.io/iatlas/?module=ioresponse_overview) module.

As a first step, let's take a look at the available datasets and features.

## Datasets available

```r
# Getting ICI data:
# datasets that we have in the iAtlas database
ici_datasets <- iatlasGraphQLClient::query_datasets(types = "ici")
ici_datasets
```

| display | name | type |
| --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` |
| Chen 2016 - SKCM, Anti-CTLA4 | Chen_CanDisc_2016 | ici |
| Choueiri 2016 - KIRC, PD-1 | Choueiri_CCR_2016 | ici |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | Gide_Cell_2019 | ici |
| Hugo 2016 - SKCM, PD-1 | HugoLo_IPRES_2016 | ici |
| IMmotion150 - KIRC, PD-L1 | IMmotion150 | ici |
| IMVigor210 - BLCA, PD-L1 | IMVigor210 | ici |
| Kim 2018 - STAD, PD-1 | Kim_NatMed_2018 | ici |
| Liu 2019 - SKCM, PD-1 | Liu_NatMed_2019 | ici |
| Melero 2019 - GBM, Anti-PD-1 | Melero_GBM_2019 | ici |
| Miao 2018 - KIRC, PD-1 +/- CTLA4, PD-L1 | Miao_Science_2018 | ici |


The display name of each dataset refers to the publication associated with the data and summarizes the tumor type and the ICI target of the study.

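Because the display names follow a consistent pattern, they can be split into their parts programmatically. The sketch below assumes every name has the form `<study> - <TUMOR>, <ICI target>`, as in the list above; `parse_dataset_display()` is a hypothetical helper, not part of `iatlasGraphQLClient`.

```r
# Hypothetical helper (not an iatlasGraphQLClient function): split a display
# name of the form "<study> - <TUMOR>, <ICI target>" into its three parts.
parse_dataset_display <- function(display) {
  pattern <- "^(.*) - ([A-Z]+), (.*)$"
  # Fail loudly instead of silently returning unparsed strings
  stopifnot(all(grepl(pattern, display)))
  data.frame(
    study      = sub(pattern, "\\1", display),
    tumor_type = sub(pattern, "\\2", display),
    ici_target = sub(pattern, "\\3", display),
    stringsAsFactors = FALSE
  )
}

# Display names copied from the ici_datasets output above
parsed <- parse_dataset_display(c(
  "Gide 2019 - SKCM, PD-1 +/- CTLA4",
  "IMVigor210 - BLCA, PD-L1",
  "Miao 2018 - KIRC, PD-1 +/- CTLA4, PD-L1"
))
print(parsed)
```

With the real query result, `parse_dataset_display(ici_datasets$display)` would apply the same split to every dataset, provided all names keep this pattern.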
## Immune Features

```r
# Immune features available for the samples in the ICI datasets
features_df <- iatlasGraphQLClient::query_features(cohorts = ici_datasets$name)
head(features_df)
```

| name | display | class | order | unit | method_tag |
| --- | --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` |
| B_cells_Aggregate2 | B Cells | Immune Cell Proportion - Common Lymphoid and Myeloid Cell Derivative Class | 3 | Fraction | CIBERSORT |
| B_cells_Aggregate3 | B Cells | Immune Cell Proportion - Differentiated Lymphoid and Myeloid Cell Derivative Class | 4 | Fraction | CIBERSORT |
| B_cells_memory | B Cells Memory | Immune Cell Proportion - Original | 9 | Fraction | CIBERSORT |
| B_cells_naive | B Cells Naive | Immune Cell Proportion - Original | 8 | Fraction | CIBERSORT |
| BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | Predictor of Response to Immune Checkpoint Treatment | | Fraction | |
| Cytolytic_Score | Cytolytic Score | Predictor of Response to Immune Checkpoint Treatment | | Score | |

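The `class` column can be used to narrow `features_df` down to one family of features. A minimal sketch, using a few rows copied from the output above into a hypothetical `features_example` data frame; with the real query result you would filter `features_df` the same way.

```r
# A few rows copied from the features_df output above (name and class only)
features_example <- data.frame(
  name  = c("B_cells_memory", "B_cells_naive",
            "BIOCARTA_CTLA4_V_Bindea_Th1_Cells", "Cytolytic_Score"),
  class = c("Immune Cell Proportion - Original",
            "Immune Cell Proportion - Original",
            "Predictor of Response to Immune Checkpoint Treatment",
            "Predictor of Response to Immune Checkpoint Treatment"),
  stringsAsFactors = FALSE
)

# Keep only the predictors of response; the `name` values are the
# identifiers to pass to query_feature_values(features = ...)
predictor_names <- features_example$name[
  features_example$class == "Predictor of Response to Immune Checkpoint Treatment"
]
print(predictor_names)
```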

## Clinical Annotation

```r
# Clinical annotation available for the ICI datasets
clinical_options <- iatlasGraphQLClient::query_tags(datasets = ici_datasets$name)
head(clinical_options)
```

| tag_name | tag_long_display | tag_short_display | tag_characteristics | tag_color | tag_order | tag_type |
| --- | --- | --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<int>` | `<chr>` |
| Biopsy_Site | Biopsy Site | Biopsy Site | Site where sample was collected from. | | 18 | parent_group |
| Cancer_Tissue | Cancer Tissue | Cancer Tissue | Original tumor tissue. | | 14 | parent_group |
| Clinical_Benefit | Clinical Benefit | Clinical Benefit | Patients have clinical benefit when mRECIST response is different than Progressive Disease. | | 4 | parent_group |
| Clinical_Stage | Clinical Stage | Clinical Stage | Clinical stage of cancer. | | 17 | parent_group |
| FFPE | FFPE Samples | FFPE Samples | Indicates whether the sample is FFPE or not. | | 20 | parent_group |
| gender | Gender | Gender | | | | parent_group |


## Gene Expression

```r
# Genes with expression data for the samples in the ICI datasets
# (expression values are queried in "Querying the CRI iAtlas database")
genes_df <- iatlasGraphQLClient::query_genes(cohorts = ici_datasets$name)
head(genes_df)
```

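| hgnc | entrez | description | friendly_name | io_landscape_name | gene_family | gene_function | immune_checkpoint | pathway | super_category |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| A1BG | 1 | | | | | | | | |
| A1BG-AS1 | 503538 | | | | | | | | |
| A1CF | 29974 | | | | | | | | |
| A2M | 2 | | | | | | | | |
| A2M-AS1 | 144571 | | | | | | | | |
| A2ML1 | 144568 | | | | | | | | |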

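`genes_df` pairs each HGNC symbol with its Entrez ID, and Entrez IDs are what `query_gene_expression()` takes. A minimal sketch of the symbol-to-ID lookup with base R's `match()`, on a small hypothetical `genes_example` table with the same two columns; `lookup_entrez()` is an illustrative helper, not a package function.

```r
# Hypothetical lookup table with the same hgnc/entrez columns as genes_df
genes_example <- data.frame(
  hgnc   = c("A1BG", "A2M", "ADORA2A", "CTLA4", "EDNRB", "TLR4"),
  entrez = c(1L, 2L, 135L, 1493L, 1910L, 7099L),
  stringsAsFactors = FALSE
)

# Map HGNC symbols to Entrez IDs, warning about (and dropping) unknown symbols
lookup_entrez <- function(symbols, gene_table) {
  ids <- gene_table$entrez[match(symbols, gene_table$hgnc)]
  missing <- symbols[is.na(ids)]
  if (length(missing) > 0) {
    warning("No Entrez ID found for: ", paste(missing, collapse = ", "))
  }
  ids[!is.na(ids)]
}

genes_entrez <- lookup_entrez(c("ADORA2A", "CTLA4", "EDNRB", "TLR4"), genes_example)
print(genes_entrez)  # 135 1493 1910 7099
```

Passing the real `genes_df` instead of `genes_example` avoids hard-coding the IDs.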


## Annotation of response to immunotherapy

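The dependent (outcome) variable to be used in a model is how a patient responds to immunotherapy. Response to Immune Checkpoint Inhibitor therapy is originally annotated following the modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines, which define four levels: Complete Response, Partial Response, Stable Disease, and Progressive Disease. Samples without this information are tagged as Not available.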

```r
# mRECIST response levels, sorted by their display order
iatlasGraphQLClient::query_tags(parent_tags = "Response") %>%
  dplyr::arrange(tag_order)
```

| tag_name | tag_long_display | tag_short_display | tag_characteristics | tag_color | tag_order | tag_type |
| --- | --- | --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` |
| complete_response_response | Complete Response | Complete Response | Complete Response (CR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #009E73 | 1 | group |
| partial_response_response | Partial Response | Partial Response | Partial Response (PR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #0072B2 | 2 | group |
| stable_disease_response | Stable Disease | Stable Disease | Stable Disease (SD) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #F0E442 | 3 | group |
| progressive_disease_response | Progressive Disease | Progressive Disease | Progressive Disease (PD) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #D55E00 | 4 | group |
| na_response | Not available | Not available | Response information not available | #868A88 | 5 | group |


As possible outcome variables, we will use Responder, Clinical Benefit, and Progression, each of which groups the mRECIST levels into two categories (plus Not available).

```r
outcome_variables <- c("Responder", "Clinical_Benefit", "Progression")
iatlasGraphQLClient::query_tags_with_parent_tags(parent_tags = outcome_variables) %>%
  dplyr::select(parent_tag_name, parent_tag_characteristics, tag_name, tag_short_display, tag_characteristics) %>%
  dplyr::arrange(parent_tag_name)
```

| parent_tag_name | parent_tag_characteristics | tag_name | tag_short_display | tag_characteristics |
| --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| Clinical_Benefit | Patients have clinical benefit when mRECIST response is different than Progressive Disease. | false_clinical_benefit | No Clinical Benefit | Patient with mRECIST of Progressive Disease |
| Clinical_Benefit | Patients have clinical benefit when mRECIST response is different than Progressive Disease. | na_clinical_benefit | Not available | Clinical Benefit information not available |
| Clinical_Benefit | Patients have clinical benefit when mRECIST response is different than Progressive Disease. | true_clinical_benefit | Clinical Benefit | Patient with mRECIST of Complete Response, Partial Response, or Stable Disease |
| Progression | Progressors are defined as patients with mRECIST of Progressive Disease. | false_progression | Non-Progressor | Patient with mRECIST of Complete Response, Partial Response, or Stable Disease |
| Progression | Progressors are defined as patients with mRECIST of Progressive Disease. | na_progression | Not available | Progression information not available |
| Progression | Progressors are defined as patients with mRECIST of Progressive Disease. | true_progression | Progressor | Patient with mRECIST of Progressive Disease |
| Responder | Responders are defined as patients with mRECIST of Partial Response or Complete Response, whereas Non-Responders are those with Progressive Disease or Stable Disease. | false_responder | Non-Responder | Patient with mRECIST of Progressive Disease or Stable Disease |
| Responder | Responders are defined as patients with mRECIST of Partial Response or Complete Response, whereas Non-Responders are those with Progressive Disease or Stable Disease. | na_responder | Not available | Responder information not available |
| Responder | Responders are defined as patients with mRECIST of Partial Response or Complete Response, whereas Non-Responders are those with Progressive Disease or Stable Disease. | true_responder | Responder | Patient with mRECIST of Partial Response or Complete Response |

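These binary outcomes are already stored as tags in the database, so they can be queried directly. To make the grouping explicit, here is a sketch of the same mapping from mRECIST tag names, following the definitions in the table above. `derive_outcomes()` is a hypothetical helper; it treats `na_response` (or any unrecognized value) as missing.

```r
# Hypothetical helper (not an iatlasGraphQLClient function): collapse mRECIST
# tag names into the Responder / Clinical_Benefit / Progression outcomes.
derive_outcomes <- function(response) {
  cr_pr <- c("complete_response_response", "partial_response_response")
  pd    <- "progressive_disease_response"
  known <- c(cr_pr, "stable_disease_response", pd)
  is_known <- response %in% known  # anything else (e.g. "na_response") -> NA
  data.frame(
    response         = response,
    Responder        = ifelse(is_known, response %in% cr_pr, NA),  # CR or PR
    Clinical_Benefit = ifelse(is_known, response != pd, NA),       # not PD
    Progression      = ifelse(is_known, response == pd, NA),       # PD
    stringsAsFactors = FALSE
  )
}

outcomes <- derive_outcomes(c("complete_response_response", "stable_disease_response",
                              "progressive_disease_response", "na_response"))
print(outcomes)
```

Note how Stable Disease splits the groupings: it counts as Non-Responder but still as Clinical Benefit (and Non-Progressor).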

## Samples and treatments available

Now, let's take a closer look at the available ICI datasets. First, we will query the database and organize the results to show the TCGA study and the drug administered in each study:

```r
# Treatment information
overview_treatment <- c("TCGA_Study", "ICI_Rx") # names of the groups of interest; check clinical_options for more options

# Build a data frame with all patient IDs and sample IDs
all_ici_patients <- iatlasGraphQLClient::query_dataset_samples(datasets = ici_datasets$name) %>% # sample IDs for the ICI datasets
  dplyr::inner_join( # add patient IDs
    iatlasGraphQLClient::query_sample_patients(),
    by = "sample_name"
  )

all_ici_patients %>%
  dplyr::inner_join(
    iatlasGraphQLClient::query_tag_samples_parents(parent_tags = overview_treatment), # TCGA_Study and ICI_Rx values for each sample
    by = "sample_name") %>%
  get_wide_df(., # helper from notebook_functions.R: converts the data frame from long to wide format
              names_from_column = "parent_tag_name",
              values_from_column = "tag_name",
              columns_to_keep = c("patient_name", "sample_name", "dataset_display")) %>%
  dplyr::group_by(dataset_display, dplyr::across(dplyr::all_of(overview_treatment))) %>%
  dplyr::summarise(
    n_patients = dplyr::n_distinct(patient_name),
    n_samples = dplyr::n_distinct(sample_name),
    .groups = "drop")
```

| dataset_display | TCGA_Study | ICI_Rx | n_patients | n_samples |
| --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<int>` | `<int>` |
| Choueiri 2016 - KIRC, PD-1 | KIRC | nivolumab | 16 | 16 |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | SKCM | ipilimumab_nivolumab | 8 | 9 |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | SKCM | ipilimumab_pembrolizumab | 26 | 32 |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | SKCM | nivolumab | 9 | 11 |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | SKCM | pembro_ici_rx | 32 | 39 |
| Hugo 2016 - SKCM, PD-1 | SKCM | pembro_ici_rx | 27 | 27 |
| IMmotion150 - KIRC, PD-L1 | KIRC | atezolizumab | 174 | 174 |
| IMmotion150 - KIRC, PD-L1 | KIRC | none_ICI_Rx | 89 | 89 |
| IMVigor210 - BLCA, PD-L1 | BLCA | atezolizumab | 348 | 348 |
| Kim 2018 - STAD, PD-1 | STAD | pembro_ici_rx | 45 | 45 |

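`get_wide_df()` comes from `functions/notebook_functions.R`, and its implementation is not shown here. To illustrate what a long-to-wide conversion does, the sketch below uses a hypothetical base-R `long_to_wide()` on a toy table in which one patient has two samples, which is why distinct patients and distinct samples are counted separately in the `summarise()` call above.

```r
# Hypothetical long-to-wide helper in base R; a stand-in for get_wide_df(),
# whose real implementation in functions/notebook_functions.R may differ.
long_to_wide <- function(df, names_from, values_from, id_cols) {
  wide <- unique(df[id_cols])  # one row per ID combination
  row_key  <- do.call(paste, c(df[id_cols], sep = "\t"))
  wide_key <- do.call(paste, c(wide, sep = "\t"))
  for (nm in unique(df[[names_from]])) {
    hit <- df[[names_from]] == nm
    # One new column per tag; NA where an ID combination lacks that tag
    wide[[nm]] <- df[[values_from]][hit][match(wide_key, row_key[hit])]
  }
  rownames(wide) <- NULL
  wide
}

# Toy long table with hypothetical IDs: patient P1 has two samples
long_df <- data.frame(
  patient_name    = c("P1", "P1", "P1", "P1", "P2", "P2"),
  sample_name     = c("S1", "S1", "S2", "S2", "S3", "S3"),
  parent_tag_name = rep(c("TCGA_Study", "ICI_Rx"), 3),
  tag_name        = rep(c("SKCM", "nivolumab"), 3),
  stringsAsFactors = FALSE
)

wide_df <- long_to_wide(long_df, names_from = "parent_tag_name",
                        values_from = "tag_name",
                        id_cols = c("patient_name", "sample_name"))
print(wide_df)

# Distinct patients vs. distinct samples per treatment
count_distinct <- function(x, group) tapply(x, group, function(v) length(unique(v)))
n_patients <- count_distinct(wide_df$patient_name, wide_df$ICI_Rx)
n_samples  <- count_distinct(wide_df$sample_name, wide_df$ICI_Rx)
print(cbind(n_patients, n_samples))  # nivolumab: 2 patients, 3 samples
```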

Some of these datasets have more than one sample per patient: in those studies, some patients had samples collected before (`pre_sample_treatment`) and during (`on_sample_treatment`) ICI therapy, and Chen 2016 also includes samples collected after treatment (`post_sample_treatment`).

```r
# Number of patients and samples by sample collection time point
all_ici_patients %>%
  dplyr::inner_join(
    iatlasGraphQLClient::query_tag_samples_parents(parent_tags = "Sample_Treatment"),
    by = "sample_name") %>%
  get_wide_df(.,
              names_from_column = "parent_tag_name",
              values_from_column = "tag_name",
              columns_to_keep = c("patient_name", "sample_name", "dataset_display")) %>%
  dplyr::group_by(dataset_display, dplyr::across(dplyr::all_of("Sample_Treatment"))) %>%
  dplyr::summarise(
    n_patients = dplyr::n_distinct(patient_name),
    n_samples = dplyr::n_distinct(sample_name),
    .groups = "drop")
```

| dataset_display | Sample_Treatment | n_patients | n_samples |
| --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<int>` | `<int>` |
| Chen 2016 - SKCM, Anti-CTLA4 | on_sample_treatment | 15 | 15 |
| Chen 2016 - SKCM, Anti-CTLA4 | post_sample_treatment | 7 | 7 |
| Chen 2016 - SKCM, Anti-CTLA4 | pre_sample_treatment | 31 | 32 |
| Choueiri 2016 - KIRC, PD-1 | pre_sample_treatment | 16 | 16 |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | on_sample_treatment | 18 | 18 |
| Gide 2019 - SKCM, PD-1 +/- CTLA4 | pre_sample_treatment | 73 | 73 |
| Hugo 2016 - SKCM, PD-1 | on_sample_treatment | 1 | 1 |
| Hugo 2016 - SKCM, PD-1 | pre_sample_treatment | 26 | 26 |
| IMmotion150 - KIRC, PD-L1 | pre_sample_treatment | 263 | 263 |
| IMVigor210 - BLCA, PD-L1 | pre_sample_treatment | 348 | 348 |

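To find which patients contribute more than one sample, count samples per patient. The sketch below uses a toy table with hypothetical IDs; keeping only pre-treatment samples is shown as one possible choice for patient-level analyses, not as this notebook's method. As the Chen 2016 row shows (31 patients, 32 pre-treatment samples), that filter alone does not guarantee one sample per patient.

```r
# Toy sample table with hypothetical IDs: P1 was sampled before and during therapy
samples <- data.frame(
  patient_name     = c("P1", "P1", "P2", "P3"),
  sample_name      = c("S1", "S2", "S3", "S4"),
  Sample_Treatment = c("pre_sample_treatment", "on_sample_treatment",
                       "pre_sample_treatment", "pre_sample_treatment"),
  stringsAsFactors = FALSE
)

# Patients contributing more than one sample
samples_per_patient <- table(samples$patient_name)
multi_sample_patients <- names(samples_per_patient)[samples_per_patient > 1]
print(multi_sample_patients)

# One option: keep pre-treatment samples only
# (a patient may still have several pre-treatment samples)
pre_only <- samples[samples$Sample_Treatment == "pre_sample_treatment", ]
print(pre_only)
```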

# Querying the CRI iAtlas database

Each type of data has a query function in `iatlasGraphQLClient`, as summarized below:

- For immune features: `iatlasGraphQLClient::query_feature_values()`
- For gene expression data: `iatlasGraphQLClient::query_gene_expression()`
- For clinical annotation: `iatlasGraphQLClient::query_tag_samples_parents()`

To illustrate how to get ICI data from the iAtlas database, we will retrieve the following:

- *Dataset:* Hugo 2016 - SKCM, PD-1

- *Features:* IMPRES (Auslander et al., 2018), IPRES (Vincent lab analysis of Hugo et al. data, unpublished), Cytolytic Score (Roufas et al., 2018), CTLA4/Th1 (Nishimura, 2004; Bindea et al., 2013)

- *Gene expression:* ADORA2A, CTLA4, EDNRB, TLR4

- *Clinical Annotation:* Response, Gender

```r
# For immune features, use names from features_df$name
features <- iatlasGraphQLClient::query_feature_values(
  cohorts = "HugoLo_IPRES_2016",
  features = c("IMPRES",
               "Vincent_IPRES_NonResponder",
               "Cytolytic_Score",
               "BIOCARTA_CTLA4_V_Bindea_Th1_Cells"))

# For gene expression, we need the genes' Entrez IDs (see genes_df$entrez)
genes_entrez <- c(135,  # ADORA2A
                  1493, # CTLA4
                  1910, # EDNRB
                  7099) # TLR4

genes <- iatlasGraphQLClient::query_gene_expression(cohorts = "HugoLo_IPRES_2016",
                                                    entrez = genes_entrez)

# For clinical annotation, use names from clinical_options$tag_name that have tag_type == "parent_group"
clinical_annotation <- iatlasGraphQLClient::query_tag_samples_parents(
  cohorts = "HugoLo_IPRES_2016",
  parent_tags = c("Response", "gender"))

# All the tables are in the long format.
head(features)
head(genes)
head(clinical_annotation)
```

| sample | feature_name | feature_display | feature_value | feature_order | feature_class |
| --- | --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<lgl>` | `<chr>` |
| HugoLo_IPRES_2016-Pt01-ar-279 | BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | 1.001012 | | Predictor of Response to Immune Checkpoint Treatment |
| HugoLo_IPRES_2016-Pt02-ar-280 | BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | 1.190649 | | Predictor of Response to Immune Checkpoint Treatment |
| HugoLo_IPRES_2016-Pt04-ar-281 | BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | 1.179757 | | Predictor of Response to Immune Checkpoint Treatment |
| HugoLo_IPRES_2016-Pt05-ar-282 | BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | 1.171726 | | Predictor of Response to Immune Checkpoint Treatment |
| HugoLo_IPRES_2016-Pt06-ar-283 | BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | 1.290074 | | Predictor of Response to Immune Checkpoint Treatment |
| HugoLo_IPRES_2016-Pt07-ar-284 | BIOCARTA_CTLA4_V_Bindea_Th1_Cells | CTLA4 vs Th1 | 0.924285 | | Predictor of Response to Immune Checkpoint Treatment |


| sample | entrez | hgnc | rna_seq_expr |
| --- | --- | --- | --- |
| `<chr>` | `<int>` | `<chr>` | `<dbl>` |
| HugoLo_IPRES_2016-Pt08-ar-285 | 135 | ADORA2A | 55.307 |
| HugoLo_IPRES_2016-Pt06-ar-283 | 135 | ADORA2A | 57.222 |
| HugoLo_IPRES_2016-Pt05-ar-282 | 135 | ADORA2A | 58.891 |
| HugoLo_IPRES_2016-Pt12-ar-288 | 135 | ADORA2A | 61.801 |
| HugoLo_IPRES_2016-Pt19-ar-293 | 135 | ADORA2A | 98.325 |
| HugoLo_IPRES_2016-Pt31-ar-302 | 135 | ADORA2A | 100.276 |


| sample_name | parent_tag_name | parent_tag_long_display | parent_tag_short_display | parent_tag_characteristics | parent_tag_color | parent_tag_order | parent_tag_type | tag_name | tag_long_display | tag_short_display | tag_characteristics | tag_color | tag_order | tag_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` |
| HugoLo_IPRES_2016-Pt08-ar-285 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. | | 1 | parent_group | complete_response_response | Complete Response | Complete Response | Complete Response (CR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #009E73 | 1 | group |
| HugoLo_IPRES_2016-Pt09-ar-286 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. | | 1 | parent_group | complete_response_response | Complete Response | Complete Response | Complete Response (CR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #009E73 | 1 | group |
| HugoLo_IPRES_2016-Pt13-ar-289 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. | | 1 | parent_group | complete_response_response | Complete Response | Complete Response | Complete Response (CR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #009E73 | 1 | group |
| HugoLo_IPRES_2016-Pt27-ar-298 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. | | 1 | parent_group | complete_response_response | Complete Response | Complete Response | Complete Response (CR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #009E73 | 1 | group |
| HugoLo_IPRES_2016-Pt02-ar-280 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. | | 1 | parent_group | partial_response_response | Partial Response | Partial Response | Partial Response (PR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #0072B2 | 2 | group |
| HugoLo_IPRES_2016-Pt04-ar-281 | Response | mRECIST Response | mRECIST Response | Response to treatment following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines. | | 1 | parent_group | partial_response_response | Partial Response | Partial Response | Partial Response (PR) following modified Response Evaluation Criteria in Solid Tumors (mRECIST) guidelines | #0072B2 | 2 | group |

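The three tables share sample IDs but name the ID column differently: `sample` in the feature and gene expression tables, `sample_name` in the clinical annotation. Below is a sketch of combining them into one wide table with one row per sample, using base R `tapply()` and `merge()` on small toy tables with made-up values (`features_toy`, `genes_toy`, `clinical_toy` and `pivot_long()` are hypothetical names). `merge()` performs an inner join here, so samples missing from any table are dropped.

```r
# Hypothetical toy tables that mimic the column names of the query results above;
# the values are made up for illustration.
features_toy <- data.frame(
  sample        = c("A", "A", "B", "B"),
  feature_name  = c("IMPRES", "Cytolytic_Score", "IMPRES", "Cytolytic_Score"),
  feature_value = c(8, 2.5, 5, 1.1),
  stringsAsFactors = FALSE
)
genes_toy <- data.frame(
  sample       = c("A", "B"),
  hgnc         = c("CTLA4", "CTLA4"),
  rna_seq_expr = c(12.3, 4.5),
  stringsAsFactors = FALSE
)
clinical_toy <- data.frame(
  sample_name     = c("A", "B"),
  parent_tag_name = c("Response", "Response"),
  tag_name        = c("complete_response_response", "progressive_disease_response"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: long -> wide, one row per sample and one column per key
# (keeps the first value if a sample/key pair appears more than once)
pivot_long <- function(ids, keys, values) {
  m <- tapply(values, list(ids, keys), function(v) v[1])
  out <- data.frame(sample = rownames(m), m,
                    stringsAsFactors = FALSE, check.names = FALSE)
  rownames(out) <- NULL
  out
}

# Note the different ID columns: `sample` vs. `sample_name`
wide <- merge(
  pivot_long(features_toy$sample, features_toy$feature_name, features_toy$feature_value),
  pivot_long(genes_toy$sample, genes_toy$hgnc, genes_toy$rna_seq_expr),
  by = "sample"
)
wide <- merge(
  wide,
  pivot_long(clinical_toy$sample_name, clinical_toy$parent_tag_name, clinical_toy$tag_name),
  by = "sample"
)
print(wide)
```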