# Conservation-based Synthetic Lethal Search

## Introduction

### Rationale

### Use-cases:
* Prioritize human candidate synthetic lethal interactions based on prior evidence of interaction in yeast SL screens
* _de novo_ discovery of SL interactions

### Approach
This notebook re-implements the approach outlined in Srivas et al. (2016).

### Usage:
Add the genes of interest to the `inputGenes` variable as a quoted, comma-separated string, then run the next step.
Example: inputGenes = "'DDX3X','DICER1','DROSHA','TNFRSF14','TRAF7','TSC1','POLG','FBXO11','PRDM1','RFWD3','AMER1','LZTR1','ATP2B3'"
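
The quoted, comma-separated format is easy to get wrong by hand. A minimal sketch of a helper that builds it from a Python list follows. `format_input_genes` is a hypothetical name, not part of the original notebook, and the symbol check is only an assumption about which characters a gene symbol may contain.

```python
import re

# Hypothetical helper (not part of the original notebook): builds the
# quoted, comma-separated string expected by "inputGenes".
# Assumption: gene symbols contain only letters, digits, '-' and '.'.
_SYMBOL = re.compile(r"^[A-Za-z0-9.\-]+$")


def format_input_genes(genes):
    cleaned = []
    for gene in genes:
        symbol = gene.strip()
        if not _SYMBOL.match(symbol):
            raise ValueError(f"Unexpected gene symbol: {gene!r}")
        if symbol not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(symbol)
    return ",".join(f"'{s}'" for s in cleaned)


inputGenes = format_input_genes(["DDX3X", "DICER1", "DROSHA"])
print(inputGenes)
```

Rejecting unexpected characters also guards against stray quotes ending up in the SQL text that is assembled later in the workflow.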

### Workflow Overview

### Datasets
#### Yeast Synthetic Lethal Interactions
Costanzo et al. (2016)
#### Human to Yeast Ortholog Mapping
A detailed treatment is given in the accompanying notebook (Mapping human to yeast orthologs).
#### Human Tumor Suppressor Genes


### References
* Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, Pelechano V, Styles EB, Billmann M, van Leeuwen J, van Dyk N, Lin ZY, Kuzmin E, Nelson J, Piotrowski JS, Srikumar T, Bahr S, Chen Y, Deshpande R, Kurat CF, Li SC, Li Z, Usaj MM, Okada H, Pascoe N, San Luis BJ, Sharifpoor S, Shuteriqi E, Simpkins SW, Snider J, Suresh HG, Tan Y, Zhu H, Malod-Dognin N, Janjic V, Przulj N, Troyanskaya OG, Stagljar I, Xia T, Ohya Y, Gingras AC, Raught B, Boutros M, Steinmetz LM, Moore CL, Rosebrock AP, Caudy AA, Myers CL, Andrews B, Boone C. **A global genetic interaction network maps a wiring diagram of cellular function.** Science. 2016 Sep 23;353(6306). pii: aaf1420. PubMed PMID: 27708008; PubMed Central PMCID: PMC5661885.
* Srivas R, Shen JP, Yang CC, Sun SM, Li J, Gross AM, Jensen J, Licon K, Bojorquez-Gomez A, Klepper K, Huang J, Pekin D, Xu JL, Yeerna H, Sivaganesh V, Kollenstart L, van Attikum H, Aza-Blanc P, Sobol RW, Ideker T. **A Network of Conserved Synthetic Lethal Interactions for Exploration of Precision Cancer Therapy.** Mol Cell. 2016 Aug 4;63(3):514-25. doi: 10.1016/j.molcel.2016.06.022. Epub 2016 Jul 21. PubMed PMID: 27453043; PubMed Central PMCID: PMC5209245.

## Preamble
This section describes how to set up the analysis environment, including Google Cloud Platform authentication and importing the relevant Python libraries.

### Setup Analysis Environment

In [1]:
! pip install google-cloud-bigquery



In [2]:
# import the BigQuery client library (authentication is done in a later cell)
from google.cloud import bigquery


In [3]:
# import modules
import json
import math
import sys

import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly
import plotly.express as px
import scipy
import statsmodels.stats.multitest as multi
from scipy import stats


In [4]:
# !gcloud auth login
!gcloud auth application-default login

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=kUr9qkiXSWEoWl4HVcjBcmcpX0dKjw&access_type=offline&code_challenge=lb58RLC2m0r6YYScV7YrmzIsQwNrLp9ay1ZOxWarNYU&code_challenge_method=S256


Credentials saved to file: [C:\Users\salta\AppData\Roaming\gcloud\application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).

Quota project "isb-cgc-04-0010" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.


In [5]:
# Choose the project to be used for BigQuery (replace with your own project ID)
project_id = 'syntheticlethality'
client = bigquery.Client(project_id)

In [6]:
%load_ext google.cloud.bigquery

## Define a set of cancer-relevant tumor suppressor genes (TSGs)

In this workflow, the search for relevant synthetic lethal interactions is seeded by defining a set of tumor suppressor genes (TSGs) of interest. There are various strategies for obtaining such a list. Here we give an example of mining the [COSMIC Cancer Gene Census](https://cancer.sanger.ac.uk/census) for TSG annotations; the resulting list can then be prioritized based on driver status or frequency of alteration in a cancer type of interest.

If you want to get the SL interactions for genes of interest, please add the genes to "inputGenes".

In [7]:
%%bigquery tsg
SELECT *
FROM `isb-cgc.COSMIC_v90_grch38.Cancer_Gene_Census` 
WHERE Role_in_Cancer = "TSG"

In [8]:
tsg.head()

Unnamed: 0,Gene_Symbol,Name,Entrez_GeneId,Genome_Location,Tier,Hallmark,Chr_Band,Somatic,Germline,Tumour_Types_Somatic,Tumour_Types_Germline,Cancer_Syndrome,Tissue_Type,Molecular_Genetics,Role_in_Cancer,Mutation_Types,Translocation_Partner,Other_Germline_Mut,Other_Syndrome,Synonyms
0,ACVR2A,activin A receptor type 2A,92,2:147844517-147930824,1,,23.1,yes,,"large intestine carcinoma, stomach carcinoma, ...",,,E,Rec,TSG,"Mis, N, F",,,,"92,ACTRII,ACVR2,ACVR2A,ENSG00000121989.14,P27037"
1,APC,adenomatous polyposis of the colon gene,324,5:112737888-112846239,1,Yes,22.2,yes,yes,"colorectal, pancreatic, desmoid, hepatoblastom...","colorectal, pancreatic, desmoid, hepatoblastom...",adenomatous polyposis coli; Turcot syndrome,"E, M, O",Rec,TSG,"D, Mis, N, F, S",,,,"324,APC,DP2,DP2.5,DP3,ENSG00000134982.16,P2505..."
2,ARHGEF10,Rho guanine nucleotide exchange factor 10,9639,8:1824015-1958641,2,,23.3,yes,,colon cancer,,,E,,TSG,D,,,,"9639,ARHGEF10,ENSG00000104728.15,Gef10,KIAA029..."
3,ARHGEF10L,Rho guanine nucleotide exchange factor 10 like,55160,1:17539835-17697869,2,,36.13,yes,,lymphoma,,,L,,TSG,D,,,,"55160,ARHGEF10L,ENSG00000074964.16,FLJ10521,KI..."
4,ARID1B,AT rich interactive domain 1B,57492,6:156777374-157210779,1,,25.3,yes,,"breast, hepatocellular carcinoma, clear cell o...",,,E,Rec,TSG,"Mis, F, N, O",,,,"57492,6A3-5,ARID1B,BAF250b,DAN15,ELD/OSA1,ENSG..."
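
The prioritization step mentioned above can be sketched as a simple filter on the Census annotations. The example below assumes that restricting to `Tier` 1 genes is an acceptable stand-in for prioritizing by driver status; that choice is an illustration, not the notebook's method. The rows are copied from the preview above, and `prioritize_by_tier` is a hypothetical helper.

```python
# Rows copied from the preview above (Gene_Symbol and Tier columns only).
census_preview = [
    {"Gene_Symbol": "ACVR2A", "Tier": 1},
    {"Gene_Symbol": "APC", "Tier": 1},
    {"Gene_Symbol": "ARHGEF10", "Tier": 2},
    {"Gene_Symbol": "ARHGEF10L", "Tier": 2},
    {"Gene_Symbol": "ARID1B", "Tier": 1},
]


def prioritize_by_tier(rows, max_tier=1):
    """Keep gene symbols whose Census tier is at most max_tier (hypothetical helper)."""
    return [row["Gene_Symbol"] for row in rows if row["Tier"] <= max_tier]


tier1_genes = prioritize_by_tier(census_preview)
print(tier1_genes)
```

On the full `tsg` DataFrame the same filter would likely be `tsg[tsg["Tier"] <= 1]["Gene_Symbol"].tolist()`, assuming `Tier` is loaded as an integer column.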


In [9]:
# generate a list for inputGenes (Please go to the next block if you want to use your genes instead of tumor suppressor genes)
tumor_suppressor_genes = tsg["Gene_Symbol"].tolist() 
inputGenes = ["'"+x+"'" for x in tumor_suppressor_genes]
inputGenes = ','.join(inputGenes)
inputGenes

"'ACVR2A','APC','ARHGEF10','ARHGEF10L','ARID1B','ARID2','ASXL1','ASXL2','ATM','ATR','ATRX','AXIN1','AXIN2','B2M','BAP1','BARD1','BAX','BAZ1A','BRCA1','BRCA2','CASP3','CASP8','CDC73','CDH1','CDK12','CDKN1B','CDKN2A','CDKN2C','CHD2','CNTNAP2','CSMD3','CTCF','CYLD','DDX3X','DICER1','DROSHA','EED','ELF3','ETNK1','FAS','FAT1','FAT4','FBLN2','FBXW7','GPC5','GRIN2A','HNF1A','ID3','IGF2BP2','KDM5C','KEAP1','KLF6','KMT2C','LARP4B','LATS1','LATS2','LEPROTL1','LRP1B','MAX','MED12','MEN1','MLH1','MSH2','NF2','NFKBIE','PBRM1','PHF6','PHOX2B','PIK3R1','POLE','PPP2R1A','PRDM2','PTCH1','PTEN','PTPN13','PTPRB','PTPRD','PTPRT','RAD17','RB1','RBM10','RNF43','ROBO2','SDHA','SETD1B','SETD2','SFRP4','SH2B3','SIRPA','SMAD2','SMAD3','SMAD4','SMARCA4','SMARCB1','SOCS1','SOX21','SPEN','SPOP','STAG1','STAG2','STK11','SUFU','TET2','TGFBR2','TNFAIP3','TNFRSF14','TRAF7','TSC1','TSC2','USP44','VHL','WNK2','ZFHX3','ZMYM3','ZNRF3','ZRSR2','BLM','BRIP1','BUB1B','CHEK2','ERCC2','ERCC3','ERCC4','ERCC5','EXT2','FANCA','FA

In [10]:
# please skip this block if you want to keep using tumor suppressor genes as an input
#inputGenes = ""

## Map Yeast Orthologs & Get SL Interactions

In [10]:
sql = '''
WITH
--- Retrieve YeastSymbols mapped to HumanSymbols for the input genes
INPUT_H2Y AS (
  SELECT YeastSymbol
    FROM `syntheticlethality.gene_information.human2Yeast`
   WHERE HumanSymbol IN (__INPUTGENES__) AND
         AlgorithmsMatch >= __ALGORITHMCUTOFF__
),

--- Identify genetic interactions using the YeastSymbols (left match: input gene is the query allele)
Yeast_ITX1 AS (
  SELECT UPPER(Query_allele_name)       AS Interactor1, 
         UPPER(Array_allele_name)       AS Interactor2,
         Genetic_interaction_score_____ AS Interaction_score,
         P_value
    FROM `syntheticlethality.CellMap.CellMap`
   WHERE (Genetic_interaction_score_____ < __SCORECUTOFF__ AND P_value < __PvalueCUTOFF__) AND
         (UPPER(Query_allele_name) IN (SELECT YeastSymbol FROM INPUT_H2Y))
   
),

--- Identify genetic interactions using the YeastSymbols (right match: input gene is the array allele)
Yeast_ITX2 AS (
  SELECT UPPER(Array_allele_name)       AS Interactor1, 
         UPPER(Query_allele_name)       AS Interactor2,
         Genetic_interaction_score_____ AS Interaction_score,
         P_value
    FROM `syntheticlethality.CellMap.CellMap`
   WHERE (Genetic_interaction_score_____ < __SCORECUTOFF__ AND P_value < __PvalueCUTOFF__) AND
         (UPPER(Array_allele_name) IN (SELECT YeastSymbol FROM INPUT_H2Y))
   
),

--- Union interaction tables
Union_ITX AS (
  SELECT * FROM Yeast_ITX1
   UNION ALL
  SELECT * FROM Yeast_ITX2
)

--- Convert YeastSymbols to HumanSymbols in the genetic interactions
SELECT DISTINCT 
       GINFO1.EntrezID        AS EntrezID_Input,
       H2Y1.HumanSymbol       AS Gene_Input,
---       Add if you want to know what yeast genes are involved
---       YITX.Interactor1       AS Gene_Input_Yeast,
       GINFO2.EntrezID        AS EntrezID_SL_Candidate,
       H2Y2.HumanSymbol       AS Gene_SL_Candidate,
---       Add if you want to know what yeast genes are involved
---       YITX.Interactor2       AS Gene_SL_Candidate_Yeast,
       YITX.Interaction_score AS Interaction_score,
       YITX.P_value           AS P_value
       
  FROM Union_ITX AS YITX
       LEFT JOIN `syntheticlethality.gene_information.human2Yeast`                       AS H2Y1   ON YITX.Interactor1 = H2Y1.YeastSymbol
       LEFT JOIN `syntheticlethality.gene_information.human2Yeast`                       AS H2Y2   ON YITX.Interactor2 = H2Y2.YeastSymbol
       LEFT JOIN `syntheticlethality.gene_information.gene_info_human_HGNC` AS GINFO1 ON H2Y1.HumanID = GINFO1.HGNCID
       LEFT JOIN `syntheticlethality.gene_information.gene_info_human_HGNC` AS GINFO2 ON H2Y2.HumanID = GINFO2.HGNCID
       
 WHERE (H2Y1.HumanSymbol IS NOT NULL AND YITX.Interactor1 IS NOT NULL) AND
       (H2Y2.HumanSymbol IS NOT NULL AND YITX.Interactor2 IS NOT NULL)

'''
# select the thresholds to be used
cutoff_algorithmMatchNo = "3"
cutoff_score = "-0.35"
cutoff_p = "0.01"

sql = sql.replace("__INPUTGENES__", inputGenes)
sql = sql.replace("__ALGORITHMCUTOFF__", cutoff_algorithmMatchNo)
sql = sql.replace("__SCORECUTOFF__", cutoff_score)
sql = sql.replace("__PvalueCUTOFF__", cutoff_p)

res = client.query(sql).to_dataframe()
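
Because the thresholds are substituted into the SQL text with plain string replacement, a misspelled placeholder would silently leave a token such as `__SCORECUTOFF__` in the query. A minimal sketch of a stricter substitution step follows. `fill_placeholders` is a hypothetical helper, not part of the original notebook, and the short template is only a stand-in for the full query.

```python
def fill_placeholders(template, values):
    """Replace each placeholder, failing loudly if one is absent (hypothetical helper)."""
    filled = template
    for placeholder, value in values.items():
        if placeholder not in filled:
            raise KeyError(f"Placeholder not found in query: {placeholder}")
        filled = filled.replace(placeholder, str(value))
    return filled


# Stand-in template; the real query in this notebook is much longer.
demo_template = (
    "SELECT * FROM t "
    "WHERE score < __SCORECUTOFF__ AND P_value < __PvalueCUTOFF__"
)
demo_sql = fill_placeholders(
    demo_template,
    {"__SCORECUTOFF__": -0.35, "__PvalueCUTOFF__": 0.01},
)
print(demo_sql)
```

When the inputs are not fully trusted, BigQuery query parameters (`bigquery.ScalarQueryParameter` and `bigquery.ArrayQueryParameter`, passed through `bigquery.QueryJobConfig(query_parameters=...)`) are an alternative that avoids editing the SQL text at all.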



## Get Yeast SL Interactions

In [11]:
# show the SL partner genes for the input genes
res


Unnamed: 0,EntrezID_Input,Gene_Input,EntrezID_SL_Candidate,Gene_SL_Candidate,Interaction_score,P_value
0,4436.0,MSH2,271,AMPD2,-0.4561,5.070000e-11
1,4436.0,MSH2,272,AMPD3,-0.4561,5.070000e-11
2,4436.0,MSH2,270,AMPD1,-0.4561,5.070000e-11
3,9739.0,SETD1A,55666,NPLOC4,-0.3786,1.475000e-16
4,23067.0,SETD1B,55666,NPLOC4,-0.3786,1.475000e-16
...,...,...,...,...,...,...
452,2237.0,FEN1,1195,CLK1,-0.4042,3.388000e-11
453,2237.0,FEN1,1196,CLK2,-0.4042,3.388000e-11
454,2237.0,FEN1,1198,CLK3,-0.4042,3.388000e-11
455,2237.0,FEN1,57396,CLK4,-0.4042,3.388000e-11
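
In the output above, several human candidate pairs share one score (for example, MSH2 paired with AMPD1, AMPD2 and AMPD3), presumably because several human genes map to the same yeast ortholog. A small sketch of grouping candidates by input gene follows, using rows copied from the output; `group_candidates` is a hypothetical helper, not part of the original notebook.

```python
from collections import defaultdict

# (Gene_Input, Gene_SL_Candidate, Interaction_score) copied from the output above.
rows = [
    ("MSH2", "AMPD2", -0.4561),
    ("MSH2", "AMPD3", -0.4561),
    ("MSH2", "AMPD1", -0.4561),
    ("SETD1A", "NPLOC4", -0.3786),
    ("SETD1B", "NPLOC4", -0.3786),
]


def group_candidates(rows):
    """Map each input gene to the sorted list of its SL candidates."""
    grouped = defaultdict(list)
    for gene_input, candidate, _score in rows:
        grouped[gene_input].append(candidate)
    return {gene: sorted(cands) for gene, cands in grouped.items()}


candidates_by_gene = group_candidates(rows)
print(candidates_by_gene)
```

On the `res` DataFrame, a comparable summary could be obtained with `res.groupby("Gene_Input")["Gene_SL_Candidate"].apply(list)`.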


## Write to File & BigQuery Table

In [12]:
res.to_csv(path_or_buf='conserved_SL_output.csv', index=False)

## Assess Gene Druggability

In [None]:
# Available gene categories in current release:
!curl http://dgidb.org/api/v2/gene_categories.json

In [None]:
# Get all genes that are in the druggable genome
# parameterization
# category_of_interest=['kinase', 'phospholipase']
# the URL is quoted so that the shell does not interpret the '?' character
!curl "http://dgidb.org/api/v2/genes_in_category.json?category=druggable%20genome"
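
Once a list of druggable genes is available, the SL candidates can be intersected with it. The sketch below assumes the druggable gene symbols have already been extracted from the DGIdb response into a Python set. The response format is not shown in this notebook, so that parsing step is left out, and the `druggable` set here is a hypothetical placeholder rather than real DGIdb output. `druggable_candidates` is likewise a hypothetical helper.

```python
def druggable_candidates(candidates, druggable):
    """Return the SL candidates found in the druggable set, in input order."""
    seen = set()
    hits = []
    for gene in candidates:
        if gene in druggable and gene not in seen:
            seen.add(gene)
            hits.append(gene)
    return hits


# Candidate symbols taken from the result table earlier in the notebook.
sl_candidates = ["AMPD2", "NPLOC4", "CLK1", "CLK2", "CLK1"]
# Hypothetical placeholder, NOT actual DGIdb output.
druggable = {"CLK1", "CLK2"}

druggable_hits = druggable_candidates(sl_candidates, druggable)
print(druggable_hits)
```

Keeping the input order means that any ranking applied to the candidates beforehand (for example by interaction score) is preserved in the filtered list.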