<table style="border:2px solid white; border-collapse: collapse; border-spacing: 0;" cellspacing="0" cellpadding="0">
  <tr> 
    <th style="background-color:white"> <img src="../media/ccal-logo-D3.png" width=225 height=225></th>
    <th style="background-color:white"> <img src="../media/logoMoores.jpg" width=175 height=175></th>
    <th style="background-color:white"> <img src="../media/GP.png" width=200 height=200></th>
    <th style="background-color:white"> <img src="../media/UCSD_School_of_Medicine_logo.png" width=175 height=175></th> 
    <th style="background-color:white"> <img src="../media/Broad.png" width=130 height=130></th> 
  </tr>
</table>

<hr style="border: none; border-bottom: 3px solid #88BBEE;">

# **Onco-*GPS* Methodology**
## **Chapter 1.  Generating Oncogenic Activation Signature** 

**Authors:** William Kim$^{1}$, Huwate Yeerna$^{2}$, Taylor Cavazos$^{2}$, Kate Medet-Ernar$^{2}$, Clarence Mah$^{3}$, Stephanie Ting$^{2}$, Jason Park$^{2}$, Jill P. Mesirov$^{2, 3}$ and Pablo Tamayo$^{2,3}$.

1. Eli and Edythe Broad Institute      
2. UCSD Moores Cancer Center
3. UCSD School of Medicine 

**Date:** April 17, 2017

**Article:** [*Kim et al.* Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States](https://drive.google.com/file/d/0B0MQqMWLrsA4b2RUTTAzNjFmVkk/view?usp=sharing)

**Analysis overview**

In this chapter we will execute the first step in the Onco-GPS methodology: generating the oncogenic activation signature.

<img src="../media/method_chap1.png" width=2144 height=1041>
 
The Onco-GPS method uses a signature from an isogenic system, which provides clean and direct information about the transcriptional changes associated with the activation of an oncogene, while also incorporating the diverse regulatory circuits represented across multiple cellular contexts in a reference dataset. This combination deconvolves the functional consequences of oncogene activation in a more direct and unambiguous way.

In this notebook we will generate a KRAS signature based on RNA-Seq profiling of lentiviral constructs of KRAS mut G12 vs. controls in lung SALE epithelial cell lines. We performed pilot experiments to identify the optimal set of conditions (time, viral titer) for carrying out the experiments. The KRAS signature will contain the top 1,000 differentially expressed genes (top 500 and bottom 500), ranked by the Information Coefficient (IC). The IC ([*Linfoot 1957*](http://www.sciencedirect.com/science/article/pii/S001999585790116X); [*Joe 1989*](https://www.jstor.org/stable/2289859?seq=1#page_scan_tab_contents); [*Kim, J.W., Botvinnik 2016*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4868596/)) is a normalized version of the mutual information, defined as

$$IC(x, y) = \operatorname{sign}(\rho(x,y)) \sqrt{1 - \exp(-2I(x,y))}$$

where $I(x, y)$ is the differential mutual information between $x$, the KRAS mut vs. cntrl binary phenotype, and $y$, the expression profile of each gene. The IC lies in the range [-1, 1], in analogy with the correlation coefficient, and the sign of the correlation coefficient $\rho(x, y)$ provides directionality. The differential [mutual information](https://en.wikipedia.org/wiki/Mutual_information) $I(x, y)$ is a function of the ratio of the joint and marginal probabilities,

$$I(x,y) = \int \int P(x, y) \log \frac{P(x,y)}{P(x)P(y)} \, dx \, dy = H(x) + H(y) - H(x, y).$$

Here $H(x,y)$, $H(x)$ and $H(y)$ are the joint and marginal [entropies](https://en.wikipedia.org/wiki/Entropy_(information_theory)). Estimating the mutual information between a phenotype and gene expression profiles requires approximating continuous probability densities empirically, using kernel [density estimators](https://en.wikipedia.org/wiki/Density_estimation) ([*Sheather 2004*](http://www.stat.washington.edu/courses/stat527/s13/readings/Sheather_StatSci_2004.pdf)).
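
To make the relation between entropies, mutual information and the IC concrete, the following is a minimal sketch rather than the CCAL implementation. It assumes small discrete probability tables instead of the kernel density estimates described above, and the function names (`entropy`, `mutual_information`, `information_coefficient`) are hypothetical. It uses the identity $I(x,y) = H(x) + H(y) - H(x,y)$ and the normalization $IC = \operatorname{sign}(\rho)\sqrt{1 - \exp(-2I)}$, and checks the known result that for a bivariate Gaussian with correlation $\rho$, where $I = -\frac{1}{2}\ln(1 - \rho^2)$, the IC recovers $\rho$.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def mutual_information(joint):
    """I(x, y) = H(x) + H(y) - H(x, y) for a joint probability table."""
    p_x = [sum(row) for row in joint]          # marginal over columns
    p_y = [sum(col) for col in zip(*joint)]    # marginal over rows
    p_xy = [p for row in joint for p in row]   # flattened joint
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)


def information_coefficient(mi, rho):
    """IC = sign(rho) * sqrt(1 - exp(-2 * I)); the clamp guards rounding error."""
    sign = math.copysign(1.0, rho) if rho != 0 else 0.0
    return sign * math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * mi)))


# Independent variables: the joint is the product of the marginals, so I = 0.
independent = [[0.3, 0.3], [0.2, 0.2]]
mi_independent = mutual_information(independent)

# Dependent variables: most probability mass sits on the diagonal, so I > 0.
dependent = [[0.45, 0.05], [0.05, 0.45]]
mi_dependent = mutual_information(dependent)

# Bivariate Gaussian with correlation rho: I = -0.5 * ln(1 - rho^2).
rho = -0.8
mi_gaussian = -0.5 * math.log(1.0 - rho ** 2)
ic_gaussian = information_coefficient(mi_gaussian, rho)

print(mi_independent, mi_dependent, ic_gaussian)
```

The Gaussian case shows why the IC is read like a correlation coefficient: it coincides with $\rho$ when the dependence is linear and Gaussian, while still responding to non-linear dependence through $I(x,y)$.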

Go to the [next chapter (2)](2 Decomposing Signature and Defining Transcriptional Components.ipynb).
Back to the [introduction chapter (0)](0 Introduction and Overview.ipynb).


<hr style="border: none; border-bottom: 3px solid #88BBEE;">
### 1. Set up notebook and import Computational Cancer Analysis Library ([CCAL](https://github.com/KwatME/ccal))

In [10]:
from environment import *

%matplotlib inline
%load_ext autoreload
%autoreload 2



### 2. Read gene expression dataset 

In [25]:
%%HTML
<!--!AUTO_EXEC-->
<paper-material class="task-widget" elevation="1"><div class="task-widget-header item-header"><h2>Read gene expression</h2></div><iron-collapse class="task-widget-inner"><div class="task-widget-content item-content"><div class="widget-info"></div><form is="iron-form" class="task-widget-form"><div class="widget-form-panel"><div class="field-group field-required_args-group"><div class="item-header"><h3>Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for filepath" name="required_args" value="'../data/kras_isogenic_vs_imortalized.gct'" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Filepath to file to be read</div></iron-collapse></div></iron-collapse></div><div class="field-group field-returns-group"><div class="item-header"><h3>Output</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for gene_exp" name="returns" value="gene_exp" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div></div><button class="form-submit-button-wrapper"><paper-button class="form-submit-button" raised="">run<iron-icon icon="assessment"></iron-icon></paper-button></button></form></div></iron-collapse></paper-material>
<!--{"Read gene expression":{"description":"","library_path":"../tools/","library_name":"ccal.support.file","function_name":"read_gct","required_args":[{"label":"Label for filepath","description":"Filepath to file to be read","name":"filepath","value":"'../data/kras_isogenic_vs_imortalized.gct'"}],"default_args":[],"optional_args":[],"returns":[{"label":"Label for gene_exp","description":"","value":"gene_exp"}]}}-->
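
The widget above is configured to call `read_gct` from `ccal.support.file` on a GCT file. For readers unfamiliar with the format, the sketch below parses a GCT v1.2 stream using only the Python standard library. It is not CCAL's reader: the function name `parse_gct`, the returned structure, and the tiny in-memory example (gene names, sample names and values) are assumptions made for illustration. The layout assumed is the standard one: a version line, a line with the number of rows and columns, a header row starting with `Name` and `Description`, then one tab-separated row per gene.

```python
import csv
import io


def parse_gct(handle):
    """Parse a GCT v1.2 text stream into (sample_names, {gene: [values]})."""
    version = handle.readline().strip()
    if not version.startswith("#1."):
        raise ValueError("unexpected GCT version line: %r" % version)
    # Second line: number of data rows and number of sample columns.
    n_rows, n_cols = (int(v) for v in handle.readline().split()[:2])
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # the first two columns are Name and Description
    expression = {}
    for row in reader:
        if not row:
            continue  # skip trailing blank lines
        expression[row[0]] = [float(v) for v in row[2:]]
    if len(expression) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions do not match the GCT header")
    return samples, expression


# Made-up example content; not taken from the notebook's dataset.
example = io.StringIO(
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "GENE_A\tna\t1.0\t2.5\t3.0\n"
    "GENE_B\tna\t0.5\t0.0\t4.2\n"
)
samples, expression = parse_gct(example)
print(samples, expression)
```

In the notebook itself the result is bound to `gene_exp`, whose columns are the samples and whose rows are the genes.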

### 3. Generate oncogenic signature 
As mentioned in the introduction, the signature consists of the genes whose expression profiles are associated with the KRAS mut vs. cntrl phenotype, i.e. share information with it as estimated by the IC.

In [26]:
%%HTML
<!--!AUTO_EXEC-->
<paper-material class="task-widget" elevation="1"><div class="task-widget-header item-header"><h2>Define KRAS mutant phenotype versus the control phenotype</h2></div><iron-collapse class="task-widget-inner"><div class="task-widget-content item-content"><div class="widget-info">This is a vector of 1 and -1 indicating which samples are KRAS mut and which are controls (see the following article for details: https://drive.google.com/file/d/0B0MQqMWLrsA4b2RUTTAzNjFmVkk/view)</div><form is="iron-form" class="task-widget-form"><div class="widget-form-panel"><div class="field-group field-required_args-group"><div class="item-header"><h3>Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for iterable" name="required_args" value="[1, 1, 1, 1, 1, 1, -1, -1, -1, -1]" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div><div class="field-group field-optional_args-group"><div class="item-header"><h3>Optional Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for index" name="optional_args" value="gene_exp.columns"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for name" name="optional_args" value="'KRAS mut vs. 
cntrl'"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div><div class="field-group field-returns-group"><div class="item-header"><h3>Output</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for target" name="returns" value="target" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div></div><button class="form-submit-button-wrapper"><paper-button class="form-submit-button" raised="">run<iron-icon icon="assessment"></iron-icon></paper-button></button></form></div></iron-collapse></paper-material>
<!--{"Define KRAS mutant phenotype versus the control phenotype":{"description":"This is a vector of 1 and -1 indicating which samples are KRAS mut and which are controls (see the following article for details: https://drive.google.com/file/d/0B0MQqMWLrsA4b2RUTTAzNjFmVkk/view)","library_path":"../tools/","library_name":"ccal.support.d1","function_name":"make_series","required_args":[{"label":"Label for iterable","description":"","name":"iterable","value":"[1, 1, 1, 1, 1, 1, -1, -1, -1, -1]"}],"default_args":[],"optional_args":[{"label":"Label for index","description":"","name":"index","value":"gene_exp.columns"},{"label":"Label for name","description":"","name":"name","value":"'KRAS mut vs. cntrl'"}],"returns":[{"label":"Label for target","description":"","value":"target"}]}}-->

In [27]:
%%HTML
<!--!AUTO_EXEC-->
<paper-material class="task-widget" elevation="1"><div class="task-widget-header item-header"><h2>Find top differentially expressed genes between KRAS mutant and control</h2></div><iron-collapse class="task-widget-inner"><div class="task-widget-content item-content"><div class="widget-info">This is the main function used in this notebook. It computes the association between the phenotype and the gene expression profiles as described in the introduction above. At completion this function will produce a heatmap (SIGNATURE.pdf) and a text file (SIGNATURE.txt) where the genes have been sorted by their association with the phenotype as measured by the IC. The function also computes a bootstrap confidence interval for the IC (shown in parenthesis) and the p-values and False Discovery Rates (FDR) using an empirical permutations test (using n_permutations times the number of genes). The heatmap below shows the 20 genes on top (UP) and at the bottom (DOWN) of the list. The gene names are on the left of the heatmap. This computation takes a few hours and therefore it is desirable to run overnight.</div><form is="iron-form" class="task-widget-form"><div class="widget-form-panel"><div class="field-group field-required_args-group"><div class="item-header"><h3>Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for target" name="required_args" value="target" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for features" name="required_args" value="gene_exp" auto-validate="" error-message="Required!" 
required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Data matrix with input data</div></iron-collapse></div></iron-collapse></div><div class="field-group field-optional_args-group"><div class="item-header"><h3>Optional Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for target_type" name="optional_args" value="'binary'"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Target profile type</div></iron-collapse><div class="input-parent"><paper-input label="Label for n_permutations" name="optional_args" value="200"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Number of random permutations</div></iron-collapse><div class="input-parent"><paper-input label="Label for filepath_prefix" name="optional_args" value="'../results/kras_signature'"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Output files (.txt and .pdf)</div></iron-collapse><div class="input-parent"><paper-input label="Label for max_n_features" name="optional_args" value="20"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Max. 
number of features shown in heatmap</div></iron-collapse><div class="input-parent"><paper-input label="Label for random_seed" name="optional_args" value="12345"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content">Random number generation seed</div></iron-collapse></div></iron-collapse></div><div class="field-group field-returns-group"><div class="item-header"><h3>Output</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for gene_scores" name="returns" value="gene_scores" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div></div><button class="form-submit-button-wrapper"><paper-button class="form-submit-button" raised="">run<iron-icon icon="assessment"></iron-icon></paper-button></button></form></div></iron-collapse></paper-material>
<!--{"Find top differentially expressed genes between KRAS mutant and control":{"description":"This is the main function used in this notebook. It computes the association between the phenotype and the gene expression profiles as described in the introduction above. At completion this function will produce a heatmap (SIGNATURE.pdf) and a text file (SIGNATURE.txt) where the genes have been sorted by their association with the phenotype as measured by the IC. The function also computes a bootstrap confidence interval for the IC (shown in parenthesis) and the p-values and False Discovery Rates (FDR) using an empirical permutations test (using n_permutations times the number of genes). The heatmap below shows the 20 genes on top (UP) and at the bottom (DOWN) of the list. The gene names are on the left of the heatmap. This computation takes a few hours and therefore it is desirable to run overnight.","library_path":"../tools/","library_name":"ccal.computational_cancer_biology.association","function_name":"make_association_panel","required_args":[{"label":"Label for target","description":"","name":"target","value":"target"},{"label":"Label for features","description":"Data matrix with input data","name":"features","value":"gene_exp"}],"default_args":[],"optional_args":[{"label":"Label for target_type","description":"Target profile type","name":"target_type","value":"'binary'"},{"label":"Label for n_permutations","description":"Number of random permutations","name":"n_permutations","value":"200"},{"label":"Label for filepath_prefix","description":"Output files (.txt and .pdf)","name":"filepath_prefix","value":"'../results/kras_signature'"},{"label":"Label for max_n_features","description":"Max. number of features shown in heatmap","name":"max_n_features","value":"20"},{"label":"Label for random_seed","description":"Random number generation seed","name":"random_seed","value":"12345"}],"returns":[{"label":"Label for gene_scores","description":"","value":"gene_scores"}]}}-->
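
The widget description above mentions empirical p-values from a permutation test whose null distribution pools `n_permutations` times the number of genes, followed by False Discovery Rates. The sketch below illustrates that idea under stated assumptions and is not the `make_association_panel` implementation: it uses the Pearson correlation as a stand-in for the IC, the Benjamini-Hochberg procedure as one common way to obtain FDR values, and made-up expression values. All function and gene names are hypothetical. Only the label vector, `n_permutations=200` and the seed `12345` mirror the widget settings.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation, used here as a stand-in for the IC."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_values(labels, expression, n_permutations=200, seed=12345):
    """Two-sided empirical p-values against a null pooled over all genes."""
    rng = random.Random(seed)
    observed = {g: pearson(labels, v) for g, v in expression.items()}
    null_scores = []
    shuffled = list(labels)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the label/sample correspondence
        null_scores.extend(abs(pearson(shuffled, v)) for v in expression.values())
    p_values = {}
    for gene, score in observed.items():
        n_extreme = sum(1 for s in null_scores if s >= abs(score))
        p_values[gene] = (n_extreme + 1) / (len(null_scores) + 1)
    return observed, p_values


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values for a {gene: p} mapping."""
    ranked = sorted(p_values.items(), key=lambda item: item[1], reverse=True)
    n = len(ranked)
    fdr, running_min = {}, 1.0
    for offset, (gene, p) in enumerate(ranked):
        rank = n - offset  # rank of p in ascending order
        running_min = min(running_min, p * n / rank)
        fdr[gene] = running_min
    return fdr


# Six KRAS mut samples followed by four controls, as in the widget above.
labels = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
# Made-up expression values for three hypothetical genes.
expression = {
    "GENE_UP": [9.1, 8.7, 9.4, 9.0, 8.8, 9.2, 2.1, 1.9, 2.3, 2.0],
    "GENE_DN": [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 6.0, 6.2, 5.9, 6.1],
    "GENE_FLAT": [5.0, 5.2, 4.9, 5.1, 4.8, 5.0, 5.1, 4.9, 5.2, 5.0],
}
observed, p_values = permutation_p_values(labels, expression)
fdr = benjamini_hochberg(p_values)
print(observed, p_values, fdr)
```

Pooling the null scores across genes is reasonable here because the score is bounded and scale-free, which is also true of the IC. Adding one to both the numerator and the denominator keeps the empirical p-values strictly positive.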

In [28]:
%%HTML
<!--!AUTO_EXEC-->
<paper-material class="task-widget" elevation="1"><div class="task-widget-header item-header"><h2>Generate oncogenic signature</h2></div><iron-collapse class="task-widget-inner"><div class="task-widget-content item-content"><div class="widget-info">This computation selects the top 500 UP and bottom 500 DOWN genes</div><form is="iron-form" class="task-widget-form"><div class="widget-form-panel"><div class="field-group field-required_args-group"><div class="item-header"><h3>Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for matrix" name="required_args" value="gene_scores" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for n_up_features" name="required_args" value="500" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for n_dn_features" name="required_args" value="500" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div><div class="field-group field-returns-group"><div class="item-header"><h3>Output</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for kras_relevant_genes" name="returns" value="kras_relevant_genes" auto-validate="" error-message="Required!" 
required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div></div><button class="form-submit-button-wrapper"><paper-button class="form-submit-button" raised="">run<iron-icon icon="assessment"></iron-icon></paper-button></button></form></div></iron-collapse></paper-material>
<!--{"Generate oncogenic signature":{"description":"This computation selects the top 500 UP and bottom 500 DOWN genes","library_path":"../tools/","library_name":"ccal.support.simple_funcs","function_name":"extract_top_bottom_features","required_args":[{"label":"Label for matrix","description":"","name":"matrix","value":"gene_scores"},{"label":"Label for n_up_features","description":"","name":"n_up_features","value":"500"},{"label":"Label for n_dn_features","description":"","name":"n_dn_features","value":"500"}],"default_args":[],"optional_args":[],"returns":[{"label":"Label for kras_relevant_genes","description":"","value":"kras_relevant_genes"}]}}-->
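
The selection step above keeps the 500 highest-scoring (UP) and 500 lowest-scoring (DOWN) genes. A minimal sketch of that selection on a plain dictionary of scores follows. The function name `extract_top_bottom`, the gene names and the scores are hypothetical, and the sketch does not reproduce how `extract_top_bottom_features` handles the `gene_scores` matrix.

```python
def extract_top_bottom(scores, n_up, n_dn):
    """Return the n_up highest- and n_dn lowest-scoring gene names."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if n_up + n_dn > len(ranked):
        raise ValueError("requested more genes than are available")
    up = ranked[:n_up]                  # strongest positive association
    dn = ranked[len(ranked) - n_dn:]    # strongest negative association
    return up + dn


# Made-up scores standing in for the IC column of the association results.
scores = {"G1": 0.95, "G2": -0.90, "G3": 0.10, "G4": 0.80, "G5": -0.75, "G6": -0.05}
signature = extract_top_bottom(scores, n_up=2, n_dn=2)
print(signature)
```

With `n_up=500` and `n_dn=500` the same logic yields the 1,000-gene signature described in the introduction.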

In [29]:
%%HTML
<!--!AUTO_EXEC-->
<paper-material class="task-widget" elevation="1"><div class="task-widget-header item-header"><h2>Display all signature member genes in a heatmap</h2></div><iron-collapse class="task-widget-inner"><div class="task-widget-content item-content"><div class="widget-info">Make a heatmap showing the profiles of the resulting signature genes (this heatmap is shown on the left of Fig 3 in the article).</div><form is="iron-form" class="task-widget-form"><div class="widget-form-panel"><div class="field-group field-required_args-group"><div class="item-header"><h3>Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for dataframe" name="required_args" value="gene_exp.ix[kras_relevant_genes, :]" auto-validate="" error-message="Required!" required="required"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div><div class="field-group field-optional_args-group"><div class="item-header"><h3>Optional Input</h3></div><iron-collapse><div class="field-group-content item-content"><div class="input-parent"><paper-input label="Label for normalization_axis" name="optional_args" value="1"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for normalization_method" name="optional_args" value="'-0-'"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for column_annotation" name="optional_args" value="[1, 1, 1, 1, 1, 1, -1, -1, -1, -1]"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div 
class="item-content"></div></iron-collapse><div class="input-parent"><paper-input label="Label for title" name="optional_args" value="'KRAS Oncogenic Activation Signature'"></paper-input><paper-icon-button icon="info" class="info-toggle"></paper-icon-button></div><iron-collapse><div class="item-content"></div></iron-collapse></div></iron-collapse></div></div><button class="form-submit-button-wrapper"><paper-button class="form-submit-button" raised="">run<iron-icon icon="assessment"></iron-icon></paper-button></button></form></div></iron-collapse></paper-material>
<!--{"Display all signature member genes in a heatmap":{"description":"Make a heatmap showing the profiles of the resulting signature genes (this heatmap is shown on the left of Fig 3 in the article).","library_path":"../tools/","library_name":"ccal.support.plot","function_name":"plot_heatmap","required_args":[{"label":"Label for dataframe","description":"","name":"dataframe","value":"gene_exp.ix[kras_relevant_genes, :]"}],"default_args":[],"optional_args":[{"label":"Label for normalization_axis","description":"","name":"normalization_axis","value":"1"},{"label":"Label for normalization_method","description":"","name":"normalization_method","value":"'-0-'"},{"label":"Label for column_annotation","description":"","name":"column_annotation","value":"[1, 1, 1, 1, 1, 1, -1, -1, -1, -1]"},{"label":"Label for title","description":"","name":"title","value":"'KRAS Oncogenic Activation Signature'"}],"returns":[]}}-->
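
The heatmap widget above normalizes the data with `normalization_axis=1` and `normalization_method='-0-'`. We assume here that `'-0-'` denotes standardization to zero mean and unit standard deviation, and that axis 1 means each gene (row) is normalized across samples. Both are assumptions about CCAL rather than statements from its documentation. The sketch below shows row-wise standardization on made-up values, with a hypothetical function name. Separately, the `.ix` indexer used in `gene_exp.ix[kras_relevant_genes, :]` was removed in pandas 1.0, and with a recent pandas the label-based equivalent is `gene_exp.loc[kras_relevant_genes, :]`.

```python
import statistics


def standardize_rows(matrix):
    """Scale each row to zero mean and unit (sample) standard deviation."""
    normalized = {}
    for gene, values in matrix.items():
        mean = statistics.mean(values)
        std = statistics.stdev(values)
        if std == 0:
            # A constant row carries no contrast; map it to zeros.
            normalized[gene] = [0.0 for _ in values]
        else:
            normalized[gene] = [(v - mean) / std for v in values]
    return normalized


# Made-up values: one gene with a clear contrast, one constant gene.
matrix = {
    "GENE_UP": [9.0, 9.0, 1.0, 1.0],
    "GENE_CONST": [3.0, 3.0, 3.0, 3.0],
}
z = standardize_rows(matrix)
print(z)
```

Normalizing per gene puts every row on the same color scale, so the heatmap shows the mut vs. control contrast rather than differences in absolute expression level between genes.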