<hr style="border: none; border-bottom: 3px solid #88BBEE;">

# **Onco-*GPS* Methodology**
## **Chapter 0. Introduction and Overview** 

**Authors:** William Kim$^{1}$, Huwate (Kwat) Yeerna$^{2}$, Taylor Cavazos$^{2}$, Kate Medetgul-Ernar$^{2}$, Clarence Mah$^{3}$, Stephanie Ting$^{2}$, Jason Park$^{2}$, Jill P. Mesirov$^{2, 3}$ and Pablo Tamayo$^{2,3}$.

1. Eli and Edythe Broad Institute      
2. UCSD Moores Cancer Center
3. UCSD School of Medicine 

**Date:** April 17, 2017

**Article:** [*Kim et al.* Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States](https://drive.google.com/file/d/0B0MQqMWLrsA4b2RUTTAzNjFmVkk/view?usp=sharing)

**Analysis overview:**

In this series of notebook chapters, we introduce Onco-*GPS* (OncoGenic Positioning System), a data-driven analysis framework and associated experimental and computational methodology that makes use of an oncogenic activation signature to identify multiple cellular states associated with oncogene activation. 

The Onco-*GPS* methodology decomposes an oncogenic activation signature into its constituent components in such a way that the context dependencies and different modalities of oncogenic activation are made explicit and taken into account. Once characterized and annotated, these components are used to deconstruct and define cellular states, and to map individual samples onto a novel visual paradigm: a two-dimensional Onco-*GPS* “map.” The resulting model facilitates further molecular characterization and provides an effective analysis and summarization tool that can be applied to explore complex oncogenic states.


The Onco-*GPS* approach is executed in three major modular steps, as shown in the figure below.

<img src="../media/method_chap0.png" width=2144 height=1041>

Step I involves the experimental generation of a representative gene expression signature reflecting the activation of an oncogene of interest. In step II, the resulting signature is decomposed into a set of coherent transcriptional components using a large reference dataset that represents multiple cellular states relevant to the oncogene of interest. These components are also biologically annotated and characterized through further analysis and experimental validation (see article). In step III, a representative subset of samples and components is selected to define cellular states using a clustering procedure. The selected components are also used as transcriptional coordinates to generate a two-dimensional map onto which the selected individual samples are projected relative to these transcriptional coordinates, in analogy to a geographical *GPS* system, as shown below.

<img src="../media/GPS.png" width=500 height=500>
 
The Onco-*GPS* map can also be used to display the association of samples with various genomic features, such as genetic lesions, pathway activation, individual gene expression, genetic dependencies and drug sensitivities. We will use the Onco-*GPS* approach to explore the complex functional landscape of cancer cell lines with alterations in the RAS/MAPK pathway. 
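Steps II and III can be illustrated with a toy example. The sketch below is not the implementation shipped in the **tools** folder: it assumes a plain multiplicative-update NMF and a simple barycentric placement of samples inside a triangle, and the function names `nmf_multiplicative` and `ternary_projection`, the vertex layout, and the random toy data are all hypothetical.

```python
import numpy as np


def nmf_multiplicative(A, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (genes x samples) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumed algorithm choice, not necessarily the one in the notebooks).
    W is genes x k; H is k x samples (one row per component).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = A.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update keeps W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ternary_projection(h3):
    """Place samples inside a triangle from three component amplitudes.

    h3 is a (3, n_samples) non-negative array. Each component is assigned
    one triangle vertex (an assumed layout), and each sample lands at the
    amplitude-weighted average of the three vertices.
    """
    h3 = np.asarray(h3, dtype=float)
    vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    totals = h3.sum(axis=0)
    if np.any(totals <= 0):
        raise ValueError("every sample needs a positive total amplitude")
    weights = h3 / totals  # each column sums to 1
    return weights.T @ vertices  # (n_samples, 2) map coordinates


# Toy data: 50 "genes" x 20 "samples" built from 3 hidden components.
rng = np.random.default_rng(1)
A = rng.random((50, 3)) @ rng.random((3, 20))

W, H = nmf_multiplicative(A, k=3)
rel_error = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
xy = ternary_projection(H)  # one (x, y) position per sample
```

A sample dominated by a single component lands near that component's vertex, while a sample with balanced amplitudes lands near the center of the triangle.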


**The Onco-GPS methodology is organized in a series of 8 chapters:**

Chapter 1: [**Generating Oncogenic Activation Signature**](1 Generating Oncogenic Activation Signature.ipynb). This chapter shows how to generate the oncogenic signature (step I above). This is useful if one is interested in creating an Onco-GPS map for a given oncogene (for which one has a dataset or at least a gene set representing its activation).

Chapter 2: [**Decomposing Signature and Defining Transcriptional Components**](2 Decomposing Signature and Defining Transcriptional Components.ipynb). This chapter shows how to take the oncogenic signature from chapter 1, or any other signature or gene set of interest, and decompose it into transcriptional components using Non-Negative Matrix Factorization (NMF).

Chapter 3: [**Annotating the Transcriptional Components**](3 Annotating the Transcriptional Components.ipynb). This chapter annotates, or characterizes, the transcriptional components found in chapter 2 by matching many types of genomic features to the component profiles (i.e., the rows of the "H" matrix generated in chapter 2). The full result sets produced by this analysis are also stored under the directory "../results" in the subfolder "component_annotation".

Chapter 4: [**Defining Cellular States and Generating Onco-GPS Map**](4 Defining Cellular States and Generating Onco-GPS Map.ipynb). This chapter defines the oncogenic states by clustering the KRAS mutant subset of the "H" matrix obtained in chapter 2. It also defines a triangular, or ternary, Onco-GPS map using components C1, C7 and C2, and then projects the KRAS mutant samples onto it.

Chapter 5: [**Annotating the Oncogenic States**](5 Annotating the Oncogenic States.ipynb). This chapter is similar to chapter 3, but it annotates and characterizes the oncogenic states defined in chapter 4. The full result sets produced by this analysis are also stored under the directory "../results" in the subfolder "state_annotation".

Chapter 6: [**Displaying Selected Genomic Features in the KRAS mut Onco-GPS Map**](6 Displaying Selected Genomic Features in the KRAS mut Onco-GPS Map.ipynb). This chapter displays selected genomic features of interest on the KRAS mutant Onco-GPS map, including gene, protein and pathway expression, mutations, tissue types, etc.

Chapter 7: [**Defining Global Cellular States and Onco-GPS Map**](7 Defining Global Cellular States and Onco-GPS Map.ipynb). This chapter defines the global oncogenic states (S1-S15) and corresponding Onco-GPS map using all the KRAS components (C1-C9) defined in chapter 2.

Chapter 8: [**Displaying Genomic Features in the Global Onco-GPS Map**](8 Displaying Genomic Features in the Global Onco-GPS Map.ipynb). This chapter displays selected genomic features of interest on the global Onco-GPS map, including gene, protein and pathway expression, mutations, tissue types, etc.
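Chapters 3 and 5 match genomic features to component or state profiles. As a rough illustration of that idea only, the sketch below ranks features by their Pearson correlation with one component profile (one row of "H"). The correlation score is an assumed stand-in, since the notebooks may use a different association metric, and the function name and toy values are hypothetical.

```python
import numpy as np


def rank_features_by_correlation(component_profile, features):
    """Rank feature rows by Pearson correlation with one component profile.

    component_profile: length n_samples vector (e.g. one row of H).
    features: (n_features, n_samples) array measured on the same samples.
    Returns (order, scores), where order lists feature indices from the
    most positively to the most negatively correlated.
    """
    profile = np.asarray(component_profile, dtype=float)
    features = np.asarray(features, dtype=float)
    p = profile - profile.mean()
    f = features - features.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(f, axis=1) * np.linalg.norm(p)
    # Constant features have zero variance; give them a score of 0.
    scores = np.divide(f @ p, denom, out=np.zeros(len(f)), where=denom > 0)
    order = np.argsort(-scores)
    return order, scores


# Toy example: one component profile over 4 samples and 3 candidate features.
profile = [0.1, 0.4, 0.9, 0.2]
features = [
    [1.0, 4.0, 9.0, 2.0],  # rises and falls with the profile
    [9.0, 6.0, 1.0, 8.0],  # mirror image of the profile
    [1.0, 1.0, 1.0, 1.0],  # constant, carries no information
]
order, scores = rank_features_by_correlation(profile, features)
```

The top-ranked and bottom-ranked features are the ones most strongly associated, positively or negatively, with the component.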

**Additional Notes on Using the Notebooks/Chapters**

*  To reproduce the entire analysis, run the 8 chapters in sequence. To apply the methodology to a different oncogene, start by generating the oncogenic signature (chapter 1) using an appropriate dataset, e.g. one generated in your laboratory, one taken from the literature, or a relevant gene set.

* To explore the original KRAS mutant or global Onco-GPS maps presented in the article, e.g. to display the mRNA expression or mutation status of your favorite gene, go directly to chapters 6 or 8 and modify these chapters to display the gene or feature of interest.

* The chapters (notebooks) are organized as a *notebook package (NB),* a folder that contains the following subfolders:

 1.  **notebooks:** contains the Jupyter notebooks corresponding to each chapter (0-8). An additional **environment.py** file helps to set up the environment and is imported by each notebook.

 2.  **data:** contains the input data to the notebooks.
 
 3. **results:** contains the intermediate and final results produced by the chapters (notebooks).

 4.  **tools:** the analysis libraries and source code that implement the Onco-GPS method.

 5. **media:** the images, logos and other supplementary files used by the notebooks.
 

* The analysis in most chapters will run in under a couple of hours of computer execution time. However, because chapters 3 and 5 execute a full annotation sweep using all components and all states against many datasets of genomic features, they could take a few days of computer time to execute.




