<div>
    <table style="border:2px solid white; border-collapse: collapse; border-spacing: 0;" cellspacing="0" cellpadding="0">
      <tr> 
        <th style="background-color:white"> <img src="../media/ccal-logo-D3.png" width=225 height=225></th>
        <th style="background-color:white"> <img src="../media/logoMoores.jpg" width=175 height=175></th>
        <th style="background-color:white"> <img src="../media/GP.png" width=200 height=200></th>
        <th style="background-color:white"> <img src="../media/UCSD_School_of_Medicine_logo.png" width=175 height=175></th> 
        <th style="background-color:white"> <img src="../media/Broad.png" width=130 height=130></th> 
      </tr>
    </table>
</div>

<hr style="border: none; border-bottom: 3px solid #88BBEE;">
# **Onco-*GPS* Methodology**
## **Chapter 4. Annotating the Transcriptional Components**

<div>
    <img style="float: left" src="../media/authors.png" width=800 height=40>
</div>

**Date:** April 17, 2017


**Article:** [*Kim et al.* Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States](https://drive.google.com/file/d/0B0MQqMWLrsA4b2RUTTAzNjFmVkk/view?usp=sharing)

**Analysis overview:** In this chapter we perform a detailed analysis of the KRAS transcriptional components produced by the NMF decomposition in chapter 3 in order to assign a biological interpretation to each component. 

<div>
    <img src="../media/method_chap3.png" width=2144 height=1041>
</div>

The analysis consists of the following steps:
* Define a target profile for each component in the CCLE Reference Dataset using the amplitudes of the $H$ matrix. This matrix represents the magnitude of each NMF component per sample.
* Use the Information Coefficient (IC) ([*Kim, J.W., Botvinnik 2016*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4868596/)) to estimate the degree of association between each component target profile and several types of genomic features.
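
To make the second step concrete, here is a minimal sketch of an information-coefficient style score, assuming the form $IC = \mathrm{sign}(\rho)\sqrt{1 - e^{-2I}}$, where $I$ is the mutual information and $\rho$ the Pearson correlation. It is not the CCAL implementation: the mutual information is estimated with a plain 2D histogram purely for illustration, and the function name, the bin count and the toy vectors are all assumptions.

```python
import numpy as np


def information_coefficient(x, y, n_bins=8):
    """Histogram-based sketch of an information coefficient (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Joint and marginal probabilities from a 2D histogram
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)

    # Mutual information in nats, summed over non-empty cells only
    nonzero = p_xy > 0
    mi = np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero]))
    mi = max(mi, 0.0)  # guard against tiny negative rounding errors

    # The sign of the Pearson correlation gives the direction of the association
    rho = np.corrcoef(x, y)[0, 1]
    return np.sign(rho) * np.sqrt(1.0 - np.exp(-2.0 * mi))


# Hypothetical toy profiles: one matching feature, one unrelated feature
rng = np.random.RandomState(12345)
component_profile = rng.normal(size=500)
matching_feature = component_profile + 0.1 * rng.normal(size=500)
unrelated_feature = rng.normal(size=500)

ic_match = information_coefficient(component_profile, matching_feature)
ic_anti = information_coefficient(component_profile, -matching_feature)
ic_unrelated = information_coefficient(component_profile, unrelated_feature)
print(ic_match, ic_anti, ic_unrelated)
```

A histogram estimate is biased upward for small samples, so the score of the unrelated feature is not exactly zero; it is only expected to be clearly smaller in magnitude than the score of the matching feature.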

The genomic features include the following:

1. **Mutations and Copy Number Alterations (CNA).** CCLE mutation and copy number datasets (http://www.broadinstitute.org/ccle, [*Barretina et al. 2012*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/)).
2. **Gene expression.** CCLE RNA Seq dataset (http://www.broadinstitute.org/ccle, [*Barretina et al. 2012*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/)).
3. **Pathway expression** (single sample GSEA of MSigDB gene sets). MSigDB v5.1 sub-collections c2, c5, c6 and h (http://www.msigdb.org, [*Liberzon et al. 2011*](https://www.ncbi.nlm.nih.gov/pubmed/21546393); [*Liberzon et al. 2016. Cell Systems, 1(6), pp.417–425.*](https://www.ncbi.nlm.nih.gov/pubmed/26771021)) and a few additional gene sets (see supplementary information in the article).
4. **Transcription factors and master regulators expression** (single sample GSEA of gene sets). MSigDB v5.1 sub-collection c3 (http://www.msigdb.org, [*Liberzon et al. 2011*](https://www.ncbi.nlm.nih.gov/pubmed/21546393)) and 1,598 IPA gene sets (http://www.ingenuity.com).
5. **Protein expression.** CCLE Reverse Phase Protein Array (RPPA) dataset (http://www.broadinstitute.org/ccle, [*Barretina et al. 2012*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/)).
6. **Drug sensitivity.** CCLE dataset (http://www.broadinstitute.org/ccle, [*Barretina et al. 2012*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027/)).
7. **Gene dependency.** RNAi Achilles dataset (http://www.broadinstitute.org/achilles, [*Cowley et al. 2014*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4432652/)).

Go to the [next chapter (5)](5 Defining Cellular States and Generating Onco-GPS Map.ipynb).
Back to the [introduction chapter (0)](0 Introduction and Overview.ipynb).

<hr style="border: none; border-bottom: 3px solid #88BBEE;">
### 1. Set up notebook and import Computational Cancer Analysis Library ([CCAL](https://github.com/KwatME/ccal))

In [2]:
from environment import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### 2. Read the annotation datasets table and $H$  matrix

The table describing the datasets that will be used in the annotation analysis (annotation.data_table.txt) is included in the directory "../data."

In [3]:
pd.read_csv('../data/annotation.data_table.txt', sep='\t')

| | Data Name | Data Type | Emphasis | Filepath |
|---|---|---|---|---|
| 0 | drug_sensitivity | continuous | low | ../data/ccle_drug_sensitivity.gct |
| 1 | gene_expression | continuous | high | ../data/ccle_gene_expression.gct |
| 2 | gene_dependency | continuous | low | ../data/ExpandedGeneZSolsCleaned.gct |
| 3 | mutation | binary | high | ../data/ccle_mut_CNA.gct |
| 4 | pathway_expression | continuous | high | ../data/ccle_pathway_expression_all.gct |
| 5 | protein_expression | continuous | high | ../data/ccle_protein_expression.gct |
| 6 | regulator | continuous | high | ../data/ccle_regulator.gct |
| 7 | tissue | binary | high | ../data/ccle_tissue.gct |


The function below reads that table and loads the datasets listed in its "Filepath" column.

In [4]:
data_table = ccal.load_data_table('../data/annotation.data_table.txt')

Making data bundle for drug_sensitivity ...
	Loaded ../data/ccle_drug_sensitivity.gct.
Making data bundle for gene_expression ...
	Loaded ../data/ccle_gene_expression.gct.
Making data bundle for gene_dependency ...
	Loaded ../data/ExpandedGeneZSolsCleaned.gct.
Making data bundle for mutation ...
	Loaded ../data/ccle_mut_CNA.gct.
Making data bundle for pathway_expression ...
	Loaded ../data/ccle_pathway_expression_all.gct.
Making data bundle for protein_expression ...
	Loaded ../data/ccle_protein_expression.gct.
Making data bundle for regulator ...
	Loaded ../data/ccle_regulator.gct.
Making data bundle for tissue ...
	Loaded ../data/ccle_tissue.gct.
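
For readers who want to inspect the table without loading the datasets, the following sketch parses a few rows of the same tab-separated layout from an in-memory string and maps each data name to its type, emphasis and file path. The helper name `summarize_data_table` is hypothetical and is not part of CCAL; the sketch only illustrates the structure that the loader consumes.

```python
import io

import pandas as pd

# A few rows copied from annotation.data_table.txt (tab-separated, unnamed index column)
TABLE_TEXT = (
    "\tData Name\tData Type\tEmphasis\tFilepath\n"
    "0\tdrug_sensitivity\tcontinuous\tlow\t../data/ccle_drug_sensitivity.gct\n"
    "1\tgene_expression\tcontinuous\thigh\t../data/ccle_gene_expression.gct\n"
    "3\tmutation\tbinary\thigh\t../data/ccle_mut_CNA.gct\n"
)


def summarize_data_table(table):
    """Map each data name to its type, emphasis and file path (hypothetical helper)."""
    return {
        row['Data Name']: {
            'type': row['Data Type'],
            'emphasis': row['Emphasis'],
            'filepath': row['Filepath'],
        }
        for _, row in table.iterrows()
    }


table = pd.read_csv(io.StringIO(TABLE_TEXT), sep='\t', index_col=0)
summary = summarize_data_table(table)

for data_name, info in summary.items():
    print(data_name, info['type'], info['emphasis'], info['filepath'])
```

The "Data Type" column matters downstream because binary features (mutations, tissue labels) and continuous features (expression, sensitivity) are scored against a continuous target in different ways.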


Read the $H$ matrix produced in notebook 3.

In [5]:
h_matrix = ccal.read_gct('../results/nmf_cc/nmf/nmf_k9_h.gct')

In [6]:
h_matrix

Unnamed: 0,A101D_SKIN,A172_CENTRAL_NERVOUS_SYSTEM,A204_SOFT_TISSUE,A2058_SKIN,A2780_OVARY,A375_SKIN,A498_KIDNEY,A549_LUNG,A673_BONE,A704_KIDNEY,...,WM88_SKIN,WM983B_SKIN,YAPC_PANCREAS,YD10B_UPPER_AERODIGESTIVE_TRACT,YD38_UPPER_AERODIGESTIVE_TRACT,YD8_UPPER_AERODIGESTIVE_TRACT,YH13_CENTRAL_NERVOUS_SYSTEM,YKG1_CENTRAL_NERVOUS_SYSTEM,ZR751_BREAST,ZR7530_BREAST
C1,87.390561,93.35083,393.899758,255.625865,12.230046,125.904286,15.489376,5.201425e-08,0.163371,0.001205,...,415.948914,305.351132,2248.5638,685.682262,1798.525914,104.278153,199.214356,0.3819411,4721.947367,4957.456757
C3,518.647456,7.308027,199.238501,176.804193,545.601644,391.325939,452.722609,816.735,444.89188,1664.636732,...,295.629218,559.708338,2155.739326,355.439796,302.706434,0.000576,328.560693,111.2408,357.322949,653.472173
C9,11.208338,622.861218,717.757144,3.069694,446.187704,0.000439,740.86104,604.0093,975.677929,1.341833,...,463.865834,118.997647,1890.725829,5669.601718,3901.938851,3309.332406,600.933737,1.969634e-10,457.725732,272.206384
C8,239.003422,479.108337,693.482964,478.286114,1696.595193,503.049222,269.120017,651.0322,2021.718516,372.320844,...,363.270689,482.473882,14.761625,512.785549,401.354229,23.818909,445.879251,948.7706,942.540098,814.310434
C6,7166.174137,1208.779409,1774.414891,6276.135642,1362.720325,4779.65907,838.082289,0.008212941,1611.742127,1417.850903,...,7112.681988,7116.940554,543.418521,209.558478,493.361221,24.717198,1032.356299,2382.286,285.983913,146.797171
C7,1016.009451,1672.966975,433.423134,955.066087,0.003192,1688.565623,961.965537,1294.276,45.169949,28.547109,...,156.910209,617.164602,2113.148083,1606.060845,2266.279767,1820.623602,1502.27466,1324.849,0.016081,0.001204
C5,764.479691,725.646766,517.555359,459.776995,1e-05,0.112397,4271.867752,3473.658,23.952947,5576.769071,...,952.997119,568.321975,719.250074,2.775659,541.816602,1600.013238,509.370896,245.488,505.321577,310.379079
C2,64.82965,1569.836315,3713.903057,1189.301135,4612.165638,1692.563251,1041.812928,2293.922,2825.029588,777.419387,...,737.636509,493.994389,220.968519,436.847693,112.12003,474.16624,277.577697,1885.415,1270.077326,1061.667687
C4,1048.429865,3514.982837,1599.53372,949.324498,1395.774975,1317.373754,1505.001742,860.2564,2186.326459,488.753903,...,427.337725,656.160837,0.500336,580.722452,129.765634,2539.073702,4918.55367,3186.26,940.169658,1203.297652
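
Each row of this matrix becomes the target profile for one component. As a small sketch of that step, the code below builds a toy matrix from a few rounded values of the display above and extracts one row sorted from high to low amplitude. The helper `make_target_profile` is hypothetical, and the descending sort is an assumption based on the `target_ascending=False` argument used in the next section.

```python
import pandas as pd

# Toy H matrix: rounded amplitudes for two components and four cell lines
toy_h = pd.DataFrame(
    [[87.4, 93.4, 393.9, 255.6],
     [518.6, 7.3, 199.2, 176.8]],
    index=['C1', 'C3'],
    columns=[
        'A101D_SKIN', 'A172_CENTRAL_NERVOUS_SYSTEM', 'A204_SOFT_TISSUE',
        'A2058_SKIN'
    ])


def make_target_profile(h, component, ascending=False):
    """Return one component's amplitudes, sorted, as a named Series (hypothetical helper)."""
    target = h.loc[component, :].sort_values(ascending=ascending)
    target.name = 'KRAS Component {}'.format(component)
    return target


c3_target = make_target_profile(toy_h, 'C3')
print(c3_target)
```

With this ordering, the samples with the highest component amplitude appear first (the left side of a heatmap) and the samples with the lowest amplitude appear last.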


### 3. Find the top genomic features that match each component profile
The annotation consists of running the association analysis for each component against all the genomic datasets. Because this is a double iteration over components and feature datasets, it will take hours to complete. As the program runs, it displays the specific target vs. features comparison being made.

In [None]:
ccal.association.make_association_panels(
    target=h_matrix,
    data_bundle=data_table,
    dropna='all',
    target_ascending=False,
    target_prefix='',
    data_prefix='',
    target_type='continuous',
    n_jobs=1,
    n_features=20,
    n_samplings=30,
    n_permutations=50,
    random_seed=12345,
    directory_path='../results/component_annotation')
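
The double iteration can be pictured with the sketch below. It is not the CCAL code: it substitutes the Pearson correlation for the Information Coefficient to stay short, computes an empirical p-value from permutations of the target, and uses hypothetical function names and toy data. The result keys mirror the `c3_vs_mutation` file naming used in the next section.

```python
import numpy as np
import pandas as pd


def score_features(target, features, n_permutations=50, random_seed=12345):
    """Score every feature row against the target and attach permutation p-values."""
    rng = np.random.RandomState(random_seed)
    t = target.values.astype(float)
    f = features.values.astype(float)
    f_centered = f - f.mean(axis=1, keepdims=True)

    def correlate(vector):
        # Pearson correlation of `vector` with every feature row
        v_centered = vector - vector.mean()
        norms = np.linalg.norm(f_centered, axis=1) * np.linalg.norm(v_centered)
        return f_centered @ v_centered / norms

    scores = correlate(t)
    # Null distribution: scores obtained after shuffling the target across samples
    null = np.array([correlate(rng.permutation(t)) for _ in range(n_permutations)])
    p_values = [(np.abs(null) >= abs(score)).mean() for score in scores]

    table = pd.DataFrame({'score': scores, 'p-value': p_values}, index=features.index)
    return table.sort_values('score', ascending=False)


def annotate_components(h, datasets, **kwargs):
    """Outer loop over components, inner loop over feature datasets."""
    results = {}
    for component, target in h.iterrows():
        for data_name, features in datasets.items():
            key = '{}_vs_{}'.format(component.lower(), data_name)
            results[key] = score_features(target, features[target.index], **kwargs)
    return results


# Hypothetical toy data: two components, two small feature datasets
rng = np.random.RandomState(0)
samples = ['S{}'.format(i) for i in range(40)]
toy_h = pd.DataFrame(rng.rand(2, 40) * 1000, index=['C1', 'C2'], columns=samples)
toy_a = pd.DataFrame(rng.rand(3, 40), index=['F1', 'F2', 'F3'], columns=samples)
toy_a.loc['F2'] = toy_h.loc['C1'] / 1000 + 0.05 * rng.rand(40)  # planted association
toy_b = pd.DataFrame(rng.rand(2, 40), index=['G1', 'G2'], columns=samples)

results = annotate_components(toy_h, {'toy_a': toy_a, 'toy_b': toy_b})
print(results['c1_vs_toy_a'])
```

The cost grows with the number of components, datasets, features and permutations, which is consistent with the long running time noted above.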

### 4. Make summary panels for selected component associations

Here we select specific sets of top-scoring features of particular biological interest for many of the components.

#### 4.1  Selected associations for component C3 

The results of the association analysis show that component C3 is the component most strongly associated with KRAS mutation status. As can be seen below, C3 is also associated with a KRAS dependency signature and with a profile of KRAS RNAi dependency from Project Achilles. These findings suggest that the transcriptional activity of KRAS represented by C3 underlies the KRAS dependence phenotype ([*Singh et al. 2009*](https://www.ncbi.nlm.nih.gov/pubmed/19690457)) and help to explain why KRAS mutant cancers with low C3 scores, i.e., samples on the right side of the heatmap below, do not depend on KRAS for their survival.
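
One way to look at the KRAS mutant samples with low C3 scores is to split the mutants by component amplitude. The sketch below does this on hypothetical toy data; the helper name and the use of the median amplitude as the cut-off are assumptions made for illustration, not part of the published analysis.

```python
import pandas as pd


def split_mutants_by_component(target, mutation, threshold=None):
    """Split mutant samples into high and low component amplitude groups."""
    if threshold is None:
        threshold = target.median()  # assumed cut-off, for illustration only
    mutants = mutation[mutation == 1].index
    scores = target.loc[mutants]
    return {
        'high': sorted(scores[scores >= threshold].index),
        'low': sorted(scores[scores < threshold].index),
    }


# Hypothetical amplitudes and mutation calls for six samples
toy_c3 = pd.Series({'S1': 900., 'S2': 800., 'S3': 50., 'S4': 20., 'S5': 400., 'S6': 10.})
toy_kras_mut = pd.Series({'S1': 1, 'S2': 1, 'S3': 1, 'S4': 0, 'S5': 0, 'S6': 0})

groups = split_mutants_by_component(toy_c3, toy_kras_mut)
print(groups)
```

In this toy example, `S3` is a mutant sample with a low amplitude, which is the kind of sample the paragraph above describes as not depending on KRAS.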

Selected KRAS-Related Features for KRAS Component C3 (Figure 4A)

In [None]:
# Select component
component = 'C3'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C3'

# Select specific annotation features
annotation_features = {
    'mutation': {
        'index': ['KRAS_MUT'],
        'alias': ['KRAS mut']
    },
    'pathway_expression': {
        'index': ['SINGH_KRAS_DEPENDENCY_SIGNATURE_'],
        'alias': ['KRAS Dependency']
    },
    'gene_dependency': {
        'index': ['KRAS'],
        'alias': ['KRAS']
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'mutation':
    '../results/component_annotation/c3_vs_mutation.txt',
    'pathway_expression':
    '../results/component_annotation/c3_vs_pathway_expression.txt',
    'gene_dependency':
    '../results/component_annotation/c3_vs_gene_dependency.txt'
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    order=['mutation', 'pathway_expression', 'gene_dependency'],
    title='Selected KRAS-Related Features for KRAS Component C3',
    filepath='../results/C3.KRAS.vignette.pdf')

Selected WNT-Related Features for KRAS Component C3 (Figure 4E)

In [None]:
# Select component
component = 'C3'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C3'

# Select specific annotation features
annotation_features = {
    'mutation': {
        'index': ['APC_MUT', 'CTNNB1_MUT'],
        'alias': ['APC mut', 'CTNNB1 mut']
    },
    'pathway_expression': {
        'index': ['BCAT_GDS748'],
        'alias': ['beta-catenin activation']
    },
    'gene_dependency': {
        'index': ['CTNNB1'],
        'alias': ['CTNNB1']
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'mutation':
    '../results/component_annotation/c3_vs_mutation.txt',
    'pathway_expression':
    '../results/component_annotation/c3_vs_pathway_expression.txt',
    'gene_dependency':
    '../results/component_annotation/c3_vs_gene_dependency.txt'
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    order=['mutation', 'pathway_expression', 'gene_dependency'],
    title='Selected WNT-Related Features for KRAS Component C3',
    filepath='../results/C3.WNT.vignette.pdf')
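
The summary-panel cells in this section all follow the same pattern: a dictionary of selected features per data type (with `index` and `alias` lists), a dictionary of association result files, and an optional `order`. Before running a cell, a quick consistency check like the sketch below can catch mismatches between them. The helper `check_annotation_features` is hypothetical and not part of CCAL.

```python
def check_annotation_features(annotation_features, annotation_files, order=None):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for data_name, spec in annotation_features.items():
        # Every selected feature needs exactly one display alias
        if len(spec['index']) != len(spec['alias']):
            problems.append('{}: {} index entries but {} aliases'.format(
                data_name, len(spec['index']), len(spec['alias'])))
        # Every data type needs an association result file
        if data_name not in annotation_files:
            problems.append('{}: no annotation file given'.format(data_name))
    # Every entry in the panel order must refer to a selected data type
    for data_name in order or []:
        if data_name not in annotation_features:
            problems.append('{}: listed in order but not selected'.format(data_name))
    return problems


# The WNT-related selection from the cell above
wnt_features = {
    'mutation': {
        'index': ['APC_MUT', 'CTNNB1_MUT'],
        'alias': ['APC mut', 'CTNNB1 mut']
    },
    'pathway_expression': {
        'index': ['BCAT_GDS748'],
        'alias': ['beta-catenin activation']
    },
    'gene_dependency': {
        'index': ['CTNNB1'],
        'alias': ['CTNNB1']
    }
}
wnt_files = {
    'mutation': '../results/component_annotation/c3_vs_mutation.txt',
    'pathway_expression': '../results/component_annotation/c3_vs_pathway_expression.txt',
    'gene_dependency': '../results/component_annotation/c3_vs_gene_dependency.txt'
}

wnt_problems = check_annotation_features(
    wnt_features, wnt_files,
    order=['mutation', 'pathway_expression', 'gene_dependency'])
print(wnt_problems)
```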

#### 4.2  Selected associations for component C6

Component C6 is associated with another known alteration downstream of KRAS, the BRAF/MAPK pathway. In this case, BRAF mutation status was the top hit associated with component C6 out of 37,276 genomic alterations. As can be seen below, the component is also strongly associated with BRAF V600E and ETV1 activation signatures. ETV1 is a well-established transcription factor downstream of the MAPK pathway, and its association further suggests that C6 reflects a transcriptional program associated with MAPK activation. The heatmap below also shows the association of the component with the sensitivity profiles for three MAPK pathway inhibitors (PLX4720, PD318088 and selumetinib).

In [None]:
# Select component
component = 'C6'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C6'

# Select specific annotation features
annotation_features = {
    'mutation': {
        'index': ['BRAF.V600E_MUT'],
        'alias': ['BRAF V600E']
    },
    'pathway_expression': {
        'index': ['BRAF_UP', 'ETV1_UP'],
        'alias': ['BRAF Oncogenic Signature', 'ETV1 Oncogenic Signature']
    },
    'drug_sensitivity': {
        'index': ['PLX-4720', 'selumetinib', 'PD318088'],
        'alias': [
            'PLX4720 (BRAF Inhibitor)',
            'Selumetinib (MEK1 and MEK2 Inhibitor)',
            'PD318088 (MEK1 and MEK2 Inhibitor)'
        ]
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'mutation':
    '../results/component_annotation/c6_vs_mutation.txt',
    'pathway_expression':
    '../results/component_annotation/c6_vs_pathway_expression.txt',
    'drug_sensitivity':
    '../results/component_annotation/c6_vs_drug_sensitivity.txt'
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    title='Selected Features for KRAS Component C6',
    filepath='../results/C6.vignette.pdf')

#### 4.3  Selected associations for component C7

Component C7 is significantly associated with features representing NF-κB, a well-established pathway downstream of KRAS. This is consistent with the results of our earlier studies of KRAS synthetic lethality ([*Barbie et al. 2009*](https://www.ncbi.nlm.nih.gov/pubmed/19847166)) and RAS-driven cytokine autocrine circuits ([*Zhu et al.* 2014](https://www.ncbi.nlm.nih.gov/pubmed/24444711)).

We show below the profiles of a gene set representing the NF-κB motif, an independent gene set representing p50/p65, and a profile of NF-κB protein expression. Among the genes most significantly associated with C7 was FOSL1 (FRA1), a member of the AP-1 transcription factor family. We also observed a high association with a gene set representing the targets of AP1 and with the profile of the protein FRA1 pS265.

In [None]:
# Select component
component = 'C7'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C7'

# Select specific annotation features
annotation_features = {
    'regulator': {
        'index': ['GGGNNTTTCC_V$NFKB_Q6_01', 'V$AP1_Q4'],
        'alias': ['NFKB TF Targets', 'AP1 TF Targets']
    },
    'pathway_expression': {
        'index': ['HINATA_NFKB_TARGETS_FIBROBLAST_UP'],
        'alias': ['Genes Up-Regulated by p50 and p65']
    },
    'protein_expression': {
        'index': ['NF-kB-p65_pS536-R-C', 'FRA1_pS265-R-E'],
        'alias': ['NF-kB p65 pS536', 'FRA1 pS265']
    },
    'gene_expression': {
        'index': ['FOSL1'],
        'alias': ['FOSL1']
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'regulator':
    '../results/component_annotation/c7_vs_regulator.txt',
    'pathway_expression':
    '../results/component_annotation/c7_vs_pathway_expression.txt',
    'protein_expression':
    '../results/component_annotation/c7_vs_protein_expression.txt',
    'gene_expression':
    '../results/component_annotation/c7_vs_gene_expression.txt'
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    title='Selected Features for KRAS Component C7',
    filepath='../results/C7.vignette.pdf')

#### 4.4  Selected associations for component C4 

As can be seen below, component C4 appears to be associated with ZEB1/EMT.

In [None]:
# Select component
component = 'C4'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C4'

# Select specific annotation features
annotation_features = {
    'regulator': {
        'index': ['V$AREB6_03', 'IPA_ZEB1'],
        'alias': ['Targets of TCF8', 'Targets of ZEB1']
    },
    'pathway_expression': {
        'index': ['TAUBE_EMT_UP', 'GROGER_EMT_UP'],
        'alias': ['EMT Inducing TFs', 'EMT Core Gene Set']
    },
    'protein_expression': {
        'index': ['N-Cadherin-R-V', 'E-Cadherin-R-V'],
        'alias': ['N-Cadherin', 'E-Cadherin']
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'regulator':
    '../results/component_annotation/c4_vs_regulator.txt',
    'pathway_expression':
    '../results/component_annotation/c4_vs_pathway_expression.txt',
    'protein_expression':
    '../results/component_annotation/c4_vs_protein_expression.txt'
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    title='Selected Features for KRAS Component C4',
    filepath='../results/C4.vignette.pdf')

#### 4.5  Selected associations for component C2

Component C2 is associated with MYC/E2F activation.

In [None]:
# Select component
component = 'C2'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C2'

# Select specific annotation features
annotation_features = {
    'regulator': {
        'index': ['V$E2F_02', 'V$MAX_01', 'V$MYCMAX_01', 'IPA_MYC'],
        'alias': [
            'Targets of E2F', 'Targets of MAX', 'Targets of MYC and MAX',
            'Targets of MYC'
        ]
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'regulator': '../results/component_annotation/c2_vs_regulator.txt',
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    title='Selected Features for KRAS Component C2',
    filepath='../results/C2.vignette.pdf')

#### 4.6  Selected associations for component C5

Component C5 is associated with patterns of overexpression of and dependency on HNF1 and PAX8 across multiple cancer types, including subsets of ovarian ([*Cheung et al. 2011*](https://www.ncbi.nlm.nih.gov/pubmed/21746896)), kidney, endometrial and liver cancers.

In [None]:
# Select component
component = 'C5'
target = h_matrix.loc[component, :]
target.name = 'KRAS Component C5'

# Select specific annotation features
annotation_features = {
    'gene_expression': {
        'index': ['PAX8', 'HNF1B'],
        'alias': ['PAX8', 'HNF1B']
    },
    'gene_dependency': {
        'index': ['PAX8', 'HNF1B'],
        'alias': ['PAX8', 'HNF1B']
    },
    'tissue': {
        'index': ['kidney', 'ovary', 'endometrium', 'liver'],
        'alias': ['Kidney', 'Ovary', 'Endometrium', 'Liver']
    }
}

# Load annotation bundle
annotation_bundle = ccal.support.file.load_data_table(
    '../data/annotation.data_table.txt', annotation_features)

# Load annotation files
annotation_files = {
    'gene_expression':
    '../results/component_annotation/c5_vs_gene_expression.txt',
    'gene_dependency':
    '../results/component_annotation/c5_vs_gene_dependency.txt',
    'tissue':
    '../results/component_annotation/c5_vs_tissue.txt',
}

# Make summary panels
ccal.association.make_association_summary_panel(
    target,
    annotation_bundle,
    annotation_files,
    title='Selected Features for KRAS Component C5',
    filepath='../results/C5.vignette.pdf')