# Correlations with Repair Genes

Here we test whether our event is associated with somatic mutations in any of the following known DNA repair genes:

* **TP53** tumor suppressor, regulates cell division
* **KMT2C** histone methyltransferase, regulates gene transcription by modifying chromatin structure
* **ATM** responds to DNA damage by phosphorylating key substrates involved in DNA repair
* **BAP1** deubiquitinating enzyme, binds BRCA1
* **BRCA2** deficiency known to impair completion of cell division by cytokinesis
* **POLQ** contains RAD51 binding motifs, blocks RAD51-mediated recombination
* **PRKDC** involved in DNA NHEJ during DNA double-strand break repair
* **ATR** regulator of genomic integrity, controls and coordinates DNA-replication origin firing, replication-fork stability, cell-cycle checkpoints, and DNA repair
* **BRCA1** critical roles in DNA repair, cell cycle checkpoint control, and maintenance of genomic stability
* **MSH6** part of a large multisubunit protein complex of tumor suppressors
* **POLE** catalytic subunit of DNA polymerase epsilon, involved in DNA repair
* **FANCM** direct role in DNA processing
* **TP53BP1** binds to WT p53, but not mutant p53; also regulates ATM-dependent phosphorylation events
* **CDK12** forms protein complexes with cyclin K, regulates expression of long complex genes, several of which have roles in DNA maintenance and repair
* **FANCD2** links FA and ATM damage response pathways
* **SLX4** critical scaffold element for the assembly of a multiprotein complex containing enzymes involved in DNA maintenance and repair
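
In practice, "correlates" here means an association between two yes/no flags per sample: whether the sample carries the event, and whether it has a mutation in the gene. The sketch below tallies such flags into a 2x2 contingency table. The sample IDs and flag values are made up for illustration and are not taken from the CPTAC data.

```python
from collections import Counter

# Hypothetical per-sample flags (illustration only, not real data)
has_event = {"S1": True, "S2": True, "S3": False, "S4": False, "S5": False}
has_mutation = {"S1": True, "S2": False, "S3": False, "S4": True, "S5": False}

# Count how many samples fall into each (event, mutation) combination
counts = Counter((has_event[s], has_mutation[s]) for s in has_event)

# Rows: event absent / present; columns: mutation absent / present
table = [[counts[(e, m)] for m in (False, True)] for e in (False, True)]
print(table)
```

A table of this shape is what Fisher's exact test takes as input.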

## Setup

In [1]:
import cptac
import pandas as pd
import numpy as np
import scipy.stats as stats
import statsmodels.stats.multitest



In [2]:
EVENT_COLUMN = 'loss_event'

In [3]:
cancer_types = {
    'brca': cptac.Brca(),
    'ccrcc': cptac.Ccrcc(),
    'colon': cptac.Colon(),
    'endo': cptac.Endometrial(),
    'gbm': cptac.Gbm(),
    'hnscc': cptac.Hnscc(),
    'luad': cptac.Luad(),
    'lscc': cptac.Lscc(),
    'ovarian': cptac.Ovarian()
}


In [4]:
dna_repair_genes = ['TP53', 'KMT2C', 'ATM', 'BAP1', 'BRCA2', 'POLQ', 'PRKDC', 'ATR', 'BRCA1', 'MSH6', 'POLE', 'FANCM', 'TP53BP1', 'CDK12', 'FANCD2', 'SLX4']

## Run Fisher Tests

In [5]:
cancers = list()
genes = list()
pvalues = list()
odds_list = list()
for cancer in cancer_types.keys():
    # Event flags for this cancer type, one row per sample
    has_event = pd.read_csv(f'{cancer}_has_event.tsv', sep='\t', index_col=0)
    mutations = cancer_types[cancer].get_somatic_mutation()
    # Copy the filtered rows so the new column is not written to a slice of `mutations`
    drivers = mutations[mutations.Gene.isin(dna_repair_genes)].copy()
    drivers['has_mutation'] = True
    # One row per sample, one column per gene; samples without a mutation become False
    driver_table = pd.pivot_table(drivers, index=drivers.index, values='has_mutation', columns=['Gene']).fillna(False)
    joined = has_event.join(driver_table).fillna(False)
    for gene in dna_repair_genes:
        if gene in joined.columns:
            event_table = pd.crosstab(joined[EVENT_COLUMN], joined[gene])
            try:
                odds, pvalue = stats.fisher_exact(event_table)
            except ValueError:
                # fisher_exact requires a 2x2 table; record a missing result otherwise
                odds, pvalue = None, None
            genes.append(gene)
            cancers.append(cancer)
            odds_list.append(odds)
            pvalues.append(pvalue)
results = pd.DataFrame({"gene": genes,
                        "cancer": cancers,
                        "pvalue": pvalues,
                        "odds": odds_list})


In [6]:
results

| | gene | cancer | pvalue | odds |
|---|---|---|---|---|
| 0 | TP53 | brca | 0.000076 | 13.783784 |
| 1 | KMT2C | brca | 0.591559 | 0.000000 |
| 2 | ATM | brca | 1.000000 | 0.000000 |
| 3 | BRCA2 | brca | 1.000000 | 0.000000 |
| 4 | POLQ | brca | 0.455765 | 2.125000 |
| ... | ... | ... | ... | ... |
| 120 | FANCM | ovarian | 1.000000 | 0.000000 |
| 121 | TP53BP1 | ovarian | 0.570965 | 0.000000 |
| 122 | CDK12 | ovarian | 1.000000 | 0.583333 |
| 123 | FANCD2 | ovarian | 0.439394 | 3.083333 |
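
To make the `odds` and `pvalue` columns easier to read, here is a pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table. It is an illustration under simple assumptions, not SciPy's implementation: the function name `fisher_exact_2x2` and the example counts are hypothetical, and edge cases such as empty rows or columns are not handled the way `scipy.stats.fisher_exact` handles them.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table of counts (illustrative sketch)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the observed one
    p_value = sum(p for p in map(prob, range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    # Sample odds ratio; infinite when an off-diagonal cell is empty
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(p_value, 1.0)

odds, p_value = fisher_exact_2x2([[3, 1], [1, 3]])
print(odds, round(p_value, 4))
```

With this definition, an odds ratio of 0 means one of the diagonal cells of the table is empty, for example when no sample has both the event and a mutation in the gene.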


## Correct P-values

In [7]:
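# Correct for multiple tests (Benjamini-Hochberg FDR), separately within each cancer type
corrected_frames = []
for cancer in results.cancer.unique():
    # Copy so the new column is not written to a slice of `results`
    cancer_results = results[results.cancer == cancer].copy()
    # Tests that failed above have NaN p-values; they are not filtered out here
    corrected = statsmodels.stats.multitest.multipletests(cancer_results.pvalue, method='fdr_bh')
    # multipletests returns (reject, corrected p-values, ...); keep the corrected p-values
    cancer_results['pvalues'] = corrected[1]
    corrected_frames.append(cancer_results)
corrected_data = pd.concat(corrected_frames)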




In [8]:
# Keep gene/cancer pairs whose FDR-corrected p-value (`pvalues`) is below 0.1
corrected_data[corrected_data.pvalues < 0.1]

| | gene | cancer | pvalue | odds | pvalues |
|---|---|---|---|---|---|
| 0 | TP53 | brca | 7.6e-05 | 13.783784 | 0.000985 |
| 39 | TP53 | endo | 1.3e-05 | 44.923077 | 0.000209 |
| 96 | KMT2C | lscc | 0.001751 | 9.428571 | 0.026259 |
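
The `pvalues` column holds the p-values adjusted by `multipletests` with `method='fdr_bh'`, while `pvalue` is the raw Fisher result. As a rough guide to how the two relate, the following pure-Python sketch applies the Benjamini-Hochberg adjustment to a small list of made-up p-values. The function name `benjamini_hochberg` is hypothetical, this is not the statsmodels implementation, and it assumes there are no missing values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order (sketch)."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for illustration
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 4) for p in adjusted])
```

Under this scheme the smallest of m raw p-values is multiplied by at most m, so the adjustment is harsher for cancer types where more genes were tested.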
