# Define Event

Before we can analyze the event, we need to identify its boundaries. We will accomplish this by:

1. Defining the values to be classified as gains and losses
2. Creating a counts table defining the various events
3. Defining the proportion of patients with a gain or loss to be considered significant
4. Identifying regions of gain and loss
5. Identifying regions where all cancer types meet the criteria for a gain or loss event

## Setup

In [1]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import cptac
import numpy as np
import cnvutils



## Part 1: Define Parameters

These are the parameters that must be set for each analysis.

In [None]:
# Any value lower than this will be considered a loss
COPY_NUMBER_LOSS = -0.2
# Any value above this will be considered a gain
COPY_NUMBER_GAIN = 0.2
# The proportion of patients with a gain or loss of a given gene for the event to be considered significant
PATIENT_CUTOFF = 0.2
# The chromosome to be analyzed (should be a string)
CHROMOSOME = '8'
# The arm to be analyzed. Options are: 'p', 'q' or 'both'.
ARM = 'both'
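
To make the two copy number thresholds concrete, here is a minimal sketch of how they split values into losses, gains, and everything in between. The helper `classify_copy_number` and the sample values are hypothetical and only for illustration; the notebook itself does not define them. Note that, as written in the parameter comments, values exactly equal to a threshold count as neither a gain nor a loss.

```python
import numpy as np

# Thresholds mirror the parameters above; the sample values are made up.
COPY_NUMBER_LOSS = -0.2
COPY_NUMBER_GAIN = 0.2

def classify_copy_number(values, loss=COPY_NUMBER_LOSS, gain=COPY_NUMBER_GAIN):
    """Label each value as 'loss', 'gain', or 'neutral' (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return np.select(
        [values < loss, values > gain],  # strict comparisons, as in the comments above
        ["loss", "gain"],
        default="neutral",
    )

example_values = [-0.5, -0.2, 0.0, 0.2, 0.35]
labels = classify_copy_number(example_values)
print(list(labels))
```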

In [3]:
# Load all the cancer types to be considered. All are listed for convenience;
# comment out the cancer types you do not want to consider.
cnv = {
    'BRCA': cptac.Brca().get_CNV(),
    'CCRCC': cptac.Ccrcc().get_CNV(),
#     'COLON': cptac.Colon().get_CNV(),
#     'ENDO': cptac.Endometrial().get_CNV(),
#     'GBM': cptac.Gbm().get_CNV(),
#     'HNSCC': cptac.Hnscc().get_CNV(),
#     'LSCC': cptac.Lscc().get_CNV(),
#     'LUAD': cptac.Luad().get_CNV(),
#     'OVARIAN': cptac.Ovarian().get_CNV()
}

## Part 2: Create Counts Tables

In [7]:
def get_gain_counts(row):
    """Count the values in a row that exceed the gain threshold."""
    return np.sum(row > COPY_NUMBER_GAIN)

In [10]:
def get_loss_counts(row):
    """Count the values in a row that fall below the loss threshold."""
    return np.sum(row < COPY_NUMBER_LOSS)
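
As a quick check of the two counting functions, the sketch below applies them to a single made-up row (one gene measured in five hypothetical patients). The patient labels and values are assumptions for illustration only. Missing values compare as false against both thresholds, so they are counted as neither gains nor losses.

```python
import numpy as np
import pandas as pd

COPY_NUMBER_LOSS = -0.2
COPY_NUMBER_GAIN = 0.2

def get_gain_counts(row):
    return np.sum(row > COPY_NUMBER_GAIN)

def get_loss_counts(row):
    return np.sum(row < COPY_NUMBER_LOSS)

# One gene measured in five made-up patients; P5 has no measurement.
row = pd.Series([0.45, -0.31, 0.05, 0.22, np.nan],
                index=["P1", "P2", "P3", "P4", "P5"])

gain_count = get_gain_counts(row)  # P1 and P4 exceed the gain threshold
loss_count = get_loss_counts(row)  # only P2 falls below the loss threshold
print(gain_count, loss_count)
```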

In [15]:
counts_list = []
for cancer_type, table in cnv.items():
    # Transpose so that each row is a gene and each column is a patient
    df = table.transpose()
    # Count the patients with a gain or loss for each gene
    gain = df.apply(get_gain_counts, axis=1)
    loss = df.apply(get_loss_counts, axis=1)
    df['gain'] = gain
    df['loss'] = loss
    df['cancer'] = cancer_type
    counts_list.append(df[['gain', 'loss', 'cancer']])

In [17]:
counts = pd.concat(counts_list)

In [22]:
counts.columns

Index(['gain', 'loss', 'cancer'], dtype='object', name='Patient_ID')
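
The counts only become comparable across cancer types once they are divided by the number of patients, which is where `PATIENT_CUTOFF` comes in. The following is a sketch on made-up data, not the notebook's own implementation: it builds the counts with vectorized comparisons instead of `apply`, then flags genes whose fraction of affected patients exceeds the cutoff. The table names, gene names, the `*_fraction` and `*_event` columns, the strict `>` comparison, and the choice to divide by all patients (including any with missing values) are all assumptions.

```python
import pandas as pd

COPY_NUMBER_LOSS = -0.2
COPY_NUMBER_GAIN = 0.2
PATIENT_CUTOFF = 0.2

# Made-up CNV tables: rows are patients, columns are genes.
toy_cnv = {
    "TYPE_A": pd.DataFrame(
        {"GENE1": [0.5, 0.3, 0.0, -0.1, 0.1], "GENE2": [-0.4, 0.0, 0.1, 0.0, 0.0]},
        index=["A1", "A2", "A3", "A4", "A5"],
    ),
    "TYPE_B": pd.DataFrame(
        {"GENE1": [0.0, 0.1, 0.0, 0.25], "GENE2": [-0.3, -0.5, 0.0, -0.25]},
        index=["B1", "B2", "B3", "B4"],
    ),
}

frames = []
for cancer_type, table in toy_cnv.items():
    genes_by_patients = table.transpose()
    n_patients = genes_by_patients.shape[1]
    summary = pd.DataFrame({
        "gain": (genes_by_patients > COPY_NUMBER_GAIN).sum(axis=1),
        "loss": (genes_by_patients < COPY_NUMBER_LOSS).sum(axis=1),
    })
    summary["cancer"] = cancer_type
    # Assumption: the denominator is every patient in the table.
    summary["gain_fraction"] = summary["gain"] / n_patients
    summary["loss_fraction"] = summary["loss"] / n_patients
    frames.append(summary)

toy_counts = pd.concat(frames)
# Assumption: a gene qualifies only when the fraction is strictly above the cutoff.
toy_counts["gain_event"] = toy_counts["gain_fraction"] > PATIENT_CUTOFF
toy_counts["loss_event"] = toy_counts["loss_fraction"] > PATIENT_CUTOFF
print(toy_counts)
```

Keeping the fractions as separate columns makes it easy to change the cutoff later without recomputing the counts.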

## Part 3: Filter to the Given Chromosome and Arm(s)

In [18]:
locations = cnvutils.get_gene_locations()

In [19]:
counts.join(locations)

TypeError: '<' not supported between instances of 'float' and 'str'
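
The notebook does not diagnose this error. One plausible cause, stated here as an assumption, is that the `Database_ID` index level mixes strings with missing values (which pandas stores as floats), so the join cannot order the keys. A possible workaround is to join on the gene name alone. The sketch below uses small made-up stand-ins for `counts` and `locations`, and it assumes both are indexed by `Name` and `Database_ID` and that one location per gene name is acceptable.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for `counts` and `locations`, both indexed by
# (Name, Database_ID); one Database_ID is missing (NaN).
toy_counts = pd.DataFrame(
    {"gain": [12, 3], "loss": [1, 20], "cancer": ["TYPE_A", "TYPE_A"]},
    index=pd.MultiIndex.from_tuples(
        [("LLGL1", "ENSG00000131899.6"), ("TK1", "ENSG00000167900.11")],
        names=["Name", "Database_ID"],
    ),
)
toy_locations = pd.DataFrame(
    {"chromosome": ["17", "17", "4"], "arm": ["p", "q", "q"]},
    index=pd.MultiIndex.from_tuples(
        [
            ("LLGL1", "ENSG00000131899.6"),
            ("TK1", "ENSG00000167900.11"),
            ("DUX4L1", np.nan),
        ],
        names=["Name", "Database_ID"],
    ),
)

# Join on the gene name only, so missing Database_ID values never take part
# in the key comparison.
left = toy_counts.reset_index()
right = toy_locations.reset_index().drop(columns="Database_ID")
# Assumption: keeping the first location per gene name is good enough here.
right = right.drop_duplicates(subset="Name")
joined = left.merge(right, on="Name", how="left")
print(joined)
```

A left merge keeps every row of the counts table, so genes without a known location show up with missing `chromosome` and `arm` values rather than disappearing.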

In [24]:
counts.index.get_level_values('Database_ID')

Index(['ENSG00000232512.2', 'ENSG00000249352.3', 'ENSG00000254144.2',
       'ENSG00000260682.2', 'ENSG00000271765.1', 'ENSG00000271818.1',
       'ENSG00000121410.7', 'ENSG00000148584.9', 'ENSG00000175899.10',
       'ENSG00000166535.15',
       ...
       'ENSG00000174442.11', 'ENSG00000122952.16', 'ENSG00000198205.6',
       'ENSG00000198455.4', 'ENSG00000070476.14', 'ENSG00000203995.9',
       'ENSG00000162378.12', 'ENSG00000159840.15', 'ENSG00000074755.14',
       'ENSG00000036549.12'],
      dtype='object', name='Database_ID', length=42977)

In [21]:
locations

| Name | Database_ID | chromosome | start_bp | end_bp | arm |
|---|---|---|---|---|---|
| SPTLC1P3 | | 6 | 63227485.0 | 63227743.0 | q |
| LLGL1 | ENSG00000131899.6 | 17 | 18225635.0 | 18244875.0 | p |
| DUX4L1 | | 4 | 190084412.0 | 190085686.0 | q |
| RBM18 | ENSG00000119446.9 | 9 | 122237622.0 | 122264840.0 | q |
| KRT18P53 | | X | 545236.0 | 545352.0 | p |
| ... | ... | ... | ... | ... | ... |
| PRPF38B | | 1 | 108692310.0 | 108702928.0 | p |
| OR7E29P | | 3 | 125712139.0 | 125713045.0 | q |
| TK1 | ENSG00000167900.11 | 17 | 78174091.0 | 78187233.0 | q |
| U52112.2 | | X | 153872221.0 | 153909223.0 | q |
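
With a locations table of this shape, restricting the analysis to `CHROMOSOME` and `ARM` could look like the sketch below. The helper `select_region`, the toy table, and the genes `GENE_A` and `GENE_B` with their coordinates are hypothetical; only the column names and the meaning of the `'p'`, `'q'`, and `'both'` options come from the notebook.

```python
import pandas as pd

CHROMOSOME = '8'
ARM = 'both'

# Made-up locations table using the same columns as the output above.
toy_locations = pd.DataFrame(
    {
        "chromosome": ["8", "8", "17", "X"],
        "start_bp": [1_000_000.0, 90_000_000.0, 18_225_635.0, 545_236.0],
        "end_bp": [1_050_000.0, 90_040_000.0, 18_244_875.0, 545_352.0],
        "arm": ["p", "q", "p", "p"],
    },
    index=pd.Index(["GENE_A", "GENE_B", "LLGL1", "KRT18P53"], name="Name"),
)

def select_region(locations, chromosome, arm):
    """Keep genes on `chromosome`, optionally limited to one arm (hypothetical helper)."""
    # Compare as strings so that '8' and 8 are treated the same way.
    on_chromosome = locations["chromosome"].astype(str) == str(chromosome)
    if arm == "both":
        return locations[on_chromosome]
    if arm not in ("p", "q"):
        raise ValueError("arm must be 'p', 'q', or 'both'")
    return locations[on_chromosome & (locations["arm"] == arm)]

selected = select_region(toy_locations, CHROMOSOME, ARM)
p_only = select_region(toy_locations, CHROMOSOME, "p")
print(selected)
```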
