# RNA cis-effects with t-tests for 8q

Here we look for genes within the arm-level event region that are affected by the event itself (cis effects). We find these effects by performing a series of t-tests comparing the transcriptomics values of patients with the event against those of patients without it.
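To make the idea concrete, here is a minimal sketch of one such comparison for a single gene. The expression values are invented for illustration, and the call is a plain two-sample `scipy.stats.ttest_ind`; the exact test settings used later by `cptac.utils.wrap_ttest` are not shown in this notebook, so treat this as an approximation of the approach rather than a reproduction of it.

```python
from scipy import stats

# Hypothetical expression values for one gene (not CPTAC data).
with_event = [2.1, 1.8, 2.5, 2.0, 2.2]      # patients with the event
without_event = [1.0, 1.2, 0.9, 1.1, 0.8]   # patients without the event

# Two-sample t-test comparing the two patient groups.
t_stat, p_value = stats.ttest_ind(with_event, without_event)

# A positive t statistic means expression is higher in the event group.
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
```

In the notebook itself, a test of this kind is repeated for every selected gene in every cancer type.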

## Setup (Import necessary packages)

We start by importing the necessary packages and collecting all of the transcriptomics data we need to run the tests. The cancer types analyzed here should have been chosen in `01_event_basic_info`, which identifies the cancer types that appear to have the event we are looking at.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import cptac
from scipy import stats
import cnvutils
import cptac.utils

In [2]:
# These variables specify which chromosome and arm we're working on, and whether to do cis or trans
CHROMOSOME = '8'
ARM = 'q'
CIS_OR_TRANS = "cis"

In [3]:
if ARM == "p":
    EVENT_COLUMN = "loss_event"
    EXCLUDE_COLUMN = "gain_event"
    EVENT_START = 0
    EVENT_END = 30794385
    
elif ARM == "q":
    EVENT_COLUMN = "gain_event"
    EXCLUDE_COLUMN = "loss_event"
    EVENT_START = 80794385
    EVENT_END = 130794385

else:
    raise ValueError("Invalid value for ARM variable.")

In [4]:
cancer_types = {
    "brca": cptac.Brca,
    "colon": cptac.Colon,
    "hnscc": cptac.Hnscc,
    "lscc": cptac.Lscc,
    "luad": cptac.Luad,
    "ovarian": cptac.Ovarian
}

## Select the RNAs we're interested in

If we're looking at cis effects, we select RNAs within the event. If we're looking at trans effects, we select RNAs outside of the event.
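The selection logic can be sketched on a toy table. Everything below is hypothetical: the gene names, the column names (`chromosome`, `start_bp`, `end_bp`), and the rule that a gene must lie entirely inside the interval to count as "in the event" are assumptions made for illustration, and `cnvutils.get_event_genes` may define membership differently.

```python
import pandas as pd

# Hypothetical gene-location table; names and coordinates are illustrative only.
gene_locations = pd.DataFrame({
    "Name": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "chromosome": ["8", "8", "8", "1"],
    "start_bp": [90_000_000, 120_000_000, 10_000_000, 95_000_000],
    "end_bp": [90_050_000, 120_020_000, 10_030_000, 95_010_000],
})

def select_genes(locations, chrm, event_start, event_end, cis_or_trans):
    """Return gene names inside (cis) or outside (trans) the event region."""
    # Assumption: a gene is "in" the event if it lies entirely within the interval.
    in_event = (
        (locations["chromosome"] == chrm)
        & (locations["start_bp"] >= event_start)
        & (locations["end_bp"] <= event_end)
    )
    if cis_or_trans == "cis":
        return locations.loc[in_event, "Name"]
    if cis_or_trans == "trans":
        return locations.loc[~in_event, "Name"]
    raise ValueError("cis_or_trans must be 'cis' or 'trans'")

cis_genes = select_genes(gene_locations, "8", 80794385, 130794385, "cis")
trans_genes = select_genes(gene_locations, "8", 80794385, 130794385, "trans")
print(list(cis_genes))    # genes on chromosome 8 inside the interval
print(list(trans_genes))  # everything else
```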

In [5]:
selected_genes = (
    cnvutils.get_event_genes(
        chrm=CHROMOSOME,
        event_start=EVENT_START,
        event_end=EVENT_END,
        cis_or_trans=CIS_OR_TRANS,
    )["Name"]
    .drop_duplicates(keep="first")
)

def load_transcriptomics_tumor(dataset_func):
    return dataset_func().get_transcriptomics(tissue_type="tumor")

transcriptomics = {}

for cancer_type in cancer_types.keys():
    df = load_transcriptomics_tumor(cancer_types[cancer_type]).transpose()

    if df.index.nlevels == 1:
        df = df[df.index.isin(selected_genes)]
    else:
        df = df[df.index.isin(selected_genes, level="Name")]

    transcriptomics[cancer_type] = df


## Append Event Data

We now append the data from the event table that should have been created in a previous notebook.

In [6]:
has_event = dict()
for cancer_type in transcriptomics.keys():
    df = transcriptomics[cancer_type]
    df = df.transpose()
    event = pd.read_csv(f'{cancer_type}_has_event.tsv', sep='\t', index_col=0)
    if EXCLUDE_COLUMN:
        event.drop(EXCLUDE_COLUMN, axis=1, inplace=True)
    # df.join aligns on the index values, so the event index does not need to be renamed
    df = df.join(event)
    df = df.dropna(subset=[EVENT_COLUMN])
    has_event[cancer_type] = df[EVENT_COLUMN]
    transcriptomics[cancer_type] = df

## Run T-Tests

In [7]:
results_df = None
for cancer_type in transcriptomics.keys():
    rna_df = transcriptomics[cancer_type]
    results = cptac.utils.wrap_ttest(
        df=rna_df,
        label_column=EVENT_COLUMN,
        correction_method="fdr_bh",
        return_all=True,
        quiet=True
    )   
    results.set_index('Comparison', inplace=True)
    if isinstance(results.index[0], tuple):
        results[['Name', f'{cancer_type}_Database_ID']] = pd.DataFrame(
            results.index.values.tolist(),
            index=results.index
        )
        results.set_index(['Name', f'{cancer_type}_Database_ID'], inplace=True)
    else:
        results.index.name='Name'
    results.rename(columns={'P_Value': f'{cancer_type}_pvalue'}, inplace=True)
    if results_df is None:
        results_df = results
    else:
        results_df = results_df.join(results)
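
Because one test is run per gene, the p-values need a multiple-testing correction. The sketch below assumes that `correction_method="fdr_bh"` refers to the Benjamini-Hochberg false discovery rate procedure, as the name suggests. The helper `benjamini_hochberg` is a hypothetical stand-alone implementation written for illustration, not the routine `wrap_ttest` calls internally, and the p-values are made up.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank (rank starts at 1).
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted_sorted = np.clip(adjusted_sorted, 0.0, 1.0)
    # Put the adjusted values back in the input order.
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted

raw_p = [0.01, 0.04, 0.03, 0.20]  # illustrative p-values only
adjusted_p = benjamini_hochberg(raw_p)
print(adjusted_p)
```

Adjusted values are never smaller than the raw ones, so a gene that passes a threshold after correction also passed it before.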

## Append Difference Data

We want to collect one more piece of information: the difference between the mean expression of the two groups (with the event minus without). The sign of this difference tells us whether the event is associated with higher or lower transcript levels, and its magnitude gives some indication of the strength of the effect.
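
As a small worked example of this calculation, the sketch below uses six invented samples for a single gene. The sample IDs and values are hypothetical and only illustrate how the sign of the result should be read.

```python
import pandas as pd

# Toy expression values for one gene across six samples (illustrative only).
sample_ids = ["S1", "S2", "S3", "S4", "S5", "S6"]
expression = pd.Series([2.0, 2.4, 2.2, 1.0, 1.2, 0.8], index=sample_ids)

# True where the sample carries the event.
event_mask = pd.Series([True, True, True, False, False, False], index=sample_ids)

event_mean = expression[event_mask].mean()
no_event_mean = expression[~event_mask].mean()

# Positive: expression is higher in samples with the event.
diff = event_mean - no_event_mean
print(round(diff, 3))
```

A negative result would instead indicate lower expression in samples with the event.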

In [8]:
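def get_diff(col, event):
    """Return the mean of `col` for samples with the event minus the mean for samples without it."""
    with_event = col[event]
    # Invert the boolean event labels to select samples without the event
    invert_list = [not x for x in event]
    without_event = col[invert_list]
    return with_event.mean() - without_event.mean()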

In [9]:
diff_df = None
for cancer_type in transcriptomics.keys():
    df = transcriptomics[cancer_type]
    df = df.drop(EVENT_COLUMN, axis=1)
    results = df.apply(lambda x: get_diff(x, has_event[cancer_type]))
    df = pd.DataFrame(results)
    if isinstance(df.index[0], tuple):
        df[['Name', f'{cancer_type}_Database_ID']] = pd.DataFrame(df.index.values.tolist(), index=df.index)
        df.set_index(['Name', f'{cancer_type}_Database_ID'], inplace=True)
    else:
        df.index.name='Name'
    df.rename(columns={0: f'{cancer_type}_diff'}, inplace=True)
    if diff_df is None:
        diff_df = df
    else:
        diff_df = diff_df.join(df)

## Join the tables and save

We now join the difference table and the results table together. We also save the combined table to a TSV file for use in future analyses.

In [10]:
results_df = results_df.join(diff_df)

In [11]:
results_df

| Name | brca_pvalue | colon_pvalue | hnscc_pvalue | lscc_pvalue | luad_pvalue | ovarian_pvalue | brca_diff | colon_diff | hnscc_diff | lscc_diff | luad_diff | ovarian_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MIR7705 | 7.020736e-11 | NaN | NaN | NaN | NaN | NaN | 1.215883 | NaN | 0.000000 | NaN | NaN | NaN |
| PABPC1 | 7.020736e-11 | 1.972736e-09 | 1.460832e-02 | 8.253546e-04 | 6.230831e-05 | NaN | 1.215883 | 0.701779 | 0.290728 | 0.463542 | 0.661256 | NaN |
| RPL30 | 1.416907e-08 | 4.067977e-06 | 4.628068e-07 | 3.417095e-03 | 6.348469e-04 | 0.043776 | 0.945084 | 0.576439 | 0.405452 | 0.331225 | 0.428509 | 943.968018 |
| YWHAZ | 1.825122e-08 | 2.512327e-05 | 3.463188e-04 | 9.965026e-06 | 1.193697e-03 | 0.013890 | 1.053275 | 0.367394 | 0.436486 | 0.481283 | 0.400134 | 91.237246 |
| DCAF13 | 2.554904e-08 | 1.286945e-06 | 7.027431e-08 | 3.393859e-07 | 5.708224e-05 | 0.028446 | 1.239230 | 0.593220 | 0.540716 | 0.521040 | 0.564146 | 8.818968 |
| PTDSS1 | 1.006822e-07 | 1.476374e-09 | 6.140054e-06 | 5.786716e-03 | 1.247156e-05 | 0.000051 | 1.013563 | 0.532508 | 0.462051 | 0.427965 | 0.444150 | 19.412090 |
| UBR5 | 1.884289e-07 | 5.007370e-06 | 1.894506e-07 | 9.820648e-09 | 1.193697e-03 | 0.061988 | 0.927482 | 0.559859 | 0.373194 | 0.485821 | 0.345186 | 5.678937 |
| EMC2 | 6.327708e-07 | 2.224229e-06 | 2.420456e-07 | 8.154749e-06 | 6.601363e-05 | 0.007477 | 0.794761 | 0.475144 | 0.428597 | 0.395411 | 0.413738 | 20.622266 |
| LAPTM4B | 6.327708e-07 | 3.645197e-05 | 3.685183e-04 | 1.453619e-05 | 7.010932e-02 | 0.001166 | 1.734389 | 0.976726 | 0.494540 | 0.695947 | 0.479058 | 264.133367 |
| SLC25A32 | 6.327708e-07 | 8.036730e-06 | 2.531341e-06 | 8.991207e-04 | 6.794965e-05 | 0.006284 | 0.875555 | 0.438432 | 0.367432 | 0.265882 | 0.395868 | 2.923498 |


In [12]:
# This will save the resulting table in the same directory as this notebook.
# Modify if you would like to save to a different location.
results_df.to_csv(f"{CHROMOSOME}{ARM}_{CIS_OR_TRANS}RNAeffects_ttest.tsv", sep='\t')