# Diabetes: Correlation with protein abundance of IRS1 and PIK3CA 

### The purpose of this analysis is to find out whether diabetes status is associated with the protein abundance of IRS1 and PIK3CA.

### IRS1 and PIK3CA are part of a pathway that helps maintain insulin levels. Studies have shown that a lack of IRS1 tends to lead to diabetes (https://www.sciencedirect.com/science/article/pii/S0968000404002932).

<b> Standard imports for playing with and plotting data frames. </b>

In [76]:
import pandas as pd
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

<b> Import CPTAC data </b>

In [77]:
import CPTAC

In [78]:
somatic_mutations = CPTAC.get_somatic()
proteomics = CPTAC.get_proteomics()
#proteomics = proteomics[:100]
phos = CPTAC.get_phosphoproteomics()
clinical = CPTAC.get_clinical()

## Create a data frame of IRS1 protein abundance and diabetes status. We can do that by merging columns from the clinical and proteomics dataframes.

In [79]:
diabetic_data = CPTAC.compare_clinical(clinical, proteomics, "Diabetes")
diabetic_data = diabetic_data[['Diabetes', 'IRS1']]
diabetic_data = diabetic_data[:100]
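As a rough illustration of the merge step, the sketch below joins a clinical column to a protein column using plain pandas. It is not the implementation of `CPTAC.compare_clinical`; it assumes both tables share a sample-ID index, and the names `clinical_demo`, `proteomics_demo`, and `merged_demo`, along with every value in them, are invented for the example.

```python
import pandas as pd

# Toy stand-ins for the clinical and proteomics tables (all values invented).
clinical_demo = pd.DataFrame(
    {"Diabetes": ["Yes", "No", "No"]},
    index=["S001", "S002", "S003"],
)
proteomics_demo = pd.DataFrame(
    {"IRS1": [0.12, -0.40, 0.05], "PIK3CA": [0.30, 0.10, -0.20]},
    index=["S001", "S002", "S003"],
)

# Join on the shared sample index, then keep only the two columns of interest.
merged_demo = clinical_demo[["Diabetes"]].join(proteomics_demo, how="inner")
merged_demo = merged_demo[["Diabetes", "IRS1"]]
print(merged_demo)
```

The inner join keeps only samples present in both tables, so each remaining row has a diabetes label and an abundance column.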

## Divide data into two groups: diabetic and non-diabetic patients

In [80]:
diabetic = diabetic_data.loc[diabetic_data["Diabetes"] == "Yes"]
non_diabetic = diabetic_data.loc[diabetic_data["Diabetes"] == "No"]

## Perform a t test to compare IRS1 abundance levels 

In [81]:
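pcutoff = 0.05
protein = 'IRS1'
# Reuse the list of significant proteins if an earlier cell created it; otherwise start a new one
sigResults = globals().get('sigResults', [])
# Two-sample t test: IRS1 abundance in diabetic vs. non-diabetic patients
ttest = scipy.stats.ttest_ind(diabetic[protein], non_diabetic[protein])
if ttest[1] <= pcutoff:
    sigResults.append(protein)
    print("Test for " + protein + ": ")
    print(ttest)
else:
    print("Not significant: ")
    print(ttest)

Not significant: 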
Ttest_indResult(statistic=-0.26829426355707425, pvalue=0.7890484901861422)
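Before reading much into a p-value like this, it can help to check how many samples fall in each group and whether missing abundance values are present. The sketch below does this on invented toy data and also runs Welch's t test (`equal_var=False`), which does not assume equal variances in the two groups. This is an optional variant, not the test used above, and the `demo` table is hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Invented toy data with one missing abundance value.
demo = pd.DataFrame({
    "Diabetes": ["Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "IRS1": [0.10, -0.20, np.nan, 0.30, 0.05, -0.15, 0.20],
})

# Drop missing values explicitly so they cannot turn the statistic into NaN.
yes_vals = demo.loc[demo["Diabetes"] == "Yes", "IRS1"].dropna()
no_vals = demo.loc[demo["Diabetes"] == "No", "IRS1"].dropna()
print("n diabetic:", len(yes_vals), "n non-diabetic:", len(no_vals))

# Welch's t test: like ttest_ind above, but without the equal-variance assumption.
welch = scipy.stats.ttest_ind(yes_vals, no_vals, equal_var=False)
print(welch)
```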


# PIK3CA protein abundance correlation with diabetes

## Create a data frame of PIK3CA protein abundance and diabetes status.

In [82]:
diabetic_data = CPTAC.compare_clinical(clinical, proteomics, "Diabetes")
diabetic_data = diabetic_data[['Diabetes', 'PIK3CA']]
diabetic_data = diabetic_data[:100]

## Divide data into two groups: diabetic and non-diabetic patients

In [83]:
diabetic = diabetic_data.loc[diabetic_data["Diabetes"] == "Yes"]
non_diabetic = diabetic_data.loc[diabetic_data["Diabetes"] == "No"]

## Perform a t test to compare PIK3CA abundance levels

In [84]:
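pcutoff = 0.05
protein = 'PIK3CA'
# Reuse the list of significant proteins if an earlier cell created it; otherwise start a new one
sigResults = globals().get('sigResults', [])
# Two-sample t test: PIK3CA abundance in diabetic vs. non-diabetic patients
ttest = scipy.stats.ttest_ind(diabetic[protein], non_diabetic[protein])
if ttest[1] <= pcutoff:
    sigResults.append(protein)
    print("Test for " + protein + ": ")
    print(ttest)
else:
    print("Not significant: ")
    print(ttest)

Not significant: 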
Ttest_indResult(statistic=-1.2806933282044277, pvalue=0.2033868977823787)
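Since the IRS1 and PIK3CA steps are identical apart from the protein name, the comparison could be wrapped in a small helper. The sketch below is one possible way to do that; `compare_abundance` is a hypothetical name, and the `demo` table holds invented values chosen so that one protein clearly differs between groups and the other does not.

```python
import pandas as pd
import scipy.stats

def compare_abundance(data, protein, group_col="Diabetes", pcutoff=0.05):
    """Return (statistic, p-value, significant?) for a Yes-vs-No t test on one protein."""
    yes_vals = data.loc[data[group_col] == "Yes", protein].dropna()
    no_vals = data.loc[data[group_col] == "No", protein].dropna()
    result = scipy.stats.ttest_ind(yes_vals, no_vals)
    return result.statistic, result.pvalue, bool(result.pvalue <= pcutoff)

# Invented toy data: IRS1 has equal group means, PIK3CA is strongly shifted.
demo = pd.DataFrame({
    "Diabetes": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "IRS1": [0.1, -0.1, 0.0, 0.0, 0.1, -0.1],
    "PIK3CA": [5.0, 5.1, 4.9, 0.0, 0.1, -0.1],
})

# Run the same test for each protein and collect the results by name.
summary = {p: compare_abundance(demo, p) for p in ["IRS1", "PIK3CA"]}
for name, (stat, pval, significant) in summary.items():
    print(name, "statistic:", stat, "p-value:", pval, "significant:", significant)
```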


## Conclusion: Among the first 100 samples, the data do not show a significant difference in protein abundance between diabetic and non-diabetic patients for either IRS1 (p ≈ 0.79) or PIK3CA (p ≈ 0.20).