# MUC Concurrent Mutations and Clinical Outcomes

### Standard imports, including CPTAC package

In [1]:
import pandas as pd
import numpy as np
import re
import scipy.stats
import statsmodels.stats.multitest
import matplotlib.pyplot as plt
import seaborn as sns
import CPTAC.Ovarian as CPTAC

Welcome to the CPTAC data service package. Available datasets may be
viewed using CPTAC.list(). In order to access a specific data set,
import a CPTAC subfolder using either 'import CPTAC.Dataset' or 'from
CPTAC import Dataset'.
******
Version: 0.2.5
******
Loading Ovarian CPTAC data:
Loading somatic_19 data...
Loading clinical data...
Loading cnv data...
Loading proteomics data...
Loading transcriptomics data...
Loading somatic_38 data...
Loading phosphoproteomics data...

 ******PLEASE READ******
CPTAC is a community resource project and data are made available
rapidly after generation for community research use. The embargo
allows exploring and utilizing the data, but the data may not be in a
publication until June 1, 2019. Please see
https://proteomics.cancer.gov/data-portal/about/data-use-agreement or
enter embargo() to open the webpage for more details.


## Set up initial dataframes and variables

In [118]:
clinical = CPTAC.get_clinical()
pd.options.display.max_columns = None
display(clinical)

PPID,Short Title,Event,Modified Time,Modified By,Status,CRF Name,Date of Last Contact (Do not answer if patient is deceased),Vital Status (at time of last contact),Date of Death,Tumor Status at Time of Last Contact or Death,Was a Review of the Initial Pathological Findings Done?,Was the Pathology Review consistent with the Diagnosis?,Adjuvant (Post-Operative) Radiation Therapy,Adjuvant (Post-Operative) Pharmaceutical Therapy,Adjuvant (Post-Operative) Immunotherapy,Adjuvant (Post-Operative) Hormone Therapy,Adjuvant (Post-Operative) Targeted Molecular Therapy,Measure of Success of Outcome at the Completion of Initial First Course Treatment (surgery and adjuvant therapies),New Tumor Event After Initial Treatment?,Type of New Tumor Event,Anatomic Site of New Tumor Event,Other Site of New Tumor Event or Lymph Node Location,Date of New Tumor Event,Method Of Diagnosis of New Tumor Event,Other Method Of Diagnosis of New Tumor Event,Additional Surgery for New Tumor Event,Date of Additional Surgery for New Tumor Event,Additional Chemotherapy Treatment of New Tumor Event,Additional Immunotherapy Treatment of New Tumor Event,Additional Hormone Therapy Treatment of New Tumor Event,Additional Targeted Molecular Therapy Treatment of New Tumor Event,Radiation Type,Location of Radiation Treatment,Number of Days from Date of Initial Pathologic Diagnosis to the Date Radiation Therapy Started,Total Dose,Units,Total Number of Fractions,Radiation Treatment Ongoing,Number of Days from Date of Initial Pathologic Diagnosis to the Date Radiation Therapy Ended,Measure of Best Response of Radiation Treatment,Was Patient Treated on a Clinical Trial?,Drug Name (Brand or Generic),Clinical Trial Drug Classification,Pharmaceutical Type,Number of Days from Date of Initial Pathologic Diagnosis to Date of Therapy Start,Therapy Ongoing,Number of Days from Date of Initial Pathologic Diagnosis to Date of Therapy End,Measure of Best Response of Pharmaceutical Treatment,What Type of Malignancy was This?,Primary Site of Disease,Laterality of the Disease,Histological Type,Number of Days from Date of Initial Pathologic Diagnosis of the Tumor Submitted for CPTAC to the Date of Initial Diagnosis of Other Malignancy,Did the patient have surgery for this malignancy?,Type of Surgery,Number of Days from Date of Initial Pathologic Diagnosis of the Tumor Submitted for CPTAC to the Date of Surgical Resection for this Other Malignancy,Did the patient receive pharmaceutical therapy for this malignancy?,Extent of Pharmaceutical Therapy,Drug Name(s) (Brand or Generic),Number of Days from Date of Initial Pathologic Diagnosis of the Tumor Submitted for CPTAC to Date Pharmaceutical Therapy Started for this Other Malignancy,Did the patient receive radiation therapy for this malignancy?,Extent of Radiation Therapy,"If the patient received locoregional radiation, was the radiation therapy received in the same field as the tumor submitted for CPTAC?",Number of Days from Date of Initial Pathologic Diagnosis of the Tumor Submitted to CPTAC to Date Radiation Therapy Started for this Other Malignancy,Was the patient staged using FIGO?,FIGO Staging System (Gynecologic Tumors Only),FIGO Stage,Was the patient staged using AJCC?,AJCC Cancer Staging Edition,Pathologic Spread: Primary Tumor (pT),Pathologic Spread: Lymph Nodes (pN),Distant Metastases (M),AJCC Tumor Stage,patient_key
01OV002,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,3/18/2015,Living,,Tumor free,Yes,Yes,No,Yes,No,No,No,Complete Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S1
01OV007,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,3/10/2015,Living,,Tumor free,Yes,Yes,No,Yes,No,No,No,Complete Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S2
01OV008,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,3/10/2015,Living,,Tumor free,Yes,Yes,No,Yes,No,No,No,Complete Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S3
01OV010,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,4/15/2014,Living,,Unknown tumor status,Yes,Yes,No,No,No,No,No,Unknown,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S4
01OV013,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,5/26/2015,Living,,Tumor free,Yes,Yes,No,Yes,No,No,No,Complete Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S5
01OV017,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,5/22/2015,Living,,Tumor free,Yes,Yes,No,Yes,No,No,No,Complete Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S6
01OV018,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,5/21/2015,Living,,With tumor,Yes,Yes,No,Yes,No,No,No,Complete Response,Yes,Metastatic,Other (specify);,"vaginal cuff, right hepatorenal recess, liver,...",3/17/2015,Convincing imaging,,No,,Yes,No,No,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S7
01OV019,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,5/5/2015,Living,,Tumor free,Yes,Yes,No,Yes,No,No,No,Complete Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S8
01OV023,CPTAC-WU-01OV,Collection,4/22/2016 0:00,"Khandekar, Divya",Locked,One Year,8/25/2015,Living,,Unknown tumor status,Yes,Yes,No,Yes,No,No,No,Partial Response,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S9
01OV024,CPTAC-WU-01OV,Collection,7/13/2015 0:00,"Guo, Lei",Active,One Year,7/18/2014,Living,,Unknown tumor status,Yes,Yes,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,S10


In [110]:
gene = 'MUC4'
# Clinical column used as the outcome in every test below
clinical_outcome = 'Measure of Success of Outcome at the Completion of Initial First Course Treatment (surgery and adjuvant therapies)'
somatic = CPTAC.get_somatic_mutations()
proteomics = CPTAC.get_proteomics()
phos = CPTAC.get_phosphoproteomics()
transcriptomics = CPTAC.get_transcriptomics()
# One-column dataframe holding the outcome for each patient
outcome = pd.DataFrame(clinical[clinical_outcome])

## MUC4 Mutations Only

In [111]:
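# Label each tumor by MUC4 status: 'Single' = MUC4 mutated, 'Wildtype' = no MUC4 mutation
muc4_mut_prot = CPTAC.compare_mutations(proteomics, gene)
muc4_mut_prot['Concurrent_Mutations'] = 'Single'
muc4_mut_prot.loc[muc4_mut_prot['Mutation'] == 'Wildtype_Tumor', 'Concurrent_Mutations'] = 'Wildtype'
# Drop normal samples so that only tumors enter the test
muc4_mut_prot = muc4_mut_prot.loc[muc4_mut_prot['Mutation'] != 'Wildtype_Normal']
# Align outcome and mutation label on the shared patient index; drop patients missing either
combined = pd.concat([outcome, muc4_mut_prot], axis=1, sort=False)
combined = combined[[clinical_outcome, 'Concurrent_Mutations']].dropna()
clinical_table = pd.crosstab(combined[clinical_outcome], combined['Concurrent_Mutations'])
print(clinical_table)
# Chi-square test of independence between clinical outcome and mutation status
chi_test = scipy.stats.chi2_contingency(observed=clinical_table)
chi_test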

Concurrent_Mutations                                Single  Wildtype
Measure of Success of Outcome at the Completion...                  
Complete Response                                       17        29
Not Reported/ Unknown                                    2         4
Partial Response                                         2         4
Progressive Disease                                      4         3
Stable Disease                                           7         6
Unknown                                                  1         0


(3.6429553957909353, 0.6018754874151304, 5, array([[19.21518987, 26.78481013],
        [ 2.50632911,  3.49367089],
        [ 2.50632911,  3.49367089],
        [ 2.92405063,  4.07594937],
        [ 5.43037975,  7.56962025],
        [ 0.41772152,  0.58227848]]))
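
The tuple printed above holds, in order, the chi-square statistic, the p-value, the degrees of freedom, and the expected counts under independence. A minimal sketch that rebuilds the MUC4-only table from the printed counts and unpacks the result by name (the variable names are illustrative and do not appear in the notebook):

```python
import pandas as pd
import scipy.stats

# Counts copied from the printed MUC4-only contingency table
observed = pd.DataFrame(
    {"Single": [17, 2, 2, 4, 7, 1], "Wildtype": [29, 4, 4, 3, 6, 0]},
    index=[
        "Complete Response",
        "Not Reported/ Unknown",
        "Partial Response",
        "Progressive Disease",
        "Stable Disease",
        "Unknown",
    ],
)

# chi2_contingency returns (statistic, p-value, degrees of freedom, expected counts)
chi2, p_value, dof, expected = scipy.stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
```

Read this way, the p-value of about 0.60 gives no evidence that the outcome distribution differs between MUC4-mutated and MUC4-wildtype tumors.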

## Combination of mutations in MUC4 and MUC17

### Find patients that have both mutations

In [112]:
second_gene = 'MUC17'
# Mutation status of each gene; the CDH2 proteomics column is carried along but not used in the test
muc4_mut = CPTAC.compare_mutations(proteomics, 'CDH2', gene)
second_mut = CPTAC.compare_mutations(proteomics, 'CDH2', second_gene)
new_mutation = pd.DataFrame()
new_mutation[gene + ' Mutation'] = muc4_mut['Mutation']
new_mutation[second_gene + ' Mutation'] = second_mut['Mutation']
new_mutation['CDH2'] = muc4_mut['CDH2']
# Drop normal samples, then keep tumors that are non-wildtype for both genes
new_mutation = new_mutation.loc[new_mutation[gene + ' Mutation'] != 'Wildtype_Normal']
both_mutated = (new_mutation[gene + ' Mutation'] != 'Wildtype_Tumor') & (new_mutation[second_gene + ' Mutation'] != 'Wildtype_Tumor')
new_mutation = new_mutation.loc[both_mutated]
patients_both_mut_17 = new_mutation.index
patients_both_mut_17

Index(['02OV005', '02OV006', '04OV004', '04OV036', '17OV002', '17OV010',
       '17OV029', '17OV030', '17OV039'],
      dtype='object', name='patient_id')
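
The selection above keeps tumors that carry a mutation in both genes. A self-contained sketch of the same idea on toy data, using a single boolean mask; the patient IDs and the mutation labels other than `Wildtype_Tumor` and `Wildtype_Normal` are hypothetical, not CPTAC values:

```python
import pandas as pd

# Toy mutation calls (hypothetical patients, not CPTAC data)
calls = pd.DataFrame(
    {
        "MUC4 Mutation": ["Missense_Mutation", "Wildtype_Tumor", "Frame_Shift_Del", "Wildtype_Normal"],
        "MUC17 Mutation": ["Missense_Mutation", "Missense_Mutation", "Wildtype_Tumor", "Wildtype_Normal"],
    },
    index=pd.Index(["P1", "P2", "P3", "P4"], name="patient_id"),
)

# A cell counts as mutated when it is neither wildtype label
wildtype_labels = ["Wildtype_Tumor", "Wildtype_Normal"]
mutated = ~calls.isin(wildtype_labels)

# Keep patients mutated in every listed gene
patients_both = calls.index[mutated.all(axis=1)]
print(list(patients_both))
```

Because `mutated.all(axis=1)` works over however many gene columns the frame holds, the same mask extends to three genes without further chaining.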

In [113]:
muc4_mut_prot = CPTAC.compare_mutations(proteomics, gene)
muc4_mut_prot['Concurrent_Mutations'] = 'Single'
muc4_mut_prot.loc[patients_both_mut_17, 'Concurrent_Mutations'] = 'Both'
muc4_mut_prot.loc[muc4_mut_prot['Mutation'] == 'Wildtype_Tumor', 'Concurrent_Mutations'] = 'Wildtype'
muc4_mut_prot = muc4_mut_prot.loc[muc4_mut_prot['Mutation'] != 'Wildtype_Normal']
combined = pd.concat([outcome, muc4_mut_prot], axis=1, sort=False)
combined = combined[[clinical_outcome, 'Concurrent_Mutations']].dropna()
clinical_table = pd.crosstab(combined[clinical_outcome], combined['Concurrent_Mutations'])
print(clinical_table)
chi_test = scipy.stats.chi2_contingency(observed = clinical_table)
chi_test

Concurrent_Mutations                                Both  Single  Wildtype
Measure of Success of Outcome at the Completion...                        
Complete Response                                      4      13        29
Not Reported/ Unknown                                  0       2         4
Partial Response                                       1       1         4
Progressive Disease                                    2       2         3
Stable Disease                                         2       5         6
Unknown                                                0       1         0


(7.093671547594042,
 0.7165731875236268,
 10,
 array([[ 5.24050633, 13.97468354, 26.78481013],
        [ 0.6835443 ,  1.82278481,  3.49367089],
        [ 0.6835443 ,  1.82278481,  3.49367089],
        [ 0.79746835,  2.12658228,  4.07594937],
        [ 1.48101266,  3.94936709,  7.56962025],
        [ 0.11392405,  0.30379747,  0.58227848]]))
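
Many of the expected counts printed above are small. A common rule of thumb, which the notebook does not check, is that the chi-square approximation is more trustworthy when most expected counts are at least 5. A sketch that counts the cells below that threshold for the MUC4/MUC17 table, with the counts copied from the printed output (the threshold and variable names are assumptions made for illustration):

```python
import numpy as np
import scipy.stats

# Rows: Complete Response, Not Reported/ Unknown, Partial Response,
#       Progressive Disease, Stable Disease, Unknown
# Columns: Both, Single, Wildtype
observed = np.array(
    [
        [4, 13, 29],
        [0, 2, 4],
        [1, 1, 4],
        [2, 2, 3],
        [2, 5, 6],
        [0, 1, 0],
    ]
)

chi2, p_value, dof, expected = scipy.stats.chi2_contingency(observed)

# Count cells whose expected frequency falls below the rule-of-thumb threshold
n_small = int((expected < 5).sum())
print(f"{n_small} of {expected.size} expected counts are below 5")
```

When most cells are this sparse, the p-value is best read as a rough indication; merging rare outcome categories before testing is one possible way to reduce the sparsity.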

## Combination of mutations in MUC4 and MUC16

### Find patients that have both mutations

In [114]:
second_gene = 'MUC16'
muc4_mut = CPTAC.compare_mutations(proteomics, 'CDH2', gene)
muc16_mut = CPTAC.compare_mutations(proteomics, 'CDH2', second_gene)
new_mutation = pd.DataFrame()
new_mutation[gene + ' Mutation'] = muc4_mut['Mutation']
new_mutation[second_gene + ' Mutation'] = muc16_mut['Mutation']
new_mutation['CDH2'] = muc4_mut['CDH2']
new_mutation = new_mutation.loc[new_mutation[gene + ' Mutation'] != 'Wildtype_Normal']
both_mutated = (new_mutation[gene + ' Mutation'] != 'Wildtype_Tumor') & (new_mutation[second_gene + ' Mutation'] != 'Wildtype_Tumor')
new_mutation = new_mutation.loc[both_mutated]
patients_both_mut_16 = new_mutation.index
patients_both_mut_16

Index(['02OV005', '04OV011', '04OV036', '04OV037', '14OV011', '17OV001',
       '17OV002', '17OV011', '26OV009'],
      dtype='object', name='patient_id')

In [115]:
muc4_mut_prot = CPTAC.compare_mutations(proteomics, gene)
muc4_mut_prot['Concurrent_Mutations'] = 'Single'
muc4_mut_prot.loc[patients_both_mut_16, 'Concurrent_Mutations'] = 'Both'
muc4_mut_prot.loc[muc4_mut_prot['Mutation'] == 'Wildtype_Tumor', 'Concurrent_Mutations'] = 'Wildtype'
muc4_mut_prot = muc4_mut_prot.loc[muc4_mut_prot['Mutation'] != 'Wildtype_Normal']
combined = pd.concat([outcome, muc4_mut_prot], axis=1, sort=False)
combined = combined[[clinical_outcome, 'Concurrent_Mutations']].dropna()
clinical_table = pd.crosstab(combined[clinical_outcome], combined['Concurrent_Mutations'])
print(clinical_table)
chi_test = scipy.stats.chi2_contingency(observed = clinical_table)
chi_test

Concurrent_Mutations                                Both  Single  Wildtype
Measure of Success of Outcome at the Completion...                        
Complete Response                                      6      11        29
Not Reported/ Unknown                                  0       2         4
Partial Response                                       0       2         4
Progressive Disease                                    1       3         3
Stable Disease                                         1       6         6
Unknown                                                1       0         0


(12.49319664005392,
 0.2534030840551113,
 10,
 array([[ 5.24050633, 13.97468354, 26.78481013],
        [ 0.6835443 ,  1.82278481,  3.49367089],
        [ 0.6835443 ,  1.82278481,  3.49367089],
        [ 0.79746835,  2.12658228,  4.07594937],
        [ 1.48101266,  3.94936709,  7.56962025],
        [ 0.11392405,  0.30379747,  0.58227848]]))
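
The labelling and cross-tabulation steps are repeated almost verbatim for each gene combination. One way to factor them out is sketched below on toy data; `label_concurrent` is a hypothetical helper, not part of the CPTAC package, and the patients and outcomes are invented:

```python
import pandas as pd


def label_concurrent(mutation, both_patients, both_label="Both"):
    """Label each tumor as both_label, 'Single', or 'Wildtype' (hypothetical helper)."""
    # Drop normal samples so only tumors are labelled
    tumors = mutation[mutation != "Wildtype_Normal"]
    labels = pd.Series("Single", index=tumors.index, name="Concurrent_Mutations")
    # Same order as the notebook: mark concurrent patients, then wildtype tumors
    labels[labels.index.isin(both_patients)] = both_label
    labels[tumors == "Wildtype_Tumor"] = "Wildtype"
    return labels


# Toy inputs (not CPTAC data)
patients = ["P1", "P2", "P3", "P4"]
mutation = pd.Series(
    ["Missense_Mutation", "Missense_Mutation", "Wildtype_Tumor", "Wildtype_Normal"],
    index=patients,
)
outcome_toy = pd.Series(
    ["Complete Response", "Stable Disease", "Complete Response", "Complete Response"],
    index=patients,
    name="Outcome",
)

labels = label_concurrent(mutation, both_patients=["P1"])
table = pd.crosstab(outcome_toy[labels.index], labels)
print(table)
```

Passing `both_label="All"` would reproduce the labelling used for the three-gene comparison.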

## All three mutations

In [116]:
second_gene = 'MUC16'
third_gene = 'MUC17'
muc4_mut = CPTAC.compare_mutations(proteomics, 'CDH2', gene)
muc16_mut = CPTAC.compare_mutations(proteomics, 'CDH2', second_gene)
muc17_mut = CPTAC.compare_mutations(proteomics, 'CDH2', third_gene)
new_mutation = pd.DataFrame()
new_mutation[gene + ' Mutation'] = muc4_mut['Mutation']
new_mutation[second_gene + ' Mutation'] = muc16_mut['Mutation']
new_mutation[third_gene + ' Mutation'] = muc17_mut['Mutation']
new_mutation['CDH2'] = muc4_mut['CDH2']
new_mutation = new_mutation.loc[new_mutation[gene + ' Mutation'] != 'Wildtype_Normal']
# Keep tumors that are non-wildtype for all three genes
all_mutated = (
    (new_mutation[gene + ' Mutation'] != 'Wildtype_Tumor')
    & (new_mutation[second_gene + ' Mutation'] != 'Wildtype_Tumor')
    & (new_mutation[third_gene + ' Mutation'] != 'Wildtype_Tumor')
)
new_mutation = new_mutation.loc[all_mutated]
patients_all_mut = new_mutation.index
patients_all_mut

Index(['02OV005', '04OV036', '17OV002'], dtype='object', name='patient_id')

In [117]:
muc4_mut_prot = CPTAC.compare_mutations(proteomics, gene)
muc4_mut_prot['Concurrent_Mutations'] = 'Single'
muc4_mut_prot.loc[patients_all_mut, 'Concurrent_Mutations'] = 'All'
muc4_mut_prot.loc[muc4_mut_prot['Mutation'] == 'Wildtype_Tumor', 'Concurrent_Mutations'] = 'Wildtype'
muc4_mut_prot = muc4_mut_prot.loc[muc4_mut_prot['Mutation'] != 'Wildtype_Normal']
combined = pd.concat([outcome, muc4_mut_prot], axis=1, sort=False)
combined = combined[[clinical_outcome, 'Concurrent_Mutations']].dropna()
clinical_table = pd.crosstab(combined[clinical_outcome], combined['Concurrent_Mutations'])
print(clinical_table)
chi_test = scipy.stats.chi2_contingency(observed = clinical_table)
chi_test

Concurrent_Mutations                                All  Single  Wildtype
Measure of Success of Outcome at the Completion...                       
Complete Response                                     3      14        29
Not Reported/ Unknown                                 0       2         4
Partial Response                                      0       2         4
Progressive Disease                                   0       4         3
Stable Disease                                        0       7         6
Unknown                                               0       1         0


(6.983337718781954,
 0.7270170333597092,
 10,
 array([[ 1.74683544, 17.46835443, 26.78481013],
        [ 0.2278481 ,  2.27848101,  3.49367089],
        [ 0.2278481 ,  2.27848101,  3.49367089],
        [ 0.26582278,  2.65822785,  4.07594937],
        [ 0.49367089,  4.93670886,  7.56962025],
        [ 0.03797468,  0.37974684,  0.58227848]]))
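
The notebook imports `statsmodels.stats.multitest` but never applies it. Since four chi-square tests are run on the same outcome, one possible follow-up is a Benjamini-Hochberg adjustment of the four printed p-values. This is a sketch under the assumption that an FDR correction is the intended use of that import; the tests share patients and so are not independent, and the dictionary labels are illustrative:

```python
from statsmodels.stats.multitest import multipletests

# p-values copied from the four chi-square outputs in this notebook
p_values = {
    "MUC4 only": 0.6018754874151304,
    "MUC4 + MUC17": 0.7165731875236268,
    "MUC4 + MUC16": 0.2534030840551113,
    "MUC4 + MUC16 + MUC17": 0.7270170333597092,
}

# Benjamini-Hochberg false discovery rate correction at alpha = 0.05
reject, p_adjusted, _, _ = multipletests(
    list(p_values.values()), alpha=0.05, method="fdr_bh"
)

for name, p_adj, rejected in zip(p_values, p_adjusted, reject):
    print(f"{name}: adjusted p = {p_adj:.3f}, reject = {rejected}")
```

None of the unadjusted p-values is below 0.05, so the adjustment does not change the conclusion: no comparison shows a detectable association between mutation status and outcome.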