# KRAS Mutation Frequency

This notebook creates a bar chart that shows the frequency of different KRAS mutations found across the 6 cancers with data for KRAS.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import cptac
import cptac.utils as u
import plot_utils as p
import cptac.pancan as pc

In [2]:
print('cptac version:', cptac.version())

cptac version: 1.1.0


In [3]:
import warnings
warnings.filterwarnings('ignore')

# Step 1: Create dataframe with Mutation Types

For each cancer type, create a data frame that has the mutation type for each sample.

First, load in the cancer data sets from cptac.

In [None]:
g = pc.PancanGbm()
hn = pc.PancanHnscc()
l = pc.PancanLuad()
o = pc.PancanOv()
c = pc.PancanCcrcc()
col = pc.PancanCoad()
b = pc.PancanBrca()
ls = pc.PancanLscc()
en = pc.PancanUcec()

Loading washuccrcc v1.0..........                

Next, call get_genotype_all_vars for KRAS for each cancer type. This returns a dataframe with columns for Mutation (type of mutation), Location (location of the mutation), and Mutation_Status (wildtype, single, or multiple mutations). For samples with multiple mutations, a single mutation is reported based on the following priority: deletion, truncation, missense, amplification, inframe, silent, and wildtype.
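To make the priority rule concrete, here is a minimal sketch in plain Python. It is not cptac's implementation: `pick_reported_mutation` is a hypothetical helper, the lowercase labels are simply the category names from the sentence above, and treating a sample with no mutations as wildtype is an assumption.

```python
# Priority order quoted from the text above: earlier entries win.
PRIORITY = ["deletion", "truncation", "missense", "amplification",
            "inframe", "silent", "wildtype"]
RANK = {label: i for i, label in enumerate(PRIORITY)}


def pick_reported_mutation(mutations):
    """Return the highest-priority label from one sample's mutation list.

    Hypothetical helper for illustration only; not part of cptac.
    An empty list is treated as wildtype (an assumption).
    """
    if not mutations:
        return "wildtype"
    return min(mutations, key=lambda m: RANK[m])


# Toy samples, not real data.
samples = {
    "S1": ["missense", "silent"],
    "S2": ["inframe", "truncation", "missense"],
    "S3": [],
}
reported = {sample: pick_reported_mutation(muts)
            for sample, muts in samples.items()}
print(reported)
# {'S1': 'missense', 'S2': 'truncation', 'S3': 'wildtype'}
```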

In [None]:
gene = "KRAS" 

In [None]:
endo = en.get_genotype_all_vars(gene, omics_source = "washu")
# ovar and colon are used in Step 2 and in the missense calculation, so they must be defined here
ovar = o.get_genotype_all_vars(gene, omics_source = "washu")
colon = col.get_genotype_all_vars(gene, omics_source = "washu")
ld = l.get_genotype_all_vars(gene, omics_source = "washu")
lscc = ls.get_genotype_all_vars(gene, omics_source = "washu")
brca = b.get_genotype_all_vars(gene, omics_source = "washu")

In [None]:
# example


# Step 2: Match Mutation Labels

The mutation types in the Colon dataset were named differently. We changed these names to match the other cancers. Nonframeshift insertion was changed to match In_Frame_Ins. Frameshift deletion was changed to match Frame_Shift_Del, and frameshift insertion to match Frame_Shift_Ins. Nonsynonymous SNV represents a missense mutation in this case.

In [None]:
colon["Mutation"] = colon['Mutation'].replace(['nonsynonymous SNV'], 'Missense')
colon["Mutation"] = colon['Mutation'].replace(['nonframeshift insertion'], 'In_Frame_Ins')
colon["Mutation"] = colon['Mutation'].replace(['frameshift deletion'], 'Frame_Shift_Del')
colon["Mutation"] = colon['Mutation'].replace(['frameshift insertion'], 'Frame_Shift_Ins')


colon.Mutation.value_counts()

We simplified labels for the final figure. We grouped mutations together to create the Indel and Truncation categories. Indel includes: In_Frame_Ins and In_Frame_Del. Truncation includes: Nonsense_Mutation, Frame_Shift_Del, and Frame_Shift_Ins.

In [None]:
# Simplify mutation names, create truncation and indel groups
label_map = {
    'Missense_Mutation': 'Missense',
    'Wildtype_Tumor': 'Wildtype',
    'In_Frame_Del': 'Indel',
    'In_Frame_Ins': 'Indel',
    'Nonsense_Mutation': 'Truncation',
    'Frame_Shift_Del': 'Truncation',
    'Frame_Shift_Ins': 'Truncation',
}
dfs = [endo, ovar, colon, lscc, ld, brca]
for df in dfs:
    # Assign back to the column; chained inplace calls are unreliable in recent pandas
    df['Mutation'] = df['Mutation'].replace(label_map)
    # Drop silent mutations in place so the change persists outside the loop
    df.drop(df.index[df['Mutation'] == 'Silent'], inplace = True)
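One way to check that the relabeling left no stray labels is to compare each dataframe's Mutation column against the set of labels you expect. The sketch below uses toy data; `unexpected_labels` is a hypothetical helper, and the `EXPECTED` set is an assumption that should be adjusted to the labels your data actually contains.

```python
import pandas as pd

# Assumed set of labels expected after grouping; adjust as needed.
EXPECTED = {"Missense", "Wildtype", "Indel", "Truncation", "Silent"}


def unexpected_labels(df, expected=EXPECTED):
    """Return the sorted labels in df['Mutation'] that are not in `expected`.

    Hypothetical helper for illustration only.
    """
    found = set(df["Mutation"].dropna().unique())
    return sorted(found - set(expected))


# Toy dataframe with one label that was not grouped.
toy = pd.DataFrame(
    {"Mutation": ["Missense", "Wildtype", "Frame_Shift_Del", "Indel"]}
)
print(unexpected_labels(toy))
# ['Frame_Shift_Del']
```

An empty list means every label in the column belongs to the expected set.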

The get_genotype_all_vars function creates the No_Mutation label when no somatic mutations are found for KRAS in the Luad dataset. This is the same as Wildtype_Tumor, so we relabel it as Wildtype.

In [None]:
ld["Mutation"] = ld['Mutation'].replace(['No_Mutation'], 'Wildtype')

# Step 3: Create Figure

Create a list of the mutation dataframes. Create a list of cancer names for the figure legend. Call the figure1_plot_mutations function from plot_utils.

In [None]:
dfs = [endo, ld, brca, lscc]
# Rebuild the list; reassigning df inside a for loop would leave the list unchanged
dfs = [df.loc[df['Mutation'] != 'Silent'] for df in dfs]
names = ['Endometrial','Lung Adenocarcinoma', 'Breast', 'Lung Squamous']

p.figure1_plot_mutations(dfs, names)
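The plot_utils module is local to this project and its code is not shown here. As a rough illustration of the kind of table a mutation-frequency bar chart is drawn from (not the actual implementation of figure1_plot_mutations), the sketch below computes per-cancer proportions of each mutation label with pandas. The cancer names and counts are toy values, not results.

```python
import pandas as pd

# Toy per-cancer dataframes standing in for the real genotype tables.
toy = {
    "Cancer A": pd.DataFrame(
        {"Mutation": ["Missense", "Wildtype", "Wildtype", "Wildtype"]}
    ),
    "Cancer B": pd.DataFrame(
        {"Mutation": ["Missense", "Missense", "Indel", "Wildtype"]}
    ),
}

# Stack the dataframes into one long table with a Cancer column.
long = pd.concat(
    [df.assign(Cancer=name) for name, df in toy.items()],
    ignore_index=True,
)

# Rows are cancers, columns are mutation labels; each row sums to 1.
freq = pd.crosstab(long["Cancer"], long["Mutation"], normalize="index")
print(freq)
```

A table shaped like `freq` can be passed to a grouped bar plot, with one group of bars per mutation label and one color per cancer.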

# Calculate percentage of missense mutations in tumors

In [None]:
cancer_dfs = {'Endo':endo,'Ov':ovar,'Colon':colon, 
              'Luad':ld, 'Brca':brca, 'Lscc':lscc}
for cancer, df in cancer_dfs.items():
    vc = df.Mutation.value_counts()
    total = len(df)
    mut = vc.get('Missense', 0) # 0 when a cancer has no missense mutations
    print(cancer)
    print('total_tumor_samples:',total)
    print(mut,'/',total,'=', mut/total,'\n')