The goal of this notebook is to find patients in the CPTAC pancreatic cancer dataset who do not have any known driver mutations.

In [None]:
# Import packages

import cptac
import pandas as pd

In [None]:
# Check all available cancer datasets

cptac.list_datasets()['Cancer'].unique()

array(['ucec', None, 'brca', 'ccrcc', 'coad', 'gbm', 'hnscc', 'lscc',
       'luad', 'ov', 'pdac', 'all_cancers'], dtype=object)

In [None]:
# Load the pancreatic cancer dataset

panc = cptac.Pdac()

In [None]:
# Get all patient ids - clinical data is most likely to contain all patients

clinical = panc.get_clinical("mssm")
patient_ids = set(clinical.index)
len(patient_ids)

140

There are 140 patients in total in the pancreatic cancer dataset. The next step is to narrow these down to the patients who do not have any known driver mutations.

In [None]:
# Get pancreatic somatic mutation data

mutations = panc.get_somatic_mutation("washu")
mutations.head()

In [None]:
# Load driver mutation data into a dataframe

driver_mutations_df = pd.read_csv("panc_driver_mutations.tsv", sep="\t")
driver_mutations_df.head()

Unnamed: 0,Symbol,Mutations,Samples,Samples (%),Cohorts
0,KRAS,775,763,82.75,7
1,TP53,605,526,57.05,7
2,SMAD4,197,165,17.9,7
3,CDKN2A,190,144,15.62,7
4,ARID1A,69,51,5.53,7


In [11]:
# Count the rows in the driver table (one row per gene)

len(driver_mutations_df)

64

In [None]:
# Save the driver gene symbols into a list

driver_genes = driver_mutations_df["Symbol"].tolist()

The driver table has 64 rows, so there are 64 known driver genes for pancreatic cancer. Next, we find the patients who have a mutation in at least one of these genes. Any patients left over have no known driver mutations.
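As a minimal sketch of this filtering step, the toy example below assumes a mutation table with one row per mutation, indexed by patient ID and carrying a `Gene` column, as the notebook's `mutations` dataframe appears to be. The patient IDs, the index name, and the `toy_` variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per mutation, indexed by patient ID
toy_mutations = pd.DataFrame(
    {"Gene": ["KRAS", "TTN", "TP53", "MUC16", "TTN"]},
    index=pd.Index(["P1", "P1", "P2", "P3", "P3"], name="Patient_ID"),
)
toy_driver_genes = ["KRAS", "TP53", "SMAD4"]

# Keep only the rows whose gene appears in the driver list
toy_driver_rows = toy_mutations[toy_mutations["Gene"].isin(toy_driver_genes)]

# A patient can carry several driver mutations, so collapse the index to a set
toy_patients_with_drivers = set(toy_driver_rows.index)
print(sorted(toy_patients_with_drivers))
```

Here patient `P3` has mutations, but none in a driver gene, so it does not appear in the resulting set.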

In [None]:
# Find the number of patients with driver mutations

driver_mutations = mutations[mutations["Gene"].isin(driver_genes)]
patients_with_driver_mutations = set(driver_mutations.index)
len(patients_with_driver_mutations)

140

The number of patients with known driver mutations is 140, the same as the total number of patients. Assuming the mutation and clinical tables use the same patient IDs, this means every patient in the dataset has at least one mutation in a known driver gene. Our search for patients without driver mutations therefore ends here for pancreatic cancer. The next step is to repeat the analysis with breast cancer.
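Matching counts only show that every patient has a driver mutation if both sets contain the same IDs. A more direct check is a set difference, sketched below on made-up patient IDs. The helper name `patients_without_drivers` is hypothetical and is not part of `cptac`.

```python
def patients_without_drivers(all_patients, patients_with_drivers):
    """Return the patients that do not appear among the driver-mutation carriers."""
    return set(all_patients) - set(patients_with_drivers)


# Hypothetical IDs for illustration only
all_ids = {"P1", "P2", "P3", "P4"}
with_drivers = {"P1", "P2", "P4", "P9"}  # P9 is missing from the clinical table

# Patients in the clinical table with no driver mutation
no_driver = patients_without_drivers(all_ids, with_drivers)

# IDs seen in the mutation table but not in the clinical table
unexpected = set(with_drivers) - set(all_ids)

print(sorted(no_driver))
print(sorted(unexpected))
```

In this notebook, `patients_without_drivers(patient_ids, patients_with_driver_mutations)` would return an empty set if every patient really does carry a driver mutation.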