# Produce CCLF report with all information for each specified cell line
The goal of this notebook is to be able to create a unified HTML report for either:
1. All CN and SNV data for a single participant (e.g. PEDS172) across the targeted probe data and WES data
    + Different culture conditions, passage number, tumor tissue vs cell line, etc.
2. All CN and SNV data for a single patient ID across the targeted probe data and WES data

Both of these will make it easier for collaborators and Moony Tseng to analyse the existing data and determine what the next steps should be. The goal is to best serve these individuals and groups.

## Acquire / produce all the data for mutations and copy number
Pull from CCLF_WES and the most recently updated TSCA workspace. We are currently transitioning to CCLF_targeted.

In [None]:
from __future__ import print_function
import os.path
# import os
import dalmatian as dm
import pandas as pd
import numpy as np
import sys
sys.path.insert(0, '../../')
from JKBio import TerraFunction as terra
from CCLF_TWIST import CCLF_processing
%load_ext autoreload
%autoreload 2
%load_ext rpy2.ipython
from IPython.display import Image, display, HTML
import ipdb

# Import requirements for making CNV plots
from matplotlib import pyplot as plt
import cnvlib

In [None]:
import qgrid # interactive tables
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import gcsfs # to be able to read in files from GCS in Python
import re # used for regex

# Extra options
qgrid.set_grid_option('maxVisibleRows', 10)

# # Show all code cells outputs
# from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = 'all'

In [None]:
cwd = os.getcwd()
print(cwd)

In [None]:
specificSamples_both = ["CCLF_PEDS1012",
                   "PEDS172",
                   "PEDS182",
                   "PEDS196",
                   "PEDS204"]
specificSamples_onlyWES = ["PEDS012",
                   "PEDS018",
                   "PEDS110",
                   "PEDS117"]
## kim sept lines:
# specificSamples = specificSamples_both + specificSamples_onlyWES

## on hold CCLF lines:
specificSamples = ["CCLF_PEDS1012",
                   "PEDS012",
                   "PEDS018",
                   "PEDS157",
                   "PEDS167",
                   "PEDS182",
                   "PEDS195",
                   "PEDS196",
                   "PEDS204"]

In [None]:
df = "/Users/gmiller/Documents/Work/GitHub/ccle_processing/ccle_tasks/data/cclf_on_hold/cclf_on_hold.csv"

## kim sept line info:
# df = "/Users/gmiller/Documents/Work/GitHub/ccle_processing/ccle_tasks/data/kim_sept/kim_sample_disease_info.csv"

In [None]:
# 3/3/2020 gather all the existing files
CCLF_processing.getReport(datadir = "gs://cclf_results/", specificlist = specificSamples, specificlist_disease=df)

In [None]:
# 3/3/2020 gather all the existing files
CCLF_processing.getReport(datadir = "gs://cclf_results/targeted/neekesh_TEST/", specificlist = specificSamples, specificlist_disease=df)

In [None]:
# # gather all the existing files
# CCLF_processing.getReport(datadir = "gs://cclf_results/targeted/neekesh_201912/", specificlist = specificSamples, specificlist_disease=df)

In [None]:
# gather all the existing files
CCLF_processing.getReport(datadir = "gs://cclf_results/targeted/test/", specificlist = ["PEDS172"], specificlist_disease=df)
# CCLF_processing.getReport(datadir = "gs://cclf_results/targeted/kim_sept_6/", specificlist = specificSamples, specificlist_disease=df)

We want to create heat-map-style copy number plots for each participant, showing all existing culture conditions, primary tissue, and matched normal side by side.

We might have to make separate CN heat maps for the TSCA and WES samples: they live in separate workspaces, so a single sample set probably can't contain both. There may be a workaround.

* step 1: create sample set for each participant (add each sample_id to a sample set list?)
   
* step 2: create submission for each participant to generate the CN heat map
    + Terra.waitForSubmission needed before step 3
    + try/except style?
* step 3: copy the image from the workspace into the output location
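
Step 1 above can be sketched without touching Terra: group the sample IDs by participant so that each group can later be uploaded as one sample set. This is only a minimal illustration. The `(participant_id, sample_id)` pairs, the sample IDs, and the `_CN_heatmap` suffix are all hypothetical, and the upload and submission steps (steps 2 and 3) are intentionally left out.

```python
from collections import defaultdict

def group_samples_by_participant(pairs, suffix="_CN_heatmap"):
    """Map a sample-set name to the sorted sample IDs of one participant.

    `pairs` is an iterable of (participant_id, sample_id) tuples.
    The set-name suffix is a made-up convention for this sketch.
    """
    groups = defaultdict(set)
    for participant_id, sample_id in pairs:
        groups[participant_id].add(sample_id)
    return {pid + suffix: sorted(ids) for pid, ids in groups.items()}

# Hypothetical sample IDs, for illustration only
example_pairs = [
    ("PEDS172", "PEDS172_tumor"),
    ("PEDS172", "PEDS172_p3"),
    ("PEDS182", "PEDS182_tumor"),
]
sample_sets = group_samples_by_participant(example_pairs)
print(sample_sets)
```

Each value in `sample_sets` is the list of sample IDs that would go into one sample set before a submission is created for that participant.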

In [None]:
# create heat map style copy number plots for each participant
# want to have all the culture conditions, primary tissue, matched normal that exist side by side

# step 1: create sample set for each participant (add each sample_id to a sample set list?)
# step 2: create submission for each participant to generate the CN heat map
# - Terra.waitForSubmission needed before step 3
# - try/except style?
# step 3: copy the image from the workspace into the output location

In [None]:
# ! gsutil -m rm -r 'gs://cclf_results/targeted/neekesh_TEST/' 

***
***

# Pretty report generation
After grabbing and making all of the files we want for a given participant (e.g. PEDS182), we want to make a pretty, interactive report. This will be similar to a README except that we will directly embed tables and images. This involves using Jupyter widgets to create dropdown menus and the like. Here are the main functionalities I'd like:

1. kable-like tables that are interactive: sorting, filtering, typing in text or numbers to search, (ability to download sorted/filtered table as a CSV?)
2. ability to quickly go to any image in the directory. I want this so that the user can quickly look through the copy number maps (horizontal plots). Ideally, I'd like to be able to select which one(s) I'd like to view. This could be useful if they want to see two or more at once (i.e. to compare two treatment conditions).

## Automate generation of separate Jupyter notebook for each participant
To do this, we will use Papermill. Papermill automates notebook to notebook generation, and also executes the generated notebook. We may also want to convert the generated notebook to HTML. We can use *nbconvert* for this operation (see https://github.com/jupyter/nbconvert).
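
As a rough sketch of the nbconvert step, the helper below only builds the `jupyter nbconvert` command line and does not run it. The function name `nbconvert_command` and the file names are assumptions made for illustration; `--to html`, `--no-input`, and `--output` are standard nbconvert options.

```python
def nbconvert_command(notebook, output_name, hide_code=True):
    """Build (but do not run) a `jupyter nbconvert` command as an argv list."""
    cmd = ["jupyter", "nbconvert", "--to", "html"]
    if hide_code:
        cmd.append("--no-input")  # leave code cells out of the rendered HTML
    cmd += ["--output", output_name, notebook]
    return cmd

# Hypothetical file names, for illustration only
cmd = nbconvert_command("temp.ipynb", "PEDS157-Neuroblastoma")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the executed notebook exists.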

In [None]:
# generate automated reports by passing links
paths = ["gs://cclf_results/Neuroblastoma/PEDS157/",
"gs://cclf_results/Neuroblastoma/PEDS167/",
"gs://cclf_results/Neuroblastoma/PEDS195/",
"gs://cclf_results/Osteosarcoma/PEDS182/",
"gs://cclf_results/Spindle_Cell_Sarcoma/PEDS018/",
"gs://cclf_results/Wilms_Tumor/CCLF_PEDS1012/",
"gs://cclf_results/Wilms_Tumor/PEDS012/",
"gs://cclf_results/Wilms_Tumor/PEDS196/",
"gs://cclf_results/Wilms_Tumor/PEDS204/"]      

# HTML report name format: 'PEDS157-Neuroblastoma'
outname = [os.path.basename(os.path.dirname(i)) + "-" + os.path.basename(os.path.dirname(os.path.dirname(i))) for i in paths]
outname

# # Alternative format: 'Neuroblastoma-PEDS157'
# alternate_outname = [i.split("cclf_results/")[1][:-1].replace("/", "-") for i in paths]
# alternate_outname
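
The report-name derivation above relies on every path ending in a trailing slash, because `os.path.dirname` is used to strip it. A small sketch of an equivalent helper that tolerates a missing trailing slash is shown below. The name `report_name` is hypothetical, and the helper assumes paths shaped like `gs://<bucket>/<disease>/<participant>/`.

```python
def report_name(gs_path):
    """Return '<participant>-<disease>' for a path shaped like
    gs://<bucket>/<disease>/<participant>/ (trailing slash optional)."""
    parts = gs_path.rstrip("/").split("/")
    participant, disease = parts[-1], parts[-2]
    return participant + "-" + disease

print(report_name("gs://cclf_results/Neuroblastoma/PEDS157/"))
```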

In [None]:
import papermill as pm  # needed here: this cell runs before the later cell that imports papermill

pm.execute_notebook(
   'notebooks/notebookA.ipynb', # notebook to execute
   'notebooks/temp.ipynb', # temporary notebook that will contain the outputs and will be used to save HTML
   report_mode=True, # To hide ingested parameters cells, but not working for me
   parameters=dict(filename_ipynb='notebooks/temp.ipynb', 
                   filename_html='output/notebookA.html')
)

In [None]:
import papermill as pm

for i, path in enumerate(paths[:1]):
    report_title = outname[i] 
    report_html = report_title + ".html"
    
    pm.execute_notebook(
   'CCLF_report_template-no-mutation-summary.ipynb',
        'temp.ipynb',
#    report_title + ".ipynb",
   parameters = dict(path=path,
                    filename_ipynb='temp.ipynb', 
                    filename_html='output/'+report_html))
    
#     # Convert ipynb to HTML with nbconvert
#     ! jupyter nbconvert --to html_toc --no-input world_facts_2017.ipynb --output report_html
#     ! jupyter nbconvert --ExecutePreprocessor.store_widget_state=True  CCLF_report_template.ipynb --no-input

## Note: I may want to show the conversion between gs:// and https://console.cloud.google.com/storage/browser/ so that people who are not comfortable with using terminal will be able to easily browse and download the data in the Google bucket. I just need to make sure we won't have privacy issues (we shouldn't, right?)
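
The gs:// to console-link conversion mentioned in the note could look roughly like the sketch below. The helper name `gs_to_console_url` is hypothetical. It rewrites only the leading `gs://` scheme, whereas the `re.sub` calls used elsewhere in this notebook would replace the pattern wherever it occurs.

```python
CONSOLE_PREFIX = "https://console.cloud.google.com/storage/browser/"

def gs_to_console_url(gs_path):
    """Turn a gs:// path into the matching Cloud Console browser link."""
    if not gs_path.startswith("gs://"):
        raise ValueError("expected a gs:// path, got: " + gs_path)
    # Drop the scheme and prepend the console prefix
    return CONSOLE_PREFIX + gs_path[len("gs://"):]

print(gs_to_console_url("gs://cclf_results/Wilms_Tumor/PEDS196/"))
```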

In [None]:
def read_in_tables(filepaths):
    """ takes list of filepaths to TSVs and returns dataframes read in by Pandas"""
    return [pd.read_csv(f, sep='\t') for f in filepaths]

In [None]:
# path would be the participant-specific path
path = "gs://cclf_results/targeted/neekesh_201912/" 

# A list of file paths for the selected participant
filepaths = ! gsutil ls -r {path}**

# Get all the table filepaths in the bucket
table_filepaths = ! gsutil ls -r {path}**.txt # use ** so the match crosses subdirectories; a single * only matches within one directory level
to_add = ! gsutil ls -r {path}**.tsv
table_filepaths += to_add

# Get all the png filepaths in the bucket
img_filepaths = ! gsutil ls -r {path}**.png

# Copy all the pngs in the bucket to a tmp folder
# TODO: need to delete the files afterwards
tempdir='../temp/cclfreport/images/'
! gsutil cp -r {path}**.png {tempdir} # copy images from google bucket to local temp folder
# local_img_filepaths = ! ls {tempdir}*.png
local_img_filepaths = [tempdir + os.path.basename(i) for i in img_filepaths]
print(local_img_filepaths)

# os.path.basename works on the path strings alone, so no directory change is needed here
local_img_file_names = [os.path.basename(i) for i in local_img_filepaths]

In [None]:
# TO DELETE
# these should match up
display(local_img_filepaths[:5])
display(local_img_file_names[:5])

In [None]:
def make_interactive_table(filepath, cols_to_include = None):
    """Display a pandas DataFrame as an interactive qgrid table.

    Despite its name, `filepath` must be a DataFrame, not a path.
    If `cols_to_include` is given, its first entry becomes the index.
    """
    if not isinstance(filepath, pd.DataFrame):
        raise TypeError("The function expected a pandas dataframe as input, but got: " + str(type(filepath)))
    data = filepath
    
    # Subset the data to include the specified columns, if any passed in
    if cols_to_include is not None:
        # The index is the first column listed
        index_name = cols_to_include[0]
        # set_index returns a new frame, so the caller's DataFrame is never modified in place
        data = data[cols_to_include].set_index(index_name, drop=True)
        if 'keep' in cols_to_include:
            data = data.loc[data['keep'] == True]
            
    # Create and display interactive table
    qgrid_widget = qgrid.show_grid(data, show_toolbar=False, grid_options = {'forceFitColumns': False,
    'defaultColumnWidth': 150})
    display(qgrid_widget)
    print("\n")

# Sample information and identifiers
This section details the external IDs for all the samples we discovered when searching the existing targeted probe data and WES data.

In [None]:
all_external_ids = ! gsutil ls -r {path}**all_external_ids.tsv
all_failed_external_ids = ! gsutil ls -r {path}**all_failed_external_ids.tsv

# Read in the tables
all_external_ids_df = read_in_tables(all_external_ids)
all_failed_external_ids_df = read_in_tables(all_failed_external_ids)

## Table: all external IDs & associated metadata
The table below is sortable and filterable. Double-click a cell to copy its contents, for example the link to the file in the Google storage console.

In [None]:
# instead of interactive for each participant, might be nice to combine all into one and add a column for the participant ID. This makes for less waiting and clicking overall.

# df1 = pd.read_csv(all_external_ids[0], sep='\t')
df1 = all_external_ids_df[0]
df1['participant'] = str(os.path.basename(os.path.dirname(all_external_ids[0])))
df1['disease'] = str(os.path.basename(os.path.dirname(os.path.dirname(all_external_ids[0]))))
df1['filepath'] = str(os.path.dirname(all_external_ids[0]))
df1['link'] = str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', os.path.dirname(all_external_ids[0])))
# for filepath in all_external_ids[1:]:
dfs_remaining = all_external_ids_df[1:]
filepaths_remaining = all_external_ids[1:]
for table,filepath in zip(dfs_remaining, filepaths_remaining):
    df2 = table
    df2['participant'] = str(os.path.basename(os.path.dirname(filepath)))
    df2['disease'] = str(os.path.basename(os.path.dirname(os.path.dirname(filepath))))
    df2['filepath'] = str(os.path.dirname(filepath))
    df2['link'] = str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', os.path.dirname(filepath)))
    df1 = pd.concat([df1, df2], ignore_index=True)
df1.set_index('participant', drop=True, inplace=True)

# print some summary information
print("We found a total of", df1.shape[0],"external IDs that passed the depth of coverage QC.")

# allow for filtering
qgrid_widget = qgrid.show_grid(df1, show_toolbar=False, grid_options = {'forceFitColumns': False})
display(qgrid_widget)

## Samples that failed the depth of coverage QC
This summary lists the external IDs of every sample that failed the depth of coverage QC in the targeted probe pipeline. The QC requires that the average gene-level or interval-level coverage is >=50x.

The summary also lists the participants for which no samples failed the depth of coverage QC.
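
To make the threshold concrete, here is a minimal sketch of such a check. It is not the pipeline's code: the function name, the sample names, and the coverage values are made up, and treating a sample with no coverage data as failing is an assumption.

```python
from statistics import mean

MIN_MEAN_COVERAGE = 50  # threshold stated in the text above

def passes_depth_qc(interval_coverages, threshold=MIN_MEAN_COVERAGE):
    """True if the average coverage across intervals is at least `threshold`."""
    if not interval_coverages:
        return False  # assumption: no coverage data counts as a failure
    return mean(interval_coverages) >= threshold

# Made-up per-interval coverages for two hypothetical samples
coverage_by_sample = {"sample_A": [80, 60, 40], "sample_B": [30, 45, 60]}
failed = sorted(s for s, cov in coverage_by_sample.items() if not passes_depth_qc(cov))
print(failed)
```

Here `sample_A` averages 60x and passes, while `sample_B` averages 45x and fails.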

In [None]:
no_failed = []
for table,filepath in zip(all_failed_external_ids_df, all_failed_external_ids):
    tmp_df = table
    participant_name = str(os.path.basename(os.path.dirname(filepath)))
    if tmp_df.shape[0] ==1:
        print("There was", str(tmp_df.shape[0]), "failed sample for participant", participant_name,":")
        display(sorted(tmp_df.iloc[:,0].tolist()))
    elif tmp_df.shape[0] >1:
        print("There were", str(tmp_df.shape[0]), "failed samples for participant", participant_name,":")
        # List the failed IDs here too, as in the single-sample case
        display(sorted(tmp_df.iloc[:,0].tolist()))
    else:
        no_failed += [participant_name]
# Only print this summary when at least one participant had no failed samples
if no_failed:
    print("There were no failed samples for participant(s):")
    display(sorted(no_failed))

# Copy number data

## Copy number heat maps
There are two plots in this section: one for CN data from the targeted probe data and one for CN data from the WES data. To look at any one sample in more detail, see either the corresponding horizontal CN plot in the next section, "Copy number horizontal plots", or the CN table (either the tables below or the TSV available at the link given in the "Sample information and identifiers" section).

These tables are searchable and filterable.

### Targeted CN heat map

In [None]:
# create a heat map specific to the samples requested (either per participant basis or per list of participants) using the plotSomaticCNV workflow in Terra
# Steps: create a new sample set with the appropriate samples, submit a new job, wait for it to finish, copy the picture to the temp dir (and add it to the list of local files), then display it here

### WES CN heat map

In [None]:
# what's the best way to create a CN heat map for the WES samples? create just using the segmented CN tsv I pull in from Terra? create new workflow?

In [None]:
# NOTE: wes_cn is only defined in the "Get the CN tables" cell further down; run that cell first
wes_cn

In [None]:
# TODO: TO DELETE
table_filepaths = ! gsutil ls -r {path}**.txt # use ** so the match crosses subdirectories; a single * only matches within one directory level
to_add = ! gsutil ls -r {path}**.tsv
table_filepaths += to_add

# Get all the png filepaths in the bucket
img_filepaths = ! gsutil ls -r {path}**.png

# Copy all the pngs in the bucket to a tmp folder
# TODO: need to delete the files afterwards
tempdir='../temp/cclfreport/images/'
! gsutil cp -r {path}**.png {tempdir} # copy images from google bucket to local temp folder

In [None]:
cntempdir = '../temp/cclfreport/tables/'
local_cn_filepaths = [cntempdir + os.path.basename(i) for i in wes_cn]

# Create dict of dicts: {participant_id: {external_id: seg file}}
seg_file_dict = dict()
seg_columns = ["external_id","Chromosome", "Start", "End", "Num_Probes", "Segment_Mean"]
for filepath in wes_cn:
    participant_id = str(os.path.basename(os.path.dirname(filepath)))
    df = pd.read_csv(filepath, sep="\t")
    df = df.loc[:,seg_columns]
    
    # Write each external_id level table to a seg file (TSV format)
    tmp_path = cntempdir + participant_id + ".tsv"
    df.to_csv(tmp_path, sep = "\t", index = False)
    
    # Add key:value pair for the seg files for each participant
    seg_file_dict[participant_id] = tmp_path
    
#     # Create dict with external ID and associated seg file (with correct columns) in TSV format
#     for ext_id in set(df.loc[:,"external_id"]):
#         tmp = df[df.loc[:,"external_id"] == ext_id]
#         tmp = tmp.drop(columns = ["external_id"])
#         # Write each external_id level table to a seg file (TSV format)
#         tmp_path = cntempdir + ext_id + ".tsv"
#         tmp.to_csv(tmp_path, sep = "\t")
#         # Add key:value pair for the seg files for each participant
#         seg_file_dict[participant_id] = {ext_id: tmp_path}

seg_file_dict

In [None]:
# Convert seg files to .cns format so CNVkit can create CN plots
for participant, seg_file in seg_file_dict.items():
    print(seg_file)
    ! cnvkit.py import-seg {seg_file} -d {cntempdir}


# cntempdir = '../temp/cclfreport/tables/'
# local_cn_filepaths = [cntempdir + os.path.basename(i) for i in wes_cn]

# for i,gspath in enumerate(wes_cn):
#     # Temporarily download CN seg file locally
#     ! gsutil cp {gspath} {cntempdir}
    
#     # Convert to .cns format
#     ! cnvkit.py import-seg {local_cn_filepaths[i]}
    
# # Remove CN seg files from temporary location
# ! rm {local_cn_filepaths}


In [None]:
# Show all code cells outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [None]:
# https://cnvkit.readthedocs.io/en/stable/plots.html
from glob import glob
plt.rcParams["font.size"] = 9.0  # set before plotting so it applies to the text created for this figure
segments = [cnvlib.read(f) for f in glob(cntempdir+"*.cns")]
ax = cnvlib.do_heatmap(segments) # cnvlib.heatmap.do_heatmap(cnarrs, show_range=None, do_desaturate=False)
ax.set_title("All my samples")
# plt.show()



## Copy number horizontal plots

Select the copy number plot you would like to display from the dropdown menu. The dropdown menu includes CN plots from both targeted probe (TSCA and TWIST) and WES data. The source of the data will be displayed on the title of the image. You can also refer to the table of all external IDs that maps each external ID to the source of the data (see "Sample information and identifiers").

The dropdown menu also includes merged copy number maps for each participant. This PNG file contains all the horizontal CN plots for a given participant in a single place for ease of quick comparison.

**check:** can I add a linked reference to this table so that readers can quickly jump there? It might be best to make it its own section so that it shows up in the TOC.

<!-- Note that to get nice dropdown menu names, I'm changing directories for now. There's probably a better way to do this. -->

In [None]:
os.chdir(tempdir)

In [None]:
# TODO: this isn't the most helpful format / layout currently. Make it possible to select by WES vs Targeted, and select by participant as well. I don't currently know how to do this.
# select image to display from dropdown menu    
@interact
def show_images(file=local_img_file_names):
    print("File name:", file)
    display(Image(file))

In [None]:
# must change back to the main directory
os.chdir(cwd)

In [None]:
# Get the CN tables from the Google storage bucket
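tsca_cn = ! gsutil ls -r {path}**copy_number.tsv
wes_cn = ! gsutil ls -r {path}**wes_copy_number.tsv
# The first pattern also matches the wes_copy_number.tsv files, so drop those from the targeted list
tsca_cn = [f for f in tsca_cn if not f.endswith("wes_copy_number.tsv")]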

# Create dictionary with filepaths as keys and pandas DF as the values
tsca_cn_dict = {f:pd.read_csv(f, sep="\t") for f in tsca_cn}
wes_cn_dict = {f:pd.read_csv(f, sep="\t") for f in wes_cn}

# # Read in the tables
# tsca_cn_dfs = read_in_tables(tsca_cn)
# wes_cn_dfs = read_in_tables(wes_cn)

## Targeted CN table
Select from the dropdown menu to get the targeted CN table for each participant.

In [None]:
# # Old, working version: pass in list of filepaths ("make_interactive_table_orig_READINFILE" reads in the file inside the function
# @interact
# def show_tables(filepath = tsca_cn):
#     cn_col_names = ['external_id', 'Sample', 'condition','Chromosome', 'Start', 'End','Segment_Mean', 'Segment_Call', 'Num_Probes']
#     print("Participant: ", str(os.path.basename(os.path.dirname(filepath))))
#     print("Filepath: "+ filepath)
#     print("Link:", str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', filepath)))
#     make_interactive_table_orig_READINFILE(filepath, cols_to_include = cn_col_names)

In [None]:
@interact
def show_tables(filepath = tsca_cn):
    """Display the targeted CN table for the filepath selected in the dropdown (a key of tsca_cn_dict)"""
    # Choose columns to include
    cn_col_names = ['external_id', 'Sample', 'condition','Chromosome', 'Start', 'End','Segment_Mean', 'Segment_Call', 'Num_Probes']
    
    # Print key information about the file
    print("Participant: ", str(os.path.basename(os.path.dirname(filepath))))
    print("Filepath: "+ filepath)
    print("Link:", str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', filepath)))
    
    # Get the TSV from the dict and display it
    df = tsca_cn_dict[filepath]
    make_interactive_table(df, cols_to_include = cn_col_names)

## WES CN table
Select from the dropdown menu to get the WES CN table for each participant, when available. The TSV will contain the data for all the different external IDs.

In [None]:
@interact
def show_tables(filepath = wes_cn):
    """Display the WES CN table for the filepath selected in the dropdown (a key of wes_cn_dict)"""
    # Choose columns to include
    cn_col_names = ['external_id', 'Sample', 'condition','Chromosome', 'Start', 'End','Segment_Mean', 'Segment_Call', 'Num_Probes']
    
    # Print key information about the file
    print("Participant: ", str(os.path.basename(os.path.dirname(filepath))))
    print("Filepath: "+ filepath)
    print("Link:", str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', filepath)))
    
    # Get the TSV from the dict and display it
    df = wes_cn_dict[filepath]
    make_interactive_table(df, cols_to_include = cn_col_names)

**check:** Consider including mutations found in the targeted data that WEREN'T found in WES, or alternatively plotting a Venn diagram. This is only possible for samples where we have both WES and targeted data, which shouldn't be difficult to figure out.

# Mutation data

Below are interactive tables containing *select* mutation information from the targeted probe data and the WES data. If there were multiple external IDs in either dataset, they have been combined into one table. The external_id column can be used to filter the data so that only the mutations for a single external ID are displayed.

Note that this report only includes samples from the targeted data that pass the depth of coverage QC. Samples that did not pass this QC are not included in this report, and their data is not included in the Google bucket. A list of the samples that failed this QC is included earlier in this document (see the section "Samples that failed the depth of coverage QC").

Also, note that the below tables have been filtered such that the keep column equals True. What this means is that only the variants that passed the filtering steps in the pipeline are included in the tables below. However, the raw mutation TSVs included in the Google bucket contain all the variants regardless of whether keep is True or False if you are interested in that information. This TSV will also contain columns explaining why a mutation was removed during filtration.
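
For readers working with the raw TSVs outside this notebook, the same keep filter can be reproduced without pandas. The sketch below is only illustrative: the rows are made up, and it assumes the keep column is written as the text `True` or `False`.

```python
import csv
import io

def kept_variants(tsv_text):
    """Return the rows of a mutation TSV whose keep column reads 'True'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["keep"].strip().lower() == "true"]

# Made-up rows, for illustration only
example_tsv = (
    "external_id\tHugo_Symbol\tkeep\n"
    "sample_A\tTP53\tTrue\n"
    "sample_A\tKRAS\tFalse\n"
)
kept = kept_variants(example_tsv)
print([row["Hugo_Symbol"] for row in kept])
```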

Generally speaking, if you are looking for more detailed information about why a mutation you expected to see was filtered out or if you want to get access to all of the columns available in the mutation TSV rather than the ones selected here, you can download the raw mutation TSV from the Google bucket.

In [None]:
# Get the mutation TSVs
tsca_mut = ! gsutil ls -r {path}**mutation.tsv
wes_mut = ! gsutil ls -r {path}**wes_mutations.tsv

# Create dictionary with filepaths as keys and pandas DF as the values
tsca_mut_dict = {f:pd.read_csv(f, sep="\t") for f in tsca_mut}
wes_mut_dict = {f:pd.read_csv(f, sep="\t") for f in wes_mut}

## Targeted mutation table

In [None]:
# this code allows for the display of interactive tables with a dropdown menu to switch between participants
@interact
def show_tables(file=tsca_mut):
    # Choose columns to include
    mut_col_names = ['external_id', 'Genome_Change', 'Protein_Change','Variant_Classification', 'Variant_Type', 'tumor_f', 't_alt_count', 't_ref_count', 'COSMIC_total_alterations_in_gene',
                     'CGC_Tumor_Types_Somatic', 'CGC_Tumor_Types_Germline',
                     'Hugo_Symbol','Matched_Norm_Sample_Barcode','condition','Chromosome', 'Start_position', 'End_position','keep']
    
    # Print key information about the file
    print("Participant: ", str(os.path.basename(os.path.dirname(file))))
    print("Filepath: "+ file)
    print("Link:", str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', file)))
    
    # Get the TSV from the dict and display it
    df = tsca_mut_dict[file]
    try:
        make_interactive_table(df, cols_to_include = mut_col_names)
    except KeyError:
        # Older mutation files lack some of the expected columns, which raises a KeyError when subsetting
        print("\nAt least one of this participant's mutation files is in a different format from the output of the newest pipeline. This data may be old and may have different column names. No filtering is performed on the displayed table, but you can add additional filters if desired:")
        make_interactive_table(df, cols_to_include = None)

## WES mutation table

In [None]:
# this code allows for the display of interactive tables with a dropdown menu to switch between participants
@interact
def show_tables(file=wes_mut):
    # Choose columns to include
    mut_col_names = ['external_id', 'Genome_Change', 'Protein_Change','Variant_Classification', 'Variant_Type', 'tumor_f', 't_alt_count', 't_ref_count', 'COSMIC_total_alterations_in_gene',
                     'CGC_Tumor_Types_Somatic', 'CGC_Tumor_Types_Germline',
                     'Hugo_Symbol','Matched_Norm_Sample_Barcode','condition','Chromosome', 'Start_position', 'End_position','keep']
    
    # Print key information about the file
    print("Participant: ", str(os.path.basename(os.path.dirname(file))))
    print("Filepath: "+ file)
    print("Link:", str(re.sub('gs://', 'https://console.cloud.google.com/storage/browser/', file)))
    
    # Get the TSV from the dict and display it
    df = wes_mut_dict[file]
    try:
        make_interactive_table(df, cols_to_include = mut_col_names)
    except KeyError:
        # Older mutation files lack some of the expected columns, which raises a KeyError when subsetting
        print("\nAt least one of this participant's mutation files is in a different format from the output of the newest pipeline. This data may be old and may have different column names. No filtering is performed on the displayed table, but you can add additional filters if desired:")
        make_interactive_table(df, cols_to_include = None)