Datasets

This repository was initially used to extract data from files in several formats (including PDF) from cancer mutation papers, summarize each paper and its data, and plot significantly mutated genes.
The repo was later used to build a pipeline for pan-cancer Level 1 TCGA Illumina 450k methylation data using web scraping. The data was downloaded and cleaned, DP and M values were computed with the minfi package in R, and the results were loaded into pandas DataFrames in Python for downstream comparison of methylation between normal and tumor tissue.
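The normal-versus-tumor comparison step could look roughly like the sketch below. It is not the repository's actual code: the function names `sample_group` and `mean_m_difference`, the table layout (probes as rows, TCGA barcodes as columns), and the per-probe mean difference are all assumptions chosen for illustration. The only fixed convention it relies on is the TCGA barcode sample-type code, where 01 to 09 denote tumor samples and 10 to 19 denote normal samples.

```python
import pandas as pd


def sample_group(barcode: str) -> str:
    """Classify a TCGA barcode as 'tumor', 'normal', or 'other'.

    The fourth hyphen-separated field starts with the two-digit
    sample-type code (e.g. '01A' -> 01 = primary solid tumor).
    """
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"


def mean_m_difference(m_values: pd.DataFrame) -> pd.DataFrame:
    """Per-probe mean M value in tumor and normal samples, and their difference.

    Assumes (hypothetically) that rows are probes and columns are TCGA barcodes.
    """
    groups = m_values.columns.map(sample_group)
    tumor = m_values.loc[:, groups == "tumor"].mean(axis=1)
    normal = m_values.loc[:, groups == "normal"].mean(axis=1)
    return pd.DataFrame(
        {"tumor_mean": tumor, "normal_mean": normal, "delta_m": tumor - normal}
    )


if __name__ == "__main__":
    # Made-up M values and placeholder barcodes, for illustration only.
    toy = pd.DataFrame(
        {
            "TCGA-XX-0001-01A": [1.0, -2.0],
            "TCGA-XX-0002-01A": [2.0, -1.0],
            "TCGA-XX-0001-11A": [0.5, -1.0],
        },
        index=["probe_a", "probe_b"],
    )
    print(mean_m_difference(toy))
```

A mean difference is only a first look; a real analysis would normally add a statistical test and multiple-testing correction per probe.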