The goal of this project is to reconstruct mRNA from RNA-Seq data. In this case, the RNA-Seq data come from healthy individuals and patients diagnosed with Type 2 Diabetes. The project focuses on the preproinsulin mRNA (preproinsulin is the pre-precursor of the insulin hormone/protein) of pancreatic beta cells.
Check the imports in the next cell and install any packages you are missing. Then simply run the cells. The smaller required files are included in this GitHub repository; the other files are downloaded by this notebook. Download speeds from the European Bioinformatics Institute vary, so the downloads could take a while. Estimated time to completion: about a week. The final products of most interest are probably in these folders: Counts, Reconstruct, and Summary.
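As a rough aid for the import check described above, the sketch below reports which packages cannot be found in the current environment. The names in REQUIRED are hypothetical placeholders, not the notebook's actual dependency list; replace them with the top-level module names from the import cell.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical placeholder list: substitute the modules imported in the next cell.
REQUIRED = ["numpy", "pandas", "Bio"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Note that a module name can differ from the name used to install it (for example, the Bio module is provided by the biopython package), so check each missing module's install name before installing.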
This project requires about 40 GB of disk space (NGS data files are large). RAM is unlikely to be a limiting factor on most computers.
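Before starting the downloads, it may help to confirm that enough disk space is free. The following is a minimal sketch using the standard library; the 40 GB threshold is the approximate figure stated above, and the function name has_enough_space is only illustrative.

```python
import shutil

REQUIRED_GB = 40  # approximate requirement stated in the text above


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem that contains `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes to decimal gigabytes
    return free_gb >= required_gb, free_gb


ok, free_gb = has_enough_space()
print(f"Free space: {free_gb:.1f} GB (about {REQUIRED_GB} GB needed): {'OK' if ok else 'not enough'}")
```

Pass the directory where the notebook will store its downloads as `path` if it lives on a different drive from the working directory.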
For more general information about this project including biological interpretations, see my Medium article: https://daovang.medium.com/simple-reconstruction-of-mrna-from-next-generation-sequencing-rna-seq-c4faaa5da90d?source=friends_link&sk=ee61abdaa22a0773f5030d08d47de577
All reference preproinsulin data was obtained from NCBI: https://www.ncbi.nlm.nih.gov/gene?term=INS%5BGene%5D%20AND%20%22Homo%20sapiens%22%5BOrganism%5D&cmd=DetailsSearch
All NGS data was obtained from the European Bioinformatics Institute: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5061/
The original purpose of the data is described in this publication in Cell Metabolism: https://www.cell.com/cell-metabolism/fulltext/S1550-4131(16)30436-3
Version 1.0 11/5/2020