Reproducible analysis for paper:
Maximilian Krause, Adnan M. Niazi, Kornel Labun, Yamila N. Torres Cleuren, Florian S. Müller, Eivind Valen
About the repo
This repository is roughly organized as an R package, although it is not an R package per se. It provides the functions and the raw data needed to reproduce and extend the analyses reported in the publication. By raw data, we mean the output of tools such as tailfindr and Nanopolish.
This project is set up with a drake workflow, ensuring reproducibility. Intermediate targets/objects will be stored in a hidden .drake directory.
The R library of this project is managed by packrat. This makes sure that the exact same package versions are used when recreating the project.
Please note that this project was built with R version 3.6.0 on macOS Mojave. The packrat packages from this project are not compatible with R versions prior to 3.6.0. In general, it should be possible to reproduce the analysis on any other operating system.
Before starting, please ensure that you have:
A working installation of git
R (version 3.6.0 or above)
A working installation of pandoc. You can install it using instructions here.
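The prerequisites above can be checked from an R session before cloning. The following is a minimal sketch using base R only; `check_prerequisites` is a hypothetical helper that is not part of this repository, and it assumes git and pandoc are expected on the system PATH (a pandoc bundled with RStudio may not be found this way).

```r
# Hypothetical helper (not part of this repository): reports whether the
# prerequisites listed above appear to be satisfied on this machine.
check_prerequisites <- function(min_r_version = "3.6.0") {
  c(
    # TRUE if the running R is at least the required version
    r_version = getRversion() >= min_r_version,
    # Sys.which() returns "" when the executable is not found on the PATH
    git = nzchar(unname(Sys.which("git"))),
    pandoc = nzchar(unname(Sys.which("pandoc")))
  )
}

status <- check_prerequisites()
print(status)
```

Any entry that prints as FALSE points to a prerequisite that is probably worth installing or adding to the PATH before continuing.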
To clone the project, open a terminal in the directory of your choice and execute:
git clone https://github.com/adnaniazi/krauseNiazi2019Analyses.git
Then go into the krauseNiazi2019Analyses directory using:
cd krauseNiazi2019Analyses
Now start R in this location in the terminal:
R
Now, in the R console, type:
# restore all R packages with their specific versions (won't run in R < 3.6.0)
packrat::restore()
drake::r_make() # recreates the analysis
This command will do a series of steps:
1. It will download the outputs of tailfindr, Nanopolish, barcode assignment, and eGFP alignment for DNA and RNA data (both ours and Workman et al.'s) as .csv files in the data folder. This step may take some time as these files are large. All the scripts that generated these csv files are present in the scripts folder. You can run these scripts yourself if you want to work your way up from the Fast5 files. However, to save time, we have already generated the results of these scripts, and these pre-computed results are downloaded to the data directory. The data directory has a README file containing detailed information about each file and its columns.
2. Once all csv files are downloaded, they are consolidated into dataframes. The code that does this is located in the R directory. This step results in three dataframes: one each for the RNA data of Krause/Niazi et al., the DNA data of Krause/Niazi et al., and the RNA data of Workman et al. (rna_wo_data). You can access these datasets manually, if you wish, by loading them from drake's cache.
3. Once the three datasets are available, three R Markdown files (including workman_et_al_rna_analysis.Rmd) located in the reports directory are knit. These R Markdown files contain the code for all the figures used in the manuscript. The html outputs of these R Markdown files are generated in the reports directory. Go to the reports directory and open these html files to view the rendered reports.
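The consolidated datasets described above can also be inspected directly from drake's cache. The sketch below is one possible way to do this, assuming that drake::r_make() has already completed and that R was started in the project root. `load_cached_target` is a hypothetical wrapper, not a function from this repository; drake::readd() is the drake function that does the actual reading.

```r
# Hypothetical wrapper (not part of this repository) around drake::readd().
# Assumes the hidden .drake cache sits in the current working directory.
load_cached_target <- function(target = "rna_wo_data", cache_dir = ".drake") {
  if (!requireNamespace("drake", quietly = TRUE) || !dir.exists(cache_dir)) {
    message("drake or its cache is not available; run drake::r_make() first.")
    return(invisible(NULL))
  }
  # character_only = TRUE lets the target name be passed as a string
  drake::readd(target, character_only = TRUE)
}

rna_wo_data <- load_cached_target("rna_wo_data")
if (!is.null(rna_wo_data)) {
  str(rna_wo_data)  # quick overview of the columns
}
```

The wrapper returns NULL instead of failing when the cache is missing, so it is safe to run before the workflow has been built.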
If you want to extend the analysis, open the relevant R Markdown file, edit it, and re-knit it in RStudio. You will need to open the krauseNiazi2019Analyses directory as a project in RStudio. The knitting should work, provided steps 1 and 2 have been executed without any errors. Alternatively, you can run drake::r_make(), and it will automatically rerun anything downstream of whatever has changed.
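Before re-running the workflow after an edit, it can be useful to see which targets drake considers out of date. The following is a minimal sketch, assuming the workflow is defined in a _drake.R file in the project root (the default script that drake::r_make() looks for; whether this repository uses that default is an assumption). `preview_outdated` is a hypothetical helper.

```r
# Hypothetical helper (not part of this repository): lists the targets that
# drake::r_make() would rebuild, without rebuilding them.
preview_outdated <- function(project_dir = ".") {
  has_drake <- requireNamespace("drake", quietly = TRUE)
  has_script <- file.exists(file.path(project_dir, "_drake.R"))
  if (!has_drake || !has_script) {
    message("drake or _drake.R not found; nothing to check.")
    return(invisible(character(0)))
  }
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)
  # r_outdated() evaluates the workflow script in a fresh R session
  drake::r_outdated()
}

outdated_targets <- preview_outdated()
print(outdated_targets)
```

An empty result suggests that everything is up to date, in which case drake::r_make() should have nothing to rebuild.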
What is what?
The R directory contains helper functions for downloading the data and consolidating them.
Contains calls to helper functions in the
The data directory contains all the data generated by tailfindr, Nanopolish, and the other tools as csv files. These files are downloaded when drake::r_make() is run.
The scripts directory contains the scripts that generated the data in the data directory. These scripts are not run at any point in the analyses done here; they are provided only for reference.
The reports directory contains the R Markdown files and their knitted html versions.
Contains documentation of functions in