
Introduction

This repository contains files associated with our manuscript:

Dunn, CW, F Zapata, C Munro, S Siebert, A Hejnol (2018) Pairwise comparisons across species are problematic when analyzing functional genomic data. PNAS. doi:10.1073/pnas.1707515115.

It presents reanalyses of two previously published comparative gene expression studies:

Levin M, Anavy L, Cole AG, Winter E, Mostov N, Khair S, Senderovich N, Kovalev E, Silver DH, Feder M, et al. 2016. The mid-developmental transition and the evolution of animal body plans. Nature 531: 637-641. doi:10.1038/nature16994

Kryuchkova-Mostacci N, Robinson-Rechavi M. 2016. Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs. PLoS Computational Biology 12: e1005274. doi:10.1371/journal.pcbi.1005274

The files in this repository include:

  • manuscript.pdf The rendered manuscript. It is the simplest way to view our manuscript, including the computed results and figures.

  • manuscript.rmd The manuscript text and source code for presenting our analyses. It executes relatively quickly (a few minutes on a standard laptop) because the computationally intensive analysis steps are all in manuscript_kernel.R.

  • manuscript_kernel.R The code for the more computationally intensive analyses. It must be executed before manuscript.rmd, because it generates the file manuscript.RData with the intermediate results that manuscript.rmd reads.

  • functions.R Custom functions required to run our analyses.

  • kmrr The folder with the data and original code from Kryuchkova-Mostacci and Robinson-Rechavi 2016, as well as the products of their code that are needed to run our analyses.

  • kmrr/Compara.75.protein.nhx.emf.gz The Compara gene tree file, from ftp://ftp.ensembl.org/pub/release-75/emf/ensembl-compara/homologies/

  • levin_etal The folder with data provided by the authors of Levin et al. 2016, as well as our annotations of their analysis and the code we used to explore their results (reanalyses.rmd). The results of these analyses can be viewed at reanalyses.md.

Rerunning our analyses

We run the analyses in a Docker container with all the R dependencies needed by our code. Please see the docker folder for more information on building the Docker image and running the container. Alternatively, you can run the analyses directly on your computer after installing the dependencies yourself.

The manuscript is typically executed in two steps. First, run manuscript_kernel.R. This code includes the most computationally intensive steps, and outputs the file manuscript.RData with intermediate results. Next, knit manuscript.rmd. This reads in the intermediate results from manuscript.RData, formats them for presentation, and integrates them with the text in a combined document.

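These two steps can be executed with the commands below. The first command runs in the background, so wait for it to finish and write manuscript.RData before running the second: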

nohup Rscript manuscript_kernel.R &
Rscript -e "library(rmarkdown); render('manuscript.rmd')"

The first step takes about an hour and a half in a Docker container on an Amazon Web Services EC2 m4.16xlarge instance, which has 64 cores and 256 GB RAM. About 2 GB of RAM per core is required.
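The memory figure above can be turned into a quick sizing check before choosing a machine. The following is a minimal sketch, not part of the repository: the function name cores_for_memory and its arguments are hypothetical, and it only does the arithmetic implied by the 2 GB per core estimate. This readme does not say how manuscript_kernel.R chooses the number of cores, so the result is a planning estimate rather than a setting the script is known to accept.

```shell
# Hypothetical helper (not in the repository): estimate how many cores
# a machine can keep busy given the ~2 GB of RAM per core stated above.
#
# usage: cores_for_memory TOTAL_RAM_GB AVAILABLE_CORES [GB_PER_CORE]
cores_for_memory() {
    cfm_total_gb="$1"
    cfm_cores="$2"
    cfm_gb_per_core="${3:-2}"   # default taken from the estimate above

    # Number of cores the available memory can support (integer division).
    cfm_by_memory=$(( cfm_total_gb / cfm_gb_per_core ))

    # The usable core count is the smaller of the two limits.
    if [ "$cfm_by_memory" -lt "$cfm_cores" ]; then
        echo "$cfm_by_memory"
    else
        echo "$cfm_cores"
    fi
}
```

For example, cores_for_memory 256 64 describes the m4.16xlarge instance mentioned above, where memory is not the limiting factor, while cores_for_memory 16 16 describes a machine where memory would limit the run to half of its cores.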

You can execute manuscript_kernel.R on a cluster or in the cloud, and then move manuscript.RData to another computer (such as your laptop) to execute manuscript.rmd and generate the final pdf.
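When the two steps run on different machines, it is easy to start knitting before manuscript.RData has been copied over. The sketch below guards against that under stated assumptions: knit_if_ready is a hypothetical helper that is not part of this repository, and it only checks that the intermediate file exists and is non-empty before calling the same render command shown above. It does not verify that the file contents are complete or current.

```shell
# Hypothetical helper (not in the repository): knit the manuscript only if
# the intermediate results file is present and non-empty.
#
# usage: knit_if_ready [PATH_TO_RDATA]
knit_if_ready() {
    kir_rdata="${1:-manuscript.RData}"

    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$kir_rdata" ]; then
        echo "missing or empty $kir_rdata; run manuscript_kernel.R first" >&2
        return 1
    fi

    # Same render command as in the two-step instructions above.
    Rscript -e "library(rmarkdown); render('manuscript.rmd')"
}
```

After copying manuscript.RData into the root of the repository on the second computer, calling knit_if_ready with no arguments from that directory either knits the manuscript or reports that the intermediate file is missing.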

Development

Running tests

To run tests of the code, launch an R console from the root directory of this repository and run:

library( testthat )
test_dir( "tests/testthat/" )

Distribution

You can access this repository with the shortened url https://git.io/vDj9j