Source code and Jupyter notebooks for the paper: "The limits of long-term selection against Neandertal introgression"
code
data
figures/diagrams
notebooks
.gitignore
00_data_processing.sh
01_burnins.sh
02_introgression.sh
03_coalsims.sh
README.md
requirements.txt
sync_pull.sh


This repository contains source code and Jupyter notebooks for data processing, simulations and analyses used in this paper.

To reproduce everything from scratch, you'll need to install all the dependencies listed below.

Full disclosure: I've been very lucky to have access to amazing computational resources (60-core machines with 1 TB of RAM and a cluster with hundreds of nodes), and I often used them to their full potential. Unless you have similar resources, reproducing all results from scratch will not be trivial. At the very least, running all the simulations will take much longer if you cannot parallelize them effectively.

If you don't want to re-run the whole simulation and analysis pipeline but still want to explore the results and plots, you can use the .rds and .RData files in the data/ subdirectory. The notebooks/figures_for_paper.ipynb notebook is a good place to start: it loads these processed R data files and uses them to generate the plots for the paper.
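A minimal shell sketch of this route, assuming you run it from the repository root. The `list_r_data` helper is hypothetical (it is not part of the repository); it only lists the .rds and .RData files under data/ so you can see what is available before opening the notebook.

```shell
# Hypothetical helper (not part of the repository): list the processed
# R data files (.rds and .RData) under a directory, data/ by default.
list_r_data() {
    find "${1:-data}" -type f \( -name '*.rds' -o -name '*.RData' \) | sort
}

# Usage, from the repository root:
#   list_r_data
#   jupyter notebook notebooks/figures_for_paper.ipynb
```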

Python

I used Python version 3.6.5 and the following Python modules:

pip install numpy pandas msprime pybedtools jupyter

The full list of Python modules I had installed in the project environment can be found in the requirements.txt file.
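After installing the modules, one quick way to confirm that they can actually be imported is a small loop over `python3 -c "import ..."`. The function name `missing_python_modules` and the `PYTHON` override variable below are my own assumptions for illustration, not something the repository provides.

```shell
# Hypothetical helper: try to import each named module with the chosen
# interpreter (python3 unless PYTHON is set), print the ones that fail,
# and return 1 if any module is missing.
missing_python_modules() {
    local py="${PYTHON:-python3}" mod status=0
    for mod in "$@"; do
        if ! "$py" -c "import $mod" 2>/dev/null; then
            echo "missing: $mod"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   missing_python_modules numpy pandas msprime pybedtools
```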

R

I used R version 3.4.3.

Packages from CRAN:

install.packages(c("broom", "forcats", "future", "ggbeeswarm", "ggrepel",
                   "here", "magrittr", "modelr", "purrr", "stringr", "tidyverse"))

Packages from Bioconductor:

install.packages("BiocManager")
BiocManager::install(c("biomaRt", "VariantAnnotation", "BSgenome.Hsapiens.UCSC.hg19",
                       "GenomicRanges",  "rtracklayer"))

Packages from GitHub:

install.packages("devtools")
devtools::install_github("bodkan/bdkn")
devtools::install_github("bodkan/slimr", ref = "v0.1")
devtools::install_github("bodkan/admixr", ref = "v0.6.2")

To run the Jupyter notebooks that contain all my analyses and figures, you will also need to install IRkernel.
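IRkernel can be installed from CRAN and then registered with Jupyter; `IRkernel::installspec()` registers it for the current user under the default kernel name `ir`, and it needs the `jupyter` command from your Python environment on PATH. A sketch from the shell, where the `install_r_kernel` and `has_jupyter_kernel` helpers are hypothetical names of my own (the latter simply parses the output of `jupyter kernelspec list`):

```shell
# Install IRkernel from CRAN and register the R kernel with Jupyter
# (for the current user, under the default kernel name "ir").
install_r_kernel() {
    Rscript -e 'install.packages("IRkernel", repos = "https://cloud.r-project.org")' &&
        Rscript -e 'IRkernel::installspec()'
}

# Hypothetical helper: read `jupyter kernelspec list` output on stdin
# (header line first, then "name  path" rows) and succeed if a kernel
# with the given name is listed.
has_jupyter_kernel() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Usage:
#   install_r_kernel
#   jupyter kernelspec list | has_jupyter_kernel ir && echo "R kernel is ready"
```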

SLiM

I used SLiM v2.6. Be aware that SLiM has introduced some backwards-incompatible changes since its 2.0 release, so make sure to use exactly version 2.6.
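Because the simulations depend on this exact version, it may be worth checking which `slim` binary is on your PATH before running anything. A sketch, assuming `slim -version` prints a banner such as `SLiM version 2.6, built ...`; the helper names `first_version` and `version_is` are my own:

```shell
# Print the first dotted version number (e.g. "2.6" or "3.4.3") found on stdin.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Succeed only if the version banner on stdin reports exactly the given version.
version_is() {
    local required="$1" found
    found=$(first_version)
    [ "$found" = "$required" ]
}

# Usage (assumes `slim -version` prints "SLiM version X.Y, built ..."):
#   slim -version | version_is 2.6 || echo "SLiM 2.6 is required" >&2
```

The same helpers should work for the other tools, e.g. `R --version | version_is 3.4.3` or `python3 --version | version_is 3.6.5` (Python 3.4 and later print the version to stdout).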

HOWTO

In principle, different notebooks in the notebooks/ directory use different data generated by the "pipeline scripts" in the root of the repository (00_...sh, 01_...sh, etc.).

However, there is no strict order in which everything has to be executed. In fact, I mostly ran those scripts piece by piece, adding commands as the project developed, and analyzed new data as they were being generated.
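If you do decide to run one of the pipeline scripts in full, keeping a log of each run can help, since the simulations can take a long time. A minimal sketch with a hypothetical `run_logged` helper and logs/ directory (neither is part of the repository):

```shell
# Hypothetical helper: run one pipeline script with bash, sending both
# stdout and stderr to <logdir>/<script name>.log (logdir defaults to logs/).
# Returns the exit status of the script.
run_logged() {
    local script="$1" logdir="${2:-logs}"
    mkdir -p "$logdir"
    bash "$script" > "$logdir/$(basename "$script" .sh).log" 2>&1
}

# Usage, e.g. running the burn-in script in the background:
#   run_logged 01_burnins.sh &
```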
