Simple pipeline to run LDhat and estimate variable recombination rates
To obtain realistic results, the pipeline splits whole-genome data into small chunks of 2,000 to 5,000 SNPs before running the different LDhat programs.
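As a rough illustration of the chunking step, the sketch below reports the boundaries of consecutive windows containing a fixed number of SNPs. The function name `chunk_positions`, the one-position-per-line input, and the absence of overlap between windows are assumptions made for this example; the pipeline's own scripts may chunk the data differently.

```shell
# Hypothetical helper (not part of the pipeline): report chunk boundaries.
# Usage: chunk_positions <positions_file> [snps_per_chunk]
# Input : one SNP position per line, sorted, for a single chromosome.
# Output: chunk_index <TAB> first_position <TAB> last_position <TAB> n_snps
chunk_positions() {
    local positions_file="$1"
    local size="${2:-2000}"
    awk -v size="$size" '
        BEGIN { OFS = "\t" }
        (NR - 1) % size == 0 {        # first SNP of a new chunk
            if (NR > 1) print idx, first, last, n
            idx++; first = $1; n = 0
        }
        { last = $1; n++ }
        END { if (n > 0) print idx, first, last, n }
    ' "$positions_file"
}
```

The positions file could be produced with something like `bcftools query -f '%POS\n' -r chr1 pop.vcf.gz > chr1.pos` (file and chromosome names are placeholders), followed by `chunk_positions chr1.pos 2000`.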
LDhat available here
bcftools software available here
vcftools software available here
python3
GNU parallel (installed by default on most Linux clusters)
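Before running anything, it can help to confirm that the programs listed above are on your `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper written for this README, not a script shipped with the pipeline.

```shell
# Report any required executable that is not on PATH.
# Usage: check_tools tool1 tool2 ...
# Prints one line per missing tool and returns non-zero if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (LDhat program names as used in this README):
# check_tools interval rhomap lkgen complete bcftools vcftools python3 parallel
```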
#ldhat:
make
#then add path to bashrc or cp to bin
#vcftools
git clone https://github.com/vcftools/vcftools.git
cd vcftools
./autogen.sh
./configure --prefix=/path/to/vcftools/
make
make install
#then add path to bashrc or cp to bin
Input needed: a VCF file split by population. An example script to split by population is available in utility_scripts.
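For orientation only, here is one way such a split could look. It assumes a hypothetical two-column, tab-separated popmap (sample ID, population name) and uses `bcftools view -S` to subset samples; the script in utility_scripts may work differently. The `run` wrapper and `split_by_pop` are names invented for this sketch, and setting `DRY_RUN=1` only prints the bcftools commands.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical sketch: one sample list and one VCF per population.
# Usage: split_by_pop <popmap.tsv> <input.vcf.gz> <outdir>
# popmap.tsv (assumed format): sample_id <TAB> population_name
split_by_pop() {
    local popmap="$1" vcf="$2" outdir="$3" pop
    mkdir -p "$outdir"
    # Write the sample IDs of each population to <outdir>/<pop>.samples
    awk -F '\t' -v dir="$outdir" '{ print $1 > (dir "/" $2 ".samples") }' "$popmap"
    for pop in $(cut -f 2 "$popmap" | sort -u); do
        # -S: keep only the listed samples; -Oz: bgzip-compressed VCF output
        run bcftools view -S "$outdir/$pop.samples" -Oz \
            -o "$outdir/$pop.vcf.gz" "$vcf"
    done
}
```

Population names are assumed to contain no whitespace, since they are used directly as file names.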
To create the input files:
./02-scripts/00-extract_data_bcftools.sh population_name list_chromosome
First, make sure you have an appropriate lk (likelihood lookup table) file (see the LDhat manual). Such a file can be obtained with lkgen or by running complete.
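The two routes to a lookup table can be sketched as follows. The flags (`-lk`, `-nseq` for lkgen; `-n`, `-rhomax`, `-n_pts`, `-theta` for complete) are the ones described in the LDhat manual, but the function names, file names, and default values below are placeholders chosen for illustration. The number of sequences is typically twice the number of diploid individuals; check the manual for your data type. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Derive a table for <nseq> sequences from a larger pre-computed table.
# Usage: lk_from_table <existing_lk_file> <nseq>
lk_from_table() {
    run lkgen -lk "$1" -nseq "$2"
}

# Compute a table from scratch (can be slow for large sample sizes).
# Usage: lk_from_scratch <nseq> <theta> [rhomax] [n_pts]
lk_from_scratch() {
    run complete -n "$1" -rhomax "${3:-100}" -n_pts "${4:-101}" -theta "$2"
}

# Example calls (placeholder file name and values):
# lk_from_table lk_n192_t0.001 40
# lk_from_scratch 40 0.001
```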
Edit the files 02.interval_iteration.sh and 03.rhomap_iteration.sh in 02-scripts to choose an appropriate MCMC chain length and other relevant parameters.
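To make the parameters concrete, the sketch below wraps one call to each program. `-its` is the number of MCMC iterations, `-samp` the number of iterations between samples, `-bpen` the block penalty used by interval, and `-burn` the burn-in used by rhomap, as described in the LDhat manual. The function names and the default values are placeholders, not the settings used in the pipeline scripts. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Usage: run_interval <sites> <locs> <lk> [its] [samp] [bpen]
run_interval() {
    run interval -seq "$1" -loc "$2" -lk "$3" \
        -its "${4:-1000000}" -samp "${5:-2000}" -bpen "${6:-5}"
}

# Usage: run_rhomap <sites> <locs> <lk> [its] [samp] [burn]
run_rhomap() {
    run rhomap -seq "$1" -loc "$2" -lk "$3" \
        -its "${4:-1000000}" -samp "${5:-2000}" -burn "${6:-100000}"
}
```

With these placeholder values, each chain would record its / samp = 500 samples per chunk.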
Then edit the files 02-scripts/graham_cedar/04.interval_parallel_NC_arg.sh and 02-scripts/graham_cedar/05.rhomap_parallel_NC_arg.sh to match your cluster's requirements, and run the scripts.
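The general pattern for spreading chunks over cores with GNU parallel might look like the sketch below. It does not reproduce the graham_cedar scripts: `parallel_chunks` is a hypothetical function, the worker script is assumed to take a single chunk file as its only argument, and the `.sites` extension for chunk files is also an assumption. Set `DRY_RUN=1` to print the command instead of running it.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical sketch: process every chunk in a directory in parallel.
# Usage: parallel_chunks <n_jobs> <worker_script> <chunk_dir>
parallel_chunks() {
    local jobs="$1" worker="$2" dir="$3"
    # {} is replaced by each input listed after ':::'; -j caps concurrent jobs.
    run parallel -j "$jobs" bash "$worker" {} ::: "$dir"/*.sites
}
```

On a scheduler-managed cluster, the number of jobs would normally be set to the number of cores requested for the job.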
- Please see the very simple example scripts located in 02-scripts/complete/
#To do:
- add angsd for estimating theta
A. Auton and G. McVean. Recombination rate estimation in the presence of hotspots. Genome Res., 2007