A program to detect recombination hotspots using population genetic data. Citation.
After downloading, switch to the download folder and type:
make
This should compile without errors, provided your compiler supports the C++11 standard. If you see lots of errors, you may need to upgrade your compiler.
On some systems, you can compile with multi-threading turned on, which can result in a significant reduction in runtime. To do this, type:
make MULTI=1
Two programs are provided. The main program is called as follows.
./ldhot --seq <seq_file> --loc <loc_file> --lk <lk_file> --res <res_file> --nsim 1000 --out <out_prefix>
The seq_file, loc_file, lk_file, and res_file are all derived from LDhat, although the seq_file is required to be phased and encoded using just zeros and ones. The --nsim parameter controls the number of simulations used within the method, with at least 1000 simulations being recommended. A complete option list is given below.
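Because the seq_file must contain only zeros and ones, it can be worth checking the file before starting a long run. The following is a minimal sketch, not part of LDhot. The function name check_binary_seq is hypothetical, and the sketch assumes the usual LDhat layout: a numeric header on the first line, sequence names on lines beginning with '>', and sequence data on every other line.

```sh
#!/bin/sh
# Sketch: succeed only if an LDhat-style sequence file uses just 0/1 alleles.
# ASSUMPTION: line 1 is a numeric header, names start with '>', and all
# remaining lines hold sequence data (possibly wrapped over several lines).
check_binary_seq() {
    awk '
        NR == 1 { next }                 # skip the header line
        /^>/    { next }                 # skip sequence names
        /[^01 \t\r]/ {                   # anything other than 0, 1, whitespace
            printf "line %d: non-binary character found\n", NR
            bad = 1
            exit
        }
        END { exit bad + 0 }             # status 0 if clean, 1 otherwise
    ' "$1"
}
```

For example, check_binary_seq my.sites && echo "ok" prints ok only when no other characters are found (my.sites is a placeholder file name). The check does not verify phasing, which cannot be inferred from the characters alone.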
The ldhot program produces an output file of the form <out_prefix>.hotspots.txt, which contains the details of the windows tested for the presence of a hotspot. This file can be treated as the final output, or further summarized using the ldhot_summary program, a simple program that combines windows called as significant by the main ldhot program. It is called as follows.
./ldhot_summary --res <res_file> --hot <hotspot_file> --out <out_prefix>
The output of this program can be found in <out_prefix>.hot_summary.txt.
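The two steps can be chained, since the hotspot file read by ldhot_summary is the <out_prefix>.hotspots.txt file written by ldhot. The sketch below is a hypothetical helper (ldhot_pipeline is not shipped with LDhot) that only prints the two command lines so they can be inspected first. It assumes both binaries are in the current directory and that the file names contain no spaces.

```sh
#!/bin/sh
# Sketch: print the commands for a full ldhot + ldhot_summary run.
# Arguments: seq_file loc_file lk_file res_file out_prefix [nsim]
# nsim defaults to 1000 here, the minimum recommended above.
ldhot_pipeline() {
    p_seq=$1 p_loc=$2 p_lk=$3 p_res=$4 p_out=$5 p_nsim=${6:-1000}
    # Step 1: test windows for hotspots.
    echo "./ldhot --seq $p_seq --loc $p_loc --lk $p_lk --res $p_res --nsim $p_nsim --out $p_out"
    # Step 2: merge significant windows, reading the file written by step 1.
    echo "./ldhot_summary --res $p_res --hot $p_out.hotspots.txt --out $p_out"
}
```

Calling ldhot_pipeline data.sites data.locs lk.txt res.txt run1 prints both commands (all file names here are placeholders). Piping the output to sh would run them in order.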
A more complete example of the usage of LDhat and LDhot, with both input and output files, can be found in the example folder.
The ldhot program takes the following parameters.
- --seq : Input LDhat-format sequence file. Required to be phased and encoded using zeros and ones only.
- --loc : Input LDhat-format positions file.
- --lk : Input LDhat-format likelihood lookup file.
- --res : Input recombination rate estimates in same format as LDhat 'stat' output.
- --out : Prefix for output files (default: out).
- --nsim : Maximum number of simulations to use (default: 100 but at least 1000 recommended).
- --startpos : Start position in kb.
- --endpos : End position in kb.
- --step : Step size (in kb) between tested windows (default: 1).
- --windist : Define background window as +/- windist kb of hotspot center (default: 50).
- --hotdist : Define hotspot window as +/- hotdist kb of hotspot center (default: 1.5).
- --seed : Random seed.
- --nofreqcond : Turn off frequency conditioning.
- --lk-SNP-window : Number of SNPs over which to calculate the composite likelihood (default: 50).
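To make the --hotdist and --windist definitions concrete, the sketch below computes the two windows around a single center position. It illustrates only the arithmetic stated in the option list, not how ldhot handles windows internally. The function name ldhot_windows is hypothetical, and the sketch does not clip windows at the ends of the region.

```sh
#!/bin/sh
# Sketch: print hotspot and background window bounds (in kb) for one center.
# Arguments: center [hotdist] [windist]
# Defaults (1.5 and 50) follow the option list above.
ldhot_windows() {
    awk -v c="$1" -v h="${2:-1.5}" -v w="${3:-50}" 'BEGIN {
        printf "hotspot %g %g\n", c - h, c + h       # center +/- hotdist
        printf "background %g %g\n", c - w, c + w    # center +/- windist
    }'
}
```

With the defaults, a center at 100 kb gives a hotspot window of 98.5 to 101.5 kb and a background window of 50 to 150 kb.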
The ldhot_summary program takes the following parameters.
- --res : Input recombination rate estimates in same format as LDhat 'stat' output.
- --hot : Input hotspot file from LDhot.
- --out : Prefix for output files (default: out).
- --sig : Significance cutoff for calling a hotspot (default: 0.001).
- --sigjoin : Significance cutoff for merging hotspot windows (default: 0.01).