LDhot

A program to detect recombination hotspots using population genetic data.

## Installation

After downloading, switch to the download folder and type:

    make

This should compile without errors, provided your compiler supports the C++11 standard. If you see many errors, you may need to upgrade your compiler.

On some systems, you can compile with multi-threading turned on, which can significantly reduce runtime. To do this, type:

    make MULTI=1

## Basic Usage

Two programs are provided. The main program is called as follows.

    ./ldhot --seq <seq_file> --loc <loc_file> --lk <lk_file> --res <res_file> --nsim 1000 --out <out_prefix>

The seq_file, loc_file, lk_file, and res_file are all derived from LDhat, although the seq_file must be phased and encoded using only zeros and ones. The --nsim parameter sets the maximum number of simulations used by the method; the default is 100, but at least 1000 is recommended. A complete option list is given below.
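Because the seq_file must contain only zeros and ones, it can be worth checking the file before starting a long run. The following is a minimal sketch, not part of LDhot. It assumes the usual LDhat sites layout (one header line, then FASTA-style records whose names start with `>`), and the function name `find_non_binary_records` is hypothetical.

```python
def find_non_binary_records(sites_text):
    """Return the names of records that contain characters other than '0' or '1'.

    Assumption: the first non-empty line is a header, and every record
    begins with a '>' name line followed by one or more sequence lines.
    """
    lines = [ln.strip() for ln in sites_text.splitlines() if ln.strip()]
    bad = []
    name = None
    for ln in lines[1:]:  # skip the header line
        if ln.startswith(">"):
            name = ln[1:].strip()
        elif set(ln) - {"0", "1"}:
            # Any symbol other than 0/1 (e.g. '?', '2', 'A') flags the record.
            if name not in bad:
                bad.append(name)
    return bad


# Usage with a small, made-up example file:
example = "3 5 1\n>hapA\n01001\n>hapB\n01?01\n>hapC\n0120\n1\n"
offending = find_non_binary_records(example)
```

An empty list means every record passed the check; any names returned point to sequences that would need recoding before being passed to `--seq`.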

The ldhot program writes its results to `<out_prefix>.hotspots.txt`, which contains the details of the windows tested for the presence of a hotspot. This file can be treated as the final output, or summarized further with the ldhot_summary program, a simple program that combines windows called as significant by the main ldhot program. It is called as follows.

    ./ldhot_summary --res <res_file> --hot <hotspot_file> --out <out_prefix>

The output of this program is written to `<out_prefix>.hot_summary.txt`.
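The two steps can be chained from a script. The sketch below only assembles the command lines documented above and feeds the `.hotspots.txt` file from the first step into the second. The helper names (`ldhot_command`, `summary_command`, `run_pipeline`) and all file names are hypothetical; it assumes both executables sit in the current directory.

```python
import subprocess


def ldhot_command(seq, loc, lk, res, out_prefix, nsim=1000, exe="./ldhot"):
    """Build the argument list for the main ldhot program."""
    return [exe, "--seq", seq, "--loc", loc, "--lk", lk, "--res", res,
            "--nsim", str(nsim), "--out", out_prefix]


def summary_command(res, out_prefix, exe="./ldhot_summary"):
    """Build the argument list for ldhot_summary.

    The hotspot file name follows the <out_prefix>.hotspots.txt pattern
    described in this README.
    """
    hot = out_prefix + ".hotspots.txt"
    return [exe, "--res", res, "--hot", hot, "--out", out_prefix]


def run_pipeline(seq, loc, lk, res, out_prefix, nsim=1000):
    """Run ldhot, then ldhot_summary; return the summary file name."""
    subprocess.run(ldhot_command(seq, loc, lk, res, out_prefix, nsim), check=True)
    subprocess.run(summary_command(res, out_prefix), check=True)
    return out_prefix + ".hot_summary.txt"


# Example call (placeholder file names, not files shipped with LDhot):
# run_pipeline("data.sites", "data.locs", "lk.txt", "res.txt", "run1")
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` stops the script if the first program exits with an error.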

A more complete example of the usage of LDhat and LDhot, with both input and output files, can be found in the example folder.

## Option List

### ldhot

The ldhot program takes the following parameters.

#### Required Parameters:

  • --seq : Input LDhat-format sequence file. Required to be phased and encoded using zeros and ones only.
  • --loc : Input LDhat-format positions file.
  • --lk : Input LDhat-format likelihood lookup file.
  • --res : Input recombination rate estimates in same format as LDhat 'stat' output.

#### Important Parameters:

  • --out : Prefix for output files (default: out).
  • --nsim : Maximum number of simulations to use (default: 100; at least 1000 recommended).

#### Other Parameters:

  • --startpos : Start position in kb.
  • --endpos : End position in kb.
  • --step : Step size (in kb) between tested windows (default: 1).
  • --windist : Define background window as +/- windist kb of hotspot center (default: 50).
  • --hotdist : Define hotspot window as +/- hotdist kb of hotspot center (default: 1.5).
  • --seed : Random seed.
  • --nofreqcond : Turn off frequency conditioning.
  • --lk-SNP-window : Number of SNPs over which to calculate the composite likelihood (default: 50).
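To make the window options concrete, the sketch below lists the hotspot and background windows implied by --startpos, --endpos, --step, --hotdist, and --windist. It is an illustration of the option descriptions only, not LDhot's own code. In particular, it assumes the first tested center lies exactly at the start position and that windows are not clipped at the ends of the region; the function name `tested_windows` is hypothetical.

```python
import math


def tested_windows(startpos, endpos, step=1.0, hotdist=1.5, windist=50.0):
    """Yield the window geometry (all values in kb) for each tested center.

    Assumption: centers run from startpos to endpos in increments of step.
    """
    # Count the centers up front so rounding error does not accumulate.
    n = int(math.floor((endpos - startpos) / step + 1e-9)) + 1
    for i in range(n):
        center = startpos + i * step
        yield {
            "center": center,
            "hotspot": (center - hotdist, center + hotdist),
            "background": (center - windist, center + windist),
        }


# With the defaults, each hotspot window is 3 kb wide and each
# background window is 100 kb wide.
windows = list(tested_windows(100.0, 102.0))
```

With a 1 kb step and a 3 kb hotspot window, neighbouring windows overlap, which is why a separate summary step that combines significant windows is useful.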

### ldhot_summary

The ldhot_summary program takes the following parameters.

#### Required Parameters:

  • --res : Input recombination rate estimates in same format as LDhat 'stat' output.
  • --hot : Input hotspot file from LDhot.

#### Other Parameters:

  • --out : Prefix for output files (default: out).
  • --sig : Significance cutoff for calling a hotspot (default: 0.001).
  • --sigjoin : Significance cutoff for merging hotspot windows (default: 0.01).
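One plausible reading of how --sig and --sigjoin interact is a two-threshold merge: overlapping windows that pass the looser --sigjoin cutoff are joined into a region, and a region is reported only if at least one of its windows also passes the stricter --sig cutoff. The sketch below illustrates that idea on made-up data. It is an assumption about the behaviour, not the actual ldhot_summary implementation, and the input tuple layout and the name `merge_windows` are hypothetical.

```python
def merge_windows(windows, sig=0.001, sigjoin=0.01):
    """Merge windows using two p-value thresholds.

    windows: list of (start_kb, end_kb, p_value), sorted by start_kb.
    Returns a list of (start_kb, end_kb, min_p_value) regions.
    """
    hotspots = []
    current = None  # [start, end, smallest p-value seen in the region]

    def flush(region):
        # Keep a region only if some window in it passed the strict cutoff.
        if region is not None and region[2] <= sig:
            hotspots.append(tuple(region))

    for start, end, p in windows:
        if p > sigjoin:
            # A window failing the loose cutoff ends the current region.
            flush(current)
            current = None
        elif current is not None and start <= current[1]:
            # Overlaps the open region: extend it.
            current[1] = max(current[1], end)
            current[2] = min(current[2], p)
        else:
            # Passes the loose cutoff but does not overlap: start a new region.
            flush(current)
            current = [start, end, p]
    flush(current)
    return hotspots


# Made-up windows for illustration only:
demo = [(0, 3, 0.5), (1, 4, 0.008), (2, 5, 0.0005), (3, 6, 0.009),
        (4, 7, 0.2), (10, 13, 0.005), (20, 23, 0.0001)]
merged = merge_windows(demo)
```

In this example the window at 10 to 13 kb passes the join cutoff but is dropped, because nothing in its region reaches the --sig threshold.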