Carry out linkage analysis on exome sequence data
This provides a system to carry out linkage analysis of exome sequence data. The typical scenario would be that one had exome-sequenced a few members of a family with a genetic disorder and had failed to discover a causative variant. The system uses mega2 to set up the analyses and merlin to perform them. A script to automate this process is provided.
If you use this, please be sure to appropriately acknowledge/cite the authors of mega2 and merlin.
It is assumed that the data is vcf.gz files, one per chromosome.
The user must create a pedigree file as described in the mega2 documentation and edit the runExomeLinkage.sh script to point at the right executables and data files.
Before using the system, mega2 and merlin must be installed and makeFakeMap (provided) must be compiled.
The runExomeLinkage.sh script sets up a batch file for mega2 and runs it. However mega2 needs a few extra lines which are provided by a redirected input file. The default map file produced by mega2 is replaced by one created by makeFakeMap. The merlin model file is replaced by the one provided by the user. Then the script to run merlin is executed. This is repeated for each chromosome.
Further notes are below.
Use mega2 and merlin
Download mega2 from here:
(https://watson.hgen.pitt.edu/register/)
Documentation for mega2 here:
(https://watson.hgen.pitt.edu/docs/mega2_html/mega2.html)
Run the install.sh script, which sets up to do things like point at python, which must be available on your system.
If you have trouble compiling mega2 then note the following.
To compile mega2 you need a modern C++ compiler because it has "lamdas" in the code.
On the CS cluster at UCL enter this to get g++ 4.7.2:
scl enable devtoolset-1.1 bash
When I did the default build mega2 would crash when trying to index a vcf file. I built the debug version and this ran fine. To build the debug version, in the srcdir folder enter:
make dbg
This gave me an executable called mega2_linux
Once a mega2 binary has been successfully created you will need to modify the supplied script, runExomeLinkage.sh, to point at it as well as other executables and folders.
Most of the input for mega2 is written to a batch file called e.g. mega2.chr22.inp.
mega2 creates a new working directory each time. The script has to parse the output of mega2 to find the name of this directory.
I selected compressed VCF format.
I added --maf 0.0001 to the "Enter VCF parameters" options (as well as --remove-indels). This was to remove monomorphic variants.
I made a pedigree file called jel.fam which looks like this:
jel J13 J11 J12 2 2
jel VIII9 J11 J12 2 2
jel J11 J14 J9 1 2
jel J12 0 0 2 1
jel J2 J14 J9 1 2
jel J9 J8 V9 2 2
jel J14 0 0 1 1
jel J10 J8 V9 1 1
jel J8 0 0 1 2
jel V9 0 0 2 1
If a subject is not in the vcf file it will be given unknown genotypes. Subjects in the vcf file not in the pedigree file will be ignored.
The vcf files should be numbered with 1 2 3 ... 22 but mega2 uses 01 02 03 ... 22. We haven't tried with X yet. The supplied scripts adds 0 where necessary.
I set the maximum number of alleles to 256 or more. The option to set from 3-255 did not work.
I used Mega2 command line flag --force_numeric_alleles so long alleles do not break merlin.
I selected option to Create Merlin model file.
I entered "1 e" for the R plot statistic selection menu.
In order to specify penetrances, one cannot use a mega2 penetrance file because it also needs a names.txt file. The best way seems to be to have a user-specified merlin model file and have it overwrite the default merlin_model file which gets created by mega2 and which looks like this:
default 0.00100 0.050000,0.900000,0.900000 default
You need to compile makeFakeMap with a command like this:
g++ -o makeFakeMap makeFakeMap.cpp -lm
Once everything is installed and compiled you need to create your own pedigree file and penetrance file. Then you need to edit runMakeFile.sh to point to them and to the right files and folders on your system.
Good luck!