# Tutorial of MendelTwoPointLinkage

## Julia version
Current code supports Julia version 1.0+ 

## Overview
Mendel Two Point Linkage is a component of the umbrella [OpenMendel](https://openmendel.github.io) project. This analysis option maps a trait locus using two-point linkage analysis.

## Installation

*Note: Since the OpenMendel packages are not yet registered, the three OpenMendel packages (1) [SnpArrays](https://openmendel.github.io/SnpArrays.jl/latest/), (2) [MendelSearch](https://openmendel.github.io/MendelSearch.jl), and (3) [MendelBase](https://openmendel.github.io/MendelBase.jl) **must** be installed before any other OpenMendel package is installed. It is easiest if these three packages are installed in the above order.*

If you have not already installed MendelTwoPointLinkage, then within Julia, use the package manager to install it:

`pkg> add https://github.com/OpenMendel/MendelTwoPointLinkage.jl.git`

or, once the OpenMendel packages are registered, simply use:

`pkg> add MendelTwoPointLinkage`

This package supports Julia v1.0+

## Input Files
The MendelTwoPointLinkage analysis package accepts the following input files. Example input files can be found in the [data](https://github.com/OpenMendel/MendelTwoPointLinkage.jl/tree/master/data) subfolder of the MendelTwoPointLinkage project. (An analysis won't always need every file type below.)

* [Control File](https://openmendel.github.io/MendelTwoPointLinkage.jl/#control-file): Specifies the names of your data input and output files and any optional parameters (*keywords*) for the analysis. (For a list of common keywords, see [Keywords Table](https://openmendel.github.io/MendelBase.jl/#keywords-table)). The Control file is optional. If you don't use a Control file you will enter your keywords directly in the command line.
* [Locus File](https://openmendel.github.io/MendelBase.jl/#locus-file): Names and describes the genetic loci in your data.
* [Pedigree File](https://openmendel.github.io/MendelBase.jl/#pedigree-file): Gives information about your individuals, such as name, sex, family structure, and ancestry.
* [Phenotype File](https://openmendel.github.io/MendelBase.jl/#phenotype-file): Lists the available phenotypes.

### Control file
The Control file is a text file consisting of keywords and their assigned values. The format of the Control file is:

	Keyword = Keyword_Value(s)

Below is an example of a simple Control file to run Two Point Linkage:

	#
	# Input and Output files.
	#
	locus_file = two-point linkage LocusFrame.txt
	pedigree_file = two-point linkage PedigreeFrame.txt
	phenotype_file = two-point linkage PhenotypeFrame.txt
	output_file = two-point linkage Output.txt
	lod-score-table = two-point linkage Output LOD Table.txt
	#
	# Analysis parameters for Two-Point Linkage option.
	#
	trait = RADIN
	GENDER-NEUTRAL = true
	standard_errors = true
	travel = grid

In the example above, there are nine keywords. The first three keywords specify input files: **two-point linkage LocusFrame.txt**, **two-point linkage PedigreeFrame.txt**, and **two-point linkage PhenotypeFrame.txt**. The next two keywords specify output files with the results of the analysis: **two-point linkage Output.txt** and **two-point linkage Output LOD Table.txt**. The last four keywords specify analysis parameters: **trait**, **gender-neutral**, **standard_errors**, and **travel**. The text after each **'='** is the keyword value.
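To make the `Keyword = Keyword_Value(s)` format concrete, here is a minimal sketch of how such a file could be read into a dictionary. The function `parse_control` is a hypothetical helper written for this illustration. It is not the parser OpenMendel uses, and it assumes that lines starting with `#` are comments and that keyword names can be lowercased because they are not case sensitive.

```julia
# Hypothetical helper for illustration only; not part of OpenMendel.
# Parses "Keyword = Value" lines into a Dict of strings.
function parse_control(text::AbstractString)
    keywords = Dict{String,String}()
    for line in split(text, '\n')
        line = strip(line)
        # Skip blank lines and comment lines that start with '#'.
        (isempty(line) || startswith(line, "#")) && continue
        # Ignore lines that do not contain an assignment.
        occursin("=", line) || continue
        key, value = split(line, '='; limit = 2)
        # Keyword names are not case sensitive, so normalize them;
        # values are left untouched because they may be case sensitive.
        keywords[lowercase(strip(key))] = String(strip(value))
    end
    return keywords
end

control_text = """
#
# Input and Output files.
#
pedigree_file = two-point linkage PedigreeFrame.txt
#
# Analysis parameters.
#
trait = RADIN
TRAVEL = grid
"""

keywords = parse_control(control_text)
println(keywords["pedigree_file"])
println(keywords["travel"])
```

Splitting with `limit = 2` keeps any later `=` characters inside the value, so file names containing `=` would survive intact.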

Our trait is RADIN, an antigen that is expressed on red blood cells. According to [OMIM](https://www.omim.org/entry/111620), it is part of the Scianna blood group system and is the result of variation in the gene encoding erythroblast membrane-associated protein (ERMAP; 609017) on chromosome 1p34.2. **GENDER-NEUTRAL = true** means that we are not using different recombination maps for males and females. The keyword value **standard_errors = true** means we are asking for standard errors to be calculated. If you have a large number of markers and individuals, you may want to use the default value of **'false'** to save on computational cost on the first pass through the data. The final keyword, **travel**, refers to how the likelihood is evaluated. By using **travel = grid** we specify that the LOD score is calculated at specified distances between the putative trait gene and the marker.
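The idea of evaluating the LOD score on a grid can be sketched with a textbook simplification: for phase-known, fully informative meioses, the likelihood at recombination fraction θ is θ^k (1 − θ)^(n − k), where k of n meioses are recombinant, and the LOD compares it with the likelihood at θ = 0.5 (no linkage). This is not the pedigree likelihood that `TwoPointLinkage` computes. The function `lod`, the counts (10 meioses, 2 recombinants), and the grid values are all assumptions chosen for illustration.

```julia
# Textbook simplification, not the OpenMendel pedigree likelihood:
# LOD score for phase-known, fully informative meioses.
function lod(theta, n_meioses, n_recombinants)
    # Base-10 log-likelihood of observing the recombinant count at fraction t.
    log_lik(t) = n_recombinants * log10(t) +
                 (n_meioses - n_recombinants) * log10(1 - t)
    # Compare against free recombination (theta = 0.5, i.e. no linkage).
    return log_lik(theta) - log_lik(0.5)
end

# Hypothetical grid of recombination fractions and hypothetical counts.
grid = [0.01, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
scores = [lod(theta, 10, 2) for theta in grid]
best = grid[argmax(scores)]

for (theta, score) in zip(grid, scores)
    println("theta = ", theta, "  LOD = ", round(score; digits = 3))
end
println("grid maximum at theta = ", best)
```

With 2 recombinants in 10 meioses the maximum falls at the grid point matching the observed fraction 2/10, and the LOD at θ = 0.5 is zero by construction.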

## Keywords
This is a list of OpenMendel keywords specific to Two Point Linkage. A list of OpenMendel keywords common to most analysis packages can be found [here](https://openmendel.github.io/MendelBase.jl/#keywords-table). The names of keywords are *not* case sensitive. (The keyword values *may* be case sensitive.)

Keyword | Default Value | Allowed Values | Short Description
---------------- | ---------------- | ------------- | ----------------
gender_neutral | true | true, false | Forces equal recombination fractions
goal | maximize | |
lod_score_table | Lod_Score_Frame.txt | User-defined output file name | Creates a lod score table output file
output_unit | | |
parameters | 1 | |
points | 9 | |
travel | grid | | Mode of sampling parameter space

## Data Files
Two Point Linkage requires a [Control file](https://openmendel.github.io/MendelBase.jl/#control-file), and a [Pedigree file](https://openmendel.github.io/MendelBase.jl/#pedigree-file). Genotype data is provided in a [SNP data file](https://openmendel.github.io/MendelBase.jl/#snp-data-file), with a [SNP Definition File](https://openmendel.github.io/MendelBase.jl/#snp-definition-file) describing the SNPs. Details on the format and contents of the Control and data files can be found on the [MendelBase](https://openmendel.github.io/MendelBase.jl) documentation page. There are example data files in the Two Point Linkage [data](https://github.com/OpenMendel/MendelTwoPointLinkage.jl/tree/master/data) folder.

The data are interesting because the markers are of historic interest. Before the ready availability of codominant markers (SNPs or microsatellites), researchers used biochemical markers like blood group antigens to map traits. Maps were slowly and painfully constructed by mapping a new marker relative to the positions of existing markers of known location. In this example we are trying to determine whether RADIN is linked to RHD (the familiar Rhesus factor blood group) or to the enzyme PGM1 (Phosphoglucomutase-1), both of which are located on chromosome 1. This example illustrates that even with sparsely separated markers and very small numbers of families we can begin the process of mapping the location of trait genes.

## Running the Analysis
To run this analysis package, first launch Julia. Then load the package with the command:

`julia> using MendelTwoPointLinkage`

Next, if necessary, change to the directory containing your files, for example,

`julia> cd(expanduser("~/path/to/data/files/"))`

Finally, to run the analysis using the parameters in your Control file, for example, Control_file.txt, use the command:

`julia> TwoPointLinkage("Control_file.txt")`

*Note: The package is called* MendelTwoPointLinkage *but the analysis function is called simply* TwoPointLinkage.

## Output Files
Each option will create output files specific to that option, and will save them to the same directory that holds the input data files.

# Example 1: 

### Step 0: Load the OpenMendel package and then go to the directory containing the data files:
First we load the MendelTwoPointLinkage package.

`julia> using MendelTwoPointLinkage`

In this example we go to the directory containing the example data files that come with this package.

	julia> cd(MendelTwoPointLinkage.datadir())
	julia> pwd()

### Step 1: Preparing the Pedigree files:
Recall the structure of a [valid pedigree file](https://openmendel.github.io/MendelBase.jl/#pedigree-file). Note that a header line is required. Let's examine the first few lines of the example file:

`shell> head -10 "two-point linkage PedigreeFrame.txt"`

### Step 2: Preparing the Control file
A Control file gives specific instructions to `MendelTwoPointLinkage`. A minimal Control file looks like the following:

`shell> cat "two-point linkage Control.txt"`

### Step 3: Run the analysis in Julia REPL or directly in notebook

`julia> TwoPointLinkage("two-point linkage Control.txt")`

### Step 4: Output File
`TwoPointLinkage` should have generated two output files in your local directory: `two-point linkage Output.txt` and `two-point linkage Output LOD Table.txt`. The output file has detailed information on the analysis, and the output table gives the calculated lod scores.

`shell> cat "two-point linkage Output LOD Table.txt"`

### Step 5: Interpreting the result

The files `two-point linkage Output.txt` and `two-point linkage Output LOD Table.txt` can be opened directly, or imported into the Julia environment for ease of manipulation using the DataFrames package.


In this analysis we consider each marker separately. The LOD score (the base-10 logarithm of the odds, that is, of the ratio of the likelihood at a given recombination fraction to the likelihood under no linkage) is greatest when the recombination fraction between RADIN and PGM1 is 0.20 (a 20% probability of recombination between RADIN and PGM1). This is roughly equivalent to a genetic distance of 20 centimorgans, or even more roughly to a physical distance of 20,000,000 base pairs. For the other marker, RHD, the LOD is greatest when the recombination fraction is 0.15. We can better determine the order of these three loci and their genetic distances by analyzing multiple markers jointly and computing their location scores. The OpenMendel option [MendelLocationScores](https://openmendel.github.io/MendelLocationScores.jl/) can be used for this purpose.
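The conversion from recombination fraction to genetic distance depends on the map function chosen. As a sketch, Haldane's map function, which assumes no crossover interference, gives the distance in centimorgans as −50 ln(1 − 2θ). The helper name `haldane_cm` is an assumption made for this illustration, and nothing here implies that OpenMendel uses this particular map function.

```julia
# Haldane's map function: distance in centimorgans for a recombination
# fraction theta. Assumes no crossover interference; valid for 0 <= theta < 0.5.
# The name haldane_cm is a hypothetical helper for this illustration.
haldane_cm(theta) = -50 * log(1 - 2 * theta)

for theta in (0.15, 0.20)
    println("theta = ", theta, " -> ",
            round(haldane_cm(theta); digits = 1), " cM")
end
```

Under this map function a recombination fraction of 0.20 corresponds to about 25.5 cM and 0.15 to about 17.8 cM, so reading the recombination fraction directly as centimorgans is only a rough approximation at these distances.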

## Citation

If you use this analysis package in your research, please cite the following reference in the resulting publications:

*Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: The Swiss army knife of genetic analysis programs. Bioinformatics 29:1568-1570.*

## Acknowledgments

This project is supported by the National Institutes of Health under NIGMS awards R01GM053275 and R25GM103774 and NHGRI award R01HG006139.