# MendelGeneDropping Tutorial
### last update: 2/4/2019

## Julia version
Current code supports Julia version 1.0+ 

## Overview
MendelGeneDropping is a component of the umbrella [OpenMendel](https://openmendel.github.io) project. This analysis option "drops" genes into a pedigree through the founders and propagates them through the pedigrees. In many circumstances, it is useful to simulate genetic data consistent with a postulated map. For example, you might want to generate p-values empirically or to estimate the power of a collection of pedigrees to detect linkage. Gene dropping randomly fills in genotypes subject to prescribed allele frequencies, a given genetic map, and Hardy-Weinberg and linkage equilibrium. After performing gene dropping, the package generates a new Pedigree file for further analysis. The missing data pattern in the new pedigrees can mimic the input pedigrees, or the user can specify a new pattern.
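To make the idea concrete, here is a minimal sketch of gene dropping at a single biallelic locus in a parent-child trio. It is an illustration of the principle only, not the package's implementation: the function names, the allele frequency `0.3`, and the random seed are all assumptions, and a single locus cannot show the role of the genetic map.

```julia
using Random

# Draw one allele (1 or 2) for a founder; `p` is the frequency of allele 1.
draw_allele(rng, p) = rand(rng) < p ? 1 : 2

# Founder genotype under Hardy-Weinberg equilibrium: two independent draws.
founder_genotype(rng, p) = (draw_allele(rng, p), draw_allele(rng, p))

# Mendelian transmission: a parent passes on one of its two alleles at random.
transmit(rng, genotype) = genotype[rand(rng, 1:2)]

# A child receives one allele from the mother and one from the father.
child_genotype(rng, mother, father) = (transmit(rng, mother), transmit(rng, father))

# Drop genes through a trio: two founders and their child.
function drop_trio(rng, p)
    mother = founder_genotype(rng, p)
    father = founder_genotype(rng, p)
    child = child_genotype(rng, mother, father)
    return (mother = mother, father = father, child = child)
end

rng = MersenneTwister(2019)   # arbitrary seed, for reproducibility only
trio = drop_trio(rng, 0.3)    # 0.3 is a hypothetical allele frequency
println(trio)
```

Founders are filled in first from the allele frequencies, and every non-founder is then filled in from its parents, so the simulated genotypes are Mendelian consistent by construction.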

## When to use MendelGeneDropping
The raw material for gene dropping consists of sets of pedigrees and loci. People within the pedigrees must be assigned either blank phenotypes or Mendelian consistent phenotypes. Gene dropping is carried out independently of observed phenotypes at those loci common to the definition and map files. By varying the content of the map file, you can choose exactly which loci to subject to gene dropping. Phenotypes at the remaining loci of the definition file are left untouched. Simulated genotypes rather than simulated phenotypes are reported. There are no limits on the complexity of the pedigrees or the number of loci. You can use founders from different populations, provided these populations are defined. Each founder should be assigned to a population; any unassigned founders are assumed to come from the first population.
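The population rule can be pictured with a short sketch. It assumes, as one plausible reading of the text above, that a founder's allele frequency is the average of the population frequencies weighted by that founder's ancestry fractions; the frequencies and the helper `founder_frequency` are hypothetical and are not taken from the package.

```julia
# Hypothetical allele-1 frequencies at one locus, one entry per population,
# listed in the same order as the populations are defined.
populations = ["European", "African", "Chinese"]
allele1_freq = [0.30, 0.10, 0.50]

# `fractions` holds a founder's ancestry fractions, one per population.
# A founder with every entry `missing` is treated as unassigned.
function founder_frequency(fractions, freqs)
    if all(ismissing, fractions)
        # Unassigned founders are taken to come from the first population.
        return freqs[1]
    end
    # Assumption: weight each population's frequency by the ancestry fraction.
    return sum(coalesce.(fractions, 0.0) .* freqs)
end

println(founder_frequency([1.0, 0.0, 0.0], allele1_freq))
println(founder_frequency([missing, missing, missing], allele1_freq))
println(founder_frequency([0.5, 0.5, 0.0], allele1_freq))
```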

## Installation

*Note: Since the OpenMendel packages are not yet registered, the three OpenMendel packages (1) [SnpArrays](https://openmendel.github.io/SnpArrays.jl/latest/), (2) [MendelSearch](https://openmendel.github.io/MendelSearch.jl), and (3) [MendelBase](https://openmendel.github.io/MendelBase.jl) **must** be installed before any other OpenMendel package is installed. It is easiest if these three packages are installed in the above order.*

If you have not already installed MendelGeneDropping, use the Julia package manager to install it:

    ] add https://github.com/OpenMendel/MendelGeneDropping.jl.git

      Updating registry at `~/.julia/registries/General`
      Updating git-repo `https://github.com/JuliaRegistries/General.git`
       Cloning git-repo `https://github.com/OpenMendel/MendelGeneDropping.jl.git`
      Updating git-repo `https://github.com/OpenMendel/MendelGeneDropping.jl.git`
     Resolving package versions...
      Updating `~/.julia/environments/v1.1/Project.toml`
      [5fe433f5] + MendelGeneDropping v0.5.0 #master (https://github.com/OpenMendel/MendelGeneDropping.jl.git)
      Updating `~/.julia/environments/v1.1/Manifest.toml`
      [5fe433f5] + MendelGeneDropping v0.5.0 #master (https://github.com/OpenMendel/MendelGeneDropping.jl.git)


Or, once the OpenMendel packages are registered, simply use:

`pkg> add MendelGeneDropping`

This package supports Julia v1.0+
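The installation can also be scripted through Julia's `Pkg` API instead of typed at the `pkg>` prompt. The sketch below is illustrative: the three dependency URLs are assumed to follow the same GitHub naming pattern as the MendelGeneDropping URL above, and `install_openmendel` with its `dry_run` flag is a hypothetical helper, not part of OpenMendel.

```julia
using Pkg

# Repository URLs in the required installation order. The first three are
# assumed to follow the same pattern as the MendelGeneDropping URL.
const OPENMENDEL_REPOS = [
    "https://github.com/OpenMendel/SnpArrays.jl.git",
    "https://github.com/OpenMendel/MendelSearch.jl.git",
    "https://github.com/OpenMendel/MendelBase.jl.git",
    "https://github.com/OpenMendel/MendelGeneDropping.jl.git",
]

# Add each package in order. With `dry_run = true` (the default here) the
# function only prints what it would do and touches nothing.
function install_openmendel(repos = OPENMENDEL_REPOS; dry_run = true)
    for url in repos
        if dry_run
            println("would add ", url)
        else
            Pkg.add(Pkg.PackageSpec(url = url))
        end
    end
    return repos
end

install_openmendel()
```

Call `install_openmendel(dry_run = false)` to perform the installation.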

## Input Files
The MendelGeneDropping analysis package uses the following input files. Example input files can be found in the [data](https://github.com/OpenMendel/MendelGeneDropping.jl/tree/master/data) subfolder of the MendelGeneDropping project. (An analysis won't always need every file type below.)

* [Control File](https://openmendel.github.io/MendelGeneDropping.jl/#control-file): Specifies the names of your data input and output files and any optional parameters (*keywords*) for the analysis. (For a list of common keywords, see [Keywords Table](https://openmendel.github.io/MendelBase.jl/#keywords-table)).
* [Locus File](https://openmendel.github.io/MendelBase.jl/#locus-file): Names and describes the genetic loci in your data.
* [Pedigree File](https://openmendel.github.io/MendelBase.jl/#pedigree-file): Gives information about your individuals, such as name, sex, family structure, and ancestry.
* [Phenotype File](https://openmendel.github.io/MendelBase.jl/#phenotype-file): Lists the available phenotypes.
* [SNP Definition File](https://openmendel.github.io/MendelBase.jl/#snp-definition-file): Defines your SNPs with information such as SNP name, chromosome, position, allele names, allele frequencies.
* [SNP Data File](https://openmendel.github.io/MendelBase.jl/#snp-data-file): Holds the genotypes for your data set. Must be a standard binary PLINK BED file in SNP-major format. If you have a SNP data file, you must also have a SNP definition file.

### Control file
The Control file is a text file consisting of keywords and their assigned values. The format of the Control file is:

	Keyword = Keyword_Value(s)

Below is an example of a simple Control file to run Gene Dropping:

	#
	# Input and Output files.
	#
	locus_file = genedropping LocusFrame.txt
	pedigree_file = genedropping PedigreeFrame.txt
	phenotype_file = genedropping PhenotypeFrame.txt
	new_pedigree_file = genedropping NewPedigreeFrame.txt
	output_file = genedropping Output.txt
	#
	# Analysis parameters for GeneDropping option.
	#
	repetitions = 2
	populations = European, African, Chinese

In the example above, there are seven keywords. Three keywords specify the input files: *genedropping LocusFrame.txt*, *genedropping PedigreeFrame.txt*, and *genedropping PhenotypeFrame.txt*. Two keywords specify the output files: *genedropping Output.txt* is the results file and *genedropping NewPedigreeFrame.txt* is the new pedigree file that OpenMendel generates, with the simulated genotypes added. The last two keywords specify analysis parameters: *repetitions* (2) and *populations* (European, African, and Chinese). The text after each '=' is the keyword value.
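The `Keyword = Keyword_Value(s)` layout is simple enough to read by hand. The following sketch parses such text into a dictionary; `parse_control` is a hypothetical helper written for illustration and is not how OpenMendel reads Control files internally.

```julia
# Parse "keyword = value" lines into a Dict, skipping blank lines and comments.
function parse_control(text::AbstractString)
    settings = Dict{String,String}()
    for line in split(text, '\n')
        line = strip(line)
        (isempty(line) || startswith(line, "#")) && continue
        parts = split(line, '='; limit = 2)
        length(parts) == 2 || continue
        # Store keyword names in lowercase so lookups ignore case.
        settings[lowercase(strip(parts[1]))] = strip(parts[2])
    end
    return settings
end

control = """
#
# Input and Output files.
#
locus_file = genedropping LocusFrame.txt
pedigree_file = genedropping PedigreeFrame.txt
#
# Analysis parameters for GeneDropping option.
#
repetitions = 2
populations = European, African, Chinese
"""

settings = parse_control(control)
repetitions = parse(Int, settings["repetitions"])
populations = strip.(split(settings["populations"], ','))
println(settings["locus_file"])
println(repetitions)
println(populations)
```

Splitting on the first '=' only keeps file names that contain spaces, such as `genedropping LocusFrame.txt`, intact.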

## Keywords<a id="keywords-table"></a>
This is a list of OpenMendel keywords specific to Gene Dropping. A list of OpenMendel keywords common to most analysis packages can be found [here](https://openmendel.github.io/MendelBase.jl/#keywords-table). The names of keywords are *not* case sensitive. (The keyword values *may* be case sensitive.)

Keyword | Default Value | Allowed Values | Short Description
---------------- | ---------------- | ---------------- | ----------------
gene_drop_output | Unordered | Unordered, Ordered, Sourced, or Population | Output format style for gene drop pedigrees
interleaved | FALSE | TRUE, FALSE | Whether genes are dropped into pedigrees interleaved rather than sequentially
keep_founder_genotypes | TRUE | TRUE, FALSE | Whether founder genotypes are retained
missing_rate | 0.0 | Real |
repetitions | 1 | Integer | Repetitions for sharing statistics
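Before launching a run, keyword values can be checked against the table. The sketch below is a hypothetical pre-flight check, not part of the package. It compares values case-insensitively and restricts `missing_rate` to the interval from 0 to 1; both choices are assumptions, since the table only says the value is real.

```julia
# Allowed values for the Gene Dropping keywords, transcribed from the table.
const ALLOWED = Dict(
    "gene_drop_output" => ["unordered", "ordered", "sourced", "population"],
    "interleaved" => ["true", "false"],
    "keep_founder_genotypes" => ["true", "false"],
)

# Return true when `value` looks acceptable for `keyword`.
function check_keyword(keyword, value)
    key = lowercase(keyword)
    val = lowercase(string(value))
    if haskey(ALLOWED, key)
        return val in ALLOWED[key]
    elseif key == "missing_rate"
        rate = tryparse(Float64, val)
        # Assumption: a rate is a proportion between 0 and 1.
        return rate !== nothing && 0.0 <= rate <= 1.0
    elseif key == "repetitions"
        n = tryparse(Int, val)
        return n !== nothing && n >= 1
    end
    return true  # keywords outside the table are not checked in this sketch
end

println(check_keyword("gene_drop_output", "Sourced"))
println(check_keyword("interleaved", "maybe"))
println(check_keyword("missing_rate", "0.1"))
```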

## Data Files
Gene Dropping requires a [Control file](https://openmendel.github.io/MendelBase.jl/#control-file) and a [Pedigree file](https://openmendel.github.io/MendelBase.jl/#pedigree-file). Genotype data can be included in the Pedigree file, in which case a [Locus file](https://openmendel.github.io/MendelBase.jl/#locus-file) is required. Details on the format and contents of the Control and data files can be found on the [MendelBase](https://openmendel.github.io/MendelBase.jl) documentation page. There are example data files in the Gene Dropping [data](https://github.com/OpenMendel/MendelGeneDropping.jl/tree/master/data) folder.

## Running the Analysis
To run this analysis package, first launch Julia. Then load the package with the command:

`julia> using MendelGeneDropping`

Next, if necessary, change to the directory containing your files, for example,

`julia> cd(expanduser("~/path/to/data/files/"))`

Finally, to run the analysis using the parameters in your Control file, for example, Control_file.txt, use the command:

`julia> GeneDropping("Control_file.txt")`

*Note: The package is called* MendelGeneDropping *but the analysis function is called simply* GeneDropping.
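If you prefer not to change the working directory for the rest of the session, Julia's `cd(f, dir)` form runs a function inside a directory and then returns to the original one. The wrapper below is a sketch; `run_in_directory` is a hypothetical helper, and the analysis function is passed in as an argument so that the example itself does not depend on the package being installed.

```julia
# Run `analysis(control_file)` inside `dir`, then return to the original
# working directory, even if the analysis throws an error.
function run_in_directory(analysis, dir, control_file)
    cd(expanduser(dir)) do
        isfile(control_file) || error("Control file not found: $control_file")
        analysis(control_file)
    end
end

# With the package loaded, the call would look like this:
# using MendelGeneDropping
# run_in_directory(GeneDropping, "~/path/to/data/files/", "Control_file.txt")
```

Note that `expanduser` is what turns a leading `~` into your home directory; Julia's `cd` does not expand it on its own.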

## Example 1

### Step 0: Load the OpenMendel package and then go to the directory containing the data files:
In this example, we go to the directory containing the example data files that come with this package.

    using MendelGeneDropping
    cd(MendelGeneDropping.datadir())
    pwd()

    ┌ Info: Precompiling MendelGeneDropping [5fe433f5-5a6a-5244-bccc-36c1fc03abd2]
    └ @ Base loading.jl:1186

    "/Users/jcpapp/.julia/packages/MendelGeneDropping/qZrcT/data"

### Step 1: Preparing the Pedigree files:
Recall the structure of a [valid Pedigree file](https://openmendel.github.io/MendelBase.jl/#pedigree-file). Note that a header line is required. Let's examine the first few lines of the example file:

    ;head -10 "genedropping PedigreeFrame.txt"

    Pedigree,Person,Mother,Father,Sex,Twin,BirthYear,Politician,European,African,Chinese,ABO,Rh,Xg,XSNP,HtMeters
    Bush,George,,,male,0,1946.0,true,1.0,0.0,0.0,AB,+,+,1/1,1.82
    Bush,Laura,,,female,0,1946.0,false,1.0,0.0,0.0,O,+,+,1/2,NA
    Bush,"Henry Hagar",,,male,0,1978.0,false,1.0,0.0,0.0,A,+,+,1/1,NA
    Bush,Barbara,Laura,George,female,1,1981.0,false,NA,NA,NA,B,+,-,1/1,NA
    Bush,Jenna,Laura,George,female,1,1981.0,false,NA,NA,NA,B,+,+,1/2,NA
    Bush,Margaret,Jenna,"Henry Hagar",female,0,2013.0,false,NA,NA,NA,B,-,-,1/2,NA
    Clinton,Bill,,,male,0,1946.0,true,1.0,0.0,0.0,A,+,+,2/2,1.88
    Clinton,Hillary,,,female,0,1947.0,true,1.0,0.0,0.0,O,-,-,1/1,NA
    Clinton,Chelsea,Hillary,Bill,female,0,1980.0,false,NA,NA,NA,A,+,-,1/2,NA
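In the listing above, founders are the people whose Mother and Father fields are empty. The sketch below picks them out from a shortened copy of the Bush pedigree using only Base Julia. It assumes no field contains an embedded comma; a dedicated CSV reader would be the more robust choice for real files. The helper names are hypothetical.

```julia
# A shortened copy of the Bush pedigree (first six columns only).
sample = """
Pedigree,Person,Mother,Father,Sex,Twin
Bush,George,,,male,0
Bush,Laura,,,female,0
Bush,"Henry Hagar",,,male,0
Bush,Barbara,Laura,George,female,1
Bush,Jenna,Laura,George,female,1
Bush,Margaret,Jenna,"Henry Hagar",female,0
"""

# Split one line on commas and strip surrounding double quotes.
# Assumes that no field contains an embedded comma.
split_fields(line) = [String(strip(f, '"')) for f in split(line, ',')]

lines = split(strip(sample), '\n')
header = split_fields(lines[1])
rows = [Dict(zip(header, split_fields(l))) for l in lines[2:end]]

# A founder has neither parent recorded in the pedigree.
is_founder(row) = isempty(row["Mother"]) && isempty(row["Father"])

founders = [r["Person"] for r in rows if is_founder(r)]
println(founders)
```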


### Step 2: Preparing the Control file
A Control file gives specific instructions to `MendelGeneDropping`. A minimal Control file looks like the following:

    ;cat "genedropping Control.txt"

    #
    # Input and Output files.
    #
    locus_file = genedropping LocusFrame.txt
    pedigree_file = genedropping PedigreeFrame.txt
    phenotype_file = genedropping PhenotypeFrame.txt
    new_pedigree_file = genedropping Output PedigreeFrame.txt
    output_file = genedropping Output.txt
    #
    # Analysis parameters for GeneDropping option.
    #
    repetitions = 2
    populations = European, African, Chinese


### Step 3: Run the analysis in the Julia REPL or directly in a notebook


    using MendelGeneDropping
    GeneDropping("genedropping Control.txt")


         Welcome to OpenMendel's
      Gene Dropping analysis option
            version 0.5.0


    Reading the data.

    The current working directory is "/Users/jcpapp/.julia/packages/MendelGeneDropping/qZrcT/data".

    Keywords modified by the user:

      control_file = genedropping Control.txt
      locus_file = genedropping LocusFrame.txt
      new_pedigree_file = genedropping Output PedigreeFrame.txt
      output_file = genedropping Output.txt
      pedigree_file = genedropping PedigreeFrame.txt
      phenotype_file = genedropping PhenotypeFrame.txt
      populations = Set(AbstractString["Chinese", "African", "European"])
      repetitions = 2


    Analyzing the data.

    26×18 DataFrames.DataFrame. Omitted printing of 12 columns
    │ Row │ Pedigree │ Person      │ Mother   │ Father      │ Sex     │ Twin   │
    │     │ String⍰  │ String⍰     │ String⍰  │ String⍰     │ String⍰ │ Int64⍰ │
    ├─────┼──────────┼─────────────┼──────────┼─────────────┼─────────┼────────┤
    │ 1   │ Bush1

### Step 4: Interpreting the result

`MendelGeneDropping` should have generated the file `genedropping Output PedigreeFrame.txt` in the current working directory. You can open the file directly or import it into Julia, for example with the DataFrames package, for easier manipulation.
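As a quick check that does not need any extra packages, the sketch below reads the new pedigree file with Base Julia and counts the people in each pedigree. It assumes the output is comma-separated with a header line containing a `Pedigree` column, like the input file, and that no field contains an embedded comma. The helper names are hypothetical.

```julia
# Read a comma-separated pedigree file into a header and a vector of rows.
# Assumes a header line and no embedded commas inside quoted fields.
function read_pedigree(path)
    lines = filter(!isempty, readlines(path))
    split_fields(l) = [String(strip(f, '"')) for f in split(l, ',')]
    header = split_fields(lines[1])
    rows = [split_fields(l) for l in lines[2:end]]
    return header, rows
end

# Count how many people fall in each pedigree.
function people_per_pedigree(header, rows)
    col = findfirst(isequal("Pedigree"), header)
    counts = Dict{String,Int}()
    for row in rows
        counts[row[col]] = get(counts, row[col], 0) + 1
    end
    return counts
end

path = "genedropping Output PedigreeFrame.txt"
if isfile(path)
    header, rows = read_pedigree(path)
    println(people_per_pedigree(header, rows))
end
```

With `repetitions = 2`, you would expect each input pedigree to appear more than once in the tally.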

## Citation

If you use this analysis package in your research, please cite the following reference in the resulting publications:

*Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: The Swiss army knife of genetic analysis programs. Bioinformatics 29:1568-1570.*

## Acknowledgments

This project is supported by the National Institutes of Health under NIGMS awards R01GM053275 and R25GM103774 and NHGRI award R01HG006139.