# Tutorial for MendelEstimateFrequencies


## Julia version
Current code supports Julia version 1.0+ 

## When to use MendelEstimateFrequencies
The Estimate Frequencies model applies to pedigrees, including those with missing marker data. With too many marker alleles, computational efficiency suffers and large-sample statistical assumptions become suspect. We recommend consolidating alleles until at most eight alleles remain and each has a frequency of 0.05 or greater. If the fraction of missing data is large, ethnic stratification may come into play. One remedy is to limit the analysis to a single ethnic group; another is to use ethnic-specific allele frequencies. If you opt for the latter strategy, you cannot simultaneously estimate allele frequencies and transmission parameters.
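The consolidation rule above (at most eight alleles, each with frequency 0.05 or greater) can be applied to a table of allele frequencies before you recode your data. Below is a minimal sketch using a hypothetical `consolidate_alleles` helper that is not part of MendelEstimateFrequencies. It assumes frequencies are supplied as a `Dict` from allele label to frequency, pools rare alleles into a class labeled `"other"` (an illustrative label), and only computes the new frequencies; recoding the genotypes in your Pedigree and Locus files is left to you.

```julia
"""
    consolidate_alleles(freqs; min_freq = 0.05, max_alleles = 8)

Hypothetical helper, not part of MendelEstimateFrequencies. Pools alleles rarer
than `min_freq` into a single "other" class, then keeps folding the rarest
remaining allele into that class until at most `max_alleles` classes remain
and the pooled class itself reaches `min_freq`.
"""
function consolidate_alleles(freqs::Dict{String,Float64}; min_freq = 0.05, max_alleles = 8)
    common = Dict{String,Float64}(a => f for (a, f) in freqs if f >= min_freq)
    other = 0.0
    for (a, f) in freqs
        f < min_freq && (other += f)      # pool every rare allele
    end
    # The pooled class counts toward the limit whenever it is non-empty.
    while !isempty(common) &&
          (length(common) + (other > 0) > max_alleles || 0 < other < min_freq)
        # fold the rarest remaining allele (ties broken by label) into "other"
        rarest = first(sort(collect(keys(common)); by = a -> (common[a], a)))
        other += pop!(common, rarest)
    end
    other > 0 && (common["other"] = other)   # assumes no allele is already named "other"
    return common
end

freqs = Dict("1" => 0.40, "2" => 0.30, "3" => 0.20, "4" => 0.06, "5" => 0.03, "6" => 0.01)
consolidate_alleles(freqs)                    # keeps alleles 1, 2, 3 plus "other" (about 0.10)
consolidate_alleles(freqs; max_alleles = 3)   # keeps alleles 1, 2 plus "other" (about 0.30)
```

In the first call, pooling alleles 5 and 6 alone would leave an "other" class below 0.05, so allele 4 is folded in as well.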

## Installation

*Note: Since the OpenMendel packages are not yet registered, the three OpenMendel packages (1) [SnpArrays](https://openmendel.github.io/SnpArrays.jl/latest/), (2) [MendelSearch](https://openmendel.github.io/MendelSearch.jl), and (3) [MendelBase](https://openmendel.github.io/MendelBase.jl) **must** be installed before any other OpenMendel package is installed. It is easiest to install these three packages in the order listed above.*

If you have not already installed MendelEstimateFrequencies, use the Julia package manager to install it (type `]` at the `julia>` prompt to enter package mode):

`pkg> add https://github.com/OpenMendel/MendelEstimateFrequencies.jl.git`

Or, once the OpenMendel packages are registered, simply use:

`pkg> add MendelEstimateFrequencies`


## Input Files
The Mendel EstimateFrequencies analysis package accepts the following input files. Example input files can be found in the [data](https://github.com/OpenMendel/MendelEstimateFrequencies.jl/tree/master/data) subfolder of the Mendel EstimateFrequencies project. (An analysis won't always need every file type below.)

* [Control File](https://openmendel.github.io/MendelEstimateFrequencies.jl/#control-file): Specifies the names of your data input and output files and any optional parameters (*keywords*) for the analysis. (For a list of common keywords, see the [Keywords Table](https://openmendel.github.io/MendelBase.jl/#keywords-table).) The Control file is optional. If you don't use a Control file, you will enter your keywords directly on the command line.
* [Locus File](https://openmendel.github.io/MendelBase.jl/#locus-file): Names and describes the genetic loci in your data.
* [Pedigree File](https://openmendel.github.io/MendelBase.jl/#pedigree-file): Gives information about your individuals, such as name, sex, family structure, and ancestry.
* [Phenotype File](https://openmendel.github.io/MendelBase.jl/#phenotype-file): Lists the available phenotypes.

### Control file
The Control file is a text file consisting of keywords and their assigned values. The format of the Control file is:

	Keyword = Keyword_Value(s)

Below is an example of a simple Control file to run EstimateFrequencies:

	#
	# Input and Output files.
	#
	locus_file = estimate frequencies 2 LocusFrame.txt
	pedigree_file = estimate frequencies 2 PedigreeFrame.txt
	phenotype_file = estimate frequencies 2 PhenotypeFrame.txt
	output_file = estimate frequencies 2 Output.txt
	#
	# Analysis parameters for Estimate Frequencies option.
	#

In the example above, three keywords specify the input files, with the values *estimate frequencies 2 LocusFrame.txt*, *estimate frequencies 2 PedigreeFrame.txt*, and *estimate frequencies 2 PhenotypeFrame.txt*. One keyword specifies the standard output file: *estimate frequencies 2 Output.txt*. No analysis parameters are specified for this run, so all analysis parameters take their default values. The text after the '=' is the keyword value. A list of OpenMendel keywords common to most analysis packages can be found [here](https://openmendel.github.io/MendelBase.jl/#keywords-table). Keyword names are *not* case sensitive. (Keyword values *may* be case sensitive.)
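To make the `Keyword = Keyword_Value(s)` rules concrete, here is a minimal sketch of how such a file could be read. The `read_control_keywords` function is hypothetical and is not MendelBase's actual parser: it skips blank lines and `#` comments, splits each line at the first `=`, lowercases the keyword name (names are not case sensitive), and keeps the value verbatim apart from surrounding whitespace (values may be case sensitive). It also keeps a multi-valued right-hand side as one string, which is a simplifying assumption.

```julia
"""
    read_control_keywords(text) -> Dict{String,String}

Hypothetical helper, not MendelBase's parser. Reads Control-file text in the
`Keyword = Keyword_Value(s)` format.
"""
function read_control_keywords(text::AbstractString)
    keywords = Dict{String,String}()
    for raw in split(text, '\n')
        line = strip(raw)
        (isempty(line) || startswith(line, "#")) && continue   # skip blanks and comments
        parts = split(line, '='; limit = 2)   # split at the first '=' only
        length(parts) == 2 || continue        # ignore lines without '='
        keywords[lowercase(strip(parts[1]))] = strip(parts[2])
    end
    return keywords
end

# The example Control file above, with one keyword name in mixed case.
control_text = """
#
# Input and Output files.
#
locus_file = estimate frequencies 2 LocusFrame.txt
Pedigree_File = estimate frequencies 2 PedigreeFrame.txt
phenotype_file = estimate frequencies 2 PhenotypeFrame.txt
output_file = estimate frequencies 2 Output.txt
"""
kw = read_control_keywords(control_text)
kw["pedigree_file"]   # "estimate frequencies 2 PedigreeFrame.txt"
```

To read a Control file from disk instead, pass `read("Control_file.txt", String)` to the same function.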

## Data Files
EstimateFrequencies requires a [Control file](https://openmendel.github.io/MendelBase.jl/#control-file) and a [Pedigree file](https://openmendel.github.io/MendelBase.jl/#pedigree-file). Genotype data can be included in the Pedigree file, in which case a [Locus file](https://openmendel.github.io/MendelBase.jl/#locus-file) is required. Alternatively, genotype data can be provided in a [SNP data file](https://openmendel.github.io/MendelBase.jl/#snp-data-file), in which case a [SNP Definition File](https://openmendel.github.io/MendelBase.jl/#snp-definition-file) is required. Details on the format and contents of the Control and data files can be found on the [MendelBase](https://openmendel.github.io/MendelBase.jl) documentation page. There are example data files in the EstimateFrequencies [data](https://github.com/OpenMendel/MendelEstimateFrequencies.jl/tree/master/data) folder.
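Before calling `EstimateFrequencies`, it can help to confirm that your keywords name a Pedigree file and that the input files they list actually exist. The sketch below is a hypothetical pre-flight check, not part of MendelBase. It takes keywords with lowercased names and treats every keyword ending in `_file` other than `output_file` as an input file; that is an assumption, not a MendelBase rule. It cannot tell whether a Locus file is needed, since that depends on whether the Pedigree file carries genotypes, and MendelBase performs its own validation when the analysis runs.

```julia
"""
    missing_inputs(keywords) -> Vector{String}

Hypothetical pre-flight check (not part of MendelBase). Returns a list of
problems; an empty list means nothing obvious is missing.
"""
function missing_inputs(keywords::AbstractDict{String,String})
    problems = String[]
    haskey(keywords, "pedigree_file") || push!(problems, "no pedigree_file keyword")
    for (name, value) in keywords
        # Assumption: every *_file keyword except output_file names an input file.
        if endswith(name, "_file") && name != "output_file" && !isfile(value)
            push!(problems, "$name: file \"$value\" not found")
        end
    end
    return problems
end

# Run from the data directory, this should return an empty list:
missing_inputs(Dict("pedigree_file" => "estimate frequencies 1 PedigreeFrame.txt",
                    "locus_file"    => "estimate frequencies 1 LocusFrame.txt",
                    "output_file"   => "estimate frequencies 1 Output.txt"))
```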

## Running the Analysis
To run this analysis package, first launch Julia. Then load the package with the command:

`julia> using MendelEstimateFrequencies`

Next, if necessary, change to the directory containing your files. Julia's `cd` does not expand `~`, so wrap such paths in `expanduser`, for example,

`julia> cd(expanduser("~/path/to/data/files/"))`

Finally, to run the analysis using the parameters in your Control file, for example, Control_file.txt, use the command:

`julia> EstimateFrequencies("Control_file.txt")`

*Note: The package is called* MendelEstimateFrequencies *but the analysis function is called simply* EstimateFrequencies.
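If you run many analyses, you might prefer to generate the Control file from Julia rather than edit it by hand. Below is a minimal sketch with a hypothetical `write_control_file` helper (not part of MendelEstimateFrequencies). It writes the keywords in the `Keyword = Keyword_Value` format into the current directory, which should be the directory holding your data files. The call to `EstimateFrequencies` is left commented out because it needs the package and the data files named in the keywords.

```julia
# Hypothetical helper (not part of MendelEstimateFrequencies): writes keywords in
# the `Keyword = Keyword_Value` format, one per line, under a comment header.
function write_control_file(path::AbstractString, keywords::AbstractVector{<:Pair})
    open(path, "w") do io
        println(io, "#\n# Input and Output files.\n#")
        for (name, value) in keywords
            println(io, name, " = ", value)
        end
    end
    return path
end

control = write_control_file("my Control.txt", [
    "locus_file"    => "estimate frequencies 1 LocusFrame.txt",
    "pedigree_file" => "estimate frequencies 1 PedigreeFrame.txt",
    "output_file"   => "estimate frequencies 1 Output.txt",
])

# Run the analysis only where the package and the data files above are available:
# using MendelEstimateFrequencies
# EstimateFrequencies(control)
```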



## Output Files
Each option will create output files specific to that option, and will save them to the same directory that holds the input data files. The EstimateFrequencies package creates an output file that gives details about the analysis run.

# Example 1: 

### Step 0: Load the OpenMendel package and then go to the directory containing the data files:
First we load the MendelEstimateFrequencies package.

```julia
using MendelEstimateFrequencies
```

In this example we go to the directory containing the example data files that come with this package.

```julia
cd(MendelEstimateFrequencies.datadir())
pwd()
```

```
"/Users/jcpapp/.julia/packages/MendelEstimateFrequencies/mamno/data"
```

### Step 1: Preparing the Pedigree file:
Recall the structure of a [valid pedigree file](https://openmendel.github.io/MendelBase.jl/#pedigree-file). Note that a header line is required. Let's examine the first few lines of an example:

```julia
;head -10 "estimate frequencies 1 PedigreeFrame.txt"
```

```
Pedigree,Person,Mother,Father,Sex,Twin,SNP1,SNP2,SNP3,SNP4,SNP5,SNP6,SNP7,ID,SNP9,CT,ACE
1,1,NA,NA,1,NA,NA,NA,2/2,2/2,2/2,NA,1/2,1/2,1/2,1/2,0
1,2,NA,NA,2,NA,1/1,1/1,1/1,1/1,1/1,1/2,1/2,1/2,1/2,1/2,0
1,4,NA,NA,1,NA,1/1,1/1,1/2,1/1,1/1,1/2,1/2,1/2,1/2,1/2,0
1,8,1,2,2,NA,1/2,1/2,1/2,1/2,1/2,2/2,1/2,1/2,1/2,1/2,0
1,7,1,2,2,NA,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,0
1,6,1,2,2,NA,1/2,NA,1/2,1/2,1/2,2/2,1/2,1/2,1/2,1/2,0
1,5,1,2,2,NA,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,0
1,10,4,5,2,NA,1/1,1/1,1/1,1/1,1/1,1/1,2/2,1/1,1/1,1/1,0
1,9,4,5,2,NA,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,1/2,0
```
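Genotypes in this file are written as `a/b`, with `NA` for missing data. As a quick sanity check on such a column, you can count alleles directly. The hypothetical `naive_allele_frequencies` below is *not* the method MendelEstimateFrequencies uses: it treats every person as unrelated and simply drops missing genotypes, whereas the package maximizes a likelihood over the whole pedigree, so its estimates will generally differ. The example uses the SNP1 column of the nine rows shown above (the full file has more rows).

```julia
"""
    naive_allele_frequencies(genotypes) -> Dict{String,Float64}

Hypothetical sanity check: counts alleles in genotypes written as "a/b",
skipping "NA", as if all individuals were unrelated.
"""
function naive_allele_frequencies(genotypes)
    counts = Dict{String,Int}()
    for g in genotypes
        g == "NA" && continue                 # missing genotype
        for allele in split(g, '/')           # "1/2" -> "1", "2"
            counts[allele] = get(counts, allele, 0) + 1
        end
    end
    total = sum(values(counts))
    return Dict(a => n / total for (a, n) in counts)
end

# SNP1 column from the nine pedigree rows shown above
snp1 = ["NA", "1/1", "1/1", "1/2", "1/2", "1/2", "1/2", "1/1", "1/2"]
naive_allele_frequencies(snp1)   # allele "1" => 0.6875, allele "2" => 0.3125
```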


### Step 2: Preparing the Control file
A Control file gives specific instructions to `MendelEstimateFrequencies`. To estimate allele frequencies, a minimal Control file looks like the following:

```julia
;cat "estimate frequencies 1 Control.txt"
```

```
#
# Input and Output files.
#
locus_file = estimate frequencies 1 LocusFrame.txt
pedigree_file = estimate frequencies 1 PedigreeFrame.txt
output_file = estimate frequencies 1 Output.txt
#
# Analysis parameters for Estimate Frequencies option.
#
```


### Step 3: Run the analysis in the Julia REPL or directly in a notebook

```julia
EstimateFrequencies("estimate frequencies 1 Control.txt")
```

```
 
 
     Welcome to OpenMendel's
  Estimate Frequencies analysis option
        version 0.5.0
 
 
Reading the data.

The current working directory is "/Users/jcpapp/.julia/packages/MendelEstimateFrequencies/mamno/data".

Keywords modified by the user:

  control_file = estimate frequencies 1 Control.txt
  locus_file = estimate frequencies 1 LocusFrame.txt
  output_file = estimate frequencies 1 Output.txt
  pedigree_file = estimate frequencies 1 PedigreeFrame.txt
 
 
Analyzing the data.

20×6 DataFrames.DataFrame. Omitted printing of 1 columns
│ Row │ Locus   │ Allele │ Chromosome │ Frequency │ FemaleMorgans │
│     │ String⍰ │ Int64⍰ │ String⍰    │ Float64⍰  │ Float64⍰      │
├─────┼─────────┼────────┼────────────┼───────────┼───────────────┤
│ 1   │ SNP1    │ 1      │ AUTOSOME   │ 0.6811    │ 0.0           │
│ 2   │ SNP1    │ 2      │ AUTOSOME   │ 0.3189    │ 0.0           │
│ 3   │ SNP2    │ 1      │ AUTOSOME   │ 0.6693    │ 1.0      
```

### Step 4: Output File

`MendelEstimateFrequencies` should have generated the file `estimate frequencies 1 Output.txt` in your local directory. This file has detailed information on the analysis (see below).

```julia
;cat "estimate frequencies 1 Output.txt"
```

```
 
                Search, Julia Version

            (c) Copyright Kenneth Lange, 2019

Title: Estimate Frequencies analyis for SNP1
Grid or search option: search
Minimize or maximize: maximize
 
Parameter minima and maxima:
 
    freq 1      freq 2  
 
  1.0000e-05  1.0000e-05 
     Inf         Inf      
 
Parameter constraints:
 
    freq 1      freq 2      level 

  1.0000e+00  1.0000e+00  1.0000e+00
 
iter  steps   function    freq 1      freq 2  
 
  1    0   -3.1944e+02   6.8110e-01  3.1890e-01 
 
  2    2   -3.1911e+02   6.5917e-01  3.4083e-01 
 
  3    0   -3.1911e+02   6.5984e-01  3.4016e-01 
 
  4    0   -3.1911e+02   6.5986e-01  3.4014e-01 
 
  5    0   -3.1911e+02   6.5986e-01  3.4014e-01 
 
  6    1   -3.1911e+02   6.5986e-01  3.4014e-01 
 
  7    0   -3.1911e+02   6.5986e-01  3.4014e-01 
 
The maximum function value of -319.11062 occurs at iteration 7.
 
                Search, Julia Version

            (c) Copyright Kenneth Lange, 2019

Title: Estimate Frequencies analy
```

### Step 5: Interpreting the result
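The output file in Step 4 records one search per locus. For SNP1, the search maximizes the function (presumably the log-likelihood of the pedigree data) over the two allele frequencies, with each frequency bounded below by 1.0000e-05 and, judging from the constraint row with coefficients 1, 1 and level 1, constrained to sum to 1. The first iteration row matches the Frequency values printed in Step 3 (0.6811 and 0.3189), and the search settles at freq 1 ≈ 0.65986 and freq 2 ≈ 0.34014, where the maximum function value of -319.11062 is reached at iteration 7. To collect these numbers for every locus, a small parser can scan the output file. The `parse_search_log` function below is hypothetical, is not part of MendelEstimateFrequencies or MendelSearch, and assumes exactly the layout shown in Step 4.

```julia
"""
    parse_search_log(text) -> Vector

Hypothetical helper. For each locus block in Estimate Frequencies output text,
returns the title, the maximum function value, the iteration at which it
occurs, and the parameter values printed for that iteration.
"""
function parse_search_log(text::AbstractString)
    results = NamedTuple[]
    title = ""
    rows = Dict{Int,Vector{Float64}}()   # iteration => parameter values in the current block
    for line in split(text, '\n')
        if (m = match(r"^Title:\s*(.*)$", line)) !== nothing
            title = String(strip(m.captures[1]))   # a new locus block starts
            empty!(rows)
        elseif (m = match(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+(.+)$", line)) !== nothing
            # iteration row: iter, steps, function value, then one column per parameter
            rows[parse(Int, m.captures[1])] = parse.(Float64, split(m.captures[4]))
        elseif (m = match(r"maximum function value of (\S+) occurs at iteration (\d+)", line)) !== nothing
            iter = parse(Int, m.captures[2])
            push!(results, (title = title, value = parse(Float64, m.captures[1]),
                            iteration = iter, params = get(rows, iter, Float64[])))
        end
    end
    return results
end

# An excerpt of the Step 4 output; for the full file use
# read("estimate frequencies 1 Output.txt", String).
log_excerpt = """
Title: Estimate Frequencies analyis for SNP1
iter  steps   function    freq 1      freq 2
  1    0   -3.1944e+02   6.8110e-01  3.1890e-01
  7    0   -3.1911e+02   6.5986e-01  3.4014e-01
The maximum function value of -319.11062 occurs at iteration 7.
"""
res = parse_search_log(log_excerpt)
first(res).params   # [0.65986, 0.34014], which sum to 1 as the constraint requires
```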

## Citation

If you use this analysis package in your research, please cite the following reference in the resulting publications:

*Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: The Swiss army knife of genetic analysis programs. Bioinformatics 29:1568-1570.*

## Acknowledgments

This project is supported by the National Institutes of Health under NIGMS awards R01GM053275 and R25GM103774 and NHGRI award R01HG006139.