# Kinship comparison with untyped ancestors

## 3/8/2019

To run this notebook, ensure you have the latest MendelKinship.jl installed. Press `]` and type:
```
    (v1.0) pkg> update MendelKinship
```

If you don't have MendelKinship installed already, press `]` and do the following:
```
    (v1.0) pkg> add https://github.com/OpenMendel/SnpArrays.jl
    (v1.0) pkg> add https://github.com/OpenMendel/MendelSearch.jl
    (v1.0) pkg> add https://github.com/OpenMendel/MendelBase.jl
    (v1.0) pkg> add https://github.com/biona001/MendelKinship.jl
```

## Introduction

Sometimes we have untyped ancestors (perhaps long deceased) that must be included in a pedigree for theoretical kinship calculations. However, some family members do have genotype information, so we would like to compare theoretical and empirical kinship coefficients for only this subset of people. We illustrate how to do so with [MendelKinship.jl](https://github.com/biona001/MendelKinship.jl).

First we must prepare 2 pedigree files:
+ A pedigree file with the *full* family structure, including people without genotype information
+ A pedigree file containing only the people who have genotypes. If 2 pedigrees in this file really belong to a single larger pedigree, still keep them as 2 separate pedigrees in this file.
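
One way to confirm that the two files differ only by the untyped people is to compare their person IDs. The sketch below is a minimal illustration and not part of MendelKinship.jl: `fam_person_ids` is a hypothetical helper, and it assumes comma-separated `.fam` lines with the person ID in the second field and person IDs that are unique across pedigrees. The inline text is the pedigree 1 excerpt from the two example files used in this notebook.

```julia
# Hypothetical helper (not a MendelKinship.jl function): collect the person IDs
# found in the second comma-separated field of each line.
function fam_person_ids(io::IO)
    ids = Set{String}()
    for line in eachline(io)
        fields = split(line, ',')
        length(fields) >= 2 || continue        # skip blank or truncated lines
        push!(ids, String(strip(fields[2])))
    end
    return ids
end

# Pedigree 1 as it appears in the genotyped-only file.
subset_text = """
1,16,,,  F       ,29.20564
1,8228,,,  F       ,31.80179
1,17008,,,  M       ,37.82143
1,9218,17008,16,  M       ,35.08036
1,3226,9218,8228,  F       ,28.32902
"""

# Pedigree 1 as it appears in the full file, with the untyped person 30001 added.
full_text = """
1,16,,,  F       ,29.20564
1,8228,,,  F       ,31.80179
1,17008,,,  M       ,37.82143
1,9218,17008,16,  M       ,35.08036
1,3226,9218,8228,  F       ,28.32902
1,30001,9218,8228,  F       ,999.9999
"""

typed_ids = fam_person_ids(IOBuffer(subset_text))
full_ids  = fam_person_ids(IOBuffer(full_text))

# People who appear only in the full file are the untyped ones.
untyped = setdiff(full_ids, typed_ids)
println("Untyped people: ", sort(collect(untyped)))

# Every genotyped person should also appear in the full pedigree.
println("All typed people present in full file: ", issubset(typed_ids, full_ids))
```

To run the same check on the real files, `open(fam_person_ids, "Ped29a_full.fam")` would read from disk instead of an `IOBuffer`.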


# Example

## Dataset description
We use **Ped29a_subset.fam** as an example (same file as **Ped29a.in** in Mendel), which contains 212 people that have been fully genotyped. 

We added 5 extra people, 30001, 30002, 30003, 30004, and 30005, to form the *full* pedigree file **Ped29a_full.fam**. Persons 30001, 30002, and 30003 were added to pedigrees 1, 2, and 3 respectively, and do not change the pedigree structure. Persons 30004 and 30005 were added to pedigree 4 as the parents of person 54 (a founder of pedigree 4 in the subset file), and we also designated person 70 (a founder of pedigree 5 in the subset file) as their child. In other words, we are assuming that 30004 and 30005 are the long-deceased parents of person 70, who really belongs to pedigree 4 but has nevertheless been placed into a separate pedigree because 30004 and 30005 have not been genotyped.
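
To see what this change does to the theoretical kinship coefficients, the sketch below applies the standard recursive definition of kinship to the merged pedigree 4. It is an illustration only and not MendelKinship.jl's implementation: `PARENTS`, `depth`, and `kinship` are hypothetical names, and the parent assignments are taken from the file excerpts in this notebook together with the assumption that 54 and 70 are both children of 30004 and 30005.

```julia
# Parents of each non-founder in the merged pedigree 4 (assumed structure).
# Anyone not listed here is treated as a founder.
const PARENTS = Dict(
    "54"    => ("30004", "30005"),
    "70"    => ("30004", "30005"),
    "17240" => ("5072", "54"),
    "21999" => ("24010", "70"),
)

# Generation depth: 0 for founders, otherwise 1 + the deeper parent's depth.
function depth(p)
    haskey(PARENTS, p) || return 0
    a, b = PARENTS[p]
    return 1 + max(depth(a), depth(b))
end

# Recursive kinship coefficient, assuming founders are unrelated and non-inbred.
function kinship(i, j)
    if i == j
        haskey(PARENTS, i) || return 0.5       # founder with themselves
        a, b = PARENTS[i]
        return 0.5 + 0.5 * kinship(a, b)       # adds the inbreeding term
    end
    # Recurse on the deeper person, who cannot be an ancestor of the other.
    depth(i) < depth(j) && return kinship(j, i)
    haskey(PARENTS, i) || return 0.0           # two distinct founders
    a, b = PARENTS[i]
    return 0.5 * (kinship(a, j) + kinship(b, j))
end

println(kinship("54", "70"))       # full siblings
println(kinship("54", "21999"))    # aunt and nephew
println(kinship("17240", "70"))    # niece and aunt
```

Under these assumptions the three pairs get theoretical kinships of 0.25, 0.125, and 0.125, whereas in the subset file, where 54 and 70 are founders of different pedigrees, the same pairs would all have theoretical kinship 0.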

In [1]:
```
;head -35 Ped29a_subset.fam

1,16,,,  F       ,29.20564
1,8228,,,  F       ,31.80179
1,17008,,,  M       ,37.82143
1,9218,17008,16,  M       ,35.08036
1,3226,9218,8228,  F       ,28.32902
2,29,,,  F       ,36.17929
2,2294,,,  M       ,42.88099
2,3416,,,  M       ,40.98316
2,17893,2294,29,  F       ,35.55038
2,6952,3416,17893,  M       ,48.06048
2,14695,2294,29,  F       ,37.60566
2,6790,2294,29,  M       ,46.36752
2,3916,2294,29,  F       ,35.05782
3,39,,,  F       ,34.28877
3,4521,,,  F       ,38.13171
3,8366,,,  M       ,40.98539
3,16693,,,  F       ,34.21628
3,21688,8366,16693,  M       ,36.63124
3,25532,21688,39,  F       ,31.88658
3,26294,21688,39,  M       ,39.75311
3,16795,21688,39,  F       ,33.99074
3,17445,8366,16693,  M       ,38.53802
3,2039,17445,4521,  M       ,39.46585
3,2831,8366,16693,  M       ,42.91433
4,54,,,  F       ,32.13501
4,5072,,,  M       ,37.75151
4,17240,5072,54,  F       ,33.84349
5,70,,,  F       ,34.59888
5,24010,,,  M       ,40.09751
5,21999,24010,70,  M       ,42.99402
8,109,,,
```

In [2]:
```
;head -40 Ped29a_full.fam

1,16,,,  F       ,29.20564
1,8228,,,  F       ,31.80179
1,17008,,,  M       ,37.82143
1,9218,17008,16,  M       ,35.08036
1,3226,9218,8228,  F       ,28.32902
1,30001,9218,8228,  F       ,999.9999
2,29,,,  F       ,36.17929
2,2294,,,  M       ,42.88099
2,3416,,,  M       ,40.98316
2,17893,2294,29,  F       ,35.55038
2,6952,3416,17893,  M       ,48.06048
2,14695,2294,29,  F       ,37.60566
2,6790,2294,29,  M       ,46.36752
2,3916,2294,29,  F       ,35.05782
2,30002,2294,29,  F       ,999.9999
3,39,,,  F       ,34.28877
3,4521,,,  F       ,38.13171
3,8366,,,  M       ,40.98539
3,16693,,,  F       ,34.21628
3,21688,8366,16693,  M       ,36.63124
3,25532,21688,39,  F       ,31.88658
3,26294,21688,39,  M       ,39.75311
3,16795,21688,39,  F       ,33.99074
3,17445,8366,16693,  M       ,38.53802
3,2039,17445,4521,  M       ,39.46585
3,2831,8366,16693,  M       ,42.91433
3,30003,8366,16693,  M       ,999.9999
4,30004,,,  F       ,34.59888
4,30005,,,  M       ,34.59888
4,54,30004,30005,  F
```

## Prepare control file and run

The **full** pedigree structure is specified through the **full_pedigree_file** keyword, while **pedigree_file** points to the file containing only genotyped people. Remember that a `.fam` file should *not* contain a header line, but other file types should. An example control file that uses the 2 pedigree files above is shown below.

In [3]:
```
;cat control_includes_untyped.txt

#
# Input and Output files.
#
plink_field_separator = ','
pedigree_file = Ped29a_subset.fam
snpdata_file = SNP_data29a.bed
snpdefinition_file = SNP_def29a_converted.txt
#
# Analysis parameters for Kinship option.
#
compare_kinships = true
maf_threshold = 0.01
kinship_plot = kinship_plot
z_score_plot = z_score_plot
full_pedigree_file = Ped29a_full.fam
```
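
Before running the analysis, it can help to check that the control file names both pedigree files and that they are different files. The sketch below is a rough illustration under stated assumptions: `read_keywords` is a hypothetical helper, not OpenMendel's own control-file parser, and it only handles the simple `keyword = value` lines and `#` comments seen in the control file above.

```julia
# Hypothetical helper: read `keyword = value` lines into a Dict,
# skipping blank lines and lines that start with '#'.
function read_keywords(io::IO)
    keywords = Dict{String,String}()
    for raw in eachline(io)
        line = strip(raw)
        (isempty(line) || startswith(line, "#")) && continue
        occursin("=", line) || continue
        key, value = split(line, "="; limit = 2)
        keywords[String(strip(key))] = String(strip(value))
    end
    return keywords
end

# The pedigree-related part of the control file used in this notebook.
control_text = """
#
# Input and Output files.
#
pedigree_file = Ped29a_subset.fam
snpdata_file = SNP_data29a.bed
#
# Analysis parameters for Kinship option.
#
compare_kinships = true
full_pedigree_file = Ped29a_full.fam
"""

kw = read_keywords(IOBuffer(control_text))

# Both pedigree keywords should be present and point at different files.
has_both = haskey(kw, "pedigree_file") && haskey(kw, "full_pedigree_file")
distinct = has_both && kw["pedigree_file"] != kw["full_pedigree_file"]
println("Both pedigree files named: ", has_both)
println("Files are distinct: ", distinct)
```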


In [4]:
```
using MendelKinship
Kinship("control_includes_untyped.txt")
```

```
┌ Info: Recompiling stale cache file /Users/biona001/.julia/compiled/v1.0/MendelKinship/jENRZ.ji for MendelKinship [57586ee1-7d7e-549d-a2d8-59dc17d6b397]
└ @ Base loading.jl:1190


     Welcome to OpenMendel's
     Kinship analysis option
        version 0.2.0


Reading the data.

The current working directory is "/Users/biona001/.julia/dev/MendelKinship/docs/untyped_parents".

Keywords modified by the user:

  affected_designator = 2
  compare_kinships = true
  control_file = control_includes_untyped.txt
  full_pedigree_file = Ped29a_full.fam
  kinship_plot = kinship_plot
  maf_threshold = 0.01
  pedigree_file = Ped29a_subset.fam
  plink_field_separator = ,
  snpdata_file = SNP_data29a.bed
  snpdefinition_file = SNP_def29a_converted.txt
  z_score_plot = z_score_plot


Analyzing the data.

Kinship plot saved.
```


| Row | Pedigree1 | Pedigree2 | Person1 | Person2 | theoretical_kinship | empiric_kinship | fishers_zscore |
|-----|-----------|-----------|---------|---------|---------------------|-----------------|----------------|
|     | String    | String    | String  | String  | Float64             | Float64         | Float64        |
| 1   | 4         | 4         | 54      | 70      | 0.25                | -0.00176085     | -11.0347       |
| 2   | 4         | 4         | 54      | 21999   | 0.125               | -0.0196182      | -6.0572        |
| 3   | 4         | 4         | 17240   | 70      | 0.125               | 0.005838        | -4.9247        |
| 4   | 14        | 14        | 26732   | 264     | 0.0                 | 0.109552        | 5.29817        |
| 5   | 31        | 31        | 15884   | 19770   | 0.25                | 0.150364        | -4.21656       |
| 6   | 23        | 23        | 9943    | 392     | 0.125               | 0.0225133       | -4.18276       |
| 7   | 25        | 14        | 22041   | 16636   | 0.0                 | 0.0969715       | 4.73251        |
| 8   | 25        | 25        | 11822   | 24192   | 0.25                | 0.159229        | -3.81252       |
| 9   | 14        | 14        | 25732   | 264     | 0.125               | 0.216622        | 4.60682        |
| 10  | 25        | 25        | 3012    | 3016    | 0.125               | 0.213888        | 4.47931        |


```
Fisher's plot saved.


Mendel's analysis is finished.
```



## Examine results

Note that the 3 pairs with the most negative z-scores are (54, 70), (54, 21999), and (17240, 70). This is because we artificially made these people direct descendants of our 2 hypothetical parents 30004 and 30005, which gives each pair a positive theoretical kinship. Their empiric kinship being close to 0 confirms that they are in fact unrelated. The other pairs have the same values as in our [old tutorial](https://github.com/OpenMendel/Tutorials/blob/master/Kinship/KinshipTutorial.ipynb), which could not handle the presence of untyped people. Note that none of 30001, ..., 30005 show up in the comparison table, because this table is strictly for people who have genotype information.
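
The same observation can be reproduced by sorting the table by z-score. The sketch below is only an illustration: it copies the ten rows of the output table above into plain named tuples (the field names `p1`, `p2`, `theo`, `emp`, and `z` are made up for this example) rather than using the data structure that `Kinship` actually returns.

```julia
# Rows copied from the comparison table: person IDs, theoretical kinship,
# empiric kinship, and Fisher's z-score.
rows = [
    (p1 = "54",    p2 = "70",    theo = 0.25,  emp = -0.00176085, z = -11.0347),
    (p1 = "54",    p2 = "21999", theo = 0.125, emp = -0.0196182,  z = -6.0572),
    (p1 = "17240", p2 = "70",    theo = 0.125, emp = 0.005838,    z = -4.9247),
    (p1 = "26732", p2 = "264",   theo = 0.0,   emp = 0.109552,    z = 5.29817),
    (p1 = "15884", p2 = "19770", theo = 0.25,  emp = 0.150364,    z = -4.21656),
    (p1 = "9943",  p2 = "392",   theo = 0.125, emp = 0.0225133,   z = -4.18276),
    (p1 = "22041", p2 = "16636", theo = 0.0,   emp = 0.0969715,   z = 4.73251),
    (p1 = "11822", p2 = "24192", theo = 0.25,  emp = 0.159229,    z = -3.81252),
    (p1 = "25732", p2 = "264",   theo = 0.125, emp = 0.216622,    z = 4.60682),
    (p1 = "3012",  p2 = "3016",  theo = 0.125, emp = 0.213888,    z = 4.47931),
]

# Sort from most negative to most positive z-score.
by_z = sort(rows; by = r -> r.z)

# The three pairs whose empiric kinship falls furthest below the theoretical value.
most_negative = [(r.p1, r.p2) for r in by_z[1:3]]
println(most_negative)

# For each of them, show how far the empiric value is from the theoretical one.
for r in by_z[1:3]
    println(r.p1, "-", r.p2, ": theoretical ", r.theo, ", empiric ", r.emp)
end
```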

In [5]:
```
;open kinship_plot.html
```

In [6]:
```
;open z_score_plot.html
```