# Simulate Genomic Data for Exercise 5
This notebook simulates a small genomic dataset that is analysed in the course GELASMSS2019. The material shown here is based on the JuliaCourse held in December 2018 in Munich.

First, the packages `Distributions` and `Random` are loaded. After that, we set a seed for the random number generator so that the simulation can be reproduced.

In [1]:
using Distributions, Random
Random.seed!(2345);
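Setting the seed makes all subsequent random draws reproducible. A minimal check using only the `Random` standard library (the variable names are illustrative):

```julia
using Random

# Seed the global random number generator and draw three uniform numbers
Random.seed!(2345)
first_draw = rand(3)

# Reseeding with the same value restarts the same random stream
Random.seed!(2345)
second_draw = rand(3)

println(first_draw == second_draw)   # true
```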

## Initialize The Sampler
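We take the code from `day2/dataSimulation` and reduce it to a much smaller example: a single chromosome of length 1.0 with 100 equally spaced loci, no mutation, and an allele frequency of 0.5 at every locus.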

In [2]:
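using XSim
chrLength = 1.0                                        # length of the chromosome
numChr    = 1                                          # number of chromosomes
numLoci   = 100                                        # number of loci
mutRate   = 0.0                                        # mutation rate
locusInt  = chrLength/numLoci                          # distance between neighbouring loci
mapPos    = collect(0:locusInt:(chrLength-0.0001))     # map positions of the loci
geneFreq  = fill(0.5,numLoci)                          # allele frequency 0.5 at every locus
XSim.build_genome(numChr,chrLength,numLoci,geneFreq,mapPos,mutRate)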
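To see which map positions this produces, the range can be rebuilt in plain Julia without XSim. The sketch below only recomputes `mapPos` and shows why the upper bound is `chrLength-0.0001` rather than `chrLength`:

```julia
# Rebuild the map positions used above (same values as chrLength and numLoci)
chrLength = 1.0
numLoci   = 100
locusInt  = chrLength / numLoci

# The upper bound chrLength - 0.0001 stops the range just short of 1.0,
# so exactly numLoci positions are produced: 0.0, 0.01, ..., 0.99
mapPos = collect(0:locusInt:(chrLength - 0.0001))

println(length(mapPos))   # 100
println(first(mapPos))    # 0.0
```

With an upper bound of `chrLength` itself, the range would also include 1.0 and contain 101 positions, one more than `numLoci`.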

In [3]:
# checking the result
XSim.G

XSim.GenomeInfo(XSim.ChromosomeInfo[], 0, 0.0, 0.0, Int64[], Float64[])

## Random Mating In Finite Populations
We start by generating a small founder population. In this exercise, we provide a haplotype file as input, and the haplotypes in this file are assigned to our founders. This follows the example in `day2/XSimMating`.

In [4]:
# specify the number of founders
popSizeFounder=5
# sample the founders
basePop = sampleFounders(popSizeFounder, "founder_hap.txt");


Sampling 5 animals into base population.


The base population is split into males and females: the first `nMaleFounder = 2` founders are used as males and the remaining three as females.

In [5]:
nMaleFounder = 2
# empty cohort for the male founders, filled with the first nMaleFounder animals
basePopMales = XSim.Cohort(Array{XSim.Animal,1}(undef,0),Array{Int64,2}(undef,0,0))
basePopMales.animalCohort = basePop.animalCohort[1:nMaleFounder];
nFemaleFounder = popSizeFounder - nMaleFounder
# empty cohort for the female founders, filled with the remaining animals
basePopFemales = XSim.Cohort(Array{XSim.Animal,1}(undef,0),Array{Int64,2}(undef,0,0))
basePopFemales.animalCohort = basePop.animalCohort[(nMaleFounder + 1):popSizeFounder];

## Check
As an intermediate check, we look at the sampled genotypes. The male and female cohorts do not have to be concatenated, because all founders are already contained in `basePop`.

In [6]:

MFounders = getOurGenotypes(basePop)

5×100 Array{Int64,2}:
 2  1  0  2  2  1  1  2  1  1  1  0  1  …  1  2  1  1  1  0  2  1  1  2  0  1
 1  1  1  1  0  0  1  1  0  0  1  2  0     1  1  0  0  1  1  0  1  1  2  0  2
 2  0  2  0  2  1  1  1  0  1  2  1  1     2  1  0  2  2  2  1  2  1  1  0  1
 1  0  2  1  0  1  1  1  1  1  1  1  2     0  1  1  1  2  1  1  2  0  1  2  1
 1  1  0  0  1  1  2  2  0  1  0  2  0     2  0  1  1  1  1  0  1  1  1  2  0
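The genotypes appear to be coded as 0, 1 or 2, i.e. as the number of copies of one allele (an assumption based on the values printed above). Under that coding, the allele frequency of a locus is its column mean divided by 2. A small sketch on a hypothetical matrix; for the founders the same expression would be `mean(MFounders, dims = 1) ./ 2`, which needs `Statistics` (loaded further down in this notebook):

```julia
using Statistics

# Hypothetical 3×4 genotype matrix (rows = animals, columns = loci),
# coded as the number of copies of one allele: 0, 1 or 2
M = [2 1 0 1;
     1 1 2 0;
     0 1 2 1]

# Each animal carries two alleles per locus, so the allele frequency
# of a locus is its column mean divided by 2
p = vec(mean(M, dims = 1)) ./ 2

println(p)
```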

## Random Mating
The founder cohorts `basePopMales` and `basePopFemales` are randomly mated to produce a first generation of offspring (`sires1`, `dams1`). These offspring are then randomly mated among themselves to produce a second generation (`sires2`, `dams2`).
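`sampleRan` performs the mating inside XSim. As a rough illustration of what random mating means, and not a description of XSim's internals, the sketch below draws each offspring's sire and dam independently and uniformly, with replacement, from hypothetical parent ID pools that mirror the founder split above:

```julia
using Random

# Hypothetical parent pools: founders 1-2 are males, 3-5 are females
sireIDs = [1, 2]
damIDs  = [3, 4, 5]

rng = MersenneTwister(1)
nOffspring = 4
# Random mating: every offspring gets a sire and a dam drawn
# independently and uniformly from the respective pool
parents = [(rand(rng, sireIDs), rand(rng, damIDs)) for _ in 1:nOffspring]

for (i, (s, d)) in enumerate(parents)
    println("offspring $i: sire $s, dam $d")
end
```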

In [7]:
ngen,popSize = 1,5
sires1,dams1,gen1 = sampleRan(popSize, ngen, basePopMales, basePopFemales);

Generation     2: sampling     2 males and     2 females


In [8]:
ngen, popSize = 1,11
sires2,dams2,gen2 = sampleRan(popSize, ngen, sires1, dams1);

Generation     2: sampling     6 males and     6 females


All sampled animals are combined into a single cohort called `animals`.

In [9]:
animals=concatCohorts(basePop,sires1,dams1,sires2,dams2);
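The size of `animals` follows from the sampling messages above. Assuming `concatCohorts` keeps the cohorts in the order in which they are passed, the animals occupy the following positions:

```julia
# Cohort sizes as reported in the outputs above
labels = ["basePop", "sires1", "dams1", "sires2", "dams2"]
sizes  = [5, 2, 2, 6, 6]   # founders, then "2 males and 2 females", then "6 males and 6 females"

stops  = cumsum(sizes)
starts = stops .- sizes .+ 1
for (lbl, a, b) in zip(labels, starts, stops)
    println(rpad(lbl, 8), a, ":", b)
end
println("total: ", sum(sizes))   # total: 21
```

This matches the offspring IDs 22 to 25 printed further down and places animal 6 in `sires1` and animal 9 in `dams1`.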

We use animal 6 as a sire and animal 9 as a dam to produce two more male and two more female offspring.

In [13]:
sire6 = XSim.Cohort(Array{XSim.Animal,1}(undef,0),Array{Int64,2}(undef,0,0))
sire6.animalCohort = animals.animalCohort[6:6]
dam9 = XSim.Cohort(Array{XSim.Animal,1}(undef,0),Array{Int64,2}(undef,0,0))
dam9.animalCohort = animals.animalCohort[9:9]

1-element Array{XSim.Animal,1}:
 XSim.Animal(XSim.Chromosome[Chromosome(Int64[], [4, 3], [0.0, 0.903508])], XSim.Chromosome[Chromosome(Int64[], [6], [0.0])], Float64[], 9, 2, 3, -9999.0, -9999.0, -9999.0)

`sire6` and `dam9` are now mated to produce four offspring.

In [15]:
ngen, popSize = 1,4
offspringMales,offspringFemales,gen3 = sampleRan(popSize, ngen, sire6, dam9)

Generation     2: sampling     2 males and     2 females


(XSim.Cohort(XSim.Animal[Animal(XSim.Chromosome[Chromosome(Int64[], [1, 6], [0.0, 0.0599605])], XSim.Chromosome[Chromosome(Int64[], [4, 6], [0.0, 0.421812])], Float64[], 22, 6, 9, -9999.0, -9999.0, -9999.0), Animal(XSim.Chromosome[Chromosome(Int64[], [1, 2, 1, 6], [0.0, 0.425391, 0.688324, 0.816385])], XSim.Chromosome[Chromosome(Int64[], [4, 6], [0.0, 0.263561])], Float64[], 23, 6, 9, -9999.0, -9999.0, -9999.0)], Array{Int64}(0,0)), XSim.Cohort(XSim.Animal[Animal(XSim.Chromosome[Chromosome(Int64[], [1, 2, 1, 6], [0.0, 0.425391, 0.688324, 0.884986])], XSim.Chromosome[Chromosome(Int64[], [4, 6], [0.0, 0.531522])], Float64[], 24, 6, 9, -9999.0, -9999.0, -9999.0), Animal(XSim.Chromosome[Chromosome(Int64[], [5, 6, 2, 1], [0.0, 0.0437567, 0.442916, 0.688324])], XSim.Chromosome[Chromosome(Int64[], [6, 4, 6], [0.0, 0.322201, 0.469019])], Float64[], 25, 6, 9, -9999.0, -9999.0, -9999.0)], Array{Int64}(0,0)), 2)
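In the printed offspring, each chromosome appears to be stored as a list of ancestral haplotype labels together with the map positions where each segment starts (our reading of the output, not checked against the XSim documentation). Such mosaics arise from crossovers during meiosis. The following simplified sketch builds one gamete, assuming `chrLength` is measured in Morgans and the Haldane model (no crossover interference); `gamete` is a hypothetical helper, not an XSim function:

```julia
using Random

# Hypothetical helper: one gamete from a parent's two haplotypes
function gamete(hap1::Vector{Int}, hap2::Vector{Int}, mapPos::Vector{Float64},
                chrLength::Float64; rng::AbstractRNG = Random.default_rng())
    # Crossover positions: Poisson process with rate 1 per Morgan,
    # generated from exponential waiting times
    xovers = Float64[]
    pos = randexp(rng)
    while pos < chrLength
        push!(xovers, pos)
        pos += randexp(rng)
    end
    # Start on a randomly chosen haplotype and switch at every crossover
    useFirst = rand(rng, Bool)
    g = similar(hap1)
    k = 1                                    # index of the next crossover
    for (j, x) in enumerate(mapPos)
        while k <= length(xovers) && xovers[k] <= x
            useFirst = !useFirst
            k += 1
        end
        g[j] = useFirst ? hap1[j] : hap2[j]
    end
    return g, xovers
end

mapPos = collect(0:0.01:0.99)                # 100 loci as in build_genome above
hapA, hapB = fill(1, 100), fill(2, 100)      # label the two parental haplotypes 1 and 2
g, xo = gamete(hapA, hapB, mapPos, 1.0; rng = MersenneTwister(7))
println("crossovers at ", round.(xo; digits = 3))
println("haplotype switches: ", count(j -> g[j] != g[j-1], 2:length(g)))
```

Two crossovers falling between the same pair of neighbouring loci cancel out, so the number of visible switches can be smaller than the number of crossovers.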

Finally, all animals, including the four new offspring, are combined into a single cohort.


In [17]:
animals=concatCohorts(animals,offspringMales,offspringFemales);

## Randomly Sample The QTL Positions
We fix the number of QTL and initialize a Boolean index vector of length `numLoci` with all entries set to `false`. A random sample of its entries is then set to `true`; these positions represent the QTL.

In [None]:
nQTL   = 5;
selQTL = fill(false,numLoci);

The `sample()` function draws `nQTL` distinct positions (`replace=false`), which are set to `true` and thereby define the QTL positions.

In [None]:
selQTL[sample(1:numLoci, nQTL, replace=false)] .= true;

All positions that are not QTL are used as markers.

In [None]:
selMrk = .!selQTL;
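`sample` (from StatsBase) with `replace=false` returns distinct loci. A standard-library alternative, shown here as a sketch, takes the first `nQTL` entries of a random permutation. It yields a uniformly random set of `nQTL` loci as well, but for a given seed it will not pick the same positions as `sample`:

```julia
using Random

numLoci = 100
nQTL    = 5
rng     = MersenneTwister(2345)

selQTL = fill(false, numLoci)
# randperm(rng, numLoci) is a random ordering of 1:numLoci; its first
# nQTL entries are nQTL distinct loci (sampling without replacement)
selQTL[randperm(rng, numLoci)[1:nQTL]] .= true

# every locus that is not a QTL is a marker
selMrk = .!selQTL

println(findall(selQTL))
```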

The genotypes of all animals are retrieved and separated into two matrices, one for the QTL and one for the markers.

In [None]:
M = getOurGenotypes(animals)
Q = M[:,selQTL]

In [None]:
X = M[:,selMrk]
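Indexing columns with a Boolean vector keeps exactly the columns marked `true`, so `Q` ends up with `nQTL` columns and `X` with the remaining `numLoci - nQTL`, while both keep one row per animal. A small sketch with a hypothetical matrix:

```julia
# Hypothetical 2×6 genotype matrix with QTL at loci 2 and 5
M = [0 1 2 1 0 2;
     1 2 0 1 1 0]
selQTL = [false, true, false, false, true, false]
selMrk = .!selQTL

Q = M[:, selQTL]   # QTL genotypes: columns 2 and 5
X = M[:, selMrk]   # marker genotypes: the remaining columns

println(size(Q), " ", size(X))   # (2, 2) (2, 4)
```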

## Simulation of breeding values and phenotypic values
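We use a fixed number of significant QTL, which here corresponds to the number of columns of the matrix `Q`. Each QTL gets an effect drawn from a normal distribution with mean 0 and standard deviation `alphaSd`, and the breeding values `a` are computed as `Q*alpha`. The breeding values are then scaled to a genetic variance of 25.0. Finally, phenotypes are obtained by adding an intercept and a random error term.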
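Written out, the model used in the next cells is as follows, with $\sigma_\alpha$ = `alphaSd`, $\sigma^2_A$ = `genVar` = 25, $\sigma^2_e$ = `resVar` = 75, $\mu$ = `intercept` = 100 and $n_Q$ = `nSigQTL`:

$$
\begin{aligned}
\alpha_j &\sim N(0, \sigma_\alpha^2), \qquad j = 1, \dots, n_Q \\
\tilde{a} &= Q\alpha, \qquad a = \tilde{a}\,\sqrt{\sigma^2_A \big/ \widehat{\operatorname{var}}(\tilde{a})} \\
y_i &= \mu + a_i + e_i, \qquad e_i \sim N(0, \sigma^2_e)
\end{aligned}
$$

Because $\operatorname{var}(c\,x) = c^2 \operatorname{var}(x)$, the scaled breeding values have a sample variance of exactly $\sigma^2_A = 25$. The phenotypic variance is then expected to be close to $\sigma^2_A + \sigma^2_e = 100$, which corresponds to a heritability of about $h^2 = 25/100 = 0.25$.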

In [None]:
nSigQTL = size(Q,2)
nObs = size(Q,1)
alphaSd = 1
alpha = rand(Normal(0,alphaSd),nSigQTL)
a = Q*alpha
# scaling breeding values to have variance 25.0
v = var(a)
genVar = 25.0
a *= sqrt(genVar/v)
va = var(a)
# formatted printing
println("genetic variance = $va")
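The scaling step relies on $\operatorname{var}(c\,x) = c^2\operatorname{var}(x)$. A self-contained sketch with hypothetical random genotypes, using `randn` from the standard library instead of `rand(Normal(0, alphaSd), n)` (both draw from a standard normal when `alphaSd = 1`):

```julia
using Random, Statistics

rng = MersenneTwister(2345)
# Hypothetical QTL genotypes (25 animals × 5 QTL, coded 0/1/2) and effects
Q     = rand(rng, 0:2, 25, 5)
alpha = randn(rng, 5)

a = Q * alpha
genVar = 25.0
# Multiplying by c multiplies the variance by c^2, so
# c = sqrt(genVar / var(a)) sets the sample variance to genVar
a .*= sqrt(genVar / var(a))

println(var(a))   # 25.0 up to rounding error
```

Because of this rescaling, the value of `alphaSd` has no influence on the final breeding values: multiplying `alpha` by a positive constant multiplies `a` by the same constant, which the scaling factor then removes.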

The next cell calls `Statistics.mean` and `Statistics.var`, so the standard library `Statistics` must be loaded.

In [None]:
using Statistics

In [None]:
resVar = 75.0
resStd = sqrt(resVar)
e = rand(Normal(0,resStd),nObs)
intercept = 100
y = intercept .+ a + e
yMean = Statistics.mean(y)
yVar = Statistics.var(y)
println("phenotypic mean     = $yMean")
println("phenotypic variance = $yVar")
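With only 25 animals, the printed phenotypic mean and variance can deviate noticeably from their expected values of 100 (`intercept`) and 25 + 75 = 100. A sketch with a large hypothetical sample, again using `randn` instead of `Distributions`, shows the values the simulation aims at:

```julia
using Random, Statistics

rng = MersenneTwister(1)
n = 100_000                        # large hypothetical sample, unlike the 25 animals here
genVar, resVar, intercept = 25.0, 75.0, 100

a = randn(rng, n)
a .*= sqrt(genVar / var(a))        # breeding values with variance 25
e = sqrt(resVar) .* randn(rng, n)  # residuals with variance about 75
y = intercept .+ a .+ e

println("mean = ", mean(y))        # close to 100
println("var  = ", var(y))         # close to 25 + 75 = 100
println("h2   = ", var(a) / var(y))
```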


The generated phenotypic values are assigned to the animals in the `animals` cohort. We start with a single animal.

In [None]:
animals.animalCohort[1].phenVal = y[1]


In [None]:
animals.animalCohort[1].phenVal

In a loop over the vector `y` of phenotypic observations, each value is assigned to the corresponding animal in the cohort `animals`.

In [None]:
for i in 1:nObs
    println("Assigning phenotype ", y[i], " to animal ", i)
    animals.animalCohort[i].phenVal = y[i]
end
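The loop assumes that `y[i]` belongs to the `i`-th animal of `animals.animalCohort`, i.e. that the genotypes, and therefore `y`, follow the cohort order. A minimal sketch with a hypothetical stand-in type `MockAnimal` (not part of XSim) that only has the `phenVal` field:

```julia
# Hypothetical stand-in for XSim.Animal with only the field used here
mutable struct MockAnimal
    phenVal::Float64
end

# -9999.0 mirrors the placeholder values in XSim's printed output
cohort = [MockAnimal(-9999.0) for _ in 1:3]
y = [101.2, 95.7, 108.3]           # hypothetical phenotypes

# y[i] must belong to cohort[i]: same length, same order
@assert length(cohort) == length(y)
for i in eachindex(y)
    cohort[i].phenVal = y[i]
end

println([an.phenVal for an in cohort])   # [101.2, 95.7, 108.3]
```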

We check whether the assignment worked:

In [None]:
P = getOurPhenVals(animals, resVar)

## Writing The Data
Now that we have generated the data, we must write them to files. The data consist of 

* marker and QTL genotypes
* phenotypic observations
* pedigree information

Before the data are written, we delete any files left over from previous runs. Otherwise, the new data would be appended to the old files.
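The reason for deleting old files is that a file opened in append mode keeps its previous content. The sketch below demonstrates this with Julia's own `open(path, "a")` in a temporary directory; it assumes, as stated above, that XSim's output functions append in the same way (the file name is hypothetical):

```julia
# Write to the same file twice in append mode ("a")
path = joinpath(mktempdir(), "demo.phe")

for k in 1:2
    open(path, "a") do io
        println(io, "run $k")
    end
end
println(readlines(path))   # ["run 1", "run 2"]: the second run was appended

# Delete the file first (force = true: no error if it does not exist)
rm(path, force = true)
open(path, "a") do io
    println(io, "run 3")
end
println(readlines(path))   # ["run 3"]
```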

In [None]:
outFile = "data_ex04"
# delete old files first (force = true: no error if a file does not exist)
for ext in (".ped", ".phe", ".brc", ".gen")
    rm(outFile * ext, force = true)
end
# write new output
outputPedigree(animals, outFile)

## Convert This Notebook


In [None]:
;jupyter nbconvert --to slides SimulateDataEx04.ipynb