# Chapter 3.3
## A Model for an Animal Evaluation (Animal Model)

Consider the following data set on the pre-weaning gain (WWG) of beef calves, all assumed to be reared under the same management conditions. The objective is to estimate the effects of sex and to predict breeding values for all animals, including animals 1, 2 and 3, which appear only as parents and have no records.

Assume $\sigma^2_u = 20$ and $\sigma^2_e = 40$, so that $\lambda = \sigma^2_e/\sigma^2_u = 40/20 = 2$.

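```julia
using JWAS, DataFrames, CSV, LinearAlgebra, Statistics

# Records of animals 4-8; animals 1-3 appear only as parents.
# The dam of animal 4 is unknown and is coded "NA" here.
data = DataFrame(ID=[4,5,6,7,8], sex=["male","female","female","male","male"],
                 sire=[1,3,1,4,3], dam=["NA",2,2,5,6], WWG=[4.5,2.9,3.9,3.5,5.0])
```

| Row | ID | sex | sire | dam | WWG |
|---|---|---|---|---|---|
| | Int64 | String | Int64 | Any | Float64 |
| 1 | 4 | male | 1 | NA | 4.5 |
| 2 | 5 | female | 3 | 2 | 2.9 |
| 3 | 6 | female | 1 | 2 | 3.9 |
| 4 | 7 | male | 4 | 5 | 3.5 |
| 5 | 8 | male | 3 | 6 | 5.0 |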


The model to describe the observations is:

$y_{ij} = p_i + a_j +e_{ij}$

where: 

$y_{ij} = $ the WWG of the $j$th calf of the $i$th sex

$p_i = $ the fixed effect of the $i$th sex

$a_j = $ the random effect of the $j$th calf

$e_{ij} = $ the random error effect

### The Linear Mixed Model

Note that the model above contains both a fixed effect and a random effect. A model in which some parameters are treated as fixed and others as random is a mixed model. The general form of the linear mixed model is:

$y = Xb + Zu + e$

where:

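$y$ = vector of observations

$X$ = design (incidence) matrix relating observations to the fixed effects

$b$ = vector of unknown fixed effects

$Z$ = design (incidence) matrix relating observations to the random effects

$u$ = vector of unknown random effects, $u \sim N(0, G)$; in the animal model $G = A\sigma^2_u$, where $A$ is the numerator relationship matrix

$e$ = vector of unknown random residual effects, $e \sim N(0, R)$, here with $R = I\sigma^2_e$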


So, working with our example, let's set up the vectors and matrices.

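```julia
y = [4.5, 2.9, 3.9, 3.5, 5.0]          # WWG records of animals 4-8

X = [1 0; 0 1; 0 1; 1 0; 1 0]          # rows: records; columns: male, female
b = ["male", "female"]                 # labels of the fixed effects in b

# rows: records; columns: animals 1-8 (animals 1-3 have no records)
Z = [0 0 0 1 0 0 0 0
     0 0 0 0 1 0 0 0
     0 0 0 0 0 1 0 0
     0 0 0 0 0 0 1 0
     0 0 0 0 0 0 0 1]
u = ["BV1", "BV2", "BV3", "BV4", "BV5", "BV6", "BV7", "BV8"];  # labels of the breeding values in u
```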
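Typing $X$ and $Z$ by hand works for five records but becomes error-prone for larger data sets. A minimal sketch, using only Base Julia, builds both incidence matrices from the `ID` and `sex` columns. The level order (male, female) and the numbering of animals 1 to 8 are choices made here to reproduce the matrices above.

```julia
# Sketch: build the incidence matrices X and Z from the data columns
# instead of typing them by hand (Base Julia only).

ID  = [4, 5, 6, 7, 8]                          # animal of each record
sex = ["male", "female", "female", "male", "male"]

levels_sex = ["male", "female"]                # order of the fixed effects in b
n_animals  = 8                                 # animals 1-8 appear in the pedigree

# X[i, k] = 1 if record i belongs to sex level k
X = [Int(sex[i] == levels_sex[k]) for i in eachindex(sex), k in eachindex(levels_sex)]

# Z[i, j] = 1 if record i was measured on animal j
Z = [Int(ID[i] == j) for i in eachindex(ID), j in 1:n_animals]

println(X)
println(Z)
```

Adding a column to `levels_sex` or raising `n_animals` is all that is needed when new levels or pedigree animals appear.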

### Best Linear Unbiased Estimation and Prediction (BLUE & BLUP)

When working with a linear mixed model, we want to solve for the unknown parameters. We speak of <b>estimating</b> fixed effects, which gives best linear unbiased estimates (BLUE), and <b>predicting</b> random effects, which gives best linear unbiased predictions (BLUP).

### Mixed Model Equations (MME)

Henderson (1950) presented the mixed model equations (MME), which solve for $b$ and $u$ simultaneously. The MME can be derived by maximizing the joint probability density of $y$ and $u$. In matrix notation, the MME can be expressed as:

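$\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + G^{-1}
\end{bmatrix}$
$\begin{bmatrix}
\hat{b} \\
\hat{u}
\end{bmatrix}$
=
$\begin{bmatrix}
X'R^{-1}y \\
Z'R^{-1}y
\end{bmatrix}$

With $R = I\sigma^2_e$ and $G = A\sigma^2_u$, multiplying both sides by $\sigma^2_e$ gives the form used for this example:

$\begin{bmatrix}
X'X & X'Z \\
Z'X & Z'Z + A^{-1}\lambda
\end{bmatrix}$
$\begin{bmatrix}
\hat{b} \\
\hat{u}
\end{bmatrix}$
=
$\begin{bmatrix}
X'y \\
Z'y
\end{bmatrix}$

where $\lambda = \sigma^2_e/\sigma^2_u = 2$.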

Before we try to solve the MME for our example, let's look at the elements of the MME in our example:

```julia
X' * X
```

```
2×2 Array{Int64,2}:
 3  0
 0  2
```

The $X$ matrix relates records to the fixed effects, so each row and column of $X'X$ corresponds to one fixed effect. Because every record belongs to exactly one sex, $X'X$ is diagonal, and its diagonal elements count the records for each level. The first fixed effect is male and the second is female, so $X'X$ shows 3 records for males and 2 for females. Male and female are the 2 <b>levels</b> of the sex <b>factor</b>.

```julia
Z' * Z
```

```
8×8 Array{Int64,2}:
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  1  0  0  0  0
 0  0  0  0  1  0  0  0
 0  0  0  0  0  1  0  0
 0  0  0  0  0  0  1  0
 0  0  0  0  0  0  0  1
```

The $Z$ matrix relates records to the random animal effects. The diagonal of $Z'Z$ counts the records of each animal: animals 1, 2 and 3 have no records, so their rows and columns are zero, while animals 4 through 8 each have one record.

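In the MME, $G^{-1}$ is added to this block. With $R = I\sigma^2_e$ and $G = A\sigma^2_u$, and after multiplying through by $\sigma^2_e$, the added term is $A^{-1}\lambda$. This term links animals through the pedigree and makes the coefficient matrix nonsingular, even though $Z'Z$ alone is singular because animals 1 to 3 have no records.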
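The claim that $Z'Z$ is singular while $Z'Z + A^{-1}\lambda$ is not can be checked numerically. The sketch below assumes unknown parents are coded 0 and that parents are listed before their offspring; it builds $A$ with the tabular method, inverts it, and compares the ranks of the two matrices.

```julia
using LinearAlgebra

# Pedigree of animals 1-8 (0 = unknown parent); parents precede offspring
sire = [0, 0, 0, 1, 3, 1, 4, 3]
dam  = [0, 0, 0, 0, 2, 2, 5, 6]
n = length(sire)

# Numerator relationship matrix A by the tabular method
A = zeros(n, n)
for i in 1:n
    s, d = sire[i], dam[i]
    # diagonal: 1 + F_i, where F_i = A[s, d] / 2 when both parents are known
    A[i, i] = 1.0 + ((s > 0 && d > 0) ? 0.5 * A[s, d] : 0.0)
    for j in 1:i-1
        # off-diagonal: average relationship of animal j with the parents of i
        a = 0.0
        if s > 0
            a += 0.5 * A[j, s]
        end
        if d > 0
            a += 0.5 * A[j, d]
        end
        A[i, j] = a
        A[j, i] = a
    end
end

Ainv = inv(A)
λ = 40 / 20                                   # σ²_e / σ²_u

# Z maps the records of animals 4-8 to the animal effects 1-8
Z = [Float64(r == j) for r in 4:8, j in 1:n]

println("rank of Z'Z:        ", rank(Z' * Z))
println("rank of Z'Z + λA⁻¹: ", rank(Z' * Z + λ * Ainv))
```

$Z'Z$ has rank 5 (one per animal with a record), while the augmented block has full rank 8, so every animal, with or without records, gets a unique solution.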

```julia
X' * Z
```

```
2×8 Array{Int64,2}:
 0  0  0  1  0  0  1  1
 0  0  0  0  1  1  0  0
```

```julia
Z' * X
```

```
8×2 Array{Int64,2}:
 0  0
 0  0
 0  0
 1  0
 0  1
 0  1
 1  0
 1  0
```

$X'Z$ and $Z'X$ are transposes of each other. In $X'Z$, the rows correspond to the levels of the fixed effect (male, female) and the columns to animals; element $(i, j)$ is the number of records of animal $j$ in sex level $i$. For example, the 1 in row 1, column 4 says that animal 4 has one record as a male. $Z'X$ holds the same information with rows and columns swapped.

```julia
X' * y
```

```
2-element Array{Float64,1}:
 13.0
  6.8
```

The elements of $X'y$ are the sums of the records for each level of the fixed effect: $4.5 + 3.5 + 5.0 = 13.0$ for males and $2.9 + 3.9 = 6.8$ for females.

```julia
Z' * y
```

```
8-element Array{Float64,1}:
 0.0
 0.0
 0.0
 4.5
 2.9
 3.9
 3.5
 5.0
```

The elements of $Z'y$ are the sums of the records for each "level" of the random effect, and each animal is its own level. If animal 4 had two records, the fourth element of $Z'y$ would be the sum of both.
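To make the point about repeated records concrete, the sketch below adds one hypothetical second record of 4.0 for animal 4 (this value is made up and is not part of the example data) and recomputes $Z'y$ and $Z'Z$.

```julia
using LinearAlgebra

# Animal of each record: animals 4-8, plus a HYPOTHETICAL second record on animal 4
rec_id = [4, 5, 6, 7, 8, 4]
y      = [4.5, 2.9, 3.9, 3.5, 5.0, 4.0]   # last value is invented for illustration

# Z[i, j] = 1 when record i belongs to animal j (animals 1-8)
Z = [Float64(rec_id[i] == j) for i in eachindex(rec_id), j in 1:8]

Zty = Z' * y        # element j: sum of the records of animal j
ZtZ = Z' * Z        # diagonal element j: number of records of animal j

println("Z'y       = ", Zty)
println("diag(Z'Z) = ", diag(ZtZ))
```

The fourth element of $Z'y$ becomes $4.5 + 4.0 = 8.5$, and the fourth diagonal element of $Z'Z$ becomes 2.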

## Coding

Now that we have discussed the concept of the MME, we will use the JWAS package to evaluate our example. First, let's build our model equation:

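```julia
var_u = 20                          # additive genetic variance σ²_u
var_e = 40                          # residual variance σ²_e

model_equation = "WWG = sex + ID"   # sex: fixed effect; ID: animal effect
R = var_e
model = build_model(model_equation, R);
```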

Next, we need the pedigree so that JWAS can build the inverse of the numerator relationship matrix, $A^{-1}$, from which $G^{-1} = A^{-1}/\sigma^2_u$ follows.

```julia
ped = data[:, [:ID, :sire, :dam]]   # columns: animal, sire, dam
CSV.write("pedigree.txt", ped)

pedigree = get_pedigree("pedigree.txt", header=true);
```

```
The delimiter in pedigree.txt is ','.
Finished!
```


```julia
G = var_u                           # variance of the animal (polygenic) effect, σ²_u

set_random(model, "ID", pedigree, G)
```

We have completed our model! Now, we run MCMC:

```julia
outputMCMCsamples(model)
# 5,000 iterations, first 100 discarded as burn-in, samples saved every 100 iterations
out = runMCMC(model, data, chain_length=5000, output_samples_frequency=100, burnin=100)
```

[0m[1mA Linear Mixed Model was build using model equations:[22m

WWG = sex + ID

[0m[1mModel Information:[22m

Term            C/F          F/R            nLevels
sex             factor       fixed                2
ID              factor       random               9

[0m[1mMCMC Information:[22m

methods                        conventional (no markers)
chain_length                                   5000
burnin                                          100
estimateScale                                 false
starting_value                                false
printout_frequency                             5001
output_samples_frequency                        100
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0

[0m[1mHyper-parameters Information:[22m

residual variances:                          40.000
genetic variances (polygenic):
                                         

[32mrunning MCMC for conventional (no markers)...100%|██████| Time: 0:00:01[39m


Dict{Any,Any} with 4 entries:
  "Posterior mean of polyg… => [17.6022]
  "EBV_WWG"                 => 9×2 DataFrame…
  "Posterior mean of resid… => 20.6377
  "Posterior mean of locat… => 11×4 DataFrame…

`runMCMC` returns a dictionary of results. To see what it contains, let's list its keys:

```julia
keys(out)
```

```
Base.KeySet for a Dict{Any,Any} with 4 entries. Keys:
  "Posterior mean of polygenic effects covariance matrix"
  "EBV_WWG"
  "Posterior mean of residual variance"
  "Posterior mean of location parameters"
```

Now, let's look at the posterior means of the location parameters, that is, the sex effects and the animal breeding values:

```julia
out["Posterior mean of location parameters"]
```

| Row | Trait | Effect | Level | Estimate |
|---|---|---|---|---|
| | Any | Any | Any | Any |
| 1 | 1 | sex | male | 4.51244 |
| 2 | 1 | sex | female | 3.54517 |
| 3 | 1 | ID | 3 | -0.230036 |
| 4 | 1 | ID | 1 | 0.0954544 |
| 5 | 1 | ID | 2 | -0.0957728 |
| 6 | 1 | ID | 6 | 0.13906 |
| 7 | 1 | ID | 8 | 0.0598715 |
| 8 | 1 | ID | | -0.0281775 |
| 9 | 1 | ID | 4 | -0.022299 |
| 10 | 1 | ID | 5 | -0.402696 |


In Raphael A. Mrode's book, *Linear Models for the Prediction of Animal Breeding Values*, the MME for this example are solved directly, giving the following solutions:

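```julia
# Mrode's MME solutions, ordered like the rows of the JWAS output:
# male, female, then animals 3, 1, 2, 6, 8, (empty ID level → 0), 4, 5, 7
MME_res = [4.358, 3.404, -0.041, 0.098, -0.019, 0.177, 0.183, 0, -0.009, -0.186, -0.249];
```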

To compare the two sets of results, we compute the correlation between the JWAS posterior means and Mrode's solutions:

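```julia
MCMC_res = out["Posterior mean of location parameters"]

# correlation between the JWAS posterior means (Estimate column) and Mrode's solutions
cor(Float64.(MCMC_res[:, :Estimate]), MME_res)
```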

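```
0.9991370372849071
```

The correlation is close to 1, but it is dominated by the two sex effects, which are more than an order of magnitude larger than the animal solutions. Individual breeding values agree less closely; for example, animal 5 gets -0.403 from MCMC versus -0.186 from the MME. Part of the difference arises because this run sampled the variance components instead of fixing them at $\sigma^2_u = 20$ and $\sigma^2_e = 40$: their posterior means were about 17.6 and 20.6. The posterior means also carry Monte Carlo error from a 5,000-iteration chain. Finally, JWAS reported 9 levels for ID, one more than the 8 animals in the pedigree; the extra level with an empty name appears to come from the unknown dam of animal 4 being coded as "NA", and Mrode's vector uses 0 in that position.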
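For comparison with the MCMC run, the MME can also be solved directly in a few lines. The sketch below builds $A^{-1}$ with Henderson's rules (valid here without an inbreeding correction because no animal in this pedigree is inbred), solves the MME with $\lambda = 2$, and then correlates the animal solutions with the JWAS posterior means printed in the table above. Only animals 1 to 6 and 8 enter that correlation, because the row for animal 7 is not shown in the printed table.

```julia
using LinearAlgebra, Statistics

# Data: records on animals 4-8; X columns are male, female
y = [4.5, 2.9, 3.9, 3.5, 5.0]
X = Float64[1 0; 0 1; 0 1; 1 0; 1 0]
n = 8
Z = [Float64(r == j) for r in 4:8, j in 1:n]

# Pedigree (0 = unknown parent)
sire = [0, 0, 0, 1, 3, 1, 4, 3]
dam  = [0, 0, 0, 0, 2, 2, 5, 6]

# A⁻¹ by Henderson's rules, ignoring inbreeding (no animal here is inbred)
Ainv = zeros(n, n)
for i in 1:n
    s, d = sire[i], dam[i]
    α = (s > 0 && d > 0) ? 2.0 : (s > 0 || d > 0) ? 4 / 3 : 1.0
    Ainv[i, i] += α
    parents = filter(>(0), [s, d])
    for p in parents
        Ainv[i, p] -= α / 2
        Ainv[p, i] -= α / 2
        for q in parents
            Ainv[p, q] += α / 4
        end
    end
end

# Mixed model equations with λ = σ²_e / σ²_u
λ = 40 / 20
XtX, XtZ, ZtZ = X' * X, X' * Z, Z' * Z
LHS = [XtX XtZ; XtZ' ZtZ + λ * Ainv]
RHS = vcat(X' * y, Z' * y)
sol = LHS \ RHS
bhat = sol[1:2]        # male, female
uhat = sol[3:end]      # animals 1-8

println("sex effects (male, female): ", round.(bhat; digits = 3))
println("animal solutions 1-8:       ", round.(uhat; digits = 3))

# JWAS posterior means copied from the printed table (animal 7 not shown there)
shown = [1, 2, 3, 4, 5, 6, 8]
mcmc  = [0.0954544, -0.0957728, -0.230036, -0.022299, -0.402696, 0.13906, 0.0598715]
println("cor(animals only) = ", cor(mcmc, uhat[shown]))
```

The direct solution reproduces Mrode's values to three decimals. With these numbers the animal-only correlation comes out at roughly 0.93, noticeably lower than the 0.999 obtained when the two sex effects are included.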