# Biostat 257 HW2
## Caesar Z. Li
## UID: 704135662


## Question 1
Write down the log-likelihood of the $i$-th datum $(\boldsymbol{y}_i, \boldsymbol{X}_i, \boldsymbol{Z}_i)$ given the parameters $(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma)$.
<br/>
Answer: under the linear mixed-effects model, the marginal distribution of $\boldsymbol{y}_i$ is
$$
\boldsymbol{y}_i \sim N(\boldsymbol{X}_i \boldsymbol{\beta}, \boldsymbol{Z}_i\boldsymbol{\Sigma}\boldsymbol{Z}^{T}_i + \sigma^2\boldsymbol{I}) 
$$

Thus the marginal log-likelihood of the $i$-th datum is
$$
-\frac{n_i}{2}\log(2\pi)-\frac{1}{2}\log\det(\boldsymbol{D})-\frac{1}{2}(\boldsymbol{y}_i - \boldsymbol{X}_i \boldsymbol{\beta})^{T}\boldsymbol{D}^{-1}(\boldsymbol{y}_i - \boldsymbol{X}_i \boldsymbol{\beta})
$$

where $\boldsymbol{D} = \boldsymbol{Z}_i\boldsymbol{\Sigma}\boldsymbol{Z}^{T}_i + \sigma^2\boldsymbol{I}_{n_i}$.

Before moving on, we can make the computation more efficient by applying the Woodbury formula to $\boldsymbol{D}$, so that only a $q \times q$ matrix has to be inverted:

$$
\boldsymbol{D}^{-1} = \frac{1}{\sigma^2}\boldsymbol{I}^{-1} - \frac{1}{\sigma^2}\boldsymbol{I}^{-1}\boldsymbol{Z}_i(\boldsymbol{\Sigma}^{-1} + \frac{1}{\sigma^2}\boldsymbol{Z}^{T}_i\boldsymbol{I}^{-1}\boldsymbol{Z}_i)^{-1}\boldsymbol{Z}^{T}_i\frac{1}{\sigma^2}\boldsymbol{I}^{-1}
$$

$$
= \frac{1}{\sigma^2}\boldsymbol{I} - \frac{1}{\sigma^4}\boldsymbol{Z}_i(\boldsymbol{\Sigma}^{-1} + \frac{1}{\sigma^2}\boldsymbol{Z}^{T}_i\boldsymbol{Z}_i)^{-1}\boldsymbol{Z}^{T}_i
$$


$$
= \frac{1}{\sigma^2}[\boldsymbol{I} - \boldsymbol{Z}_i(\sigma^2\boldsymbol{\Sigma}^{-1} + \boldsymbol{Z}^{T}_i\boldsymbol{Z}_i)^{-1}\boldsymbol{Z}^{T}_i]
$$
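As a quick numerical sanity check of the last line, here is a minimal sketch that compares the Woodbury form with a direct inverse. The dimensions, seed, and the way $\boldsymbol{\Sigma}$ is generated are arbitrary choices made for illustration, not values from the assignment.

```julia
using LinearAlgebra, Random

Random.seed!(1)                      # arbitrary seed, illustration only
n, q = 6, 2                          # small hypothetical dimensions
Z  = randn(n, q)
A  = randn(q, q)
Σ  = A * A' + I                      # a positive definite Σ
σ² = 1.5

D       = Z * Σ * Z' + σ² * I        # n × n marginal covariance
D_inv   = inv(D)                     # direct inverse of the n × n matrix
# Woodbury form: only a q × q linear system is solved
D_inv_w = (I - Z * ((σ² * inv(Σ) + Z' * Z) \ Z')) / σ²

@assert D_inv ≈ D_inv_w
```

The two results agree up to floating-point error, while the Woodbury form avoids factorizing anything larger than $q \times q$.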

Note that the implementation below also uses the determinant identity from Homework 1:

$$
\det(\boldsymbol{D}) = \det(\boldsymbol{Z}_i\boldsymbol{\Sigma}\boldsymbol{Z}^{T}_i + \sigma^2\boldsymbol{I})
$$

$$
= \det(\sigma^2\boldsymbol{I}) \det(\boldsymbol{I}_q + \sigma^{-2}\boldsymbol{L}^{T}\boldsymbol{Z}^{T}_i\boldsymbol{Z}_i\boldsymbol{L})
$$

$$
= (\sigma^2)^{n_i} \det(\boldsymbol{I}_q + \sigma^{-2}\boldsymbol{L}^{T}\boldsymbol{Z}^{T}_i\boldsymbol{Z}_i\boldsymbol{L})
$$
<br/>
where $\boldsymbol{L}$ is the lower triangular Cholesky factor of $\boldsymbol{\Sigma}$, i.e. $\boldsymbol{\Sigma} = \boldsymbol{L}\boldsymbol{L}^{T}$.
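The determinant identity can be checked numerically in the same spirit. The sketch below compares $\log\det(\boldsymbol{Z}_i\boldsymbol{\Sigma}\boldsymbol{Z}^{T}_i + \sigma^2\boldsymbol{I})$ with $n_i\log\sigma^2 + \log\det(\boldsymbol{I}_q + \sigma^{-2}\boldsymbol{L}^{T}\boldsymbol{Z}^{T}_i\boldsymbol{Z}_i\boldsymbol{L})$, again on small made-up inputs that are assumptions for illustration only.

```julia
using LinearAlgebra, Random

Random.seed!(2)                      # arbitrary seed, illustration only
n, q = 6, 2                          # small hypothetical dimensions
Z  = randn(n, q)
A  = randn(q, q)
Σ  = A * A' + I                      # a positive definite Σ
σ² = 1.5
L  = cholesky(Symmetric(Σ)).L        # lower triangular factor, Σ = L * L'

# direct evaluation on the n × n matrix
logdet_direct = logdet(Z * Σ * Z' + σ² * I)
# evaluation through the q × q matrix only
logdet_small  = n * log(σ²) + logdet(I + L' * (Z' * Z) * L / σ²)

@assert logdet_direct ≈ logdet_small
```

Only the $q \times q$ matrix needs to be factorized, which is what makes the log-determinant cheap when $n_i \gg q$.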

## Question 2 Start-up Code



In [1]:
# define a type that holds LMM datum
struct LmmObs{T <: AbstractFloat}
    # data
    y :: Vector{T}
    X :: Matrix{T}
    Z :: Matrix{T}
    # working arrays
    # whatever intermediate arrays you may want to pre-allocate
    res        :: Vector{T}
    storage_q  :: Vector{T}
    ztz        :: Matrix{T}
    storage_qq :: Matrix{T}
end

# constructor
function LmmObs(
        y::Vector{T}, 
        X::Matrix{T}, 
        Z::Matrix{T}) where T <: AbstractFloat
    res        = similar(y)
    storage_q  = Vector{T}(undef, size(Z, 2))
    ztz        = transpose(Z) * Z
    storage_qq = similar(ztz)
    LmmObs(y, X, Z, res, storage_q, ztz, storage_qq)
end

LmmObs

In [39]:
function logl!(
        obs :: LmmObs{T}, 
        β   :: Vector{T}, 
        L   :: Matrix{T}, 
        σ²  :: T) where T <: AbstractFloat
    n, p, q = size(obs.X, 1), size(obs.X, 2), size(obs.Z, 2)    
    # storage_qq = I + Lᵀ ZᵀZ L / σ² (the q × q matrix in the determinant identity)
    BLAS.gemm!('T', 'N', 1 / σ², L, obs.ztz, 0.0, obs.storage_qq)
    BLAS.trmm!('R', 'L', 'N', 'N', 1.0, L, obs.storage_qq)
    for i = 1:q
        obs.storage_qq[i, i] += 1
    end
    
    # log det(I + Lᵀ ZᵀZ L / σ²) via an in-place Cholesky factorization
    dtmt = logdet(cholesky!(Symmetric(obs.storage_qq)))
    
    # storage_qq = σ² Σ⁻¹ + ZᵀZ with Σ = L Lᵀ (the q × q matrix in the Woodbury formula)
    BLAS.gemm!('N', 'T', 1.0, L, L, 0.0, obs.storage_qq)
    obs.storage_qq .= inv(cholesky!(Symmetric(obs.storage_qq))) * σ² + obs.ztz
    # res = y - Xβ; read y from the observation itself, not from a global variable
    obs.res .= obs.y
    BLAS.gemv!('N', -1.0, obs.X, β, 1.0, obs.res)
    # storage_q = Zᵀ res
    BLAS.gemv!('T', 1.0, obs.Z, obs.res, 0.0, obs.storage_q)
    sig_inv = 1 / σ²
    
    return -(n//2) * log(2π) - (1//2) * (n * log(σ²) + dtmt) -
        (1//2) * sig_inv * (dot(obs.res, obs.res) - 
        dot(obs.storage_q, cholesky!(Symmetric(obs.storage_qq)) \ obs.storage_q))
end

logl! (generic function with 1 method)
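
Read against Question 1, `logl!` evaluates the log-likelihood in the following form, obtained by combining the Woodbury formula and the determinant identity. The shorthand $\boldsymbol{r}_i$ and $\boldsymbol{M}_i$ is introduced here for readability and does not appear in the assignment.

$$
\ell_i = -\frac{n_i}{2}\log(2\pi) - \frac{1}{2}\left[n_i\log\sigma^2 + \log\det\left(\boldsymbol{I}_q + \sigma^{-2}\boldsymbol{L}^{T}\boldsymbol{Z}^{T}_i\boldsymbol{Z}_i\boldsymbol{L}\right)\right] - \frac{1}{2\sigma^2}\left[\boldsymbol{r}_i^{T}\boldsymbol{r}_i - (\boldsymbol{Z}^{T}_i\boldsymbol{r}_i)^{T}\boldsymbol{M}_i^{-1}(\boldsymbol{Z}^{T}_i\boldsymbol{r}_i)\right]
$$

where $\boldsymbol{r}_i = \boldsymbol{y}_i - \boldsymbol{X}_i\boldsymbol{\beta}$ and $\boldsymbol{M}_i = \sigma^2\boldsymbol{\Sigma}^{-1} + \boldsymbol{Z}^{T}_i\boldsymbol{Z}_i$. In the code, `obs.res` holds $\boldsymbol{r}_i$, `obs.storage_q` holds $\boldsymbol{Z}^{T}_i\boldsymbol{r}_i$, `obs.storage_qq` ends up holding $\boldsymbol{M}_i$, and `dtmt` is the log-determinant of the $q \times q$ matrix.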

## Question 3 Correctness (15 pts)

In [30]:
using BenchmarkTools, Distributions, LinearAlgebra, Random

Random.seed!(257)
# dimension
n, p, q = 2000, 5, 3
# predictors
X  = [ones(n) randn(n, p - 1)]
Z  = [ones(n) randn(n, q - 1)]
# parameter values
β  = [2.0; -1.0; rand(p - 2)]
σ² = 1.5
Σ  = fill(0.1, q, q) + 0.9I
# generate y
y  = X * β + Z * rand(MvNormal(Σ)) + sqrt(σ²) * randn(n)

# form an LmmObs object
obs = LmmObs(y, X, Z)

LmmObs{Float64}([5.739048710854997, 5.705395720270055, 2.7368899643050355, 1.4201223592870758, -0.2099433929180451, 3.5886971824690486, -1.3778538474575956, -0.08406026821055201, -2.208007878450787, 1.309558511583542  …  1.2947876180172686, -1.970126530439509, -2.040383092851745, -1.459029682565868, 0.18616271231054726, 1.0681247149968018, 2.2292080864625254, 1.195238535460355, 1.1310626949609706, -0.43507816286713696], [1.0 -2.506566300781151 … 0.5863780184080776 1.1092991040518192; 1.0 -0.974090320735282 … 1.4143507320583761 0.45608259198567447; … ; 1.0 -1.0076371084863895 … -1.3241972696483915 1.4547609424344008; 1.0 0.38036793320364776 … -0.5857507269707397 1.796804266836504], [1.0 -0.6380567326757537 1.4738982136806946; 1.0 -2.0711110232845926 0.21422658785510312; … ; 1.0 0.5917731507133951 -0.9163364468263059; 1.0 0.9463732120394507 -0.325860403600768], [8.0e-323, 8.4e-323, 9.0e-323, 9.4e-323, 1.0e-322, 1.04e-322, 1.1e-322, 1.14e-322, 1.2e-322, 1.24e-322  …  3.26e-322, 3.3e-322, 

In [40]:
μ  = X * β
Ω  = Z * Σ * transpose(Z) +  σ² * I
mvn = MvNormal(μ, Symmetric(Ω)) # MVN(μ, Ω)
logpdf(mvn, y)

-3247.456858063827

Now check my answer against the value from the standard package above:

In [41]:
L = Matrix(cholesky(Σ).L)
logl!(obs, β, L, σ²)

-3247.456858063825

In [42]:
@assert logl!(obs, β, Matrix(cholesky(Σ).L), σ²) ≈ logpdf(mvn, y)

## Question 4 Efficiency (30 pts)
<br/>

Benchmark your code and compare it to the Distributions.jl function `logpdf`.

In [6]:
# benchmark the `logpdf` function in Distributions.jl
bm1 = @benchmark logpdf($mvn, $y)

BenchmarkTools.Trial: 
  memory estimate:  30.55 MiB
  allocs estimate:  4
  --------------
  minimum time:     57.799 ms (4.07% GC)
  median time:      69.587 ms (2.03% GC)
  mean time:        69.070 ms (6.87% GC)
  maximum time:     232.618 ms (70.73% GC)
  --------------
  samples:          73
  evals/sample:     1

In [43]:
# benchmark your implementation
L = Matrix(cholesky(Σ).L)
bm2 = @benchmark logl!($obs, $β, $L, $σ²)

BenchmarkTools.Trial: 
  memory estimate:  992 bytes
  allocs estimate:  19
  --------------
  minimum time:     41.053 μs (0.00% GC)
  median time:      47.407 μs (0.00% GC)
  mean time:        48.125 μs (0.00% GC)
  maximum time:     344.959 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [44]:
# these are the points you'll get
clamp(median(bm1).time / median(bm2).time / 1000 * 30, 0, 30)

30.0
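
To see where the 30.0 comes from, the score can be reproduced by hand from the two median times printed in the benchmark reports above. This is only a sketch of the arithmetic; the variable names are made up here, and the times are the rounded values as displayed.

```julia
# median times as printed in the two benchmark reports, in nanoseconds
t_logpdf = 69.587e6   # 69.587 ms for Distributions.jl `logpdf`
t_logl   = 47.407e3   # 47.407 μs for `logl!`

speedup = t_logpdf / t_logl                 # roughly a 1468-fold speedup
score   = clamp(speedup / 1000 * 30, 0, 30) # capped at the 30-point maximum
```

Since the speedup exceeds 1000, the unclamped value is above 30 and the score is capped at the maximum.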

## Question 5 Memory (30 pts)
<br/>
You want to avoid memory allocation in the "hot" function `logl!`. You will lose 1 point for each `1 KiB = 1024` bytes of memory allocated. In other words, the points you get for this question are

In [45]:
clamp(30 - median(bm2).memory / 1024, 0, 30)

29.03125
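
The same value follows from the 992 bytes reported in the memory estimate for `logl!` above. A short sketch of the arithmetic, with illustrative variable names:

```julia
bytes   = 992                       # memory estimate reported for `logl!`
penalty = bytes / 1024              # points lost: 1 point per KiB allocated
score   = clamp(30 - penalty, 0, 30)
```

Here `penalty` is 0.96875, which leaves 29.03125 of the 30 points.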

## Question 6 Misc (15 pts)

Let me know if you had any issues running my code. It runs perfectly here on the cluster.