## Multilevel Mixed Models

The general linear mixed model:

$$Y = X\beta + Z\gamma + e$$

The variables have the following sizes:

Y (N x 1)

X (N x (1+k))

$\beta$ ((1+k) x 1)

Z (N x r*g), where r is the number of random effects and g is the number of groups

$\gamma$ (r*g x 1)

e (N x 1)

Z contains the predictors of the random effects: the intercept column and any covariates that have random slopes.

$\gamma$ contains the random effects for each group (r effects per group, stacked over the g groups).
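
As a minimal sketch of how $Z$ can be laid out, assume a hypothetical toy data set (not from these notes) with g = 2 groups of three observations each and r = 2 random effects per group: an intercept and a slope on a made-up covariate `x`. Each group then owns its own pair of columns:

```python
import numpy as np

# Hypothetical toy data (not from the notes): 2 groups, 3 observations each
group = np.array([0, 0, 0, 1, 1, 1])          # group index of each observation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # covariate with a random slope

N = len(group)  # number of observations
g = 2           # number of groups
r = 2           # random effects per group: intercept and slope

# Z is N x (r*g); group j owns columns r*j (intercept) and r*j + 1 (slope)
Z = np.zeros((N, r * g))
for i in range(N):
    j = group[i]
    Z[i, r * j] = 1.0       # random intercept indicator
    Z[i, r * j + 1] = x[i]  # random slope covariate

print(Z.shape)
print(Z)
```

Rows belonging to one group are zero in every other group's columns, which is what later makes $ZGZ^T$ block diagonal.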

![Screen%20Shot%202018-07-12%20at%2011.02.04%20PM.png](attachment:Screen%20Shot%202018-07-12%20at%2011.02.04%20PM.png)



Random effect assumptions:
    Multivariate Normal across the r random effects in each of the g groups
    Mean vector 0; covariance matrix G (block diagonal, one block per group)
    
$$\gamma \sim N_{rg}(0, G)$$

Error term assumptions:
    Multivariate Normal (within group)
    Mean vector 0; covariance matrix R
    
$$e \sim N_N(0, R)$$
    
Assumptions for the model as a whole:

    The conditional distribution of Y is multivariate Normal:
        The mean vector is the predicted values of y
        
        The covariance matrix is a combination of the random effects 
        and error term covariance matrices
        
$$Y \mid X, Z \sim N_N(X \beta, ZGZ^T + R)$$


$$\Sigma  = ZGZ^T + R$$

The new covariance matrix is block diagonal where each block is a matrix representing the covariance matrix of a group/cluster of observations:

![Screen%20Shot%202018-07-13%20at%201.17.54%20AM.png](attachment:Screen%20Shot%202018-07-13%20at%201.17.54%20AM.png)
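
The block-diagonal structure can be checked numerically. The sketch below assumes the simplest case, a random-intercept model with hypothetical variances `tau2` (random intercept) and `sigma2` (error), two groups, and three observations per group; none of these values come from the notes.

```python
import numpy as np

g, n = 2, 3               # hypothetical: 2 groups with 3 observations each
tau2, sigma2 = 2.0, 1.0   # assumed random-intercept and error variances

# One indicator column per group: Z is (g*n) x g
Z = np.kron(np.eye(g), np.ones((n, 1)))
G = tau2 * np.eye(g)          # covariance of the random intercepts
R = sigma2 * np.eye(g * n)    # independent errors

Sigma = Z @ G @ Z.T + R
print(Sigma)
```

Within a group every pair of observations shares covariance `tau2`, the diagonal holds `tau2 + sigma2`, and entries linking different groups are zero.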


## Model Estimates

This model does not have a closed-form estimator; the estimation process is iterative. Two estimators are commonly used: maximum likelihood (ML) and residual maximum likelihood (REML).

The goal of ML estimation is to pick the set of parameters that maximizes the likelihood function, in practice usually the log-likelihood.

For this specific model we need to know $\beta, \gamma, G, R$.

The log-likelihood function is the log of the density of the assumed MVN model:

$$N_N(X\beta, V = ZGZ^T + R)$$
Using the technique of estimated generalized least squares:

first obtain estimates of $G$ and $R$: $\hat{G}$ and $\hat{R}$

then find $\beta$ using $\hat{G}$ and $\hat{R}$

through $\hat{V} = Z\hat{G}Z^T + \hat{R}$

finally estimating:
$$\hat{\beta} = (X^T\hat{V}^{-1}X)^{-1}X^T\hat{V}^{-1} Y$$
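
The generalized least squares step can be sketched in NumPy. The helper name `gls_beta` and the toy inputs are hypothetical; the data are noise-free, so the estimate should recover the assumed coefficients for any positive definite $\hat{V}$.

```python
import numpy as np


def gls_beta(X, V, Y):
    """Generalized least squares estimate (X' V^-1 X)^-1 X' V^-1 Y."""
    Vinv_X = np.linalg.solve(V, X)  # V^-1 X without forming V^-1 explicitly
    Vinv_Y = np.linalg.solve(V, Y)  # V^-1 Y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)


# Hypothetical toy inputs: intercept plus one covariate, 2 groups of 3
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
beta_true = np.array([1.0, 0.5])
Y = X @ beta_true  # noise-free response, for checking only

Z = np.kron(np.eye(2), np.ones((3, 1)))  # random-intercept design
G_hat = 2.0 * np.eye(2)                  # assumed estimate of G
R_hat = 1.0 * np.eye(6)                  # assumed estimate of R
V_hat = Z @ G_hat @ Z.T + R_hat

beta_hat = gls_beta(X, V_hat, Y)
print(beta_hat)
```

Solving linear systems with `np.linalg.solve` rather than inverting $\hat{V}$ is a design choice for numerical stability; it computes the same quantity as the formula.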

The most important part is picking $G$ and $R$ and then substituting them into the log-likelihood function:


$$l(G,R) = -\frac{1}{2}\log|\hat{V}| - \frac{1}{2} r^T\hat{V}^{-1} r - \frac{N}{2}\log(2\pi)$$
where 
$$r = Y-X\hat{\beta}$$
$$= Y-X(X^T\hat{V}^{-1}X)^{-1}X^T\hat{V}^{-1}Y$$
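
As a sketch of how this log-likelihood can be evaluated for candidate values of $G$ and $R$, the hypothetical function `log_likelihood` below forms $V$, computes the generalized least squares estimate of $\beta$, and plugs the residual into the formula. The toy data and candidate matrices are assumptions for illustration.

```python
import numpy as np


def log_likelihood(Y, X, Z, G, R):
    """Log-likelihood l(G, R) with beta replaced by its GLS estimate."""
    N = len(Y)
    V = Z @ G @ Z.T + R
    Vinv_X = np.linalg.solve(V, X)
    Vinv_Y = np.linalg.solve(V, Y)
    beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)
    resid = Y - X @ beta_hat                  # r = Y - X beta_hat
    _, logdet = np.linalg.slogdet(V)          # log|V|, stable for large N
    quad = resid @ np.linalg.solve(V, resid)  # r' V^-1 r
    return -0.5 * logdet - 0.5 * quad - 0.5 * N * np.log(2 * np.pi)


# Hypothetical toy data: two groups whose means clearly differ
Y = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
X = np.ones((6, 1))                      # intercept only
Z = np.kron(np.eye(2), np.ones((3, 1)))  # random intercept per group
R = np.eye(6)

ll_no_random_effect = log_likelihood(Y, X, Z, 0.0 * np.eye(2), R)
ll_random_intercept = log_likelihood(Y, X, Z, 4.0 * np.eye(2), R)
print(ll_no_random_effect, ll_random_intercept)
```

For these data the candidate with a nonzero random-intercept variance gives the higher log-likelihood. An iterative optimizer would search over $G$ and $R$ in the same way rather than comparing two hand-picked candidates.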


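ML estimation performs well when the sample size is large, but its variance estimates are biased downward in smaller samples because it does not account for the degrees of freedom used to estimate $\beta$. Residual ML (REML) corrects for this.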
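
The bias is easiest to see in the special case with no random effects and $R = \sigma^2 I$. There the ML estimate of $\sigma^2$ divides the residual sum of squares by $N$, while the REML estimate divides by $N - p$, where $p$ is the number of columns of $X$. The data below are made up for illustration.

```python
import numpy as np

# Hypothetical data: N = 6 observations, p = 2 fixed effects
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
Y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])
N, p = X.shape

# Ordinary least squares fit (the GLS estimate when V is proportional to I)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
rss = np.sum((Y - X @ beta_hat) ** 2)  # residual sum of squares

sigma2_ml = rss / N          # ML estimate: ignores the p estimated coefficients
sigma2_reml = rss / (N - p)  # REML estimate: adjusts for them

print(sigma2_ml, sigma2_reml)
```

The two estimates differ by the factor $N / (N - p)$, which approaches 1 as $N$ grows; this is why ML is acceptable for large samples.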

In [1]:
def root_Finding_Func(volume):
    return (((8*.8) / (3*volume - 1)) - (3 / (volume**2))) - .4

In [2]:
import numpy as np


def newtonMethod(root_Finding_Func, derivative_Pr, xInit, tolerance=1e-8):
    """
    Returns the root of the input function and the number of iterations it takes as
    a pair, using Newton's method.
    
    Keywords:
    root_Finding_Func: function for which the root is to be found.
    derivative_Pr: function returning the derivative of root_Finding_Func at x.
    xInit: initial guess for the root.
    tolerance: iteration stops once the absolute size of the Newton step is at
       most this value. Default is 1e-8.
    
    """
    
    # Initialize the variables
    fx = root_Finding_Func
    fp = derivative_Pr
    
    step = 0
    f = fx(xInit)
    df = fp(xInit)
    dx = -f/df  # first Newton step, used to decide whether to iterate at all
    x = xInit
    
    # Run the iterations in the algorithm
    while np.abs(dx) > tolerance:        
        #print("Best guess of root = {} at iteration {:3d}".format(x, step))
        f = fx(x)
        df = fp(x)
        dx = -f/df
        x += dx
        step += 1
    
    # Return output
    return x, step

In [3]:
# estimates_for_newton = [volume[75], volume[387], volume[2000]]

# n_est1 = newtonMethod(root_Finding_Func, derivative_Pr, estimates_for_newton[0])
# n_est2 = newtonMethod(root_Finding_Func, derivative_Pr, estimates_for_newton[1])
# n_est3 = newtonMethod(root_Finding_Func, derivative_Pr, estimates_for_newton[2])

# print(n_est1)
# print(n_est2)
# print(n_est3)
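
The commented-out calls rely on `derivative_Pr` and `volume`, neither of which is defined in this section. As a sketch, assume `derivative_Pr` is meant to be the analytic derivative of `root_Finding_Func`, and use a hypothetical starting guess of 4.0. The helper `newton_sketch` is a compact stand-in for `newtonMethod` so that the block runs on its own.

```python
def root_Finding_Func(volume):
    # Same expression as in the notebook cell
    return (((8*.8) / (3*volume - 1)) - (3 / (volume**2))) - .4


def derivative_Pr(volume):
    # Assumed analytic derivative of root_Finding_Func with respect to volume
    return -(3 * 8 * .8) / (3*volume - 1)**2 + 6 / volume**3


def newton_sketch(fx, fp, x, tolerance=1e-8, max_iter=100):
    # Hypothetical helper: Newton iteration with a cap on the iteration count
    for step in range(1, max_iter + 1):
        dx = -fx(x) / fp(x)
        x += dx
        if abs(dx) <= tolerance:
            return x, step
    raise RuntimeError("Newton's method did not converge")


root, steps = newton_sketch(root_Finding_Func, derivative_Pr, 4.0)
print(root, steps, root_Finding_Func(root))
```

The iteration cap guards against a starting guess from which Newton's method fails to converge. Which root is found depends on the starting guess.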
