# VarLMM.jl

`VarLMM.jl` is a Julia package for modeling within-subject variability of a longitudinal measurement. 

In addition to mean levels, it can be important to model the factors that influence within-subject variability of longitudinal outcomes. We use a modified linear mixed effects model in which the within-subject variance is modeled as a log-linear function of covariates.

# Installation

This package requires Julia v1.0 or later, which can be obtained from https://julialang.org/downloads/ or by building Julia from the sources in the https://github.com/JuliaLang/julia repository.

The package has not yet been registered and must be installed using the repository location. Start Julia and press the `]` key to switch to the package manager REPL:
```
(v1.1) pkg> add TODO: ADD URL
```
Use the backspace key to return to the Julia REPL.

```julia
using VarLMM, CSV, Random, JuliaDB
```

# Example Usage

The example dataset, `varlmm_exdata.csv`, is contained in the `data` folder of the package. It is a simulated dataset with 500 individuals, each having 10 observations. The outcome, systolic blood pressure (sbp), is a function of other covariates. Below we read in the data with the [CSV package](https://juliadata.github.io/CSV.jl), which stores it in a `DataFrame` object. The package can take other data table objects, such as an `IndexedTable` generated by reading in data with the [JuliaDB](https://github.com/JuliaData/JuliaDB.jl) package.

```julia
filepath = normpath(joinpath(dirname(pathof(VarLMM)), "../data/"))
df = CSV.read(filepath * "varlmm_exdata.csv")
```

| Row | id | sbp | agegroup | gender | bmi | meds |
|---|---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | String | Float64 | String |
| 1 | 1 | 139.586 | 3.0 | Male | 23.1336 | NoMeds |
| 2 | 1 | 141.849 | 3.0 | Male | 26.5885 | NoMeds |
| 3 | 1 | 140.484 | 3.0 | Male | 24.8428 | NoMeds |
| 4 | 1 | 141.134 | 3.0 | Male | 24.9289 | NoMeds |
| 5 | 1 | 145.443 | 3.0 | Male | 24.8057 | NoMeds |
| 6 | 1 | 140.053 | 3.0 | Male | 24.1583 | NoMeds |
| 7 | 1 | 142.1 | 3.0 | Male | 25.2543 | NoMeds |
| 8 | 1 | 143.153 | 3.0 | Male | 24.3951 | NoMeds |
| 9 | 1 | 146.675 | 3.0 | Male | 26.1514 | NoMeds |
| 10 | 2 | 110.765 | 1.0 | Male | 22.6263 | NoMeds |


## Loading Data Into Model

First we will create a VarLmmModel object from the dataframe.

The `VarLmmModel()` function takes the following arguments: 

- `meanformula`: the formula for the mean fixed effects β (variables in X matrix)
- `reformula`: the formula for the mean random effects γ (variables in Z matrix)
- `wsvarformula`: the formula  for the within-subject variance fixed effects τ (variables in W matrix). 
- `idvar`: the id variable for groupings. 
- `datatable`: the datatable holding all of the data for the model. Can be a DataFrame or various types of tables such as an IndexedTable.

For documentation of the `VarLmmModel` function, type `?VarLmmModel` in the Julia REPL.

We will model sbp as a function of age group, gender, BMI, and medication use. The following commands fit the model:

$\text{sbp}_{ij} = \beta_0 + \beta_1 \text{agegroup}_{ij} + \beta_2 \text{gender}_{ij} + \beta_3 \text{bmi}_{ij} + \beta_4 \text{meds}_{ij} + \gamma_{i0} + \gamma_{i1}\text{bmi}_{ij} + \epsilon_{ij}$

$\epsilon_{ij}$ has mean 0 and variance $\sigma^2_{\epsilon_{ij}}$

$\gamma_{i} = (\gamma_{i0}, \gamma_{i1})$ has mean **0** and covariance $\Sigma_\gamma$

$\sigma^2_{\epsilon_{ij}} = \exp(\tau_0 + \tau_1 \text{agegroup}_{ij} + \tau_2 \text{meds}_{ij} + \tau_3 \text{bmi}_{ij})$
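
It can help to see what these assumptions imply for a whole subject. Stack the $n_i$ observations of subject $i$ into $y_i$, with design matrices $X_i$, $Z_i$, and $W_i$ whose $j$-th rows are $x_{ij}^T$, $z_{ij}^T$, and $w_{ij}^T$. Assuming additionally that $\gamma_i$ and the $\epsilon_{ij}$ are mutually independent (an assumption not stated explicitly here), the marginal mean and covariance are:

$E(y_i) = X_i \beta$

$\text{Var}(y_i) = Z_i \Sigma_\gamma Z_i^T + \text{diag}\left(e^{w_{i1}^T \tau}, \dots, e^{w_{in_i}^T \tau}\right)$

The first term is the between-subject contribution of the random effects and the second is the within-subject variance. This is presumably the quantity that the working covariance matrix $V_i$ in the fitting steps approximates.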

```julia
vlmm = VarLmmModel(@formula(sbp ~ 1 + agegroup + gender + bmi + meds), 
    @formula(sbp ~ 1 + bmi), 
    @formula(sbp ~ 1 + agegroup + meds + bmi),
    :id, df);
```
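
To make the within-subject variance part of the model concrete, here is a small standalone sketch in plain Julia that does not use `VarLMM`. All numbers (`W`, `τ`, `Σγ`) are made-up illustrative values, not package output, and the covariance formula assumes the random effects and errors are independent.

```julia
using LinearAlgebra

# Hypothetical covariates for one subject with three visits.
# Columns: intercept, agegroup, meds (1 = OnMeds), bmi
W = [1.0 1.0 0.0 24.0;
     1.0 1.0 0.0 25.0;
     1.0 1.0 0.0 26.0]
τ = [-2.0, 1.5, -0.4, 0.0]     # illustrative within-subject variance effects
Z = [ones(3) W[:, 4]]          # random intercept and random bmi slope
Σγ = [1.0 0.0; 0.0 0.01]       # illustrative random effects covariance

# Log-linear variance model: σ²_ij = exp(w_ij' τ), positive for any τ.
σ² = exp.(W * τ)

# Implied marginal covariance of the subject's three responses.
V = Z * Σγ * Z' + Diagonal(σ²)
```

Because the variance is modeled on the log scale, a coefficient `τ_k` multiplies the within-subject variance by `exp(τ_k)` for each unit increase in the corresponding covariate.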

The `vlmm` object has the appropriate data. We can use the `fit!()` function to fit the model.

## Fitting

The `fit!()` function takes the following arguments:
* `m::VarLmmModel`: the model to fit.
* `solver`: the optimization solver. The default is `Ipopt.IpoptSolver(print_level=5, watchdog_shortened_iter_trigger=3)`.
* `fittype`: the fitting procedure. The default is `:Hybrid`, described below. The other options are `:Weighted` and `:Unweighted`.
* `weightedruns`: the number of passes through the weighted re-estimation loop (`1:weightedruns` below). The default is 1.

### Fittypes
The `fit!()` function fits with the following procedures based on `fittype`. The default is `:Hybrid`.

**:Hybrid**

1. Initialize the parameters $(\beta, \tau, L_\gamma)$ through least squares to obtain starting points.
2. Re-estimate $(\tau, L_\gamma)$ through the unweighted objective function (method of moments).
3. Re-estimate $\beta$ with weighted least squares using the $(\tau, L_\gamma)$ estimates.
4. For each run in `1:weightedruns`:
    - 4a. Update the working covariance matrix $V_i^{(0)}$ with the current estimates of $(\tau, \Sigma_\gamma)$.
    - 4b. Re-estimate $(\tau, \Sigma_\gamma)$ using weighted estimating equations (weighted objective function).
5. Use the final estimates to conduct inference.


**:Weighted**
1. Initialize the parameters $(\beta, \tau, \Sigma_\gamma)$ through least squares to obtain starting points.
2. Re-estimate $\beta$ with weighted least squares using the $(\tau, \Sigma_\gamma)$ estimates.
3. For each run in `1:weightedruns`:
    - 3a. Update $V_i^{(0)}$ with the current estimates of $(\tau, \Sigma_\gamma)$.
    - 3b. Re-estimate $(\tau, \Sigma_\gamma)$ using weighted estimating equations (weighted objective function).
4. Use the final estimates to conduct inference.


**:Unweighted**
1. Initialize the parameters $(\beta, \tau, L_\gamma)$ through least squares to obtain starting points.
2. Re-estimate $(\tau, L_\gamma)$ through the unweighted objective function (method of moments).
3. Re-estimate $\beta$ with weighted least squares using the $(\tau, L_\gamma)$ estimates.
4. Use the final estimates to conduct inference.


Generally, the `:Unweighted` fit returns less precise estimates because the unweighted estimator is less statistically efficient.
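
The weighted least squares step for $\beta$ that appears in all three procedures can be sketched in plain Julia. This is a generic illustration and not the package's internal code: the function names `wls_beta` and `working_cov` are hypothetical, and the working covariance is assumed to be $V_i = Z_i \Sigma_\gamma Z_i^T + \text{diag}(\exp(W_i \tau))$, as implied by the model when random effects and errors are independent.

```julia
using LinearAlgebra, Random

# Working covariance for one subject under the assumed model form.
working_cov(Z, W, τ, Σγ) = Z * Σγ * Z' + Diagonal(exp.(W * τ))

# Weighted least squares for β given per-subject working covariances:
# βhat = (Σᵢ Xᵢ' Vᵢ⁻¹ Xᵢ)⁻¹ Σᵢ Xᵢ' Vᵢ⁻¹ yᵢ
function wls_beta(Xs, ys, Vs)
    p = size(Xs[1], 2)
    A = zeros(p, p)
    b = zeros(p)
    for (X, y, V) in zip(Xs, ys, Vs)
        F = cholesky(Symmetric(V))   # V is positive definite by construction
        A .+= X' * (F \ X)
        b .+= X' * (F \ y)
    end
    return A \ b
end

# Toy data: 5 subjects with 4 observations each (all values are made up).
Random.seed!(1)
βtrue = [1.0, -2.0]
τ = [0.0, 0.5]
Σγ = [1.0 0.0; 0.0 0.5]
Xs = [[ones(4) randn(4)] for _ in 1:5]
Zs = [[ones(4) randn(4)] for _ in 1:5]
Ws = [[ones(4) randn(4)] for _ in 1:5]
Vs = [working_cov(Z, W, τ, Σγ) for (Z, W) in zip(Zs, Ws)]
ys = [X * βtrue for X in Xs]         # noise-free responses

βhat = wls_beta(Xs, ys, Vs)          # recovers βtrue up to rounding error
```

With noise-free responses the estimate matches `βtrue` regardless of the weights. With real data the weights affect the efficiency of the estimate, which is why the procedures re-estimate $\beta$ after updating $(\tau, \Sigma_\gamma)$.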

```julia
fit!(vlmm)
```

```
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       28

Total number of variables............................:        7
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equ

──────────────────────────────────────────────────────────────────────
                     Estimate  Std. Error  Wald Statistic  Pr(>|Wald|)
──────────────────────────────────────────────────────────────────────
β: (Intercept)   100.077        0.346108         83606.99       <1e-99
β: agegroup       14.9864       0.063344         55973.45       <1e-99
β: gender: Male   -9.92749      0.100267          9803.07       <1e-99
β: bmi             0.248811     0.0126703          385.62       <1e-85
β: meds: OnMeds  -10.115        0.123701          6686.30       <1e-99
τ: (Intercept)    -2.63253      1.0034               6.88       0.0087
τ: agegroup        1.50585      0.0567203          704.84       <1e-99
τ: meds: OnMeds   -0.426159     0.0535525           63.33       <1e-14
τ: bmi             0.00434957   0.0472932            0.01       0.9267
──────────────────────────────────────────────────────────────────────
```
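
The `Wald Statistic` column in this table is numerically consistent with the squared ratio of each estimate to its standard error, and the p-values appear consistent with a chi-square reference distribution with 1 degree of freedom. The sketch below reproduces two entries by hand. The helper `wald` is a hypothetical name, not a package function.

```julia
# Wald statistic for a single coefficient: (estimate / standard error)^2.
wald(est, se) = (est / se)^2

# Values copied from the τ rows of the table above.
w_intercept = wald(-2.63253, 1.0034)        # ≈ 6.88
w_bmi       = wald(0.00434957, 0.0472932)   # ≈ 0.0085, shown as 0.01 in the table

# Because the variance model is log-linear, exp(τ) is a multiplicative effect:
# each unit of agegroup multiplies the within-subject variance by about 4.5.
agegroup_ratio = exp(1.50585)
```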

By default, the `fit!()` function fits with the `IPOPT` solver. This can be changed easily by specifying a different solver.

```julia
using NLopt  # needed so that NLopt.NLoptSolver is available
fit!(vlmm, NLopt.NLoptSolver(algorithm = :LD_MMA, maxeval = 4000))
```

```
─────────────────────────────────────────────────────────────────────
                    Estimate  Std. Error  Wald Statistic  Pr(>|Wald|)
─────────────────────────────────────────────────────────────────────
β: (Intercept)   100.07        0.352407         80633.82       <1e-99
β: agegroup       14.9858      0.0633186        56013.88       <1e-99
β: gender: Male   -9.92915     0.1002            9819.58       <1e-99
β: bmi             0.249165    0.0130076          366.92       <1e-81
β: meds: OnMeds  -10.1153      0.123551          6702.90       <1e-99
τ: (Intercept)    -1.73145     0.859082             4.06       0.0439
τ: agegroup        1.50106     0.0285644         2761.50       <1e-99
τ: meds: OnMeds   -0.438733    0.0517293           71.93       <1e-16
τ: bmi            -0.0311176   0.0365688            0.72       0.3948
─────────────────────────────────────────────────────────────────────
```

```julia
using KNITRO  # needed so that KNITRO.KnitroSolver is available
fit!(vlmm, KNITRO.KnitroSolver(outlev=3))
```

```
Knitro 12.1.1 STUDENT LICENSE (problem size limit = 300)

Knitro 12.1.1 STUDENT LICENSE (problem size limit = 300)

            Student License
       (NOT FOR COMMERCIAL USE)
         Artelys Knitro 12.1.1

Knitro presolve eliminated 0 variables and 0 constraints.

outlev:                  3
The problem is identified as unconstrained.
Knitro changing algorithm from AUTO to 1.
Knitro changing bar_initpt from AUTO to 3.
Knitro changing bar_murule from AUTO to 4.
Knitro changing bar_penaltycons from AUTO to 0.
Knitro changing bar_penaltyrule from AUTO to 2.
Knitro changing bar_switchrule from AUTO to 0.
Knitro changing linesearch from AUTO to 2.
Knitro changing linsolver from AUTO to 2.

Problem Characteristics                                 (   Presolved)
-----------------------
Objective goal:  Minimize
Objective type:  general
Number of variables:                                  7 (           7)
    bounded below only:                               0 (           0)
    bounded abov

──────────────────────────────────────────────────────────────────────
                     Estimate  Std. Error  Wald Statistic  Pr(>|Wald|)
──────────────────────────────────────────────────────────────────────
β: (Intercept)   100.077        0.346108         83607.01       <1e-99
β: agegroup       14.9864       0.063344         55973.45       <1e-99
β: gender: Male   -9.92749      0.100267          9803.07       <1e-99
β: bmi             0.248811     0.0126703          385.62       <1e-85
β: meds: OnMeds  -10.115        0.123701          6686.30       <1e-99
τ: (Intercept)    -2.63253      1.0034               6.88       0.0087
τ: agegroup        1.50585      0.0567204          704.83       <1e-99
τ: meds: OnMeds   -0.426159     0.0535525           63.33       <1e-14
τ: bmi             0.00434962   0.0472933            0.01       0.9267
──────────────────────────────────────────────────────────────────────
```

### Weighted Only Estimation Procedure

The following demonstrates using only the weighted objective function to fit the model.

```julia
fit!(vlmm; fittype=:Weighted, weightedruns=2)
```

```
This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       28

Total number of variables............................:        7
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 

──────────────────────────────────────────────────────────────────────
                     Estimate  Std. Error  Wald Statistic  Pr(>|Wald|)
──────────────────────────────────────────────────────────────────────
β: (Intercept)   100.012        0.600366         27750.81       <1e-99
β: agegroup       14.9991       0.0642862        54436.98       <1e-99
β: gender: Male   -9.92815      0.103299          9237.23       <1e-99
β: bmi             0.250905     0.023382           115.15       <1e-26
β: meds: OnMeds  -10.1508       0.128621          6228.40       <1e-99
τ: (Intercept)    -2.63686      1.04134              6.41       0.0113
τ: agegroup        1.50697      0.0621493          587.95       <1e-99
τ: meds: OnMeds   -0.42941      0.0527195           66.34       <1e-15
τ: bmi             0.00444777   0.0494806            0.01       0.9284
──────────────────────────────────────────────────────────────────────
```

## Tips

- If the `fit!()` function is giving errors, a different solver may remedy the issue. By default, `VarLMM` uses the Ipopt solver, but it can take any solver that supports [MathProgBase.jl](https://github.com/JuliaOpt/MathProgBase.jl). In our experience, [Knitro.jl](https://github.com/JuliaOpt/KNITRO.jl) works best, but Knitro is commercial software that requires a paid license or an academic or trial license.
- The `fit!()` function can also perform poorly if the variance contribution from the random effects heavily outweighs the contribution from the within-subject variance. Standardizing continuous variables can help considerably. A convenient way to standardize a variable is to apply the `zscore` function from [StatsBase.jl](https://github.com/JuliaStats/StatsBase.jl) to it.
- Using the `:Weighted` option for `fittype` in the `fit!()` function can also work better with non-standardized variables. This performs the optimization using only the weighted objective function.
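
As a minimal sketch of the standardization tip, the following uses only the `Statistics` standard library. The helper `standardize` is a hypothetical name; under its default settings, `zscore` from StatsBase.jl should give the same result. The `bmi` values are the first five rows of the example data shown earlier.

```julia
using Statistics

# Center to mean 0 and scale to (sample) standard deviation 1.
standardize(x) = (x .- mean(x)) ./ std(x)

bmi = [23.1336, 26.5885, 24.8428, 24.9289, 24.8057]
bmi_std = standardize(bmi)
```

With a `DataFrame`, the standardized values could be stored in a new column (for example `df[!, :bmi_std] = standardize(df[!, :bmi])`) and that column used in the model formulas in place of `bmi`.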

## Simulating response

The `rvarlmm()` and `rvarlmm!()` functions can be used to generate a response from user-supplied data and parameters.

The `rvarlmm()` function takes arrays of per-cluster data matrices together with the model parameters. It generates a simulated response from the VarLMM model based on:
- `Xs`: array of each cluster's `X`: mean fixed effects covariates
- `Zs`: array of each cluster's `Z`: random location effects covariates
- `Ws`: array of each cluster's `W`: within-subject variance fixed effects covariates
- `β`: mean fixed effects vector
- `τ`: within-subject variance fixed effects vector
- `respdist`: the distribution for response. Default is MvNormal. 
- `Σγ`: random location effects covariance matrix. 
- `Σγω`: joint random location and random scale effects covariance matrix (if generating from full model).

The `rvarlmm!()` function can be used to generate a simulated response from the VarLMM model based on a datatable and place the generated response into the datatable under the name given by `respname`.
Note: **the datatable MUST be ordered by the grouping variable** for the response to be generated in the correct order.
This can be checked via `datatable == sort(datatable, idvar)`. The response is based on:

- `meanformula`: represents the formula for the mean fixed effects `β` (variables in X matrix)
- `reformula`: represents the formula for the mean random effects γ (variables in Z matrix)
- `wsvarformula`: represents the formula for the within-subject variance fixed effects τ (variables in W matrix)
- `idvar`: the id variable for groupings.
- `datatable`: the data table holding all of the data for the model. For this function it **must be in order**.
- `β`: mean fixed effects vector
- `τ`: within-subject variance fixed effects vector
- `respdist`: the distribution for response. Default is MvNormal. 
- `Σγ`: random location effects covariance matrix. 
- `Σγω`: joint random location and random scale effects covariance matrix (if generating from full model)
- `respname`: symbol representing the simulated response variable name.

For both functions, only one of the `Σγ` or `Σγω` matrices has to be specified. Specifying `Σγ` indicates that the generative model does not include a random scale component. The output is `ys`: an array of responses `y` that matches the order of the data arrays (`Xs, Zs, Ws`).
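
The generative model behind these functions can be sketched in plain Julia for the simplest case: a normal response with `Σγ` only, so no random scale effect. This is an illustration under those assumptions and not the package's implementation. The function name `simulate_cluster` and all data values are hypothetical.

```julia
using LinearAlgebra, Random

# Draw one cluster's response: y = Xβ + Zγ + ε, with γ ~ N(0, Σγ) and
# independent ε_j ~ N(0, exp(w_j'τ)). No random scale effect is included.
function simulate_cluster(rng, X, Z, W, β, τ, Σγ)
    L = cholesky(Symmetric(Σγ)).L                     # Σγ = L L'
    γ = L * randn(rng, size(Z, 2))                    # random location effects
    ε = sqrt.(exp.(W * τ)) .* randn(rng, size(X, 1))  # heteroscedastic errors
    return X * β + Z * γ + ε
end

rng = MersenneTwister(123)
ids = [1, 1, 2, 3, 3, 3, 4]
issorted(ids) || error("rows must be ordered by the grouping variable")

n = length(ids)
X = [ones(n) randn(rng, n, 2)]
Z = [ones(n) randn(rng, n)]
W = [ones(n) randn(rng, n, 2)]
β, τ, Σγ = zeros(3), zeros(3), [1.0 0.0; 0.0 1.0]

# Split the stacked rows into per-cluster blocks, then simulate each cluster.
rows = [findall(==(g), ids) for g in unique(ids)]
ys = [simulate_cluster(rng, X[r, :], Z[r, :], W[r, :], β, τ, Σγ) for r in rows]
```

The ordering check matters because the per-cluster responses are concatenated back in cluster order. If the rows were not grouped by id, the simulated values would be attached to the wrong observations.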

```julia
using DataFrames  # needed for the DataFrame constructor below

Random.seed!(123)
t = table((id = [1; 1; 2; 3; 3; 3; 4], y = randn(7),
x1 = ones(7), x2 = randn(7), x3 = randn(7), z1 = ones(7),
z2 = randn(7), w1 = ones(7), w2 = randn(7), w3 = randn(7)))
df = DataFrame(t)

f1 = @formula(y ~ 1 + x2 + x3)
f2 = @formula(y ~ 1 + z2)
f3 = @formula(y ~ 1 + w2 + w3)

β = zeros(3)
τ = zeros(3)
Σγ = [1. 0.; 0. 1.]

rvarlmm!(f1, f2, f3, :id, t, β, τ;
        Σγ = Σγ, respname = :response)
```

```
7-element Array{Float64,1}:
 -0.07117541750437145
 -1.1460703751022492
  4.519421309839423
  0.18247559667949215
  0.6224277746034257
 -2.07057805504152
  1.9612563619429881
```

```julia
transform(t, :response => [-0.07117541750437145
 -1.1460703751022492
  4.519421309839423
  0.18247559667949215
  0.6224277746034257
 -2.07057805504152
  1.9612563619429881])
```

```
Table with 7 rows, 11 columns:
Columns:
#   colname   type
─────────────────────
1   id        Int64
2   y         Float64
3   x1        Float64
4   x2        Float64
5   x3        Float64
6   z1        Float64
7   z2        Float64
8   w1        Float64
9   w2        Float64
10  w3        Float64
11  response  Float64
```

```julia
typeof(t) <: IndexedTable
```

```
true
```

```julia
rvarlmm!(f1, f2, f3, :id, df, β, τ;
        Σγ = Σγ, respname = :response)
```

```
7-element Array{Float64,1}:
 -0.11667873622421054
 -1.3545937159271269
  4.731488534391006
 -4.033494615797211
 -1.5126831892388284
 -1.1585730212737306
 -0.276205556650883
```

```julia
df[!, :response]
```

```
7-element Array{Float64,1}:
 -0.11667873622421054
 -1.3545937159271269
  4.731488534391006
 -4.033494615797211
 -1.5126831892388284
 -1.1585730212737306
 -0.276205556650883
```