# Real Data Examples from the gcmr R Package
In this notebook we compare the fit of our model with that of the R packages geepack and gcmr on two example datasets: the epilepsy data from gcmr and the respiratory data from geepack.

For both examples we use the autoregressive AR(1) parameterization of the covariance matrix $\Gamma$, which adds only two parameters: the correlation $\rho$ and the dispersion $\sigma^2$.
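
As a minimal sketch of what this parameterization looks like, the snippet below builds an AR(1) covariance matrix for one cluster. It assumes $\Gamma_{jk} = \sigma^2 \rho^{|j-k|}$, the standard AR(1) form; the function name `ar1_cov` is hypothetical and is not part of GLMCopula.

```julia
using LinearAlgebra

# AR(1) covariance for a cluster of size n: Γ[j, k] = σ2 * ρ^|j - k|.
# This form (σ2 times an AR(1) correlation matrix) is an assumption of this
# sketch, not something read from the GLMCopula source.
ar1_cov(n, ρ, σ2) = [σ2 * ρ^abs(j - k) for j in 1:n, k in 1:n]

Γ = ar1_cov(4, 0.5, 2.0)

# Entries decay geometrically with the lag between two visits.
println(Γ[1, 1], " ", Γ[1, 2], " ", Γ[1, 4])
println(issymmetric(Γ))
```

Under this form, $\rho = 1$ makes every entry of the matrix equal to $\sigma^2$, so all visits within a cluster are treated as perfectly correlated.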

For the dispersion parameter $\sigma^2$, we add an L2 penalty to the log-likelihood to keep the estimate from diverging to infinity.
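
One way to write the penalized objective, as a sketch only: here $\ell$ is the unpenalized log-likelihood and $\lambda > 0$ is a hypothetical penalty weight, since the notebook does not state the value that was used.

$$
\ell_{\text{pen}}(\beta, \rho, \sigma^2) = \ell(\beta, \rho, \sigma^2) - \lambda \, (\sigma^2)^2
$$

The unpenalized columns in the tables below show why the penalty matters: without it, the $\sigma^2$ estimates reach 137262 and 34284.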


## Ex 1: Longitudinal Poisson
#### Betas

|Beta|Estimate_ours (Penalized)|Estimate_ours (Unpenalized)|Estimate_geepack|Estimate_gcmr |
|--|-----|-----|--|--|
|intercept | 3.44 | 3.45 | 3.39 | 3.23 |
|visit | -1.26 | -1.26 | -1.23 | -1.09 |
|trt | 0.01 | -0.01 | 0.02 | -0.01 |
|visit*trt | -0.12 | -0.11 | -0.13 | -0.54 |

#### sigma2 AR dispersion
|Parameter|Estimate_ours (Penalized)|Estimate_ours (Unpenalized)|Estimate_geepack|Estimate_gcmr |
|--|-----|-----|--|--|
| sigma2| 0.536 | 137262 | 19.93 | - |

#### rho AR correlation
|Parameter|Estimate_ours (Penalized)|Estimate_ours (Unpenalized)|Estimate_geepack|Estimate_gcmr |
|--|-----|-----|--|--|
| rho | 1.0 | 0.95 | 0.89 | 0.46 |

## Ex 2: Longitudinal Logistic

#### Betas

|Beta|Estimate_ours (Penalized)|Estimate_ours (Unpenalized)|Estimate_geepack|Estimate_gcmr |
|--|-----|-----|--|--|
|intercept| -0.92 | -0.86 | -0.95 |-1.01 |
|center| 0.83 | 0.83 | 0.76 | 0.75 |
|age| -0.02 | -0.03 | -0.02 | -0.02 |
|baseline| 1.97 | 2.10 | 1.61 | 1.72 |

#### sigma2 AR dispersion
|Parameter|Estimate_ours (Penalized)|Estimate_ours (Unpenalized)|Estimate_geepack|Estimate_gcmr |
|--|-----|-----|--|--|
| sigma2 | 0.27 | 34284 | 1.04 | - |

#### rho AR correlation
|Parameter|Estimate_ours (Penalized)|Estimate_ours (Unpenalized)|Estimate_geepack|Estimate_gcmr |
|--|-----|-----|--|--|
| rho | 0.85 | 0.78 | 0.52 | 0.59 |

# Example 1: Longitudinal Poisson

In [1]:
cd("/Users/sarahji/.julia/dev/GLMCopula.jl/R_example_datasets/")
using CSV, DataFrames, GLMCopula, LinearAlgebra, GLM, RCall, RData, RDatasets

┌ Info: Precompiling GLMCopula [c47b6ae2-b804-4668-9957-eb588c99ffbc]
└ @ Base loading.jl:1342


In [2]:
df = CSV.read("epilepsy_gcmr.csv", DataFrame)
groups = unique(df[!, :id])
n, p, m = length(groups), 1, 2
d = Poisson()
link = LogLink()
D = typeof(d)
Link = typeof(link)
T = Float64
gcs = Vector{GLMCopulaARObs{T, D, Link}}(undef, n)
for (i, grp) in enumerate(groups)
    gidx = df[!, :id] .== grp
    ni = count(gidx)
    y = Float64.(df[gidx, :counts])
    X1 = Float64.(df[gidx, :visit])
    X2 = Float64.(df[gidx, :trt])
    X3 = Float64.(df[gidx, :visit] .* df[gidx, :trt])
    X = [ones(ni, 1) X1 X2 X3]
    gcs[i] = GLMCopulaARObs(y, X, d, link)
end
gcm = GLMCopulaARModel(gcs);

# Initialize β, σ2, and ρ (β via Newton's algorithm under the independence assumption)
@info "Initial point:"
@time initialize_model!(gcm);
@show gcm.β
@show gcm.σ2
@show gcm.ρ

@show loglikelihood!(gcm, true, false)
@time GLMCopula.fit!(gcm, IpoptSolver(print_level = 5, max_iter = 100, tol = 10^-5, limited_memory_max_history = 20, hessian_approximation = "limited-memory"));

┌ Info: Initial point:
└ @ Main In[2]:23


initializing β using Newton's Algorithm under Independence Assumption
1 0.0 -2383.6953083154885 235
2 -2383.6953083154885 -2367.3833507578256 176
3 -2367.3833507578256 -2332.6495007505846 176
4 -2332.6495007505846 -2322.3539684318803 176
5 -2322.3539684318803 -2319.5117860113937 176
6 -2319.5117860113937 -2318.761664391745 176
7 -2318.761664391745 -2318.5687204140713 176
8 -2318.5687204140713 -2318.5197750083207 176
9 -2318.5197750083207 -2318.507447809101 176
10 -2318.507447809101 -2318.5043545133703 176
11 -2318.5043545133703 -2318.503579743562 176
12 -2318.503579743562 -2318.5033858698157 176
13 -2318.5033858698157 -2318.5033373786814 176
14 -2318.5033373786814 -2318.5033252530557 176
  2.066830 seconds (5.98 M allocations: 335.694 MiB, 5.20% gc time, 99.67% compilation time)
gcm.β = [3.4271102938648115, -1.2745025414594278, 0.027541807868130437, -0.10472782359332208]
gcm.σ2 = [7.187675027177891]
gcm.ρ = [1.0]
loglikelihood!(gcm, true, false) = -3695.298987001914
gcm.θ = [3.42711029

In [3]:
@show gcm.β
@show gcm.σ2
@show gcm.ρ;

gcm.β = [3.4431068704669063, -1.2639779859421596, 0.014892138344205698, -0.11631781868420082]
gcm.σ2 = [0.5361926759265381]
gcm.ρ = [1.0]


In [4]:
@show loglikelihood!(gcm, false, false);

loglikelihood!(gcm, false, false) = -2194.8262499954526
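
The per-subject loop in cell `In [2]` splits the long-format data by `id` and builds one design matrix per cluster. The sketch below shows the same construction on a small made-up dataset, using plain vectors so that it runs without any packages. The data values and the helper name `cluster_designs` are hypothetical.

```julia
# Toy long-format data: three subjects with 2, 3, and 1 visits (made-up values).
id    = [1, 1, 2, 2, 2, 3]
visit = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
trt   = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]

# Build one design matrix per cluster with columns:
# intercept, visit, trt, visit × trt.
function cluster_designs(id, visit, trt)
    map(unique(id)) do g
        idx = id .== g          # rows belonging to subject g
        ni = count(idx)         # number of visits for this subject
        [ones(ni) visit[idx] trt[idx] (visit[idx] .* trt[idx])]
    end
end

Xs = cluster_designs(id, visit, trt)
println(size.(Xs))
```

Each cluster keeps its own number of rows, which is why the notebook stores the observations in a vector of per-cluster objects rather than in a single stacked matrix.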


## Using geepack

In [5]:
using RCall
R"""
library("gcmr")
library("geepack")
data("epilepsy", package = "gcmr")
gee.ar1 <- geeglm(counts ~ 1 + visit + trt + visit:trt,
data = epilepsy, id = id, family = poisson,
corstr = "ar1")

summary(gee.ar1)
"""



RObject{VecSxp}

Call:
geeglm(formula = counts ~ 1 + visit + trt + visit:trt, family = poisson, 
    data = epilepsy, id = id, corstr = "ar1")

 Coefficients:
            Estimate  Std.err    Wald Pr(>|W|)    
(Intercept)  3.38922  0.16199 437.726   <2e-16 ***
visit       -1.22996  0.11387 116.673   <2e-16 ***
trt          0.01628  0.21150   0.006    0.939    
visit:trt   -0.13162  0.26713   0.243    0.622    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation structure = ar1 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)    19.93   8.939
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.8924 0.03863
Number of clusters:   59  Maximum cluster size: 5 


## Using gcmr

In [6]:
using RCall

R"""
data("epilepsy", package = "gcmr")
mod.ar <- gcmr(counts ~ 1 + visit + trt + visit:trt,
 data = epilepsy, marginal = poisson.marg(link = "log"),
 cormat = cluster.cormat(id, "ar1"))
summary(mod.ar)
"""

RObject{VecSxp}

Call:
gcmr(formula = counts ~ 1 + visit + trt + visit:trt, data = epilepsy, 
    marginal = poisson.marg(link = "log"), cormat = cluster.cormat(id, 
        "ar1"))


Coefficients marginal model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   3.2260     0.0362   89.22  < 2e-16 ***
visit        -1.0866     0.0386  -28.19  < 2e-16 ***
trt          -0.0143     0.0515   -0.28     0.78    
visit:trt    -0.5391     0.0682   -7.91  2.6e-15 ***

Coefficients Gaussian copula:
    Estimate Std. Error z value Pr(>|z|)    
ar1   0.4632     0.0196    23.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

log likelihood = 1367.7,  AIC = 2745
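
As a quick arithmetic check on the last line of the gcmr summary, the sketch below assumes the usual definition AIC = 2k − 2ℓ with k = 5 estimated parameters (four regression coefficients plus `ar1`). The reported AIC is reproduced only if the printed 1367.7 is read as −ℓ. That reading is an inference from the numbers, not something the output states.

```julia
# Values copied from the gcmr summary above.
printed_loglik = 1367.7
k = 5  # assumed parameter count: 4 regression coefficients + 1 AR(1) correlation

# AIC = 2k - 2ℓ, with ℓ taken as -printed_loglik
# (an assumption about the sign convention of the printed value).
aic = 2k - 2 * (-printed_loglik)
println(round(Int, aic))  # prints 2745, matching the reported AIC
```

It is worth keeping this sign convention in mind before setting the gcmr value next to the `loglikelihood!` output from the Julia fit.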


# Example 2: Longitudinal Logistic 

In [None]:
df = CSV.read("respiratory_geepack.csv", DataFrame)
groups = unique(df[!, :id])
n, p, m = length(groups), 1, 2
d = Bernoulli()
link = LogitLink()
D = typeof(d)
Link = typeof(link)
T = Float64
gcs = Vector{GLMCopulaARObs{T, D, Link}}(undef, n)

for (i, grp) in enumerate(groups)
    gidx = df[!, :id] .== grp
    ni = count(gidx)
    y = Float64.(df[gidx, :outcome])
    X1 = Float64.(df[gidx, :center])
    X2 = Float64.(df[gidx, :age])
    X3 = Float64.(df[gidx, :baseline])
    X = [ones(ni, 1) X1 X2 X3]
    gcs[i] = GLMCopulaARObs(y, X, d, link)
end
gcm = GLMCopulaARModel(gcs);

# Initialize β, σ2, and ρ (β via Newton's algorithm under the independence assumption)
@info "Initial point:"
@time initialize_model!(gcm);
@show gcm.β
@show gcm.σ2
@show gcm.ρ

@show loglikelihood!(gcm, true, false)

@time GLMCopula.fit!(gcm, IpoptSolver(print_level = 5, max_iter = 100, tol = 10^-5, limited_memory_max_history = 20, hessian_approximation = "limited-memory"))

┌ Info: Initial point:
└ @ Main In[7]:24


initializing β using Newton's Algorithm under Independence Assumption


In [None]:
@show gcm.β
@show gcm.σ2
@show gcm.ρ;

In [None]:
@show loglikelihood!(gcm, false, false);

## Using geepack

In [None]:
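R"""
# Load the dataset explicitly (the gcmr cell below does the same)
data(respiratory, package = "geepack")
gee.ar1 <- geeglm(outcome ~ center + age + baseline, data=respiratory, id=id,
family=binomial(), corstr="ar1")
summary(gee.ar1)
"""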

## Using gcmr

In [None]:
R"""
data(respiratory, package="geepack")
newdata <- respiratory[order(respiratory$id),]

mod.ar <- gcmr(outcome ~ center + age + baseline,
 data = newdata, marginal = binomial.marg(link = "logit"),
 cormat = cluster.cormat(id, "ar1"))
summary(mod.ar)
"""