# NB(p = 0.9, r = 10) with CS rho = 0.5, sigma2 = 0.5

We use a negative binomial (NB) base distribution with $p = 0.9, r = 10$ and a compound symmetric (CS) covariance with $\rho = 0.5, \sigma^2 = 0.5$.

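Under the quasi-copula (QC) model, the theoretical correlation is a function of the cluster size $d_i$, $\rho$, $\sigma^2$, and the kurtosis $kt$ of the base distribution:

$$\mathrm{Corr}(Y_k, Y_l)
 = \frac{\rho \sigma^2}{1 + \frac{1}{2} \sigma^2 \cdot kt + \frac{1}{2} \sigma^2 (d_i - 1)}$$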

With $p = 0.9, r = 10$, the kurtosis of the base distribution is $kt = kt(r, p) = 4.41$.

Let's see what happens to the theoretical and empirical correlations when we fix $\rho = 0.5, \sigma^2 = 0.5, kt = 4.41$ and vary the cluster size $d_i$:

$$\mathrm{Corr}(Y_k, Y_l)
 = \frac{0.25}{1 + \frac{1}{4} \cdot 4.41 + \frac{1}{4} (d_i - 1)}$$
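
To make the dependence on $d_i$ concrete, the formula above can be evaluated directly. The sketch below is plain Julia with no package dependencies; the name `qc_theoretical_cor` is chosen here for illustration and is not part of GLMCopula.

```julia
# Theoretical quasi-copula correlation under a CS covariance:
#   ρσ² / (1 + σ²·kt/2 + σ²·(di - 1)/2)
# `qc_theoretical_cor` is an illustrative helper, not a GLMCopula function.
function qc_theoretical_cor(ρ, σ2, kt, di)
    return (ρ * σ2) / (1 + 0.5 * σ2 * kt + 0.5 * σ2 * (di - 1))
end

cluster_sizes = [2, 5, 10, 25]
qc_cors = [qc_theoretical_cor(0.5, 0.5, 4.41, di) for di in cluster_sizes]

for (di, c) in zip(cluster_sizes, qc_cors)
    println("di = ", di, "  theoretical correlation = ", round(c, digits = 5))
end
```

Because $d_i$ enters only through the denominator, the correlation decreases monotonically as the cluster grows.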


# We will show:

1. When simulating under the QC model, the theoretical and empirical correlations ARE functions of the cluster size $d_i$.
2. When simulating under the GLMM, the theoretical and empirical correlations are NOT functions of the cluster size $d_i$.




## TOC:

# di = 2
* [CS di = 2](#ex0)
* [simulate under GLMM CS di = 2](#ex0g)

# di = 5
* [CS di = 5](#ex1)
* [simulate under GLMM CS di = 5](#ex1g)

# di = 10 
* [CS di = 10](#ex2)
* [simulate under GLMM CS di = 10](#ex2g)

# di = 25
* [CS di = 25](#ex3)
* [simulate under GLMM CS di = 25](#ex3g)

# Comparisons
* [Theoretical vs. Empirical Correlation Simulated under QC](#ex4)
* [Empirical Correlation Simulated under GLMM](#ex4g)

In [1]:
using GLMCopula, DelimitedFiles, LinearAlgebra, Random, GLM, MixedModels, CategoricalArrays
using Roots, SpecialFunctions, StatsBase
using DataFrames, Statistics, ToeplitzMatrices
import StatsBase: sem

In [2]:
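# Build the n×n compound symmetric (CS) correlation matrix:
# ones on the diagonal, ρ everywhere else.
function get_V_CS(ρ, n)
    vec = zeros(n)
    vec[1] = 1.0
    for i in 2:n
        vec[i] = ρ
    end
    V = ToeplitzMatrices.SymmetricToeplitz(vec)
    V
end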

get_V_CS (generic function with 1 method)
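
For readers without ToeplitzMatrices, the same CS structure can be written as a dense matrix. This is a minimal sketch using only the standard library; `cs_matrix` is an illustrative name and not part of GLMCopula or ToeplitzMatrices.

```julia
using LinearAlgebra

# Dense CS correlation matrix: (1 - ρ)·I + ρ·(matrix of ones).
# `cs_matrix` is an illustrative helper, not a library function.
cs_matrix(ρ, n) = (1 - ρ) * Matrix{Float64}(I, n, n) + ρ * ones(n, n)

V3 = cs_matrix(0.5, 3)
Γ3 = 0.5 * V3   # σ² · V, the same scaling used to form Γ_CS in the notebook

# A CS correlation matrix is positive definite when -1/(n - 1) < ρ < 1.
@assert isposdef(V3)
```

With $\rho = 0.5$ and $\sigma^2 = 0.5$ this gives 0.5 on the diagonal of `Γ3` and 0.25 off the diagonal, matching the printed `Γ_CS` matrices.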

In [3]:
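# true parameter values
p = [0.9]            # NB success probability
βtrue = [log(9)]     # intercept used by the GLMM cells; exp(β) = 9, whereas the NB(10, 0.9) base has mean ≈ 1.11
rtrue = 10.0         # NB dispersion parameter r
σ2true = [0.5]       # variance component σ²
ρtrue = [0.5]        # CS correlation ρ

samplesize = 100000 # number of sampling units

d = NegativeBinomial()   # placeholder instance, used here only to obtain the type D
link = LogLink()
D = typeof(d)
Link = typeof(link)
T = Float64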

Float64

# Kurtosis of each NB(r = 10, p = 0.9) base distribution is 4.41

With $p = 0.9, r = 10$, the kurtosis of the base distribution is $kt = kt(r, p) = 4.41$, as the next cell confirms. We fix $\rho = 0.5, \sigma^2 = 0.5, kt = 4.41$ and vary the cluster size $d_i$.

In [4]:
d = NegativeBinomial(rtrue, p[1])
# kurtosis(d, false) returns the ordinary (non-excess) kurtosis
μ, σ², sk, kt = mean(d), var(d), skewness(d), kurtosis(d, false)

(1.111111111111111, 1.2345679012345676, 1.1000000000000003, 4.41)
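
The four numbers above can be cross-checked against the closed-form NB moments. The sketch below assumes the parameterization in which $p$ is the success probability and the mean is $r(1 - p)/p$; `nb_moments` is an illustrative helper, not a Distributions.jl function.

```julia
# Closed-form moments of NB(r, p), with p the success probability.
# `nb_moments` is an illustrative helper, not a library function.
function nb_moments(r, p)
    μ  = r * (1 - p) / p
    σ² = r * (1 - p) / p^2
    sk = (2 - p) / sqrt(r * (1 - p))
    kt = 3 + 6 / r + p^2 / (r * (1 - p))   # ordinary (non-excess) kurtosis
    return (μ, σ², sk, kt)
end

nb_μ, nb_σ², nb_sk, nb_kt = nb_moments(10.0, 0.9)
println((nb_μ, nb_σ², nb_sk, nb_kt))
```

For $r = 10, p = 0.9$ the excess kurtosis is $6/10 + 0.81/1 = 1.41$, so the ordinary kurtosis is $4.41$.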

## CS di = 2 <a class="anchor" id="ex0"></a>

$d_i = 2$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [5]:
di = 2 # number of observations per cluster
V_CS = get_V_CS(ρtrue[1], di)

# true Gamma
Γ_CS = σ2true[1] * V_CS

2×2 Matrix{Float64}:
 0.5   0.25
 0.25  0.5

In [6]:
vecd = Vector{DiscreteUnivariateDistribution}(undef, di)
for i in 1:di
    vecd[i] = NegativeBinomial(rtrue, p[1])
end
nonmixed_multivariate_dist = NonMixedMultivariateDistribution(vecd, Γ_CS)

Random.seed!(12345)
@time Y_nsample = simulate_nobs_independent_vectors(nonmixed_multivariate_dist, samplesize)

gcs = Vector{NBCopulaCSObs{T, D, Link}}(undef, samplesize)
for i in 1:samplesize
    X = ones(di, 1)              # intercept-only design matrix
    y = Float64.(Y_nsample[i])   # simulated responses for cluster i
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
di = length(gcm.data[1].y)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_2 = StatsBase.cor(Y_CS)

  1.353420 seconds (5.72 M allocations: 559.024 MiB, 6.84% gc time, 42.68% compilation time)


2×2 Matrix{Float64}:
 1.0       0.088303
 0.088303  1.0

In [7]:
theoretical_rho_2_kurtosis = (ρtrue[1] * σ2true[1]) / (1 + ((di/2) * σ2true[1]) + (0.5 * (kt - 1) * σ2true[1]))

0.10626992561105207

In [8]:
mean_empirical_cor_2 = mean(GLMCopula.offdiag(empirical_cor_2))

0.08830301023394084
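
The summary statistic used here is the average of the off-diagonal entries of the empirical correlation matrix. The sketch below shows that computation in plain Julia; `mean_offdiag` is an illustrative stand-in, and it is an assumption that `GLMCopula.offdiag` collects the same entries.

```julia
using Statistics

# Average of all off-diagonal entries of a square matrix.
# `mean_offdiag` is an illustrative helper; that GLMCopula.offdiag
# selects the same entries is an assumption.
function mean_offdiag(A::AbstractMatrix)
    n = size(A, 1)
    @assert n == size(A, 2) "matrix must be square"
    return mean(A[i, j] for i in 1:n, j in 1:n if i != j)
end

C2 = [1.0 0.088303; 0.088303 1.0]   # rounded values from the di = 2 output above
m2 = mean_offdiag(C2)
```

Under a CS structure every off-diagonal pair shares the same theoretical correlation, so averaging them gives a single estimate to compare against the formula.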

## simulate under GLMM CS di = 2 <a class="anchor" id="ex0g"></a>

$d_i = 2$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [9]:
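# Construct `dist(r, p)`. For NegativeBinomial the second argument is the
# success probability (this helper is called with p_nb below), not the mean.
function __get_distribution(dist::Type{D}, p, r) where D <: UnivariateDistribution
    return dist(r, p)
end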

for i in 1:samplesize
    X = ones(di, 1)
    η = X * βtrue
    # draw the cluster's linear predictor (fixed effect plus correlated normal random effects)
    mvn_d = MvNormal(η, Γ_CS)
    mvn_η = rand(mvn_d)
    μ = GLM.linkinv.(link, mvn_η)
    # convert the conditional mean μ to the NB success probability
    p_nb = rtrue ./ (μ .+ rtrue)
    y = Float64.(rand.(__get_distribution.(D, p_nb, rtrue)))
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_2_GLMM = StatsBase.cor(Y_CS)

2×2 Matrix{Float64}:
 1.0       0.313654
 0.313654  1.0

In [10]:
mean_empirical_cor_2_GLMM = mean(GLMCopula.offdiag(empirical_cor_2_GLMM))

0.31365405505730504

## CS di = 5 <a class="anchor" id="ex1"></a>

$d_i = 5$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [11]:
di = 5 # number of observations per cluster
V_CS = get_V_CS(ρtrue[1], di)

# true Gamma
Γ_CS = σ2true[1] * V_CS

5×5 Matrix{Float64}:
 0.5   0.25  0.25  0.25  0.25
 0.25  0.5   0.25  0.25  0.25
 0.25  0.25  0.5   0.25  0.25
 0.25  0.25  0.25  0.5   0.25
 0.25  0.25  0.25  0.25  0.5

In [12]:
vecd = Vector{DiscreteUnivariateDistribution}(undef, di)
for i in 1:di
    vecd[i] = NegativeBinomial(rtrue, p[1])
end
nonmixed_multivariate_dist = NonMixedMultivariateDistribution(vecd, Γ_CS)

Random.seed!(12345)
@time Y_nsample = simulate_nobs_independent_vectors(nonmixed_multivariate_dist, samplesize)

gcs = Vector{NBCopulaCSObs{T, D, Link}}(undef, samplesize)
for i in 1:samplesize
    X = ones(di, 1)              # intercept-only design matrix
    y = Float64.(Y_nsample[i])   # simulated responses for cluster i
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
di = length(gcm.data[1].y)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_5 = StatsBase.cor(Y_CS)

  2.424956 seconds (9.00 M allocations: 1.121 GiB, 23.29% gc time)


5×5 Matrix{Float64}:
 1.0        0.0728463  0.0670433  0.073037   0.0691792
 0.0728463  1.0        0.0624476  0.0705222  0.0713074
 0.0670433  0.0624476  1.0        0.0714172  0.0771427
 0.073037   0.0705222  0.0714172  1.0        0.0724919
 0.0691792  0.0713074  0.0771427  0.0724919  1.0

In [13]:
theoretical_rho_5_kurtosis = (ρtrue[1] * σ2true[1]) / (1 + ((di/2) * σ2true[1]) + (0.5 * (kt - 1) * σ2true[1]))

0.08058017727639001

In [14]:
mean_empirical_cor_5 = mean(GLMCopula.offdiag(empirical_cor_5))

0.07074347659265902

## simulate under GLMM CS di = 5 <a class="anchor" id="ex1g"></a>

$d_i = 5$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [15]:
for i in 1:samplesize
    X = ones(di, 1)
    η = X * βtrue
    # draw the cluster's linear predictor (fixed effect plus correlated normal random effects)
    mvn_d = MvNormal(η, Γ_CS)
    mvn_η = rand(mvn_d)
    μ = GLM.linkinv.(link, mvn_η)
    # convert the conditional mean μ to the NB success probability
    p_nb = rtrue ./ (μ .+ rtrue)
    y = Float64.(rand.(__get_distribution.(D, p_nb, rtrue)))
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_5_GLMM = StatsBase.cor(Y_CS)

5×5 Matrix{Float64}:
 1.0       0.316997  0.312304  0.312297  0.313186
 0.316997  1.0       0.309237  0.31543   0.316629
 0.312304  0.309237  1.0       0.315637  0.314888
 0.312297  0.31543   0.315637  1.0       0.312149
 0.313186  0.316629  0.314888  0.312149  1.0

In [16]:
mean_empirical_cor_5_GLMM = mean(GLMCopula.offdiag(empirical_cor_5_GLMM))

0.313875467123078

## CS di = 10 <a class="anchor" id="ex2"></a>

$d_i = 10$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [17]:
di = 10 # number of observations per cluster
V_CS = get_V_CS(ρtrue[1], di)

# true Gamma
Γ_CS = σ2true[1] * V_CS

10×10 Matrix{Float64}:
 0.5   0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.5   0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.5   0.25  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.5   0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.5   0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.5   0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25  0.5   0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.5   0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.5   0.25
 0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.25  0.5

In [18]:
vecd = Vector{DiscreteUnivariateDistribution}(undef, di)
for i in 1:di
    vecd[i] = NegativeBinomial(rtrue, p[1])
end
nonmixed_multivariate_dist = NonMixedMultivariateDistribution(vecd, Γ_CS)

Random.seed!(12345)
@time Y_nsample = simulate_nobs_independent_vectors(nonmixed_multivariate_dist, samplesize)

gcs = Vector{NBCopulaCSObs{T, D, Link}}(undef, samplesize)
for i in 1:samplesize
    X = ones(di, 1)              # intercept-only design matrix
    y = Float64.(Y_nsample[i])   # simulated responses for cluster i
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
di = length(gcm.data[1].y)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_10 = StatsBase.cor(Y_CS)

  4.691104 seconds (17.50 M allocations: 2.660 GiB, 18.19% gc time)


10×10 Matrix{Float64}:
 1.0        0.0551526  0.0477892  …  0.0484697  0.0574756  0.0507224
 0.0551526  1.0        0.0479644     0.0549867  0.0467314  0.053797
 0.0477892  0.0479644  1.0           0.0496512  0.0545935  0.054813
 0.0554153  0.0509814  0.0579918     0.0535626  0.0486271  0.0474945
 0.0519531  0.0502995  0.0534576     0.0525523  0.0599769  0.0554797
 0.0542896  0.0513325  0.0499887  …  0.0535237  0.0513801  0.0474423
 0.0436852  0.0544684  0.0493251     0.046236   0.0539935  0.0588734
 0.0484697  0.0549867  0.0496512     1.0        0.0516027  0.0536698
 0.0574756  0.0467314  0.0545935     0.0516027  1.0        0.0505316
 0.0507224  0.053797   0.054813      0.0536698  0.0505316  1.0

In [19]:
theoretical_rho_10_kurtosis = (ρtrue[1] * σ2true[1]) / (1 + ((di/2) * σ2true[1]) + (0.5 * (kt - 1) * σ2true[1]))

0.05743825387708214

In [20]:
mean_empirical_cor_10 = mean(GLMCopula.offdiag(empirical_cor_10))

0.052200486723412855

## simulate under GLMM CS di = 10 <a class="anchor" id="ex2g"></a>

$d_i = 10$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [21]:
for i in 1:samplesize
    X = ones(di, 1)
    η = X * βtrue
    # draw the cluster's linear predictor (fixed effect plus correlated normal random effects)
    mvn_d = MvNormal(η, Γ_CS)
    mvn_η = rand(mvn_d)
    μ = GLM.linkinv.(link, mvn_η)
    # convert the conditional mean μ to the NB success probability
    p_nb = rtrue ./ (μ .+ rtrue)
    y = Float64.(rand.(__get_distribution.(D, p_nb, rtrue)))
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_10_GLMM = StatsBase.cor(Y_CS)

10×10 Matrix{Float64}:
 1.0       0.31776   0.310125  0.313824  …  0.3151    0.310703  0.313126
 0.31776   1.0       0.316531  0.315146     0.316181  0.311838  0.313051
 0.310125  0.316531  1.0       0.314743     0.314154  0.311302  0.315465
 0.313824  0.315146  0.314743  1.0          0.307315  0.310697  0.314183
 0.312649  0.313211  0.321329  0.31538      0.314345  0.309248  0.314442
 0.316495  0.321222  0.313878  0.316827  …  0.311271  0.308824  0.312258
 0.311514  0.316665  0.31055   0.308869     0.311833  0.313051  0.313336
 0.3151    0.316181  0.314154  0.307315     1.0       0.309812  0.315264
 0.310703  0.311838  0.311302  0.310697     0.309812  1.0       0.309723
 0.313126  0.313051  0.315465  0.314183     0.315264  0.309723  1.0

In [22]:
mean_empirical_cor_10_GLMM = mean(GLMCopula.offdiag(empirical_cor_10_GLMM))

0.31302712585269177

## CS di = 25 <a class="anchor" id="ex3"></a>

$d_i = 25$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [23]:
di = 25 # number of observations per cluster
V_CS = get_V_CS(ρtrue[1], di)

# true Gamma
Γ_CS = σ2true[1] * V_CS

25×25 Matrix{Float64}:
 0.5   0.25  0.25  0.25  0.25  0.25  …  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.5   0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.5   0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.5   0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.5   0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.5   …  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25  …  0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25  0.25  0.25     0.25  0.25  0.25  0.25  0.25  0.25
 ⋮

In [24]:
vecd = Vector{DiscreteUnivariateDistribution}(undef, di)
for i in 1:di
    vecd[i] = NegativeBinomial(rtrue, p[1])
end
nonmixed_multivariate_dist = NonMixedMultivariateDistribution(vecd, Γ_CS)

Random.seed!(12345)
@time Y_nsample = simulate_nobs_independent_vectors(nonmixed_multivariate_dist, samplesize)

gcs = Vector{NBCopulaCSObs{T, D, Link}}(undef, samplesize)
for i in 1:samplesize
    X = ones(di, 1)              # intercept-only design matrix
    y = Float64.(Y_nsample[i])   # simulated responses for cluster i
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
di = length(gcm.data[1].y)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_25 = StatsBase.cor(Y_CS)

 15.452485 seconds (43.00 M allocations: 13.748 GiB, 18.36% gc time)


25×25 Matrix{Float64}:
 1.0        0.0298764  0.0292273  …  0.0302369  0.0322874  0.032105
 0.0298764  1.0        0.0266843     0.0272483  0.0289992  0.0357406
 0.0292273  0.0266843  1.0           0.0242882  0.0320809  0.0336612
 0.028722   0.0292228  0.0273056     0.0260612  0.0310376  0.0273011
 0.0328596  0.0304534  0.0272602     0.0298686  0.0277192  0.031977
 0.0292227  0.0268602  0.0277422  …  0.033642   0.0277322  0.0277135
 0.0214314  0.0314837  0.0285073     0.0239892  0.0298542  0.0259005
 0.0295384  0.030158   0.0258541     0.0297123  0.0310755  0.0273671
 0.0314814  0.0296167  0.0271021     0.0325716  0.0292358  0.0320776
 0.0280264  0.0265325  0.0301113     0.0268812  0.0283984  0.0236278
 0.0264516  0.027906   0.0275057  …  0.0278934  0.030911   0.0271946
 0.0300285  0.0286945  0.0326629     0.0279797  0.0318865  0.0216332
 0.0291758  0.0290099  0.0287786     0.0276509  0.032077   0.0328438
 0.03431    0.0315139  0.0278055     0.0342312  0.031295   0.0265904
 0.0272577  …

In [25]:
theoretical_rho_25_kurtosis = (ρtrue[1] * σ2true[1]) / (1 + ((di/2) * σ2true[1]) + (0.5 * (kt - 1) * σ2true[1]))

0.030854674483184207

In [26]:
mean_empirical_cor_25 = mean(GLMCopula.offdiag(empirical_cor_25))

0.029521935324674318

## simulate under GLMM CS di = 25 <a class="anchor" id="ex3g"></a>

$d_i = 25$

$p = 0.9, r = 10, \rho = 0.5, \sigma^2 = 0.5$

In [27]:
for i in 1:samplesize
    X = ones(di, 1)
    η = X * βtrue
    # draw the cluster's linear predictor (fixed effect plus correlated normal random effects)
    mvn_d = MvNormal(η, Γ_CS)
    mvn_η = rand(mvn_d)
    μ = GLM.linkinv.(link, mvn_η)
    # convert the conditional mean μ to the NB success probability
    p_nb = rtrue ./ (μ .+ rtrue)
    y = Float64.(rand.(__get_distribution.(D, p_nb, rtrue)))
    gcs[i] = NBCopulaCSObs(y, X, d, link)
end

# form model
gcm = NBCopulaCSModel(gcs);

N = length(gcm.data)
Y_CS = zeros(N, di)
for j in 1:di
    Y_CS[:, j] = [gcm.data[i].y[j] for i in 1:N]
end
empirical_cor_25_GLMM = StatsBase.cor(Y_CS)

25×25 Matrix{Float64}:
 1.0       0.324409  0.317535  0.315106  …  0.318859  0.324065  0.324601
 0.324409  1.0       0.311454  0.319595     0.315314  0.319755  0.31659
 0.317535  0.311454  1.0       0.311257     0.309234  0.312305  0.309873
 0.315106  0.319595  0.311257  1.0          0.318159  0.31409   0.313859
 0.322652  0.317657  0.316902  0.312022     0.313448  0.323842  0.32008
 0.322046  0.317475  0.316123  0.31822   …  0.313553  0.318009  0.310367
 0.319945  0.319562  0.315556  0.316649     0.319121  0.321424  0.314637
 0.319613  0.312075  0.32059   0.314015     0.313148  0.316303  0.315333
 0.320392  0.317575  0.310526  0.312756     0.320204  0.321022  0.315008
 0.315685  0.313282  0.318589  0.309326     0.312519  0.311343  0.3158
 0.31734   0.313012  0.309474  0.313241  …  0.310366  0.312093  0.315558
 0.312953  0.311475  0.319058  0.309613     0.311193  0.317445  0.31995
 0.322492  0.311532  0.31603   0.319197     0.322528  0.317778  0.321519
 0.323695  0.317291  0.311802  …

In [28]:
mean_empirical_cor_25_GLMM = mean(GLMCopula.offdiag(empirical_cor_25_GLMM))

0.31603224055480267

## Comparisons 

## 1. Theoretical vs. Empirical Correlation simulated under QC <a class="anchor" id="ex4"></a>

### Takeaway: Quasi-copula correlation IS a function of $d_i$, $kt$, $\rho$ and $\sigma^2$


Under the QC model, the theoretical correlation is a function of the cluster size $d_i$, $\rho$, $\sigma^2$, and the kurtosis $kt$ of the base distribution:

$$\mathrm{Corr}(Y_k, Y_l)
 = \frac{\rho \sigma^2}{1 + \frac{1}{2} \sigma^2 \cdot kt + \frac{1}{2} \sigma^2 (d_i - 1)}$$

With $p = 0.9, r = 10$, the kurtosis of the base distribution is $kt = kt(r, p) = 4.41$.

Fixing $\rho = 0.5, \sigma^2 = 0.5, kt = 4.41$ and varying the cluster size $d_i$ gives

$$\mathrm{Corr}(Y_k, Y_l)
 = \frac{0.25}{1 + \frac{1}{4} \cdot 4.41 + \frac{1}{4} (d_i - 1)}$$



In [29]:
# di = 2
[theoretical_rho_2_kurtosis mean_empirical_cor_2]

1×2 Matrix{Float64}:
 0.10627  0.088303

In [30]:
# di = 5
[theoretical_rho_5_kurtosis mean_empirical_cor_5]

1×2 Matrix{Float64}:
 0.0805802  0.0707435

In [31]:
# di = 10
[theoretical_rho_10_kurtosis mean_empirical_cor_10]

1×2 Matrix{Float64}:
 0.0574383  0.0522005

In [32]:
# di = 25
[theoretical_rho_25_kurtosis mean_empirical_cor_25]

1×2 Matrix{Float64}:
 0.0308547  0.0295219
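
The four pairs above can be put side by side to see how closely the simulation tracks the formula. The sketch below simply copies the printed values from the earlier cells into plain Julia vectors; the variable names are chosen here for illustration.

```julia
# Values copied from the QC cell outputs above.
cluster_sizes = [2, 5, 10, 25]
theory = [0.10626992561105207, 0.08058017727639001, 0.05743825387708214, 0.030854674483184207]
empir  = [0.08830301023394084, 0.07074347659265902, 0.052200486723412855, 0.029521935324674318]

# Relative shortfall of the empirical correlation against the formula.
rel_gap = (theory .- empir) ./ theory

for (di, g) in zip(cluster_sizes, rel_gap)
    println("di = ", di, "  relative gap = ", round(100 * g, digits = 1), "%")
end
```

In this run the empirical correlation sits below the theoretical value at every cluster size, and the relative gap narrows from roughly 17% at $d_i = 2$ to roughly 4% at $d_i = 25$.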

## 2. Empirical Correlation simulated under GLMM <a class="anchor" id="ex4g"></a>

### Takeaway: GLMM correlation is NOT a function of the cluster size $d_i$ (with $\beta$ and $r$ fixed, it depends only on $\rho$ and $\sigma^2$)

In [33]:
# glmm correlation di = 2 
mean_empirical_cor_2_GLMM

0.31365405505730504

In [34]:
# glmm correlation di = 5 
mean_empirical_cor_5_GLMM

0.313875467123078

In [35]:
# glmm correlation di = 10
mean_empirical_cor_10_GLMM

0.31302712585269177

In [36]:
# glmm correlation di = 25
mean_empirical_cor_25_GLMM

0.31603224055480267
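
The stability of these four values can be checked against a closed form. The sketch below assumes the GLMM is the one simulated in the cells above: a log link, $\eta_k \sim N(\beta, \sigma^2)$ with $\mathrm{Cov}(\eta_k, \eta_l) = \rho\sigma^2$, and $Y_k \mid \eta_k$ negative binomial with mean $\mu_k = e^{\eta_k}$ and variance $\mu_k + \mu_k^2 / r$. It applies standard log-normal moment formulas and the law of total variance; `glmm_nb_cor` is an illustrative name, not a GLMCopula function.

```julia
# Pairwise correlation for a log-link NB GLMM with Gaussian random effects.
# Assumptions: η_k ~ N(β, σ2), Cov(η_k, η_l) = ρ·σ2,
# Y_k | η_k ~ NB with mean μ_k = exp(η_k) and variance μ_k + μ_k²/r.
# `glmm_nb_cor` is an illustrative helper, not a library function.
function glmm_nb_cor(β, r, ρ, σ2)
    m    = exp(β + σ2 / 2)                      # E[μ_k]
    m2   = exp(2β + 2σ2)                        # E[μ_k²]
    c_kl = exp(2β + σ2) * (exp(ρ * σ2) - 1)     # Cov(μ_k, μ_l) = Cov(Y_k, Y_l), k ≠ l
    v_y  = m + m2 / r + (m2 - m^2)              # E[Var(Y|η)] + Var(E[Y|η])
    return c_kl / v_y
end

glmm_cor = glmm_nb_cor(log(9), 10.0, 0.5, 0.5)
println(round(glmm_cor, digits = 4))
```

With the notebook's values ($\beta = \log 9$, $r = 10$, $\rho = 0.5$, $\sigma^2 = 0.5$) this evaluates to about 0.3155, close to the empirical means of 0.313 to 0.316. The cluster size $d_i$ does not appear anywhere in the expression.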