# Compare different ways of estimating covariance

In [1]:
using Revise
using Knockoffs
using Test
using LinearAlgebra
using Random
using StatsBase
using Statistics
using Distributions
using ToeplitzMatrices
using Plots
using CovarianceEstimation
gr(fmt=:png);



Our model is

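$$X_{p \times 1} \sim N(\mathbf{0}_p, \Sigma)$$
where $\Sigma$ is the $p \times p$ Toeplitz matrix with entries $\Sigma_{ij} = \rho^{|i-j|}$:
$$
\Sigma = 
\begin{pmatrix}
    1 & \rho & \rho^2 & \cdots & \rho^{p-1}\\
    \rho & 1 & \rho & \cdots & \rho^{p-2}\\
    \vdots & & \ddots & & \vdots \\
    \rho^{p-1} & \rho^{p-2} & \cdots & \rho & 1
\end{pmatrix}
$$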
Given $n$ iid samples from the above distribution, we will generate knockoffs according to 
$$(X, \tilde{X}) \sim N(\mathbf{0}_{2p}, G), \qquad
G =
\begin{pmatrix}
    \Sigma & \Sigma - \operatorname{diag}(s)\\
    \Sigma - \operatorname{diag}(s) & \Sigma
\end{pmatrix},
$$
where $s$ is chosen so that $s_j \ge 0$ for all $j$ and $G$ is PSD (equivalently, $2\Sigma - \operatorname{diag}(s)$ is PSD).
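
The constraint on $s$ can be checked numerically. The following is a minimal sketch, assuming the equicorrelated choice $s_j = \min(1, 2\lambda_{\min}(\Sigma))$ (one standard construction, not necessarily the solver Knockoffs.jl uses) and a small $p$ so the eigenvalue checks are instant. The names `s`, `D`, `G`, and `tol` are illustrative only.

```julia
using LinearAlgebra

p, ρ = 5, 0.4
Sigma = [ρ^abs(i - j) for i in 1:p, j in 1:p]   # Σ_ij = ρ^|i-j|

# Equicorrelated choice (an assumption for this sketch): s_j = min(1, 2λ_min(Σ))
s = fill(min(1.0, 2 * eigmin(Symmetric(Sigma))), p)

# Joint covariance G of (X, X̃)
D = Diagonal(s)
G = [Sigma Sigma - D; Sigma - D Sigma]

# The smallest eigenvalue can sit at zero up to rounding, hence the tolerance
tol = 1e-8
@show all(s .>= 0)
@show eigmin(Symmetric(2Sigma - D)) >= -tol
@show eigmin(Symmetric(G)) >= -tol
```

With a constant $s$, the eigenvalues of $G$ are those of $2\Sigma - \operatorname{diag}(s)$ together with the entries of $s$, which is why the two PSD checks agree.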

In [13]:
# simulate data
Random.seed!(2022)
n = 100
p = 500
ρ = 0.4
Sigma = Matrix(SymmetricToeplitz(ρ.^(0:(p-1))))
L = cholesky(Sigma).L
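# Each row of randn(n, p) * L is z'L with z ~ N(0, I), so its covariance is L'L, not LL' = Sigma;
# multiplying by L' (equivalently cholesky(Sigma).U) would give rows with covariance exactly Sigma.
X = randn(n, p) * L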
true_mu = zeros(p);
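
Which Cholesky factor to multiply by depends on whether samples are stored as rows or as columns. The small check below is a sketch on a hypothetical 4×4 example (built with a comprehension rather than ToeplitzMatrices) that compares the covariance implied by right-multiplying standard normal rows by `L` versus by `L'`.

```julia
using LinearAlgebra

p, ρ = 4, 0.4
Sigma = [ρ^abs(i - j) for i in 1:p, j in 1:p]
L = cholesky(Sigma).L          # lower-triangular factor with L * L' == Sigma

# A row sample x' = z' * A with z ~ N(0, I) has covariance A' * A.
cov_rows_times_L  = L' * L     # covariance of the rows of randn(n, p) * L
cov_rows_times_Lt = L * L'     # covariance of the rows of randn(n, p) * L'

@show cov_rows_times_Lt ≈ Sigma
@show cov_rows_times_L ≈ Sigma
@show cov_rows_times_L[1, 1]   # equals 1 + ρ^2 + ρ^4 + ρ^6 in this 4×4 example
```

The two products agree closely away from the first and last few coordinates, so the difference mostly affects the corners of the matrix.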

## $p > n$ case

### LinearShrinkage via Ledoit-Wolf with DiagonalUnequalVariance

This is the default method recommended for the $p>n$ case; see https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/methods/#Comparing-estimators

In [14]:
@time Σapprox = cov(LinearShrinkage(DiagonalUnequalVariance(), :lw), X);

  0.013680 seconds (33 allocations: 8.791 MiB)
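
Linear shrinkage forms a convex combination $\hat{\Sigma} = (1-\lambda) S + \lambda F$ of the sample covariance $S$ and a structured target $F$. The sketch below builds the two diagonal targets used in this notebook by hand, assuming they are defined as in the CovarianceEstimation.jl documentation ($F = \operatorname{diag}(S)$ for `DiagonalUnequalVariance` and $F = \frac{\operatorname{tr}(S)}{p} I$ for `DiagonalCommonVariance`). The value of `λ` is hand-picked for illustration; the package instead estimates it from the data (for example via `:lw` or `:ss`).

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
n, p = 20, 50                      # p > n, so the sample covariance is singular
X = randn(n, p)
S = cov(X)                         # p × p sample covariance, rank at most n - 1

F_unequal = Diagonal(diag(S))      # target that keeps each sample variance
F_common  = (tr(S) / p) * I(p)     # target with one shared variance

λ = 0.5                            # hand-picked for illustration only
Σ_unequal = (1 - λ) * S + λ * F_unequal
Σ_common  = (1 - λ) * S + λ * F_common

@show rank(S)
@show isposdef(Symmetric(Σ_unequal))
@show isposdef(Symmetric(Σ_common))
```

Because both targets are positive definite and $S$ is PSD, any $\lambda \in (0, 1]$ yields a positive definite estimate, which is what makes shrinkage usable when $p > n$.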


### LinearShrinkage via Schäfer-Strimmer with DiagonalCommonVariance

This seems to give the best MSE across various n/p combinations, as shown at https://mateuszbaran.github.io/CovarianceEstimation.jl/dev/man/msecomp/#msecomp

In [15]:
@time Σapprox2 = cov(LinearShrinkage(DiagonalCommonVariance(), :ss), X);

  0.029732 seconds (39 allocations: 12.610 MiB)


In [17]:
# compare estimates to truth (norm of a matrix is the Frobenius norm)
@show norm(Sigma .- Σapprox)
@show norm(Sigma .- Σapprox2)
[vec(Sigma) vec(Σapprox) vec(Σapprox2)]

norm(Sigma .- Σapprox) = 13.672021076565523
norm(Sigma .- Σapprox2) = 13.30716999433807


250000×3 Matrix{Float64}:
 1.0           0.807852      0.976725
 0.4           0.0314039     0.0291557
 0.16          0.0214846     0.0199465
 0.064         0.00477416    0.00443238
 0.0256        0.00428901    0.00398196
 0.01024       0.00335206    0.00311209
 0.004096      0.00786648    0.00730333
 0.0016384     0.00759405    0.0070504
 0.00065536    0.0122099     0.0113358
 0.000262144  -0.00630362   -0.00585235
 0.000104858   0.000187246   0.000173841
 4.1943e-5    -0.0101748    -0.00944637
 1.67772e-5   -0.00774287   -0.00718856
 ⋮                          
 4.1943e-5    -0.00566428   -0.00525878
 0.000104858  -0.00690351   -0.00640929
 0.000262144   0.00579939    0.00538422
 0.00065536   -0.00173785   -0.00161344
 0.0016384    -0.00499857   -0.00464073
 0.004096      0.0031627     0.00293628
 0.01024       0.0101293     0.00940414
 0.0256        0.00296201    0.00274996
 0.064         0.00141277    0.00131163
 0.16          0.0184645     0.0171427
 0.4           0.0244281     0.

## $n > p$ case

In [57]:
# simulate data
Random.seed!(2022)
n = 1000
p = 500
ρ = 0.4
Sigma = Matrix(SymmetricToeplitz(ρ.^(0:(p-1))))
L = cholesky(Sigma).L
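# Each row of randn(n, p) * L is z'L with z ~ N(0, I), so its covariance is L'L, not LL' = Sigma;
# multiplying by L' (equivalently cholesky(Sigma).U) would give rows with covariance exactly Sigma.
X = randn(n, p) * L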

# simulate data (this data shows Analytical Non-linear shrinkage can do poorly)
# Random.seed!(2022)
# n = 1000
# p = 500
# ρ = 0.4
# Sigma = (1-ρ)I + ρ .* ones((p, p))
# L = cholesky(Sigma).L
# X = randn(n, p) * L # var(X) = L var(N(0, 1)) L' = var(Σ)
# true_mu = zeros(p);

### LinearShrinkage via Ledoit-Wolf with DiagonalUnequalVariance

In [58]:
@time Σapprox = cov(LinearShrinkage(DiagonalUnequalVariance(), :lw), X);

  0.036899 seconds (33 allocations: 19.091 MiB)


### LinearShrinkage via Schäfer-Strimmer with DiagonalCommonVariance

This seems to give the best MSE in the $p>n$ case (see above). Let's see how it performs in the $n>p$ case.

In [59]:
@time Σapprox2 = cov(LinearShrinkage(DiagonalCommonVariance(), :ss), X);

  0.073585 seconds (39 allocations: 22.910 MiB)


### Analytical nonlinear shrinkage

This sometimes performs worse (see the commented-out simulation above) and is the slowest of the three estimators timed in this notebook.

In [60]:
@time Σapprox3 = cov(AnalyticalNonlinearShrinkage(), X);

  0.150961 seconds (123 allocations: 28.860 MiB)
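
A single `@time` call can include compilation and run-to-run noise, so timing comparisons are more stable after a warm-up. The helper below is a hypothetical sketch (`min_elapsed` is not part of any package) that takes the fastest of several runs. It uses the plain sample covariance from Statistics as a stand-in; any of the estimator calls above could be passed instead.

```julia
using Statistics, Random

# Time `f()` after one warm-up call (so compilation is excluded) and
# return the fastest of `reps` runs, in seconds.
function min_elapsed(f; reps = 5)
    f()                                   # warm-up run, not timed
    return minimum(@elapsed(f()) for _ in 1:reps)
end

Random.seed!(2022)
X = randn(1000, 500)
t = min_elapsed(() -> cov(X))             # plain sample covariance as a stand-in
@show t
```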


In [61]:
# compare estimates to truth (norm of a matrix is the Frobenius norm)
@show norm(Sigma .- Σapprox)
@show norm(Sigma .- Σapprox2)
@show norm(Sigma .- Σapprox3)
[vec(Sigma) vec(Σapprox) vec(Σapprox2) vec(Σapprox3)]

norm(Sigma .- Σapprox) = 10.419794517086471
norm(Sigma .- Σapprox2) = 10.375954445137257
norm(Sigma .- Σapprox3) = 10.305885983875836


250000×4 Matrix{Float64}:
 1.0           1.09108      1.04167      1.04947
 0.4           0.18855      0.187167     0.204735
 0.16          0.065677     0.0651952    0.0702049
 0.064         0.0134579    0.0133592    0.0133384
 0.0256        0.0135339    0.0134346    0.0133853
 0.01024       0.0171558    0.01703      0.0180576
 0.004096     -0.00358494  -0.00355865  -0.00215183
 0.0016384    -0.0257009   -0.0255124   -0.0243957
 0.00065536   -0.0287337   -0.0285229   -0.0301445
 0.000262144  -0.0153974   -0.0152844   -0.0114235
 0.000104858  -0.00758369  -0.00752806  -0.00540316
 4.1943e-5    -0.0191237   -0.0189834   -0.0205698
 1.67772e-5    0.00725894   0.0072057    0.00761886
 ⋮                                      
 4.1943e-5     0.0170714    0.0169462    0.0191543
 0.000104858   0.00506735   0.00503019   0.00385539
 0.000262144   0.00256931   0.00255046   0.00274609
 0.00065536   -0.0024898   -0.00247154  -0.00410855
 0.0016384     0.0142631    0.0141585    0.0133332
 0.004096   

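## Conclusion

`LinearShrinkage(DiagonalCommonVariance(), :ss)` gave the lowest Frobenius error in the $p>n$ case (13.31 vs 13.67) and was within 1% of the best estimator in the $n>p$ case (10.38 vs 10.31 for `AnalyticalNonlinearShrinkage()`), at about half the runtime of the latter. It took roughly twice as long as the Ledoit-Wolf estimator but still ran in well under 0.1 seconds, so it seems a reasonable default for both cases.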
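
To put the reported errors side by side, the sketch below computes each estimator's percentage gap to the best estimator in the same setting. It uses only the norms printed in the outputs above, rounded to six decimals. The names `err_p_gt_n`, `err_n_gt_p`, and `gap` are illustrative.

```julia
# Frobenius errors copied from the printed outputs (rounded to 6 decimals)
err_p_gt_n = (lw = 13.672021, ss = 13.307170)
err_n_gt_p = (lw = 10.419795, ss = 10.375954, nonlinear = 10.305886)

# Percentage gap of each estimator to the smallest error in the same setting
gap(errs) = map(e -> round(100 * (e / minimum(values(errs)) - 1), digits = 2), errs)

@show gap(err_p_gt_n)
@show gap(err_n_gt_p)
```

The gaps are small in both settings, so runtime is a reasonable tie-breaker between the estimators.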