# Group Knockoffs

This tutorial generates group (model-X) knockoffs, which are useful when predictors are highly correlated. The methodology is described in the following paper:

> Dai R, Barber R. The knockoff filter for FDR control in group-sparse and multitask regression. In International Conference on Machine Learning 2016 Jun 11 (pp. 1851-1859). PMLR.


!!! note

    In the original paper, Dai and Barber only describe how to construct equi-correlated group knockoffs, but the same idea generalizes to SDP/MVR/max-entropy group knockoffs, which we also implement here.

In [1]:
# load packages for this tutorial
using Revise
using Knockoffs
using LinearAlgebra
using Random
using StatsBase
using Statistics
using ToeplitzMatrices

# some helper functions to compute power and empirical FDR
function TP(correct_groups, signif_groups)
    return length(signif_groups ∩ correct_groups) / length(correct_groups)
end
function TP(correct_groups, β̂, groups)
    signif_groups = get_signif_groups(β̂, groups)
    return TP(correct_groups, signif_groups)
end
function FDR(correct_groups, signif_groups)
    FP = length(signif_groups) - length(signif_groups ∩ correct_groups) # number of false positives
    return FP / max(1, length(signif_groups)) # max(1, ⋅) avoids dividing by zero when nothing is selected
end
function FDR(correct_groups, β̂, groups)
    signif_groups = get_signif_groups(β̂, groups)
    return FDR(correct_groups, signif_groups)
end
# returns the groups that contain at least one nonzero entry of β
function get_signif_groups(β, groups)
    signif_groups = Int[]
    for i in findall(!iszero, β)
        g = groups[i]
        g ∈ signif_groups || push!(signif_groups, g)
    end
    return signif_groups
end


get_signif_groups (generic function with 1 method)
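
As a quick illustration of what these helpers compute, here is a small self-contained toy example. The one-line functions `selected_groups`, `group_power`, and `group_fdr` are compact restatements written for this sketch (the names are illustrative, not part of Knockoffs.jl), and the data are made up.

```julia
# compact, self-contained restatements of the helpers above (illustrative names)
selected_groups(β, groups) = unique(groups[findall(!iszero, β)])
group_power(truth, selected) = length(selected ∩ truth) / length(truth)
group_fdr(truth, selected) =
    (length(selected) - length(selected ∩ truth)) / max(1, length(selected))

groups = [1, 1, 2, 2, 3, 3]            # three groups of two variables each
truth = [1, 2]                         # groups that truly contain signal
βhat = [0.0, 0.5, 0.0, 0.0, 0.1, 0.0]  # nonzero entries fall in groups 1 and 3

sel = selected_groups(βhat, groups)
pow = group_power(truth, sel)          # fraction of true groups that were selected
fdr = group_fdr(truth, sel)            # fraction of selected groups that are false
println("selected = ", sel, ", power = ", pow, ", FDR = ", fdr)
```

Here one of the two true groups is found and one of the two selected groups is a false discovery, so both group power and group FDR equal one half.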

In [62]:
# simulate data
Random.seed!(111)
n = 1000 # sample size
p = 100  # number of covariates
k = 10   # number of true predictors
x = randn(p, p)
# Σ = x' * x
Σ = Matrix(SymmetricToeplitz(0.9.^(0:(p-1)))) # true covariance matrix
groupsizes = [5 for i in 1:20] # each group has 5 variables
groups = vcat([i*ones(g) for (i, g) in enumerate(groupsizes)]...) |> Vector{Int}
true_mu = zeros(p)
L = cholesky(Σ).L
X = randn(n, p) * L
zscore!(X, mean(X, dims=1), std(X, dims=1)); # standardize columns of X

In [63]:
@time me = modelX_gaussian_group_knockoffs(X, groups, :maxent_full, Σ, true_mu, verbose=true);

Iter 1: δ = 0.11197644138259803
Iter 2: δ = 0.05324651165827554
Iter 3: δ = 0.01831412107058905
Iter 4: δ = 0.006396670854896258
Iter 5: δ = 0.002931853929055616
Iter 6: δ = 0.0015124867463455205
Iter 7: δ = 0.0010012587363727172
Iter 8: δ = 0.0006885989929523604
Iter 9: δ = 0.0005190031049111778
Iter 10: δ = 0.0003735901333316354
Iter 11: δ = 0.00025726463422049724
Iter 12: δ = 0.00017153127746927387
Iter 13: δ = 0.00011225218311612953
Iter 14: δ = 7.26177667243821e-5
Iter 15: δ = 4.657438371206907e-5
Iter 16: δ = 3.0066672694224122e-5
Iter 17: δ = 1.930310117850633e-5
Iter 18: δ = 1.230341943703583e-5
Iter 19: δ = 7.790251213921796e-6
Iter 20: δ = 4.9029393265496e-6
Iter 21: δ = 3.069199992039728e-6
Iter 22: δ = 1.912357822679633e-6
Iter 23: δ = 1.1868845769189051e-6
Iter 24: δ = 7.342545552654127e-7
  0.077592 seconds (1.12 k allocations: 5.645 MiB)


In [64]:
me.S

100×100 Matrix{Float64}:
 0.202671    0.0751677   0.0272938  …  0.0        0.0         0.0
 0.0751677   0.114027    0.0450499     0.0        0.0         0.0
 0.0272938   0.0450499   0.115214      0.0        0.0         0.0
 0.00853296  0.0158989   0.0472812     0.0        0.0         0.0
 0.00251566  0.00511727  0.0166264     0.0        0.0         0.0
 0.0         0.0         0.0        …  0.0        0.0         0.0
 0.0         0.0         0.0           0.0        0.0         0.0
 0.0         0.0         0.0           0.0        0.0         0.0
 0.0         0.0         0.0           0.0        0.0         0.0
 0.0         0.0         0.0           0.0        0.0         0.0
 0.0         0.0         0.0        …  0.0        0.0         0.0
 0.0         0.0         0.0           0.0        0.0         0.0
 0.0         0.0         0.0           0.0        0.0         0.0
 ⋮                                  ⋱                         
 0.0         0.0         0.0           0.0        0.0 

In [93]:
m = 1 # with m = 1, (m+1)/m * Σ - S is exactly 2Σ - S
eigmin((m+1)/m * Σ - me.S)

0.05001139324827105

In [60]:
ko_equi.S

100×100 BlockDiagonals.BlockDiagonal{Float64, Matrix{Float64}}:
 0.126346   0.113712   0.10234   0.0921064  …  0.0       0.0        0.0
 0.113712   0.126346   0.113712  0.10234       0.0       0.0        0.0
 0.10234    0.113712   0.126346  0.113712      0.0       0.0        0.0
 0.0921064  0.10234    0.113712  0.126346      0.0       0.0        0.0
 0.0828958  0.0921064  0.10234   0.113712      0.0       0.0        0.0
 0.0        0.0        0.0       0.0        …  0.0       0.0        0.0
 0.0        0.0        0.0       0.0           0.0       0.0        0.0
 0.0        0.0        0.0       0.0           0.0       0.0        0.0
 0.0        0.0        0.0       0.0           0.0       0.0        0.0
 0.0        0.0        0.0       0.0           0.0       0.0        0.0
 0.0        0.0        0.0       0.0        …  0.0       0.0        0.0
 0.0        0.0        0.0       0.0           0.0       0.0        0.0
 0.0        0.0        0.0       0.0           0.0       0.0        0.0


In [62]:
m = 1
eigmin((m+1)/m * Σ - ko_equi.S)

5.103679100812459e-16

## Equi-correlated group knockoffs

+ Given a $p \times p$ positive definite matrix $\Sigma$, partition the $p$ features into $m$ groups $G_1,...,G_m$. We want to solve the following optimization problem
```math
\begin{aligned}
    \min_{S} & \ Tr(|\Sigma - S|)\\
    \text{such that } & S \succeq 0 \text{ and } 2\Sigma - S \succeq 0.
\end{aligned}
```
+ Here $S$ is a group-block-diagonal matrix of the form $S = diag(S_1,...,S_m)$ where each $S_j$ is a positive definite matrix of dimension $|G_j| \times |G_j|$.
+ The equi-correlated idea proposed in [Dai and Barber](https://proceedings.mlr.press/v48/daia16.html) is to let $S_j = \gamma \Sigma_{(G_j, G_j)}$ where $\Sigma_{(G_j, G_j)}$ is the block of $\Sigma$ containing variables in the $j$th group. Thus, instead of optimizing over all entries of $S$, we optimize a single scalar $\gamma$. Conveniently, there is a simple closed-form solution.
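
The closed form can be sketched as follows. Writing $D = diag(\Sigma_{(G_1,G_1)},...,\Sigma_{(G_m,G_m)})$, the constraint $2\Sigma - \gamma D \succeq 0$ is equivalent to $\gamma \le 2\lambda_{\min}(D^{-1/2}\Sigma D^{-1/2})$, and, as we understand it from Dai and Barber, the choice is $\gamma = \min\{1, 2\lambda_{\min}(D^{-1/2}\Sigma D^{-1/2})\}$. The helper `equi_gamma` below is a hypothetical name used only for illustration; it is not a function exported by Knockoffs.jl, and the package's own implementation may differ.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Knockoffs.jl): closed-form γ for
# equi-correlated group knockoffs, γ = min(1, 2λmin(D^{-1/2} Σ D^{-1/2})),
# where D is the group-block-diagonal part of Σ.
function equi_gamma(Σ::AbstractMatrix, groups::AbstractVector{<:Integer})
    p = size(Σ, 1)
    Dinvsqrt = zeros(p, p)
    for g in unique(groups)
        idx = findall(==(g), groups)
        # inverse matrix square root of the g-th diagonal block
        Dinvsqrt[idx, idx] = inv(sqrt(Symmetric(Σ[idx, idx])))
    end
    λmin = eigmin(Symmetric(Dinvsqrt * Σ * Dinvsqrt))
    return min(1.0, 2λmin)
end

# toy example: covariance with entries ρ^|i-j|, ρ = 0.9, two groups of size 3
ρ, p = 0.9, 6
Σ = [ρ^abs(i - j) for i in 1:p, j in 1:p]
groups = [1, 1, 1, 2, 2, 2]
γ = equi_gamma(Σ, groups)

# assemble S = γ * blockdiag(Σ_{G1,G1}, Σ_{G2,G2}) and check the PSD constraint
S = zeros(p, p)
for g in unique(groups)
    idx = findall(==(g), groups)
    S[idx, idx] = γ .* Σ[idx, idx]
end
println("γ = ", γ, ", min eigenvalue of 2Σ - S = ", eigmin(Symmetric(2Σ - S)))
```

When $\gamma < 1$ the constraint is active, so the smallest eigenvalue of $2\Sigma - S$ should come out as zero up to rounding error.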

First, let's simulate data and generate equi-correlated knockoffs. Our true covariance matrix looks like

```math
\begin{aligned}
\Sigma = 
\begin{pmatrix}
    1 & \rho & \rho^2 & \cdots & \rho^{p-1}\\
    \rho & 1 & \rho & \cdots & \rho^{p-2}\\
    \vdots & & \ddots & & \vdots \\
    \rho^{p-1} & \rho^{p-2} & \cdots & \rho & 1
\end{pmatrix}, \quad \rho = 0.9
\end{aligned}
```

Because each variable is highly correlated with its neighbors ($\rho = 0.9$), it is difficult to distinguish which variables within a group are truly causal. Thus, group knockoffs, which test whether a *group* of variables has any signal, should have better power than standard (single-variable) knockoffs.

For simplicity, let's simulate data where every 5 variables form a group:

In [48]:
# simulate data
Random.seed!(2022)
n = 1000 # sample size
p = 100  # number of covariates
k = 10   # number of true predictors
Σ = Matrix(SymmetricToeplitz(0.9.^(0:(p-1)))) # true covariance matrix
groupsizes = [5 for i in 1:20] # each group has 5 variables
groups = vcat([i*ones(g) for (i, g) in enumerate(groupsizes)]...) |> Vector{Int}
true_mu = zeros(p)
L = cholesky(Σ).L
X = randn(n, p) * L
zscore!(X, mean(X, dims=1), std(X, dims=1)); # standardize columns of X

Generate group knockoffs as follows:

In [49]:
ko_equi = modelX_gaussian_group_knockoffs(X, groups, :equi, Σ, true_mu);

Let's do a sanity check: is $2\Sigma - S$ positive semi-definite?

In [50]:
# compute minimum eigenvalues of 2Σ - S
eigmin(2ko_equi.Σ - ko_equi.S)

5.103679100812459e-16

The min eigenvalue is $\approx 0$ up to numerical precision, so the knockoff structure indeed satisfies the PSD constraint. 
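
To see why this check matters: for Gaussian model-X knockoffs, the joint covariance of the original variables and their knockoffs is the block matrix with $\Sigma$ on the diagonal and $\Sigma - S$ off the diagonal, which is PSD exactly when both $S \succeq 0$ and $2\Sigma - S \succeq 0$. The sketch below checks this numerically on a toy covariance. The names `Σtoy`, `Stoy`, and `G` and the chosen values are illustrative only and do not come from the package.

```julia
using LinearAlgebra

# toy covariance and a simple diagonal S (illustrative values only)
ρ, p = 0.5, 4
Σtoy = [ρ^abs(i - j) for i in 1:p, j in 1:p]
Stoy = Matrix(0.5I, p, p)

# joint covariance of (X, X̃) under the Gaussian knockoff construction
C = Σtoy - Stoy
G = [Σtoy C; C Σtoy]

# the spectrum of G is the union of the spectra of 2Σ - S and S,
# so G is PSD exactly when both of those matrices are PSD
eig_joint = eigvals(Symmetric(G))
eig_parts = sort(vcat(eigvals(Symmetric(2Σtoy - Stoy)), eigvals(Symmetric(Stoy))))
println("largest discrepancy between the two spectra: ", maximum(abs.(eig_joint .- eig_parts)))
```

The two sorted spectra should agree up to rounding error, which is why checking `eigmin(2Σ - S)` (together with $S \succeq 0$) is enough to validate the knockoff covariance.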

## SDP group knockoffs


+ This extends the equi-correlated construction of [Dai and Barber](https://proceedings.mlr.press/v48/daia16.html).
+ The idea is to choose $S_j = \gamma_j \Sigma_{(G_j, G_j)}$. Unlike the equi-correlated construction, $\gamma_j$ is allowed to differ between groups. If $\Sigma$ has unit variances on its diagonal, we solve the following problem

```math
\begin{aligned}
    \min_{\gamma_1,...,\gamma_m} & Tr(|\Sigma - S|)\\
    \text{such that } & 0 \le \gamma_j \le 1 \text{ for all } j \text{ and }\\
    & 2\Sigma - 
    \begin{pmatrix}
        \gamma_1\Sigma_{(G_1, G_1)} & & 0\\
        & \ddots & \\
        0 & & \gamma_m \Sigma_{(G_m, G_m)}
    \end{pmatrix} \succeq 0
\end{aligned}
```
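
To see why this is a semidefinite program, it can help to write out the objective. The following is a sketch under two assumptions: $|\cdot|$ is read entrywise (only diagonal entries enter the trace), and $\Sigma$ has unit diagonal with $0 \le \gamma_j \le 1$, as stated above.

```math
\begin{aligned}
    Tr(|\Sigma - S|) &= \sum_{i=1}^p |\Sigma_{ii} - S_{ii}|
    = \sum_{j=1}^m \sum_{i \in G_j} (1 - \gamma_j)
    = p - \sum_{j=1}^m |G_j| \gamma_j
\end{aligned}
```

Under these assumptions, minimizing the objective is the same as maximizing the linear function $\sum_j |G_j| \gamma_j$ subject to a linear matrix inequality in $\gamma_1,...,\gamma_m$.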

Now let's generate SDP group knockoffs:

In [5]:
@time ko_sdp = modelX_gaussian_group_knockoffs(X, groups, :sdp, Σ, true_mu);

 34.395505 seconds (116.34 M allocations: 7.186 GiB, 4.54% gc time, 95.69% compilation time)


We can also do a sanity check to see if the SDP knockoffs satisfy the PSD constraint:

In [6]:
# compute minimum eigenvalues of 2Σ - S
eigmin(2ko_sdp.Σ - ko_sdp.S)

-2.873299238537145e-8

The minimum eigenvalue is slightly negative, but only on the order of $10^{-8}$, so the PSD constraint holds up to the numerical tolerance one would expect from an iterative SDP solver.

## Second order group knockoffs

In practice, we often do not have the true covariance matrix $\Sigma$ and the true means $\mu$. In that case, we can generate second order group knockoffs via the three-argument method:

In [7]:
ko_equi = modelX_gaussian_group_knockoffs(X, groups, :equi);

This estimates the covariance matrix from the data; see the API documentation for more details.
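
As a rough sketch of what "second order" means here, the unknown moments are replaced by estimates computed from the data. The block below uses plain sample moments from the `Statistics` standard library on a toy matrix. This is an assumption for illustration only: the estimator actually used inside the package may differ (for example, it may apply shrinkage), and `Xtoy`, `mu_hat`, and `Sigma_hat` are hypothetical names.

```julia
using Statistics, Random

Random.seed!(1)
Xtoy = randn(200, 5)                # toy design matrix (illustrative only)

mu_hat = vec(mean(Xtoy, dims=1))    # estimated column means, length p
Sigma_hat = cov(Xtoy)               # p × p sample covariance of the columns

# These plug-in estimates play the role of `true_mu` and `Σ` in the
# five-argument call shown earlier; whether the package uses exactly
# these estimators is an assumption of this sketch.
println("size(Sigma_hat) = ", size(Sigma_hat), ", length(mu_hat) = ", length(mu_hat))
```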

## Power and FDR comparison

Let's compare empirical power and FDR for equi-correlated and SDP group knockoffs when the target FDR is 10%.

In [16]:
target_fdr = 0.1
equi_powers, equi_fdrs, equi_times = Float64[], Float64[], Float64[]
sdp_powers, sdp_fdrs, sdp_times = Float64[], Float64[], Float64[]

Random.seed!(2022)
for sim in 1:10
    # simulate y
    βtrue = zeros(p)
    βtrue[1:k] .= rand(-1:2:1, k) .* 0.1
    shuffle!(βtrue)
    correct_groups = get_signif_groups(βtrue, groups)
    ϵ = randn(n)
    y = X * βtrue + ϵ;

    # equi-group knockoffs
    t = @elapsed ko_filter = fit_lasso(y, X, method=:equi, groups=groups)
    idx = findfirst(x -> x == target_fdr, ko_filter.fdr_target)
    power = round(TP(correct_groups, ko_filter.βs[idx], groups), digits=3)
    fdr = round(FDR(correct_groups, ko_filter.βs[idx], groups), digits=3)
    println("Simulation $sim equi-group knockoffs power = $power, FDR = $fdr, time=$t")
    push!(equi_powers, power)
    push!(equi_fdrs, fdr)
    push!(equi_times, t)
    
    # SDP-group knockoffs
    t = @elapsed ko_filter = fit_lasso(y, X, method=:sdp, groups=groups)
    power = round(TP(correct_groups, ko_filter.βs[idx], groups), digits=3)
    fdr = round(FDR(correct_groups, ko_filter.βs[idx], groups), digits=3)
    println("Simulation $sim SDP-group knockoffs power = $power, FDR = $fdr, time=$t")
    push!(sdp_powers, power)
    push!(sdp_fdrs, fdr)
    push!(sdp_times, t)
end

println("\nEqui-correlated group knockoffs have average group power $(mean(equi_powers))")
println("Equi-correlated group knockoffs have average group FDR $(mean(equi_fdrs))");
println("Equi-correlated group knockoffs took average $(mean(equi_times)) seconds");

println("\nSDP group knockoffs have average group power $(mean(sdp_powers))")
println("SDP group knockoffs have average group FDR $(mean(sdp_fdrs))");
println("SDP group knockoffs took average $(mean(sdp_times)) seconds");

Simulation 1 equi-group knockoffs power = 0.125, FDR = 0.0, time=2.053749326
Simulation 1 SDP-group knockoffs power = 0.0, FDR = 0.0, time=6.45688248
Simulation 2 equi-group knockoffs power = 0.0, FDR = 0.0, time=1.777067223
Simulation 2 SDP-group knockoffs power = 0.125, FDR = 0.5, time=2.843801253
Simulation 3 equi-group knockoffs power = 0.333, FDR = 0.0, time=3.566960454
Simulation 3 SDP-group knockoffs power = 0.0, FDR = 0.0, time=2.373457417
Simulation 4 equi-group knockoffs power = 0.333, FDR = 0.0, time=1.670845043
Simulation 4 SDP-group knockoffs power = 0.333, FDR = 0.0, time=2.79045689
Simulation 5 equi-group knockoffs power = 0.25, FDR = 0.0, time=1.317231961
Simulation 5 SDP-group knockoffs power = 0.375, FDR = 0.0, time=6.14065572
Simulation 6 equi-group knockoffs power = 0.333, FDR = 0.0, time=1.737344252
Simulation 6 SDP-group knockoffs power = 0.667, FDR = 0.333, time=3.045115028
Simulation 7 equi-group knockoffs power = 0.125, FDR = 0.0, time=1.873825217
Simulation 7 

## Conclusion

+ Both equi-correlated and SDP group knockoffs control the group FDR below the target FDR level.
+ SDP group knockoffs have slightly better power than equi-correlated group knockoffs.
+ Equi-correlated knockoffs are ~2x faster to construct than SDP group knockoffs (for $p=100$ covariates and 20 groups). On a separate test with 200 groups and 5 features per group ($p = 1000$), SDP construction was ~45x slower.
