# Lab 3 - Mateusz Markiewicz (298653)

## 1
Generate the design matrix $X_{500 \times 450}$ such that its elements are iid random variables from $N(0, \sigma = \frac{1}{\sqrt{n}})$, i.e. with standard deviation $\frac{1}{\sqrt{n}}$, where $n = 500$. Then generate the vector of the response variable according to the model
$Y = X \beta + \epsilon ,$ where $\epsilon \sim 2N(0,I)$, $\beta_i = 10$ for $i \in \{1, \cdots , k\}$ and $\beta_i = 0$ for $i \in \{k + 1, \cdots , 450\}$ and $k \in \{5, 20, 50\}$.

For 100 replications of the above experiments estimate the regression coefficients and/or identify important variables using:

i) least squares
    
ii) ridge regression and LASSO with the tuning parameters selected by cross-validation
    
iii) use knockoffs with ridge and LASSO to identify important variables while keeping FDR equal to 0.2.
    
iv) adaptive LASSO I:

first step: calculate weights using cross-validated LASSO (eliminate variables not selected by cross-validated LASSO)

second step: use weighted cross-validated LASSO
    
v) adaptive LASSO II: 

first step: calculate $\hat{\beta}$ using cross-validated LASSO
    
second step: estimate $\hat{\sigma} = \sqrt{\frac{RSS}{n-k}}$, where RSS is from the cross-validated LASSO and k is the number of variables selected by cross-validated LASSO 
    
third step: calculate weights $w_i = \frac{\hat{\sigma}}{|\hat{\beta_i}|}$
    
fourth step: use weighted LASSO with the tuning parameter $\lambda = \hat{\sigma} \varPhi^{-1}(1-\frac{\alpha}{2p})$

vi) adaptive SLOPE - as in point iv) but in the final stage use weighted SLOPE with BH sequence at FDR level 0.2.

vii) extra 5 points - adaptive Bayesian SLOPE at FDR=0.2.

a) For methods iii)-vii) estimate FDR and power.

b) For all methods apart from iii) estimate the mean square errors of the estimators of $\beta$ and $\mu = X\beta$.
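
As a small illustration of steps two to four of v), the sketch below computes $\hat{\sigma}$, the weights and the threshold from made-up inputs. `beta_hat_toy`, `rss_toy` and `alpha` are hypothetical values, not results from this lab; $\alpha = 0.2$ is an assumption chosen to match the target FDR used elsewhere.

```r
n <- 500; p <- 450
alpha <- 0.2                                       # assumed level (not fixed by the task text)

beta_hat_toy <- c(8.5, -9.2, 0.3, rep(0, p - 3))   # hypothetical CV-LASSO estimate
rss_toy <- 2100                                    # hypothetical residual sum of squares

sel <- which(beta_hat_toy != 0)                    # variables kept after the first step
sigma_hat <- sqrt(rss_toy / (n - length(sel)))     # second step
w <- sigma_hat / abs(beta_hat_toy[sel])            # third step: large weight = strong penalty
lambda <- sigma_hat * qnorm(1 - alpha / (2 * p))   # fourth step
```

With $\alpha = 0.2$ the quantile `qnorm(1 - alpha / (2 * p))` coincides with the `qnorm(1-0.1/p)` used in the simulation loop.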

In [1]:
library(glmnet);
library(SLOPE);
library(mvtnorm);

Loading required package: Matrix

Loaded glmnet 4.1-1

"package 'SLOPE' was built under R version 4.0.5"


In [2]:
n <- 500;
p <- 450;
k <- c(5, 20, 50);
signal_strength<-10;
l_k <- length(k);

In [3]:
X <- matrix(rnorm(n*p,0,1/sqrt(n)),n,p);

In [4]:
betas <- matrix(rep(0,l_k*p),p,l_k);
for (i in 1:l_k){
    betas[1:k[i],i]<-signal_strength;
}

In [5]:
Xb <- matrix(rep(0,l_k*n),n,l_k);
for (i in 1:l_k){
    Xb[,i]<-X%*%betas[,i];
}

In [6]:
X2 <- matrix(rnorm(n*p,0,1/sqrt(n)),n,p);
cX <- cbind(X, X2)

In [7]:
FDR <- function(betas_hat){
    FDR_res <- rep(0,l_k);
    for (i in 1:l_k){
        all_discoveries <- apply(betas_hat[i, , ] != 0, c(1), sum);
        all_discoveries <- pmax(all_discoveries,1);
        false_discoveries <- apply(betas_hat[i, , (k[i]+1):p] != 0, c(1), sum)
        FDR_res[i] <- mean(false_discoveries/all_discoveries);
    }
    return(FDR_res);
}

In [8]:
power <- function(betas_hat){
    power_res <- rep(0,l_k);
    for (i in 1:l_k){
        true_discoveries <- apply(betas_hat[i, , 1:k[i]] != 0, c(1), sum);
        power_res[i] <- mean(true_discoveries/k[i]);
    }
    return(power_res);
}
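
To make the two definitions concrete, here is a toy computation for a single replication (the numbers are hypothetical and `k_toy`, `beta_hat` are illustrative names): the false discovery proportion is the share of selected variables that are null, and power is the share of true signals that are selected. `FDR` and `power` above average these quantities over the replications.

```r
# Toy setting: 6 variables, the first k_toy = 2 are true signals.
k_toy <- 2
beta_hat <- c(9.1, 0, 0.4, 0, 0, -0.2)         # one replication's estimate

selected <- which(beta_hat != 0)               # indices 1, 3 and 6
false_disc <- sum(selected > k_toy)            # nulls that were selected
true_disc <- sum(selected <= k_toy)            # signals that were selected

fdp <- false_disc / max(length(selected), 1)   # max(., 1) avoids 0/0 when nothing is selected
pow <- true_disc / k_toy
print(c(FDP = fdp, power = pow))
```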

In [9]:
mse_betas <- function(betas_hat){
    mse_betas_res <- rep(0,l_k);
    for (i in 1:l_k){
        mse_betas_res[i] <-mean(apply(t(t(betas_hat[i,,]) - betas[,i])**2, c(1), sum));
    }
    return(mse_betas_res);
}

In [10]:
mse_mu <- function(betas_hat){
    mse_mu_res <- rep(0,l_k);
    for (i in 1:l_k){
        mse_mu_res[i] <- mean(apply(t(X%*%(t(betas_hat[i,,]) - betas[,i]))**2, c(1), sum));
    }
    return(mse_mu_res);
}

In [11]:
Knockoff_betas <- function(w, beta_hat, q=0.2){
    sorted <- sort(abs(w), decreasing=TRUE, index.return=TRUE);
    fd<-cumsum(w[sorted$ix]<0); 
    nd<-cumsum(w[sorted$ix]>0);
    fdr<-(fd+1)/nd;
    betas_knockoff <- rep(0,p);
    u1<-which(fdr<q);
    if (length(u1)>0){
        indopt<-max(u1);
        a1<-sorted$ix[1:indopt];
        a2<-which(w>0);
        a3<-intersect(a1,a2);
        betas_knockoff[a3]<-beta_hat[a3];
    }
    return(betas_knockoff);
}
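
A consequence of the `(fd+1)/nd < q` rule is worth seeing on a toy input. The sketch below uses a hypothetical helper, `knockoff_select`, that mirrors the thresholding logic of `Knockoff_betas` but returns indices only. With q = 0.2, a non-empty selection needs at least 6 positive statistics before the first negative one.

```r
# Hypothetical helper mirroring the thresholding in Knockoff_betas.
knockoff_select <- function(w, q = 0.2) {
  ord <- order(abs(w), decreasing = TRUE)
  fd <- cumsum(w[ord] < 0)         # knockoff wins: proxy for false discoveries
  nd <- cumsum(w[ord] > 0)         # original wins: candidate discoveries
  fdr_hat <- (fd + 1) / nd         # the +1 makes the estimate conservative
  ok <- which(fdr_hat < q)
  if (length(ok) == 0) return(integer(0))
  intersect(ord[seq_len(max(ok))], which(w > 0))
}

w5 <- c(5, 4, 3, 2, 1, -0.5, rep(0, 4))      # 5 clear signals, then a negative statistic
w6 <- c(6, 5, 4, 3, 2, 1, -0.5, rep(0, 3))   # 6 clear signals, then a negative statistic

sel5 <- knockoff_select(w5)   # nothing selected: 1/5 is not below 0.2
sel6 <- knockoff_select(w6)   # all six selected: 1/6 is below 0.2
print(list(sel5 = sel5, sel6 = sel6))
```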

In [12]:
reps <- 100;
betas_ols <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_ridge <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_lasso <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_kridge <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_klasso <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_adlasso <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_adlasso2 <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_adslope <- array(rep(0,l_k*reps*p),c(l_k,reps,p));

In [13]:
for (j in 1:reps){
    epsilon <- 2*rnorm(n);
    lambda_alasso <- qnorm(1-0.1/p);
    for (i in 1:l_k){
        Y <- Xb[,i] + epsilon;
        
        # ols
        obj <- lm(Y~X-1);
        betas_ols[i,j,] <- obj$coefficients;
        
        # ridge
        obj2 <- cv.glmnet(X, Y, alpha=0, intercept=FALSE, standardize=FALSE);
        betas_ridge_ <- coefficients(obj2, s='lambda.min')[2:(p+1),1];
        betas_ridge[i,j,] <- betas_ridge_;
        
        # lasso
        obj3 <- cv.glmnet(X, Y, alpha=1, intercept=FALSE, standardize=FALSE);
        betas_lasso_ <- coefficients(obj3, s='lambda.min')[2:(p+1),1];
        betas_lasso[i,j,] <- betas_lasso_;
        lasso_indxs <- which(abs(betas_lasso_)>0);

        # Knockoff Ridge
        obj4 <- cv.glmnet(cX, Y, alpha=0, intercept=FALSE, standardize=FALSE);
        betas_kridge_temp <- coefficients(obj4, s='lambda.min')[2:(2*p+1),1];
        w_ridge <- abs(betas_kridge_temp[1:p])-abs(betas_kridge_temp[(p+1):(2*p)]);
        betas_kridge[i,j,] <- Knockoff_betas(w_ridge, betas_ridge_);
        
        # Knockoff Lasso
        obj5 <- cv.glmnet(cX, Y, alpha=1, intercept=FALSE, standardize=FALSE);
        betas_klasso_temp <- coefficients(obj5, s='lambda.min')[2:(2*p+1),1];
        w_lasso <- abs(betas_klasso_temp[1:p])-abs(betas_klasso_temp[(p+1):(2*p)]);
        betas_klasso[i,j,] <- Knockoff_betas(w_lasso, betas_lasso_)
        
        # Adaptive Lasso I
        X_ADLasso <- X[,lasso_indxs];
        betas_adlasso_ <- betas_lasso_[lasso_indxs];
        W_ADLasso_ <- betas_adlasso_;
        X_ADLasso_temp <- sweep(X_ADLasso, 2, W_ADLasso_, '*');
        obj6 <- cv.glmnet(X_ADLasso_temp, Y, alpha=1, intercept=FALSE, standardize=FALSE);
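        # note: coefficients() on a cv.glmnet object defaults to s='lambda.1se' (not 'lambda.min' as above)
        betas_adlasso_2 <- coefficients(obj6)[2:(length(lasso_indxs)+1)] * W_ADLasso_;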
        betas_adlasso_3 <- rep(0,p);
        betas_adlasso_3[lasso_indxs] <- betas_adlasso_2;
        betas_adlasso[i,j,] <- betas_adlasso_3;
        
        # Adaptive Lasso II
        Lasso_RSS <- sum((Y- X%*%betas_lasso_)^2)
        X_ADLasso2 <- X[,lasso_indxs];
        betas_adlasso2_ <- betas_lasso_[lasso_indxs];
        sigma_lassoCV <- sqrt(Lasso_RSS/(n-length(lasso_indxs)));
        W_ADLasso_2<-abs(betas_adlasso2_)/sigma_lassoCV;
        X_ADLasso2_temp <- sweep(X_ADLasso2, 2, W_ADLasso_2, '*');
        obj7 <- glmnet(X_ADLasso2_temp, Y, intercept=FALSE, alpha=1, standardize=FALSE, lambda=sigma_lassoCV*lambda_alasso/n);
        betas_adlasso2_2 <- coefficients(obj7)[2:(length(lasso_indxs)+1)] * W_ADLasso_2;
        betas_adlasso2_3 <- rep(0,p);
        betas_adlasso2_3[lasso_indxs] <- betas_adlasso2_2;
        betas_adlasso2[i,j,] <- betas_adlasso2_3;
        
        # Adaptive SLOPE
        # small offset keeps the multipliers of unselected variables strictly positive
        W_ADSlope_ <- (abs(betas_lasso_) + 1e-6)/sigma_lassoCV;
        X_ADSlope_temp <- sweep(X, 2, W_ADSlope_, '*');
        obj8 <- SLOPE(X_ADSlope_temp, Y, q=0.2, alpha=1/n*sigma_lassoCV, lambda='bh', solver='admm', max_passes=100, scale='none');
        betas_adslope_ <- coefficients(obj8)[2:(p+1)] * W_ADSlope_;
        betas_adslope[i,j,] <- betas_adslope_;
    }
}
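
The adaptive steps in the loop never pass weights to the solver; they rescale the columns of $X$ instead. A short derivation of why this is equivalent for the Lasso, writing $w_i > 0$ for the penalty weights and substituting $c_i = w_i b_i$ (the symbols $b$, $c$ and $\tilde{X}$ are notation introduced here, not taken from the assignment):

$$
\min_{b} \; \frac{1}{2}\|Y - Xb\|_2^2 + \lambda \sum_i w_i |b_i|
\;=\;
\min_{c} \; \frac{1}{2}\|Y - \tilde{X}c\|_2^2 + \lambda \sum_i |c_i|,
\qquad \tilde{X}_{\cdot i} = \frac{X_{\cdot i}}{w_i}, \quad b_i = \frac{c_i}{w_i}.
$$

In the code the column multipliers (for example `W_ADLasso_2`) play the role of $1/w_i$, which is why the fitted coefficients are multiplied by the same vector afterwards. The same substitution is applied before the SLOPE fit.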

In [14]:
# Name the columns directly; using `<-` inside data.frame() would also create stray globals c1, c2, ...
power_df <- data.frame(
    K_Ridge = power(betas_kridge),
    K_LASSO = power(betas_klasso),
    ADLASSO = power(betas_adlasso),
    ADLASSO2 = power(betas_adlasso2),
    ADSLOPE = power(betas_adslope)
);
rownames(power_df) <- k;

fdr_df <- data.frame(
    K_Ridge = FDR(betas_kridge),
    K_LASSO = FDR(betas_klasso),
    ADLASSO = FDR(betas_adlasso),
    ADLASSO2 = FDR(betas_adlasso2),
    ADSLOPE = FDR(betas_adslope)
);
rownames(fdr_df) <- k;

betas_mse_df <- data.frame(
    OLS = mse_betas(betas_ols),
    Ridge = mse_betas(betas_ridge),
    LASSO = mse_betas(betas_lasso),
    ADLASSO = mse_betas(betas_adlasso),
    ADLASSO2 = mse_betas(betas_adlasso2),
    ADSLOPE = mse_betas(betas_adslope)
);
rownames(betas_mse_df) <- k;

mu_mse_df <- data.frame(
    OLS = mse_mu(betas_ols),
    Ridge = mse_mu(betas_ridge),
    LASSO = mse_mu(betas_lasso),
    ADLASSO = mse_mu(betas_adlasso),
    ADLASSO2 = mse_mu(betas_adlasso2),
    ADSLOPE = mse_mu(betas_adslope)
);
rownames(mu_mse_df) <- k;

In [15]:
print('Power:')
print(power_df)

[1] "Power:"
   K_Ridge K_LASSO ADLASSO ADLASSO2 ADSLOPE
5   0.3620  0.4640  0.9940   0.9640  0.9680
20  0.5005  0.9375  0.9970   0.9600  0.9745
50  0.5382  0.9472  0.9958   0.9664  0.9850


In [16]:
print('FDR:')
print(fdr_df)

[1] "FDR:"
     K_Ridge   K_LASSO   ADLASSO  ADLASSO2   ADSLOPE
5  0.1544815 0.1425676 0.6243929 0.1801726 0.2385251
20 0.1293548 0.1687399 0.6531812 0.1611591 0.2685747
50 0.1922487 0.2006646 0.5083831 0.1376820 0.2570011


In [17]:
print('Betas MSE:')
print(betas_mse_df)

[1] "Betas MSE:"
        OLS     Ridge    LASSO  ADLASSO  ADLASSO2   ADSLOPE
5  17552.88  416.2828 139.3958 206.3660  94.44338  89.19459
20 17552.88 1211.9450 455.2724 515.2532 373.04804 327.49461
50 17552.88 2096.0132 923.6777 834.6129 887.33294 729.44346


In [18]:
print('Mu MSE:')
print(mu_mse_df)

[1] "Mu MSE:"
        OLS     Ridge    LASSO  ADLASSO  ADLASSO2   ADSLOPE
5  1809.535  358.3019 129.9146 202.6428  89.72967  84.52688
20 1809.535  765.4374 365.9268 449.4459 320.89496 279.11631
50 1809.535 1059.5171 643.0905 616.8512 683.93059 549.11037


### Power:

Ridge regression with knockoffs has low power, especially for k=5 (~0.36). For larger k the power rises to ~0.5; in some of my experiments the power for k=20 reached 0.7, but that is still not a good result.

Lasso with knockoffs has low power for k=5 (~0.46), which was expected: the rule in `Knockoff_betas` requires `(fd+1)/nd < 0.2`, so a non-empty selection needs at least 6 positive statistics, more than the 5 true signals available. For larger k the power increases to ~0.95. The knockoff method is therefore unreliable when k is small.

Both versions of the adaptive Lasso have high and stable power for every k, with the first version slightly ahead (~0.99 vs ~0.96).

The power of the adaptive SLOPE is high (between the two adaptive Lasso versions) and it increases with k.

### False discovery rate:

Both Ridge and Lasso with knockoffs control FDR at roughly the target level of 0.2 (the largest estimate is ~0.20, for Lasso with k=50). For smaller k their FDRs are lower still.

The first version of the adaptive Lasso does not control FDR at all (0.51-0.65), while the second version stays below 0.2 for every k, as expected.

The FDR of the adaptive SLOPE (0.24-0.27) is a bit above the target level of 0.2, but it's still acceptable.

### Mean squared error:

The standard Lasso has much smaller MSEs than the standard Ridge, as expected.

The second version of the adaptive Lasso gives a smaller MSE than the first for k=5 and k=20, which is consistent with its lower FDR; for k=50 the first version is slightly better. The first version is worse than the standard Lasso for k=5 and k=20.

The best in terms of MSE is the adaptive SLOPE, for every k and for both $\beta$ and $\mu$.

The first version of the adaptive Lasso selects nearly all true variables, but it also makes many false discoveries, hence its high MSE for small k. Both the second version of the adaptive Lasso and the adaptive SLOPE perform really well: SLOPE has slightly higher power, while the Lasso has a smaller FDR. When k is large we should also consider Lasso with knockoffs.
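
The OLS row is identical for every k because the OLS error $\hat{\beta} - \beta = (X^TX)^{-1}X^T\epsilon$ does not depend on $\beta$, and the same $X$ and noise draws are reused across k. As a rough sanity check, the expected values can be computed directly. This is a sketch that assumes the standard inverse-Wishart mean $E[(X^TX)^{-1}] = \frac{n}{n-p-1} I$ for an iid Gaussian design with variance $1/n$; the variable names are illustrative.

```r
n <- 500; p <- 450
sigma2 <- 4                      # Var(epsilon_i), since epsilon = 2 * N(0, 1)

# E||X(beta_hat - beta)||^2 = sigma^2 * p (trace of the projection onto the columns of X)
mse_mu_theory <- sigma2 * p

# E||beta_hat - beta||^2 = sigma^2 * E[tr((X'X)^{-1})] = sigma^2 * p * n / (n - p - 1)
mse_beta_theory <- sigma2 * p * n / (n - p - 1)

print(c(mu = mse_mu_theory, beta = mse_beta_theory))
```

Both values are of the same order as the simulated OLS entries (about 1810 and 17553). Exact agreement is not expected, because $X$ is drawn once and only 100 noise replications are averaged.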

## 2
Repeat Problem 1 when $X_i \sim N(0,\frac{1}{n}\Sigma)$, where $\Sigma_{ii} = 1$ and $\Sigma_{ij} = 0.5$ for $i \neq j$.

In [19]:
n <- 500;
p <- 450;
k <- c(5, 20, 50);
signal_strength<-10;
l_k <- length(k);

In [20]:
sigma <- matrix(0.5, p, p) 
diag(sigma) <- 1

In [21]:
X <- (rmvnorm(n,numeric(p),sigma))/sqrt(n)

In [22]:
s<-min(eigen(sigma)$values);
s<-min(2*s,1);
sseq=c(rep(s,p));
V=2*diag(sseq)-diag(sseq)%*%solve(sigma)%*%diag(sseq);
mu<-X-X%*%solve(sigma)%*%diag(sseq);

In [23]:
Xn<-mu+rmvnorm(n,rep(0,p),V)/sqrt(n);
cX<-cbind(X,Xn);
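
A quick check of the construction above on a small example (a sketch with 4 variables instead of 450; `p_small`, `D` and `G` are names introduced here, with `G` the joint covariance of the original variables and their knockoffs up to the common $1/n$ factor). For valid Gaussian knockoffs both `V` and `G` must be positive semidefinite.

```r
p_small <- 4
sigma <- matrix(0.5, p_small, p_small)
diag(sigma) <- 1

s <- min(2 * min(eigen(sigma, symmetric = TRUE)$values), 1)
D <- diag(rep(s, p_small))
V <- 2 * D - D %*% solve(sigma) %*% D     # conditional covariance of the knockoffs

# Joint covariance of (X, X_knockoff): [[Sigma, Sigma - D], [Sigma - D, Sigma]]
G <- rbind(cbind(sigma, sigma - D), cbind(sigma - D, sigma))

min_eig_V <- min(eigen(V, symmetric = TRUE)$values)
min_eig_G <- min(eigen(G, symmetric = TRUE)$values)
print(c(s = s, min_eig_V = min_eig_V, min_eig_G = min_eig_G))
```

For this $\Sigma$ the smallest eigenvalue is 0.5, so `s` equals 1 and the smallest eigenvalues of `V` and `G` are zero up to rounding. Choosing `s` as twice the smallest eigenvalue therefore puts the construction on the boundary of validity.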

In [24]:
betas <- matrix(rep(0,l_k*p),p,l_k);
for (i in 1:l_k){
    betas[1:k[i],i]<-signal_strength;
}

In [25]:
Xb <- matrix(rep(0,l_k*n),n,l_k);
for (i in 1:l_k){
    Xb[,i]<-X%*%betas[,i];
}

In [26]:
reps <- 100;
betas_ols <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_ridge <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_lasso <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_kridge <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_klasso <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_adlasso <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_adlasso2 <- array(rep(0,l_k*reps*p),c(l_k,reps,p));
betas_adslope <- array(rep(0,l_k*reps*p),c(l_k,reps,p));

In [27]:
for (j in 1:reps){
    epsilon <- 2*rnorm(n);
    lambda_alasso <- qnorm(1-0.1/p);
    for (i in 1:l_k){
        Y <- Xb[,i] + epsilon;
        
        # ols
        obj <- lm(Y~X-1);
        betas_ols[i,j,] <- obj$coefficients;
        
        # ridge
        obj2 <- cv.glmnet(X, Y, alpha=0, intercept=FALSE, standardize=FALSE);
        betas_ridge_ <- coefficients(obj2, s='lambda.min')[2:(p+1),1];
        betas_ridge[i,j,] <- betas_ridge_;
        
        # lasso
        obj3 <- cv.glmnet(X, Y, alpha=1, intercept=FALSE, standardize=FALSE);
        betas_lasso_ <- coefficients(obj3, s='lambda.min')[2:(p+1),1];
        betas_lasso[i,j,] <- betas_lasso_;
        lasso_indxs <- which(abs(betas_lasso_)>0);

        # Knockoff Ridge
        obj4 <- cv.glmnet(cX, Y, alpha=0, intercept=FALSE, standardize=FALSE);
        betas_kridge_temp <- coefficients(obj4, s='lambda.min')[2:(2*p+1),1];
        w_ridge <- abs(betas_kridge_temp[1:p])-abs(betas_kridge_temp[(p+1):(2*p)]);
        betas_kridge[i,j,] <- Knockoff_betas(w_ridge, betas_ridge_);
        
        # Knockoff Lasso
        obj5 <- cv.glmnet(cX, Y, alpha=1, intercept=FALSE, standardize=FALSE);
        betas_klasso_temp <- coefficients(obj5, s='lambda.min')[2:(2*p+1),1];
        w_lasso <- abs(betas_klasso_temp[1:p])-abs(betas_klasso_temp[(p+1):(2*p)]);
        betas_klasso[i,j,] <- Knockoff_betas(w_lasso, betas_lasso_)
        
        if (length(lasso_indxs)<2){
            next;
        }
        
        # Adaptive Lasso I
        X_ADLasso <- X[,lasso_indxs];
        betas_adlasso_ <- betas_lasso_[lasso_indxs];
        W_ADLasso_ <- betas_adlasso_;
        X_ADLasso_temp <- sweep(X_ADLasso, 2, W_ADLasso_, '*');
        obj6 <- cv.glmnet(X_ADLasso_temp, Y, alpha=1, intercept=FALSE, standardize=FALSE);
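        # note: coefficients() on a cv.glmnet object defaults to s='lambda.1se' (not 'lambda.min' as above)
        betas_adlasso_2 <- coefficients(obj6)[2:(length(lasso_indxs)+1)] * W_ADLasso_;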
        betas_adlasso_3 <- rep(0,p);
        betas_adlasso_3[lasso_indxs] <- betas_adlasso_2;
        betas_adlasso[i,j,] <- betas_adlasso_3;
        
        # Adaptive Lasso II
        Lasso_RSS <- sum((Y- X%*%betas_lasso_)^2)
        X_ADLasso2 <- X[,lasso_indxs];
        betas_adlasso2_ <- betas_lasso_[lasso_indxs];
        sigma_lassoCV <- sqrt(Lasso_RSS/(n-length(lasso_indxs)));
        W_ADLasso_2<-abs(betas_adlasso2_)/sigma_lassoCV;
        X_ADLasso2_temp <- sweep(X_ADLasso2, 2, W_ADLasso_2, '*');
        obj7 <- glmnet(X_ADLasso2_temp, Y, intercept=FALSE, alpha=1, standardize=FALSE, lambda=sigma_lassoCV*lambda_alasso/n);
        betas_adlasso2_2 <- coefficients(obj7)[2:(length(lasso_indxs)+1)] * W_ADLasso_2;
        betas_adlasso2_3 <- rep(0,p);
        betas_adlasso2_3[lasso_indxs] <- betas_adlasso2_2;
        betas_adlasso2[i,j,] <- betas_adlasso2_3;
        
        # Adaptive SLOPE
        # small offset keeps the multipliers of unselected variables strictly positive
        W_ADSlope_ <- (abs(betas_lasso_) + 1e-6)/sigma_lassoCV;
        X_ADSlope_temp <- sweep(X, 2, W_ADSlope_, '*');
        obj8 <- SLOPE(X_ADSlope_temp, Y, q=0.2, alpha=1/n*sigma_lassoCV, lambda='bh', solver='admm', max_passes=100, scale='none');
        betas_adslope_ <- coefficients(obj8)[2:(p+1)] * W_ADSlope_;
        betas_adslope[i,j,] <- betas_adslope_;
    }
}

In [28]:
# Name the columns directly; using `<-` inside data.frame() would also create stray globals c1, c2, ...
power_df <- data.frame(
    K_Ridge = power(betas_kridge),
    K_LASSO = power(betas_klasso),
    ADLASSO = power(betas_adlasso),
    ADLASSO2 = power(betas_adlasso2),
    ADSLOPE = power(betas_adslope)
);
rownames(power_df) <- k;

fdr_df <- data.frame(
    K_Ridge = FDR(betas_kridge),
    K_LASSO = FDR(betas_klasso),
    ADLASSO = FDR(betas_adlasso),
    ADLASSO2 = FDR(betas_adlasso2),
    ADSLOPE = FDR(betas_adslope)
);
rownames(fdr_df) <- k;

betas_mse_df <- data.frame(
    OLS = mse_betas(betas_ols),
    Ridge = mse_betas(betas_ridge),
    LASSO = mse_betas(betas_lasso),
    ADLASSO = mse_betas(betas_adlasso),
    ADLASSO2 = mse_betas(betas_adlasso2),
    ADSLOPE = mse_betas(betas_adslope)
);
rownames(betas_mse_df) <- k;

mu_mse_df <- data.frame(
    OLS = mse_mu(betas_ols),
    Ridge = mse_mu(betas_ridge),
    LASSO = mse_mu(betas_lasso),
    ADLASSO = mse_mu(betas_adlasso),
    ADLASSO2 = mse_mu(betas_adlasso2),
    ADSLOPE = mse_mu(betas_adslope)
);
rownames(mu_mse_df) <- k;

In [29]:
print('Power:')
print(power_df)

[1] "Power:"
   K_Ridge K_LASSO ADLASSO ADLASSO2 ADSLOPE
5   0.2240  0.2260   0.730   0.7600  0.7840
20  0.5755  0.8155   0.891   0.9040  0.9250
50  0.2760  0.8790   0.911   0.9094  0.9412


In [30]:
print('FDR:')
print(fdr_df)

[1] "FDR:"
     K_Ridge   K_LASSO   ADLASSO  ADLASSO2   ADSLOPE
5  0.1260566 0.1480514 0.3226147 0.4082288 0.4537907
20 0.2866828 0.1925669 0.2907617 0.2716989 0.3430597
50 0.2904784 0.1850489 0.2161787 0.2008199 0.2741512


In [31]:
print('Betas MSE:')
print(betas_mse_df)

[1] "Betas MSE:"
       OLS     Ridge     LASSO   ADLASSO  ADLASSO2   ADSLOPE
5  36672.3  454.3644  243.8364  313.8162  241.2622  235.3232
20 36672.3 1358.0324  589.9292  722.0914  617.6693  569.1529
50 36672.3 3039.1407 1183.0892 1559.9269 1520.9439 1296.9548


In [32]:
print('Mu MSE:')
print(mu_mse_df)

[1] "Mu MSE:"
        OLS     Ridge    LASSO  ADLASSO ADLASSO2  ADSLOPE
5  1792.552  203.6159 116.7181 193.0652 121.1584 116.1730
20 1792.552  595.7304 277.9815 349.0570 292.7120 266.7723
50 1792.552 1293.7556 489.7624 623.3602 607.0306 523.7840


### Power:

Power drops for almost all methods, especially for k=5 (the only exception is Ridge with knockoffs at k=20). The first version of the adaptive Lasso no longer leads the ranking; the adaptive SLOPE now performs best.

### False discovery rate:

FDR increases for the second version of the adaptive Lasso, for the adaptive SLOPE, and (for k=20 and k=50) for Ridge with knockoffs, which no longer stays below 0.2. Lasso with knockoffs still stays below 0.2, and the FDR of the first version of the adaptive Lasso actually decreases. The first version of the adaptive Lasso now performs similarly to the second one (and better for k=5).

### Mean squared error:

For k=5 and k=20 the adaptive SLOPE still gives the smallest MSEs. For k=50 the standard Lasso outperforms the others.

When the variables are correlated it is much harder to discover the true ones. The second version of the adaptive Lasso and the adaptive SLOPE still perform well, but the standard Lasso may give a smaller MSE than either of them.