# SLOPE for Count Data

*Authors: Zach Lau, Joey Hotz and Javier Martinez-Rodriguez*

## Introduction

This project evaluates Sorted L-One Penalized Estimation (SLOPE) as a variable selection method for high-dimensional count data modelled with Poisson regression. SLOPE extends the LASSO penalty by replacing the single regularization parameter with a non-increasing sequence $\lambda_1 \ge \dots \ge \lambda_p \ge 0$, which gives the penalty $\sum_{i=1}^{p}\lambda_{i}|\hat{\beta}|_{(i)}$, where $|\hat{\beta}|_{(1)} \ge \dots \ge |\hat{\beta}|_{(p)}$ are the absolute values of the coefficients sorted in descending order, so the largest coefficient receives the largest penalty. SLOPE is inspired by the Benjamini-Hochberg procedure and has theoretical False Discovery Rate (FDR) control guarantees in Gaussian linear models, but its behaviour in non-Gaussian settings such as Poisson regression is less explored. This project therefore uses simulations to compare the variable selection accuracy (measured by FDR and power) of SLOPE against the standard and adaptive LASSO across scenarios that vary in predictor dimensionality ($p/n$ ratio), inter-predictor correlation ($\rho$), sparsity ($k$), and signal strength. SLOPE is run with a target FDR of $q=0.1$, and the LASSO variants are tuned by cross-validation.
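
To make the penalty concrete, the following is a minimal sketch in base R of the sorted-L1 penalty together with the Benjamini-Hochberg-type sequence $\lambda_i = \Phi^{-1}(1 - iq/(2p))$ described by Bogdan et al. (2015). The function names `sorted_l1_penalty` and `bh_lambda` and the toy coefficient vector are illustrative assumptions, not part of the project code or of the `SLOPE` package.

```r
# Sorted-L1 (SLOPE) penalty: sum_i lambda_i * |beta|_(i),
# where |beta|_(1) >= ... >= |beta|_(p) are the sorted absolute coefficients.
# (Hypothetical helper, written for illustration only.)
sorted_l1_penalty <- function(beta, lambda) {
  stopifnot(length(beta) == length(lambda),
            !is.unsorted(rev(lambda)),  # lambda must be non-increasing
            all(lambda >= 0))
  sum(lambda * sort(abs(beta), decreasing = TRUE))
}

# BH-type sequence: lambda_i = qnorm(1 - i * q / (2 * p)), i = 1, ..., p
# (Hypothetical helper, written for illustration only.)
bh_lambda <- function(p, q) {
  qnorm(1 - seq_len(p) * q / (2 * p))
}

beta_example   <- c(0.5, -2, 0, 1)  # toy coefficients, not simulation output
lambda_example <- bh_lambda(p = length(beta_example), q = 0.1)

# SLOPE penalty with the BH-type sequence
sorted_l1_penalty(beta_example, lambda_example)

# With a constant sequence the penalty reduces to the LASSO penalty lambda * sum(|beta|)
sorted_l1_penalty(beta_example, rep(1, 4))
```

Because the sequence is non-increasing, the largest coefficient in absolute value is matched with the largest $\lambda_i$; a constant sequence recovers the ordinary LASSO penalty.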

To discuss the details of the SLOPE penalty and the simulation experiments used to evaluate its performance, this document is divided into three parts. First, it gives a general explanation of SLOPE applied to Gaussian linear models and of the effect of different procedures for choosing the $\lambda$ sequence, as discussed by Bogdan et al. (2015). In particular, it discusses the FDR control associated with the Benjamini-Hochberg procedure. Second, it addresses the extension of SLOPE to Generalized Linear Models (GLMs), particularly for count data. Third, it presents the simulation settings, the competing penalizations, and the results.

## Explanation of SLOPE

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

## Explanation of SLOPE for Count Data

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

## Experiments

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

### Computing Penalizations

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

### Experimental Settings

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

In [2]:
# Setup ----
## Packages to use ----
if (!require("pacman")) install.packages("pacman")
# if (!require("mytidyfunctions")) remotes::install_github("JavierMtzRdz/mytidyfunctions")

pacman::p_load(tidyverse, janitor,
               SLOPE, glmnet, MASS,
               extrafont,  # needed for extrafont::loadfonts() below
               # mytidyfunctions,
               patchwork, here)

## Load fonts ----
extrafont::loadfonts(quiet = TRUE)

## Set theme ------
# mytidyfunctions::set_mytheme(text = element_text(family = "Times New Roman"))


In [None]:
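# Generative models
## Simulation Parameters
n <- 1000                       # Number of observations
p_values <- c(500, 1000, 2000)  # Number of predictors (p/n = 0.5, 1, 2)
rho_values <- c(0, 0.5, 0.8)    # Pairwise correlation between predictors
k_values <- c(10, 20, 50, 100)  # Number of non-zero betas
signal_strengths <- list(       # Range of |beta_j| for the non-zero betas
  weak = list(beta_min = 0.1, beta_max = 0.5),
  strong = list(beta_min = 0.5, beta_max = 1.5)
)
R <- 50                         # Number of replications per scenario
q_fdr <- 0.1                    # Target FDR level q for SLOPE
adapt_lasso_gamma <- 1          # Exponent gamma for the adaptive LASSO weights
beta0 <- 0.5                    # Intercept of the Poisson model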

set.seed(538)

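## Generative model
# Simulates equicorrelated Gaussian predictors and a Poisson response
# (log link) in which k of the p coefficients are non-zero.
generate_data <- function(n, p, rho, k, signal_info, beta0) {

  # Sparse coefficients: k random positions, random signs, and
  # magnitudes drawn uniformly from [beta_min, beta_max]
  beta_true <- numeric(p)

  if (k > 0) {
    non_zero_indices <- sample.int(p, k)
    magnitudes <- runif(k, min = signal_info$beta_min, max = signal_info$beta_max)
    signs <- sample(c(-1, 1), k, replace = TRUE)
    beta_true[non_zero_indices] <- magnitudes * signs
  }
  true_support <- which(beta_true != 0)

  # Generate X with correlation rho between every pair of predictors
  Sigma <- matrix(rho, nrow = p, ncol = p)
  diag(Sigma) <- 1
  X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma)
  X <- scale(X)  # Centre and scale each column

  # Count response: y_i ~ Poisson(exp(beta0 + x_i' beta_true))
  lambda <- as.numeric(exp(beta0 + X %*% beta_true))
  # lambda <- pmin(lambda, some_large_value)
  y <- rpois(n, lambda)

  return(list(X = X, y = y, beta_true = beta_true, true_support = true_support, beta0 = beta0))
}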
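
Since variable selection accuracy is measured by FDR and power, the sketch below shows one way to compute the per-replication false discovery proportion and true positive proportion from a set of selected indices and the `true_support` returned by `generate_data`. The helper `selection_metrics` and the toy index vectors are assumptions made for illustration; averaging these quantities over the `R` replications would give empirical estimates of FDR and power.

```r
# Per-replication selection metrics (hypothetical helper, for illustration only).
# selected:     indices of predictors chosen by a method (assumed unique)
# true_support: indices of the truly non-zero coefficients
selection_metrics <- function(selected, true_support) {
  n_selected <- length(selected)
  n_true_pos <- length(intersect(selected, true_support))

  # False discovery proportion: false selections / selections (0 if nothing is selected)
  fdp <- (n_selected - n_true_pos) / max(n_selected, 1)

  # True positive proportion: true selections / true signals (undefined when k = 0)
  tpp <- if (length(true_support) > 0) n_true_pos / length(true_support) else NA_real_

  c(fdp = fdp, tpp = tpp)
}

# Toy example: 4 selections, 3 of them among the 5 true signals
metrics_example <- selection_metrics(selected = c(1, 2, 3, 7),
                                     true_support = c(1, 2, 3, 4, 5))
metrics_example
```

Defining the false discovery proportion as 0 when no variable is selected follows the usual convention, so an empty model does not count as a false discovery.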

### Results

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

## References

Bogdan, Małgorzata, Ewout van den Berg, Chiara Sabatti, Weijie Su, and Emmanuel J. Candès. 2015. “Slope—Adaptive Variable Selection Via Convex Optimization.” The Annals of Applied Statistics 9 (3): 1103–40.
