benoit-liquet/combss

combss

R implementation of COMBSS (Continuous Optimisation for Best Subset Selection) for generalised linear models.

COMBSS reformulates the NP-hard discrete subset selection problem as a continuous optimisation over the hypercube [0, 1]^p, which is solved with a Frank-Wolfe homotopy algorithm. The inner ridge problem is solved with glmnet. The package supports linear (Gaussian), binary logistic, and multinomial logistic regression.
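
To make the discrete problem concrete, here is a minimal base-R sketch of exhaustive best subset selection on a small Gaussian example. It is not part of combss and it is not how COMBSS works: it shows the combinatorial search (choose(p, k) least-squares fits for each size k) that the continuous relaxation is meant to avoid. The helper name best_subset_exhaustive and the simulated data are illustrative assumptions.

```r
# Exhaustive best subset selection (illustrative only; not the combss API).
best_subset_exhaustive <- function(x, y, k) {
  candidates <- combn(ncol(x), k)          # one column per candidate subset
  rss <- apply(candidates, 2, function(s) {
    fit <- lm.fit(cbind(1, x[, s, drop = FALSE]), y)   # intercept + subset
    sum(fit$residuals^2)
  })
  list(subset = candidates[, which.min(rss)],
       rss = min(rss),
       n_fits = ncol(candidates))
}

set.seed(1)
n <- 100; p <- 10
beta <- c(3, 2, 1.5, rep(0, p - 3))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)

res <- best_subset_exhaustive(x, y, k = 3)
res$subset   # indices of the size-3 subset with the smallest RSS
res$n_fits   # choose(10, 3) = 120 fits; choose(30, 10) is already about 3e7
```

The number of fits grows combinatorially in p, which is why exhaustive search is only practical for small problems.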

Installation

From GitHub:

# install.packages("remotes")
remotes::install_github("benoit-liquet/combss")

Quick start

Linear regression

library(combss)
set.seed(1)
n <- 200; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)
fit <- combss(x, y, family = "gaussian", q = 10)
fit$subset_list   # selected features for k = 1, ..., 10

Alternatively, use the summary method:

summary(fit)
COMBSS fit
  family:    gaussian
  n, p:      200, 30
  q:         10
  lam_ridge: 0
  (no validation data; subset_list only)

Subset path:
  k= 1  features: 1 
  k= 2  features: 1,2 
  k= 3  features: 1,2,3 
  k= 4  features: 1,2,3,4 
  k= 5  features: 1,2,3,4,5 
  k= 6  features: 1,2,3,4,5,22 
  k= 7  features: 1,2,3,4,5,11,22 
  k= 8  features: 1,2,3,4,5,11,18,22 
  k= 9  features: 1,2,3,4,5,11,13,18,22 
  k=10  features: 1,2,3,4,5,11,13,18,19,22 

family = "linear" is accepted as an alias for "gaussian".
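
Without validation data the fit returns only the subset path, so the subset size still has to be chosen. One possible approach, sketched below in base R, is to refit each subset by ordinary least squares and compare BIC values. This is a post-processing idea, not something the package is stated to do. The sketch assumes that subset_list is a list whose k-th element holds the feature indices for size k; here the path is hard-coded from the summary output above so that the block runs without combss installed.

```r
# Rank the subsets on a COMBSS-style path by BIC (illustrative post-processing).
set.seed(1)
n <- 200; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)

# Assumption: element k holds the feature indices selected for size k,
# copied by hand from the summary output.
subset_list <- list(
  1, 1:2, 1:3, 1:4, 1:5,
  c(1:5, 22), c(1:5, 11, 22), c(1:5, 11, 18, 22),
  c(1:5, 11, 13, 18, 22), c(1:5, 11, 13, 18, 19, 22)
)

# Refit each subset with OLS and record its BIC.
bic <- sapply(subset_list, function(s) {
  BIC(lm(y ~ x[, s, drop = FALSE]))
})

best_k <- which.min(bic)
subset_list[[best_k]]   # subset with the lowest BIC along the path
```

With a real fit, the hard-coded list would be replaced by fit$subset_list.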

Binary logistic regression

ybin <- as.numeric(plogis(x %*% beta) > 0.5)
itr <- 1:140; iva <- 141:200
fit <- combss(x[itr, ], ybin[itr],
              x_val = x[iva, ], y_val = ybin[iva],
              family = "binomial", q = 15)
fit$subset      # best subset by validation accuracy
#> [1]  1  2  3  4  5  6  8 13 21 22 26
fit$accuracy    # validation accuracy at best k
#> [1] 0.9666667
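
As a rough illustration of what "validation accuracy" measures for a given subset, the base-R sketch below refits an unpenalised logistic model on the training rows and scores it on the validation rows. val_accuracy is a hypothetical helper, and the refit that combss performs internally may differ (for example, it may use a ridge penalty). The labels above are a deterministic threshold of the linear predictor, so they are perfectly separable and an unpenalised fit would diverge; the sketch therefore draws noisy labels with rbinom, which is an assumption made for illustration.

```r
# Validation accuracy of a refitted logistic model on a fixed subset
# (illustrative; val_accuracy is a hypothetical helper, not a combss function).
val_accuracy <- function(x_tr, y_tr, x_va, y_va, subset) {
  as_df <- function(m) {
    m <- m[, subset, drop = FALSE]
    colnames(m) <- paste0("v", subset)     # stable names for train and validation
    as.data.frame(m)
  }
  train <- data.frame(y = y_tr, as_df(x_tr))
  fit <- glm(y ~ ., data = train, family = binomial())
  prob <- predict(fit, newdata = as_df(x_va), type = "response")
  mean(as.numeric(prob > 0.5) == y_va)     # share of correct validation labels
}

set.seed(1)
n <- 200; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
# Assumption: noisy Bernoulli labels instead of the deterministic threshold.
ybin <- rbinom(n, 1, plogis(as.numeric(x %*% beta)))
itr <- 1:140; iva <- 141:200

acc <- val_accuracy(x[itr, ], ybin[itr], x[iva, ], ybin[iva], subset = 1:5)
acc
```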

Multinomial logistic regression

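ymulti <- max.col(x[, 1:3] + matrix(rnorm(n * 3), n, 3))  # assumed 3-class labels for illustration; ymulti is not defined in the original and the encoding combss expects may differ
fit <- combss(x, ymulti, family = "multinomial", q = 20)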

LOOCV ridge selection

cv <- combss_cv(x, y, family = "gaussian", q = 10)
cv$best_lambda
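
For intuition about leave-one-out selection of a ridge penalty, the sketch below uses the standard hat-matrix identity for ridge regression, which gives the exact LOOCV error from a single fit per penalty value. It is a generic base-R illustration and not the implementation of combss_cv; loocv_ridge is a hypothetical helper. It assumes a Gaussian model with no intercept and no standardisation, and the penalty scale used by combss_cv or glmnet may differ from the lambda used here.

```r
# Leave-one-out CV error for ridge regression via the hat-matrix identity
# (illustrative; loocv_ridge is a hypothetical helper, not combss_cv itself).
loocv_ridge <- function(x, y, lambda) {
  p <- ncol(x)
  # Hat matrix H = X (X'X + lambda I)^{-1} X'
  h <- x %*% solve(crossprod(x) + lambda * diag(p), t(x))
  resid <- y - as.numeric(h %*% y)
  # The leave-one-out residual for observation i is resid_i / (1 - h_ii).
  mean((resid / (1 - diag(h)))^2)
}

set.seed(1)
n <- 200; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)

lambdas <- 10^seq(-2, 2, length.out = 9)   # assumed penalty grid
cv_err <- sapply(lambdas, function(l) loocv_ridge(x, y, l))
best_lambda <- lambdas[which.min(cv_err)]
best_lambda
```

The identity avoids refitting the model n times for each penalty value, which keeps the grid search cheap.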

Methods

  • print(fit), summary(fit)
  • coef(fit, k) — selected feature indices for subset size k
  • predict(fit, newx, x_train, y_train, k) — refit on chosen subset and predict
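
The predict method is described as refitting on the chosen subset, which is why it takes the training data as arguments. A minimal base-R sketch of that idea for the Gaussian case follows. refit_predict is a hypothetical helper written for illustration; the package's own refit may differ, for example by applying the ridge penalty.

```r
# Refit on a chosen subset and predict (illustrative Gaussian case;
# refit_predict is a hypothetical helper, not the package's predict method).
refit_predict <- function(x_train, y_train, newx, subset) {
  design <- cbind(1, x_train[, subset, drop = FALSE])   # intercept + subset
  coefs <- lm.fit(design, y_train)$coefficients
  as.numeric(cbind(1, newx[, subset, drop = FALSE]) %*% coefs)
}

set.seed(1)
n <- 200; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)

train <- 1:150; test <- 151:200
pred <- refit_predict(x[train, ], y[train], x[test, ], subset = 1:5)
rmse <- sqrt(mean((y[test] - pred)^2))   # hold-out root mean squared error
rmse
```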

References

Authors

See also

License

GPL-3
