
parEBEN - Parallel Implementations of the Empirical Bayesian Elastic Net Cross-Validation in R

Colby T. Ford, Ph.D.



The Empirical Bayesian Elastic Net (EBEN) algorithm was developed by Huang et al. to handle multicollinearity in generalized linear regression models. It has historically been used to analyze quantitative trait loci (QTLs) and gene-gene interactions (epistasis). Alongside the algorithm, the group released the EBEN package for R, which includes functions to fit elastic net models under both binomial and Gaussian priors. These fitting functions are efficient and need little computation time. The package also includes functions to cross-validate those models, and this step, while essential, is considerably more expensive. The cross-validation functions sweep over hyperparameters to minimize prediction error: an n-fold cross-validation is run for each combination of two parameters (α and λ), stepping through a grid of values. In experiments, this sweep has been shown to take a long time, especially on larger datasets such as those found in genomics problems.
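The sweep described above can be sketched in a few lines of R. This is a minimal, sequential illustration, not the EBEN implementation: `cv_sweep()` and `fit_score()` are hypothetical names, `fit_score()` stands in for fitting an elastic net on the training folds and returning its error on the held-out fold, and the toy error surface exists only so the example runs. It shows why run time grows with the product of the grid size and the number of folds.

```r
# Hypothetical sketch of an n-fold cross-validation sweep over an (alpha, lambda) grid.
# fit_score(train, test, alpha, lambda) is a stand-in for "fit on the training rows,
# return prediction error on the test rows"; it is NOT a function from EBEN or parEBEN.
cv_sweep <- function(n_obs, n_folds, alphas, lambdas, fit_score, seed = 1) {
  set.seed(seed)
  # Randomly assign each observation to one of n_folds folds
  fold_id <- sample(rep(seq_len(n_folds), length.out = n_obs))
  grid <- expand.grid(alpha = alphas, lambda = lambdas)
  grid$error <- NA_real_
  for (g in seq_len(nrow(grid))) {
    # One model fit per fold for every (alpha, lambda) pair
    fold_err <- vapply(seq_len(n_folds), function(f) {
      fit_score(train = which(fold_id != f), test = which(fold_id == f),
                alpha = grid$alpha[g], lambda = grid$lambda[g])
    }, numeric(1))
    grid$error[g] <- mean(fold_err)
  }
  best <- which.min(grid$error)
  list(grid = grid,
       alpha.optimal = grid$alpha[best],
       lambda.optimal = grid$lambda[best],
       n_fits = nrow(grid) * n_folds)
}

# Toy usage: an artificial error surface with its minimum at alpha = 0.5, lambda = 0.01
toy_score <- function(train, test, alpha, lambda) (alpha - 0.5)^2 + (log10(lambda) + 2)^2
res <- cv_sweep(n_obs = 50, n_folds = 3,
                alphas = seq(0.1, 0.9, by = 0.1), lambdas = 10^(-4:0),
                fit_score = toy_score)
res$n_fits  # 9 alphas x 5 lambdas x 3 folds = 135 fits
```

Even this small grid needs 135 model fits; a finer grid or more folds multiplies that count directly.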

CV Bottleneck

To address this bottleneck, the cross-validation functions were parallelized using R's parallel computing packages. By distributing the cross-validation iterations over multiple CPU cores, or over multiple machines in a computing cluster, run time can be reduced drastically with no negative effect on the resulting EBEN models. With shorter run times, regression models on larger, more complex data can be completed without long delays, so larger datasets can be analyzed instead of research being limited by time and computing resources. Parallelizing the cross-validation of EBEN models should therefore be of considerable benefit to future research that uses cross-validated Bayesian elastic nets.
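The sketch below illustrates why parallelization leaves the models unchanged. Every grid point is scored independently, so the same computation can be handed to `lapply()` (sequential) or to a `parLapply()` wrapper over a local cluster from R's built-in `parallel` package, and it produces identical errors. The helper names, the toy error surface, and the choice to split the work by grid point are assumptions for illustration; parEBEN itself uses foreach backends and may divide the work differently.

```r
library(parallel)

# Hypothetical helper: returns a function that scores one grid point (row g)
# by averaging the held-out error over all folds. Only these three values are
# captured, so the closure stays small when it is shipped to worker processes.
make_point_scorer <- function(fold_id, grid, fit_score) {
  force(fold_id); force(grid); force(fit_score)
  function(g) {
    folds <- sort(unique(fold_id))
    mean(vapply(folds, function(f) {
      fit_score(train = which(fold_id != f), test = which(fold_id == f),
                alpha = grid$alpha[g], lambda = grid$lambda[g])
    }, numeric(1)))
  }
}

# Same sweep for any lapply-style mapper: base lapply runs it sequentially,
# a parLapply wrapper spreads the grid points over a cluster.
cv_grid_errors <- function(mapper, n_obs, n_folds, alphas, lambdas, fit_score, seed = 1) {
  set.seed(seed)
  fold_id <- sample(rep(seq_len(n_folds), length.out = n_obs))  # folds drawn once, on the master
  grid <- expand.grid(alpha = alphas, lambda = lambdas)
  scorer <- make_point_scorer(fold_id, grid, fit_score)
  grid$error <- unlist(mapper(seq_len(nrow(grid)), scorer))
  grid
}

toy_score <- function(train, test, alpha, lambda) (alpha - 0.5)^2 + (log10(lambda) + 2)^2
alphas <- seq(0.1, 0.9, by = 0.1)
lambdas <- 10^(-4:0)

cl <- makeCluster(2)  # two local worker processes
seq_grid <- cv_grid_errors(lapply, 50, 3, alphas, lambdas, toy_score)
par_grid <- cv_grid_errors(function(X, FUN) parLapply(cl, X, FUN),
                           50, 3, alphas, lambdas, toy_score)
stopCluster(cl)

identical(seq_grid, par_grid)  # TRUE: parallelism changes run time, not results
```

Drawing the fold assignments once on the master process is what keeps the two runs comparable; if each worker drew its own folds, the errors would no longer match.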

Time Reduction Benchmark

To interactively view cross-validation time benchmarks between parEBEN and the original EBEN package, click here


You can install the latest stable version from GitHub using the following command:



First, select the parallelization method you wish to use. Currently, all foreach backends are supported, such as doParallel, doMPI, and doSNOW.
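Because the supported methods are foreach backends, the same `%dopar%` loop runs on whichever backend has been registered. The minimal sketch below assumes only the foreach package and registers its built-in sequential backend with `registerDoSEQ()` so it runs anywhere; the grid and `toy_score()` are hypothetical stand-ins for the real cross-validation work.

```r
library(foreach)

# Hypothetical sketch: a loop written once with %dopar% runs on whatever backend
# is currently registered (doParallel, doMPI, doSNOW, ...). The built-in
# sequential backend is registered here so the example runs without a cluster.
registerDoSEQ()

grid <- expand.grid(alpha = seq(0.1, 0.9, by = 0.1), lambda = 10^(-4:0))
toy_score <- function(alpha, lambda) (alpha - 0.5)^2 + (log10(lambda) + 2)^2

# Each iteration scores one (alpha, lambda) pair; .combine = c collects a numeric vector
errors <- foreach(g = seq_len(nrow(grid)), .combine = c) %dopar% {
  toy_score(grid$alpha[g], grid$lambda[g])
}

getDoParName()             # "doSEQ"
grid[which.min(errors), ]  # the best (alpha, lambda) pair on this toy surface
```

Replacing `registerDoSEQ()` with the cluster initialization for your chosen backend changes where the iterations execute without changing the loop itself.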

Initialize the Cluster

Note: Refer to the manual for your desired foreach parallelization package as the initialization may differ between methods.

Local Parallel
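library(doParallel)  # also attaches foreach and parallel
no_cores <- detectCores()
cl <- makeCluster(no_cores)
registerDoParallel(cl)
#clusterExport(cl, c("CrossValidate"))
# When cross-validation has finished, release the workers with stopCluster(cl)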
Cluster Distribution
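# create and register a doMPI cluster if necessary
library(doMPI)
if (!identical(getDoParName(), 'doMPI')) {
  # set count to (cores_requested-1)
  cl <- startMPIcluster(count=255,verbose=TRUE)
  registerDoMPI(cl)
}
# When cross-validation has finished: closeCluster(cl); mpi.quit()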
Microsoft Machine Learning Server Distribution
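## Set your compute context as Spark, local parallel, MapReduce, etc.
### See:
### Sample Code:
# (ClusterInfo is a placeholder for your cluster's connection settings)
mySparkCluster <- RxSpark(ClusterInfo)
rxSetComputeContext(mySparkCluster)

## Register the context using doRSR
library(doRSR)
registerDoRSR()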

Begin the Cross-Validation

## Load in data and required EBEN and parEBEN packages
library(EBEN)
library(parEBEN)
## ...load your predictor matrix (BASIS) and response vector (y) here...

## Create small sample matrix for testing
n <- 50
k <- 100
BASIS <- BASIS[1:n, 1:k]
y <- y[1:n]

parEBENcv <- CrossValidate(BASIS,
                           y,
                           nFolds = 3,
                           Epis = "no",
                           prior = "gaussian",
                           search = "global")

## Use the optimal values in the EBEN model
EBENoutput <- EBelasticNet.Gaussian(BASIS,
                                    y,
                                    lambda = parEBENcv$lambda.optimal,
                                    alpha = parEBENcv$alpha.optimal,
                                    Epis = "no",
                                    verbose = 1)

To Do List

  • Binomial prior cross-validation script with doParallel.
  • Gaussian prior cross-validation script with doParallel.
  • Binomial prior cross-validation script with doMPI.
  • Gaussian prior cross-validation script with doMPI.
  • Binomial prior cross-validation script with Microsoft ML Server (RevoScaleR/doRSR).
  • Gaussian prior cross-validation script with Microsoft ML Server (RevoScaleR/doRSR).
  • Binomial prior cross-validation script with SparkR.
  • Gaussian prior cross-validation script with SparkR.
  • Binomial prior cross-validation script with CUDA.
  • Gaussian prior cross-validation script with CUDA.
  • Manual File/Usage Instructions.

Publication and How To Cite


Data and materials used in publication can be found here.

Jia Wen, Colby T Ford, Daniel Janies, Xinghua Shi, A Parallelized Strategy for Epistasis Analysis Based on Empirical Bayesian Elastic Net Models, Bioinformatics, 2020, btaa216, https://doi.org/10.1093/bioinformatics/btaa216

or using BibTeX...

@article{10.1093/bioinformatics/btaa216,
    author = {Wen, Jia and Ford, Colby T and Janies, Daniel and Shi, Xinghua},
    title = "{A Parallelized Strategy for Epistasis Analysis Based on Empirical Bayesian Elastic Net Models}",
    journal = {Bioinformatics},
    year = {2020},
    month = {03},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa216},
    url = {https://doi.org/10.1093/bioinformatics/btaa216},
    note = {btaa216},
}


This project is licensed under the Apache 2.0 License. See the LICENSE file for details.


This project was funded in part by NIH R15HG009565.