An R Package for Sparse PCA with Multiple Principal Components
This package can be installed directly from CRAN (pending CRAN registration):

```r
install.packages("msPCA")
```

Alternatively, it can be installed from this GitHub repository using the devtools package. You first need to install devtools:

```r
install.packages("devtools")
```

and then run the following commands:

```r
library(devtools)
install_github('jeanpauphilet/msPCA')
```

The package consists of one main function, `mspca`, which takes as input:
- a data matrix, either the correlation or the covariance matrix of the dataset (p × p for p variables),
- the number of principal components (PCs) to be computed, r,
- a vector of r integers giving the sparsity of each PC, i.e., the maximum number of nonzero loadings it may have.
It returns an object with 4 fields:
- x_best: a p × r array containing the sparse PCs,
- objective_value,
- orthogonality_violation,
- runtime.
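Before calling the solver, it can help to check that the inputs match the contract above, and afterwards to inspect the returned loadings. The sketch below is ours, not part of msPCA: `check_mspca_inputs` and `summarize_pcs` are hypothetical helpers. The quantities `summarize_pcs` reports (the variance x_j' Σ x_j captured by each PC, and the largest off-diagonal |x_i' x_j|) are our assumptions about what `objective_value` and `orthogonality_violation` measure, and may differ from the package's exact definitions. The only assumption about the output is that `x_best` can be used as a plain p × r numeric matrix.

```r
# Hypothetical helpers for illustration; they are NOT part of the msPCA package.

# Check that (Sigma, r, ks) match the input contract described above.
check_mspca_inputs <- function(Sigma, r, ks) {
  stopifnot(
    is.matrix(Sigma), nrow(Sigma) == ncol(Sigma),   # square p x p matrix
    isSymmetric(unname(Sigma)),                     # correlation/covariance is symmetric
    length(r) == 1, r >= 1, r <= nrow(Sigma),       # number of PCs
    length(ks) == r,                                # one sparsity level per PC
    all(ks == round(ks)), all(ks >= 1), all(ks <= nrow(Sigma))
  )
  invisible(TRUE)
}

# Summarize a p x r loading matrix x (e.g. the x_best field) against Sigma.
# Assumption: the columns of x are unit-norm loading vectors.
summarize_pcs <- function(x, Sigma) {
  gram <- crossprod(x)                              # r x r inner products x_i' x_j
  list(
    support     = lapply(seq_len(ncol(x)), function(j) which(x[, j] != 0)),
    variance    = diag(t(x) %*% Sigma %*% x),       # x_j' Sigma x_j for each PC
    max_overlap = if (ncol(x) > 1) max(abs(gram[upper.tri(gram)])) else 0
  )
}

# Usage on the mtcars correlation matrix with two hand-made sparse loadings.
Sigma <- cor(mtcars)
check_mspca_inputs(Sigma, 2, c(4, 4))
x <- matrix(0, nrow = ncol(Sigma), ncol = 2, dimnames = list(colnames(Sigma), NULL))
x[c("mpg", "cyl"), 1] <- 1 / sqrt(2)
x[c("qsec", "gear"), 2] <- 1 / sqrt(2)
summarize_pcs(x, Sigma)
```

Because `x` carries the variable names as row names, `support` lists the variables with nonzero loadings by name, which is usually the first thing to look at when interpreting sparse PCs.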
Here is a short example demonstrating how to use the package. First, load the library:

```r
library(msPCA)
```

Then, define the input variables:

```r
library(datasets)
df <- datasets::mtcars
TestMat <- cor(df)  # 11 x 11 correlation matrix of the mtcars variables
```

And then simply call the function:

```r
mspca(TestMat, 2, c(4,4))  # 2 PCs, each with sparsity 4
```

Here, we provide more information about the code structure and organization to help developers who would like to improve the method or build on it.
- R/
  - RcppExports.R: provides the R interface, which calls the corresponding C++ functions. Regenerate it, or edit it manually, if the interface changes. We recommend generating it automatically with `Rcpp::compileAttributes()`.
  - main.R: contains all the functions of the package. For the functions implemented in C++ via Rcpp (and exported in RcppExports.R), this script provides (i) user-friendly names and (ii) documentation. It also defines useful supporting functions.
- man/ contains the pages of the manual: one page for the package and one per function. They are generated automatically from the comments in R/main.R by the `devtools::document()` command.
- src/ contains the source files of the algorithm, in C++.
  - ConstantArguments.h: contains parameters of the algorithm that are not directly tunable by the end user.
  - msPCA_R_CPP.cpp: contains the implementation of the algorithm.
  - RcppExports.cpp: contains the generated C++ wrappers that make the functions callable from R. Regenerate it, or edit it manually, if the interface changes; it can be generated with `Rcpp::compileAttributes()`.
  - Makevars: not currently used. It can be used to set compilation options, such as the C++ standard.
  - Makevars.win: not currently used. It plays the same role as Makevars, on Windows.
- test/ contains template R notebooks:
  - notebook_mtcars.R compares the PCs generated by msPCA on the mtcars dataset with those obtained using several alternative packages (elasticnet, PMA, sparsepca).
  - notebook_plot.R provides code to plot the resulting PCs on any 2D plane.
  - notebook_synthetic.R compares the performance of msPCA and elasticnet on synthetically generated data with 2 true sparse PCs. Results are stored in the file 'msPCA_synthetic_results.csv' and plotted.
- NAMESPACE: declares the functions the package exports and what it imports; it is used when building the package. Update it if the interface changes.
- DESCRIPTION: contains the description (metadata) of this package.
- LICENSE: contains the license information.
- msPCA.Rproj: contains the settings of this R project. It is used by RStudio and usually does not need to be changed.
- The core of the algorithm lies in src/msPCA_R_CPP.cpp, which performs the computation, and src/ConstantArguments.h, which lists all internal parameters.
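The test/ notebooks evaluate msPCA on synthetic data with two true sparse PCs. The sketch below shows one common way to set up such an experiment, a spiked covariance model, together with a simple support-recovery score. It is our assumption and is not taken from notebook_synthetic.R: `make_spiked_data` and `support_recovery` are hypothetical helpers, and the spike strengths in `theta` are arbitrary.

```r
# Hypothetical sketch, not taken from notebook_synthetic.R: a spiked covariance
# model Sigma = I + theta_1 v1 v1' + theta_2 v2 v2' with two sparse, orthogonal PCs.
make_spiked_data <- function(n, p, k, theta = c(4, 2), seed = 1) {
  stopifnot(length(theta) == 2, 2 * k <= p)
  set.seed(seed)                            # note: resets the global RNG state
  v <- matrix(0, nrow = p, ncol = 2)
  v[1:k, 1] <- 1 / sqrt(k)                  # PC 1 supported on variables 1..k
  v[(k + 1):(2 * k), 2] <- 1 / sqrt(k)      # PC 2 supported on variables k+1..2k
  Sigma <- diag(p) + v %*% diag(theta) %*% t(v)
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X <- Z %*% chol(Sigma)                    # chol() gives R with R'R = Sigma, so rows ~ N(0, Sigma)
  list(X = X, v = v, Sigma = Sigma)
}

# Fraction of each true PC's support that is also nonzero in the matching estimated PC.
support_recovery <- function(x_hat, v) {
  sapply(seq_len(ncol(v)), function(j) {
    best <- which.max(abs(crossprod(x_hat, v[, j])))   # estimated PC most aligned with v_j
    mean(which(v[, j] != 0) %in% which(x_hat[, best] != 0))
  })
}

dat <- make_spiked_data(n = 200, p = 20, k = 5)
S <- cor(dat$X)     # input matrix; with msPCA installed, e.g. mspca(S, 2, c(5, 5))
support_recovery(dat$v, dat$v)   # sanity check: returns c(1, 1)
```

Choosing disjoint supports keeps the two true PCs exactly orthogonal, so a perfect estimate would also have zero orthogonality violation; overlapping supports would make the benchmark harder to interpret.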