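The cvmatrix package implements the fast cross-validation algorithms by Engstrøm [1] for computing training set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$, with optional column-wise centering and scaling of X and Y based on training set statistics.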
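Here is a minimal sketch of the idea in its simplest case, with no centering or scaling. The training-set products for each fold can be obtained from the full-data products by subtracting the validation rows' contribution, $\mathbf{X}_\text{train}^\mathbf{T}\mathbf{X}_\text{train} = \mathbf{X}^\mathbf{T}\mathbf{X} - \mathbf{X}_\text{val}^\mathbf{T}\mathbf{X}_\text{val}$, and likewise for $\mathbf{X}^\mathbf{T}\mathbf{Y}$. This is plain NumPy written for illustration, not the cvmatrix implementation, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 100, 50, 10  # samples, features, targets
X = rng.uniform(size=(N, K))
Y = rng.uniform(size=(N, M))
cv_splits = np.arange(N) % 5  # fold index of each sample (5-fold CV)

# Full-data products, computed once and shared by all folds.
XTX = X.T @ X
XTY = X.T @ Y

training = {}
for split in np.unique(cv_splits).tolist():
    val = cv_splits == split  # boolean mask selecting the validation rows
    Xv, Yv = X[val], Y[val]
    # Training-set product = full product minus the validation rows' contribution.
    training[split] = (XTX - Xv.T @ Xv, XTY - Xv.T @ Yv)

# Sanity check for one fold against direct recomputation on its training rows.
val = cv_splits == 0
assert np.allclose(training[0][0], X[~val].T @ X[~val])
```

Each fold then only requires products over its validation rows instead of over all of its training rows.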
For an implementation of the fast cross-validation algorithms combined with Improved Kernel Partial Least Squares [2], see the Python package ikpls.
Install the package for Python 3 using the following command:
pip3 install cvmatrix
Now you can import the class implementing all the algorithms with:
from cvmatrix.cvmatrix import CVMatrix
```python
import numpy as np
from cvmatrix.cvmatrix import CVMatrix

N = 100  # Number of samples.
K = 50  # Number of features.
M = 10  # Number of targets.

X = np.random.uniform(size=(N, K))  # Random X data
Y = np.random.uniform(size=(N, M))  # Random Y data
cv_splits = np.arange(N) % 5  # 5-fold cross-validation

# Instantiate CVMatrix
cvm = CVMatrix(
    cv_splits=cv_splits,
    center_X=True,
    center_Y=True,
    scale_X=True,
    scale_Y=True,
)
# Fit on X and Y
cvm.fit(X=X, Y=Y)
# Compute training set XTX and/or XTY for each fold
for val_split in cvm.val_folds_dict.keys():
    # Get both XTX and XTY
    training_XTX, training_XTY = cvm.training_XTX_XTY(val_split)
    # Get only XTX
    training_XTX = cvm.training_XTX(val_split)
    # Get only XTY
    training_XTY = cvm.training_XTY(val_split)
```
Further usage examples can be found in examples.
In benchmarks, we compare the fast algorithms in cvmatrix against the straightforward, naive algorithms implemented in NaiveCVMatrix.
Left: Benchmarking the CVMatrix implementation against the straightforward, naive implementation (NaiveCVMatrix) using three common combinations of centering and scaling. Right: Benchmarking the CVMatrix implementation for all possible combinations of centering and scaling.
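The sketch below contrasts the two approaches when X and Y are both centered and scaled. The naive version recomputes means, standard deviations and products from each fold's training rows. The fast version reuses full-data sums and products and subtracts the validation rows' contribution. It then applies the centering identity $\sum_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\mathbf{T} = \mathbf{X}^\mathbf{T}\mathbf{X} - n\bar{\mathbf{x}}\bar{\mathbf{x}}^\mathbf{T}$ over the $n$ training rows, followed by division by the training standard deviations. The names naive_training_matrices and FastTrainingMatrices are hypothetical and are not the API of CVMatrix or NaiveCVMatrix. Scaling by the sample standard deviation (ddof=1) is an assumption, and the package's own convention may differ.

```python
import numpy as np


def naive_training_matrices(X, Y, val_mask, ddof=1):
    """Hypothetical naive baseline: recompute everything from the training rows."""
    Xt, Yt = X[~val_mask], Y[~val_mask]
    Xs = (Xt - Xt.mean(axis=0)) / Xt.std(axis=0, ddof=ddof)
    Ys = (Yt - Yt.mean(axis=0)) / Yt.std(axis=0, ddof=ddof)
    return Xs.T @ Xs, Xs.T @ Ys


class FastTrainingMatrices:
    """Hypothetical fast variant: derive fold statistics from full-data statistics."""

    def __init__(self, X, Y, ddof=1):
        self.X, self.Y, self.ddof = X, Y, ddof
        self.N = X.shape[0]
        # Full-data statistics, computed once for all folds.
        self.XTX = X.T @ X
        self.XTY = X.T @ Y
        self.sum_X = X.sum(axis=0)
        self.sum_Y = Y.sum(axis=0)
        self.sumsq_Y = (Y * Y).sum(axis=0)

    def training(self, val_mask):
        Xv, Yv = self.X[val_mask], self.Y[val_mask]
        n = self.N - Xv.shape[0]  # number of training rows
        # Uncentered training products and means: remove validation contributions.
        XTX = self.XTX - Xv.T @ Xv
        XTY = self.XTY - Xv.T @ Yv
        mu_X = (self.sum_X - Xv.sum(axis=0)) / n
        mu_Y = (self.sum_Y - Yv.sum(axis=0)) / n
        # Centering identity: sum (x - mu)(x - mu)^T = sum x x^T - n mu mu^T.
        XTX_c = XTX - n * np.outer(mu_X, mu_X)
        XTY_c = XTY - n * np.outer(mu_X, mu_Y)
        # Centered sums of squares of Y, derived the same way.
        ss_Y_c = self.sumsq_Y - (Yv * Yv).sum(axis=0) - n * mu_Y**2
        # Training standard deviations; the diagonal of XTX_c holds X's sums of squares.
        std_X = np.sqrt(np.diag(XTX_c) / (n - self.ddof))
        std_Y = np.sqrt(ss_Y_c / (n - self.ddof))
        # Scaling a column by 1/std scales the corresponding rows/columns of the products.
        return XTX_c / np.outer(std_X, std_X), XTY_c / np.outer(std_X, std_Y)


rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 50))
Y = rng.uniform(size=(100, 10))
cv_splits = np.arange(100) % 5
fast = FastTrainingMatrices(X, Y)
for split in np.unique(cv_splits):
    val_mask = cv_splits == split
    fast_XTX, fast_XTY = fast.training(val_mask)
    naive_XTX, naive_XTY = naive_training_matrices(X, Y, val_mask)
    assert np.allclose(fast_XTX, naive_XTX) and np.allclose(fast_XTY, naive_XTY)
```

Per fold, the fast variant only touches the validation rows plus K-by-K and K-by-M arrays. The naive variant re-reads and re-normalizes every training row, which is the gap the benchmarks measure.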
To contribute, please read the Contribution Guidelines.
- [1] Engstrøm, O.-C. G. (2024). Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments.
- [2] Dayal, B. S., & MacGregor, J. F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73-85.