EigenPro

EigenPro [1-3] is a GPU-enabled fast and scalable solver for training kernel machines. It applies a projected stochastic gradient method with dual preconditioning to enable major speed-ups. It is currently based on a PyTorch backend.

Highlights

Fast: EigenPro is the fastest kernel method at large scale.
Plug-and-play: Our method learns a quality model with little hyper-parameter tuning in most cases.
Scalable: The training time of one epoch is nearly linear in both model size and data size. This is the first kernel method that achieves such scalability without any compromise on testing performance.

Coming Soon

Multi-GPU and model parallelism: Support for training across multiple GPUs with model parallelism is in progress.

Usage

Installation

pip install git+ssh://git@github.com/EigenPro/EigenPro.git@main
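
Since EigenPro relies on a PyTorch backend and is meant to run on a GPU, it can help to confirm that PyTorch is importable and can see a CUDA device. The snippet below is a minimal sketch of such a check; `check_environment` is a hypothetical helper written for this illustration and is not part of EigenPro.

```python
import importlib.util


def check_environment():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Import lazily so the check itself never fails when PyTorch is missing.
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


print(check_environment())
```

If `cuda_available` is reported as false, PyTorch would fall back to the CPU, which is likely to be far slower at the scales discussed in this README.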

Run Example

Linux:

bash examples/run_fmnist.sh

Windows:

examples\run_fmnist.bat

Jupyter Notebook: examples/notebook.ipynb

See files under examples/ for more details.

Empirical Results

In the experiments described below, P denotes the number of centers (the model size) and d denotes the ambient dimension. All experiments use a Laplacian kernel with a bandwidth of 20.0.
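
To make the notation concrete, here is a dependency-free sketch of a kernel machine with P centers in d dimensions. It assumes the Laplacian kernel takes the form exp(-||x - z|| / bandwidth) with the Euclidean norm, which is one common convention and may differ from the exact definition used in the experiments. The names `laplacian_kernel` and `predict` and the toy numbers are illustrative and not taken from the EigenPro code base.

```python
import math


def laplacian_kernel(x, z, bandwidth=20.0):
    """Assumed form: exp(-||x - z||_2 / bandwidth) for two d-dimensional points."""
    return math.exp(-math.dist(x, z) / bandwidth)


def predict(x, centers, weights, bandwidth=20.0):
    """Evaluate f(x) = sum_i weights[i] * K(x, centers[i]) over the P centers."""
    return sum(
        w * laplacian_kernel(x, c, bandwidth) for w, c in zip(weights, centers)
    )


# Toy model with P = 3 centers in d = 2 dimensions (made-up numbers).
centers = [[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]]
weights = [1.0, -0.5, 0.25]
value = predict([0.0, 0.0], centers, weights)
print(value)
```

Evaluating one point costs P kernel evaluations of O(d) work each, so the cost of both prediction and a training pass grows with the model size P. A practical implementation would batch these evaluations as tensor operations on the GPU rather than loop in Python.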

1. CIFAR5M Extracted Features on single GPU

We used extracted features from the pretrained 'mobilenet-2' network available in the timm library. The benchmarks processed the full 5 million samples of CIFAR5M with d = 1280 for one epoch for two versions of EigenPro and FALKON [4-6]. All of these experiments were run on a single A100 GPU. The maximum RAM we had access to was 1.2TB, which was not sufficient for FALKON with 1M centers.

2. LibriSpeech Extracted Features on single GPU

We used 10 million samples with d = 1024 for one epoch for two versions of EigenPro and FALKON. All of these experiments were run on a single V100 GPU. The maximum RAM available for this experiment was 300GB, which was not sufficient for FALKON with more than 128K centers. The features are extracted using an acoustic model (a VGG+BLSTM architecture in [7]) to align the length of audio and text.
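
The memory limits reported in both experiments can be put in perspective with a rough estimate. The sketch below assumes that a solver keeps a dense P x P matrix over the centers in memory, and that "K" and "M" mean thousands and millions; both are assumptions made for illustration, not statements from the experiments. The helper `dense_matrix_bytes` is hypothetical.

```python
def dense_matrix_bytes(num_centers, bytes_per_value=4):
    """Bytes needed for a dense num_centers x num_centers matrix.

    bytes_per_value is 4 for float32 and 8 for float64.
    """
    return num_centers ** 2 * bytes_per_value


GB = 10 ** 9  # decimal gigabyte

for num_centers in (128_000, 256_000, 1_000_000):
    fp32 = dense_matrix_bytes(num_centers, 4) / GB
    fp64 = dense_matrix_bytes(num_centers, 8) / GB
    print(f"P = {num_centers:>9,}: {fp32:8.1f} GB (float32), {fp64:8.1f} GB (float64)")
```

Under these assumptions, a single dense matrix for 1M centers takes about 4 TB in float32, which is well above the 1.2 TB available in the first experiment. Actual memory use depends on the solver's implementation, precision, and workspace buffers, so this is an order-of-magnitude guide only.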

References

[1] Amirhesam Abedsoltan, Mikhail Belkin, Parthe Pandit, "Toward Large Kernel Models," Proceedings of the 40th International Conference on Machine Learning (ICML'23), JMLR.org, 2023.
[2] Siyuan Ma, Mikhail Belkin, "Kernel machines that adapt to GPUs for effective large batch training," Proceedings of the 2nd SysML Conference, 2019.
[3] Siyuan Ma, Mikhail Belkin, "Diving into the shallows: a computational perspective on large-scale shallow learning," Advances in Neural Information Processing Systems 30 (NeurIPS 2017).
[4] Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi, "Kernel methods through the roof: handling billions of points efficiently," Advances in Neural Information Processing Systems, 2020.
[5] Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco, "FALKON: An optimal large scale kernel method," Advances in Neural Information Processing Systems, 2017.
[6] Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi, "Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses," Advances in Neural Information Processing Systems, 2019.
[7] L. Hui, M. Belkin, "Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks," International Conference on Learning Representations, 2021.
