Heterogeneous CPU+GPU for SGD

The most common practice is to train deep learning models on GPUs or TPUs. However, this strategy does not make effective use of the extensive CPU and memory resources on the server. We introduce a generic deep learning framework for heterogeneous CPU+GPU architectures that aims to maximize convergence rate and resource utilization simultaneously. We design two heterogeneous asynchronous stochastic gradient descent (SGD) algorithms. The first, CPU+GPU Hogbatch, combines small batches on CPU with large batches on GPU in order to maximize the utilization of both resources. The second, Adaptive Hogbatch, assigns batches whose size evolves continuously based on the relative speed of CPU and GPU. See our arXiv paper for more details.
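To make the idea of speed-dependent batch sizes concrete, here is a minimal sketch of one possible sizing rule. It is not the rule used in this repository or in the paper. It assumes that each worker's throughput is measured in examples per second and that every batch should take roughly the same wall-clock time, so that a faster device receives a proportionally larger batch. The names `BatchLimits`, `next_batch_size`, and `target_seconds` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical bounds on the batch size a worker may be assigned.
struct BatchLimits {
    std::size_t min_size;
    std::size_t max_size;
};

// Hypothetical rule (an assumption, not the repository's algorithm):
// choose the next batch size so that a worker processing
// `examples_per_sec` examples per second needs about `target_seconds`
// for one batch, then clamp the result to the allowed range.
std::size_t next_batch_size(double examples_per_sec,
                            double target_seconds,
                            BatchLimits limits) {
    assert(examples_per_sec > 0.0 && target_seconds > 0.0);
    assert(limits.min_size >= 1 && limits.min_size <= limits.max_size);

    const double ideal = examples_per_sec * target_seconds;
    // Clamp while still in floating point so the conversion cannot overflow.
    const double clamped =
        std::min(std::max(ideal, static_cast<double>(limits.min_size)),
                 static_cast<double>(limits.max_size));
    return static_cast<std::size_t>(clamped);
}
```

Under this rule, a slow worker is pushed toward the lower bound and a fast worker toward the upper bound, and the sizes change whenever the measured throughput changes. A scheduler could call `next_batch_size` each time a worker reports a finished batch.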

Experimental Evaluation

The heterogeneous Hogbatch algorithms outperform the CPU-only and GPU-only solutions in time to convergence by large margins. This also holds against TensorFlow, which is a GPU-only variant.

Hogwild CPU has the best statistical efficiency. Nonetheless, the Adaptive CPU+GPU algorithm achieves similar performance on all the datasets.

The heterogeneous algorithms provide consistent performance across two computing architectures that differ in the number and type of GPUs. The batch size threshold controls the difference between CPU+GPU and Adaptive in both the number of model updates and utilization, which have a direct impact on the convergence of the loss function.
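The link between batch size and the number of model updates can be illustrated with a small calculation. The sketch below assumes standard mini-batch SGD, where each batch produces exactly one model update. The function name `updates_per_epoch` and the numbers in the note that follows are illustrative, not taken from the experiments.

```cpp
#include <cassert>
#include <cstddef>

// Number of model updates in one pass over the data, assuming one
// update per batch and a final partial batch when the data size is
// not a multiple of the batch size (ceiling division).
std::size_t updates_per_epoch(std::size_t num_examples,
                              std::size_t batch_size) {
    assert(batch_size > 0);
    return (num_examples + batch_size - 1) / batch_size;
}
```

For a hypothetical dataset of 100,000 examples, a batch size of 64 yields 1563 updates per epoch, while a batch size of 8192 yields only 13. Small batches therefore contribute many more updates per pass over the data than large ones.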

With few exceptions, for low-dimensional datasets, CPU+GPU is superior, while Adaptive is better for sparse high-dimensional data.

Implementation

C/C++ using the pthreads library
OpenMP 3.7.0-3, Intel MKL 2.187
CUDA 10.0, cuBLAS 10.2.1.243-1
TensorFlow 1.13.1
The threads communicate using our custom asynchronous message queue.
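For readers unfamiliar with this pattern, the following is a minimal sketch of a thread-safe message queue. It is not the custom queue used in this repository. It uses the C++ standard library's `std::mutex` and `std::condition_variable` rather than raw pthreads calls, and the `WorkerMessage` fields are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical message: which worker finished a batch and how many
// examples it processed.
struct WorkerMessage {
    int worker_id;
    std::size_t examples_done;
};

// A mutex-guarded FIFO queue. Producers never wait for a consumer;
// consumers may either block in pop() or poll with try_pop().
class MessageQueue {
public:
    // Enqueue a message and wake one waiting consumer, if any.
    void push(WorkerMessage msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(msg);
        }
        ready_.notify_one();
    }

    // Block until a message is available, then return it.
    WorkerMessage pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        WorkerMessage msg = items_.front();
        items_.pop_front();
        return msg;
    }

    // Return false immediately if the queue is empty; otherwise
    // store the oldest message in `out` and return true.
    bool try_pop(WorkerMessage& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<WorkerMessage> items_;
};
```

In a design like this, worker threads would call `push` when they finish a batch, and a scheduler thread would call `pop` to learn which worker is free. Because `push` returns as soon as the message is enqueued, workers are never stalled waiting for the scheduler.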

Datasets

The datasets can be downloaded from link for covtype, w8a and real-sim and link for delicious.

YMA33/CPU-GPU-SGD