PCNN: Parallel Convolutional Neural Network

PCNN is an open-source C/C++ software implementation of Convolutional Neural Networks (CNNs). It can be used to train and deploy deep CNN models for general classification and regression problems. PCNN adopts a data-parallelism strategy and is designed to run on CPU-based distributed-memory parallel computers, using MPI for inter-process communication. It has been evaluated on Cori, the Cray XC40 supercomputer at NERSC.
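
In data-parallel training, each process computes gradients on its own portion of the mini-batch and the gradients are then averaged across all processes. The snippet below is a minimal sketch of that averaging step with a single MPI reduction; it is illustrative only, and the names (`average_gradients`, `local_grads`, `num_params`) are hypothetical, not PCNN's API.

```c
/* Minimal sketch of data-parallel gradient averaging with MPI.
 * Illustrative only; names are hypothetical, not PCNN's API. */
#include <mpi.h>

void average_gradients(float *local_grads, int num_params, MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    /* Sum the local gradients of all processes in place. */
    MPI_Allreduce(MPI_IN_PLACE, local_grads, num_params,
                  MPI_FLOAT, MPI_SUM, comm);

    /* Divide by the number of processes to obtain the average. */
    for (int i = 0; i < num_params; i++)
        local_grads[i] /= (float)nprocs;
}
```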

This work aims to provide good scalability for parallel CNN training while achieving the same accuracy as sequential training. PCNN exploits the overlap of computation and communication to improve scalability. To maximize the degree of overlap, the gradients are averaged across all processes using the communication-efficient gradient averaging algorithm proposed in [2].
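
The sketch below illustrates the general idea of overlapping gradient communication with back-propagation using non-blocking MPI collectives: the allreduce for a layer's gradients is started as soon as they are computed and completed later, so it proceeds while earlier layers are still being processed. This is a sketch of the general technique only, not PCNN's implementation of the algorithm in [2]; `layer_t`, `backward_pass`, and `backprop_with_overlap` are hypothetical names.

```c
/* Sketch of overlapping gradient communication with back-propagation
 * using non-blocking MPI. All names are hypothetical placeholders. */
#include <mpi.h>
#include <stdlib.h>

typedef struct {
    float *grads;      /* gradient buffer for this layer    */
    int    num_params; /* number of parameters in the layer */
} layer_t;

/* Placeholder for the per-layer backward computation. */
void backward_pass(layer_t *layer) { (void)layer; }

void backprop_with_overlap(layer_t *layers, int num_layers, MPI_Comm comm)
{
    MPI_Request *reqs = malloc(num_layers * sizeof(MPI_Request));

    /* Walk the layers from output to input. */
    for (int l = num_layers - 1; l >= 0; l--) {
        backward_pass(&layers[l]);   /* compute gradients for layer l */

        /* Start averaging layer l's gradients without blocking, so the
         * communication overlaps with the backward pass of layer l-1.
         * (Division by the number of processes to form the average is
         * omitted here for brevity.) */
        MPI_Iallreduce(MPI_IN_PLACE, layers[l].grads, layers[l].num_params,
                       MPI_FLOAT, MPI_SUM, comm, &reqs[l]);
    }

    /* Complete all outstanding gradient communications. */
    MPI_Waitall(num_layers, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}
```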

Source Tree Structure

  • ./src: This folder contains the source code. To build an executable, refer to src/README.md.
  • Use cases

Questions/Comments

Publications

  1. Sunwoo Lee, Qiao Kang, Reda Al-Bahrani, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao. Improving Scalability of Parallel CNN Training by Adaptively Adjusting Parameter Update Frequency. Journal of Parallel and Distributed Computing (JPDC), 159:10–23, January 2022.
  2. Sunwoo Lee, Qiao Kang, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao. Communication-Efficient Local Stochastic Gradient Descent for Scalable Deep Learning. In IEEE International Conference on Big Data, December 2020.
  3. Sunwoo Lee, Qiao Kang, Sandeep Madireddy, Prasanna Balaprakash, Ankit Agrawal, Alok Choudhary, Richard Archibald, and Wei-keng Liao. Improving Scalability of Parallel CNN Training by Adjusting Mini-Batch Size at Run-Time. In IEEE International Conference on Big Data, December 2019.
  4. Sunwoo Lee, Ankit Agrawal, Prasanna Balaprakash, Alok Choudhary, and Wei-keng Liao. Communication-Efficient Parallelization Strategy for Deep Convolutional Neural Network Training. In the Workshop on Machine Learning in HPC Environments, held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2018.
  5. Sunwoo Lee, Dipendra Jha, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao. Parallel Deep Convolutional Neural Network Training by Exploiting the Overlapping of Computation and Communication (best paper finalist). In the 24th International Conference on High-Performance Computing, Data, and Analytics, December 2017.

Project Funding Support

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program, RAPIDS Institute. This work is also supported in part by DOE awards DE-SC0014330 and DE-SC0019358. Partial support is also acknowledged from NIST CHiMaD.