Paper title:
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
Publication:
MICRO’18
Problem to solve
As researchers seek to deploy deeper and larger DNN topologies, however, end users run into a memory “capacity” wall: the limited on-device physical memory constrains the size of the models that can be trained. Current trends point to an urgent need for a system-architectural solution that satisfies two requirements at once: (a) fast inter-device communication for parallel training, and (b) high-performance memory virtualization over a large memory pool, so that memory-hungry DNNs become trainable on accelerator devices.
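To make the capacity wall concrete, here is a back-of-envelope check of my own (the 1B-parameter model and byte counts are illustrative assumptions, not figures from the paper) comparing the memory footprint of training state against what a single device physically offers:

```cuda
// capacity_check.cu -- illustrative only; the model size and byte counts are
// assumptions for this sketch, not figures from the paper.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);   // physical memory on this device

    // Hypothetical 1B-parameter DNN trained in fp32 with an Adam-style optimizer:
    // weights + gradients + two optimizer moments = 16 bytes per parameter,
    // before counting activations, which often dominate for deep networks.
    const double params         = 1e9;
    const double training_state = params * 16.0;  // ~16 GB

    printf("device memory : %.1f GB total, %.1f GB free\n",
           total_bytes / 1e9, free_bytes / 1e9);
    printf("training state: %.1f GB -> %s\n",
           training_state / 1e9,
           training_state > (double)total_bytes ? "hits the capacity wall"
                                                : "fits on device");
    return 0;
}
```

Even this conservative estimate, which ignores activations entirely, already exceeds the ~16 GB capacity of the HBM-based accelerators of this era.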
Major contributions
This work first highlights the importance of device-side interconnects in training scaled-up DL algorithms, presenting a quantitative analysis of parallel training in the context of HPC systems with multiple accelerator (GPU/TPU) devices; a back-of-envelope interconnect-cost sketch follows this list.
This work identifies key system-level performance bottlenecks in device-centric deep learning system architectures (DC-DLA) and motivates the need for a new system architecture that balances fast communication with user productivity when training large DNN algorithms.
This work proposes and evaluates a memory-centric deep learning system architecture (MC-DLA) that provides transparent memory-capacity expansion while also enabling fast inter-device communication. Compared to DC-DLA designs, MC-DLA achieves an average 2.8× performance improvement while expanding the system-wide memory capacity exposed to the accelerators to tens of TBs (see the oversubscription sketch below).
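On the interconnect point above: a rough cost model of data-parallel gradient synchronization shows why device-side link bandwidth dominates. In a ring all-reduce over p devices, each device moves about 2(p-1)/p × N bytes per step (this is the standard ring all-reduce cost, not a formula taken from the paper; the bandwidth figures below are illustrative assumptions):

```cuda
// allreduce_cost.cu -- back-of-envelope only; all numbers are illustrative
// assumptions, not measurements from the paper.
#include <cstdio>

int main() {
    const double grad_bytes = 1e9 * 4.0;        // hypothetical 1B-param fp32 gradient
    const int    p          = 8;                // devices in the ring
    // Ring all-reduce: each device sends/receives ~2(p-1)/p * N bytes per step.
    const double per_device = 2.0 * (p - 1) / p * grad_bytes;

    const double bw[]   = { 16e9, 150e9 };      // bytes/s: PCIe-class vs NVLink-class
    const char  *name[] = { "PCIe-class", "NVLink-class" };
    for (int i = 0; i < 2; ++i)
        printf("%-12s link: %.0f ms per gradient all-reduce\n",
               name[i], per_device / bw[i] * 1e3);
    return 0;
}
```

Under these assumed numbers the PCIe-class link spends roughly an order of magnitude longer per synchronization step, which is the intuition behind the paper's emphasis on device-side interconnects.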
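And on transparent capacity expansion: the closest commodity analogue is CUDA unified memory, which lets an allocation exceed device physical memory and pages data in on demand. The sketch below uses that mechanism purely as an illustration of the idea; it is not the MC-DLA design, which pools memory across devices behind a fast interconnect (oversubscription also requires a Pascal-or-newer GPU on Linux):

```cuda
// oversubscribe.cu -- a sketch of transparent capacity expansion via CUDA
// unified memory. This is NOT the paper's MC-DLA memory pool; it only shows
// that software can be exposed to more memory than the device physically has.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;   // page faults migrate data onto the device
}

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);

    // Ask for 1.5x the device's physical capacity; the managed allocation
    // succeeds and the runtime pages data between host and device on demand.
    size_t n = (size_t)(1.5 * (double)total_b) / sizeof(float);
    float *buf = nullptr;
    if (cudaMallocManaged(&buf, n * sizeof(float)) != cudaSuccess) {
        printf("managed allocation failed\n");
        return 1;
    }

    unsigned int blocks = (unsigned int)((n + 255) / 256);
    touch<<<blocks, 256>>>(buf, n);
    cudaDeviceSynchronize();
    printf("touched %.1f GB on a %.1f GB device\n",
           n * sizeof(float) / 1e9, total_b / 1e9);
    cudaFree(buf);
    return 0;
}
```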