
Paper title:

Astra: Exploiting Predictability to Optimize Deep Learning

Publication:

ASPLOS’19

Problem to solve:

Despite their popularity, there is a glaring lack of conceptual understanding for reasoning about DNNs: for example, which model structure would be a good fit for a particular problem, or which hyper-parameters to use for a given structure. As a result, innovation in DNNs happens primarily through trial and error. The common way to find out whether a particular model will work is to run it and see what accuracy it converges to; for large models, running even one such iteration can take several days.

As a result of this trial-and-error methodology, advancement in AI is gated on systems advances that speed up DNN training. While hardware advances such as more powerful GPUs and even ASICs help bridge this gap, the software layers get in the way: even with current GPUs, several state-of-the-art models, such as those for text classification, utilize only a small fraction of the GPU when run with modern frameworks such as TensorFlow and PyTorch. Software accelerators such as cuDNN speed up specific types of DNN layers by hand-optimizing their computation. However, given the engineering effort required, such accelerators cater only to the most popular primitives (e.g. standard convolution or LSTM layers).

Unfortunately, by definition, the novel model architectures that AI researchers invent are long-tailed, i.e. they typically do not fit the "popular" primitives addressed by hand-coded accelerators. Yet it is precisely these new models that need to be fine-tuned by repeated trial and error.

Major contributions:

- Identify and leverage the unique characteristics of the deep learning workload to do extreme tailoring, or custom-wiring, of the infrastructure for a specific job, resulting in significant efficiency gains.

- Propose and evaluate a new architectural framework for optimizing such custom workloads, with aggressive multi-version compilation at a whole-program level that performs parallel exploration of independent choices using fine-grained profiling (a minimal sketch of this idea follows the list).

- Demonstrate with a detailed evaluation that the state space of online exploration is manageable with Astra's pruning strategies, and that end-to-end models get significant speedups over native PyTorch and TensorFlow, and even over static optimizers such as XLA.

- Identify some simple functionality that new hardware for deep learning must provide in order to enable Astra's adaptation approach.
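
To make the multi-version exploration concrete, here is a minimal sketch of the idea in Python/PyTorch. The candidate implementations (`fused`, `unfused`) and the exploration schedule are illustrative assumptions, not Astra's actual compiler output; the point is that because training iterations are repetitive, the runtime can profile one candidate per iteration and then commit to the fastest.

```python
import time
import torch

# Two hypothetical implementations of the same logical operator. In Astra,
# such candidates come from whole-program multi-version compilation; these
# are stand-ins for illustration only.
def fused(x, w):
    return torch.relu(x @ w)   # single fused expression

def unfused(x, w):
    y = x @ w                  # materialize the intermediate
    return torch.relu(y)

CANDIDATES = [fused, unfused]

def train(steps, x, w):
    timings = {}
    best = None
    for step in range(steps):
        if step < len(CANDIDATES):
            # Exploration: run one candidate per iteration and profile it.
            # (Timing on CPU; a GPU version would need torch.cuda.synchronize().)
            f = CANDIDATES[step]
            t0 = time.perf_counter()
            f(x, w)
            timings[f] = time.perf_counter() - t0
        else:
            # Exploitation: iterations are repetitive, so the early
            # measurements predict the rest of the job; commit to the winner.
            if best is None:
                best = min(timings, key=timings.get)
            best(x, w)
    return best

if __name__ == "__main__":
    x, w = torch.randn(512, 512), torch.randn(512, 512)
    print(train(100, x, w).__name__)
```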

Lessons learnt:

Astra addresses a pressing need in machine learning experimentation: iterating fast on new model architectures in order to make advances. While accelerators such as cuDNN significantly speed up deep learning training, they are hand-optimized and hence cater only to popular models. Astra bridges this gap by bringing the power of optimization to long-tail models. It adopts a novel division of functionality between the compiler and the runtime, in which the runtime adaptively explores the state space of optimizations by leveraging the unique repetitiveness and predictability of a deep learning training job. With fine-grained profiling and several techniques for performing the exploration in parallel, Astra effectively prunes the state space, unlike probabilistic or learning-based approaches to adaptation.

The Astra approach is particularly attractive given the rapid pace at which new custom hardware is being built for DNNs, which makes static optimization expensive. Astra is an example of how tight integration of a systems layer, such as the compiler, with a specific large workload can drive fundamental efficiencies through unconventional yet effective architectures.
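
As a rough illustration of why the online state space stays manageable, consider the pruning that independence enables. The operator names and option lists below are made up for the example; the arithmetic is the point: tuning independent per-operator choices costs the sum of the options rather than their cross-product, and independent options can be tried in parallel within a single iteration.

```python
from itertools import product

# Hypothetical per-operator optimization choices; the names are illustrative,
# not Astra's actual choice set.
choices = {
    "matmul":  ["library", "tiled", "fused_bias"],
    "softmax": ["naive", "two_pass"],
    "dropout": ["fused_mask", "cached_mask"],
}

# Naive joint search measures every combination: 3 * 2 * 2 = 12 configurations.
joint = len(list(product(*choices.values())))

# If the choices are independent, each operator is tuned in isolation,
# so only 3 + 2 + 2 = 7 measurements are needed.
independent = sum(len(opts) for opts in choices.values())

# Independent options can also be explored in parallel within one iteration:
# round i tries the i-th untested option of every operator simultaneously.
rounds = max(len(opts) for opts in choices.values())
schedule = [
    {op: opts[min(i, len(opts) - 1)] for op, opts in choices.items()}
    for i in range(rounds)
]

print(joint, independent)  # 12 7
for r in schedule:
    print(r)
```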