awesome-very-deep-learning is a curated list for papers and code about implementing and training very deep neural networks.

Neural Ordinary Differential Equations

ODE Networks are a kind of continuous-depth neural network. Instead of specifying a discrete sequence of hidden layers, they parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed.
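The link between a residual update and a continuous-depth model can be sketched in a few lines. This is only an illustration under stated assumptions: the dynamics function `f`, its scalar parameter `w`, and the fixed-step Euler solver `odeint_euler` are hypothetical stand-ins, whereas the paper uses a learned neural network and adaptive black-box solvers.

```python
import math

def f(h, t, w):
    """Hypothetical dynamics dh/dt = f(h, t; w); a real model would use a neural network."""
    return [w * math.tanh(x) for x in h]

def odeint_euler(f, h0, t0, t1, steps, w):
    """Fixed-step Euler solver (an assumption for this sketch, not the paper's solver).

    Each step h <- h + dt * f(h, t) has the same form as a residual block.
    """
    h = list(h0)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        dh = f(h, t, w)
        h = [x + dt * d for x, d in zip(h, dh)]
        t += dt
    return h

h0 = [0.5, -1.0]
# Fewer solver steps are cheaper but less precise; more steps cost more but are more accurate.
coarse = odeint_euler(f, h0, 0.0, 1.0, steps=4, w=-1.0)
fine = odeint_euler(f, h0, 0.0, 1.0, steps=400, w=-1.0)
```

Changing `steps` changes the cost and accuracy of the forward pass without changing the parameters, which is a rough analogue of trading numerical precision for speed.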


  • Neural Ordinary Differential Equations (2018) [original code] introduces several ODENets, such as continuous-depth residual networks and continuous-time latent variable models. The paper also constructs continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, the authors show how to scalably backpropagate through any ODE solver without access to its internal operations, which allows end-to-end training of ODEs within larger models. NIPS 2018 best paper.
  • Augmented Neural ODEs (2019) observes that neural ODEs preserve the topology of the input space: their learned flows cannot intersect each other, so some functions cannot be learned. Augmented NODEs address this by adding extra dimensions to the state, which lets the model learn simpler flows.


  1. Authors' Autograd Implementation

Value Iteration Networks

Value Iteration Networks are very deep networks that have tied weights and perform approximate value iteration. They are used as an internal (model-based) planning module.


  • Value Iteration Networks (2016) [original code] introduces VINs (Value Iteration Networks). The authors show that value iteration can be performed by repeatedly applying convolutions and channel-wise max-pooling. The resulting networks generalize better in environments where planning is required. NIPS 2016 best paper.
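The idea of value iteration as repeated convolution and channel-wise pooling can be sketched on a toy grid. This is a minimal illustration, not the authors' implementation: the helper names `shift` and `value_iteration`, the fixed one-hot "kernels" (plain neighbour lookups), the deterministic moves, and the hand-written reward map are all assumptions. In a VIN the kernels and the reward map are learned.

```python
NEG_INF = float("-inf")
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay

def shift(grid, dr, dc):
    """Look up each cell's neighbour at offset (dr, dc).

    Acts like a 3x3 convolution with a one-hot kernel; off-grid moves get -inf.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[NEG_INF] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                out[r][c] = grid[rr][cc]
    return out

def value_iteration(reward, gamma=0.9, iterations=10):
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):  # every iteration reuses the same (tied) kernels
        # One Q "channel" per action: Q_a = R + gamma * shift_a(V)
        q_channels = [
            [[reward[r][c] + gamma * nxt[r][c] for c in range(cols)] for r in range(rows)]
            for nxt in (shift(value, dr, dc) for dr, dc in ACTIONS)
        ]
        # Channel-wise max pooling: V = max_a Q_a
        value = [[max(q[r][c] for q in q_channels) for c in range(cols)] for r in range(rows)]
    return value

reward = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]  # goal in the bottom-right corner
values = value_iteration(reward)
```

Each loop iteration corresponds to one layer of the unrolled network, so `iterations` plays the role of depth and the values grow toward the goal cell.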

Densely Connected Convolutional Networks

Densely Connected Convolutional Networks are very deep neural networks consisting of dense blocks. Within dense blocks, each layer receives the feature maps of all preceding layers. This leverages feature reuse and thus substantially reduces the model size (parameters).
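The connectivity pattern inside a dense block can be sketched as follows. The toy layer below is a hypothetical stand-in for a real convolutional layer, and a feature map is reduced to a single number per channel; only the concatenation logic is the point.

```python
def make_toy_layer(growth_rate):
    """Hypothetical stand-in for a conv layer: emits `growth_rate` new channels."""
    def layer(features):
        mean = sum(features) / len(features)
        return [max(0.0, mean + i) for i in range(growth_rate)]
    return layer

def dense_block(x, layers):
    """Each layer reads the concatenation of the input and all earlier outputs."""
    features = list(x)
    for layer in layers:
        new_channels = layer(features)        # sees every preceding feature map
        features = features + new_channels    # concatenate instead of replacing
    return features

x = [1.0, 2.0, 3.0, 4.0]                      # 4 input channels
block = [make_toy_layer(growth_rate=2) for _ in range(3)]
out = dense_block(x, block)                   # 4 + 3 * 2 = 10 channels
```

Because earlier feature maps are carried forward unchanged, each layer only needs to add a small number of new channels (the growth rate), which is where the parameter savings come from.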



  1. Authors' Caffe Implementation
  2. Authors' more memory-efficient Torch Implementation
  3. TensorFlow Implementation by Yixuan Li
  4. TensorFlow Implementation by Laurent Mazare
  5. Lasagne Implementation by Jan Schlüter
  6. Keras Implementation by tdeboissiere
  7. Keras Implementation by Roberto de Moura Estevão Filho
  8. Chainer Implementation by Toshinori Hanya
  9. Chainer Implementation by Yasunori Kudo
  10. PyTorch Implementation (including BC structures) by Andreas Veit
  11. PyTorch Implementation

Deep Residual Learning

Deep Residual Networks are a family of extremely deep architectures (up to 1000 layers) that show compelling accuracy and good convergence behavior. Instead of learning a new representation at each layer, deep residual networks use identity shortcut connections so that each block only has to learn a residual.
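A residual block computes `y = x + F(x)`, where `F` is the learned residual function. The sketch below uses hypothetical toy functions in place of the convolutional layers of a real block, and shows that a stack of blocks whose residual is zero reduces to the identity, however deep it is.

```python
def residual_block(x, residual_fn):
    """y = x + F(x): the shortcut carries x unchanged, F only adds a correction."""
    return [xi + ri for xi, ri in zip(x, residual_fn(x))]

def zero_residual(x):
    """Hypothetical F that outputs zero everywhere."""
    return [0.0 for _ in x]

def small_residual(x):
    """Hypothetical F: a scaled ReLU standing in for conv layers."""
    return [0.1 * max(0.0, xi) for xi in x]

def stack(x, residual_fn, depth):
    for _ in range(depth):
        x = residual_block(x, residual_fn)
    return x

x = [1.0, -2.0, 0.5]
identity_out = stack(x, zero_residual, depth=1000)  # equals x exactly
refined_out = stack(x, small_residual, depth=10)
```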



  1. Torch by Facebook AI Research (FAIR), with training code in Torch and pre-trained ResNet-18/34/50/101 models for ImageNet: blog, code
  2. Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  3. Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
  4. Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
  5. Neon, Preactivation layer implementation: code
  6. Torch, MNIST, 100 layers: blog, code
  7. A winning entry in Kaggle's right whale recognition challenge: blog, code
  8. Neon, Place2 (mini), 40 layers: blog, code
  9. TensorFlow with tflearn, with CIFAR-10 and MNIST: code
  10. TensorFlow with skflow, with MNIST: code
  11. Stochastic dropout in Keras: code
  12. ResNet in Chainer: code
  13. Stochastic dropout in Chainer: code
  14. Wide Residual Networks in Keras: code
  15. ResNet in TensorFlow 0.9+ with pretrained caffe weights: code
  16. ResNet in PyTorch: code
  17. Ladder Network for Semi-Supervised Learning in Keras: code

In addition, this code by Ryan Dahl helps to convert the pre-trained models to TensorFlow.

Highway Networks

Highway Networks take inspiration from Long Short-Term Memory (LSTM) and allow training of deep, efficient networks (with hundreds of layers) with conventional gradient-based methods.
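A highway layer mixes a transformed signal `H(x)` with the untouched input through a learned gate `T(x)`: `y = H(x) * T(x) + x * (1 - T(x))`. The sketch below is element-wise with scalar weights, which is a simplification made for this example; real layers use weight matrices, and the parameter names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(x, w_h, b_h, w_t, b_t):
    """Element-wise highway layer with scalar weights (simplifying assumption)."""
    out = []
    for xi in x:
        h = math.tanh(w_h * xi + b_h)         # transform H(x)
        t = sigmoid(w_t * xi + b_t)           # transform gate T(x)
        out.append(h * t + xi * (1.0 - t))    # carry gate is 1 - T(x)
    return out

x = [0.3, -0.8, 1.2]
# Strongly negative gate bias: the layer mostly carries its input through.
carried = highway_layer(x, w_h=1.0, b_h=0.0, w_t=0.0, b_t=-30.0)
# Strongly positive gate bias: the layer behaves like a plain tanh layer.
transformed = highway_layer(x, w_h=1.0, b_h=0.0, w_t=0.0, b_t=30.0)
```

A negative initial gate bias makes each layer start close to the identity, which is what lets gradients pass through many stacked layers early in training.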



  1. Lasagne: code
  2. Caffe: code
  3. Torch: code
  4. TensorFlow: blog, code
  5. PyTorch: code

Very Deep Learning Theory

Theories in very deep learning concentrate on the ideas that very deep networks with skip connections are able to efficiently approximate recurrent computations (similar to the recurrent connections in the visual cortex) or are actually exponential ensembles of shallow networks.
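The "exponential ensemble" view can be made concrete by counting paths. In a stack of n residual blocks, the signal can either skip each block or pass through its residual branch, which gives 2^n distinct paths. The sketch below enumerates them for a small n; the function name is hypothetical, and it counts paths only, without modelling any network.

```python
from itertools import product
from math import comb

def path_length_histogram(n_blocks):
    """Count paths by how many residual branches they pass through."""
    hist = [0] * (n_blocks + 1)
    for choice in product((0, 1), repeat=n_blocks):  # 0 = skip, 1 = residual branch
        hist[sum(choice)] += 1
    return hist

hist = path_length_histogram(4)   # [1, 4, 6, 4, 1]
total_paths = sum(hist)           # 2 ** 4 = 16
```

The histogram follows the binomial coefficients, so most paths pass through only about half of the blocks and are much shallower than the full depth of the network.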


