Resources for deep learning: papers, articles, courses

Unified Bellman Equation for Causal Information and Value in Markov Decision Processes, Tiomkin and Tishby, arXiv


  • Evaluating Theory of Mind in Question Answering, Nematzadeh et al, EMNLP 2018. arXiv
  • An Off-policy Policy Gradient Theorem Using Emphatic Weightings, Imani et al, NeurIPS 2018. arXiv
  • Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Jiang and Li, 2015. arXiv
  • Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, Thomas and Brunskill, 2016. arXiv
  • Implicit Reparameterization Gradients, Figurnov et al, NeurIPS 2018. arXiv
  • Approximately Optimal Approximate Reinforcement Learning, Kakade and Langford, 2002. pdf. Note: the paper which inspired the likes of TRPO


  • Meta-Learning: A Survey, Vanschoren et al, 2018. arXiv
  • Off-policy Learning with Recognizers, Precup et al, 2005. pdf
  • Meta-Gradient Reinforcement Learning, Xu et al, 2018. arXiv.
  • Expected Policy Gradients, Ciosek et al, 2018. arXiv.
  • Mean Actor Critic, Allen et al, 2018. arXiv, web version. The usual policy gradient is an expectation over states and actions; the authors instead put the explicit sum over actions back inside the expectation over states (Eq. 4). The resulting policy update also considers actions that were not taken in the environment. In domains where the Q estimate is good, MAC has lower variance; otherwise MAC performs worse.
  • Near-Optimal Representation Learning for Hierarchical Reinforcement Learning, Nachum et al, NeurIPS 2018. arXiv. Note: builds on HIRO, but focuses on optimal representations.
  • Data-Efficient Hierarchical Reinforcement Learning, Nachum et al, NeurIPS 2018. arXiv. Note: type of HRL called HIRO. High level policy gives low-level policy a goal state to reach.
  • Neural Ordinary Differential Equations, Chen et al, NeurIPS 2018. arXiv, code. Best paper award
  • Non-delusional Q-learning and value-iteration, Lu et al, NeurIPS 2018. proceedings. Best paper award.
  • Exploration by Random Network Distillation, Burda et al, 2018. arXiv
  • Revisiting the Arcade Learning Environment, Machado et al, 2017. arXiv. Note: known for suggesting sticky actions to make the environment non-deterministic. Sticky action: with some probability eps, the environment repeats the previous action instead of the one the agent chose.
  • An Information-Theoretic Optimality Principle for Deep Reinforcement Learning, Leibfried et al, 2017. arXiv. Note: addresses problem of Q-value overestimation
  • Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. arXiv
  • Deep Reinforcement Learning in Large Discrete Action Spaces, Dulac-Arnold, 2015. arXiv
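
The sticky-action note on the Machado et al entry above can be sketched as a small wrapper. This is a minimal illustration, not the ALE implementation: the `StickyActionEnv` and `EchoEnv` names, the `reset`/`step` interface, and the default `repeat_prob=0.25` are all assumptions made for this example.

```python
import random


class StickyActionEnv:
    """Wraps an environment so that, with probability `repeat_prob`,
    the previously executed action is repeated instead of the new one."""

    def __init__(self, env, repeat_prob=0.25, seed=None):
        self.env = env
        self.repeat_prob = repeat_prob  # illustrative default, an assumption
        self.rng = random.Random(seed)
        self.prev_action = None

    def reset(self):
        # Forget the last action so the first step is never overridden.
        self.prev_action = None
        return self.env.reset()

    def step(self, action):
        # With probability repeat_prob, ignore the agent's choice and
        # repeat whatever was executed on the previous step.
        if self.prev_action is not None and self.rng.random() < self.repeat_prob:
            action = self.prev_action
        self.prev_action = action
        return self.env.step(action)


class EchoEnv:
    """Hypothetical toy environment: step() returns the executed action."""

    def reset(self):
        return None

    def step(self, action):
        return action
```

Wrapping `EchoEnv` makes the effect visible: with `repeat_prob=1.0` every step after the first returns the first action, and with `repeat_prob=0.0` the wrapper is transparent.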


  • Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, ICML 2018. arXiv. TD3 agent
  • The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. arXiv. Note: decomposes variance into 3 sources: from trajectory, action-dependent baseline, and state visitation. Conclusion: variance-reduction from action-dependent baseline can be minimal.
  • Backpropagation through the Void: Optimizing control variates for black-box gradient estimation, Grathwohl et al, ICLR 2018, arXiv. Note: action dependent baseline, builds on REBAR
  • REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models, Tucker et al, ICLR 2017. arXiv


  • Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols, Havrylov and Titov, NIPS 2017. arXiv. Note: EC with referential games, trained with REINFORCE and Gumbel-Softmax, shows hierarchy of language
  • Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors, Ilyas et al 2018, ICLR 2019 submission. openreview
  • Certified Defenses against Adversarial Examples, Raghunathan et al, 2018, arXiv
  • Speaker-Follower Models for Vision-and-Language Navigation, Fried et al, NIPS 2018, arXiv
  • Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies, Grusky et al, NAACL 2018, arXiv
  • Architectural Complexity Measures of Recurrent Neural Networks, Zhang et al, NIPS 2016, arXiv
  • Gradient Estimation Using Stochastic Computation Graphs, Schulman et al, 2016. arXiv
  • Variational Inference: A Review for Statisticians, Blei et al, 2018. arXiv
  • Variational Inference with Normalizing Flows, Rezende et al, 2016. arXiv
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis, Brock et al, submission to ICLR 2019. arXiv
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al, 2018. arXiv
  • The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Maddison et al, 2017. arXiv. Note: the Concrete distribution is equivalent to the Gumbel-Softmax.
  • Categorical Reparameterization with Gumbel-Softmax, Jang et al, 2017. arXiv. Note: Gumbel-Softmax is equivalent to the Concrete distribution.
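
For the Gumbel-Softmax / Concrete entries above, a rough sketch of how one relaxed sample is drawn may help: add Gumbel(0, 1) noise to the logits, divide by a temperature, and apply a softmax. The function name `gumbel_softmax_sample`, its signature, and the clamping constant are assumptions for this illustration, not code from either paper.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample from a categorical with the given logits."""
    noisy = []
    for logit in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logs finite
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it closer to uniform. In an autodiff framework the same computation is differentiable with respect to the logits, which is the point of the relaxation.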


  • Universal Transformers, Dehghani et al, 2018. arXiv, google blog post
  • Phrase-Based & Neural Unsupervised Machine Translation, Lample et al, EMNLP 2018. arXiv
  • Hybrid Reward Architecture for Reinforcement Learning, Seijen et al, 2017. arXiv.


  • Vehicle Communication Strategies for Simulated Highway Driving, Resnick et al, 2017, NIPS 2017 Workshop on Emergent Communication.
  • Emergent Communication through Negotiation, Cao et al, NIPS 2017 Workshop on Emergent Communication.
  • Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, Athalye et al, ICML 2018. arXiv. Defeats 7 of 9 recently introduced adversarial defense methods. Won best paper at ICML.
  • Meta-Gradient Reinforcement Learning, Xu et al 2018, arXiv


  • Proximal Policy Optimization Algorithms, Schulman et al, 2017. arXiv, openai blog, OpenAI Five blog post, which applies scaled-up PPO to Dota 2
  • What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, Conneau et al, ACL 2018. arXiv. The authors go through 10 probing tasks to find out some of the things the embeddings capture, trained with various architectures.
  • Style Transfer Through Back-Translation, Prabhumoye et al, ACL 2018. arXiv
  • Hierarchical Neural Story Generation, Fan et al, ACL 2018. arXiv. Generate a short story based on a "prompt", impressive results. Also has some cool tricks, like model fusion, a different type of attention, k=10 sampling, etc.
  • Representation Learning for Grounded Spatial Reasoning, Janner et al, ACL 2018. arXiv
  • Generating Sentences by Editing Prototypes, Guu et al, ACL 2018. arXiv
  • A Stochastic Decoder for Neural Machine Translation, Schulz et al, ACL 2018. arXiv
  • The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing, Dror et al, ACL 2018. aclweb
  • Stock Movement Prediction from Tweets and Historical Prices, Xu and Cohen, ACL 2018. pdf
  • Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context, Khandelwal et al, ACL 2018. arXiv
  • Backpropagating through Structured Argmax using a SPIGOT, Peng et al, ACL 2018. arXiv
  • Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum, Levy et al, ACL 2018. arXiv


  • Self-Imitation Learning, Oh et al, 2018. arXiv. Performs an on-policy A2C update and an off-policy SIL update, which samples positive experiences from a replay buffer and uses a form of AC.
  • Improving Language Understanding with Unsupervised Learning, Radford et al, 2018. openai
  • Prioritized Experience Replay, Schaul et al, ICLR 2016. arXiv
  • Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, Wu et al, 2017. arXiv
  • Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach, Karakida et al, 2018. arXiv
  • On Learning Intrinsic Rewards for Policy Gradient Methods, Zheng et al, 2018. arXiv
  • Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, Yang et al, ICLR 2018. openreview, arXiv. summary. Given a language model output matrix A over time, where each row is the vocabulary distribution for a given context, the authors hypothesize that A must be high rank to express complex language, and that a single softmax is not expressive enough. They propose a mixture of softmaxes.
  • Measuring the Intrinsic Dimension of Objective Landscapes, Li et al, ICLR 2018. openreview, arXiv, summary. Intrinsic dimension is the dimension of the smallest parameter subspace (projected into the full parameter space) in which training reaches a given performance. It is a measure of model-problem complexity.
  • Control of Memory, Active Perception, and Action in Minecraft, Oh et al, ICML 2016. arXiv
  • Multitask Learning, Rich Caruana, PhD thesis 1997. pdf. Work in the 90s on transfer learning! Chapter 5 discusses auxiliary tasks for neural nets! 20 years before the UNREAL paper!
  • Neural Map: Structured Memory for Deep Reinforcement Learning, Parisotto and Salakhutdinov, ICLR 2018. arXiv. Instead of free external memory, have memory locations correlate with agent location, i.e. structured memory. Hugely outperforms memory nets and others on maze problems.
  • On the State of the Art of Evaluation in Neural Language Models, Melis et al, ICLR 2018. openreview. Some simple language models, like LSTM, actually achieve SOTA or near SOTA with proper hyperparams and simple additions, like shared embeddings and variational dropout (see Table 4 ablation).
  • Reinforcement Learning with Unsupervised Auxiliary Tasks, Jaderberg et al, ICLR 2017. openreview. Introduces the UNREAL model. See Caruana PhD thesis above from 1997, discusses auxiliary tasks for better representations!
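
The mixture-of-softmaxes idea from the Yang et al entry above can be sketched in a few lines. In the paper the mixture weights and per-component logits are produced from the context by learned projections; here they are passed in directly, and the function names are assumptions for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_softmaxes(prior_logits, component_logits):
    """Return sum_k pi_k * softmax(component_logits[k]) as one distribution.

    prior_logits: K mixture-weight logits for the current context.
    component_logits: K lists, each holding one logit per vocabulary word.
    """
    weights = softmax(prior_logits)
    vocab_size = len(component_logits[0])
    mixed = [0.0] * vocab_size
    for weight, logits in zip(weights, component_logits):
        # Mix in probability space, not logit space.
        for i, p in enumerate(softmax(logits)):
            mixed[i] += weight * p
    return mixed
```

The averaging happens over probabilities rather than logits, so the log of the result is no longer a linear function of the context features. That is what allows the log-probability matrix to have higher rank than a single softmax permits.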


  • Parameter Space Noise for Exploration, Plappert et al, ICLR 2018. arXiv. Instead of adding noise to the action space, add noise to the function approximator's parameters for better exploration.
  • Continuous control with deep reinforcement learning, Lillicrap et al, ICLR 2016. arXiv. Introduced Deep Deterministic Policy Gradient (DDPG), an actor critic algorithm applicable to continuous action spaces, off-policy.
  • Deterministic Policy Gradient Algorithms, Silver et al, ICML 2014. pdf. DPG is the expected gradient of the action-value function, easier to estimate than the traditional stochastic policy gradient.
  • Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs, Murdoch et al, 2018, ICLR 2018. pdf, arXiv
  • Emergence Of Linguistic Communication From Referential Games With Symbolic And Pixel Input, Lazaridou et al, ICLR 2018. pdf
  • Emergent Communication in a Multi-Modal, Multi-Step Referential Game, Evtimova et al, ICLR 2018. arXiv, code
  • Neural Speed Reading via Skim-RNN, Seo et al, ICLR 2018. arXiv
  • Dynamic Word Embeddings for Evolving Semantic Discovery, Yao et al, 2017. arXiv
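
As a rough illustration of the parameter-space-noise entry above (Plappert et al), the sketch below perturbs a policy's weights rather than its actions. The flat weight list, the linear policy, and the fixed `stddev` are simplifying assumptions for this example; the paper applies the idea to deep networks and adapts the noise scale.

```python
import random


def perturb_parameters(params, stddev, rng=random):
    """Return a noisy copy of the policy parameters; the originals are untouched."""
    return [w + rng.gauss(0.0, stddev) for w in params]


def linear_policy(params, observation):
    """Hypothetical deterministic policy: dot product of weights and observation."""
    return sum(w * x for w, x in zip(params, observation))


def act_with_parameter_noise(params, observations, stddev, rng=random):
    """Sample one perturbation and use it for every observation in the rollout."""
    noisy_params = perturb_parameters(params, stddev, rng)
    return [linear_policy(noisy_params, obs) for obs in observations]
```

Because one perturbation is held fixed for the whole rollout, the same observation always maps to the same action within that rollout, unlike independent per-step action noise.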


  • One Model To Learn Them All, Kaiser et al, 2017. arXiv
  • An Analysis of Temporal-Difference Learning with Function Approximation, Tsitsiklis and Van Roy, 1997. pdf
  • Steps Toward Artificial Intelligence, Minsky, 1961. pdf
  • Eye on the Prize, Nilsson, 1995. pdf
  • The Option-Critic Architecture, Bacon et al. arXiv
  • Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings. He et al, 2017. arXiv
  • Learning to Win by Reading Manuals in a Monte-Carlo Framework, Branavan et al, 2012. arXiv


  • Generating Sentences by Editing Prototypes, Guu et al, 2017. arXiv
  • SenGen: Sentence Generating Neural Variational Topic Model, Nallapati et al, 2017. arXiv
  • Learning Sparse Neural Networks through L0 Regularization, Louizos et al 2017. arXiv
  • Sparsity and the Lasso, Tibshirani and Wasserman, 2015. pdf. Note: related L0 paper above
  • Proving convexity, Loh 2013. pdf. Note: related to L0 paper above
  • Mathematics of Deep Learning, Vidal et al, 2017. arXiv
  • Bayesian Hypernetworks, Krueger et al, 2017. arXiv
  • SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents, Nallapati et al, 2016. arXiv
  • Learning Online Alignments with Continuous Rewards Policy Gradient, Luo et al 2016. arXiv
  • Asynchronous Methods for Deep Reinforcement Learning. Mnih et al, 2016. arXiv. Introduces A3C, Asynchronous Advantage Actor Critic
  • On The State of The Art In Neural Language Models, Anonymous, 2017. iclr pdf
  • Natural Language Inference with External Knowledge, Chen et al 2017. arXiv


  • Memory Augmented Neural Networks with Wormhole Connections, Gulcehre et al, 2017. arXiv
  • Emergence of Invariance and Disentangling in Deep Representations, Achille et al, 2017. arXiv
  • Distilling the Knowledge in a Neural Network, Hinton et al, 2015. arXiv
  • Seq2SQL: Generating Structured Queries From Natural Language Using Reinforcement Learning, Zhong et al, 2017. arXiv
  • Better Text Understanding Through Image-To-Text Transfer, Kurach, 2017. arXiv
  • Data Augmentation Generative Adversarial Networks, Antoniou et al, 2017. arXiv
  • Adversarial Training Methods for Semi-Supervised Text Classification, Miyato et al, 2017. arXiv
  • Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, Anonymous, 2017. openreview
  • Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, Inan et al 2017. arXiv
  • Building machines that learn and think for themselves, Botvinick et al, 2017. cambridge
  • Neural Discrete Representation Learning, van den Oord et al, 2017. arXiv
  • InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, Chen et al, 2016. arXiv, blog, code
  • Evolution Strategies, Otoro 2017, blog part 1, 2
  • Matrix Capsules with EM Routing. Anonymous (likely Hinton lab), 2017. openreview.
  • Dynamic Routing Between Capsules, Sabour et al, 2017. arXiv. code-keras, video review
  • Weighted Transformer Network for Machine Translation, Ahmed et al, 2017. arXiv
  • Unsupervised Machine Translation Using Monolingual Corpora Only, Lample et al, 2017. arXiv
  • Non-Autoregressive Neural Machine Translation, Gu et al, 2017. arXiv
  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe et al, 2017. arXiv


  • Adversarial Learning for Neural Dialogue Generation, Li et al, 2017. arXiv
  • Frustratingly Short Attention Spans in Neural Language Modeling, Daniluk et al, 2017. arXiv
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al, 2017. pdf
  • A Closer Look at Memorization in Deep Networks, Arpit et al, 2017. arXiv
  • Understanding deep learning requires rethinking generalization, Zhang et al, 2016. arXiv
  • The Loss Surfaces of Multilayer Networks, Choromanska et al, 2015. arXiv
  • Meta Learning Shared Hierarchies, Frans et al, 2017. arXiv, author blog
  • Mastering the game of Go without human knowledge, Silver et al, 2017. arXiv, blog
  • Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation, Sharma et al, 2017. arXiv
  • GuessWhat?! Visual object discovery through multi-modal dialogue, de Vries et al, 2017. arXiv
  • A Frame Tracking Model for Memory-Enhanced Dialogue Systems, Schulz et al, 2017. arXiv
  • A Deep Reinforced Model for Abstractive Summarization, Paulus et al, 2017. arXiv, author blog
  • (about ROUGE score for summarization) ROUGE: A Package for Automatic Evaluation of Summaries, Chin-Yew Lin, 2004. acl
  • Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. arXiv
  • Language Modeling with Gated Convolutional Networks, Dauphin et al, 2017, arXiv
  • Convolutional Sequence to Sequence Learning, Gehring et al, 2017. arXiv
  • Emergence of Grounded Compositional Language in Multi-Agent Populations, Mordatch and Abbeel, 2017. arXiv, author blog. Note: related to Kottur et al 2017.


  • Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog, Kottur et al, 2017. arXiv, code
  • Opening the black box of Deep Neural Networks via Information, Shwartz-Ziv and Tishby, 2017. arXiv, m-p review
  • End-to-end Neural Coreference Resolution, Lee et al, 2017. arXiv
  • Deep Reinforcement Learning for Mention-Ranking Coreference Models, Clark et al, 2016. arXiv
  • Oriented Response Networks, Zhou et al 2017. arXiv
  • Training RNNs as Fast as CNNs, Lei et al, 2017. arXiv
  • Quasi-Recurrent Neural Networks, Bradbury et al 2017. arXiv, author blog/code
  • A Deep Reinforcement Learning Chatbot, Serban et al, 2017. arXiv
  • Independently Controllable Factors, Thomas et al, 2017. arXiv
  • Attention Is All You Need, Vaswani et al, 2017. arXiv, code, google blog, reddit
  • Attention-over-Attention Neural Networks for Reading Comprehension, Cui et al 2017. arXiv, code
  • Get To The Point: Summarization with Pointer-Generator Networks, See et al 2017. arXiv, author blog, code
  • Massive Exploration of Neural Machine Translation Architectures, Britz et al 2017. arXiv
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017. arXiv, 'examples', code-torch, code-PyT


  • A Brief Survey of Deep Reinforcement Learning, Arulkumaran et al 2017. arXiv
  • Regularizing and Optimizing LSTM Language Models, Merity et al 2017. arXiv
  • Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets, Yang et al 2017. arXiv
  • Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders, Zhao et al 2017. arXiv
  • How to Train Your DRAGAN, Kodali et al 2017. arXiv
  • Improved Training of Wasserstein GANs, Gulrajani et al 2017. arXiv, blog, blog, code
  • Wasserstein GAN, Arjovsky et al 2017. arXiv, read-through, Kantorovich-Rubinstein duality, WGAN-tensorflow, blog/code
  • Reading Scene Text in Deep Convolutional Sequences, He et al, 2016. arXiv


  • Recurrent Batch Normalization, Cooijmans et al, 2017. arXiv, code-tf
  • An Actor-Critic Algorithm for Sequence Prediction, Bahdanau et al 2017. arXiv, code
  • Scheduled Sampling for Sequence Prediction with RNN, Bengio et al, 2015. arXiv, summary
  • Hybrid computing using a neural network with dynamic external memory, published in Nature
  • Neural Turing Machine, arXiv
  • Learning End-to-End Goal-Oriented Dialog, Bordes et al, 2017. arXiv, code
  • End-To-End Memory Networks, Sukhbaatar et al, 2015, arXiv
  • Memory Networks, arXiv
  • Deep Photo Style Transfer, arXiv
  • Matching Networks for One Shot Learning, Vinyals et al, NIPS 2016. arXiv. summary, code. karpathy notes, Colyer blog





Paper collections


Neural Networks Basics



  • Multi-Task Learning Objectives for Natural Language Processing, blog

Recurrent Neural Network (RNN)


Deep Reinforcement Learning

Online Courses

