Unified Bellman Equation for Causal Information and Value in Markov Decision Processes, Tiomkin and Tishby, arXiv
- Evaluating Theory of Mind in Question Answering, Nematzadeh et al, EMNLP 2018. arXiv
- An Off-policy Policy Gradient Theorem Using Emphatic Weightings, Imani et al, NeurIPS 2018. arXiv
- Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Jiang and Li, 2015. arXiv
- Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, Thomas and Brunskill, 2016. arXiv
- Implicit Reparameterization Gradients, Figurnov et al, NeurIPS 2018. arXiv
- Approximately Optimal Approximate Reinforcement Learning, Kakade and Langford, 2002. pdf. Note: the paper which inspired the likes of TRPO
- Meta-Learning: A Survey, Vanschoren et al, 2018. arXiv
- Off-policy Learning with Recognizers, Precup et al, 2005. pdf
- Meta-Gradient Reinforcement Learning, Xu et al, 2018. arXiv.
- Expected Policy Gradients, Ciosek et al, 2018. arXiv.
- Mean Actor Critic, Allen et al, 2018. arXiv, web version. The usual policy gradient is an expectation over states and actions, but they suggest adding the explicit sum over actions back into the expectation over states (Eq. 4). Doing so results in a policy update that considers actions not taken in the environment. In domains where Q is good, MAC results in lower variance; otherwise MAC performs worse.
- Near-Optimal Representation Learning for Hierarchical Reinforcement Learning, Nachum et al, NeurIPS 2018. arXiv. Note: builds on HIRO, but focuses on optimal representations.
- Data-Efficient Hierarchical Reinforcement Learning, Nachum et al, NeurIPS 2018. arXiv. Note: type of HRL called HIRO. High level policy gives low-level policy a goal state to reach.
- Neural Ordinary Differential Equations, Chen et al, NeurIPS 2018. arXiv, code. Best paper award.
- Non-delusional Q-learning and value-iteration, Lu et al, NeurIPS 2018. proceedings. Best paper award.
- Exploration by Random Network Distillation, Burda et al, 2018. arXiv
- Revisiting the Arcade Learning Environment, Machado et al, 2017. arXiv. Note: known for suggesting sticky actions to make the environment non-deterministic. Sticky action: with some probability eps, the environment repeats the previous action.
- An Information-Theoretic Optimality Principle for Deep Reinforcement Learning, Leibfried et al, 2017. arXiv. Note: addresses problem of Q-value overestimation
- Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. arXiv
- Deep Reinforcement Learning in Large Discrete Action Spaces, Dulac-Arnold et al, 2015. arXiv
- Bisimulation Metrics for Continuous Markov Decision Processes, Ferns et al, 2011. pdf
- Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, ICML 2018. arXiv. TD3 agent
- The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. arXiv. Note: decomposes variance into 3 sources: from trajectory, action-dependent baseline, and state visitation. Conclusion: variance-reduction from action-dependent baseline can be minimal.
- Backpropagation through the Void: Optimizing control variates for black-box gradient estimation, Grathwohl et al, ICLR 2018, arXiv. Note: action dependent baseline, builds on REBAR
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models, Tucker et al, ICLR 2017. arXiv
- Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols, Havrylov and Titov, NIPS 2017. arXiv. Note: emergent communication (EC) with referential games, trained with REINFORCE and Gumbel-Softmax, shows hierarchy of language
- Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors, Ilyas et al 2018, ICLR 2019 submission. openreview
- Certified Defenses against Adversarial Examples, Raghunathan et al, 2018, arXiv
- Speaker-Follower Models for Vision-and-Language Navigation, Fried et al, NIPS 2018, arXiv
- Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies, Grusky et al, NAACL 2018, arXiv
- Architectural Complexity Measures of Recurrent Neural Networks, Zhang et al, NIPS 2016, arXiv
- Gradient Estimation Using Stochastic Computation Graphs, Schulman et al, 2016. arXiv
- Variational Inference: A Review for Statisticians, Blei et al, 2018. arXiv
- Variational Inference with Normalizing Flows, Rezende et al, 2016. arXiv
- Large Scale GAN Training for High Fidelity Natural Image Synthesis, Brock et al, submission to ICLR 2019. arXiv
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al, 2018. arXiv
- The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Maddison et al, 2017. arXiv. Note: the Concrete distribution is equivalent to the Gumbel-Softmax.
- Categorical Reparameterization with Gumbel-Softmax, Jang et al, 2017. arXiv. Note: Gumbel-Softmax is equivalent to the Concrete distribution.
- Universal Transformers, Dehghani et al, 2018. arXiv, google blog post
- Phrase-Based & Neural Unsupervised Machine Translation, Lample et al, EMNLP 2018. arXiv
- Hybrid Reward Architecture for Reinforcement Learning, van Seijen et al, 2017. arXiv.
- Vehicle Communication Strategies for Simulated Highway Driving, Resnick et al, 2017, NIPS 2017 Workshop on Emergent Communication.
- Emergent Communication through Negotiation, Cao et al, NIPS 2017 Workshop on Emergent Communication.
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, Athalye et al, ICML 2018. arXiv. Defeats 7 of 9 recently introduced adversarial defense methods. Won best paper at ICML.
- Proximal Policy Optimization Algorithms, Schulman et al, 2017. arXiv, openai blog, OpenAI Five blogpost, which applies scaled-up PPO to Dota 2
- What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, Conneau et al, ACL 2018. arXiv. The authors go through 10 probing tasks to find out some of the things the embeddings capture, trained with various architectures.
- Style Transfer Through Back-Translation, Prabhumoye et al, ACL 2018. arXiv
- Hierarchical Neural Story Generation, Fan et al, ACL 2018. arXiv. Generate a short story based on a "prompt", impressive results. Also has some cool tricks, like model fusion, a different type of attention, k=10 sampling, etc.
- Representation Learning for Grounded Spatial Reasoning, Janner et al, ACL 2018. arXiv
- Generating Sentences by Editing Prototypes, Guu et al, ACL 2018. arXiv
- A Stochastic Decoder for Neural Machine Translation, Schulz et al, ACL 2018. arXiv
- The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing, Dror et al, ACL 2018. aclweb
- Stock Movement Prediction from Tweets and Historical Prices, Xu and Cohen, ACL 2018. pdf
- Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context, Khandelwal et al, ACL 2018. arXiv
- Backpropagating through Structured Argmax using a SPIGOT, Peng et al, ACL 2018. arXiv
- Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum, Levy et al, ACL 2018. arXiv
- Self-Imitation Learning, Oh et al, 2018. arXiv. Performs an on-policy A2C update and an off-policy SIL update, which samples positive experiences from a replay buffer and uses a form of actor-critic.
- Improving Language Understanding with Unsupervised Learning, Radford et al, 2018. openai
- Prioritized Experience Replay, Schaul et al, ICLR 2016. arXiv
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, Wu et al, 2017. arXiv
- Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach, Karakida et al, 2018. arXiv
- On Learning Intrinsic Rewards for Policy Gradient Methods, Zheng et al, 2018. arXiv
- Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, Yang et al, ICLR 2018. openreview, arXiv, summary. Given a language model output matrix A over time, where each row is the vocabulary distribution given a context, the authors hypothesize that A must be high rank to express complex language, and that a single softmax is not expressive enough. They propose a mixture of softmaxes.
- Measuring the Intrinsic Dimension of Objective Landscapes, Li et al, ICLR 2018. openreview, arXiv, summary. Intrinsic dimension is the dimension of the smallest parameter subspace (projected into the full parameter space) in which training still reaches a given performance. It is a measure of model-problem complexity.
- Control of Memory, Active Perception, and Action in Minecraft, Oh et al, ICML 2016. arXiv
- Multitask Learning, Rich Caruana, PhD thesis 1997. pdf. Work in the 90s on transfer learning! Chapter 5 discusses auxiliary tasks for neural nets! 20 years before the UNREAL paper!
- Neural Map: Structured Memory for Deep Reinforcement Learning, Parisotto and Salakhutdinov, ICLR 2018. arXiv. Instead of free external memory, have memory locations correlate with agent location, i.e. structured memory. Hugely outperforms memory nets and others on maze problems.
- On the State of the Art of Evaluation in Neural Language Models, Melis et al, ICLR 2018. openreview. Some simple language models, like LSTM, actually achieve SOTA or near SOTA with proper hyperparams and simple additions, like shared embeddings and variational dropout (see Table 4 ablation).
- Reinforcement Learning with Unsupervised Auxiliary Tasks, Jaderberg et al, ICLR 2017. openreview. Introduces the UNREAL model. See Caruana PhD thesis above from 1997, discusses auxiliary tasks for better representations!
- Parameter Space Noise for Exploration, Plappert et al, ICLR 2018. arXiv. Instead of adding noise to the action space, add noise to the function approximator's parameters for better exploration.
- Continuous control with deep reinforcement learning, Lillicrap et al, ICLR 2016. arXiv. Introduced Deep Deterministic Policy Gradient (DDPG), an actor critic algorithm applicable to continuous action spaces, off-policy.
- Deterministic Policy Gradient Algorithms, Silver et al, ICML 2014. pdf. DPG is the expected gradient of the action-value function, easier to estimate than the traditional stochastic policy gradient.
- Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs, Murdoch et al, ICLR 2018. pdf, arXiv
- Emergence Of Linguistic Communication From Referential Games With Symbolic And Pixel Input, Lazaridou et al, ICLR 2018. pdf
- Emergent Communication in a Multi-Modal, Multi-Step Referential Game, Evtimova et al, ICLR 2018. arXiv
- Neural Speed Reading via Skim-RNN, Seo et al, ICLR 2018. arXiv
- Dynamic Word Embeddings for Evolving Semantic Discovery, Yao et al, 2017. arXiv
- One Model To Learn Them All, Kaiser et al, 2017. arXiv
- An Analysis of Temporal-Difference Learning with Function Approximation, Tsitsiklis and Van Roy, 1997. pdf
- Steps Toward Artificial Intelligence, Minsky, 1961. pdf
- Eye on the Prize, Nilsson, 1995. pdf
- The Option-Critic Architecture, Bacon et al. arXiv
- Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings. He et al, 2017. arXiv
- Learning to Win by Reading Manuals in a Monte-Carlo Framework, Branavan et al, 2012. arXiv
- Generating Sentences by Editing Prototypes, Guu et al, 2017. arXiv
- SenGen: Sentence Generating Neural Variational Topic Model, Nallapati et al, 2017. arXiv
- Learning Sparse Neural Networks through L0 Regularization, Louizos et al 2017. arXiv
- Sparsity and the Lasso, Tibshirani and Wasserman, 2015. pdf. Note: related to L0 paper above
- Proving convexity, Loh 2013. pdf. Note: related to L0 paper above
- Mathematics of Deep Learning, Vidal et al, 2017. arXiv
- Bayesian Hypernetworks, Krueger et al, 2017. arXiv
- SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents, Nallapati et al, 2016. arXiv
- Learning Online Alignments with Continuous Rewards Policy Gradient, Luo et al 2016. arXiv
- Asynchronous Methods for Deep Reinforcement Learning. Mnih et al, 2016. arXiv. Introduces A3C, Asynchronous Advantage Actor Critic
- On The State of The Art In Neural Language Models, Anonymous, 2017. iclr pdf
- Natural Language Inference with External Knowledge, Chen et al 2017. arXiv
- Memory Augmented Neural Networks with Wormhole Connections, Gulcehre et al, 2017. arXiv
- Emergence of Invariance and Disentangling in Deep Representations, Achille et al, 2017. arXiv
- Distilling the Knowledge in a Neural Network, Hinton et al, 2015. arXiv
- Seq2SQL: Generating Structured Queries From Natural Language Using Reinforcement Learning, Zhong et al, 2017. arXiv
- Better Text Understanding Through Image-To-Text Transfer, Kurach et al, 2017. arXiv
- Data Augmentation Generative Adversarial Networks, Antoniou et al, 2017. arXiv
- Adversarial Training Methods for Semi-Supervised Text Classification, Miyato et al, 2017. arXiv
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, Anonymous, 2017. openreview
- Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, Inan et al 2017. arXiv
- Building machines that learn and think for themselves, Botvinick et al, 2017. cambridge
- Neural Discrete Representation Learning, van den Oord et al, 2017. arXiv
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, Chen et al, 2016. arXiv
- Evolution Strategies, Otoro 2017, blog part 1, 2
- Matrix Capsules with EM Routing. Anonymous (likely Hinton lab), 2017. openreview.
- Dynamic Routing Between Capsules, Sabour et al, 2017. arXiv.
- Weighted Transformer Network for Machine Translation, Ahmed et al, 2017. arXiv
- Unsupervised Machine Translation Using Monolingual Corpora Only, Lample et al, 2017. arXiv
- Non-Autoregressive Neural Machine Translation, Gu et al, 2017. arXiv
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe et al, 2017. arXiv
- Adversarial Learning for Neural Dialogue Generation, Li et al, 2017. arXiv
- Frustratingly Short Attention Spans in Neural Language Modeling, Daniluk et al, 2017. arXiv
- Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al, 2017. pdf
- A Closer Look at Memorization in Deep Networks, Arpit et al, 2017. arXiv
- Understanding deep learning requires rethinking generalization, Zhang et al, 2016. arXiv
- The Loss Surfaces of Multilayer Networks, Choromanska et al, 2015. arXiv
- Meta Learning Shared Hierarchies, Frans et al, 2017. arXiv
- Mastering the game of Go without human knowledge, Silver et al, 2017. arXiv
- Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation, Sharma et al, 2017. arXiv
- GuessWhat?! Visual object discovery through multi-modal dialogue, de Vries et al, 2017. arXiv
- A Frame Tracking Model for Memory-Enhanced Dialogue Systems, Schulz et al, 2017. arXiv
- A Deep Reinforced Model for Abstractive Summarization, Paulus et al, 2017. arXiv
- (about ROUGE score for summarization) ROUGE: A Package for Automatic Evaluation of Summaries, Chin-Yew Lin, 2004. acl
- Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. arXiv
- Language Modeling with Gated Convolutional Networks, Dauphin et al, 2017, arXiv
- Convolutional Sequence to Sequence Learning, Gehring et al, 2017. arXiv
- Emergence of Grounded Compositional Language in Multi-Agent Populations, Mordatch and Abbeel, 2017. arXiv, author blog. Note: related to Kottur et al 2017.
- Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog, Kottur et al, 2017. arXiv
- Opening the black box of Deep Neural Networks via Information, Schwartz-Ziv and Tishby, 2017. arXiv, m-p review
- End-to-end Neural Coreference Resolution, Lee et al, 2017. arXiv
- Deep Reinforcement Learning for Mention-Ranking Coreference Models, Clark et al, 2016. arXiv
- Oriented Response Networks, Zhou et al 2017. arXiv
- Training RNNs as Fast as CNNs, Lei et al, 2017. arXiv
- Quasi-Recurrent Neural Networks, Bradbury et al 2017. arXiv
- A Deep Reinforcement Learning Chatbot, Serban et al, 2017. arXiv
- Independently Controllable Factors, Thomas et al, 2017. arXiv
- Attention Is All You Need, Vaswani et al, 2017. arXiv
- Attention-over-Attention Neural Networks for Reading Comprehension, Cui et al 2017. arXiv
- Get To The Point: Summarization with Pointer-Generator Networks, See et al 2017. arXiv
- β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, Higgins et al 2017. pdf
- Massive Exploration of Neural Machine Translation Architectures, Britz et al 2017. arXiv
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al, 2017. arXiv, 'examples'
- A Brief Survey of Deep Reinforcement Learning, Arulkumaran et al 2017. arXiv
- Regularizing and Optimizing LSTM Language Models, Merity et al 2017. arXiv
- Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets, Yang et al 2017. arXiv
- Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders, Zhao et al 2017. arXiv
- How to Train Your DRAGAN, Kodali et al 2017. arXiv
- Improved Training of Wasserstein GANs, Gulrajani et al 2017. arXiv
- Wasserstein GAN, Arjovsky et al 2017. arXiv
- Reading Scene Text in Deep Convolutional Sequences, He et al, 2016. arXiv
- Recurrent Batch Normalization, Cooijmans et al, 2017. arXiv
- An Actor-Critic Algorithm for Sequence Prediction, Bahdanau et al 2017. arXiv
- Scheduled Sampling for Sequence Prediction with RNN, Bengio et al, 2015. arXiv
- Hybrid computing using a neural network with dynamic external memory, published in Nature
- Neural Turing Machine, arXiv
- Learning End-to-End Goal-Oriented Dialog, Bordes et al, 2017. arXiv
- End-To-End Memory Networks, Sukhbaatar et al, 2015, arXiv
- Memory Networks, arXiv
- Deep Photo Style Transfer, arXiv
- Matching Networks for One Shot Learning, Vinyals et al, NIPS 2016. arXiv.
- Optimization As A Model For Few-Shot Learning, Sachin Ravi and Hugo Larochelle, ICLR 2017. openreview, video
- NIPS 2016 Tutorial: Generative Adversarial Networks, annotated, arXiv, blog/code
- Fully Character-Level Neural Machine Translation without Explicit Segmentation, annotated, arXiv
- Neural Machine Translation by Jointly Learning to Align and Translate, annotated, arXiv
- Sequence to Sequence Learning with Neural Networks, annotated, arXiv
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, arXiv
- Implicit Discourse Relation Detection via a Deep Architecture with Gated Relevance Network, annotated, acl
- Learning Structured Output Representation using Deep Conditional Generative Models, Sohn et al 2015. (Conditional VAE) nips, blog/code, code
- Auto-Encoding Variational Bayes, annotated, arXiv, blog/code
- Semi-supervised Variational Autoencoders for Sequence Classification, annotated, arXiv
- Autoencoder review by Keras author Francois Chollet
- UCI machine learning repository. 360 datasets, some very large. Nice sorting feature, such as ">1000 instance/classification/text" results in 14 data sets
- ["Awesome deep learning papers"](https://github.com/terryum/awesome-deep-learning-papers/), a collection of 100 best papers from the past few years
- Paper collection by songrotek
- Nature Review article. LeCun, Bengio, Hinton. 2015
- Good short overview
- Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
- Extensive overview
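
As a small illustration of the sticky-action note on the Machado et al entry above, here is a minimal Python sketch of the idea: with probability eps the environment ignores the agent's choice and repeats the previously executed action. The `StickyActionEnv` wrapper, the `EchoEnv` toy environment, the `reset`/`step` interface, and the default `eps=0.25` are all assumptions made for this example, not the Arcade Learning Environment API.

```python
import random


class StickyActionEnv:
    """Hypothetical wrapper: with probability eps, repeat the previous action."""

    def __init__(self, env, eps=0.25, seed=None):
        self.env = env                  # any object with reset() and step(action)
        self.eps = eps                  # stickiness probability (illustrative default)
        self.rng = random.Random(seed)  # private RNG so runs are reproducible
        self.prev_action = None

    def reset(self):
        # No previous action exists at the start of an episode.
        self.prev_action = None
        return self.env.reset()

    def step(self, action):
        # With probability eps, override the agent's choice with the last executed action.
        if self.prev_action is not None and self.rng.random() < self.eps:
            action = self.prev_action
        self.prev_action = action
        return self.env.step(action)


class EchoEnv:
    """Toy environment (hypothetical) that returns the action it actually executed."""

    def reset(self):
        return None

    def step(self, action):
        return action


if __name__ == "__main__":
    env = StickyActionEnv(EchoEnv(), eps=0.25, seed=0)
    env.reset()
    requested = [0, 1, 2, 3, 0, 1, 2, 3]
    executed = [env.step(a) for a in requested]
    print("requested:", requested)
    print("executed: ", executed)
```

Because the agent cannot know in advance whether its action will be executed, memorizing a fixed action sequence no longer works even though the underlying emulator is deterministic.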
Neural Networks Basics
- Michael Nielsen book on NN
- Hacker's guide to Neural Networks. Andrej Karpathy blog
- Visualize NN training
- A Gentle Introduction to Backpropagation. Sathyanarayana (2014)
- Learning representations by back-propagating errors. Hinton et al, 1986
- Seminal paper by Hinton et al on back-propagation.
- The Backpropagation Algorithm
- Longer tutorial on the topic, 34 pages
- Overview of various optimization algorithms
- Multi-Task Learning Objectives for Natural Language Processing, blog
Recurrent Neural Network (RNN)
- Blog intro, tutorial
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Cho et al. (2014)
- Character-Aware Neural Language Models. Kim et al. 2015.
- The Unreasonable Effectiveness of Recurrent Neural Networks. Karpathy
- In-depth, with examples in vision and NLP. Provides code
- Sequence-to-Sequence Learning with Neural Networks. Sutskever et al (2014)
- Ground-breaking work on machine translation with RNN and LSTM
- Training RNN. Sutskever thesis. 2013
- In-depth, self-contained, 85 pages
- Understanding Natural Language with Deep Neural Networks Using Torch (2015)
- See part on predicting next word with RNN.
- LSTM Based RNN Architectures for Large Vocabulary Speech Recognition
- Awesome Recurrent Neural Networks
- Curated list of RNN resources
- Karpathy cs231 review
- Character-level Convolutional Networks for Text Classification
- Collobert. Natural Language Processing (Almost) from Scratch (2011)
- Spurred interest in applying CNN to NLP.
- Multichannel Variable-Size Convolution for Sentence Classification. Yin, 2015
- Interesting, borrows multichannel from image CNN, where each channel is a different word embedding.
- A CNN for Modelling Sentences. Kalchbrenner et al, 2014
- Dynamic k-max pooling for variable length sentences.
- Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling. Xu et al, 2015
- Text Understanding from Scratch. Zhang, LeCun. (2015)
- Kim. Convolutional Neural Networks for Sentence Classification (2014)
- Sensitivity Analysis of (And Practitioner's Guide to) CNN for Sentence Classification. Zhang, Wallace (2015)
- Relation Extraction: Perspective from Convolutional Neural Networks. Nguyen, Grishman (2015)
- Convolutional Neural Network for Sentence Classification. Yahui Chen, 2015
- Master's thesis, University of Waterloo
Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning. Mnih et al. (2014)
- Youtube Demo
- Simple Reinforcement Learning with TensorFlow series, part 0
- Basic DQN in Keras
- Minimal and clean examples
- Demystifying Deep RL
- Berkeley course on DRL
- Deep Learning. Udacity, 2015
- Very brief. It is more about getting a feel for DL and specifically about using TensorFlow for DL.
- Convolutional Neural Networks for Visual Recognition. Stanford, 2016
- Neural Network Course. Université de Sherbrooke, 2013
- Machine Learning Course, University of Oxford (2014-2015)
- Deep Learning for NLP, Stanford (2015)
- Click "syllabus" for full material
- Stanford Deep Learning tutorials
- From basics of Machine Learning, to DNN, CNN, and others.
- Includes code.