This is a list of papers that use the Neural Tangent Kernel (NTK). Within each category, papers are sorted chronologically. Some of these papers were presented in the NTK reading group at the University of Oxford during the summer of 2019.

We used hypothes.is to some extent; see this for instance. There are notes for a few of the papers, which are linked below the relevant entries.

Schedule

  • 2/08/2019 [notes] Neural Tangent Kernel: Convergence and Generalization in Neural Networks.
  • 9/08/2019 [notes] Gradient Descent Finds Global Minima of Deep Neural Networks.
  • 16/08/2019 Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks + insights from Gradient Descent Provably Optimizes Over-parameterized Neural Networks.
  • 23/08/2019 On Lazy Training in Differentiable Programming.
  • 13/09/2019 Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks.
  • 18/10/2019 [notes] Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian.

Neural tangent kernel

Optimization

Infinite limit

  • Neural Tangent Kernel: Convergence and Generalization in Neural Networks -- link
    • Notes
    • 06/2018
    • Original NTK paper.
    • Introduces the idea of the NTK for the first time; note that the proof that the kernel is deterministic in the limit takes the widths to infinity sequentially, layer by layer.
    • Proves positive definiteness of the limiting kernel in certain regimes, which implies that gradient descent reaches a global minimum at a linear rate. (A sketch of the kernel's definition is given after this list.)
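As a reminder (the symbols below are illustrative and not the paper's exact notation), the NTK is the Gram matrix of the network's parameter gradients, and it governs the dynamics of the network's outputs under gradient flow:

```latex
% Empirical (finite-width) NTK of a scalar-output network f(x; \theta):
\Theta_\theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\, \nabla_\theta f(x';\theta) \big\rangle
\;=\; \sum_{p} \frac{\partial f(x;\theta)}{\partial \theta_p}\,\frac{\partial f(x';\theta)}{\partial \theta_p}.

% Under gradient flow on a loss L over training inputs x_i, the outputs evolve as
\partial_t f(x;\theta_t) \;=\; -\sum_{i} \Theta_{\theta_t}(x, x_i)\,
\frac{\partial L}{\partial f(x_i;\theta_t)}.

% Jacot et al. (2018): as the layer widths go to infinity, \Theta_{\theta_t} converges
% to a deterministic kernel that stays constant throughout training.
```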

Finite-width results

Lazy training

  • On Lazy Training in Differentiable Programming -- link
    • 12/2018
    • They show that the NTK (lazy) regime can be reached simply by rescaling the model, and show experimentally that neural networks trained in the usual regime perform better than their lazy-regime counterparts.
    • This rescaling argument appears to be independent of width, so scaling the model is a much easier way to reach lazy training than the infinite-width + infinitesimal-learning-rate route (see the sketch after this list).
  • Kernel and Deep Regimes in Overparametrized Models -- link
    • 06/2019
    • Large initialization leads to the kernel/lazy regime.
    • Small initialization leads to the deep/active/adaptive regime, which can sometimes generalize better. They claim this is the regime that allows one to "exploit the power of depth", and is thus key to understanding deep learning.
    • The systems they analyze in detail are rather simple (e.g. matrix completion) or artificial (e.g. a very ad-hoc type of neural network).
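A minimal sketch of the rescaling idea, under an illustrative setup of my own (the tiny two-layer model, step sizes, and variable names are assumptions, not taken from the paper): scaling the centered output by a large factor alpha, with a correspondingly small step size, makes the parameters move only O(1/alpha) while the scaled output still changes by O(1), i.e. the model stays close to its linearization at initialization.

```python
# Toy illustration of lazy training via output rescaling (sketch, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 200
x, y = rng.normal(size=d), 1.0
W0 = rng.normal(size=(m, d)) / np.sqrt(d)
a0 = rng.normal(size=m) / np.sqrt(m)

def f(W, a):
    # Simple two-layer network evaluated at the fixed input x.
    return a @ np.tanh(W @ x)

def grads(W, a):
    # Gradients of f with respect to W and a, for f = a^T tanh(W x).
    h = np.tanh(W @ x)
    return np.outer(a * (1 - h**2), x), h

f0 = f(W0, a0)
for alpha in (1.0, 10.0, 100.0):
    # Scaled model: h(theta) = alpha * (f(theta) - f0); loss: 0.5 * (h(theta) - y)^2.
    gW, ga = grads(W0, a0)
    resid = alpha * (f(W0, a0) - f0) - y   # equals -y at initialization
    eta = 1.0 / alpha**2                   # step size shrinks with alpha
    W1 = W0 - eta * resid * alpha * gW
    a1 = a0 - eta * resid * alpha * ga
    move = np.sqrt(np.sum((W1 - W0)**2) + np.sum((a1 - a0)**2))
    out_change = alpha * (f(W1, a1) - f0)
    # Parameter movement scales like 1/alpha; the scaled output change stays O(1).
    print(f"alpha={alpha:6.1f}  ||dtheta||={move:.4f}  new scaled output={out_change:.4f}")
```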

Generalization

  • Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks -- link
    • 05/2019
    • Seems very similar to the Arora et al. (2019) fine-grained analysis; is the main difference just that this analyzes SGD rather than GD?
    • Improves on the generalization bounds for the NTK from the Arora et al. (2019) paper.
    • It would be interesting to understand how their bound relates to classical margin and PAC-Bayes bounds for kernel regression.
    • They don't show any plots demonstrating how tight their bounds are, which suggests the bounds may well be vacuous in practice...

Others

  • On the Inductive Bias of Neural Tangent Kernels -- link
    • 05/2019
    • This paper studies properties of the NTK itself (rather than neural networks directly).
    • They find that the NTK has a different type of stability to deformations of the input than other NNGP kernels, and better approximation properties (whatever that means).

ToClassify

  • Enhanced Convolutional Neural Tangent Kernels -- link
    • 11/2019
    • Enhances the convolutional NTK of "On Exact Computation..." by building implicit data augmentation into the kernel, encoding a form of local translation invariance and horizontal flipping (see the sketch after this list).
    • Their experiments show strong empirical performance; in particular, they reach 89% accuracy on CIFAR-10, matching AlexNet. This is the first time a kernel method has reached this level.
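To illustrate the general idea of baking data augmentation into a kernel (this is only a toy sketch of the flip-invariance part, not the paper's exact construction; the helper names are my own), one can average a base kernel over pairs of transformed inputs:

```python
# Toy sketch: make any base kernel invariant to horizontal flips by averaging.
import numpy as np

def flip_h(img):
    # img: (H, W) array; horizontal flip.
    return img[:, ::-1]

def augmented_kernel(k, x, y):
    # Average the base kernel k over all pairs of flips of the two inputs.
    xs = [x, flip_h(x)]
    ys = [y, flip_h(y)]
    return np.mean([k(xi, yj) for xi in xs for yj in ys])

# Example with a simple linear kernel standing in for the (convolutional) NTK:
k_lin = lambda a, b: float(np.sum(a * b))
x = np.arange(12.0).reshape(3, 4)
y = np.ones((3, 4))
print(augmented_kernel(k_lin, x, y))
```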

Some notes

  • NTK depends on initialization: at finite width the empirical NTK is random, and only in the infinite-width limit does it become deterministic (see the quick check below).
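A quick numerical check of this note, under a toy setup of my own (a two-layer tanh network; the names and scalings are illustrative assumptions): the empirical NTK evaluated at two independent random initializations differs, and the gap shrinks as the width grows.

```python
# Empirical NTK of a small two-layer network at two random initializations.
import numpy as np

def empirical_ntk(x, xp, W, a):
    """Theta(x, x') = <df/dtheta(x), df/dtheta(x')> for f(x) = a^T tanh(W x) / sqrt(m)."""
    m = a.shape[0]
    hx, hxp = np.tanh(W @ x), np.tanh(W @ xp)
    # Gradient w.r.t. a: tanh(Wx)/sqrt(m); gradient w.r.t. row w_i: a_i (1 - tanh(w_i.x)^2) x / sqrt(m).
    g_a = hx @ hxp / m
    g_W = np.sum(a**2 * (1 - hx**2) * (1 - hxp**2)) * (x @ xp) / m
    return g_a + g_W

rng = np.random.default_rng(0)
d = 10
x, xp = rng.normal(size=d), rng.normal(size=d)
for m in (100, 10_000):
    vals = []
    for _ in range(2):  # two independent initializations
        W = rng.normal(size=(m, d)) / np.sqrt(d)
        a = rng.normal(size=m)
        vals.append(empirical_ntk(x, xp, W, a))
    print(m, vals)  # the two kernel values get closer as the width m grows
```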