

Backpropagation is introduced in the intro-to-backprop notebook.
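Before opening the notebook, the core idea can be seen on a tiny computational graph: apply the chain rule backwards from the output. A minimal sketch (the function and values are illustrative, not taken from the notebook):

```python
# Chain rule on a tiny computational graph: f(x, y, z) = (x * y + z) ** 2
x, y, z = 2.0, -3.0, 5.0

# Forward pass - compute and cache intermediate values
q = x * y + z      # q = -1.0
f = q ** 2         # f = 1.0

# Backward pass - propagate df/df = 1 back through each node
df_dq = 2 * q          # derivative of q**2
df_dx = df_dq * y      # chain rule through q = x*y + z
df_dy = df_dq * x
df_dz = df_dq * 1.0
```

Backpropagation in a neural network is this same bookkeeping applied to every layer's weights.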

During the lecture you will start to develop an appreciation of the hyperparameters used in neural networks:

  • learning rate - used to scale the magnitude of the gradient update
  • batch size - the number of samples used to calculate gradients
  • weight initialization
  • choice of loss function
  • why gradient clipping & input standardization are best practices
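The first and last items above can be sketched in a single update step. In this hedged example (function name and thresholds are illustrative), the learning rate scales the magnitude of the update and the gradient is clipped by its global norm before being applied:

```python
import numpy as np

def sgd_update(weights, grads, lr=0.01, clip_norm=5.0):
    """One gradient-descent update with global-norm gradient clipping."""
    # Global L2 norm across all gradient arrays
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale gradients down only if their norm exceeds the clip threshold
    scale = min(1.0, clip_norm / (total_norm + 1e-8))
    # The learning rate scales the magnitude of the (clipped) gradient
    return [w - lr * scale * g for w, g in zip(weights, grads)]

weights = [np.ones((2, 2))]
grads = [np.full((2, 2), 10.0)]  # exploding gradient, global norm = 20
new_w = sgd_update(weights, grads, lr=0.1, clip_norm=5.0)
```

Here the clip rescales the gradient by 5/20 = 0.25, so the step is a quarter of what an unclipped update would take.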


The classification and regression notebooks give numpy implementations of single-hidden-layer neural networks.
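For reference, a single-hidden-layer regression net in numpy fits in a few lines. This is a minimal sketch of the same idea, not the notebooks' code; the layer sizes, learning rate, and toy data are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: target is the sum of the inputs
X = rng.normal(size=(64, 3))
y = X.sum(axis=1, keepdims=True)

# Single hidden layer: 3 -> 8 -> 1 (sizes are illustrative)
W1 = rng.normal(scale=0.5, size=(3, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))

lr = 0.05
losses = []
for step in range(500):
    # Forward pass with ReLU hidden activation
    h = np.maximum(0, X @ W1 + b1)
    pred = h @ W2 + b2
    losses.append(np.mean((pred - y) ** 2))  # MSE loss

    # Backward pass (backprop through the MSE loss)
    d_pred = 2 * (pred - y) / len(X)
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_h = (d_pred @ W2.T) * (h > 0)          # ReLU gradient mask
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)

    # Full-batch gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

The practical below starts from this kind of structure and extends it.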

The practical for this class is to:

  • extend the neural nets to two hidden layers
  • extend the neural nets to n hidden layers
  • implement SGD (batching)
  • optimize the training (learning rate decay, early stopping etc.)
  • implement gradient clipping
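The batching and learning-rate-decay tasks can be sketched as below. All names here (`iterate_minibatches`, `decayed_lr`) are illustrative helpers, not part of the notebooks:

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches for one epoch of SGD."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

def decayed_lr(lr0, step, decay=0.01):
    """Simple inverse-time learning rate decay."""
    return lr0 / (1 + decay * step)

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float).reshape(10, 1)

# 10 samples with batch_size=4 -> batches of size 4, 4, 2
batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))
```

Inside the training loop you would call the update step once per mini-batch instead of once per epoch, and pass `decayed_lr(lr0, step)` as the learning rate; early stopping is then a matter of tracking validation loss and halting when it stops improving.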

Resources & further reading

Tensorflow Neural Network Playground

Backprop theory

Calculus on Computational Graphs: Backpropagation

CS231n - Backprop

Classification neural net from scratch


Understanding softmax and the negative log-likelihood
