diff --git a/img/backprop.png b/img/backprop.png
new file mode 100644
index 0000000..ecff16a
Binary files /dev/null and b/img/backprop.png differ
diff --git a/img/loss.png b/img/loss.png
new file mode 100644
index 0000000..91c5a5b
Binary files /dev/null and b/img/loss.png differ
diff --git a/img/nonconvex.png b/img/nonconvex.png
new file mode 100644
index 0000000..0468f1e
Binary files /dev/null and b/img/nonconvex.png differ
diff --git a/img/sgd2d.png b/img/sgd2d.png
new file mode 100644
index 0000000..191149b
Binary files /dev/null and b/img/sgd2d.png differ
diff --git a/lab05-training.ipynb b/lab05-training.ipynb
index 2fd6442..cd55374 100644
--- a/lab05-training.ipynb
+++ b/lab05-training.ipynb
@@ -1,6 +1,529 @@
{
- "cells": [],
- "metadata": {},
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "# Backpropogation\n",
+ "- Training a network (backpropagation) consists of:\n",
+ " - Initializing weights at “random”.\n",
+ " - Compute the network forward (forward pass)\n",
+ " - Reduce loss by updating weights in opposite direction of gradient of the loss function.\n",
+ " - Repeat the process until an optimized set of weights are calculated."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ ""
+ ]
+ },
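+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "A minimal sketch of these steps on a toy problem (synthetic data, a single weight vector, squared-error loss). This is illustrative only, not part of the lab's network:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from mxnet import nd, autograd\n",
+ "\n",
+ "x = nd.array([[1.0, 2.0]])        # one toy input\n",
+ "y = nd.array([3.0])               # its target value\n",
+ "w = nd.random.normal(shape=(2,))  # 1. initialize weights at random\n",
+ "w.attach_grad()                   # ask autograd to track gradients of w\n",
+ "lr = 0.1\n",
+ "for step in range(5):\n",
+ "    with autograd.record():\n",
+ "        y_hat = nd.dot(x, w)      # 2. forward pass\n",
+ "        loss = (y_hat - y) ** 2   # squared-error loss\n",
+ "    loss.backward()               # 3. gradient of the loss w.r.t. w\n",
+ "    w[:] = w - lr * w.grad        # 4. step opposite to the gradient"
+ ]
+ },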
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "# Gradient Descent\n",
+ ""
+ ]
+ },
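+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "A hand-rolled sketch of the update rule $w \\leftarrow w - \\eta \\, \\nabla f(w)$ on a simple convex function (illustrative only):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)\n",
+ "w, lr = 0.0, 0.1\n",
+ "for step in range(25):\n",
+ "    grad = 2 * (w - 3)\n",
+ "    w = w - lr * grad  # move opposite to the gradient\n",
+ "print(w)               # approaches the minimizer w = 3"
+ ]
+ },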
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "# Loss Function\n",
+ ""
+ ]
+ },
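+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "For example, softmax cross-entropy (the loss used later in this lab) on made-up logits; the loss is small when the largest logit matches the label:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from mxnet import gluon, nd\n",
+ "\n",
+ "logits = nd.array([[2.0, 0.5, 0.1],   # sample 0: most confident in class 0\n",
+ "                   [0.2, 0.1, 3.0]])  # sample 1: most confident in class 2\n",
+ "labels = nd.array([0, 2])             # true class indices\n",
+ "ce_loss = gluon.loss.SoftmaxCrossEntropyLoss()\n",
+ "print(ce_loss(logits, labels))        # small per-sample losses: both predictions are correct"
+ ]
+ },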
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "# Non-Convex Optimization\n",
+ ""
+ ]
+ },
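+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "A one-dimensional illustration: $f(w) = w^4 - 2w^2$ has two minima, at $w = \\pm 1$, and gradient descent settles into a different one depending on where it starts:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "def grad(w):\n",
+ "    return 4 * w**3 - 4 * w  # derivative of f(w) = w**4 - 2*w**2\n",
+ "\n",
+ "for w0 in (-0.5, 0.5):\n",
+ "    w = w0\n",
+ "    for _ in range(100):\n",
+ "        w = w - 0.01 * grad(w)\n",
+ "    print(w0, '->', round(w, 3))  # roughly -1.0 and 1.0: two different minima"
+ ]
+ },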
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Training a network\n",
+ "- Define the network\n",
+ "- Initialize the network with with random/pre-trained weights\n",
+ "- Choose a loss function\n",
+ "- Choose an optimizer\n",
+ "- Prepare Dataset\n",
+ "- Run back propogation algorithm.\n",
+ "- Evaluate the output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import mxnet as mx\n",
+ "from mxnet import gluon, nd, autograd\n",
+ "import numpy as np\n",
+ "ctx = mx.gpu()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Define the network"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "net = gluon.nn.Sequential()\n",
+ "\n",
+ "with net.name_scope(): #Returns a name space object managing a child :py:class:`Block` and parameter names.\n",
+ " net.add(gluon.nn.Dense(units=128, activation='relu'))\n",
+ " net.add(gluon.nn.Dense(units=64, activation='relu'))\n",
+ " net.add(gluon.nn.Dense(units=10))"
+ ]
+ },
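+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Note that each `Dense` layer only specifies its output `units`: Gluon defers shape inference, so the input dimensions are worked out automatically on the first forward pass."
+ ]
+ },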
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Initialize the network"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "net.initialize(mx.init.Xavier(magnitude=2.24), force_reinit=True, ctx=ctx)"
+ ]
+ },
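+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Xavier initialization draws each layer's weights at a scale based on its fan-in and fan-out, which keeps the variance of activations roughly constant from layer to layer; `magnitude` scales the sampling range."
+ ]
+ },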
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Choose a loss function"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Choose an Optimizer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "trainer = gluon.Trainer(params=net.collect_params(), \n",
+ " optimizer='sgd', \n",
+ " optimizer_params={\"learning_rate\":0.01, \"momentum\": .9, \"wd\":.1})"
+ ]
+ },
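+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "With momentum and weight decay, the plain SGD update $w \\leftarrow w - \\eta \\, \\nabla L(w)$ becomes, up to bookkeeping details,\n",
+ "$$v \\leftarrow \\gamma v + \\eta \\left( \\nabla L(w) + \\lambda w \\right), \\qquad w \\leftarrow w - v,$$\n",
+ "where $\\gamma$ is `momentum` (it smooths updates across steps) and $\\lambda$ is `wd`, an L2 penalty that shrinks the weights toward zero."
+ ]
+ },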
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Prepare dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((28, 28, 1), 5.0)"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "batch_size = 128\n",
+ "def transform(data, label):\n",
+ " return (data.astype(np.float32)/255, label.astype(np.float32))\n",
+ "\n",
+ "train_dataset = gluon.data.vision.MNIST(train=True, transform=transform)\n",
+ "val_dataset = gluon.data.vision.MNIST(train=False, transform=transform)\n",
+ "\n",
+ "train_data_loader = gluon.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)\n",
+ "val_data_loader = gluon.data.DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)\n",
+ "\n",
+ "(train_dataset[0][0].shape, train_dataset[0][1])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "