# Training a neural net

## Step 1: Define the neural network

Our neural net has one neuron with one input and hence one weight. The weight is our single parameter.

Goal: We want to _train_ the neural net to return the desired output for a given input.

In [13]:
var y = w * x;

## Step 2: Pick random start values for our parameters

To keep things simple I will be the random number generator.

In [14]:
var w = 0.5;

## Step 3: Training data

We need some: at least one input paired with the output we expect for it.

In [15]:
double x = 1;          // input
double expectedY = 2;  // expected output: the value we want the net to return for this input

## Step 4: Calculate actual output

In [16]:
var y = w * x;
y

## Step 5: Calculate the loss

We want to quantify how far away we are from our goal, so that we can then minimize the loss.

In [17]:
var loss = y - expectedY;
loss

## Step 6: Find the gradient of the loss function with respect to each parameter

We want to know in which direction to nudge our parameters to reduce the loss. Calculus to the rescue! We could work out dl/dw directly in this simple case, but that is not practical for more complex examples, so let's break it down into small steps. Breaking it down also allows _automating_ the process.

```
dl/dw = d((w * x) - expected)/dw

l = y - expected => dl/dy = 1
y = w * x        => dy/dw = x
Chain rule       => dl/dw = dl/dy * dy/dw = 1 * x = x
```

In [18]:
var grad = x;
grad
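The derivation above can also be written as code, one local derivative per step, which is the form that lends itself to automation. This is only a sketch: the names `dlDy`, `dyDw` and `dlDw` are made up for illustration and are not used elsewhere in this notebook.

```csharp
using System;

double x = 1;

// Local derivatives, one per step of the computation.
double dlDy = 1;   // l = y - expected  =>  dl/dy = 1
double dyDw = x;   // y = w * x         =>  dy/dw = x

// Chain rule: multiply the local derivatives along the path from l back to w.
double dlDw = dlDy * dyDw;
Console.WriteLine(dlDw);
```

The result is the same `x` that the cell above assigns to `grad`.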

## Step 7: Adjust parameters

In [19]:
w = w - 0.01 * grad

`0.01` is a magic number called the "learning rate". It controls how big each nudge is.

## Step 8: Rinse and repeat using the new parameter values...

...until the loss is acceptably small. Profit.

## However

If we calculate the actual output again, we see that it has gotten further away from the expected output.

In [30]:
y = w * x;
y

And if we calculate the loss again, we can see that it has gotten bigger (in absolute terms).

In [26]:
loss = y - expectedY;
loss

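It turns out that, depending on our training data, the order of the subtraction matters. The gradient of `y - expectedY` is just `x`, whatever the expected value is, so the update always pushes `w` the same way, even when that moves us away from the goal. To avoid this problem, we can square the difference.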

In [27]:
loss = Math.Pow(y - expectedY, 2);
loss
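To make the problem with the plain difference concrete, here is a small sketch that applies the same update to two targets. The second target, `0`, and the helper `ErrorChange` are assumptions added for illustration; only the target `2` comes from the training data above.

```csharp
using System;

double x = 1;
double w = 0.5;
double learningRate = 0.01;

// With loss = y - expectedY the gradient is x, independent of expectedY,
// so the updated weight is the same for every target.
double wNext = w - learningRate * x;

// Change in absolute error caused by the update (positive means worse).
double ErrorChange(double expectedY) =>
    Math.Abs(wNext * x - expectedY) - Math.Abs(w * x - expectedY);

Console.WriteLine(ErrorChange(2.0));  // target above the current output
Console.WriteLine(ErrorChange(0.0));  // target below the current output
```

The first value is positive (the update made things worse) and the second is negative (the update helped). With the squared difference, the gradient carries the sign of the error, so the nudge goes the right way in both cases.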

But that makes the maths more complicated:

```
dl/dw = d(((w * x) - expected)^2)/dw

l = z^2              => dl/dz = 2z
z = y - expected     => dz/dy = 1
Chain rule           => dl/dy = dl/dz * dz/dy = 2z
y = w * x            => dy/dw = x
Chain rule           => dl/dw = dl/dy * dy/dw = 2zx = 2 * (y - expected) * x
```

In [28]:
grad = 2d * x * (y - expectedY)
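A quick way to sanity-check a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below assumes the starting values `w = 0.5`, `x = 1`, `expectedY = 2`; the helper `Loss` and the step size `h` are illustrative choices, not part of the notebook.

```csharp
using System;

double x = 1;
double expectedY = 2;
double w = 0.5;

// Squared loss as a function of the weight only.
double Loss(double weight) => Math.Pow(weight * x - expectedY, 2);

// Gradient from the derivation: 2 * x * (y - expected).
double analyticGrad = 2d * x * (w * x - expectedY);

// Central difference: slope of the loss measured over a tiny interval around w.
double h = 1e-6;
double numericGrad = (Loss(w + h) - Loss(w - h)) / (2 * h);

Console.WriteLine($"{analyticGrad} vs {numericGrad}");
```

The two numbers should agree to several decimal places. If they do not, the derivation (or the code) has a mistake.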

Nudge again, this time with the new gradient.

In [29]:
w = w - 0.01 * grad 

## Putting it all together

In [42]:
double w = 0.5;
double x = 1;
double expectedY = 2;
double y;
double loss;
double grad;


In [43]:

for (int i = 0; i < 100; i++)
{
    y = w * x;                              // forward pass
    loss = Math.Pow(y - expectedY, 2);
    Console.WriteLine(loss);
    grad = 2d * x * (y - expectedY);        // backward pass
    w = w - 0.01 * grad;
}

2.25
2.1609
2.07532836
1.993145356944
1.9142168008090175
1.83841381549698
1.7656126284033
1.6956943683185295
1.6285448713331159
1.5640544944283243
1.5021179364489625
1.4426340661655834
1.3855057571454266
1.3306397291624679
1.2779463958876338
1.2273397186104837
1.1787370657535086
1.1320590779496693
1.0872295384628625
1.044175248739733
1.0028259088896396
0.9631140028976102
0.9249746883828649
0.8883456907229036
0.8531672013702765
0.8193817801960135
0.7869342617002515
0.7557716649369216
0.7258431070054195
0.697099719968005
0.669494571057272
0.642982586043404
0.6175204756360854
0.5930666648008964
0.5695812248747808
0.5470258083697395
0.5253635863582977
0.5045591883385091
0.4845786444803042
0.4653893301588842
0.4469599126845924
0.42926030014228267
0.4122615922566483
0.3959360332032852
0.38025696628843514
0.36519879042341313
0.3507369183226459
0.33684773635706916
0.3235085659973291
0.3106976267838349
0.298394000763195
0.2865775983329725
0.27522912543898687
0.26433005207160293
0.25386258200956

In [44]:
y
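The steady shrinking of the printed loss can be explained. Substituting the gradient into the update shows that each step multiplies the error `y - expectedY` by `1 - 2 * learningRate * x * x`, which is `0.98` for these values, so the loss is multiplied by `0.98^2 = 0.9604` every iteration. The sketch below checks that prediction against the loop; `learningRate`, `shrink` and `maxDifference` are names introduced here for illustration.

```csharp
using System;

double x = 1, expectedY = 2, learningRate = 0.01;
double w = 0.5;

// Factor applied to the error (y - expectedY) by every update.
double shrink = 1 - 2 * learningRate * x * x;
double initialLoss = Math.Pow(w * x - expectedY, 2);

double maxDifference = 0;
for (int i = 0; i < 100; i++)
{
    double y = w * x;
    double loss = Math.Pow(y - expectedY, 2);

    // Predicted loss after i updates: the initial loss times shrink^(2i).
    double predicted = initialLoss * Math.Pow(shrink, 2 * i);
    maxDifference = Math.Max(maxDifference, Math.Abs(loss - predicted));

    w = w - learningRate * 2d * x * (y - expectedY);
}
Console.WriteLine(maxDifference);
```

The difference stays at floating-point noise level. The same formula also shows why the learning rate is a trade-off: a larger value shrinks the error faster, but once `2 * learningRate * x * x` exceeds 2 the factor's magnitude is above 1 and the error grows instead.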

## One more thing

This extends to more complicated functions, as long as they are differentiable. Neural nets are such functions with a particular structure: each neuron applies a function $f$ to a weighted sum of its inputs plus a bias $b$.

$f(\sum w_i x_i + b)$

In [46]:
double[] weights = [0.5, 0.5];
double[] xs =      [1  , 2  ];
var actualYs = xs.Zip<double, double, double>(weights, (x, w) => x * w)
                 .Sum();
actualYs
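The cell above computes only the weighted sum. A sketch of the full formula adds the bias and the function `f`. The bias value `0.1` and the choice of a sigmoid for `f` are assumptions made for illustration; the notebook does not specify either.

```csharp
using System;
using System.Linq;

double[] weights = [0.5, 0.5];
double[] xs = [1, 2];
double b = 0.1;   // hypothetical bias value

// One common choice for f; assumed here, not taken from the notebook.
double Sigmoid(double z) => 1 / (1 + Math.Exp(-z));

// Weighted sum of the inputs plus the bias, then the activation on top.
double weightedSum = xs.Zip(weights, (x, w) => x * w).Sum() + b;
double output = Sigmoid(weightedSum);
Console.WriteLine(output);
```

Because the sigmoid is differentiable, the same chain-rule approach still applies: it simply adds one more local derivative to the chain.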


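* A cool thing: the gradient calculation can be automated. This is known as "autograd".

* Working out the gradients backwards through the chain rule is called backpropagation; nudging the parameters against the gradient is called gradient descent.

* The hope is that the trained network also gives good outputs for inputs it wasn't trained on.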
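To give a feel for what automating the gradient calculation might look like, here is a minimal sketch of a scalar type that remembers how it was computed and then applies the chain rule backwards. The `Value` class is hypothetical and supports only `*` and `-`; it illustrates the idea and is not how any particular library implements it.

```csharp
using System;
using System.Collections.Generic;

// Build l = (w * x - expected)^2 from tracked values.
var w = new Value(0.5);
var x = new Value(1);
var expected = new Value(2);

var z = w * x - expected;
var loss = z * z;
loss.Backward();

Console.WriteLine(w.Grad);  // -3, the same as 2 * x * (y - expected)

// Hypothetical scalar that records its inputs and the local derivatives.
class Value
{
    public double Data;
    public double Grad;
    private readonly (Value Parent, double LocalGrad)[] _parents;

    public Value(double data, params (Value Parent, double LocalGrad)[] parents)
    {
        Data = data;
        _parents = parents;
    }

    // d(a*b)/da = b and d(a*b)/db = a
    public static Value operator *(Value a, Value b) =>
        new Value(a.Data * b.Data, (a, b.Data), (b, a.Data));

    // d(a-b)/da = 1 and d(a-b)/db = -1
    public static Value operator -(Value a, Value b) =>
        new Value(a.Data - b.Data, (a, 1.0), (b, -1.0));

    public void Backward()
    {
        // Order the values so that every value comes after its inputs.
        var order = new List<Value>();
        var visited = new HashSet<Value>();
        void Visit(Value v)
        {
            if (!visited.Add(v)) return;
            foreach (var (parent, _) in v._parents) Visit(parent);
            order.Add(v);
        }
        Visit(this);

        // Walk from the output back to the inputs, applying the chain rule.
        Grad = 1;
        for (int i = order.Count - 1; i >= 0; i--)
        {
            var v = order[i];
            foreach (var (parent, localGrad) in v._parents)
                parent.Grad += localGrad * v.Grad;
        }
    }
}
```

Each operator stores exactly the local derivatives that were written out by hand earlier, so `Backward` only has to multiply and add them up.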