<a href="https://colab.research.google.com/github/RoddyJaques/NNsDL-Nielsen/blob/main/Neural_Networks_and_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks and Deep Learning
Working through the exercises in Michael Nielsen's book [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/).

## [Chapter 1](http://neuralnetworksanddeeplearning.com/chap1.html)
*__Exercise 1:__ Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant,* $c>0$ *. Show that the behaviour of the network doesn't change.*

For a perceptron:<br>
\begin{eqnarray}
  \mbox{output} = \left\{ 
    \begin{array}{ll} 
      0 & \mbox{if } w\cdot x + b \leq 0 \\
      1 & \mbox{if } w\cdot x + b > 0 
    \end{array}
  \right.
\end{eqnarray}

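Multiplying the weights and bias by a positive constant $c$ multiplies the left-hand side of each inequality by $c$, and because $c>0$ the direction of each inequality is preserved:

\begin{eqnarray} 
  c\sum_j w_j x_j+cb \leq 0 \\
  c\sum_j w_j x_j+cb > 0
\end{eqnarray}

The sign of the left-hand side is unchanged, and that sign alone determines the perceptron's output, so the output of each perceptron is unchanged. Since every perceptron produces the same output as before, every later layer receives the same inputs, and the behaviour of the whole network is unchanged.<br><br>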
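
As a quick numerical sanity check of this argument, here is a minimal sketch in plain Python. The 3-4-2 network, its random weights and the helper names `perceptron` and `network` are made up for illustration and are not from the book. The scaling constants are powers of two so that the floating-point arithmetic stays exact.

```python
import random

def perceptron(weights, bias, inputs):
    """Return 1 if w.x + b > 0, otherwise 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def network(layers, inputs, c=1.0):
    """Feed inputs through layers of perceptrons, scaling every weight and bias by c."""
    activations = inputs
    for layer in layers:
        activations = [
            perceptron([c * w for w in weights], c * bias, activations)
            for weights, bias in layer
        ]
    return activations

random.seed(0)

# Hypothetical 3-4-2 network: each neuron is a (weights, bias) pair.
sizes = [3, 4, 2]
layers = [
    [([random.uniform(-1, 1) for _ in range(n_in)], random.uniform(-1, 1))
     for _ in range(n_out)]
    for n_in, n_out in zip(sizes, sizes[1:])
]

# Compare the scaled and unscaled networks on some random binary inputs.
inputs_list = [[random.choice([0, 1]) for _ in range(3)] for _ in range(20)]
unchanged = all(
    network(layers, x, c) == network(layers, x)
    for x in inputs_list
    for c in (0.5, 2.0, 1024.0)  # powers of two scale floats exactly
)
print(unchanged)
```

This only spot-checks a handful of inputs and constants; the proof above is what covers every $c>0$.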

*__Exercise 2:__ Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won't need the actual input value, we just need the input to have been fixed. Suppose the weights and biases are such that* $w \cdot x + b \neq 0$ *for the input* $x$ *to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant* $c>0$ *. Show that in the limit as* $c→∞$ *the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when* $w⋅x+b=0$ *for one of the perceptrons?*

For a sigmoid neuron with weights and biases multiplied by a positive constant $c>0$ :

\begin{eqnarray} 
  \mbox{output} = \frac{1}{1+\exp(-c(\sum_j w_j x_j+b))}
\end{eqnarray}

In the limit $c→∞$ and $w⋅x+b < 0$:

\begin{eqnarray} 
  \exp(-c(\sum_j w_j x_j+b)) → \exp(∞) → ∞
\end{eqnarray}
so:
\begin{eqnarray} 
  \frac{1}{1+\exp(-c(\sum_j w_j x_j+b))} → \frac{1}{1+∞} → 0
\end{eqnarray}

which matches the output of a perceptron.<br>

In the limit $c→∞$ and $w⋅x+b > 0$:

\begin{eqnarray} 
  \exp(-c(\sum_j w_j x_j+b)) → \exp(-∞) → 0
\end{eqnarray}
so:
\begin{eqnarray} 
  \frac{1}{1+\exp(-c(\sum_j w_j x_j+b))} →\frac{1}{1+0} → 1
\end{eqnarray}

which again matches the output of a perceptron.<br>


When $w⋅x+b = 0$, the exponent is zero for every value of $c$, so taking the limit has no effect:
\begin{eqnarray} 
  \exp(-c(\sum_j w_j x_j+b)) = \exp(0) = 1
\end{eqnarray}
so:
\begin{eqnarray} 
  \frac{1}{1+\exp(-c(\sum_j w_j x_j+b))} = \frac{1}{1+1} = 0.5
\end{eqnarray}

In this case the sigmoid neuron outputs 0.5 no matter how large $c$ is, whereas a perceptron would output 0, so the two networks can behave differently.
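
The three cases can be illustrated numerically with a small sketch. The values $z = \pm 0.3$ standing in for $w⋅x+b$ and the list of constants are arbitrary choices for illustration, and the sigmoid is written in a numerically stable form so that large $c$ does not overflow `math.exp`.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def step(z):
    """Perceptron output for a weighted input z = w.x + b."""
    return 1 if z > 0 else 0

# z plays the role of w.x + b; c scales the weights and bias.
for z in (-0.3, 0.0, 0.3):
    outputs = [sigmoid(c * z) for c in (1, 10, 100, 1000)]
    print(f"z={z:+.1f}  perceptron={step(z)}  sigmoid={outputs}")
```

For the non-zero values of `z` the sigmoid outputs move towards the perceptron output as `c` grows, while for `z = 0` they stay at 0.5.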

*__Exercise 3:__ There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.*

I'm going to use a perceptron layer because it requires the least maths. (I found a solution [here](https://nbviewer.org/github/nndl-solutions/NNDL-solutions/blob/master/notebooks/chap-1-using-neural-nets-to-recognize-handwritten-digits.ipynb) that uses a sigmoid layer.)

First consider the weights and bias for the first neuron, which should only be activated for digits 8 and 9, as shown below.

For the old and new output layers for digit 8:

\begin{align}
output_{old} \approx
\begin{bmatrix}
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    1 \\
    0 
    \end{bmatrix} 
\; → \; output_{new} =
\begin{bmatrix}
    1 \\
    0 \\
    0 \\
    0 
    \end{bmatrix} 
\end{align}

And for digit 9:

\begin{align}
output_{old} \approx
    \begin{bmatrix}
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    1 
  \end{bmatrix}
\; → \; output_{new} =
\begin{bmatrix}
    1 \\
    0 \\
    0 \\
    1 
    \end{bmatrix} 
\end{align}

The weights can be zero for every digit except 8 and 9, since the other inputs are irrelevant to this neuron. The weights for 8 and 9 can be equal because the neuron should behave the same way for both digits. So we can use the weight vector below:

\begin{align}
w = 
\begin{bmatrix}
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    0 \\
    1 \\
    1 
    \end{bmatrix} 
\end{align}

Now consider the equation for output of a perceptron:
\begin{eqnarray}
  \mbox{output} = \left\{ 
    \begin{array}{ll} 
      0 & \mbox{if } w\cdot x + b \leq 0 \\
      1 & \mbox{if } w\cdot x + b > 0 
    \end{array}
  \right.
\end{eqnarray}

With the weight vector chosen above, $w⋅x = x_8 + x_9$

In the case of the digit being 8: <br>
\begin{eqnarray}
 0.99 \le x_8 \le 1 \\
 0 \le x_9 < 0.01
 \end{eqnarray}

And in the case of the digit being 9: <br>
\begin{eqnarray}
 0 \le x_8 < 0.01 \\
 0.99 \le x_9 \le 1
 \end{eqnarray}

Using this and the equation for a perceptron, a value for the bias can be found.
First consider the case when the neuron's output is 1, and the values for $x_8$ and $x_9$ are minimised:
\begin{eqnarray} 
 x_8 + x_9 + b > 0
\end{eqnarray}

\begin{eqnarray} 
 0.99 + 0 + b > 0
\end{eqnarray}

\begin{eqnarray} 
 0.99 + b > 0
\end{eqnarray}

Then consider the case when the neuron's output is 0, and the values for $x_8$ and $x_9$ are maximised:

\begin{eqnarray} 
 x_8 + x_9 + b \le 0
\end{eqnarray}

\begin{eqnarray} 
 0.01 + 0.01 + b \le 0
\end{eqnarray}

\begin{eqnarray} 
 0.02 + b \le 0
\end{eqnarray}

From this we find:
\begin{eqnarray}
     -0.99 < b \le -0.02
\end{eqnarray}

So for this neuron $b = -0.5$ will be an acceptable bias. 
Repeating the same steps for the other 3 neurons gives:

Neuron 1: $w = [0,0,0,0,0,0,0,0,1,1], b= -0.5$<br>
Neuron 2: $w = [0,0,0,0,1,1,1,1,0,0], b= -0.5$<br>
Neuron 3: $w = [0,0,1,1,0,0,1,1,0,0], b= -0.5$<br>
Neuron 4: $w = [0,1,0,1,0,1,0,1,0,1], b= -0.5$<br>
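
To check these weights and biases, here is a minimal sketch that feeds a worst-case version of the old output layer through the four perceptrons for every digit. The helper names and the worst-case activations (0.99 for the correct digit and 0.01 for all others, the latter sitting right on the exercise's bound) are assumptions made for illustration.

```python
# Weights from the list above, one row per new output neuron.
WEIGHTS = [
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # neuron 1: the 8s bit
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # neuron 2: the 4s bit
    [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],  # neuron 3: the 2s bit
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # neuron 4: the 1s bit
]
BIAS = -0.5

def perceptron(weights, bias, inputs):
    """Return 1 if w.x + b > 0, otherwise 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def to_bits(old_output):
    """Apply the new output layer to the ten old output activations."""
    return [perceptron(w, BIAS, old_output) for w in WEIGHTS]

def worst_case_activations(digit):
    """Assumed worst case: correct neuron at 0.99, every other neuron at 0.01."""
    return [0.99 if i == digit else 0.01 for i in range(10)]

results = {d: to_bits(worst_case_activations(d)) for d in range(10)}
for d, bits in results.items():
    print(d, bits)
```

Each printed row should be the four-bit binary representation of its digit, most significant bit first. Neuron 4 has the most non-zero weights, so its weighted input for an incorrect digit can reach about 0.05, which is still well below the 0.5 needed to overcome the bias.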

## [Chapter 2](http://neuralnetworksanddeeplearning.com/chap2.html)


## [Chapter 3](http://neuralnetworksanddeeplearning.com/chap3.html)


## [Chapter 4](http://neuralnetworksanddeeplearning.com/chap4.html)


## [Chapter 5](http://neuralnetworksanddeeplearning.com/chap5.html)


## [Chapter 6](http://neuralnetworksanddeeplearning.com/chap6.html)

http://neuralnetworksanddeeplearning.com/chap1.html#the_architecture_of_neural_networks