<div style="text-align: right"><sub>This notebook is distributed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license</a>.</sub></div>
<h1>Hands on Machine Learning  <span style="font-size:10px;"><i>by <a href="https://webgrec.ub.edu/webpages/000004/ang/dmaluenda.ub.edu.html" target="_blank">David Maluenda</a></i></span></h1>

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a href="https://atenea.upc.edu/course/view.php?id=85709" target="_blank">
      <img src="https://github.com/dmaluenda/hands_on_machine_learning/raw/master/resources/upc_logo_49px.png" width="130"/>
    </a>
  </td>
  <td>
  </td>
  <td>   <!-- gColab -->
    <a href="https://colab.research.google.com/github/dmaluenda/hands_on_machine_learning/blob/master/01_Basics_NeuralNetworks.ipynb" target="_blank">
      <img src="https://github.com/dmaluenda/hands_on_machine_learning/raw/master/resources/colab_logo_32px.png" />
      Run in Google Colab
    </a>
  </td>
  <td>   <!-- github -->
    <a href="https://github.com/dmaluenda/hands_on_machine_learning/blob/master/01_Basics_NeuralNetworks.ipynb" target="_blank">
      <img src="https://github.com/dmaluenda/hands_on_machine_learning/raw/master/resources/github_logo_32px.png" />
      View source on GitHub
    </a>
  </td>
  <td>   <!-- download -->
    <a href="https://raw.githubusercontent.com/dmaluenda/hands_on_machine_learning/master/01_Basics_NeuralNetworks.ipynb"  target="_blank" download="01_Basics_NeuralNetworks">
      <img src="https://github.com/dmaluenda/hands_on_machine_learning/raw/master/resources/download_logo_32px.png" />
      Download notebook
      </a>
  </td>
</table>

# $\text{I}$. Neural Networks with Pure Python

Hands on "Machine Learning on Classical and Quantum data" course of
[Master in Photonics - PHOTONICS BCN](https://photonics.masters.upc.edu/en/general-information)
[[UPC](https://photonics.masters.upc.edu/en) +
[UB](https://www.ub.edu/web/ub/en/estudis/oferta_formativa/master_universitari/fitxa/P/M0D0H/index.html?) +
[UAB](https://www.uab.cat/web/estudiar/la-oferta-de-masteres-oficiales/informacion-general-1096480309770.html?param1=1096482863713) +
[ICFO](https://www.icfo.eu/lang/studies/master-studies)].

Tutorial 1

This notebook shows how to:
- implement the forward pass (prediction, inference or evaluation) of a fully connected neural network in a few lines of pure Python
- understand the meaning and use of activation functions
- do that efficiently using batches
- illustrate the results for randomly initialized neural networks
- understand the role of weights and biases in networks


**References**:

[1] [Machine Learning for Physicists](https://machine-learning-for-physicists.org/) by Florian Marquardt.<br>
[2] [NumPy](https://numpy.org/doc/stable/user/whatisnumpy.html): the fundamental package for scientific computing in Python.<br>
[3] [Matplotlib](https://matplotlib.org/stable/tutorials/introductory/usage.html): a comprehensive library for creating static, animated, and interactive visualizations in Python.


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#0.-Imports:-only-numpy-and-matplotlib" data-toc-modified-id="0.-Imports:-only-numpy-and-matplotlib-0">0. Imports: only numpy and matplotlib</a></span></li><li><span><a href="#1.-A-very-simple-neural-network-(no-hidden-layer)" data-toc-modified-id="1.-A-very-simple-neural-network-(no-hidden-layer)-1">1. A very simple neural network (no hidden layer)</a></span><ul class="toc-item"><li><span><a href="#1.1-Simple-implementation-(single-input-data-processing)" data-toc-modified-id="1.1-Simple-implementation-(single-input-data-processing)-1.1">1.1 Simple implementation (single input data processing)</a></span></li><li><span><a href="#1.2-More-compact-(working-with-functions)" data-toc-modified-id="1.2-More-compact-(working-with-functions)-1.2">1.2 More compact (working with functions)</a></span></li><li><span><a href="#1.3-Multiple-inputs" data-toc-modified-id="1.3-Multiple-inputs-1.3">1.3 Multiple inputs</a></span></li><li><span><a href="#1.4-Activation-functions" data-toc-modified-id="1.4-Activation-functions-1.4">1.4 Activation functions</a></span></li></ul></li><li><span><a href="#2.-Network-with-one-hidden-layer" data-toc-modified-id="2.-Network-with-one-hidden-layer-2">2. Network with one hidden layer</a></span></li><li><span><a href="#3.-'batch'-processing-of-Neural-Networks" data-toc-modified-id="3.-'batch'-processing-of-Neural-Networks-3">3. 
'batch' processing of Neural Networks</a></span><ul class="toc-item"><li><span><a href="#3.1-Matrix/Vector/Tensor-multiplication" data-toc-modified-id="3.1-Matrix/Vector/Tensor-multiplication-3.1">3.1 Matrix/Vector/Tensor multiplication</a></span></li><li><span><a href="#3.2-Batch-implementation-of-one-hidden-layer-network" data-toc-modified-id="3.2-Batch-implementation-of-one-hidden-layer-network-3.2">3.2 Batch implementation of one hidden layer network</a></span></li><li><span><a href="#3.3-Data-preprocessing-for-batch-implementation" data-toc-modified-id="3.3-Data-preprocessing-for-batch-implementation-3.3">3.3 Data preprocessing for batch implementation</a></span></li><li><span><a href="#3.4-A-network-with-MANY-hidden-layers" data-toc-modified-id="3.4-A-network-with-MANY-hidden-layers-3.4">3.4 A network with MANY hidden layers</a></span></li></ul></li><li><span><a href="#4.-Fancy-visualization-of-Neural-Networks-with-Pure-Python" data-toc-modified-id="4.-Fancy-visualization-of-Neural-Networks-with-Pure-Python-4">4. 
Fancy visualization of Neural Networks with Pure Python</a></span><ul class="toc-item"><li><span><a href="#4.1-Some-internal-routines-for-fancy-plotting-the-network" data-toc-modified-id="4.1-Some-internal-routines-for-fancy-plotting-the-network-4.1">4.1 Some internal routines for fancy plotting the network</a></span></li><li><span><a href="#4.2-No-hidden-layer-NN" data-toc-modified-id="4.2-No-hidden-layer-NN-4.2">4.2 No hidden layer NN</a></span></li><li><span><a href="#4.3-Deep-dense-NN" data-toc-modified-id="4.3-Deep-dense-NN-4.3">4.3 Deep dense NN</a></span></li><li><span><a href="#4.4-More-layers-and-some-activation-function-combination" data-toc-modified-id="4.4-More-layers-and-some-activation-function-combination-4.4">4.4 More layers and some activation function combination</a></span></li><li><span><a href="#4.5-Something-not-random" data-toc-modified-id="4.5-Something-not-random-4.5">4.5 Something not random</a></span></li></ul></li></ul></div>

## 0. Imports: only numpy and matplotlib

In [1]:
# "numpy" library for linear algebra
import numpy as np

# "matplotlib" for plotting
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['figure.dpi'] = 300  # highres display
from matplotlib.axes._axes import _log as mpl_ax_logger
mpl_ax_logger.setLevel('ERROR')  # ignore warnings
from mpl_toolkits.axes_grid1.inset_locator import inset_axes  # for nice inset

# time control to count it and manage it
from time import time

# just to play like in a GUI
from ipywidgets import interact, Text

## 1. A very simple neural network (no hidden layer)

The behavior of a neural network (NN) with $N_0$ input neurons and $N_1$ output neurons (no hidden layer) is

\begin{equation}
z_i = \sum_{j=1}^{N_0} w_{ij} x_j + b_i \quad ; \quad i=1\dots N_1
\label{eq:simpleNN}
\end{equation}

\begin{equation}
y_i = f(z_i)
\label{eq:actFunction}
\end{equation}

where $x_j$ is the input value of the $j$-th input neuron,
$y_i$ is the output value of the $i$-th output neuron,
$w_{ij}$ is the weight of the connection between the $j$-th input neuron and the $i$-th output neuron,
$b_i$ is the bias of the $i$-th output neuron,
$z_i$ is the linear output of the layer (the weighted sum of the inputs plus the bias),
and $f(\cdot)$ is the activation function (usually non-linear, for instance a sigmoid function).

Notice that we can define a matrix $w$ of size $N_1\times N_0$ (rows $\times$ columns),
which contains all $w_{ij}$ connection weights.
In the same way, we can collect all $x_j$ inputs and $b_i$ biases in column vectors $x$ and $b$ of size $N_0$ and $N_1$, respectively.
Thus, Eq. (1) can be seen as a simple matrix multiplication followed by a vector sum.

### 1.1 Simple implementation (single input data processing)

Implement a basic neural network with 3 input and 2 output neurons. Use the sigmoid function $$f_{sig}(z)=\frac{1}{1+e^{-z}}$$ as the activation function.

Use random values for the weights and biases, drawn from the range $[-3, 3]$.

> You can use [`np.random.uniform(low, high, size)`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html) to generate the random numbers.

Evaluate that neural network (defined by those weights, biases and activation function) for the input vector $x=(0.5, 0.3, 0.2)$.

Print the input vector $x$, the weights $w$, the biases $b$, the linear superposition $z$ and the final result $y$, as well as their shapes (sizes).

Check the shape of every array. Do they make sense?
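One possible way to carry out this exercise is sketched below. The variable names are assumptions made for illustration, and the weights follow the $N_1\times N_0$ convention of Eq. (1).

```python
import numpy as np

N0, N1 = 3, 2  # number of input and output neurons

# random weights and biases in the range [-3, 3]
w = np.random.uniform(low=-3, high=3, size=(N1, N0))  # shape (2, 3)
b = np.random.uniform(low=-3, high=3, size=N1)        # shape (2,)

x = np.array([0.5, 0.3, 0.2])  # input vector, shape (3,)

z = np.dot(w, x) + b      # linear superposition, Eq. (1)
y = 1 / (1 + np.exp(-z))  # sigmoid activation, Eq. (2)

# print every array together with its shape
for name, arr in [("x", x), ("w", w), ("b", b), ("z", z), ("y", y)]:
    print(name, "=", arr, " shape:", arr.shape)
```

Since `w` has one row per output neuron and one column per input neuron, `np.dot(w, x)` yields one value per output neuron, which matches the length of `b`.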

### 1.2 More compact (working with functions)

It is very convenient to work with specific functions instead of code snippets (small pieces of code that are typically copied and pasted everywhere).

Let's stay with the simplest network, so you can reuse the code above, but define a function that evaluates the network: it takes the $x$ input vector, the $w$ weights matrix and the $b$ biases vector as arguments, and returns the $y$ output vector.

In [None]:
def my_function(some_arguments):  # change the name and the arguments of this function
    
    # my code (same as a couple of cells above)
    
    return # return the result

Let's play with a different number of neurons. We want to visualize the behavior of having different weights and biases on a neural network.
A good approach to that is working with 2 input and 1 output neurons.

> **NOTE**: Neural Networks with 2 input neurons and 1 output neuron are ideal to illustrate how they
work, because they allow us to visualize their behavior in a single picture. Let's see.

Thus, define the weights matrix and the biases vector for a neural network with 2 inputs and 1 output, where the weights cover the range $[-20, 20]$ and the biases cover the range $[-1, 1]$.

Use the function above to get the result for the input $x=(-0.2, 0.1)$ and print the result. Check the range and the size (shape) of the result.
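As a minimal sketch of this step, the function below wraps the single-layer evaluation with a sigmoid activation. The name `apply_simple_net` is hypothetical; any name and argument order would do.

```python
import numpy as np

def apply_simple_net(x, w, b):
    """Evaluate a no-hidden-layer network with sigmoid activation.
       (hypothetical helper)
         x: input vector   -> shape (N0,)
         w: weights matrix -> shape (N1, N0)
         b: biases vector  -> shape (N1,)
    """
    z = np.dot(w, x) + b          # linear superposition
    return 1 / (1 + np.exp(-z))   # sigmoid

# 2 input neurons and 1 output neuron
w = np.random.uniform(-20, 20, size=(1, 2))
b = np.random.uniform(-1, 1, size=1)

y = apply_simple_net(np.array([-0.2, 0.1]), w, b)
print(y, y.shape)
```

The result is an array with a single element (one output neuron), and the sigmoid keeps it between 0 and 1.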

### 1.3 Multiple inputs

Let's see how the network acts with many different inputs $x$.

Explore pairs of $x=(x_1, x_2)$ values from -0.5 to 0.5, taking $m=300$ different values for each component, and apply the network to every pair of $x$ input values.

The result has to be a $300\times300$ array. Plot it with [`plt.imshow()`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html) and properly set the axes so that they cover the range $[-0.5, 0.5]$. Also, add a color bar to check the output range.

> Initialize an array with the [`np.zeros(size)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) function and, then, you can use a couple of `for` loops to fill this numpy array ([indexing and slicing](https://numpy.org/doc/stable/user/basics.indexing.html)).

What does this image represent? What do the axes of the image represent, and what do the colors represent?

Which kind of image is this? Does it have any specific orientation? Recall the sigmoid function.

Print the weights and biases. Could you relate these values to the image?
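The double loop could look like the sketch below. The helper `apply_simple_net` is a hypothetical name for the single-layer function of the previous subsection, redefined here so that the snippet runs on its own; plotting is left out on purpose.

```python
import numpy as np

def apply_simple_net(x, w, b):
    """Single layer with sigmoid activation (hypothetical helper)."""
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))

w = np.random.uniform(-20, 20, size=(1, 2))
b = np.random.uniform(-1, 1, size=1)

m = 300
x_values = np.linspace(-0.5, 0.5, m)  # values explored for x_1 and x_2
y_img = np.zeros((m, m))              # output array to be filled

for i, x2 in enumerate(x_values):      # rows    <-> second input
    for j, x1 in enumerate(x_values):  # columns <-> first input
        y_img[i, j] = apply_simple_net(np.array([x1, x2]), w, b)[0]

print(y_img.shape, y_img.min(), y_img.max())
```

The array can then be displayed with `plt.imshow(y_img, origin='lower', extent=[-0.5, 0.5, -0.5, 0.5])` followed by `plt.colorbar()`; with `origin='lower'` the row index grows upwards, so the vertical axis corresponds to $x_2$.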

### 1.4 Activation functions

Activation functions are a key ingredient of neural networks, since they introduce nonlinearity into the computations. There are many types of activation functions: some work very well for certain kinds of problems, while others suit different ones.

Some info about activation functions:

[https://en.wikipedia.org/wiki/Activation_function](https://en.wikipedia.org/wiki/Activation_function)

[https://www.analyticsvidhya.com/blog/2021/04/activation-functions-and-their-derivatives-a-quick-complete-guide](https://www.analyticsvidhya.com/blog/2021/04/activation-functions-and-their-derivatives-a-quick-complete-guide)

Make a new function to implement a neural network layer, like before, but using another activation function, and check its behavior (basically, repeat the previous exercise).

Check the differences between this resulting image and the one before. Orientation? Output range?
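A minimal sketch of a layer with a selectable activation is given below. The function name, the string labels and the fixed example values of `w`, `b` and `x` are all assumptions made for illustration; the activations shown (sigmoid, tanh, ReLU and step) are just a few common choices.

```python
import numpy as np

def apply_layer_act(x, w, b, activation="sigmoid"):
    """Single layer with a selectable activation (hypothetical helper)."""
    z = np.dot(w, x) + b
    if activation == "sigmoid":
        return 1 / (1 + np.exp(-z))   # output in (0, 1)
    if activation == "tanh":
        return np.tanh(z)             # output in (-1, 1)
    if activation == "relu":
        return np.maximum(z, 0)       # output in [0, inf)
    if activation == "step":
        return (z > 0).astype(float)  # output is 0 or 1
    raise ValueError(f"unknown activation: {activation}")

# fixed example values, chosen only for illustration
w = np.array([[2.0, -1.0]])
b = np.array([0.2])
x = np.array([-0.2, 0.1])

for name in ["sigmoid", "tanh", "relu", "step"]:
    print(name, apply_layer_act(x, w, b, name))
```

All four functions share the same linear part $z$, so the orientation of the resulting image is unchanged; only the output range and the sharpness of the transition differ.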

## 2. Network with one hidden layer

Let's increase the complexity of the neural network by adding a hidden layer with some inner neurons.

The idea here is to have multiple weight matrices to connect one layer with the next (that is, one weight matrix for each pair of consecutive layers).

The function that "applies a layer" (i.e. goes from one layer to the next) is exactly the same as the function evaluating the simple network made before.

Let's work with a three-layer network, that is, one hidden layer in addition to the input and output layers. Since we want to visualize the result in an image like before, we will still deal with 2 input and 1 output neurons. However, let's set **30 hidden neurons** in the inner layer.

Generate the weights matrices and the biases vectors with random numbers, like before. Which size (shape) should they have?

To easily visualize the behavior of the hidden layer, make the range of the first weights matrix larger than that of the other parameters. Let's say $[-10, 10]$ for the weights connecting the input to the hidden layer, while keeping $[-1, 1]$ for the rest.

Ok, now, to apply the whole neural network, we have to apply every layer, step by step.

Thus, create a function that returns the final result $y$ given, as arguments, the $x$ input vector, the $w^{l=1}$ weights matrix from the input to the hidden layer, the $b^{l=1}$ biases vector of the hidden layer, the $w^{l=2}$ weights matrix from the hidden to the output layer, and the $b^{l=2}$ biases vector of the output layer.

Inside this function, you should call the single-layer function written before twice.

Check the function you have just written by applying it to the input vector $x=(0.3, -0.2)$, and print the result.
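The following sketch shows one way to chain two layers. The function and variable names are hypothetical, and the sigmoid is assumed as the activation of both layers.

```python
import numpy as np

def apply_layer(x, w, b):
    """Single layer with sigmoid activation (hypothetical helper)."""
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))

def apply_two_layer_net(x, w1, b1, w2, b2):
    """Input -> hidden -> output, applying one layer after the other."""
    hidden = apply_layer(x, w1, b1)     # shape (n_hidden,)
    return apply_layer(hidden, w2, b2)  # shape (n_out,)

n_in, n_hidden, n_out = 2, 30, 1
w1 = np.random.uniform(-10, 10, size=(n_hidden, n_in))  # input -> hidden
b1 = np.random.uniform(-1, 1, size=n_hidden)
w2 = np.random.uniform(-1, 1, size=(n_out, n_hidden))   # hidden -> output
b2 = np.random.uniform(-1, 1, size=n_out)

y = apply_two_layer_net(np.array([0.3, -0.2]), w1, b1, w2, b2)
print(y, y.shape)
```

Each weights matrix has as many rows as neurons in the layer it feeds and as many columns as neurons in the layer it reads from.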

Ok, let's apply the network to a range of input pairs, like before.

Generate an image showing the result of the network when applied to pairs of inputs from -0.5 to 0.5, using 300 samples in each direction, like before.

Check the elapsed time needed to fill the $y$ output array (the resulting image).

How long did it take? What changed? The loop size or what is done in every iteration?

This is just for one single hidden layer. Imagine what happens with larger networks. We will see how to deal with this time issue in the next section.

Show the generated image.

What are the differences between this image and those produced by the no-hidden-layer network? Can you identify any orientation in it?

## 3. 'batch' processing of Neural Networks

As we said before, the code above takes quite a long time even though the network is small, i.e. it is not efficient.
The reason is that we are looping over all the input values and doing a matrix calculation in each iteration. This is not the most efficient way to operate.

We will see how to avoid looping by using *batch* processing.

**Goal: apply a network to many samples in parallel taking advantage of array computations.**

### 3.1 Matrix/Vector/Tensor multiplication

For instance, create a no-hidden-layer network of $N_0=8$ input neurons, and $N_1=3$ output neurons.

Let's try to process $M=50$ different inputs at once. Notice, **$M$ is usually called _batchsize_**.

The idea is to have an extra dimension for $x$ to be able to hold the different input vectors.

Thus,

 1. Create an input array with 2 dimensions: one for every input neuron and one for every different input.

 2. Create the weights and biases arrays. Should they also have an extra dimension to hold the several inputs?

 3. Perform the matrix multiplication so as to obtain the result prior to the activation function, i.e. $z$. Which size (shape) should $z$ have?
 
> Play with the dimension orders and multiplication order to get a valid matrix multiplication.

What does each dimension of the $x$ array represent?

What does each dimension of the $z$ array represent?

Do the weights and biases depend on the $M$ *batchsize*?
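One arrangement that gives a valid multiplication is sketched below, assuming the batch index is placed first in `x` and the weights are stored transposed with respect to Eq. (1). The value ranges are arbitrary choices for illustration.

```python
import numpy as np

N0, N1, M = 8, 3, 50  # input neurons, output neurons, batchsize

x = np.random.uniform(-1, 1, size=(M, N0))   # one row per sample
w = np.random.uniform(-1, 1, size=(N0, N1))  # transposed w.r.t. Eq. (1)
b = np.random.uniform(-1, 1, size=N1)        # no batch dimension

# (M, N0) x (N0, N1) -> (M, N1); b is broadcast over the M rows
z = np.dot(x, w) + b
print(z.shape)

# cross-check against processing the samples one by one
z_loop = np.array([np.dot(w.T, x[k]) + b for k in range(M)])
print(np.allclose(z, z_loop))
```

NumPy broadcasting adds the same bias vector to every row of the product, so neither `w` nor `b` needs to know the batchsize.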

### 3.2 Batch implementation of one hidden layer network

Let's define the same network as before, with one hidden layer of 30 neurons, 2 input neurons and 1 output neuron.

Let's try to use the same weights and biases as before. How do they have to be modified?

Set up a function to go from one layer to the next that is able to manage batch processing, according to what we learned above.

Create a function to apply a whole network (two layers in this case).

Consider using one argument for the $x$ input and just two additional arguments: one for the weights and the other for the biases.

Take into account that for a two-layer network we deal with two weights matrices and two biases vectors. However, for a 10-layer network you would need many arguments (21). Moreover, you do not know in advance how many layers a general network has. Therefore, consider taking lists of numpy arrays as arguments, corresponding to the $[\omega]$ weights and $[b]$ biases, and iterating over those lists inside the function. In this way, you can use that function for any network, whatever its number of layers.

Let's check the implementation with a batchsize $M=m\times m = 300 \times 300 = 90000$, just to compare with the previous approach (looping).

Thus, apply your network to a random input array. Which shape must this input have? And the result?
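A possible list-based implementation is sketched here. The names `apply_layer_batch` and `apply_net_batch` are hypothetical, the sigmoid is assumed for every layer, and the weights are stored with shape (neurons in, neurons out) so that the batch index stays first.

```python
import numpy as np

def apply_layer_batch(x, w, b):
    """One layer on a whole batch (hypothetical helper).
         x: (batchsize, n_in), w: (n_in, n_out), b: (n_out,)
    """
    return 1 / (1 + np.exp(-(np.dot(x, w) + b)))

def apply_net_batch(x, weights, biases):
    """Apply all layers in turn; weights and biases are lists of arrays."""
    y = x
    for w, b in zip(weights, biases):
        y = apply_layer_batch(y, w, b)
    return y

# 2 inputs -> 30 hidden neurons -> 1 output
weights = [np.random.uniform(-10, 10, size=(2, 30)),
           np.random.uniform(-1, 1, size=(30, 1))]
biases = [np.random.uniform(-1, 1, size=30),
          np.random.uniform(-1, 1, size=1)]

M = 300 * 300
x = np.random.uniform(-0.5, 0.5, size=(M, 2))  # random input batch
y = apply_net_batch(x, weights, biases)
print(x.shape, y.shape)
```

Because the function only iterates over the two lists, the same code applies to a network with any number of layers.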

### 3.3 Data preprocessing for batch implementation

Before, we had two nested loops running over all possible values of each input neuron.
This was inefficient, but it made it easy to see how every combination of input values is calculated.
Now, we have to think of a way to put all those possibilities into a batch array to calculate them at once.
How to do that?

Use the `np.linspace()` and `np.meshgrid()` functions to create a couple of 2D arrays of $300\times300$ holding values in the range $[-0.5, 0.5]$, one varying vertically and the other horizontally. Check the documentation to see how those functions work.

Notice that one array increases along the horizontal direction and remains constant along the
vertical, like the $X$ coordinate, while the other does just the opposite,
like the $Y$ coordinate. Then, we can use them just like simple Cartesian coordinates.

Be careful not to confuse the $X$-$Y$ coordinates of the space with the $x$-$y$ input-output vectors of the network, defined before.

Ok. Now we have two separate 2D arrays ($300\times300$ each), while the network expects an input array of `batchsize`$=M=90000$ for two input neurons.

Then, use the `flatten()` array method (or `np.ravel()`) and the `np.stack()` function to transform the two $300\times300$ arrays into a single $90000\times2$ array.

Apply that stacked array to the network.

Check the elapsed time to do it and compare it with the two nested loops implementation done before.

Check the shape of the result.

What does each dimension and component of this output vector represent?

We want to visualize this output in an image, like before. However, we have a flattened array.

Let's go back from a flattened array to a 2D array: $(M\times 1) \rightarrow (m\times m)$ and `imshow` this image.

> Check the `np.reshape()` function

Does this image look like the one before? Why? What advantage does this approach have with respect to the two nested loops?
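The whole preprocessing chain can be sketched as follows. The helper `apply_net_batch` is a hypothetical name for the batch function of the previous subsection, repeated here with a sigmoid in every layer so that the snippet is self-contained.

```python
import numpy as np
from time import time

def apply_net_batch(x, weights, biases):
    """Batch evaluation with sigmoid layers (hypothetical helper)."""
    y = x
    for w, b in zip(weights, biases):
        y = 1 / (1 + np.exp(-(np.dot(y, w) + b)))
    return y

weights = [np.random.uniform(-10, 10, size=(2, 30)),
           np.random.uniform(-1, 1, size=(30, 1))]
biases = [np.random.uniform(-1, 1, size=30),
          np.random.uniform(-1, 1, size=1)]

m = 300
axis = np.linspace(-0.5, 0.5, m)
X, Y = np.meshgrid(axis, axis)  # X varies along columns, Y along rows

# two (m, m) arrays -> one (m*m, 2) batch of input pairs
x_batch = np.stack([X.flatten(), Y.flatten()], axis=1)

t0 = time()
y_batch = apply_net_batch(x_batch, weights, biases)  # shape (m*m, 1)
elapsed = time() - t0

y_img = np.reshape(y_batch, (m, m))  # back to a 2D image
print(x_batch.shape, y_batch.shape, y_img.shape)
print(f"batch evaluation took {elapsed:.4f} s")
```

Since `flatten()` and `np.reshape()` both use row-major order by default, pixel `y_img[i, j]` corresponds to the input pair `(axis[j], axis[i])`, exactly as in the nested-loop version.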

### 3.4 A network with MANY hidden layers

We can create a $[\omega]$ list of weights containing many weight matrices connecting many layers.
Also, we can do the same for the $[b]$ biases.

Thus, create a big neural network having, let's say, 20 hidden layers, each with some number of neurons in the range $[20, 40]$. You can randomly set the specific number of neurons for each layer.

Set the weights matrices randomly in the range $[-5, 5]$ and the biases vectors in $[-1, 1]$.

Let's set the input layer with 2 neurons and 1 single output neuron, to be able to visualize the result.

Apply this big neural network to the flattened and stacked input built before, covering $x=(x_1, x_2)$ input pairs in the range $[-0.5, 0.5]$, and `imshow` the result. Again, check the time needed to apply this big network.

Comment on the image obtained.
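A deep network of this kind can be built with list comprehensions, as in the sketch below. All names are hypothetical, the sigmoid is assumed in every layer, and `np.random.randint(20, 41, ...)` is used because its upper bound is exclusive.

```python
import numpy as np
from time import time

def apply_net_batch(x, weights, biases):
    """Batch evaluation with sigmoid layers (hypothetical helper)."""
    y = x
    for w, b in zip(weights, biases):
        y = 1 / (1 + np.exp(-(np.dot(y, w) + b)))
    return y

n_hidden_layers = 20
hidden_sizes = np.random.randint(20, 41, size=n_hidden_layers)  # 20..40
layer_sizes = [2] + [int(n) for n in hidden_sizes] + [1]

# one weights matrix and one biases vector per pair of consecutive layers
weights = [np.random.uniform(-5, 5, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.random.uniform(-1, 1, size=n_out)
          for n_out in layer_sizes[1:]]

m = 300
axis = np.linspace(-0.5, 0.5, m)
X, Y = np.meshgrid(axis, axis)
x_batch = np.stack([X.flatten(), Y.flatten()], axis=1)  # (m*m, 2)

t0 = time()
y_img = apply_net_batch(x_batch, weights, biases).reshape(m, m)
print(f"{len(weights)} layers applied in {time() - t0:.3f} s")
```

With 20 hidden layers there are 21 weights matrices, and the image `y_img` can be shown with `plt.imshow()` as before.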

## 4. Fancy visualization of Neural Networks with Pure Python

In this section you do not need to write code, just play with it to visualize different networks and activation functions, to get familiar with the behavior of different hyper-parameters (number of layers, number of neurons, activation functions...) and to gain intuition.

### 4.1 Some internal routines for fancy plotting the network

The cell below contains code to show networks as tree plots, where the color of each branch is given by the sign of its weight and its opacity by the weight's magnitude; neurons are drawn in the same way according to their bias values. It is done in pure Python/matplotlib.

In [3]:
BLUE_COLOR = [0, 0.4, 0.8]  # RGB color for the full-range negative value
ORANGE_COLOR = [1, 0.3, 0]  # RGB color for the full-range positive value

def plot_connection_line(ax, X, Y, W, vmax=10, linewidth=3):
    """ Draw a fancy line from (X[0], Y[0]) to (X[1], Y[1])
        according to the weight W into the frame ax.
    """
    t = np.linspace(0,1,20)  # free parameter to draw lines
    
    if W > 0:  # Color depending on the weight's sign
        col = ORANGE_COLOR
    else:
        col = BLUE_COLOR
    
    # fancy line from (X0, Y0) to (X1, Y1)
    xx = X[0] + t*(X[1] - X[0])  # Linear in horizontal
    yy = Y[0] + (3*t**2 - 2*t**3) * (Y[1] - Y[0])  # Round borders
    
    # plotting the line according to the weight
    ax.plot(xx, yy, alpha=min(1, abs(W)/vmax),
            color=col, linewidth=linewidth)

    
def plot_neuron_alpha(ax, X, Y, B, size=100.0, vmax=10):
    """ Draw a single neuron in position (X, Y) according to 
        the bias B, into the frame ax.
    """
    if B > 0:
        col = ORANGE_COLOR
    else:
        col = BLUE_COLOR
        
    ax.scatter([X], [Y], marker='o', color=col, alpha=min(1, abs(B)/vmax), 
               s=size, zorder=10)

    
def plot_neuron(ax, X, Y, B, size=100.0, vmax=10):
    """ Draw a single neuron in position (X, Y) at full opacity
        (only the sign of the bias B sets the color), into the frame ax.
    """
    if B > 0:
        col = ORANGE_COLOR
    else:
        col = BLUE_COLOR
        
    ax.scatter([X], [Y], marker='o', color=col, s=size, zorder=10)
    
    
def visualize_network(weights, biases, activations, M=400,
                      x0range=[-3,3], x1range=[-3,3],
                      size=400.0, linewidth=5.0, maxv=1.):
    """
    Visualize a neural network with 2 input 
    neurons and 1 output neuron (plot output vs input in a 2D plot)
    
    weights is a list of the weight matrices for the
    layers, where weights[j] is the matrix for the connections
    from layer j to layer j+1 (where j==0 is the input)
    
    weights[j][m,k] is the weight for input neuron k going to output neuron m
    (note: internally, m and k are swapped, see the explanation of
    batch processing in lecture 2)
    
    biases[j] is the vector of bias values for obtaining the neurons 
    in layer j+1, biases[j][k] is the bias for neuron k in layer j+1
    
    activations is a list of the activation functions for
    the different layers: choose 'linear','sigmoid', 
    'jump' (i.e. step-function), and 'reLU'
    
    M is the resolution (MxM grid)
    
    x0range is the range of the first input neuron values (horizontal axis)
    x1range is the range of the second input neuron values (vertical axis)
    """
    
    if type(weights) == str:
        weights = eval(weights)
    if type(biases) == str:
        biases = eval(biases)

    # Let's transpose the weights to enable batch processing
    swapped_weights = []
    for j in range(len(weights)):
        swapped_weights.append(np.transpose(weights[j]))
        
    # Let's create a set of input-pairs by means of a mesh grid
    x0, x1 = np.meshgrid(np.linspace(x0range[0], x0range[1], M),
                         np.linspace(x1range[0], x1range[1], M))
    x = np.zeros([M*M, 2])
    x[:, 0] = x0.flatten()
    x[:, 1] = x1.flatten()
    
    # Let's apply the NN
    y_out = apply_net(x, swapped_weights, biases, activations)

    # We will plot a diagram at left and the result at right
    fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(8,4))
    
    
    # For the diagram
    
    #  1: posX and posY are arrays containing the positions of neurons
    posX = [[0, 0]]  # same column for both (at left)    
    posY = [[-0.5, +0.5]]  # the 2 inputs, vertically centered

    
    vmax = 0.0 # for finding the maximum weight
    vmaxB = 0.0 # for maximum bias
    for j in range(len(biases)):  # for every layer on the NN
        n_neurons = len(biases[j])  # neurons in the current layer
        
        posX.append(np.full(n_neurons, j+1))  # next column to the previous one
        posY.append(np.array(range(n_neurons)) - 0.5 * (n_neurons-1)) # spread
        
        vmax = maxv#np.maximum(vmax, np.max(np.abs(weights[j])))  # to get the maximum
        vmaxB = maxv#np.maximum(vmaxB, np.max(np.abs(biases[j])))

    #   2: plot connections
    for j in range(len(biases)):  # for each layer
        for k in range(len(posX[j])):  # for each neuron
            for m in range(len(posX[j+1])):  # for each following neuron
                plot_connection_line(ax[0],  # first column of the plot
                                     [posX[j][k], posX[j+1][m]], # [X0,X1]
                                     [posY[j][k], posY[j+1][m]], # [Y0,Y1]
                                     swapped_weights[j][k,m],    # its weight
                                     vmax=vmax,  # to get normalized plots
                                     linewidth=linewidth)
    
    #   3: plot neurons
    for k in range(len(posX[0])):  # input neurons (have no bias!)
        plot_neuron(ax[0], posX[0][k], posY[0][k],
                    vmaxB, vmax=vmaxB, size=size)
        
    for j in range(len(biases)): # all other neurons
        for k in range(len(posX[j+1])):
            plot_neuron_alpha(ax[0], posX[j+1][k], posY[j+1][k],
                              biases[j][k], vmax=vmaxB, size=size)
    
    ax[0].axis('off')
    
    # now: the output of the network
    img = ax[1].imshow(np.reshape(y_out, [M,M]),
                       origin='lower',
                       extent=[x0range[0],x0range[1],x1range[0],x1range[1]])
    ax[1].set_xlabel('$x_1$')
    ax[1].set_ylabel('$x_2$')
    
    axins1 = inset_axes(ax[1],
                        width="40%",  # width = 40% of parent_bbox width
                        height="5%",  # height : 5%
                        loc='upper right')

    imgmin = np.min(y_out)
    imgmax = np.max(y_out)
    color_bar = fig.colorbar(img, cax=axins1, orientation="horizontal",
                             ticks=np.linspace(imgmin,imgmax,3))
    cbxtick_obj = plt.getp(color_bar.ax.axes, 'xticklabels')
    plt.setp(cbxtick_obj, color="white")
    axins1.xaxis.set_ticks_position("bottom")

    ax[1].set_title(' , '.join(activations))
    
    plt.show()

In the cell below we implement an example of a general network able to use a different activation function in each layer. Check it; it should be very similar to the functions you wrote above.

In [4]:
def apply_layer(x, w, b, activation):
    """ Batch processing of a single layer:
           x: input values  -> shape: [batchsize, n_neurons_in]
           w: weight matrix -> shape: [n_neurons_in, n_neurons_out]
           b: bias vector   -> length: n_neurons_out
    
           activation is some string of the following ones:
             - sigmoid
             - jump
             - linear
             - reLU
    
           returns the values of the output neurons in the next layer 
              -> shape: [batchsize, n_neurons_out]
    """
    z = np.dot(x, w) + b
    if activation == 'sigmoid':
        return 1 / (1 + np.exp(-z))
    elif activation == 'jump':
        return np.array(z>0, dtype='float')
    elif activation == 'linear':
        return z
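    elif activation == 'reLU':
        return (z > 0) * z
    else:  # fail loudly instead of silently returning None
        raise ValueError("unknown activation: " + str(activation))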

    
def apply_net(x, weights, biases, activations):
    """ Apply a whole network of multiple layers.
          x: input values  -> shape: [batchsize, n_neurons_in]
          weights, biases and activations must be iterables
          whose length is the number of layers, containing
              weight matrix  -> shape: [n_neurons_in, n_neurons_out]
              bias vector    -> length: n_neurons_out
              activation str -> sigmoid, jump, linear or reLU
          Alternatively, they can be extended matrices
          where a simple slicing generates the proper weight, 
          bias and activation.
    """
    y = x.copy()
    for j in range(len(biases)):
        y = apply_layer(y, weights[j], biases[j], activations[j])
    return y

### 4.2 No hidden layer NN

Let's visualize a simple network (no hidden layer) with different activation functions.

Notice that no hidden layer means that the weights, biases and activations are lists of one single item each.

Play with different combinations of weights to see their effect. Do the same with the bias and the activation function.

In [5]:
print("You should see here three sliders and a drop-down menu to select the weights, "
      "bias, and activation function.\nIf you do not see them, try to restart "
      "the Jupyter Notebook application.")

@interact(w1=(-10.,10.), w2=(-10.,10.), b=(-10.,10.), activation=["sigmoid","jump","linear","reLU"])
def draw(w1=-3.4, w2=4.6, b=2.8, activation='sigmoid'):
    weights=[ [      # a list of matrices (length 1 in this case)
        [w1, w2]  # from 2 input neurons to a single output neuron: 1x2
        ] ]

    biases=[   # a list of vectors (length 1 in this case)
        [b]  # bias for 1 single output neuron: 1 value
        ]
   
    visualize_network(weights, biases, [activation], maxv=10)

How are the weights and bias related to the resulting image? And the activation function?

### 4.3 Deep dense NN

Let's visualize a neural network with 1 hidden layer of 3 neurons using different activation functions.

In [6]:
@interact(activation_1=["sigmoid","jump","linear","reLU"],
          activation_2=["sigmoid","jump","linear","reLU"])
def draw(weights="[[[0.2, 0.9],[-0.5, 0.3],[0.8, -1]],[[-0.3,0.7,0.5]]]",
         biases="[[0.1, -0.5, -0.5],[-0.2]]",
         activation_1='jump', activation_2='sigmoid'):

    visualize_network(weights, biases, [activation_1, activation_2], maxv=1)



Check the inclination of the lines when there is one hidden layer. How many lines are there when using a jump-sigmoid combination? And with a sigmoid-jump combination? What about jump-jump? And sigmoid-sigmoid? How many lines appear in each combination, and what do they look like?

What happens if we set the first activation function as linear?

### 4.4 More layers and some activation function combination

In [7]:
# now more complicated, just for fun
@interact(activation_1=["sigmoid","jump","linear","reLU"],
          activation_2=["sigmoid","jump","linear","reLU"],
          activation_3=["sigmoid","jump","linear","reLU"])
def draw(w='[ [ [0.2, 0.9], [-0.5, 0.3], [0.8, -1.3],[-0.3, -0.9], [-0.8, -1.2]],'
           '  [ [0.2, 0.8,-0.6, -0.9, 0.3], [0.5, 0.1, 0.3, -0.7,-0.9], [-0.3, 0.7, 0.5, -0.3, 0.4]],'
           '  [ [-0.3, 0.7, -0.3] ]  ]',
         b='[[0.1, -0.5, -0.5, 0.3, 0.2], [0.2,-0.5,0.3], [0.5]]',
         activation_1='jump', activation_2='sigmoid', activation_3='reLU'):
   
    visualize_network(w, b, activations=[activation_1, activation_2, activation_3], maxv=1)
    


What is the effect of the weights and biases of the first layer? And of those of the last one?

Let's apply a `factor` to scale all weights and biases!

In [8]:
@interact(activation_1=["sigmoid","jump","linear","reLU"],
          activation_2=["sigmoid","jump","linear","reLU"],
          factor=(0., 60.))
def draw(weights="[[[0.2, 0.9],[-0.5, 0.3],[0.8, -1]],[[-0.3,0.7,0.5]]]",
         biases="[[0.1, -0.5, -0.5],[-0.2]]",
         activation_1='sigmoid', activation_2='linear',
         factor=10):
    
    weights = eval(weights)
    biases = eval(biases)

    # this needs np.array(), because you cannot do factor*<python-list>
    ws = [factor*np.array(matrix) for matrix in weights]
    bs = [factor*np.array(vector) for vector in biases]
    
    visualize_network(ws, bs, [activation_1, activation_2], maxv=60)



What happens when increasing the scale of the weights and biases while using the sigmoid function?

### 4.5 Something not random

Many superimposed lines can be used to construct arbitrary shapes, with only a single hidden layer.

In [9]:
@interact(factor=(1, 100), n_lines=(1, 20))
def draw(factor=10, n_lines=3):
    phi = np.linspace(0, 2*np.pi, n_lines+1)  # Angular variable
    phi = phi[:-1]  # the last value is 2pi, which is equivalent to the 0

    weight_hidden = np.zeros([n_lines, 2])   # comment this shape
    weight_hidden[:,0] = factor*np.cos(phi)  # x=cos(phi)
    weight_hidden[:,1] = factor*np.sin(phi)  # y=sin(phi)

    bias_hidden = np.full(n_lines, factor*(+0.5))  # all neurons act equally

    visualize_network(weights=[ 
                                weight_hidden,           # from input to hidden
                                np.full([1,n_lines],1.0) # from hidden to output
                              ],
                      biases=[ 
                                bias_hidden,
                                [0.0]
                             ],
                      activations=['sigmoid',  # activation for hidden
                                   'reLU'    # activation for output
                                  ],
                      size=30.0, maxv=60)


Play with different `factor` values above to see how the sigmoid behavior changes.

`n_lines` sets the number of lines in the figure, but what does it represent in the NN? Play with different numbers of lines.

Why are the weights made of sines and cosines?

What do the biases do here? Play with them.

Why are the weights from the hidden to the output layer all ones?

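Why can the output's activation function be linear? Note that the code above uses `reLU`, which acts as a linear function here because the sum of the sigmoid outputs is always positive.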

Draw a sharp and big six-pointed star using the code above.

Draw a blurred and small circle using the code above.