<img src="images/kiksmeisedwengougent.png" alt="Banner" width="1100"/>

<div>
    <font color=#690027 markdown="1">
<h1>RELU</h1>    </font>
</div>

<div class="alert alert-box alert-success">
To build a neural network, one or more <b>activation functions</b> are needed. ReLU is a commonly used activation function. It is a non-linear function that allows classes that are not linearly separable to be separated.</div>

### Importing the necessary modules

In [None]:
import pandas as pd

import matplotlib.pyplot as plt
import numpy as np

You will be working with 15 given points in the plane. Some points are blue, others are green. The coordinates and color of each point are given.<br>The points represent two classes.<br>The goal is to separate the green and blue points from each other.

<div>
    <font color=#690027 markdown="1">
<h2>1. Reading the data</h2>    </font>
</div>

Read the dataset using the `pandas` module.

In [None]:
points = pd.read_csv("data/data.csv", header=None)  # table to be read has no header row

<div>
    <font color=#690027 markdown="1">
<h2>2. Displaying the loaded data</h2>    </font>
</div>

View the data by executing the `points` instruction. The dataset consists of the x and y coordinates of the points and the color of each point. <br>The x and y coordinates are features, the color is a label. <br> Because there are two types of labels, it is said that the points are distributed over **two classes**.

In [None]:
points

This table has 15 rows and 3 columns: after all, there are 15 points, 2 features and 1 label. <br><br>The features:
- first column: x-coordinate;
- second column: y-coordinate.

The label:
- third column: color.

<div class="alert alert-box alert-info">
In machine learning, two features are usually represented by x1 and x2 and the label by y.</div>
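
As a small illustration of this naming convention, the sketch below reads a header-less table and names its columns `x1`, `x2` and `y`. The three rows are invented for the example and are not taken from `data/data.csv`; the way the color is stored (here as text) is also an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for the data file: three made-up rows, no header line.
csv_text = "1.0,5.0,blue\n3.0,0.0,green\n-2.0,1.0,green\n"

# header=None: the first line is data; names=...: label the columns ourselves.
sample = pd.read_csv(io.StringIO(csv_text), header=None, names=["x1", "x2", "y"])

print(sample.shape)           # number of rows and columns
print(list(sample.columns))   # the column names chosen above
```

The notebook itself keeps the default column indices 0, 1 and 2; naming the columns is only a convenience.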

<div>
    <font color=#690027 markdown="1">
<h2>3. Investigating whether the points can be separated from each other</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">
<h3>3.1 Visualizing the data</h3>    </font>
</div>

To visualize the data, you need the x and y coordinates, so the features x1 and x2, of the points.

In [None]:
x1 = points[0]            # x-coordinate is in column with index 0
x2 = points[1]            # y-coordinate is in column with index 1
x1 = np.array(x1)         # adjust format
x2 = np.array(x2)
X = np.stack((x1, x2), axis = 1)    # correct format, axis=1 sets x1 and x2 as columns

In [None]:
# let's take a look
print(x1)
print(x2)
print(X)
print(X.shape)
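
To see what `np.stack` with `axis=1` does, here is a minimal sketch on two short arrays. The values are hypothetical and chosen only to make the resulting shape easy to read.

```python
import numpy as np

# Hypothetical coordinates, chosen only to show the shapes involved.
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

stacked = np.stack((a, b), axis=1)   # one row per point, one column per feature
print(stacked)
print(stacked.shape)                 # (3, 2)
```

For the 15 points of the dataset, the same call is expected to give a matrix with 15 rows and 2 columns.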

<div>
    <font color=#690027 markdown="1">
<h3>3.2 Displaying the data in a scatter plot</h3>    </font>
</div>

In [None]:
plt.figure()

plt.scatter(x1[:6], x2[:6], color="blue", marker="x")
plt.scatter(x1[6:], x2[6:], color="green", marker="<")

plt.show()

It is clear that these points are **not linearly separable**: it is impossible to find one straight line that separates the green points from the blue ones.<br>It does work with two half-lines or one curve.

You can, for example, construct two half-lines that both start at $(1,1)$; the left half-line goes through $(-2,10)$ and the right one through $(2,8)$. The lines carrying them have the respective equations $y=-3x+4$ and $y=7x-6$.
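
The two equations can be checked with a short calculation. The helper `line_through` below is a hypothetical name introduced for this sketch; it computes the slope and intercept of the line through two given points.

```python
def line_through(p, q):
    """Return slope and intercept of the non-vertical line through points p and q."""
    (px, py), (qx, qy) = p, q
    slope = (qy - py) / (qx - px)
    intercept = py - slope * px
    return slope, intercept

left = line_through((1, 1), (-2, 10))
right = line_through((1, 1), (2, 8))
print(left)    # (-3.0, 4.0)
print(right)   # (7.0, -6.0)
```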

<div>
    <font color=#690027 markdown="1">
<h2>4. Classification</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">
<h3>4.1 Decision boundary</h3>    </font>
</div>

As shown in the following script, one can indeed create a separation with two half-lines.

To know for which x-values the separating lines are best drawn, we look at the range on the x-axis.

In [None]:
# range x-axis
print(x1.min(), x1.max())

In [None]:
# separation ('decision boundary')
# dividing lines are determined by points on relevant straight lines
x_1 = np.linspace(-3, 1, 10)   # line segment on domain [-3, 1]
x_2 = np.linspace(1, 3, 10)    # line segment on domain [1, 3]
y_r_1 = 7 * x_2 - 6            # equation of increasing straight line
y_r_2 = -3 * x_1 + 4           # equation of a descending line

plt.figure()

# point cloud data
plt.scatter(x1[:6], x2[:6], color="blue", marker="x")
plt.scatter(x1[6:], x2[6:], color="green", marker="<")
# plotting dividing lines
plt.plot(x_2, y_r_1, color="black")
plt.plot(x_1, y_r_2, color="black")

plt.show()

<div class="alert alert-box alert-info">
The black border is called the <em>decision boundary</em>.</div>

<div>
    <font color=#690027 markdown="1">
<h3>4.2 Visualizing the two classes</h3>    </font>
</div>

The *decision boundary* creates two areas. These areas can be visualized with two different colors.

In [None]:
x_1 = np.linspace(-3.5, 1, 10)
x_2 = np.linspace(1, 3.5, 10)
y_r_1 = 7 * x_2 - 6
y_r_2 = -3 * x_1 + 4
# grid within the graph screen with resolution = 0.2
xx1 = np.arange(x1.min()-1, x1.max()+1, 0.2)
xx2 = np.arange(x2.min()-1, x2.max()+2, 0.2)

plt.figure()

# point cloud data
plt.scatter(x1[:6], x2[:6], color="blue", marker="x")
plt.scatter(x1[6:], x2[6:], color="green", marker="<")
# plot dividing lines
plt.plot(x_2, y_r_1, color="black")
plt.plot(x_1, y_r_2, color="black")
# colored areas: each point (a,b) is assigned a color
for a in xx1:    
    for b in xx2:
        if (7 * a - b - 6 <= 0) and (-3 * a - b + 4 <= 0):
            coloring = "lightblue"
        else:
            coloring = "lightgreen"
        plt.plot(a, b, marker='.', color=coloring)        
plt.show()

The points in the light blue area belong to one class and the points in the light green area belong to the other class.
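
The condition used for the coloring can also be tried on individual points. The function name `in_blue_region` and the test points are assumptions made for this sketch; the two inequalities are the ones from the script above.

```python
def in_blue_region(a, b):
    """True if (a, b) lies on or above both dividing lines."""
    return (7 * a - b - 6 <= 0) and (-3 * a - b + 4 <= 0)

print(in_blue_region(1, 5))    # above both lines
print(in_blue_region(3, 0))    # below the ascending line
print(in_blue_region(-2, 1))   # below the descending line
print(in_blue_region(1, 1))    # the point where the two half-lines meet
```

Because both inequalities use `<=`, points on the decision boundary itself are counted as light blue.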

<div>
    <font color=#690027 markdown="1">
<h2>5. Classification with ReLU</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">
<h3>5.1 ReLU</h3>    </font>
</div>

The ReLU function is a *non-linear function*. It is defined *piecewise*.
$$ReLU(x) = \max(0,x)$$

so
$$ReLU: \begin{cases} x \longmapsto 0 \;,  \; x < 0 \\ 
        x \longmapsto x \;,  \; x \geq 0 \end{cases}  $$  

ReLU stands for *rectified linear unit*.

The graph of the ReLU function:

<img src="images/relu.png" alt="Banner" width="400"/>

ReLU thus sets all negative values to zero.
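
A minimal sketch of this behavior with NumPy, using a few hypothetical input values:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element by element."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values))   # negatives become 0, the rest is unchanged
```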

<div>
    <font color=#690027 markdown="1">
<h3>5.2 Separating non-linearly separable data using ReLU</h3>    </font>
</div>

Below, the code is adjusted to use the ReLU function. This makes it clear that with ReLU, data that is not linearly separable can still be divided into different areas.<br>The light blue area gets label '0' and the light green area gets label '1'.<br>The Heaviside function is used to determine which class a point belongs to.

<img src="images/schemanb.png" alt="Banner" width="900"/>

Explanation: <br>y = 7 x - 6 is the equation of the ascending line, i.e. 7 x - y - 6 = 0, and in the blue area 7 x - y - 6 is always <= 0;<br>y = -3 x + 4 is the equation of the descending line, i.e. -3 x - y + 4 = 0, and in the blue area -3 x - y + 4 is always <= 0.

Substituting the coordinates of the given points (the input) into the equations of the dividing lines gives: <br>z1 = 7 x1 - x2 - 6, and in the blue area this is always <= 0;<br>z2 = -3 x1 - x2 + 4, and in the blue area this is always <= 0.<br>So in the blue area, both z1 and z2 are <= 0; in the green area, at least one of them is > 0.

Then apply the activation function ReLU:<br>h1 = relu(z1), and in the blue area this is always 0; <br>h2 = relu(z2), and in the blue area this is always 0. <br>h1 and h2 are the neurons of the hidden layer.<br>So in the blue area, both h1 and h2 are 0; in the green area, at least one of them is positive.

This means that if one adds up those neurons h1 and h2, the sum for the blue area is always 0 and for the green area it is not.

The model has three *layers*: an input layer, one hidden layer and an output layer.
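
The steps above can be followed by hand for a single point. The sketch below does this for two hypothetical points, one above both lines and one below the ascending line; the function name `classify` is an assumption made for the example.

```python
import numpy as np

def classify(x1, x2):
    """Return z1, z2, h1, h2 and the class for one point (x1, x2)."""
    z1 = 7 * x1 - x2 - 6
    z2 = -3 * x1 - x2 + 4
    h1, h2 = np.maximum(0, z1), np.maximum(0, z2)
    label = np.heaviside(h1 + h2, 0)     # 0 when the sum is 0, 1 when it is positive
    return z1, z2, h1, h2, label

print(classify(1.0, 5.0))   # hypothetical point above both lines
print(classify(3.0, 0.0))   # hypothetical point below the ascending line
```

For the first point both z-values are negative, so the sum of the hidden neurons is 0 and the class is 0. For the second point z1 is positive, so the sum is positive and the class is 1.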

In [None]:
def relu(x):
    """ReLU(x) = max(x,0). """
    return np.maximum(x,0)

def hidden(x, y):
    "Neurons of hidden layer."
    h1 = relu(7 * x - y - 6)
    h2 = relu(-3 * x - y + 4)    
    return h1, h2

def output(x, y):
    "Classification."
    klasse = np.heaviside(sum(hidden(x, y)), 0)
    return klasse

# decision boundary
x_1 = np.linspace(-3.5, 1, 10)
x_2 = np.linspace(1, 3, 10)
y_r_1 = 7 * x_2 - 6
y_r_2 = -3 * x_1 + 4

# grid with resolution 0.2
xx1 = np.arange(x1.min()-1, x1.max()+1, 0.2)
xx2 = np.arange(x2.min()-1, x2.max()+4, 0.2)

plt.figure()

# point cloud data
plt.scatter(x1[:6], x2[:6], color="blue", marker="x")
plt.scatter(x1[6:], x2[6:], color="green", marker="<")
# dividing lines
plt.plot(x_2, y_r_1, color="black")
plt.plot(x_1, y_r_2, color="black")
# coloring areas
for a in xx1: 
    for b in xx2:        
        if output(a, b) == 0:
            coloring = "lightblue"
        else:
            coloring = "lightgreen"
        plt.plot(a, b, marker='.', color=coloring)
plt.show()

<div>
    <font color=#690027 markdown="1">
<h3>5.3 Functioning of ReLU: from non-linearly separable to linearly separable</h3>    </font>
</div>

In [None]:
plt.figure()   

plt.scatter(x1[:6], x2[:6], color="blue", marker="x")
plt.scatter(x1[6:], x2[6:], color="green", marker="<")

plt.show()

The points are **not linearly separable**.

In [None]:
def relu(tensor):
    """Relu(x) = max(0,x)."""    
    return np.maximum(0, tensor)
# y = 7 * x - 6 equation rising straight line 7 * x - y - 6  = 0   in the blue area this is always <= 0
# y = -3 * x + 4  equation of descending straight line   -3 * x - y + 4 = 0   in blue area this is always <= 0

# filling in given points (input) in equations of dividing lines
z1 = 7* x1 - x2  - 6    # in blue area this is always <= 0
z2 = -3 *x1 - x2 + 4    # in the blue area this is always <= 0

# applying activation function ReLU
h1 = relu(z1)       # neuron hidden layer; in blue area this is definitely 0
h2 = relu(z2)       # neuron hidden layer; in blue area this is definitely 0
# in the blue area both h1 and h2 are 0 but not in the green area

print(h1)
print(h2)

In [None]:
plt.figure(figsize=(16,5))

plt.subplot(1,2,1)                                        # plot with multiple images
plt.scatter(x1[:6], h1[:6], color="blue", marker="x")
plt.scatter(x1[6:], h1[6:], color="green", marker="<")
plt.title("Substitute points into eq. of ascending line\nand apply ReLU to the result")
plt.xlabel("x1")
plt.ylabel("h1")

plt.subplot(1,2,2)
plt.scatter(x1[:6], h2[:6], color="blue", marker="x")
plt.scatter(x1[6:], h2[6:], color="green", marker="<")
plt.title("Substitute points into eq. of descending line\nand apply ReLU to the result")
plt.xlabel("x1")
plt.ylabel("h2")

plt.show()

In [None]:
# to output layer; neurons that come in as input to the output layer are added up
# in the blue area the sum is 0 and in the green it is not
n_output = h1 + h2   # applying heaviside function (threshold value 0) results in classes 0 and 1, n_output = z'
print(n_output)

In [None]:
plt.figure()

plt.scatter(x1[:6], n_output[:6], color="blue", marker="x")
plt.scatter(x1[6:], n_output[6:], color="green", marker="<")

plt.show()

These points are linearly separable and can therefore be assigned to a class.

In [None]:
# apply activation function to this: applying the heaviside function (threshold value 0) yields classes 0 and 1
klasse = np.heaviside(n_output, 0)
print(klasse)

The first 6 points are the blue points; so indeed, they are classified under class 0. The other points are the green points and they are classified under class 1.
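
The second argument of `np.heaviside` is the value returned for an input of exactly 0. Since the blue points give a sum of exactly 0, this argument decides their class. A small sketch with hypothetical sums:

```python
import numpy as np

sums = np.array([-1.0, 0.0, 2.5])   # hypothetical values, not taken from the dataset

print(np.heaviside(sums, 0))   # [0. 0. 1.]  -> a sum of exactly 0 falls in class 0
print(np.heaviside(sums, 1))   # [0. 1. 1.]  -> the same sum would fall in class 1
```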

<div>
    <font color=#690027 markdown="1">
<h3>5.4 Layered Model</h3>    </font>
</div>

This is the structure of a neural network: a layered model. The model has three *layers*: an input layer, one hidden layer and an output layer.

The model is determined by the number of neurons, the weights, the biases (the constants added to the weighted sums) and the activation functions.

In the model, there are connections between the two neurons of the input layer and the two neurons of the hidden layer. The weights of these connections are the coefficients in the equations of the separating lines, and the *biases* are the constant terms in these equations. The activation function there is ReLU.<br>There are also connections between the two neurons of the hidden layer and the neuron of the output layer. The weights of these connections are 1, so the outputs of the hidden neurons are added together. Finally, the model assigns a class to the data using the Heaviside function.<br>

To understand the code that follows, it helps to switch to matrix notation.

$ W = \begin{bmatrix} -6 & 4 \\ 7 & -3 \\ -1 & -1 \end{bmatrix}$; -6 and 4 are the biases.<br>$ W' = \begin{bmatrix} 0 \\ 1  \\  1 \end{bmatrix} $; 0 is the bias. <br>$ X = \begin{bmatrix} 1 \\ x1  \\  x2 \end{bmatrix} $

From input layer to hidden layer: $ W^{T} \cdot X $. The result is a matrix Z.<br>The activation function ReLU is applied to the matrix Z and an additional row is added to the result, thus obtaining the matrix H of the hidden layer.<br>From hidden layer to output layer: $ W'^{T} \cdot H $ followed by the Heaviside function.

This can also be written as follows:

$ W_{ih} = \begin{bmatrix} 7 & -3 \\ -1 & -1 \end{bmatrix}$ and $B_{ih}  = \begin{bmatrix} -6 & 4 \end{bmatrix} $ <br>$ W_{ho} = \begin{bmatrix} 1  \\  1 \end{bmatrix} $ <br>$ X_{i} = \begin{bmatrix} x1  \\  x2 \end{bmatrix} $

From input layer to hidden layer: $ W_{ih}^{T} \cdot X_{i} + B_{ih}^{T} $ followed by ReLU. The result is a matrix $X_{h}$.<br>From hidden layer to output layer: $ W_{ho}^{T} \cdot X_{h} $ followed by the Heaviside function.

The model is a *feed forward* model and is *fully connected*.
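
The two notations can be compared numerically. The sketch below evaluates both for one hypothetical point and assumes that the additional row added to the hidden layer is a row of ones, so that it pairs with the bias entry of $W'$. The variable names are chosen for this example.

```python
import numpy as np

W = np.array([[-6, 4], [7, -3], [-1, -1]])   # first row holds the biases
W_out = np.array([[0], [1], [1]])            # W' in the text; first entry is the bias
W_ih = np.array([[7, -3], [-1, -1]])
B_ih = np.array([[-6, 4]])

x1, x2 = 3.0, 0.0                            # hypothetical point
X = np.array([[1.0], [x1], [x2]])            # input with a leading 1 for the bias
X_i = np.array([[x1], [x2]])

Z = W.T @ X                                  # augmented form
Z_alt = W_ih.T @ X_i + B_ih.T                # weights and bias kept separate
H = np.vstack(([[1.0]], np.maximum(0, Z)))   # ReLU, then the assumed extra row of ones
out = np.heaviside(W_out.T @ H, 0)

print(Z.ravel(), Z_alt.ravel())
print(out)
```

Both forms give the same values for the hidden layer before ReLU; the augmented form simply folds the biases into the weight matrix.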

In [None]:
def relu(tensor):
    """ReLU(x) = max(0, x)."""
    return np.maximum(0, tensor)

def output(x):
    """Classification with the Heaviside function (threshold value 0)."""
    klasse = np.heaviside(x, 0)      # 'class' is a reserved word in Python
    return klasse

class Model:
    """Model with three layers: two input neurons, two hidden neurons, one output neuron."""

    def __init__(self):
        """Set the fixed weights, biases and activation functions of the model."""
        self.wih = np.array([[7, -3], [-1, -1]])   # weights between input layer and hidden layer
        self.biasih = np.array([[-6, 4]])          # bias between input layer and hidden layer
        self.who = np.array([[1], [1]])            # weights between hidden layer and output layer
        self.activation_functionh = relu
        self.activation_functiono = output

    def predict(self, features):
        """Predict the class of each point in features (one row per point)."""
        inputs = features.T                                           # after transposing, x1 and x2 are underneath each other
        hidden_inputs = np.dot(self.wih.T, inputs) + self.biasih.T    # linear combination of inputs with corresponding weights
        hidden_outputs = self.activation_functionh(hidden_inputs)     # relu is activation function in hidden layer
        final_inputs = np.dot(self.who.T, hidden_outputs)             # linear combination of hidden layer output with corresponding weights
        final_outputs = self.activation_functiono(final_inputs)       # classification with output
        return final_outputs

Create a model of the `Model` class.

In [None]:
model = Model()      # constructing model via constructor

In [None]:
# testing the model
print("Enter the coordinates of a point.")
co_x = float(input("x-coordinate is: "))
co_y = float(input("y-coordinate is: "))
X = np.array([[co_x, co_y]])
print(X)
model.predict(X)

<div>
    <font color=#690027 markdown="1">
<h3>5.5 Visualizing Classes</h3>    </font>
</div>

In [None]:
# create grid with resolution 0.2
x_1 = np.linspace(-3.5, 1, 10)
x_2 = np.linspace(1, 3, 10)
xx1 = np.arange(x1.min()-1, x1.max()+1, 0.2)
xx2 = np.arange(x2.min()-1, x2.max()+4, 0.2)

plt.figure()

# data points
plt.scatter(x1[:6], x2[:6], color="blue", marker="x")
plt.scatter(x1[6:], x2[6:], color="green", marker="<")

# assign correct color to each point in grid
for a in xx1:    
    for b in xx2:
        X = np.array([[a, b]])          # matrix with 1 row        
        if model.predict(X) == 0:
            coloring = "lightblue"
        else:
            coloring = "lightgreen"        
        plt.plot(a, b, marker='.', color=coloring)
        
plt.show()
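
Because the forward pass consists of matrix operations, all grid points can also be classified in one call instead of one by one. The following is a sketch under assumptions: the function name `predict_batch` and the grid limits are hypothetical, and the weights are the ones given in the matrix notation of section 5.4.

```python
import numpy as np

W_ih = np.array([[7, -3], [-1, -1]])
B_ih = np.array([[-6, 4]])
W_ho = np.array([[1], [1]])

def predict_batch(features):
    """Classify all rows of an (n, 2) array in one forward pass."""
    hidden = np.maximum(0, W_ih.T @ features.T + B_ih.T)
    return np.heaviside(W_ho.T @ hidden, 0).ravel()

# Hypothetical grid limits; the notebook derives them from the data.
aa, bb = np.meshgrid(np.arange(-4, 4, 0.2), np.arange(-1, 12, 0.2))
grid = np.column_stack((aa.ravel(), bb.ravel()))
classes = predict_batch(grid)

print(grid.shape, classes.shape)
```

The resulting array holds one class per grid point and could, for example, be passed to `plt.scatter` through its `c` argument to color the whole grid in a single call.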

<div class="alert alert-box alert-info">
The data consists of points with two <b>features</b> and a corresponding <b>label</b>. The label can take two values; there are two <b>classes</b>. A boundary between the classes is a <b>decision boundary</b>. <br>The model is a neural network with an <b>input layer</b>, a <b>hidden layer</b> with ReLU activation function and an <b>output layer</b> with Heaviside activation function. <br>    
The classes are not linearly separable, but can still be separated from each other using the <b>non-linear function ReLU</b>.</div>

<div>
<h2>With support from</h2></div>

<img src="images/kikssteun.png" alt="Banner" width="1100"/>

<img src="images/cclic.png" alt="Banner" align="left" width="80"/><br><br>
Notebook KIKS, see <a href="http://www.aiopschool.be">AI At School</a>, by F. wyffels & N. Gesquière, is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license</a>.