# Perceptron

Problem: Classifying 2-d points into 2 classes

For this problem we are going to use the Perceptron algorithm:

Perceptron characteristics:
- The simplest neural network
- One McCulloch-Pitts neuron
- Supervised
- Calculates the decision boundary between two classes
- Calculating the predicted class:
    - Activation value u: the weighted sum of the inputs, with weights W and bias w<sub>0</sub> (for 2-d points)
    \begin{equation}
    u = w_1 * x_1 + w_2 * x_2 + w_0
    \end{equation}
    - Step function:
    ![Step functions](step_function.png)
    - If classes are -1 and 1, the step function classifies a point to class -1 if u is less than 0, and to class 1 if u is larger than or equal to 0
    - If classes are 0 and 1, the step function classifies a point to class 0 if u is less than 0, and to class 1 if u is larger than or equal to 0
    \begin{equation}
    \hat{y} = step(u) = step(w_1 * x_1 + w_2 * x_2 + w_0)
    \end{equation}
- Perceptron neuron:
![Perceptron](https://images.deepai.org/glossary-terms/perceptron-6168423.jpg)
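
The prediction step described above can be sketched in a few lines of NumPy. This is only an illustration: the helper names `step` and `predict` and the hand-picked weights are assumptions, not part of the notebook's code, and the classes are taken to be 0 and 1.

```python
import numpy as np

def step(u):
    """Step function for classes 0 and 1: 1 where u >= 0, otherwise 0."""
    return np.where(u >= 0, 1, 0)

def predict(X, w, w0):
    """Compute u = w_1*x_1 + w_2*x_2 + w_0 for each row of X, then apply the step."""
    u = X @ w + w0
    return step(u)

# Hypothetical weights chosen by hand: the boundary is the line x_1 + x_2 = 1
w = np.array([1.0, 1.0])
w0 = -1.0
X = np.array([[0.1, 0.2],   # u = -0.7, below the boundary
              [0.8, 0.9]])  # u =  0.7, above the boundary
y_hat = predict(X, w, w0)
print(y_hat)  # [0 1]
```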

Training:
- Initialize weights W to random values
- For each training point / example:
    - Calculate the activation value u
    - Apply the step function on the value of u for calculating the predicted class
    - If the output differs from the target, e.g. predicted class 0 while the target is 1, adjust the weights so that the prediction moves toward the target y:
    \begin{equation}
    w^k = w^{k-1} + \beta * (y - \hat{y}) * x
    \end{equation}
    where β is the learning rate and k counts the weight updates. The bias w<sub>0</sub> is updated in the same way, with x = 1.
- For linearly separable data, the algorithm converges after a finite number of updates: once all points are classified correctly, the weights stop changing.
- For data that are not linearly separable, the algorithm can never converge, so it keeps adjusting the weights until the maximum number of epochs is reached.
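
The training steps above can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not scikit-learn's implementation: the function name `train_perceptron`, the initialization range, the default learning rate and the toy data are all chosen for illustration only.

```python
import numpy as np

def train_perceptron(X, y, beta=0.1, max_epochs=100, seed=0):
    """Train a single perceptron on classes 0/1; return (w, w0, epochs_run)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # random initial weights (assumed range)
    w0 = rng.uniform(-0.5, 0.5)              # random initial bias
    for epoch in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            y_hat = 1 if x_i @ w + w0 >= 0 else 0    # step(u)
            if y_hat != y_i:
                # Update rule: w <- w + beta * (y - y_hat) * x, bias with x = 1
                w += beta * (y_i - y_hat) * x_i
                w0 += beta * (y_i - y_hat)
                errors += 1
        if errors == 0:          # a full pass without mistakes: converged
            return w, w0, epoch + 1
    return w, w0, max_epochs     # stopped by the epoch limit

# Toy linearly separable data, made up for this example
X = np.array([[0.1, 0.1], [0.2, 0.3], [0.8, 0.7], [0.9, 0.9]])
y = np.array([0, 0, 1, 1])
w, w0, epochs = train_perceptron(X, y, max_epochs=1000)
pred = (X @ w + w0 >= 0).astype(int)
```

On separable data such as this, the loop exits early through the `errors == 0` check; on non-separable data it would only stop at `max_epochs`.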

------------------

Architecture:
- One McCulloch-Pitts neuron

We import the libraries we are going to need

In [1]:
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns

For the needs of our problem, we define a make_points function that creates the data points. The coordinates of all points are drawn from a uniform distribution.

The function accepts the following parameters:
+ low: array with the minimum values of each dimension
+ high: array with the maximum values of each dimension
+ dims: number of dimensions
+ n: number of points to be created
+ target: class of the created points

In [2]:
def make_points(low, high, dims, n, target):
    points = []
    for i in range(dims):
        points.append(np.random.uniform(low[i], high[i], n))
    points.append(np.ones(n) * target)
    
    return np.column_stack(points)
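
As a quick check of what make_points returns, the snippet below calls it for five 2-d points of class 1. The function body is repeated so the snippet runs on its own; the seed and the argument values are arbitrary choices for this example.

```python
import numpy as np

def make_points(low, high, dims, n, target):
    # Same logic as the notebook cell above, repeated for self-containment
    points = [np.random.uniform(low[i], high[i], n) for i in range(dims)]
    points.append(np.ones(n) * target)
    return np.column_stack(points)

np.random.seed(0)  # seed chosen only to make the example repeatable
pts = make_points([0, 0], [0.3, 0.3], 2, 5, 1)
print(pts.shape)   # (5, 3): two coordinate columns plus the class column
print(pts[:, 2])   # the class column, all ones for target=1
```

Each row is one point, so the first `dims` columns are the features and the last column is the target.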

We also define a plot_training function, which displays the plots for each epoch (iteration) of the training process. For this function and similar ones in the rest of the notebooks, the number of epochs is always set to 100.

Besides displaying the plots, plot_training also runs the training itself.

It draws three plots:
- Static plot with the points and their true classes; not updated
- Dynamic plot with the points and their predicted classes, as well as the moving decision boundary
- Dynamic plot with the predicted class of each point against its index

In [3]:
def plot_training(model, X_train, t_train):
    fig, axes = plt.subplots(3, 1)
    for ax in axes:
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
    axes[0].set_xlim(-0.1, 1)
    axes[0].set_ylim(-0.1, 1)
    axes[1].set_xlim(-0.1, 1)
    axes[1].set_ylim(-0.1, 1)
    axes[2].set_xlim(-0.2, 1.1)
    axes[2].set_ylim(-0.2, 1.1)

    unique_classes = np.unique(t_train)
    n_iter = 100
    num_train_samples = len(X_train)
    
    # Plot 1 - static
    sns.scatterplot(x=X_train[:,0], y=X_train[:,1], 
                    hue=t_train, style=t_train,
                    ax=axes[0])
    
    for i in range(n_iter):
        for ax in axes[1:]:
            ax.clear()

        axes[0].set_title('Epoch ' + str(i + 1) + ' out of ' + str(n_iter))
        model = model.partial_fit(X_train, t_train, unique_classes)

        train_pred = model.predict(X_train)
        
        # Plot 2
        sns.scatterplot(x=X_train[:,0], y=X_train[:,1], 
                        hue=train_pred, style=train_pred,
                        ax=axes[1])
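        # Decision boundary: the line w1*x + w2*y + b = 0, rewritten as y = m*x + c
        b = model.intercept_[0]
        w1, w2 = model.coef_.T
        c = -b / w2   # intercept of the line (assumes w2 != 0)
        m = -w1 / w2  # slope of the line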
        xd = np.array([-0.1, 1])
        yd = m * xd + c
        axes[1].plot(xd, yd)
        
        # Plot 3
        sns.scatterplot(x=range(num_train_samples), y=train_pred,
                        hue=train_pred, style=train_pred,
                        ax=axes[2])
        
        fig.canvas.draw()
        plt.pause(0.1)
        
    return model
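
The decision boundary drawn in the second plot is the set of points where u = 0, i.e. w1*x + w2*y + b = 0, which gives y = -(w1/w2)*x - b/w2. The helper below is a hypothetical stand-alone version of that computation (the name `boundary_line` and the explicit check for w2 == 0 are assumptions added for illustration, not part of plot_training).

```python
import numpy as np

def boundary_line(w1, w2, b, xs):
    """Return the y values of the line w1*x + w2*y + b = 0 at the given xs.

    Assumes w2 != 0; a vertical boundary cannot be written as y = m*x + c.
    """
    if w2 == 0:
        raise ValueError("w2 is zero: the boundary is vertical")
    m = -w1 / w2   # slope
    c = -b / w2    # intercept
    return m * np.asarray(xs, dtype=float) + c

xs = np.array([-0.1, 1.0])
ys = boundary_line(1.0, 1.0, -1.0, xs)  # the line x + y = 1
print(ys)
```

Every returned point satisfies w1*x + w2*y + b = 0, which is an easy way to sanity-check the slope and intercept.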

The problem is split into 4 parts:
1. Linearly separable data
2. Non-linearly separable data 1
3. Non-linearly separable data 2
4. Non-linearly separable data 3

In the following sections, we apply the same procedure to each part of the problem.

### Linearly separable data

Define the number of points

In [4]:
n = 200

Create the points of each class with make_points

In [5]:
lin_separable = np.vstack((make_points([0, 0], [0.3, 0.3], 2, n//2, 0),
                           make_points([0.7, 0.7], [0.9, 0.9], 2, n//2, 1)))

Split into training and test sets

In [6]:
X_train, X_test, t_train, t_test = train_test_split(lin_separable[:,:2], lin_separable[:,2], 
                                                   stratify=lin_separable[:,2])

Sort the points by class, which makes the third plot easier to read.

In [7]:
X_train = X_train[t_train.argsort()]
t_train.sort()
X_test = X_test[t_test.argsort()]
t_test.sort()

We create the model with scikit-learn's Perceptron, optionally setting the maximum number of epochs (max_iter) and the learning rate (eta0).

In [8]:
model = Perceptron(max_iter=1, eta0=1e-10)
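
Since plot_training trains the model with partial_fit, each call performs a single pass over the data, and the number of epochs is set by the loop rather than by max_iter. The sketch below shows that pattern without any plotting; the synthetic data, the seed, `random_state=0` and the choice of 10 epochs are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Two synthetic clusters, similar in spirit to the linearly separable case
rng = np.random.default_rng(0)
X = np.vstack((rng.uniform(0.0, 0.3, (20, 2)),
               rng.uniform(0.7, 0.9, (20, 2))))
t = np.repeat([0, 1], 20)

model = Perceptron(random_state=0)
classes = np.unique(t)           # partial_fit needs the class list on the first call
accuracy_per_epoch = []
for epoch in range(10):          # one partial_fit call = one pass over the data
    model.partial_fit(X, t, classes=classes)
    accuracy_per_epoch.append(model.score(X, t))

print(accuracy_per_epoch)
```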

We train the model with plot_training, which redraws the plots at each epoch

In [9]:
%matplotlib notebook
model = plot_training(model, X_train, t_train)


Finally, we evaluate the model on the test data and plot the predicted classes (red circles) against the true classes (blue dots) to see how well the points are classified

In [10]:
num_test_samples = len(X_test)

In [11]:
test_pred = model.predict(X_test)

In [12]:
# %matplotlib inline
plt.figure(figsize=(8,8))
plt.scatter(range(num_test_samples), test_pred, c='r', marker='o')
plt.scatter(range(num_test_samples), t_test, c='b', marker='.')

plt.show()
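
The plot can be complemented with a number. A minimal sketch, assuming `t_test` and `test_pred` are 1-d arrays of equal length as in the cells above (the values below are hypothetical stand-ins so the snippet runs on its own):

```python
import numpy as np

# Hypothetical stand-ins for t_test and test_pred from the cells above
t_test = np.array([0, 0, 0, 1, 1, 1])
test_pred = np.array([0, 0, 1, 1, 1, 1])

accuracy = np.mean(test_pred == t_test)               # fraction of correct predictions
misclassified = np.flatnonzero(test_pred != t_test)   # indices of the wrong predictions
print(accuracy, misclassified)
```

In the scatter plot, the misclassified indices are exactly the positions where the red circle and the blue dot do not overlap.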


### Linearly inseparable data 1

We follow the same procedure for all other parts of the problem

In [13]:
n = 400

In [14]:
non_separable1 = np.vstack((make_points([0, 0], [0.3, 0.3], 2, n//2, 0),
                            make_points([0, 0.4], [0.3, 0.9], 2, n//4, 1),
                            make_points([0.4, 0], [0.9, 0.9], 2, n//4, 1)))

In [15]:
X_train, X_test, t_train, t_test = train_test_split(non_separable1[:,:2], non_separable1[:,2], 
                                                    stratify=non_separable1[:,2])

In [16]:
X_train = X_train[t_train.argsort()]
t_train.sort()
X_test = X_test[t_test.argsort()]
t_test.sort()

In [17]:
model = Perceptron(max_iter=1)

In [18]:
%matplotlib notebook
model = plot_training(model, X_train, t_train)


In [19]:
num_test_samples = len(X_test)

In [20]:
test_pred = model.predict(X_test)

In [21]:
# %matplotlib inline
plt.figure(figsize=(8,8))
plt.scatter(range(num_test_samples), test_pred, c='r', marker='o')
plt.scatter(range(num_test_samples), t_test, c='b', marker='.')

plt.show()


### Linearly inseparable data 2

In [22]:
n = 400

In [23]:
non_separable2 = np.vstack((make_points([0.4, 0.4], [0.6, 0.6], 2, n//2, 0),
                            make_points([0, 0], [0.9, 0.3], 2, n//8, 1),
                            make_points([0, 0.7], [0.9, 0.9], 2, n//8, 1),
                            make_points([0, 0], [0.3, 0.9], 2, n//8, 1),
                            make_points([0.7, 0], [0.9, 0.9], 2, n//8, 1),))

In [24]:
X_train, X_test, t_train, t_test = train_test_split(non_separable2[:,:2], non_separable2[:,2], 
                                                    stratify=non_separable2[:,2])

In [25]:
X_train = X_train[t_train.argsort()]
t_train.sort()
X_test = X_test[t_test.argsort()]
t_test.sort()

In [26]:
model = Perceptron(max_iter=1, eta0=1e-4)

In [27]:
%matplotlib notebook
model = plot_training(model, X_train, t_train)


In [28]:
num_test_samples = len(X_test)

In [29]:
test_pred = model.predict(X_test)

In [30]:
# %matplotlib inline
plt.figure(figsize=(8,8))
plt.scatter(range(num_test_samples), test_pred, c='r', marker='o')
plt.scatter(range(num_test_samples), t_test, c='b', marker='.')

plt.show()


### Linearly inseparable data 3

In [31]:
n = 400

In [32]:
non_separable3 = np.vstack((make_points([0, 0], [0.3, 0.3], 2, n//4, 0),
                            make_points([0.7, 0.7], [0.9, 0.9], 2, n//4, 0),
                            make_points([0.7, 0], [0.9, 0.3], 2, n//4, 1),
                            make_points([0, 0.7], [0.3, 0.9], 2, n//4, 1)))
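
The four clusters above form an XOR layout: opposite corners share a class, so no single straight line can separate the two classes. The sketch below illustrates this with a hand-written perceptron on four corner points standing in for the clusters. The corner coordinates, the zero initialization and the learning rate are assumptions for the example; this is not the scikit-learn model used in the notebook.

```python
import numpy as np

# Four corners standing in for the four clusters (XOR layout)
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 0, 1, 1])

w = np.zeros(2)
w0 = 0.0
beta = 0.1                      # learning rate, arbitrary choice for the sketch
errors_per_epoch = []
for epoch in range(100):
    errors = 0
    for x_i, y_i in zip(X, y):
        y_hat = 1 if x_i @ w + w0 >= 0 else 0   # step(u)
        if y_hat != y_i:
            w += beta * (y_i - y_hat) * x_i     # perceptron update
            w0 += beta * (y_i - y_hat)
            errors += 1
    errors_per_epoch.append(errors)

print(min(errors_per_epoch))    # at least 1: no epoch is free of mistakes
```

An epoch without mistakes would mean the current weights classify all four corners correctly, which no linear boundary can do, so training only stops at the epoch limit.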

In [33]:
X_train, X_test, t_train, t_test = train_test_split(non_separable3[:,:2], non_separable3[:,2], 
                                                    stratify=non_separable3[:,2])

In [34]:
X_train = X_train[t_train.argsort()]
t_train.sort()
X_test = X_test[t_test.argsort()]
t_test.sort()

In [35]:
model = Perceptron(max_iter=1, eta0=1e-4)

In [36]:
%matplotlib notebook
model = plot_training(model, X_train, t_train)


In [37]:
num_test_samples = len(X_test)

In [38]:
test_pred = model.predict(X_test)

In [39]:
# %matplotlib inline
plt.figure(figsize=(8,8))
plt.scatter(range(num_test_samples), test_pred, c='r', marker='o')
plt.scatter(range(num_test_samples), t_test, c='b', marker='.')

plt.show()
