<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
<h3 style="text-align: center;"><b>Phystech School of Applied Mathematics and Informatics (PSAMI) MIPT</b></h3>

---

<h2 style="text-align: center;"><b>Rosenblatt's perceptron<br><br>(neuron with threshold activation)</b></h2>

---

In this notebook you will learn how to:  

- Implement the class **`Perceptron()`**, a neuron with threshold activation
- train and validate your perceptron on generated and real data (files with real data are in the `./data` folder)
- compare the quality of your model with models from the `scikit-learn` module (`sklearn.linear_model.Perceptron()`)

<h2 style="text-align: center;"><b>Intro</b></h2>

Almost every machine learning algorithm solving a *classification* or *regression* problem works this way:

1. (*initialization stage*) A human defines the **hyperparameters**, i.e. values that are not "learned" during training.
2. (*training stage*) The algorithm is run on the data, **training** on it and tuning its **parameters** (don't confuse them with *hyper*parameters) in some defined way (e.g., using the *gradient descent method* or the *error correction method*) based on a loss function. The loss function measures how and where the model is wrong.
3. (*prediction stage*) The model is ready, and now we can make **predictions** on new data.

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

<h2 style="text-align: center;"><b>Class Perceptron</b></h2>

In this section we will solve a **binary classification** problem:  
- *Input data*: a matrix $X$ of size $(n, m)$ and a column $y$ of zeros and ones of size $(n, 1)$. Rows of the matrix correspond to objects and columns to features, i.e. the $i$-th row $X_i$ is the set of features (*feature description*) of the $i$-th object.
- *Output data*: a column $\hat{y}$ of zeros and ones of size $(n, 1)$, the algorithm's predictions.

A model of a neuron in biology and in deep learning:  

![title](http://lamda.nju.edu.cn/weixs/project/CNNTricks/imgs/neuron.png)

*Picture from http://cs231n.github.io/neural-networks-1/*

To understand how to update the weights of the model, we need to define a loss function to minimize. Since this is a binary classification problem (classes 0 and 1), let's take the **mean squared error** as our loss function:  

$$Loss(w, X) = \frac{1}{2n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \frac{1}{2n}\sum_{i=1}^{n} (f(w \cdot X_i) - y_i)^2$$  

Here $w \cdot X_i$ is the dot product of the weights and the $i$-th object, and $f$ is the threshold activation applied to $z = w \cdot X_i$:  

$$
f(z) =
\begin{cases}
1, &\text{if } z > 0 \\
0, &\text{if } z \le 0
\end{cases}
$$  
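As a quick sanity check of the two definitions above, here is a minimal NumPy sketch. The helper names `threshold` and `mse_loss` and the sample values are our own illustration, not the notebook's reference solution. The sketch also shows a useful property. Both $\hat{y}_i$ and $y_i$ are 0 or 1, so each term $(\hat{y}_i - y_i)^2$ is 1 for a mistake and 0 otherwise. The loss therefore equals half the fraction of misclassified objects.

```python
import numpy as np

def threshold(z):
    # f(z) = 1 if z > 0, else 0; np.heaviside(z, 0.0) returns 0.0 exactly at z == 0
    return np.heaviside(z, 0.0)

def mse_loss(y_pred, y):
    # Loss = 1/(2n) * sum((y_pred - y)^2)
    return np.mean((y_pred - y) ** 2) / 2

z = np.array([[2.5], [0.0], [-1.0], [0.3]])  # hypothetical values of w . X_i
y = np.array([[1], [1], [0], [0]])           # hypothetical true labels
y_pred = threshold(z)                         # [[1.], [0.], [0.], [1.]]

loss = mse_loss(y_pred, y)
error_rate = np.mean(y_pred != y)             # 2 mistakes out of 4 -> 0.5
print(loss, error_rate / 2)                   # both equal 0.25
```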

**Note:** It is assumed that the bias $b$ (free term) is part of the weight vector as $w_0$. So, if we add a column of ones to the left side of $X$, the dot product will contain $b$ as a free term (work it out on a piece of paper; you will easily see why). In our implementation of `Perceptron()`, however, let's keep $b$ separate to make things clearer.
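The note above can be verified numerically in a few lines. This is a minimal sketch with made-up numbers. Prepend a column of ones to $X$ and put $b$ in front of $w$. The product of the augmented matrix and the augmented weight column then equals $Xw + b$, because the extra term is $1 \cdot b$ for every object.

```python
import numpy as np

X = np.array([[1.0, 3.0],
              [2.0, 4.0],
              [-1.0, -3.2]])      # 3 objects, 2 features (made-up values)
w = np.array([[1.0], [2.0]])     # weight column (m, 1)
b = 2.0                          # bias (free term)

# Separate bias, as in our Perceptron class: X w + b
z_separate = X @ w + b

# Bias folded into the weights: w_0 = b and feature x_0 = 1 for every object
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # shape (n, m + 1)
w_aug = np.vstack([np.array([[b]]), w])            # shape (m + 1, 1)
z_folded = X_aug @ w_aug

print(np.allclose(z_separate, z_folded))           # True
```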

**Implement loss function $Loss$:**

In [None]:
def Loss(y_pred, y):  # y_pred, y: numpy arrays of the same shape, n = y.size
    return # Your code here

Since our *threshold function* has no usable derivative (have you seen its plot? It looks simple, but its derivative is another story), we can't apply gradient descent directly:  

$$ \frac{\partial Loss}{\partial w} = \frac{1}{n} X^T\left(\left(f(w \cdot X) - y\right) \odot f'(w \cdot X)\right)$$  

where $\odot$ denotes element-wise multiplication. The derivative $f'(w \cdot X)$ is undefined at $0$ and equals $0$ everywhere else, so this gradient carries no information about how to change the weights. But we still need to know how to update them; otherwise, how would we train the algorithm to distinguish apples from pears?  
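A small numerical experiment shows why the gradient formula above is of no use. In this sketch the step size `h` and the test points are arbitrary choices. A central finite-difference estimate of $f'$ is exactly 0 away from the jump. At $z = 0$ the estimate grows without bound as $h \to 0$. Plugging these zeros into $\partial Loss / \partial w$ would leave the weights unchanged forever.

```python
import numpy as np

def threshold(z):
    # f(z) = 1 if z > 0 else 0
    return (np.asarray(z) > 0).astype(float)

def numerical_derivative(f, z, h):
    # Central finite difference: (f(z + h) - f(z - h)) / (2h)
    return (f(z + h) - f(z - h)) / (2 * h)

points = np.array([-2.0, -0.5, 0.5, 2.0])
print(numerical_derivative(threshold, points, h=1e-3))  # [0. 0. 0. 0.]

# At z = 0 the estimate equals 1 / (2h), so it grows as h shrinks
for h in [1e-1, 1e-3, 1e-6]:
    print(h, numerical_derivative(threshold, 0.0, h))
```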

So let's update the weights with the following rule:   

$$w^{j+1} = w^{j} - \alpha\Delta{w^{j}}$$ 

where:  

$$\Delta{w} = \frac{1}{n}X^T(\hat{y} - y) = \frac{1}{n}X^T(f(w^j \cdot X) - y)$$  

(remember that with $w_0 = b$ the feature $x_0 = 1$). Here $w \cdot X$ denotes the matrix product of the objects-features matrix $X$ and the weight column $w$, and the index $j$ is the gradient descent iteration number.

This rule is a special case of gradient descent for our setting (*[Delta rule](https://en.wikipedia.org/wiki/Delta_rule)*).

Now we are finally ready to write our own **`Perceptron()`**. Below is some code to use as a backbone. Try to use **NumPy** as much as you can, since it is faster than plain Python arithmetic.

*Note*: In code below `y_pred` is $\hat{y}$ from formulas above.

In [None]:
class Perceptron:
    def __init__(self, w=None, b=0):
        """
        :param: w -- weights vector
        :param: b -- bias scalar
        """
        # Let the user set the weights and bias directly if desired
        self.w = w
        self.b = b
        
    def activate(self, x):
        # Your code here: threshold activation, 1 if x > 0 else 0
        pass
        
    def forward_pass(self, X):
        """
        Computes the perceptron's answers for a set of objects
        :param: X -- matrix of objects sized (n, m), every row is a separate object
        :return: vector sized (n, 1) of zeros and ones containing model answers 
        """
        # Your code here
        pass
    
    def backward_pass(self, X, y, y_pred, learning_rate=0.005):
        """
        Updates the weights and bias given a set of objects
        :param: X -- matrix of objects sized (n, m)
                y -- vector of correct answers sized (n, 1)
                y_pred -- vector of model answers sized (n, 1)
                learning_rate -- "speed of learning" (alpha in the formulas above)
        This method doesn't return anything; it only corrects the weights
        using the update rule above (a special case of gradient descent).
        """
        # Your code here
        pass
    
    def fit(self, X, y, num_epochs=300):
        """
        Trains the perceptron by repeating forward and backward passes
        :param: X -- matrix of objects sized (n, m)
                y -- vector of correct answers sized (n, 1)
                num_epochs -- number of training steps
        :return: losses -- list of loss values, one per epoch
        """
        self.w = np.zeros((X.shape[1], 1))  # column (m, 1)
        self.b = 0  # bias (free term)
        losses = []  # loss values on every step of fitting
        
        # Your code here
        # for i in range(num_epochs):
        
        return losses

The class is ready. Let's check that the code is correct. Below are several cells with test code; your goal is to check that the results match the expected answers.

**Check forward_pass():**

In [None]:
w = np.array([1., 2.]).reshape(2, 1)
b = 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])

perceptron = Perceptron(w, b)
y_pred = perceptron.forward_pass(X.T)
print ("y_pred = " + str(y_pred))

|Expected||
|------|-------|
|**y_pred**|[1, 1, 0]|

**Check backward_pass():**

In [None]:
y = np.array([1, 0, 1]).reshape(3, 1)

In [None]:
perceptron.backward_pass(X.T, y, y_pred)

print ("w = " + str(perceptron.w))
print ("b = " + str(perceptron.b))

|Expected||
|-|-|
|**w**| [[ 0.995], [1.988]] |
|**b**| 2.0 |
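The expected values can be reproduced by hand from the formulas above. The objects are the rows of `X.T`, with $n = 3$ and $\alpha = 0.005$ (the default `learning_rate`). The forward pass gives:

$$z = Xw + b = \begin{pmatrix}1 & 3\\ 2 & 4\\ -1 & -3.2\end{pmatrix}\begin{pmatrix}1\\2\end{pmatrix} + 2 = \begin{pmatrix}9\\12\\-5.4\end{pmatrix} \;\Rightarrow\; \hat{y} = \begin{pmatrix}1\\1\\0\end{pmatrix}$$

With $y = (1, 0, 1)^T$ we get $\hat{y} - y = (0, 1, -1)^T$, and the weight update is:

$$\Delta w = \frac{1}{3}\begin{pmatrix}1 & 2 & -1\\ 3 & 4 & -3.2\end{pmatrix}\begin{pmatrix}0\\1\\-1\end{pmatrix} = \frac{1}{3}\begin{pmatrix}3\\7.2\end{pmatrix} = \begin{pmatrix}1\\2.4\end{pmatrix}, \qquad w^{new} = \begin{pmatrix}1\\2\end{pmatrix} - 0.005\begin{pmatrix}1\\2.4\end{pmatrix} = \begin{pmatrix}0.995\\1.988\end{pmatrix}$$

For the bias we assume the analogous update $\Delta b = \frac{1}{n}\sum_{i}(\hat{y}_i - y_i)$. This follows from treating $b$ as $w_0$ with $x_0 = 1$. Here $\Delta b = \frac{1}{3}(0 + 1 - 1) = 0$, so $b$ stays $2.0$.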

Let's check how the loss function changes during training on real data, the "Apples and pears" dataset:

In [None]:
data = pd.read_csv("./data/apples_pears.csv")

In [None]:
data.head()

In [None]:
plt.figure(figsize=(10, 8))
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=data['target'], cmap='rainbow')
plt.title('Apples and pears', fontsize=15)
plt.xlabel('symmetry', fontsize=14)
plt.ylabel('yellowness', fontsize=14)
plt.show();

**Question:** Which class corresponds to apples (which color are they on the plot)?

**Answer:** < Your answer >

Extract the features (questions) and the targets (answers):

In [None]:
X = data.iloc[:,:2].values  # matrix of objects-features
y = data['target'].values.reshape((-1, 1))  # classes (column of zeros and ones)

**Loss plot**  
The loss function should decrease and eventually get close to zero.

In [None]:
%%time
perceptron = # Your code here
losses = # Your code here

plt.figure(figsize=(10, 8))
plt.plot(losses)
plt.title('Loss function', fontsize=15)
plt.xlabel('Iteration number', fontsize=14)
plt.ylabel(r'$Loss(\hat{y}, y)$', fontsize=14)  # raw string so "\h" is not treated as an escape
plt.show()

Let's see how the perceptron classifies the objects from the dataset:

In [None]:
plt.figure(figsize=(10, 8))
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=perceptron.forward_pass(X).ravel(), cmap='spring')
plt.title('Apples and pears', fontsize=15)
plt.xlabel('symmetry', fontsize=14)
plt.ylabel('yellowness', fontsize=14)
plt.show();

With a correct implementation, the perceptron should separate the two classes (almost) perfectly. Note, however, that this data is very easy and linearly separable.

<h3 style="text-align: center;"><b>Gender recognition by voice</b></h3>

In this task we are going to compare the quality of our perceptron with an algorithm from the `sklearn` framework on a dataset from [Kaggle](https://www.kaggle.com): [Gender Recognition by Voice](https://www.kaggle.com/primaryobjects/voicegender). In this dataset the features are various characteristics of a human voice, and there are two classes: the gender of the speaker (man/woman). You can learn more on [the dataset page](https://www.kaggle.com/primaryobjects/voicegender). Our goal is just to compare these two algorithms.

**! Note: the class from sklearn is imported as `skPerceptron`** (under a different name to avoid a collision with our own class `Perceptron`)

In [None]:
import pandas as pd
from sklearn.linear_model import Perceptron as skPerceptron
from sklearn.metrics import accuracy_score

In [None]:
data_path = './data/voice.csv'
data = pd.read_csv(data_path)
data['label'] = data['label'].apply(lambda x: 1 if x == 'male' else 0)

In [None]:
data.head()

In [None]:
# Shuffle the data - data is sorted by column 'label' in the file
data = data.sample(frac=1)

In [None]:
X_train = data.iloc[:int(len(data)*0.7), :-1]  # matrix of objects-features
y_train = data.iloc[:int(len(data)*0.7), -1]  # ground-truth label (man/woman)

X_test = data.iloc[int(len(data)*0.7):, :-1]  # matrix of objects-features
y_test = data.iloc[int(len(data)*0.7):, -1]  # ground-truth label (man/woman)
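Shuffling before the split matters because the file is sorted by `label`. Taking the first 70% without shuffling would leave the test part dominated by one class. The toy sketch below uses synthetic labels, 100 objects sorted by class with 50 per class. These counts are made up for illustration and are not taken from `voice.csv`.

```python
import numpy as np

labels = np.array([0] * 50 + [1] * 50)  # synthetic labels, sorted by class
split = int(len(labels) * 0.7)          # same 70/30 split as above

# Without shuffling: the test part contains only class 1
print(np.unique(labels[split:]))        # [1]

# With shuffling (fixed seed for reproducibility): the test part mixes both classes
rng = np.random.default_rng(0)
shuffled = labels[rng.permutation(len(labels))]
print(np.unique(shuffled[split:]))
```

For reproducible runs in pandas, `data.sample(frac=1, random_state=...)` accepts a seed in the same spirit.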

Here we train our perceptron and perceptron from `sklearn` on this data:


In [None]:
perceptron_sk = skPerceptron()
perceptron_my = Perceptron()

In [None]:
perceptron_sk.fit(X_train.values, y_train.values)
perceptron_my.fit(X_train.values, y_train.values.reshape(-1, 1))  # our class expects y of shape (n, 1)

Let's check their accuracy on the test part of the dataset.<br>
Here **accuracy** means (number of correct answers) / (total number of answers).
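Accuracy as defined above is just the mean of the element-wise comparison between predictions and true labels. `accuracy_score` from `sklearn.metrics` computes the same quantity for 1-D label arrays. This minimal sketch uses hypothetical arrays. It assumes `forward_pass` returns a column of shape (n, 1), as its docstring states, so the column is flattened with `.ravel()` before comparing it with a 1-D label vector. Comparing an (n, 1) array with an (n,) array broadcasts to an (n, n) matrix and gives a meaningless result.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])                # 1-D labels, like y_test.values
y_pred_col = np.array([[1], [0], [0], [1], [0]])  # column (n, 1), like forward_pass output

# Correct: flatten the column first, then average the matches
accuracy = np.mean(y_pred_col.ravel() == y_true)
print(f"{accuracy * 100:.1f} %")                  # 80.0 %

# Pitfall: (n, 1) == (n,) broadcasts to an (n, n) matrix
print((y_pred_col == y_true).shape)               # (5, 5)
```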

In [None]:
print('Accuracy of our perceptron: {:.1f} %'.format(accuracy_score(<Your code here>) * 100))
print('Accuracy of perceptron from sklearn: {:.1f} %'.format(accuracy_score(<Your code here>) * 100))

**Question:** Does the perceptron show good results? Why do you think so? Share any thoughts you have.

**Answer:**< Your answer >

### Important

It is worth mentioning that the perceptron isn't really used in practical applications. We demonstrated it to show you where current state-of-the-art models originate. In fact, a single neuron with a threshold activation function is used neither in multilayer neural networks nor in other real applications. However, it plays an important role in understanding how model weights are updated based on errors, and it lets us introduce the much more useful neurons with **smooth activation functions**.

<h2 style="text-align: center;"><b>Useful links</b></h2>

1). Stanford University lecture notes (CS231n): http://cs231n.github.io/neural-networks-1/  
2). [Wikipedia about perceptron](https://en.wikipedia.org/wiki/Perceptron)