<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500, height=450>
<h3 style="text-align: center;"><b>Phystech School of Applied Mathematics and Informatics (PSAMI) MIPT</b></h3>

---

<h2 style="text-align: center;"><b>Neuron with sigmoid</b></h2>

---

### You should work through the notebook `[seminar]perceptron.ipynb` first!

In this notebook you will learn how to:
- implement the class **`Neuron()`** with a sigmoid activation function
- train and validate this class on generated and real data (the files with real data are in the /data folder)
- compare the quality of your model with models from `scikit-learn` (`sklearn.linear_model.Perceptron()`)

Overall, you will implement a neuron with various activation functions: Sigmoid, ReLU, LeakyReLU and ELU. This notebook covers the sigmoid; the others are left for the homework.

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

As a reminder, the **sigmoid function** looks like this:

$$\sigma(x)=\frac{1}{1+e^{-x}}$$

Its plot:

In [None]:
x = np.linspace(-5, 5, 100)  # evaluation grid, computed once
plt.plot(x, 1 / (1 + np.exp(-x)))

---

**Task 1**

Derive the derivative of this function.

---
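
For reference, one possible route through Task 1 is sketched below using the chain rule (try it on paper first). The last step uses the identity $1 - \sigma(x) = \frac{e^{-x}}{1+e^{-x}}$:

$$\sigma'(x) = \frac{d}{dx}\left(1+e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\left(1-\sigma(x)\right)$$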

In this case we are again facing a binary classification problem. Let's use the same loss function, **mean squared error**, but instead of the threshold activation we'll use the sigmoid:

$$MSE\_Loss(\hat{y}, y) = \frac{1}{n}\sum_{i=1}^{n} (\hat{y_i} - y_i)^2 = \frac{1}{n}\sum_{i=1}^{n} (\sigma(w \cdot X_i) - y_i)^2$$ 
 

Here $w \cdot X_i$ is the dot product, and $\sigma(w \cdot X_i) =\frac{1}{1+e^{-w \cdot X_i}}$ is the sigmoid ($i$ is the index of an object in the dataset).

**Note:** It is assumed that the free term $b$ is a part of the weights vector: $w_0$. So, if we add a column of ones to the left side of $X$, we get $b$ as the free term in the dot product (work it out on a piece of paper and you will easily see why). But in our implementation of `Neuron()`, as in `Perceptron()`, let's compute $b$ separately to make it clearer.
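
The equivalence from the note can be checked numerically. The sketch below uses small made-up arrays (the names `X_ext`, `w_ext` and the values are illustrative assumptions) and compares the two ways of handling the bias:

```python
import numpy as np

# Hypothetical toy data: 3 objects with 2 features each
X = np.array([[1., 3.],
              [2., 4.],
              [-1., -3.2]])
w = np.array([[1.], [2.]])  # weights, column (m, 1)
b = 2.0                     # bias (free term)

# Variant 1: bias kept separately
z_separate = X @ w + b

# Variant 2: bias stored as w_0, with a column of ones prepended to X
X_ext = np.hstack([np.ones((X.shape[0], 1)), X])  # shape (n, m + 1)
w_ext = np.vstack([[b], w])                        # shape (m + 1, 1)
z_folded = X_ext @ w_ext

print(np.allclose(z_separate, z_folded))  # True
```

Each row of `X_ext` starts with a 1, so the dot product picks up `1 * b` plus the usual weighted sum of the features.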

We will take the derivative of the loss with respect to the weights and move through weight space in the direction of the steepest loss decrease. The weight update formula in gradient descent:

$$w^{j+1} = w^{j} - \alpha \frac{\partial Loss}{\partial w} (w^{j})$$

$w^j$ -- the weights vector at the $j$-th iteration

Expand it:

* For the weight $w_j$ (here the subscript $j$ indexes a component of the weights vector, not the iteration):

$$ \frac{\partial Loss}{\partial w_j} = 
\frac{2}{n} \sum_{i=1}^n \left(\sigma(w \cdot x_i) - y_i\right)(\sigma(w \cdot x_i))_{w_j}' = \frac{2}{n} \sum_{i=1}^n \left(\sigma(w \cdot x_i) - y_i\right)\sigma(w \cdot x_i)(1 - \sigma(w \cdot x_i))x_{ij}$$

* The gradient of $Loss$ with respect to the weights vector is a vector whose $j$-th component equals $\frac{\partial Loss}{\partial w_j}$ (as before, the number of weights is $m$):

$$\begin{align}
    \frac{\partial Loss}{\partial w} &= \begin{bmatrix}
           \frac{2}{n} \sum_{i=1}^n \left(\sigma(w \cdot x_i) - y_i\right)\sigma(w \cdot x_i)(1 - \sigma(w \cdot x_i))x_{i1} \\
           \frac{2}{n} \sum_{i=1}^n \left(\sigma(w \cdot x_i) - y_i\right)\sigma(w \cdot x_i)(1 - \sigma(w \cdot x_i))x_{i2} \\
           \vdots \\
           \frac{2}{n} \sum_{i=1}^n \left(\sigma(w \cdot x_i) - y_i\right)\sigma(w \cdot x_i)(1 - \sigma(w \cdot x_i))x_{im}
         \end{bmatrix} \\
    &= \frac{2}{n} X^T \left[ (\sigma(Xw) - y) \odot \sigma(Xw) \odot (1 - \sigma(Xw)) \right]
\end{align}$$

Here $\sigma$ is applied element-wise to the vector $Xw$ and $\odot$ is the element-wise product. The constant factor $2$ only rescales the step, so it can be absorbed into the learning rate $\alpha$.
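
A quick way to gain confidence in a gradient formula like this is to compare it against finite differences. The sketch below does that on random data; the helper names (`mse_loss`, `mse_grad`, `numeric_grad`) and the data are illustrative assumptions, and the bias is omitted (treated as part of $w$, as in the note above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(w, X, y):
    """Mean squared error of a sigmoid neuron without a separate bias."""
    return np.mean((sigmoid(X @ w) - y) ** 2)

def mse_grad(w, X, y):
    """Analytic gradient: (2/n) X^T [(s - y) * s * (1 - s)], s = sigmoid(Xw)."""
    s = sigmoid(X @ w)
    return 2.0 / X.shape[0] * X.T @ ((s - y) * s * (1.0 - s))

def numeric_grad(w, X, y, eps=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (mse_loss(w + step, X, y) - mse_loss(w - step, X, y)) / (2 * eps)
    return grad

# Random toy problem (an assumption, not course data)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=(20, 1)).astype(float)
w = rng.normal(size=(3, 1))

max_diff = np.max(np.abs(mse_grad(w, X, y) - numeric_grad(w, X, y)))
print(max_diff)  # should be tiny if the analytic formula is right
```

The same check works for any activation: only `sigmoid` and the `s * (1 - s)` term need to be swapped.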

Implement the sigmoid and its derivative:

In [None]:
def sigmoid(x):
    """Sigmoid function"""
    pass

In [None]:
def sigmoid_derivative(x):
    """Derivative of sigmoid function"""
    pass

Now it's time to write a neuron with sigmoid activation. The code will be almost the same as in the perceptron, but the weights are updated differently and the activation function is different.

In [None]:
class Neuron:
    def __init__(self, w=None, b=0):
        """
        :param: w -- weights vector
        :param: b -- bias scalar
        """
        # Let the user set the weights and bias directly
        self.w = w
        self.b = b
        
    def activate(self, x):
        # Your code here
        pass  # placeholder so that the cell parses before you fill it in
        
    def forward_pass(self, X):
        """
        Computes the neuron's output for a set of objects
        :param: X -- matrix of objects sized (n, m), every row is a separate object
        :return: vector sized (n, 1) of sigmoid outputs in (0, 1), the model answers
        """
        n = X.shape[0]
        y_pred = np.zeros((n, 1))  # y_pred(icted) - predicted values

        # Your code here

        return y_pred
    
    def backward_pass(self, X, y, y_pred, learning_rate=0.005):
        """
        Updates the weights given a set of objects
        :param: X -- matrix of objects sized (n, m)
                y -- right answers vector sized (n, 1)
                y_pred -- model answers vector sized (n, 1), the output of forward_pass
                learning_rate -- "speed of learning" (symbol alpha in the formulas above)
        This method doesn't return anything; it only corrects the weights and bias
        with a gradient descent step.
        """
        # Your code here
    
    def fit(self, X, y, num_epochs=300):
        """
        Descends towards a minimum of the loss
        :param: X -- matrix of objects sized (n, m)
                y -- right answers vector sized (n, 1)
                num_epochs -- number of training steps
        :return: losses -- list of loss values, one per training step
        """
        self.w = np.zeros((X.shape[1], 1))  # column (m, 1)
        self.b = 0  # bias (free term)
        losses = []  # loss values on every step of fitting

        for i in range(num_epochs):
            # Your code here
            pass  # placeholder so that the cell parses before you fill it in

        return losses

### Neuron testing

Here your goal is to test our new neuron **on the same data ("Apples and pears")** the same way you did with the perceptron (feel free to copy the code, but be aware that some of it may need to change).
As a result, you should get:
* a plot showing how the $Loss$ function changes with the number of iterations
* a plot of the dataset colored by the predictions of the sigmoid neuron
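
As a rough illustration of what such an experiment involves, here is a standalone sketch of full-batch gradient descent for a sigmoid neuron. It is not the class solution: the function name `train_sigmoid_neuron`, the learning rate and the synthetic two-cluster data are assumptions made for the example, and the "Apples and pears" file is not used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_neuron(X, y, learning_rate=0.1, num_epochs=300):
    """Full-batch gradient descent on MSE loss; returns (w, b, losses)."""
    n, m = X.shape
    w = np.zeros((m, 1))
    b = 0.0
    losses = []
    for _ in range(num_epochs):
        y_pred = sigmoid(X @ w + b)                 # forward pass, shape (n, 1)
        losses.append(np.mean((y_pred - y) ** 2))   # MSE before the update
        delta = (y_pred - y) * y_pred * (1.0 - y_pred)
        w -= learning_rate * 2.0 / n * (X.T @ delta)
        b -= learning_rate * 2.0 / n * np.sum(delta)
    return w, b, losses

# Synthetic two-class data (an assumption, not the course dataset)
rng = np.random.default_rng(42)
X0 = rng.normal(loc=-1.0, scale=0.5, size=(50, 2))
X1 = rng.normal(loc=1.0, scale=0.5, size=(50, 2))
X = np.vstack([X0, X1])
y = np.vstack([np.zeros((50, 1)), np.ones((50, 1))])

w, b, losses = train_sigmoid_neuron(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(losses[0], losses[-1], accuracy)
```

Passing `losses` to `plt.plot` gives the loss curve, and thresholding the outputs at 0.5 gives the class labels for coloring the scatter plot.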


**Check forward_pass()**

In [None]:
w = np.array([1., 2.]).reshape(2, 1)
b = 2.
X = np.array([[1., 3.],
              [2., 4.],
              [-1., -3.2]])

neuron = Neuron(w, b)
y_pred = neuron.forward_pass(X)
print("y_pred = " + str(y_pred))

|Should be||
|------|-------|
|**y_pred**|[0.99987661, 0.99999386, 0.00449627]|

**Check backward_pass()**

In [None]:
y = np.array([1, 0, 1]).reshape(3, 1)

In [None]:
neuron.backward_pass(X, y, y_pred)

print("w = " + str(neuron.w))
print("b = " + str(neuron.b))

|Should be||
|------|-------|
|**w**|[0.99985106, 1.99952388]|
|**b**|2.000148326741343|
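
If your numbers differ from the table, the scaling of the step is worth checking. The standalone sketch below (the function name `gradient_step` is hypothetical) appears to reproduce the reference values under two assumptions: the gradient is averaged with the factor $\frac{1}{n}$ (the constant 2 dropped), and the step uses `learning_rate=0.1` rather than the default `0.005` from the `backward_pass` signature.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y, w, b, learning_rate):
    """One update with the gradient scaled by 1/n (assumption: factor 2 dropped)."""
    n = X.shape[0]
    y_pred = sigmoid(X @ w + b)
    delta = (y_pred - y) * y_pred * (1.0 - y_pred)
    w_new = w - learning_rate / n * (X.T @ delta)
    b_new = b - learning_rate / n * np.sum(delta)
    return w_new, b_new

# Same inputs as in the check cells above
w = np.array([1., 2.]).reshape(2, 1)
b = 2.
X = np.array([[1., 3.],
              [2., 4.],
              [-1., -3.2]])
y = np.array([1, 0, 1]).reshape(3, 1)

w_new, b_new = gradient_step(X, y, w, b, learning_rate=0.1)
print(w_new.ravel(), b_new)
```

The same result follows from the factor $\frac{2}{n}$ with `learning_rate=0.05`, since only the product of the two matters.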

See how it performs on the data (use the cells from the Perceptron notebook).

In [None]:
# Your code here (use multiple cells)

---

In the **homework** you will need to implement `Neuron()` with some of the other activation functions: ReLU, LeakyReLU, ELU, SeLU, Swish. The weights and loss values will be checked in the test.
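
As a starting point, here is a minimal sketch of three of these activations and their derivatives in NumPy. The default `alpha` values and the convention for the derivative at exactly 0 are assumptions; the homework tests may use different ones.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Convention (assumption): the derivative at x == 0 is taken to be 0
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

def elu(x, alpha=1.0):
    # np.minimum keeps exp() from overflowing on large positive inputs
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_derivative(x, alpha=1.0):
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(leaky_relu(x))
print(elu(x))
```

In a neuron, swapping the activation means replacing `sigmoid` in the forward pass and the $\sigma(1 - \sigma)$ term in the gradient with the matching derivative.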

<h3 style="text-align: center;"><b>Useful links</b></h3>

0). A must-read article from Stanford: http://cs231n.github.io/neural-networks-1/

1). Great article on activation functions: https://www.jeremyjordan.me/neural-networks-activation-functions/

2). [Video by Siraj Raval](https://www.youtube.com/watch?v=-7scQpJT7uo)

3). A recent paper on activation functions. One of the much-hyped functions is $swish(x) = x\sigma (\beta x)$: https://arxiv.org/pdf/1710.05941.pdf (by the way, *neural architecture search* was used to find this function)

4). **SeLU** has some interesting properties, proven with probability theory: https://arxiv.org/pdf/1706.02515.pdf (yes, this paper consists of 102 pages)