# Classification using Neural Networks

<img style="float: left;" width="600" src="images/brain.jpg">

## Perceptron

* In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers.
* It is a type of linear classifier: a classification algorithm that makes its predictions with a linear predictor function that combines a set of weights with the feature vector.
    * $\mathbf {w} \cdot \mathbf {x} +b$
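As a small illustration of the linear predictor, the sketch below evaluates $\mathbf{w} \cdot \mathbf{x} + b$ with NumPy. The weights, input and bias are arbitrary values chosen for the example, not taken from any dataset.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the computation
w = np.array([0.5, -1.0, 2.0])   # weights
x = np.array([1.0, 2.0, 3.0])    # feature vector
b = 0.5                          # bias

# 0.5*1 + (-1)*2 + 2*3 + 0.5
score = np.dot(w, x) + b
print(score)
```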

<img style="float: left;" width="600" src="images/perceptron_node.png">

### Activation Function

* The weighted sum computed by a perceptron is a linear function of the input, so it is unbounded
    * However, we want the neuron to fire only when some condition is fulfilled (loosely, this is how biological neurons behave)
    * The activation function mimics that behavior and keeps the output from ranging over $(-\infty, +\infty)$
* In artificial neural networks, the activation function is an equation that defines the output of a node given an input or set of inputs
* For the perceptron, the activation function is defined as:
$f(\mathbf {x} )={\begin{cases}1&{\text{if }}\ \mathbf {w} \cdot \mathbf {x} +b>0,\\0&{\text{otherwise}}\end{cases}}$

* where $\mathbf {w}$ is a vector of real-valued weights and $b$ is the bias.

    * The bias shifts the decision boundary away from the origin

* Alternative activation functions:
    * Sigmoid (logistic)
    * Tanh (hyperbolic tangent)
    * ReLU (rectified linear unit)
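The alternative activation functions listed above can be sketched in a few lines of NumPy. The function names `sigmoid`, `tanh` and `relu` are chosen here for illustration; the formulas are the standard definitions.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real value into the interval (-1, 1)
    return np.tanh(z)

def relu(z):
    # Passes positive values through and clips negative values to 0
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```

Unlike the step function, these produce graded outputs, and sigmoid and tanh are differentiable everywhere, which matters when training with gradient descent.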
    

In [2]:
import numpy as np  # package for scientific computing in Python (similar to Matlab)

def perceptron(x, w, b):
    # Weighted sum of the inputs plus the bias: w . x + b
    return np.dot(w, x) + b

def activation(perceptron_values):
    # Step function: returns 1 if value > 0, otherwise returns 0
    y = (np.asarray(perceptron_values) > 0.0).astype(int)
    return y
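To see the step activation in action, the sketch below applies a perceptron to all four inputs of a logical AND. The functions are redefined so the block runs on its own, and the weights and bias are hand-picked for the example rather than learned.

```python
import numpy as np

def perceptron(x, w, b):
    # np.dot(x, w) handles a whole batch of inputs at once (one input per row)
    return np.dot(x, w) + b

def activation(values):
    # Step function: 1 where the value is strictly positive, 0 elsewhere
    return (values > 0.0).astype(int)

# Hand-picked (not learned) parameters that implement a logical AND
w = np.array([1.0, 1.0])
b = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = activation(perceptron(inputs, w, b))
print(outputs)  # [0 0 0 1]
```

Only the input `[1, 1]` gives a weighted sum above zero (2 - 1.5 = 0.5), so only that input fires. Here the bias acts as a threshold that moves the decision boundary away from the origin.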



### Loss Function

* For perceptrons, hinge loss is used as the loss function:

\begin{equation}\mathcal{L}(y) = \max(0, 1 - y \cdot \hat{y}) \end{equation}
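A minimal sketch of the hinge loss follows. It assumes the common convention in which the true label $y$ is encoded as $-1$ or $+1$ and $\hat{y}$ is the raw score $\mathbf{w} \cdot \mathbf{x} + b$. That encoding is an assumption of this example, and `hinge_loss` is a name chosen for illustration.

```python
import numpy as np

def hinge_loss(y, y_hat):
    # y: true label in {-1, +1} (assumed encoding); y_hat: raw score w . x + b
    return np.maximum(0.0, 1.0 - y * y_hat)

loss_confident = hinge_loss(1, 2.0)   # correct side with margin >= 1: no loss
loss_marginal = hinge_loss(1, 0.3)    # correct side but inside the margin: about 0.7
loss_wrong = hinge_loss(-1, 0.3)      # wrong side of the boundary: about 1.3
print(loss_confident, loss_marginal, loss_wrong)
```

The loss is zero only when the prediction is on the correct side with a margin of at least 1, and it grows linearly as the score moves toward the wrong side.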

* Let's write the perceptron output again:

\begin{equation} f(\mathbf {x} ) = \hat{y} = {\begin{cases}1&{\text{if }}\ \mathbf {w} \cdot \mathbf {x} +b>0,\\0&{\text{otherwise}}\end{cases}} \end{equation}

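* The derivatives with respect to $w$ and $b$ are:

\begin{equation} \frac{\partial \mathcal{L} }{\partial w}=\begin{cases}-y\cdot x&{\text{if }} y \cdot \hat{y}<1\\0&{\text{otherwise}}\end{cases} \end{equation}

\begin{equation} \frac{\partial \mathcal{L} }{\partial b}=\begin{cases}-y&{\text{if }} y \cdot \hat{y}<1\\0&{\text{otherwise}}\end{cases} \end{equation}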
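The piecewise derivative can be checked numerically. The sketch below assumes labels in $\{-1, +1\}$ and takes $\hat{y}$ to be the raw score $\mathbf{w} \cdot \mathbf{x} + b$, then compares the analytic expression with a central finite difference. The function names and sample values are hypothetical.

```python
import numpy as np

def hinge_loss(w, b, x, y):
    # Assumes y in {-1, +1} and a raw (pre-activation) score
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

def hinge_grad_w(w, b, x, y):
    # -y * x while the margin is violated, zero otherwise
    if y * (np.dot(w, x) + b) < 1:
        return -y * x
    return np.zeros_like(x)

# Arbitrary sample whose margin (0.0) is below 1, so the gradient is non-zero
w = np.array([0.2, -0.1])
b = 0.0
x = np.array([1.0, 2.0])
y = 1

# Central finite difference along each coordinate of w
eps = 1e-6
numeric = np.array([
    (hinge_loss(w + eps * e, b, x, y) - hinge_loss(w - eps * e, b, x, y)) / (2 * eps)
    for e in np.eye(2)
])
analytic = hinge_grad_w(w, b, x, y)
print(analytic, numeric)
```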

TODO: THIS NEEDS TO BE RESOLVED!
THE LOSS DOES NOT LEAD TO THE UPDATE RULE

### Training

* Training a perceptron means updating the weight vector and the bias in order to minimize the loss function
* The minimization can be achieved by using gradient descent
* Gradient Descent algorithm:

\begin{equation} \theta_i' = \theta_i - \alpha \frac{\partial J(\theta)}{\partial \theta_i} \end{equation}
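To make the gradient descent update concrete, here is a sketch on a toy one-dimensional objective, $J(\theta) = (\theta - 3)^2$, whose minimum is at $\theta = 3$. The objective, starting point, learning rate and step count are all assumptions chosen for illustration.

```python
def grad_J(theta):
    # Derivative of the toy objective J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial guess
alpha = 0.1   # learning rate
for _ in range(100):
    # Step against the gradient, scaled by the learning rate
    theta = theta - alpha * grad_J(theta)

print(theta)  # approaches the minimizer theta = 3
```

Each step shrinks the distance to the minimum by a constant factor ($1 - 2\alpha = 0.8$ here). A learning rate that is too large would overshoot and diverge instead.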

* We have a linear function $\mathbf{w} \cdot \mathbf {x} +b$, therefore the update rule with respect to $w$ is:

\begin{equation} \mathbf{w}' = \mathbf{w} - \alpha \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} \end{equation}

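* Taking $\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = -(y - \hat{y})\, \mathbf{x}$ gives the perceptron learning rule:

\begin{equation} \mathbf{w}' = \mathbf{w} + \alpha (y - \hat{y})\, \mathbf{x} \end{equation}

* The bias is updated in the same way, with a constant input of 1:

\begin{equation} b' = b + \alpha (y - \hat{y}) \end{equation}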



In [3]:
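def update_weights(x, expected, predicted, w, learning_rate):
    # Perceptron rule: shift w along x in proportion to the prediction error
    return w + learning_rate * (expected - predicted) * x

def update_bias(expected, predicted, b, learning_rate):
    # Same rule for the bias, whose input is the constant 1
    return b + learning_rate * (expected - predicted)

def train(x, y, n_steps, learning_rate=0.1):
    # Sketch of a training loop; learning_rate and zero initialization are assumptions
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(n_steps):
        for x_i, y_i in zip(x, y):
            predicted = activation(perceptron(x_i, w, b))
            w = update_weights(x_i, y_i, predicted, w, learning_rate)
            b = update_bias(y_i, predicted, b, learning_rate)
    return w, b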
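A minimal end-to-end sketch of this training procedure follows, using a logical OR as a toy dataset. The helper name `train_perceptron`, the zero initialization, the learning rate and the number of passes are assumptions made for the example, not settings prescribed by these notes.

```python
import numpy as np

def train_perceptron(X, y, n_steps=20, learning_rate=0.1):
    # Hypothetical helper: zero initialization and default hyperparameters are assumptions
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        for x_i, y_i in zip(X, y):
            # Step activation on the weighted sum
            predicted = int(np.dot(w, x_i) + b > 0)
            # Perceptron rule: no change when the prediction is already correct
            w = w + learning_rate * (y_i - predicted) * x_i
            b = b + learning_rate * (y_i - predicted)
    return w, b

# Toy dataset: logical OR, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])

w, b = train_perceptron(X, y)
predictions = (X @ w + b > 0).astype(int)
print(predictions)  # [0 1 1 1]
```

Because OR is linearly separable, the perceptron rule settles on weights that classify all four points correctly after a few passes. On data that is not linearly separable (XOR, for example), the same loop would keep updating without converging.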