# Adaptive linear neuron (Adaline) model
The difference between the perceptron and the Adaline neuron is that the Adaline neuron uses a linear activation function rather than the step function used by the perceptron.

The activation function $\phi(\mathbf{w}^{T} \mathbf{x})$ of the Adaline neuron is a linear function: <br>
$\phi(\mathbf{w}^{T} \mathbf{x}) = \mathbf{w}^{T} \mathbf{x}$
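
As a small illustration of the contrast, the sketch below evaluates both activations on the same net input. The weights and the sample are hypothetical values chosen only for this example, and the $\pm 1$ step labels follow the convention used by `predict` in the implementation later in this notebook.

```python
import numpy as np

# Hypothetical weights and a single sample, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.25])

# Net input z = w^T x
z = w @ x

# Adaline: linear (identity) activation, the output is the net input itself.
linear_output = z

# Perceptron: unit step activation, the output is already a class label.
step_output = 1 if z >= 0.0 else -1

print("net input:", z)
print("linear activation:", linear_output)
print("step activation:", step_output)
```

The linear output keeps the magnitude of the net input, which is what makes the cost function in the next section differentiable with respect to the weights.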

## The objective function
How do we update the weights? We should update them in a way that decreases the difference between the observed output and the predicted output. More precisely, we want to minimize some *norm* of this difference. <br>

A widely used cost function is the **Sum of Squared Errors (SSE)** between the observed output and the predicted output. With $z^{(i)} = \mathbf{w}^{T} \mathbf{x}^{(i)}$ denoting the net input of the $i$-th sample, this function takes the form <br>
$J(\mathbf{w}) = \frac{1}{2} \sum_{i}{\left( y^{(i)} - \phi \left(z^{(i)} \right) \right)^{2}}$
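
A minimal sketch of this cost in NumPy is shown below. The helper name `sse_cost` and the toy data are assumptions made for illustration, and the bias is left out so that the code matches the formula $z^{(i)} = \mathbf{w}^{T} \mathbf{x}^{(i)}$ directly.

```python
import numpy as np

def sse_cost(w, X, y):
    """Return J(w) = 1/2 * sum_i (y_i - w^T x_i)^2 for a linear activation."""
    errors = y - X @ w          # y^(i) - phi(z^(i)), with phi the identity
    return 0.5 * np.sum(errors ** 2)

# Toy data (hypothetical): three samples, two features, labels in {-1, 1}.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.1, -0.1])

cost = sse_cost(w, X, y)
print("SSE cost:", cost)
```

Here every net input equals $-0.1$, so the errors are $1.1$, $-0.9$ and $1.1$, and the cost is half the sum of their squares.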

## Minimization of the objective function using gradient descent
A simple yet powerful algorithm for minimizing the cost function is **gradient descent**. Here, we aim to find the weights that minimize the objective function $J(\mathbf{w})$. *(The method works because the gradient of a scalar field points in the direction of steepest ascent of the function, so a step along the negative gradient is a step in the direction of steepest descent. Try to prove this.)* <br>

The weights are updated as follows: <br>
$\mathbf{w} := \mathbf{w} + \Delta \mathbf{w}$ 
<br>
$\Delta \mathbf{w} = - \eta \nabla J(\mathbf{w})$ <br>

The component of the gradient of the cost function in the "direction" of each weight can then be written as <br> 
$\frac{\partial{J}}{\partial{w_{j}}} = - \sum_{i}{\left( y^{(i)} - \phi \left(z^{(i)} \right)  \right) x^{(i)}_{j}}$
<br> <br>
Hence the weight update can be written as <br>
$\Delta w_{j} = - \eta \frac{\partial{J}}{\partial{w_{j}}} = \eta \sum_{i}{\left( y^{(i)} - \phi \left(z^{(i)} \right)  \right) x^{(i)}_{j}}$
<br> <br>
The weights are updated simultaneously, based on all samples in the training set, which is why this approach is also called **batch gradient descent**.
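
The gradient formula and the update rule can be checked numerically. The sketch below is not part of the original implementation: the helper names, the synthetic data and the learning rate are assumptions chosen for illustration, and the bias is omitted for brevity. It compares the analytical gradient with a central finite-difference estimate and then runs a few batch updates.

```python
import numpy as np

def sse_cost(w, X, y):
    """J(w) = 1/2 * sum_i (y_i - w^T x_i)^2."""
    errors = y - X @ w
    return 0.5 * np.sum(errors ** 2)

def sse_gradient(w, X, y):
    """Analytical gradient: dJ/dw_j = -sum_i (y_i - w^T x_i) * x_ij."""
    return -X.T @ (y - X @ w)

def numerical_gradient(w, X, y, eps=1e-6):
    """Central finite-difference estimate of the gradient of J at w."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (sse_cost(w + step, X, y) - sse_cost(w - step, X, y)) / (2 * eps)
    return grad

# Synthetic data (hypothetical): 20 samples, 3 features, labels in {-1, 1}.
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = np.where(X[:, 0] + 0.5 * X[:, 1] >= 0.0, 1.0, -1.0)
w = np.zeros(3)

# The analytical and numerical gradients should agree closely.
grad_ok = np.allclose(sse_gradient(w, X, y), numerical_gradient(w, X, y), atol=1e-4)
print("gradient check passed:", grad_ok)

# Batch gradient descent: every update uses all samples at once.
eta = 0.01  # assumed learning rate, small enough for this data set
costs = []
for _ in range(10):
    costs.append(sse_cost(w, X, y))
    delta_w = -eta * sse_gradient(w, X, y)   # delta_w = -eta * grad J(w)
    w = w + delta_w

print("cost per epoch:", np.round(costs, 3))
```

With a sufficiently small learning rate the recorded cost shrinks from one epoch to the next. A learning rate that is too large can make the cost grow instead.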

In [None]:
# Implementation of the Adaline in Python
import numpy as np

class AdalineGD(object):
    """ADaptive LInear NEuron classifier.
    
    Parameters
    ----------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
    random_state : int
        Random number generator seed for random weight initialization.
    
    Attributes
    ----------
    w_ : 1d-array
        Weights after fitting.
    cost_ : list
        Sum-of-squares cost function value in each epoch.
        
    """
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state
    
    def fit(self, X, y):
        """Fit training data.
        
        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object
        
        """
        
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        
        self.cost_ = []
        
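        for _ in range(self.n_iter):
            # Forward pass over the whole training set (batch update)
            net_input = self.net_input(X)
            output = self.activation(net_input)
            errors = (y - output)
            # Step against the gradient: delta_w = eta * X^T (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            # Bias unit update (its input is implicitly 1)
            self.w_[0] += self.eta * errors.sum()
            # Record the SSE cost computed from this epoch's errors
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)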
        return self
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]
    
    def activation(self, X):
        """Compute linear activation"""
        return X
    
    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)