This Jupyter notebook was prepared by [Chun-Kit Yeung](https://ckyeungac.com)

# Introduction
**What do we cover in this tutorial?**

In this tutorial, we will use the perceptron and ADALINE to tackle the iris classification problem introduced in tutorial 2. However, the iris dataset has three classes, while both the perceptron and ADALINE are binary classifiers, so we will use only the first two classes of the iris dataset.

# Preprocess the iris data

In [None]:
import numpy as np
from sklearn.datasets import load_iris

# load the iris dataset
iris = load_iris()

In [None]:
iris.target[:100] # the first 100 elements are either 0 or 1

In [None]:
# select the first 100 data points (classes 0 and 1) with slicing
X_iris = iris.data[:100]
y_iris = iris.target[:100]

In [None]:
# split the data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

# Using Perceptron from sklearn
Here, we first train the perceptron provided by scikit-learn (see the [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html)).

In [None]:
# import the Perceptron class from sklearn
from sklearn.linear_model import Perceptron
# import the accuracy_score from sklearn
from sklearn.metrics import accuracy_score

In [None]:
# Define and train the perceptron
clf = Perceptron(fit_intercept=True, eta0=0.5) # eta0 is the learning rate
clf.fit(X_train, y_train)

In [None]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

In [None]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

# Using ADALINE from sklearn
**Unfortunately, there is no ADALINE in scikit-learn.** However, ADALINE is a special case of an artificial neural network with only one input layer and one output layer. As an alternative, we treat **a multi-layer perceptron (MLP) with no hidden layer** as a stand-in for ADALINE, even though the two are not exactly the same: the MLP classifier applies a logistic output and minimizes log-loss rather than the squared error.

To stay aligned with the course materials, I have also implemented a class called AdalineSGD (see the section *DIY: Build your own ADALINE with stochastic gradient descent*) for you to use.

In [None]:
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(
    hidden_layer_sizes=(), # empty tuple -> no hidden layer used.
    activation='tanh', # tanh maps to (-1, 1); it applies to hidden layers only, so it has no effect here
    solver='sgd', # stochastic gradient descent
)
clf.fit(X_train, y_train)

In [None]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

In [None]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

# DIY: Build your own Perceptron
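For reference, here is a sketch of one common formulation of the perceptron learning rule, which the skeleton below appears to follow (labels mapped to $\{-1, +1\}$, bias stored in $w_0$). The symbols $z$, $\hat{y}$ and $\eta$ are notation assumed for this note, not taken from the course materials.

$$
z = w_0 + \mathbf{w}^\top \mathbf{x}, \qquad
\hat{y} = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}
$$

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x}, \qquad
w_0 \leftarrow w_0 + \eta\,(y - \hat{y})
$$

Under this rule the weights change only when a sample is misclassified, because $y - \hat{y} = 0$ for every correct prediction.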

In [None]:
class Perceptron2(object):
    """Perceptron classifier.
    
    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0).
    n_iter : int
        Number of epochs, i.e. the number of passes over the training dataset.
    random_state : int
        The seed of the pseudo-random number generator.

    Attributes
    ------------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Number of misclassifications in every epoch.
    """
    
    def __init__(self, eta=0.01, n_iter=100, random_state=1):
        """Initialize the class variables"""
        # Todo
        pass
            
    def _initialize_weights(self, m):
        """Randomly initialize weights"""
        # Todo
        pass
    
    def net_input(self, X):
        """Calculate net input"""
        # Todo
        pass
    
    def _predict(self, X):
        """Return class label after unit step for internal usage."""
        # Todo
        pass
    
    def predict(self, X):
        """Return class label after unit step for external usage."""
        # Todo
        pass
                
    def _update_weights(self, xi, target):
        """Apply Perceptron learning rule to update the weights for a single sample."""
        # Todo
        pass
    
    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        # Todo:
        # initialize the weight
        # make a copy of y: y_
        # convert the value of 0 to -1 in y_
        # update the weight
        pass
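
If you get stuck, here is a minimal sketch of one way the skeleton above could be completed. It is not the official course solution: the class name `Perceptron2Sketch`, the use of `np.random.RandomState`, the `>= 0` threshold, and the 0.01 initialization scale (borrowed from `AdalineSGD`) are all assumptions made for illustration.

```python
import numpy as np


class Perceptron2Sketch(object):
    """One possible completion of the Perceptron2 skeleton (illustrative only)."""

    def __init__(self, eta=0.01, n_iter=100, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def _initialize_weights(self, m):
        # small random weights; w_[0] holds the bias (assumed convention)
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + m)

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def _predict(self, X):
        # internal labels are -1 / +1, matching the converted targets in fit
        return np.where(self.net_input(X) >= 0.0, 1, -1)

    def predict(self, X):
        # external labels are 0 / 1, matching the original iris targets
        return np.where(self.net_input(X) >= 0.0, 1, 0)

    def _update_weights(self, xi, target):
        # the update is zero whenever the sample is classified correctly
        update = self.eta * (target - self._predict(xi))
        self.w_[1:] += update * xi
        self.w_[0] += update
        return int(update != 0.0)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self._initialize_weights(X.shape[1])
        y_ = np.copy(y)
        y_[y_ == 0] = -1  # convert labels {0, 1} to {-1, +1}
        self.errors_ = []
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y_):
                errors += self._update_weights(xi, target)
            self.errors_.append(errors)  # misclassifications in this epoch
        return self
```

To try it on the iris split, `Perceptron2Sketch().fit(X_train, y_train)` can be used in place of `Perceptron2()` in the cells that follow.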

In [None]:
clf = Perceptron2()
clf.fit(X_train, y_train)

In [None]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

In [None]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

# DIY: Build your own ADALINE with stochastic gradient descent
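The per-sample update implemented in `_update_weights` below can be written out as follows. This is a restatement of the code in math notation, with $e$ and $J$ as shorthand for the `error` and `cost` variables and labels converted to $\{-1, +1\}$ as in `fit`.

$$
z = w_0 + \mathbf{w}^\top \mathbf{x}, \qquad
\phi(z) = z, \qquad
e = y - \phi(z), \qquad
J = \tfrac{1}{2}\,e^{2}
$$

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta\,e\,\mathbf{x}, \qquad
w_0 \leftarrow w_0 + \eta\,e
$$

Because the error is computed from the continuous output $\phi(z)$ rather than from the thresholded label, the weights are adjusted on every sample, not only on misclassified ones. The class label is obtained afterwards by thresholding $z$ at 0.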

In [None]:
from numpy.random import seed

class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate.
    n_iter : int
        Number of passes (epochs) over the training dataset.
    shuffle : bool (default: True)
        Shuffles training data every epoch if True to prevent cycles.
    random_state : int
        Set random state for shuffling and initializing the weights.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting; w_[0] is the bias term.
    w_initialized : bool
        True once the weights have been initialized.
        
    """
    def __init__(self, eta=0.01, n_iter=50, shuffle=True, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        if random_state is not None:  # also seed when random_state == 0
            seed(random_state)
            
    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = np.random.permutation(len(y))
        return X[r], y[r]
    
    def _initialize_weights(self, m):
        """Randomly initialize weights"""
        self.w_ = np.random.normal(loc=0.0, scale=0.01, size=1 + m)
        self.w_initialized = True
            
    def _update_weights(self, xi, target):
        """Apply Adaline learning rule to update the weights for a single sample."""
        output = self.activation(xi)
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0] 

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)
    
    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, 0)
        
    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        # initialize the weight
        self._initialize_weights(X.shape[1])
        
        # make a copy of y: y_
        y_ = np.copy(y) # e.g. it is np.array([1,1,0,1,0,0,1])
        
        # convert the value of 0 to -1 in y_
        y_[y_==0] = -1 # e.g. it is now np.array([1,1,-1,1,-1,-1,1])
        
        for i in range(self.n_iter):
            # shuffle the data if required
            if self.shuffle:
                X, y_ = self._shuffle(X, y_)
            
            # update the weight sample by sample
            for xi, target in zip(X, y_):
                self._update_weights(xi, target)
            
        return self

In [None]:
clf = AdalineSGD()
clf.fit(X_train, y_train)

In [None]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

In [None]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

In [None]:
# inspect the learned weights (w_[0] is the bias, w_[1:] are the feature weights)
clf.w_