This Jupyter notebook was prepared by [Chun-Kit Yeung](https://ckyeungac.com)

# Introduction
**What do we cover in this tutorial?**

In this tutorial, we use the perceptron and ADALINE to tackle the iris classification problem introduced in tutorial 2. Because the iris dataset has three classes while the perceptron and ADALINE are both binary classifiers, we use only the first two classes of the dataset.

# Preprocess the iris data

In [1]:
import numpy as np
from sklearn.datasets import load_iris

# load the iris dataset
iris = load_iris()

In [2]:
iris.target[:100] # the first 100 labels are all either 0 or 1

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1])

In [3]:
# select the target data (first 100 data points) with slicing
X_iris = iris.data[:100]
y_iris = iris.target[:100]

In [4]:
# split the data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42)

In [5]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((80, 4), (20, 4), (80,), (20,))
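
The split above is purely random, so the two classes are not guaranteed to appear in equal numbers in each subset. As a hedged aside, `train_test_split` also accepts a `stratify` argument that preserves the class ratio. The variable names with the `_s` suffix are introduced only for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_iris, y_iris = iris.data[:100], iris.target[:100]

# Stratified variant of the split (illustration only; the rest of the
# notebook keeps using the unstratified X_train / X_test from above).
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)

# Count how many samples of class 0 and class 1 land in each subset.
train_counts = np.bincount(y_train_s)
test_counts = np.bincount(y_test_s)
print(train_counts, test_counts)
```

With 50 samples per class and a 20% test set, stratification puts 40 samples of each class in the training set and 10 of each in the test set.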

# Using Perceptron from sklearn
Here, we first train the perceptron provided by scikit-learn (see the [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html)).

In [6]:
# import the Perceptron class from sklearn
from sklearn.linear_model import Perceptron
# import the accuracy_score from sklearn
from sklearn.metrics import accuracy_score

In [7]:
# Define and train the perceptron
clf = Perceptron(fit_intercept=True, eta0=0.5) # eta0 is the learning rate
clf.fit(X_train, y_train)

Perceptron(alpha=0.0001, class_weight=None, eta0=0.5, fit_intercept=True,
      n_iter=5, n_jobs=1, penalty=None, random_state=0, shuffle=True,
      verbose=0, warm_start=False)

In [8]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

1.0

In [9]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

1.0
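
A fitted scikit-learn `Perceptron` stores its feature weights in `coef_` and its bias in `intercept_`. The sketch below refits a model so that it runs on its own (the name `clf_demo` is introduced here for illustration) and checks that thresholding the linear score by hand reproduces `predict`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X, y = iris.data[:100], iris.target[:100]

clf_demo = Perceptron(fit_intercept=True, eta0=0.5, random_state=0)
clf_demo.fit(X, y)

# Linear score w . x + b for every sample.
scores = X @ clf_demo.coef_.ravel() + clf_demo.intercept_[0]

# A positive score maps to class 1, otherwise class 0.
manual_pred = (scores > 0).astype(int)
print(np.array_equal(manual_pred, clf_demo.predict(X)))
```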

# Using ADALINE from sklearn
**Unfortunately, there is no ADALINE in scikit-learn.** However, ADALINE is a special case of an artificial neural network with only an input layer and an output layer. As an alternative, we treat **a multi-layer perceptron (MLP) with no hidden layer** as a stand-in for ADALINE (even though they are not exactly the same).

To stay aligned with the course materials, I have also implemented a class called AdalineSGD (available in *DIY: Build your own ADALINE with stochastic gradient descent*) for you to use.
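
To see where the two models differ, consider a single sample. With no hidden layer, a network computes one affine map w·x + b. As far as I understand scikit-learn's implementation, `MLPClassifier` passes this value through a logistic output unit for binary problems and minimizes log-loss, whereas ADALINE keeps the output linear and minimizes squared error. The sketch below compares the resulting error signals; the weights, bias and sample are made-up values.

```python
import numpy as np

# Made-up weights, bias and sample, for illustration only.
w = np.array([0.1, -0.2, 0.3, 0.05])
b = -0.1
x = np.array([5.1, 3.5, 1.4, 0.2])

z = np.dot(w, x) + b  # net input shared by both models

# ADALINE: linear activation, targets in {-1, +1}, squared-error loss.
target_adaline = -1
error_adaline = target_adaline - z

# Logistic output unit: sigmoid activation, targets in {0, 1}, log-loss.
target_logistic = 0
p = 1.0 / (1.0 + np.exp(-z))
error_logistic = target_logistic - p

# In both cases the SGD step has the form w += eta * error * x;
# only the definition of `error` differs.
print(z, error_adaline, error_logistic)
```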

In [10]:
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(
    hidden_layer_sizes=(), # empty tuple -> no hidden layer used.
    activation='tanh', # tanh outputs values in (-1, 1); it applies to hidden layers only, so it has no effect here
    solver='sgd', # stochastic gradient descent
)
clf.fit(X_train, y_train)

MLPClassifier(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='sgd', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [11]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

1.0

In [12]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

1.0

# DIY: Build your own ADALINE with stochastic gradient descent

In [13]:
from numpy.random import seed

class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate
    n_iter : int
        Number of passes (epochs) over the training dataset.
    shuffle : bool (default: True)
        Shuffles training data every epoch if True to prevent cycles.
    random_state : int
        Seed for shuffling and initializing the weights.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting; w_[0] is the bias.
    w_initialized : bool
        True once the weights have been initialized.

    """
    def __init__(self, eta=0.01, n_iter=50, shuffle=True, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        if random_state is not None:  # a seed of 0 is valid too
            seed(random_state)
            
    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = np.random.permutation(len(y))
        return X[r], y[r]
    
    def _initialize_weights(self, m):
        """Randomly initialize weights"""
        self.w_ = np.random.normal(loc=0.0, scale=0.01, size=1 + m)
        self.w_initialized = True
            
    def _update_weights(self, xi, target):
        """Apply Adaline learning rule to update the weights for a single sample."""
        output = self.activation(xi)
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0] 

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)
    
    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, 0)
        
    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        # initialize the weight
        self._initialize_weights(X.shape[1])
        
        # make a copy of y: y_
        y_ = np.copy(y) # e.g. it is np.array([1,1,0,1,0,0,1])
        
        #convert the value of 0 to -1 in y_
        y_[y_==0] = -1 # e.g. it is now np.array([1,1,-1,1,-1,-1,1])
        
        for i in range(self.n_iter):
            # shuffle the data if required
            if self.shuffle:
                X, y_ = self._shuffle(X, y_)
            
            # update the weight sample by sample
            for xi, target in zip(X, y_):
                self._update_weights(xi, target)
            
        return self

In [14]:
clf = AdalineSGD()
clf.fit(X_train, y_train)

<__main__.AdalineSGD at 0x7f4764a99320>

In [15]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

1.0

In [16]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

1.0

In [17]:
# inspect the learned weights (w_[0] is the bias)
clf.w_

array([-0.11398174, -0.12280366, -0.31600092,  0.50167422,  0.40759171])
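
The first entry of `w_` is the bias and the remaining four are the feature weights. As a quick check of what these numbers mean, the sketch below applies the weights printed above to two samples from the iris dataset, the first setosa row and the first versicolor row (copied here by hand so the block runs on its own).

```python
import numpy as np

# Weights copied from the output above: w[0] is the bias.
w = np.array([-0.11398174, -0.12280366, -0.31600092, 0.50167422, 0.40759171])

# First setosa sample (label 0) and first versicolor sample (label 1).
samples = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
])

# Net input = bias + weighted sum of the features.
net = samples @ w[1:] + w[0]

# Same thresholding as AdalineSGD.predict.
pred = np.where(net >= 0.0, 1, 0)
print(net, pred)
```

The two net inputs come out close to -1 and +1, which are the targets ADALINE was trained towards.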

# DIY: Build your own Perceptron

In [18]:
class Perceptron2(object):
    """Perceptron classifier.
    
    Parameters
    ------------
    eta: float 
        Learning rate (between 0.0 and 1.0)
    n_iter: int
        Number of epochs, i.e. number of passes over the training dataset.
    random_state : int
        The seed of the pseudo random number generator.
        
    Attributes
    ------------
    w_: 1d-array
        Weights after fitting; w_[0] is the bias.
    w_initialized: bool
        True once the weights have been initialized.
    """
    
    def __init__(self, eta=0.01, n_iter=100, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.random_state = random_state
        if random_state is not None:  # a seed of 0 is valid too
            seed(random_state)
            
    def _initialize_weights(self, m):
        """Randomly initialize weights"""
        self.w_ = np.random.normal(loc=0.0, scale=0.01, size=1 + m)
        self.w_initialized = True
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]
    
    def _predict(self, X):
        """Return class label after unit step for internal usage."""
        return np.where(self.net_input(X) >= 0.0, 1, -1)
    
    def predict(self, X):
        """Return class label after unit step for external usage."""
        return np.where(self.net_input(X) >= 0.0, 1, 0)
                
    def _update_weights(self, xi, target):
        """Apply Perceptron learning rule to update the weights for a single sample."""
        output = self._predict(xi)
        self.w_[1:] += self.eta * xi * (target - output)
        self.w_[0] += self.eta * (target - output)
    
    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        # initialize the weight
        self._initialize_weights(X.shape[1])
        
        # make a copy of y: y_
        y_ = np.copy(y)
        
        #convert the value of 0 to -1 in y_
        y_[y_==0] = -1
        
        for _ in range(self.n_iter):
            for xi, target in zip(X, y_):
                self._update_weights(xi, target)
        return self

In [19]:
clf = Perceptron2()
clf.fit(X_train, y_train)

<__main__.Perceptron2 at 0x7f4764a99e48>

In [20]:
y_pred_train = clf.predict(X_train)
accuracy_score(y_train, y_pred_train)

1.0

In [21]:
y_pred_test = clf.predict(X_test)
accuracy_score(y_test, y_pred_test)

1.0

In [22]:
# inspect the learned weights (w_[0] is the bias)
clf.w_

array([-0.00375655, -0.03611756, -0.14128172,  0.18927031,  0.08665408])
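
One way to check whether a perceptron has converged is to count how many weight updates occur in each epoch: once an epoch passes with zero updates, the weights stop changing. Below is a minimal sketch on a hypothetical four-point toy dataset. The function name `train_perceptron` and the data are assumptions made for this illustration and are not part of `Perceptron2`.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_iter=5):
    """Train on labels in {-1, +1}; return weights and per-epoch update counts."""
    w = np.zeros(1 + X.shape[1])  # w[0] is the bias
    errors = []
    for _ in range(n_iter):
        n_updates = 0
        for xi, target in zip(X, y):
            output = 1 if np.dot(xi, w[1:]) + w[0] >= 0.0 else -1
            update = eta * (target - output)  # zero when the sample is classified correctly
            w[1:] += update * xi
            w[0] += update
            n_updates += int(update != 0.0)
        errors.append(n_updates)
    return w, errors

# Linearly separable toy data (hypothetical values).
X_toy = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, -3.0]])
y_toy = np.array([1, 1, -1, -1])

w_toy, errors_toy = train_perceptron(X_toy, y_toy)
print(errors_toy)
```

On this toy data a single update in the first epoch is enough, and every later epoch records zero updates.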