# Reorder columns by indexing the DataFrame with a list of column names
# Desired column order
cols = ['Student', 'Class','Gender', 'Math', 'Science', 'English','History']
# overwrite the old dataframe with the same dataframe but new column order
grades = grades[cols]

## PERCEPTRON CLASSIFICATION

In [1]:
from sklearn import datasets
import numpy as np

In [2]:
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
print('Class labels:', np.unique(y))

Class labels: [0 1 2]


Using the train_test_split function from scikit-learn's model_selection module, we randomly split the X and y arrays into 30 percent test data (45 examples) and 70 percent training data (105 examples).

Note that the "train_test_split" function already shuffles the dataset internally before splitting; otherwise, all examples from class 0 and class 1 would have ended up in the training dataset, and the test dataset would consist of 45 examples from class 2. Via the random_state parameter, we provided a fixed random seed ("random_state=1") for the internal pseudo-random number generator that is used for shuffling the dataset prior to splitting. Using such a fixed "random_state" ensures that our results are reproducible. Lastly, we took advantage of the built-in support for stratification via "stratify=y". In this context, stratification means that the train_test_split function returns training and test subsets that have the same proportions of class labels as the input dataset: each class (0, 1, or 2) makes up the same share of the training subset and of the test subset as it does of the full dataset.
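To make the effect of stratification concrete, the short sketch below repeats the split and counts how often each class label occurs in the full, training, and test label arrays. It uses only the calls already shown in this notebook plus NumPy's bincount function, and it reloads the data so that it can run on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Same data and split settings as in the surrounding cells
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# np.bincount counts the occurrences of each non-negative integer label
print('Label counts in y:', np.bincount(y))              # [50 50 50]
print('Label counts in y_train:', np.bincount(y_train))  # [35 35 35]
print('Label counts in y_test:', np.bincount(y_test))    # [15 15 15]
```

Because the three classes are equally frequent in the input, each one contributes 35 examples to the training subset and 15 to the test subset.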

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler()  # create a StandardScaler instance
sc.fit(X_train)  # fit estimates the parameters 𝜇 (sample mean) and 𝜎 (standard deviation)
                 # for each feature dimension from the training data only

X_train_std = sc.transform(X_train)  # standardize the training data with the fitted 𝜇 and 𝜎
                                     # and store the result in a new variable
X_test_std = sc.transform(X_test)    # apply the same training-set 𝜇 and 𝜎 to the test data
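As a rough check of what fit and transform do, the sketch below recomputes the standardization by hand with NumPy and compares it with the StandardScaler result. The names mu, sigma, and X_test_manual are introduced here only for illustration. It assumes the population standard deviation (NumPy's default, ddof=0), which is what StandardScaler uses.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Manual standardization: both parameters come from the training data only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # ddof=0, i.e. the population standard deviation

X_test_manual = (X_test - mu) / sigma

# The hand-computed values agree with StandardScaler's output
print(np.allclose(X_test_manual, X_test_std))

# The standardized training features have mean 0 and standard deviation 1
print(np.allclose(X_train_std.mean(axis=0), 0.0))
print(np.allclose(X_train_std.std(axis=0), 1.0))
```

The test data is scaled with the training-set parameters, so its standardized mean and standard deviation are generally close to, but not exactly, 0 and 1.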

Having standardized the training data, we can now train a perceptron model. Most algorithms in scikit-learn already support multiclass classification by default via the one-vs.-rest (OvR) method, which allows us to feed the three flower classes to the perceptron all at once.

In [4]:
from sklearn.linear_model import Perceptron

ppn = Perceptron(eta0=0.1, random_state=1)  # eta0 is the learning rate (the eta of our own perceptron
                                            # implementation); its value here was chosen by trial and error.
                                            # The number of epochs (passes over the training dataset) is set
                                            # by n_iter in older scikit-learn releases and by max_iter in
                                            # newer ones; this call leaves it at the library default.
                                            # random_state seeds the shuffling of the training data.

ppn.fit(X_train_std, y_train)  # after initializing the object, fit the model on the standardized training data



Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=0.1,
      fit_intercept=True, max_iter=None, n_iter=None, n_iter_no_change=5,
      n_jobs=None, penalty=None, random_state=1, shuffle=True, tol=None,
      validation_fraction=0.1, verbose=0, warm_start=False)
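The one-vs.-rest behaviour mentioned earlier can be inspected on a fitted model. The sketch below refits the same perceptron on the same data and looks at the learned weights: with three classes and two features we expect one weight vector per class. It also checks that predict returns the class whose linear score is largest. The names scores and manual_pred are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

ppn = Perceptron(eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)

# One binary classifier per class: 3 weight vectors of length 2, plus 3 bias terms
print(ppn.coef_.shape)       # (3, 2)
print(ppn.intercept_.shape)  # (3,)

# For each example, the predicted class is the one with the highest score
scores = ppn.decision_function(X_test_std)
manual_pred = ppn.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual_pred, ppn.predict(X_test_std)))
```

The fitted weights, and therefore the predictions, may differ between scikit-learn versions because the default number of epochs and the stopping criterion have changed over time.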

In [5]:
y_pred = ppn.predict(X_test_std)  # make predictions on the standardized test data via the predict method

print('Misclassified examples: %d' % (y_test != y_pred).sum())  # count the test examples whose true label in y_test differs from y_pred


# Instead of the misclassification error, many machine learning practitioners report the classification accuracy of a model.
# scikit-learn implements a large variety of performance metrics in its metrics module,
# including accuracy_score.

from sklearn.metrics import accuracy_score
print('Scikit-learn metric Accuracy: %.3f' % accuracy_score(y_test, y_pred))

# Alternatively, each classifier in scikit-learn has a score method, which computes a classifier's prediction accuracy 
# by combining the predict call with accuracy_score
print('Scikit-learn Score Method: %.3f' % ppn.score(X_test_std, y_test))

Misclassified examples: 11
Scikit-learn metric Accuracy: 0.756
Scikit-learn Score Method: 0.756
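The misclassification count and the accuracy reported above are two views of the same result: accuracy is 1 minus the misclassification error. The short sketch below reproduces the arithmetic from the numbers in the output (11 misclassified examples out of 45 test examples). The variable names are chosen here for illustration.

```python
n_test = 45          # size of the test set (30 percent of 150 examples)
misclassified = 11   # count reported in the output above

error = misclassified / n_test  # misclassification error
accuracy = 1 - error            # classification accuracy

print('Misclassification error: %.3f' % error)  # 0.244
print('Accuracy: %.3f' % accuracy)              # 0.756
```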
