## A "Hello World" Example of Machine Learning - Revisit

We load the Iris dataset from scikit-learn. Each row is one flower sample, and the four columns hold, in order, the sepal length, sepal width, petal length, and petal width of that sample.

The classes (species) are already converted to integer labels, where:

0=Iris-Setosa, 1=Iris-Versicolor, 2=Iris-Virginica.

The original version of this example used only two features, the third and fourth columns. In this revisit, the code below selects all four columns and keeps a two-feature variant as a commented-out line for comparison.
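The column order and label encoding described above can be checked directly on the loaded dataset. The following is a minimal sketch, assuming the `feature_names` and `target_names` attributes of the object returned by `load_iris`:

```python
from sklearn import datasets

iris = datasets.load_iris()

# Column order of iris.data
feature_names = list(iris.feature_names)
# Species names indexed by their integer label (0, 1, 2)
target_names = list(iris.target_names)

for idx, name in enumerate(feature_names):
    print(f"column {idx}: {name}")
for label, name in enumerate(target_names):
    print(f"label {label}: {name}")
```

Printing these once at the start makes it harder to mix up column indices when selecting a feature subset later.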

In [137]:
from sklearn import datasets
import numpy as np


iris = datasets.load_iris()
iris.data


array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [138]:
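# All four features: sepal length, sepal width, petal length, petal width
X = iris.data[:, [0, 1, 2, 3]]
# Two-feature variant ([0, 3] = sepal length, petal width; petal columns are [2, 3])
#X = iris.data[:, [0, 3]]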


In [139]:
y = iris.target

In [140]:
print('Class labels:', np.unique(y))

Class labels: [0 1 2]


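Many scikit-learn classifiers, including the perceptron used below, support multi-class classification via the One-versus-Rest (OvR) method: one binary classifier is trained per class, and the class whose classifier gives the highest score is predicted.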
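To make the One-versus-Rest idea concrete, the sketch below fits a perceptron on the full, unscaled Iris data (an illustrative setup, not the pipeline used in this notebook) and inspects the fitted weights. It assumes the usual linear-model attributes `coef_`, `classes_`, and `decision_function`:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hyperparameters are illustrative
clf = Perceptron(max_iter=100, eta0=0.1, random_state=42)
clf.fit(X, y)

# One weight vector (row) per class: shape (n_classes, n_features)
print(clf.coef_.shape)

# One score per class for each sample; the class with the highest score wins
scores = clf.decision_function(X)
manual_pred = clf.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual_pred, clf.predict(X)))
```

The three rows of `coef_` are the three binary "this class versus the rest" classifiers.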

Splitting data into 70% training and 30% test data:

In [141]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

In [142]:
print('Labels counts in y:', np.bincount(y))
print('Labels counts in y_train:', np.bincount(y_train))
print('Labels counts in y_test:', np.bincount(y_test))

Labels counts in y: [50 50 50]
Labels counts in y_train: [35 35 35]
Labels counts in y_test: [15 15 15]
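The equal label counts above are a consequence of `stratify=y`. As a rough illustration, the sketch below repeats the split with and without stratification; the variable names ending in `_s` and `_r` are introduced here only for the comparison:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Stratified split: class proportions are preserved in both parts
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Plain random split: class counts are not guaranteed to stay balanced
_, _, y_train_r, y_test_r = train_test_split(
    X, y, test_size=0.3, random_state=1)

print('stratified test counts:  ', np.bincount(y_test_s, minlength=3))
print('unstratified test counts:', np.bincount(y_test_r, minlength=3))
```

With only 45 test samples, a few samples more or less per class can noticeably shift the measured accuracy, which is why stratification is a sensible default here.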


### Standardizing the features:

In [143]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler() #center the distribution around zero (mean), with a standard deviation of 1.
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
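What `StandardScaler` does can be reproduced by hand: subtract the training mean and divide by the training standard deviation, column by column. The sketch below checks this; the names `mu`, `sigma`, and the `_manual` arrays are introduced here for illustration only:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1,
    stratify=iris.target)

sc = StandardScaler().fit(X_train)

# Manual z-score using the training mean and (population) standard deviation
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_manual = (X_train - mu) / sigma
# The test data reuses the training statistics, not its own
X_test_manual = (X_test - mu) / sigma

print(np.allclose(X_train_manual, sc.transform(X_train)))
print(np.allclose(X_test_manual, sc.transform(X_test)))
```

Fitting the scaler on the training data only keeps information about the test set out of the preprocessing step.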

In [144]:
from sklearn.linear_model import Perceptron

ppn = Perceptron(max_iter=100, eta0=0.1, random_state=42)
ppn.fit(X_train_std, y_train)

Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=0.1,
           fit_intercept=True, max_iter=100, n_iter_no_change=5, n_jobs=None,
           penalty=None, random_state=42, shuffle=True, tol=0.001,
           validation_fraction=0.1, verbose=0, warm_start=False)

### Test the model with the hold-out test set

In [145]:
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: ' + str((y_test != y_pred).sum()))

Misclassified samples: 3


In [146]:
from sklearn.metrics import accuracy_score

print('Accuracy: ' + str(accuracy_score(y_test, y_pred)))

Accuracy: 0.9333333333333333
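The two numbers above are linked: accuracy is one minus the fraction of misclassified samples, so 3 errors out of 45 test samples gives 42/45. The sketch below reproduces that arithmetic on hypothetical labels (the arrays are made up for illustration and are not the notebook's actual predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 45 test samples, 3 of them predicted wrongly
y_true = np.repeat([0, 1, 2], 15)
y_hat = y_true.copy()
y_hat[[0, 20, 40]] = [1, 2, 1]   # introduce three errors

n_errors = int((y_true != y_hat).sum())
acc_from_errors = 1 - n_errors / len(y_true)

print(n_errors, acc_from_errors, accuracy_score(y_true, y_hat))
```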


In [147]:
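#X_new = [[1.1, 0.2,],[0.4, 1.9], [1.4, 0.2]]
# Three new samples with four features each. Note: these raw values are passed
# to a model fitted on standardized data; sc.transform(X_new) would match training.
X_new = [[4.1, 3.2, 1.2, 0.3], [3.4, 2.9, 1.8, 0.7], [4.4, 3.4, 1.1, 0.4]]

y_new = ppn.predict(X_new)
y_new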

array([1, 1, 1])
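Because the perceptron above was fitted on standardized features, new samples generally need the same transformation before prediction. The sketch below rebuilds the same split, scaler, and model, then scales the new samples first; `X_new_std` and `y_new_std` are names introduced here:

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1,
    stratify=iris.target)

sc = StandardScaler().fit(X_train)
ppn = Perceptron(max_iter=100, eta0=0.1, random_state=42)
ppn.fit(sc.transform(X_train), y_train)

X_new = [[4.1, 3.2, 1.2, 0.3], [3.4, 2.9, 1.8, 0.7], [4.4, 3.4, 1.1, 0.4]]

# Apply the training-set scaling before predicting
X_new_std = sc.transform(X_new)
y_new_std = ppn.predict(X_new_std)
print(y_new_std)
```

The predictions may differ from those obtained on the raw values, since the model's weights only make sense on the standardized scale.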

### Evaluate the model using cross validation

In [148]:
from sklearn.model_selection import cross_val_score
cross_val_score(ppn, X_train_std, y_train, cv=4, scoring="accuracy")

array([0.92592593, 0.77777778, 0.66666667, 0.875     ])
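The call above cross-validates on data that was already standardized with statistics from the whole training set, so each validation fold has slightly influenced the scaling. One common alternative, sketched here as an option rather than as what this notebook did, is to put the scaler and the model into a pipeline so the scaler is refitted inside each fold:

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1,
    stratify=iris.target)

# Scaler and classifier are fitted together on each training fold
pipe = make_pipeline(
    StandardScaler(),
    Perceptron(max_iter=100, eta0=0.1, random_state=42))

scores = cross_val_score(pipe, X_train, y_train, cv=4, scoring="accuracy")
print(scores, scores.mean(), scores.std())
```

On a dataset this small the difference is usually minor, but the pipeline form also makes it impossible to forget the scaling step at prediction time.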

In [149]:
help('sklearn.model_selection.cross_val_score')

Help on function cross_val_score in sklearn.model_selection:

sklearn.model_selection.cross_val_score = cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv='warn', n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', error_score='raise-deprecating')
    Evaluate a score by cross-validation
    
    Read more in the :ref:`User Guide <cross_validation>`.
    
    Parameters
    ----------
    estimator : estimator object implementing 'fit'
        The object to use to fit the data.
    
    X : array-like
        The data to fit. Can be for example a list, or an array.
    
    y : array-like, optional, default: None
        The target variable to try to predict in the case of
        supervised learning.
    
    groups : array-like, with shape (n_samples,), optional
        Group labels for the samples used while splitting the dataset into
        train/test set.
    
    scoring : string, callable or None, optional, default: None
        A string (see mode

Two-feature accuracy scores: array([0.88888889, 0.48148148, 0.7037037 , 0.95833333])
...but the scores we actually got with two features were: array([0.88888889, 0.92592593, 0.77777778, 0.79166667])

### Exercise 1: What if we use all four features to train the model? Are the results better? Why? 
I would say that our results were better when we used all four features to train the model than when we used just two. The number of misclassified test samples dropped from 10 to 3, and the accuracy rose from 0.7777777777777778 to 0.9333333333333333.

This is most likely because the two features we added also vary with the species, so they carry information about the class labels that the first two features alone do not. That extra information helps the algorithm separate the classes more cleanly.
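One way to probe how strongly each feature relates to the class labels is a per-feature ANOVA F-statistic. This is only one possible measure and is not part of the original analysis; the sketch uses `f_classif` from `sklearn.feature_selection`:

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import f_classif

iris = datasets.load_iris()

# F: one ANOVA F-statistic per feature; p: the matching p-values
F, p = f_classif(iris.data, iris.target)

for name, f_val in zip(iris.feature_names, F):
    print(f"{name}: F = {f_val:.1f}")

# Index of the feature that separates the classes best by this measure
best = int(np.argmax(F))
print('most discriminative feature:', iris.feature_names[best])
```

A larger F value means the class means differ more relative to the within-class spread, so that feature alone already separates the species better.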

### Exercise 2: Try the scikit-learn stochastic gradient descent model instead of the perceptron. Use all four features. Evaluate with cross-validation how the model performs in terms of accuracy using both two features and four features.


In answer to Exercise 2:

With two features, the scikit-learn stochastic gradient descent model gave the following on the test set:

Misclassified samples: 1
Accuracy: 0.9777777777777777

With four features, it gave:

Misclassified samples: 5
Accuracy: 0.8888888888888888

For comparison, the perceptron in Exercise 1 improved when going from two features to four: the number of misclassified samples dropped from 10 to 3, and the accuracy went from 0.7777777777777778 to 0.9333333333333333.

The data therefore lead me to conclude that the stochastic gradient descent model was more accurate when only two features were used, whereas the perceptron was more accurate when all four features were used.

More broadly, averaged over these four test-set results, the stochastic gradient descent model was a bit more accurate than the perceptron (about 0.93 versus 0.86). These figures come from a 45-sample test set and a single random seed, so the difference should be read with caution.
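The comparison above can be made more systematic by cross-validating both models on both feature sets in one loop. The sketch below is one way to do that; it assumes the two-feature set is the petal columns `[2, 3]`, and the `feature_sets`, `models`, and `results` dictionaries are names introduced here:

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1,
    stratify=iris.target)

# Assumption: "two features" means petal length and petal width
feature_sets = {"two features": [2, 3], "four features": [0, 1, 2, 3]}
models = {
    "perceptron": Perceptron(max_iter=100, eta0=0.1, random_state=42),
    "sgd": SGDClassifier(random_state=42),
}

results = {}
for f_name, cols in feature_sets.items():
    for m_name, model in models.items():
        # cross_val_score clones the pipeline, so models can be reused
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipe, X_train[:, cols], y_train,
                                 cv=4, scoring="accuracy")
        results[(m_name, f_name)] = scores.mean()
        print(f"{m_name:10s} {f_name:13s} mean accuracy = {scores.mean():.3f}")
```

Averaging over folds gives a steadier basis for comparing the models than a single test-set score.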



In [150]:
# scikit-learn stochastic gradient descent model
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier(random_state=42)

sgd.fit(X_train_std, y_train)

#ppn = SGDClassifier(loss='perceptron')
#lr = SGDClassifier(loss='log')
#svm = SGDClassifier(loss='hinge')



SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
              l1_ratio=0.15, learning_rate='optimal', loss='hinge',
              max_iter=1000, n_iter_no_change=5, n_jobs=None, penalty='l2',
              power_t=0.5, random_state=42, shuffle=True, tol=0.001,
              validation_fraction=0.1, verbose=0, warm_start=False)
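The `loss` argument decides which linear model `SGDClassifier` fits; the output above shows the default `loss='hinge'`, a linear SVM. According to the scikit-learn documentation, `Perceptron` corresponds to `SGDClassifier` with `loss='perceptron'`, `learning_rate='constant'`, `eta0=1`, and `penalty=None`. The sketch below checks this on standardized Iris data; fitting on the full dataset is a simplification made for this illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)
y = iris.target

# Perceptron with its default settings
ppn = Perceptron(random_state=42).fit(X_std, y)

# The SGDClassifier configuration documented as equivalent
sgd_ppn = SGDClassifier(loss='perceptron', learning_rate='constant',
                        eta0=1.0, penalty=None, random_state=42)
sgd_ppn.fit(X_std, y)

same = np.array_equal(ppn.predict(X_std), sgd_ppn.predict(X_std))
print('identical predictions:', same)
```

This also means that part of the difference between the perceptron and the default `SGDClassifier` results in this notebook comes from the loss function and learning-rate schedule, not from a different optimization method.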

In [134]:
y_pred = sgd.predict(X_test_std)
print('Misclassified samples: ' + str((y_test != y_pred).sum()))

from sklearn.metrics import accuracy_score

print('Accuracy: ' + str(accuracy_score(y_test, y_pred)))

#X_new = [[1.1, 0.2,],[0.4, 1.9], [1.4, 0.2]]
X_new = [[4.1, 3.2, 1.2, 0.3],[3.4, 2.9, 1.8, 0.7], [4.4, 3.4, 1.1, 0.4]]

y_new = sgd.predict(X_new)
y_new

from sklearn.model_selection import cross_val_score
cross_val_score(sgd, X_train_std, y_train, cv=4, scoring="accuracy")





Misclassified samples: 1
Accuracy: 0.9777777777777777


ValueError: X has 4 features per sample; expecting 2
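The `ValueError` above occurs because the model had been fitted on two features while `X_new` has four values per sample. The sketch below reproduces the mismatch and shows a simple check; it assumes the two-feature subset `[2, 3]`, and `sgd_two`, `n_expected`, and `n_given` are names introduced here:

```python
from sklearn import datasets
from sklearn.linear_model import SGDClassifier

iris = datasets.load_iris()
X_two = iris.data[:, [2, 3]]   # assumed two-feature subset
sgd_two = SGDClassifier(random_state=42).fit(X_two, iris.target)

X_new = [[4.1, 3.2, 1.2, 0.3], [3.4, 2.9, 1.8, 0.7], [4.4, 3.4, 1.1, 0.4]]

# Number of features seen during fit vs. number supplied now
n_expected = sgd_two.coef_.shape[1]
n_given = len(X_new[0])
print(f"model expects {n_expected} features, X_new has {n_given}")

try:
    sgd_two.predict(X_new)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```

Re-running the feature selection, split, scaling, and fit cells in order after changing the feature subset avoids this kind of stale-state error in a notebook.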

In [135]:
X_new = [[1.1, 0.2,],[0.4, 1.9], [1.4, 0.2]]


In [151]:
# scikit-learn stochastic gradient descent model, trained on all four features
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

sgd = SGDClassifier(random_state=42)
sgd.fit(X_train_std, y_train)

# Evaluate on the hold-out test set
y_pred = sgd.predict(X_test_std)
print('Misclassified samples: ' + str((y_test != y_pred).sum()))
print('Accuracy: ' + str(accuracy_score(y_test, y_pred)))

#X_new = [[1.1, 0.2,],[0.4, 1.9], [1.4, 0.2]]
X_new = [[4.1, 3.2, 1.2, 0.3], [3.4, 2.9, 1.8, 0.7], [4.4, 3.4, 1.1, 0.4]]

# Predict the new samples (not displayed; only the last expression is shown)
y_new = sgd.predict(X_new)
y_new

# 4-fold cross-validation on the standardized training data
cross_val_score(sgd, X_train_std, y_train, cv=4, scoring="accuracy")




Misclassified samples: 5
Accuracy: 0.8888888888888888


array([1.        , 0.88888889, 0.88888889, 0.83333333])
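Fold-by-fold scores are easier to compare once they are reduced to a mean and a spread. The sketch below does this for the two four-feature cross-validation results reported in this notebook; the arrays are copied from the outputs above, and the variable names are introduced here:

```python
import numpy as np

# Cross-validation accuracies copied from the four-feature runs above
ppn_scores = np.array([0.92592593, 0.77777778, 0.66666667, 0.875])
sgd_scores = np.array([1.0, 0.88888889, 0.88888889, 0.83333333])

for name, scores in [("perceptron", ppn_scores), ("sgd", sgd_scores)]:
    print(f"{name}: mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

By this measure the stochastic gradient descent model comes out ahead with four features (mean about 0.90 versus about 0.81), even though the perceptron scored higher on the single test split, which illustrates how much such small-sample estimates can vary.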