######  The University of Melbourne, School of Computing and Information Systems
# COMP90049 Introduction to Machine Learning, 2021 Semester 1

## Week 6

### NOTE: You will need a recent build of `scikit-learn` (version 0.18.1 or later) for its neural network support.


### Exercise 1.
The Multilayer Perceptron is available from (newer builds of) `scikit-learn` as `sklearn.neural_network.MLPClassifier`.
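A quick way to confirm that the installed build is recent enough is to check the version string before running the rest of the notebook. This is only a minimal sketch: it assumes the version string starts with numeric `major.minor` components, and the name `has_mlp` is introduced here for illustration.

```python
import sklearn
from sklearn.neural_network import MLPClassifier  # raises ImportError on builds without MLP support

# Parse "major.minor" from the version string, e.g. "1.3.2" -> (1, 3)
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])

# MLPClassifier was introduced in the 0.18 release series
has_mlp = (major, minor) >= (0, 18)
print("scikit-learn", sklearn.__version__, "| MLPClassifier available:", has_mlp)
```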


In [2]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from collections import Counter

### Exercise 1.(a) 
Build a default Multilayer Perceptron to classify the `Iris` data. Evaluate its cross-validation accuracy.

In [3]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
print('X:', X.shape, 'y:', set(y))


clf = MLPClassifier(max_iter=2000)

print('cross-val acc:', np.mean(cross_val_score(clf, X, y, cv=5)))
clf.fit(X, y)


X: (150, 4) y: {0, 1, 2}
cross-val acc: 0.9800000000000001


MLPClassifier(max_iter=2000)

In [10]:
print(type(X))

<class 'numpy.ndarray'>


In [8]:
from sklearn.feature_selection import SelectKBest, chi2

# keep the single attribute with the highest chi-squared score against the class label
x2 = SelectKBest(chi2, k=1)

X_x2 = x2.fit_transform(X, y)
# column index of the selected attribute
print(x2.get_support(indices=True))

[2]
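The index printed above is easier to interpret once it is mapped back to an attribute name. The sketch below reloads `Iris` so that it runs on its own, and uses the `feature_names` list that ships with the data set; the names `selector` and `selected_names` are illustrative only.

```python
from sklearn import datasets
from sklearn.feature_selection import SelectKBest, chi2

iris = datasets.load_iris()
selector = SelectKBest(chi2, k=1).fit(iris.data, iris.target)

# Map each selected column index back to its attribute name
selected_names = [iris.feature_names[i] for i in selector.get_support(indices=True)]
print('selected:', selected_names)

# The chi-squared score of every attribute, for comparison
for name, score in zip(iris.feature_names, selector.scores_):
    print('{:<20} {:.1f}'.format(name, score))
```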


### Exercise 1.(b) 
Check the `coefs_` and `n_layers_` attributes of the fitted classifier to examine the resulting neural network.

In [3]:
print(clf.coefs_)
print('parameter shapes:',[p.shape for p in clf.coefs_])
print('num layers:', clf.n_layers_)

[array([[ 1.15939024e-11,  7.71780289e-02,  3.74601840e-02,
        -7.36068448e-02,  1.00509466e-01,  3.22748607e-02,
        -3.06676194e-02,  1.09432226e-01,  2.02062686e-01,
         3.91995196e-01, -1.99887338e-01,  7.58646478e-03,
        -1.89552265e-02, -2.30749332e-05, -1.26992499e-01,
         1.18487425e-01,  2.19482443e-01,  3.02801699e-01,
         6.08882107e-13,  1.99024769e-01,  1.54168027e-01,
         5.65867174e-12,  3.51405015e-01,  1.17037407e-01,
         2.18891514e-01, -4.19571060e-02, -1.20112405e-02,
        -1.85932650e-02,  1.37859665e-01,  2.50412347e-02,
        -1.31343677e-05,  1.14789274e-01, -3.46352061e-02,
        -1.09519534e-04, -3.93294434e-02,  2.64504672e-01,
         2.36059011e-01, -2.43128237e-06, -8.82597582e-10,
         2.08477396e-01,  2.66508825e-01, -1.77133106e-02,
         1.38526169e-01,  4.22180667e-01, -1.46373930e-02,
         1.77158349e-01,  1.71178592e-01,  1.03729732e-01,
         9.30197021e-02, -3.16010897e-13,  2.95790822e-
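Printing the full weight matrices is hard to read, so it can help to look only at their shapes. The sketch below refits a default network on `Iris` (with a fixed `random_state`, which is an addition for reproducibility) and lists the shape of each weight matrix in `coefs_` and each bias vector in `intercepts_`.

```python
from sklearn import datasets
from sklearn.neural_network import MLPClassifier

X_iris, y_iris = datasets.load_iris(return_X_y=True)
net = MLPClassifier(max_iter=2000, random_state=0).fit(X_iris, y_iris)

# One weight matrix per pair of adjacent layers: (units in, units out)
weight_shapes = [w.shape for w in net.coefs_]
# One bias vector per non-input layer
bias_shapes = [b.shape for b in net.intercepts_]

print('weight shapes:', weight_shapes)
print('bias shapes:', bias_shapes)
print('num layers (input + hidden + output):', net.n_layers_)
```

With the default single hidden layer of 100 units, the 4 input attributes and 3 classes give a 4x100 matrix followed by a 100x3 matrix.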

### Exercise 2.
One important issue with the Multilayer Perceptron is that it is sensitive to the scale of the input attribute values.

### Exercise 2.(a) 
Read up on the `StandardScaler`, and re-scale the `Iris` data so that each attribute has a *mean* of 0 and a *variance* of 1. Evaluate and examine the resulting neural network built on the re-scaled data.
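As a sanity check on what `StandardScaler` does, the following sketch standardises `Iris` and verifies the per-attribute mean and variance. The variable names are illustrative, and the data is reloaded so that the snippet is self-contained.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X_iris = datasets.load_iris().data
X_scaled = StandardScaler().fit_transform(X_iris)

# Each column is shifted by its mean and divided by its standard deviation
col_means = X_scaled.mean(axis=0)
col_vars = X_scaled.var(axis=0)

print('means before:', X_iris.mean(axis=0).round(2))
print('means after: ', col_means.round(2))
print('variances after:', col_vars.round(2))
```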


In [4]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
clf = MLPClassifier(max_iter=2000)
# This is cheating: the mean and variance are estimated on the whole data set, so every test fold leaks into the scaling
print('cross-val cheating standardised features acc:', np.mean(cross_val_score(clf, scaler.fit_transform(X), y, cv=5)))


cross-val cheating standardised features acc: 0.9733333333333334


### Exercise 2.(c) 
(Harder) Calculating the _mean_ and _variance_ on the entire data set (before splitting into train/test sets) is cheating slightly. Write a re-scale function that calculates the scaling factors for the training data, and applies the scaler to the test data. Then, write a wrapper function that uses this to cross-validate.

In [5]:
clf = MLPClassifier(max_iter=2000)
# No cheating here: inside cross_val_score the pipeline fits the scaler on each training fold only.
# Read more on pipelines: https://scikit-learn.org/stable/modules/compose.html
pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])
print('cross-val noncheating standardised features acc:', np.mean(cross_val_score(pipeline, X, y, cv=5)))


cross-val noncheating standardised features acc: 0.9733333333333334


*You might not see a reduction in performance with the non-cheating method, but in general it is best to fit the scaler on the training data only (`fit_transform`), and then apply that fitted transformation to the test data (`transform`).*

*You also did not see an improvement from standardisation here, which might be because the neural network is not well tuned in terms of regularisation and the number/size of its layers.*
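The exercise asks for a hand-written re-scale function and a cross-validation wrapper, which a `Pipeline` hides. One possible sketch is given below; the function names `rescale` and `cross_validate_scaled`, the use of `StratifiedKFold` with shuffling, and the fixed seeds are all assumptions made for illustration rather than the only way to do it.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

def rescale(X_train, X_test):
    """Estimate scaling factors on the training data only, then apply them to both parts."""
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test)

def cross_validate_scaled(make_clf, X, y, n_splits=5, seed=0):
    """Cross-validate a fresh classifier per fold, re-scaling inside each fold."""
    y = np.asarray(y)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        X_tr, X_te = rescale(X[train_idx], X[test_idx])
        clf = make_clf()
        clf.fit(X_tr, y[train_idx])
        scores.append(clf.score(X_te, y[test_idx]))
    return np.array(scores)

X_iris, y_iris = datasets.load_iris(return_X_y=True)
scores = cross_validate_scaled(lambda: MLPClassifier(max_iter=2000, random_state=0), X_iris, y_iris)
print('per-fold acc:', scores, 'mean:', scores.mean())
```

Passing a factory (`make_clf`) rather than a fitted object means every fold starts from a freshly initialised network.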

### Exercise 3 
You can coerce the Multilayer Perceptron to have specifically sized hidden layers using the `hidden_layer_sizes` parameter.

### Exercise 3.(a) 
Train a Multilayer Perceptron on the two-class `Abalone` data, and examine the resulting neural network.


In [8]:
def convert_class(raw, num_class=2): #convert classes to binary or multinomial
    raw = int(raw)
    if num_class == 2:
        if raw<=10: return 0
        else: return 1
    elif num_class == 3:
        if raw <= 8:
            return 0
        elif 9<=raw<=10:
            return 1
        elif 11<=raw:
            return 2
    elif num_class == 29:
        return raw

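def load_abalone(addsex=False, num_class=2):
    # expects 'abalone.data' in the working directory
    sex_codes = {"M": 0, "I": 1, "F": 2}
    X, y = [], []
    with open('abalone.data', 'r') as fin:
        for line in fin:
            line = line.strip()
            if not line:  # skip blank lines, e.g. at the end of the file
                continue
            atts = line.split(",")
            if not addsex:
                X.append(atts[1:-1])
            else:
                # encode sex as an integer; any unexpected value maps to 3
                sex = sex_codes.get(atts[0], 3)
                X.append([sex] + atts[1:-1])
            # the last column (number of rings) is converted to the class label
            y.append(convert_class(atts[-1], num_class))
    X = np.array(X, dtype=float)
    return X, y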

X, y = load_abalone(addsex=False, num_class=2)
print('X:', X.shape, 'y:', set(y))

clf = MLPClassifier(max_iter=2000)
clf.fit(X,y)
print(clf.coefs_)

X: (4177, 7) y: {0, 1}
[array([[-1.31531103e-01, -1.32771636e-01, -1.02266411e-01,
         2.42490711e-01,  7.20822175e-02, -1.35244147e-52,
         8.22294931e-03, -8.60203095e-02,  7.94626538e-02,
         1.14050527e-47, -1.34975356e-01, -8.99679142e-02,
         1.20059377e-01, -6.87712488e-02, -1.04573503e-78,
         1.89503383e-01, -1.03544470e-77, -4.25461736e-74,
         6.97353684e-02, -3.07459947e-02,  1.49052333e-52,
         8.55666614e-02, -1.08541490e-01, -1.39666397e-44,
        -1.95738294e-59,  5.57488029e-76,  1.10524837e-01,
        -6.67819808e-02, -2.70948010e-02,  6.48838814e-02,
        -1.56177216e-55,  2.25389531e-01,  1.18196754e-01,
         2.24082255e-01,  1.78615151e-01,  6.38794858e-02,
        -6.51296457e-02,  2.28653142e-01, -4.04783595e-02,
         1.92348927e-01, -1.28817640e-01, -6.10068610e-02,
        -4.20590355e-02, -5.73435035e-67, -3.24413673e-53,
        -1.59615091e-01,  5.03988384e-02,  1.60344366e-01,
         1.64717957e-01,  3.5855

### Exercise 3.(b) 
(Harder) Change the size and/or number of hidden layers. How are the resulting weights affected? Can you discern any relationship between the weights for layers of varying sizes?

In [9]:
clf = MLPClassifier(hidden_layer_sizes=[10, 10, 4], max_iter=2000)
# three hidden layers with 10, 10 and 4 units respectively
clf.fit(X, y)
print(clf.coefs_)

[array([[-0.07747804, -0.05470933, -0.22873315,  0.14042917,  0.27367439,
        -0.16751675, -0.38023649,  0.62046604, -0.14810798, -0.25364371],
       [-0.37936501,  0.50563958, -0.23247282, -0.38008997, -0.15552607,
         0.43910963, -0.0588767 ,  0.335598  ,  0.63546494, -0.04346653],
       [-0.05153807, -0.25125844,  0.49841425, -0.93351084,  0.24654007,
         0.38504697, -0.29706621, -0.57498998, -0.58760191,  0.05332798],
       [-0.19079326,  0.35819438,  0.70520953,  0.15816268,  0.70457358,
         0.64342869, -0.55295313, -0.30254355, -0.10952335,  0.32856662],
       [ 0.41309896, -0.34167007,  1.14939554,  1.05403077, -0.5182208 ,
        -0.78080423, -0.12755564,  0.6335394 ,  0.72025471, -0.35737822],
       [ 0.34117412,  0.50256056, -0.13763266, -0.11700591, -0.06680289,
        -0.5814316 , -0.36058434,  0.36088966,  0.03337964,  0.35324359],
       [-0.640471  ,  1.08753606, -0.67645879, -0.17344297,  0.02323189,
         0.71663396,  0.52716417, -0.7273742
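To see how `hidden_layer_sizes` determines the weight matrices, it is enough to compare shapes across a few configurations. Because `abalone.data` may not be at hand, the sketch below substitutes a synthetic two-class data set with 7 attributes from `make_classification`; that stand-in data, the chosen sizes, and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the two-class Abalone data: 7 numeric attributes, 2 classes
X_toy, y_toy = make_classification(n_samples=200, n_features=7, n_informative=4, random_state=0)

shapes = {}
for sizes in [(100,), (10, 10, 4), (5,)]:
    net = MLPClassifier(hidden_layer_sizes=sizes, max_iter=300, random_state=0)
    net.fit(X_toy, y_toy)
    # Each matrix has shape (units in previous layer, units in next layer)
    shapes[sizes] = [w.shape for w in net.coefs_]
    print(sizes, '->', shapes[sizes])
```

Each weight matrix connects two adjacent layers, so its shape follows directly from the layer sizes; for a two-class problem the output layer has a single unit.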

### Exercise 4. 
There are a couple of different tunable parameters for the `MLPClassifier`, mostly dealing with the weight optimisation. However, it is often worthwhile to tune the regularisation parameter (α).

### Exercise 4.(a) 
Try varying orders of α between $10$ and $10^{-5}$ for a Multilayer Perceptron built on the two-class `Abalone` data. How much variance in cross-validation accuracy do you observe?


In [10]:
alphas = [np.power(10.0, i) for i in range(-7, 2)] # alpha is the L2 regularisation strength (not the learning rate)
print(alphas)

for alpha in alphas:
    clf = MLPClassifier(max_iter=2000, alpha=alpha)
    pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])
    scores = cross_val_score(pipeline, X, y, cv=5)
    print('alpha: {} mean_acc: {} standard_dev_acc: {}'.format(alpha, np.mean(scores), np.std(scores)))

[1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]
alpha: 1e-07 mean_acc: 0.7866879637853479 standard_dev_acc: 0.011786261177182398
alpha: 1e-06 mean_acc: 0.7859708334527118 standard_dev_acc: 0.014829799840827232
alpha: 1e-05 mean_acc: 0.789322121307624 standard_dev_acc: 0.01272368051453238
alpha: 0.0001 mean_acc: 0.7914766638970862 standard_dev_acc: 0.01631058225567131
alpha: 0.001 mean_acc: 0.7862103544107957 standard_dev_acc: 0.012525224560086302
alpha: 0.01 mean_acc: 0.7859716929776811 standard_dev_acc: 0.012914006996107273
alpha: 0.1 mean_acc: 0.7854926510615133 standard_dev_acc: 0.012532362559726508
alpha: 1.0 mean_acc: 0.7783067358106753 standard_dev_acc: 0.023521912445921667
alpha: 10.0 mean_acc: 0.7407260120906513 standard_dev_acc: 0.03022452770631513


### Exercise 4.(b) 
Read up on the `GridSearchCV` utility, to help you in tuning the performance of the *Multilayer Perceptron*. Split the data into a training-and-tuning partition, and a test partition. What is the value of the regularisation parameter that `GridSearchCV` comes up with? How does the test accuracy compare to the default (un-tuned) `MLPClassifier`?

*>>Please note that running this part can take some time. Be patient! ;)*

In [12]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

X_train, X_devtest, y_train, y_devtest = train_test_split(X, y, test_size=0.4, random_state=42)
X_dev, X_test, y_dev, y_test = train_test_split(X_devtest, y_devtest, test_size=0.5, random_state=42)

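# baseline: build a fresh classifier with default hyperparameters, rather than reusing whichever `clf` an earlier cell assigned last
clf = MLPClassifier(max_iter=2000)
clf.fit(X_train, y_train)
print('MLP acc without tuning:', clf.score(X_test, y_test))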

hidden_sizes = [[100], [10, 10]]
#arguments of MLPClassifier and a list of values for them to search and find the best.
param_grid = {'alpha': alphas, 'hidden_layer_sizes':hidden_sizes}


gs = GridSearchCV(estimator=clf,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=3,
                  n_jobs=2,   # run two fits in parallel
                  verbose=1)  # print progress messages
gs.fit(X_train, y_train)

best_params = gs.best_params_
print('best_params', best_params)
clf = MLPClassifier(max_iter=2000, **best_params) # ** unpacks the dict into keyword arguments (alpha=..., hidden_layer_sizes=...)
clf.fit(X_train, y_train)
print('acc with best params:', clf.score(X_test, y_test))


MLP acc without tuning: 0.7739234449760766
Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   24.2s
[Parallel(n_jobs=2)]: Done  54 out of  54 | elapsed:   26.4s finished


best_params {'alpha': 0.0001, 'hidden_layer_sizes': [10, 10]}
acc with best params: 0.7727272727272727
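The grid search above tunes the classifier on unscaled attributes. A possible variation is to search over a `Pipeline`, so that standardisation is re-fitted inside every tuning fold; parameters are then addressed as `<step name>__<parameter>`. The sketch below uses a synthetic stand-in data set, a reduced grid, and the step names `scaler` and `mlp`, all of which are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 7 numeric attributes, 2 classes
X_toy, y_toy = make_classification(n_samples=300, n_features=7, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('mlp', MLPClassifier(max_iter=500, random_state=0))])

# Keys are '<step name>__<parameter name>'
param_grid = {'mlp__alpha': [1e-4, 1e-2, 1.0],
              'mlp__hidden_layer_sizes': [(100,), (10, 10)]}

search = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=3)
search.fit(X_tr, y_tr)

# With the default refit=True, the best pipeline is refitted on all of X_tr
test_acc = search.score(X_te, y_te)
print('best_params', search.best_params_)
print('test acc with best params:', test_acc)
```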
