# HW 11.

* Running the models may take minutes. This HW takes ~30 min to complete in computational time, so make sure you don't start it 1 hour before it is due.

* Tasks 2-4 should be done using the `sklearn` library; the last one is a pure TensorFlow ([Keras is part of TensorFlow](https://github.com/keras-team/keras/releases#:~:text=since%20this%20release-,Keras%202.2.,well%20as%20Theano%20and%20CNTK)) example.

  * Use tf.keras instead of the standalone keras package

* The example notebook was run in Google COLAB without any package installation. I advise you to use Google COLAB with a GPU instance for the last task.

* Where not asked otherwise, use the default settings for the model.

* You may speed up training by running the models on more CPU cores (most `sklearn` models support this through a parameter, usually `n_jobs`).

In [None]:
#importing the libraries

import numpy as np
import matplotlib.pyplot as plt

from tensorflow import keras
import tensorflow

import random

from sklearn.metrics import confusion_matrix, auc as auc_score, roc_curve, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelBinarizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

import seaborn as sns

### 1. Load the CIFAR 10 dataset from the `tf.keras.datasets` API and train a `LogisticRegression` model on the dataset and predict all test outcomes with the `sklearn` API

* Create an image grid visualization of randomly selected images (9, 16) with labels.
* Preprocess the dataset for `sklearn`, scale the pixels [0-1], and also flatten each example to a vector.
* Use the `multi_class='multinomial'` option, describe what it means.
* Plot the ROC curves and AUC scores on the same figure for each class.
* Calculate the accuracy of the classifier on the test set.


Hint:

* `from sklearn.preprocessing import LabelBinarizer` might be useful for you.
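
As a rough illustration of what `multi_class='multinomial'` means: instead of solving ten separate binary problems, a single model produces one score per class and turns the scores into probabilities with a softmax, so the probabilities sum to 1 across the classes. The block below is a minimal NumPy sketch of that idea, not scikit-learn's actual implementation; the weight matrix `W`, the bias `b`, and the toy sizes are made-up values for illustration.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def multinomial_predict_proba(X, W, b):
    """Class probabilities of a multinomial (softmax) logistic regression."""
    return softmax(X @ W + b)

# Toy data: 4 samples with 6 features and 3 classes (hypothetical numbers)
rng = np.random.default_rng(0)
X_toy = rng.random((4, 6))
W = rng.normal(size=(6, 3))   # one weight column per class
b = np.zeros(3)               # one bias per class

proba = multinomial_predict_proba(X_toy, W, b)
predicted_class = proba.argmax(axis=1)  # most probable class for each sample
```

In a one-vs-rest setup, each class would instead get its own sigmoid, and the resulting per-class probabilities would not be forced to sum to 1.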

In [None]:
#loading the dataset

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

#printing the shapes 

print('The shape of the train dataset: ', np.shape(x_train))
print('The shape of the test dataset: ', np.shape(x_test))
print()

#creating a plot of the first 10 pictures

fig, axs = plt.subplots(2,5, figsize=(15, 6), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace = .5, wspace=.001)

axs = axs.ravel()

for i in range(10):

    axs[i].imshow(x_train[i])
    axs[i].set_title('The label: ' + str(y_train[i]))
    axs[i].axis('off')
    
plt.show()

After checking the labels online, we can plot the pictures with the actual labels (not numbers):

In [None]:
#putting the actual labels in a list

actual_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

#creating some new lists with the actual labels

actual_label_list_train = []
actual_label_list_test = []

for i in range(0,len(y_train)):
  actual_label_list_train.append(actual_labels[y_train[i][0]])

for i in range(0,len(y_test)):
  actual_label_list_test.append(actual_labels[y_test[i][0]]) 

In [None]:
#generating random indices to plot

random_indices = []

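for i in range(0,144):
  #random.randint includes both endpoints, so the upper bound must be the last valid index
  random_indices.append(random.randint(0, len(x_train) - 1))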

The shapes mean that there are 50000 training and 10000 test images, all in RGB. Let's plot a random selection of them with the actual labels, and then preprocess the images!

In [None]:
#creating a plot of 50 randomly selected pictures

fig, axs = plt.subplots(5,10, figsize=(35, 10), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace = .5, wspace=.001)

axs = axs.ravel()

for i in range(50):

    axs[i].imshow(x_train[random_indices[i]])
    axs[i].set_title('The label: ' + str(actual_label_list_train[random_indices[i]]))
    axs[i].axis('off')
    
plt.show()

In [None]:
#reshaping the dataset

x_train_scaled = x_train.reshape(np.shape(x_train)[0], np.shape(x_train)[1]*np.shape(x_train)[2]*3)/255
x_test_scaled = x_test.reshape(np.shape(x_test)[0], np.shape(x_test)[1]*np.shape(x_test)[2]*3)/255

#transforming to categorical values

y_train_oh = keras.utils.to_categorical(y_train)
y_test_oh = keras.utils.to_categorical(y_test)

#let's check out the new shapes!

print('The shape of the train dataset: ', np.shape(x_train_scaled))
print('The shape of the test dataset: ', np.shape(x_test_scaled))
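
The flattening, scaling, and one-hot steps above can be checked on a tiny stand-in batch. This is only a sketch with randomly generated images and hand-picked labels (both hypothetical), and it uses an identity matrix for the one-hot encoding instead of `keras.utils.to_categorical`.

```python
import numpy as np

# Stand-in for CIFAR-10: 5 random RGB images of 32x32 pixels, uint8 like the real data
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)
labels = np.array([[3], [0], [9], [3], [1]])  # column vector, the shape load_data() returns

# Flatten each image to a vector; -1 lets NumPy infer 32*32*3 = 3072
flat = images.reshape(len(images), -1) / 255.0

# One-hot encode the labels by picking rows of a 10x10 identity matrix
one_hot = np.eye(10)[labels.ravel()]

print(flat.shape, one_hot.shape)
```

Writing `reshape(len(images), -1)` avoids hard-coding the image size, so the same line works for other square or non-square inputs.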

In [None]:
#calling the logistic regression function

clf = LogisticRegression(multi_class='multinomial', random_state=0, max_iter=800, n_jobs=-1).fit(x_train_scaled, y_train.T[0])

#predicting the data

predicted_test = clf.predict(x_test_scaled)
confusion_matrix_values = confusion_matrix(y_test.T[0], predicted_test)

In [None]:
#plotting the confusion matrix

plt.subplots(figsize=(10,7))

p1 = sns.heatmap(confusion_matrix_values, xticklabels=actual_labels, yticklabels=actual_labels, annot=True, fmt="d", cmap='Blues')

plt.ylabel('Real values')
plt.xlabel('Predicted values')
plt.title('Heatmap of the real and predicted values', fontsize=15)
plt.show()

As the confusion matrix shows, the classifier often confuses dogs with cats. Ships and airplanes are also hard to tell apart, just like trucks and automobiles. These results are not surprising, but such confusions may leave the accuracy rather low. Let's check it out!

In [None]:
#calculating the accuracy

acc = accuracy_score(y_test.T[0], predicted_test)

print('The accuracy of the model is:', round(acc,4))
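
The accuracy is directly related to the confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of samples. Below is a minimal sketch on a made-up 3-class matrix, assuming the `sklearn` convention that rows are real values and columns are predicted values; the helper names are hypothetical.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

def per_class_recall(cm):
    """Fraction of each real class that was predicted correctly (row-wise)."""
    cm = np.asarray(cm)
    return np.diag(cm) / cm.sum(axis=1)

# Made-up confusion matrix: rows = real class, columns = predicted class
cm_toy = np.array([[8, 2, 0],
                   [1, 6, 3],
                   [0, 4, 6]])

acc_toy = accuracy_from_confusion(cm_toy)   # 20 correct out of 30 samples
recall_toy = per_class_recall(cm_toy)       # 0.8, 0.6, 0.6
```

The per-class recall shows which labels pull the overall accuracy down, which is the same information the heatmap conveys visually.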

As can be seen, a single multinomial logistic regression model doesn't reach very high accuracy. Let's try another approach: fitting a separate logistic regression to each label (one-vs-rest), and then checking the ROC curve of each class!

In [None]:
#creating the one vs rest classifier model

clf2 = OneVsRestClassifier(LogisticRegression(multi_class='multinomial', random_state=0, max_iter=200, n_jobs=-1))

y_score = clf2.fit(x_train_scaled, y_train_oh).decision_function(x_test_scaled)

In [None]:
# Compute ROC curve and ROC area for each class

fpr = dict()
tpr = dict()
roc_auc = []
for i in range(len(y_score[0])):
    fpr[i], tpr[i], _ = roc_curve(y_test_oh[:, i], y_score[:, i])
for i in range(len(y_score[0])):
    roc_auc.append(round(auc_score(fpr[i], tpr[i]),3))
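
To make clear what `roc_curve` and `auc` compute for each class, here is a simplified NumPy sketch: the threshold is swept over the sorted scores, and the area under the resulting curve is obtained with the trapezoid rule. It assumes binary labels in {0, 1} and no tied scores, which `sklearn` handles more carefully; the function names are hypothetical.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) by lowering the threshold one sample at a time."""
    order = np.argsort(-np.asarray(y_score), kind="stable")  # highest score first
    y_sorted = np.asarray(y_true)[order]
    tps = np.cumsum(y_sorted == 1)   # true positives accepted so far
    fps = np.cumsum(y_sorted == 0)   # false positives accepted so far
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve using the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: two negatives and two positives
y_true_toy = np.array([0, 0, 1, 1])
y_score_toy = np.array([0.1, 0.4, 0.35, 0.8])

fpr_toy, tpr_toy = roc_points(y_true_toy, y_score_toy)
auc_toy = trapezoid_auc(fpr_toy, tpr_toy)
print(auc_toy)  # 0.75
```

An AUC of 0.5 corresponds to the dashed diagonal in the plots (random guessing), while 1.0 would mean the scores separate the class perfectly.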

In [None]:
#plotting the results:

colours = ['orange', 'gold', 'firebrick', 'lightblue', 'lightgreen', 'pink', 'brown', 'forestgreen', 'gray', 'purple']
plt.figure(figsize=(10,5))

for i in range(len(y_score[0])):

    plt.plot(fpr[i], tpr[i], color = colours[i], label=f'{actual_labels[i]} (AUC: {roc_auc[i]})')
    
plt.plot([0, 1], [0, 1], '--', c='k')
plt.xlabel('False Positive Rate', fontsize = 12)
plt.ylabel('True Positive Rate', fontsize = 12)
plt.grid()
plt.legend(fontsize=12)
plt.show()

### 2. Train an `SGDClassifier` regression model on the dataset and predict all the test outcomes with the `sklearn` API. 

* Select an appropriate loss for this task, explain what this means.
* Time is precious, run the classifier in parallel on many jobs.
* Plot the ROC curves and AUC scores on the same figure for the test set.
* Calculate the accuracy of the classifier.
* Describe the above model with your own words, how is it different than the logistic regression model?

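I chose the modified Huber loss (`loss="modified_huber"`). It is a smoothed, classification-oriented relative of the Huber loss known from robust regression: it penalizes points near the decision boundary quadratically, but badly misclassified points only linearly, which makes it less sensitive to outliers than a squared loss. It proved to be quite effective for this problem.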
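
For reference, the modified Huber loss can be written down in a few lines. The sketch below assumes labels encoded as -1/+1 and a decision value `f`, and compares the loss with the plain hinge loss on a few hand-picked margins; the function names are hypothetical and this is not `sklearn`'s internal code.

```python
import numpy as np

def modified_huber(y, f):
    """Modified Huber loss for labels y in {-1, +1} and decision values f."""
    z = y * f  # margin: positive when the sample is on the correct side
    return np.where(z >= -1.0, np.maximum(0.0, 1.0 - z) ** 2, -4.0 * z)

def hinge(y, f):
    """Plain hinge loss, shown for comparison."""
    return np.maximum(0.0, 1.0 - y * f)

# Decision values for a positive sample, from badly wrong to confidently right
margins = np.array([-3.0, -1.0, 0.0, 1.0, 2.0])

mh_values = modified_huber(1.0, margins)   # 12, 4, 1, 0, 0
hinge_values = hinge(1.0, margins)         # 4, 2, 1, 0, 0
```

For margins below -1 the loss grows only linearly (`-4z`), which is why a few badly misclassified images influence the fit less than they would under a purely quadratic penalty.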

In [None]:
#creating the SGD model

sgd = SGDClassifier(loss="modified_huber", max_iter=800, n_jobs=-1)

sgd.fit(x_train_scaled, y_train.T[0])

#predicting the data

predicted_test_sgd = sgd.predict(x_test_scaled)
confusion_matrix_values_sgd = confusion_matrix(y_test.T[0], predicted_test_sgd)

In [None]:
#plotting the confusion matrix

plt.subplots(figsize=(10,7))

p1 = sns.heatmap(confusion_matrix_values_sgd, xticklabels=actual_labels, yticklabels=actual_labels, annot=True, fmt="d", cmap='Blues')

plt.ylabel('Real values')
plt.xlabel('Predicted values')
plt.title('Heatmap of the real and predicted values', fontsize=15)
plt.show()

In [None]:
#calculating the accuracy

acc_sgd = accuracy_score(y_test.T[0], predicted_test_sgd)

print('The accuracy of the model is:', round(acc_sgd,4))

In [None]:
#creating the one vs rest classifier model

sgd2 = OneVsRestClassifier(SGDClassifier(loss="modified_huber", max_iter=800, n_jobs=-1))

y_score_sgd = sgd2.fit(x_train_scaled, y_train_oh).decision_function(x_test_scaled)

In [None]:
# Compute ROC curve and ROC area for each class

fpr_sgd = dict()
tpr_sgd = dict()
roc_auc_sgd = []
for i in range(len(y_score_sgd[0])):
    fpr_sgd[i], tpr_sgd[i], _ = roc_curve(y_test_oh[:, i], y_score_sgd[:, i])
for i in range(len(y_score_sgd[0])):
    roc_auc_sgd.append(round(auc_score(fpr_sgd[i], tpr_sgd[i]),3))

In [None]:
#plotting the results:

colours = ['orange', 'gold', 'firebrick', 'lightblue', 'lightgreen', 'pink', 'brown', 'forestgreen', 'gray', 'purple']
plt.figure(figsize=(10,5))

for i in range(len(y_score_sgd[0])):

    plt.plot(fpr_sgd[i], tpr_sgd[i], color = colours[i], label=f'{actual_labels[i]} (AUC: {roc_auc_sgd[i]})')
    
plt.plot([0, 1], [0, 1], '--', c='k')
plt.xlabel('False Positive Rate', fontsize = 12)
plt.ylabel('True Positive Rate', fontsize = 12)
plt.grid()
plt.legend(fontsize=12)
plt.show()

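`SGDClassifier` is a linear classifier whose weights are fitted with Stochastic Gradient Descent (SGD): the parameters are updated after each training sample instead of after a full pass over the data. SGD itself is only the optimization method; which model gets trained is determined by the loss. With a logistic loss it would fit a logistic regression, while with `modified_huber`, as used here, it fits a different linear classifier. `LogisticRegression`, by contrast, always minimizes the logistic loss and by default uses the `lbfgs` solver.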
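
To illustrate the difference between the optimizer and the model, the following sketch trains a binary logistic regression with plain per-sample SGD in NumPy. It is a simplified stand-in (fixed learning rate, no regularization, toy data), not what `SGDClassifier` does internally; `sgd_logistic` and all numbers are made up for the example.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.1, epochs=50, seed=0):
    """Binary logistic regression (labels 0/1) trained one sample at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):      # visit samples in random order
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))
            grad = p - y[i]                    # derivative of the log loss w.r.t. the score
            w -= lr * grad * X[i]
            b -= lr * grad
    return w, b

# Toy data: two well-separated 2D clusters
rng = np.random.default_rng(1)
X_toy = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
                   rng.normal(2.0, 0.5, size=(20, 2))])
y_toy = np.array([0] * 20 + [1] * 20)

w, b = sgd_logistic(X_toy, y_toy)
pred = (X_toy @ w + b > 0).astype(int)
train_acc = (pred == y_toy).mean()
```

Swapping the `grad` line for the gradient of another loss (for example the modified Huber loss) would change the model while leaving the SGD loop untouched.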

### 3. Train a RandomForest classifier

* Plot the ROC curve with AUC scores on the test set.
* Calculate accuracy of the classifier on the test set.

In [None]:
#creating the model

rf = RandomForestClassifier(random_state = 42)
rf.fit(x_train_scaled, y_train.T[0])

#predicting the data

predicted_test_rf = rf.predict(x_test_scaled)
confusion_matrix_values_rf = confusion_matrix(y_test.T[0], predicted_test_rf)

In [None]:
#plotting the confusion matrix

plt.subplots(figsize=(10,7))

p1 = sns.heatmap(confusion_matrix_values_rf, xticklabels=actual_labels, yticklabels=actual_labels, annot=True, fmt="d", cmap='Blues')

plt.ylabel('Real values')
plt.xlabel('Predicted values')
plt.title('Heatmap of the real and predicted values', fontsize=15)
plt.show()

In [None]:
#calculating the accuracy

acc_rf = accuracy_score(y_test.T[0], predicted_test_rf)

print('The accuracy of the model is:', round(acc_rf,4))

In [None]:
#creating the one vs rest classifier model

rf2 = OneVsRestClassifier(RandomForestClassifier(random_state = 0, n_jobs=-1))

rf2.fit(x_train_scaled, y_train_oh)

y_score_rf2 = rf2.predict_proba(x_test_scaled)

In [None]:
# Compute ROC curve and ROC area for each class

fpr_rf = dict()
tpr_rf = dict()
roc_auc_rf = []
for i in range(len(y_score_rf2[0])):
    fpr_rf[i], tpr_rf[i], _ = roc_curve(y_test_oh[:, i], y_score_rf2[:, i])
for i in range(len(y_score_rf2[0])):
    roc_auc_rf.append(round(auc_score(fpr_rf[i], tpr_rf[i]),3))

In [None]:
#plotting the results:

colours = ['orange', 'gold', 'firebrick', 'lightblue', 'lightgreen', 'pink', 'brown', 'forestgreen', 'gray', 'purple']
plt.figure(figsize=(10,5))

for i in range(len(y_score_rf2[0])):

    plt.plot(fpr_rf[i], tpr_rf[i], color = colours[i], label=f'{actual_labels[i]} (AUC: {roc_auc_rf[i]})')
    
plt.plot([0, 1], [0, 1], '--', c='k')
plt.xlabel('False Positive Rate', fontsize = 12)
plt.ylabel('True Positive Rate', fontsize = 12)
plt.grid()
plt.legend(fontsize=12)
plt.show()

### 4. Train a multi-layer perceptron classifier

* use the `MLPClassifier` from `sklearn`
* Set its parameter to `max_iter = 30` or if you have time, set it for at least `100`. After `30` iterations the model does not converge but gives reasonable predictions (with default parameters).
* Plot the ROC curves with AUC scores for the test set.
* Calculate the accuracy of the model on the test set.

In [None]:
#creating the model and fitting the data

mlpc = MLPClassifier(random_state=1, max_iter=30).fit(x_train_scaled, y_train.T[0])

#predicting the results

y_test_predicted_mlpc = mlpc.predict(x_test_scaled)
confusion_matrix_values_mlpc = confusion_matrix(y_test.T[0], y_test_predicted_mlpc)

In [None]:
#plotting the confusion matrix

plt.subplots(figsize=(10,7))

p1 = sns.heatmap(confusion_matrix_values_mlpc, xticklabels=actual_labels, yticklabels=actual_labels, annot=True, fmt="d", cmap='Blues')

plt.ylabel('Real values')
plt.xlabel('Predicted values')
plt.title('Heatmap of the real and predicted values', fontsize=15)
plt.show()

In [None]:
#calculating the accuracy

acc_mlpc = accuracy_score(y_test.T[0], y_test_predicted_mlpc)

print('The accuracy of the model:', round(acc_mlpc,4))

In [None]:
#creating the one vs rest classifier model

mlpc2 = OneVsRestClassifier(MLPClassifier(random_state=1, max_iter=30))

mlpc2.fit(x_train_scaled, y_train_oh)

y_score_mlpc = mlpc2.predict_proba(x_test_scaled)

In [None]:
# Compute ROC curve and ROC area for each class

fpr_mlpc = dict()
tpr_mlpc = dict()
roc_auc_mlpc = []
for i in range(len(y_score_mlpc[0])):
    fpr_mlpc[i], tpr_mlpc[i], _ = roc_curve(y_test_oh[:, i], y_score_mlpc[:, i])
for i in range(len(y_score_mlpc[0])):
    roc_auc_mlpc.append(round(auc_score(fpr_mlpc[i], tpr_mlpc[i]),3))

In [None]:
#plotting the results:

colours = ['orange', 'gold', 'firebrick', 'lightblue', 'lightgreen', 'pink', 'brown', 'forestgreen', 'gray', 'purple']
plt.figure(figsize=(10,5))

for i in range(len(y_score_mlpc[0])):

    plt.plot(fpr_mlpc[i], tpr_mlpc[i], color = colours[i], label=f'{actual_labels[i]} (AUC: {roc_auc_mlpc[i]})')
    
plt.plot([0, 1], [0, 1], '--', c='k')
plt.xlabel('False Positive Rate', fontsize = 12)
plt.ylabel('True Positive Rate', fontsize = 12)
plt.grid()
plt.legend(fontsize=12)
plt.show()

### 5. Train a ResNet50 CNN model on the dataset, utilize ImageNet pre-trained weights and fine-tune for at least 3 epochs:

* training for 3 epochs should be enough to prove that this model is superior compared to others, train longer to explore the possibilities of the model

Convert the dataset:

```python
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(32)

test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(32)
```

Hints:

* loading a pretrained model and letting its parameters be tunable

```python
backbone = tf.keras.applications.YOUR_MODEL_OF_CHOICE # set include_top = False to get rid of the dense layers
backbone.trainable = True # set if you want to fine-tune the pretrained weights too, otherwise set to False
```

* defining your custom model with the pretrained backbone

```python
# YOUR_MODEL_OF_CHOICE here is ResNet50 as per the task description.

# Functional TensorFlow API
def my_own_model():
  inp = tf.keras.layers.Input(shape=(32, 32, 3))
  x = tf.keras.applications.YOUR_MODEL_OF_CHOICE.preprocess_input(inp)

  x = backbone(x)
  # Here come some more layers
  # and flattening where needed!
  out = # layer outputting the specified number of classes
        # with or without a softmax activation, later on
        # the choice of the loss depends on this
  model = tf.keras.models.Model(inputs=inp, outputs=out)
  return model
```

In [None]:
#creating the preprocessing function for the images

def imagenet_convert(img):
    img  = img.astype(float)[...,::-1] # RGB --> BGR
    img -= [103.939, 116.779, 123.68]
    return img
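
To see what this conversion does, it can be applied to a single pixel. The sketch below repeats the function so that it runs on its own; the pixel value is made up. As far as I know, the three constants are the per-channel ImageNet means in BGR order, which is the same "caffe-style" preprocessing that `keras.applications.resnet50.preprocess_input` applies.

```python
import numpy as np

def imagenet_convert(img):
    img = img.astype(float)[..., ::-1]   # RGB --> BGR (astype makes a copy)
    img -= [103.939, 116.779, 123.68]    # subtract the mean of each BGR channel
    return img

# A single RGB pixel stored as a 1x1 image: R=255, G=0, B=10
pixel = np.array([[[255, 0, 10]]], dtype=np.uint8)
converted = imagenet_convert(pixel)

print(converted[0, 0])  # B, G, R after mean subtraction
```

Because `astype(float)` copies the data, the original `uint8` images stay unchanged, and the result is no longer restricted to the 0-255 range.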

In [None]:
#loading the pre-trained model

model = tensorflow.keras.applications.ResNet50(weights = 'imagenet')

In [None]:
#loading the cifar10 dataset 

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

#converting the images

x_train_imagenet = np.array([imagenet_convert(x) for x in x_train])
x_test_imagenet = np.array([imagenet_convert(x) for x in x_test])

y_train_oh = keras.utils.to_categorical(y_train.T[0])
y_test_oh = keras.utils.to_categorical(y_test.T[0])

In [None]:
#we drop the last layer, since we need a dense layer with 10 outputs for the cifar10 dataset
#(taking the output of the second-to-last layer avoids the private model._layers attribute)

inputs = model.input
output = model.layers[-2].output
output = keras.layers.Dense(10, activation = 'softmax')(output)
model = keras.models.Model(inputs, output)

#we need to freeze the layers, except for the last one

for i in model.layers[:-1]:
    i.trainable = False

#let's compile the model

model.compile(optimizer = keras.optimizers.Adam(learning_rate = 1e-4),loss = 'categorical_crossentropy',metrics = ['accuracy'])

In [None]:
#let's check out the parameters!

model.summary()

As the summary shows, we froze all the layers except for the last one. This means that all of their parameters are non-trainable and only 20,600 are trainable. Let's fit the model and check out the results!

In [None]:
#training the model

history = model.fit(x_train_imagenet, y_train_oh, batch_size = 32, epochs = 3, validation_data = (x_test_imagenet, y_test_oh), shuffle = True)

As can be seen, the validation accuracy is around 60-77% (it depends on the run). This is roughly 10-20 percentage points higher than even the best of the `sklearn` models above. Not only is this a significant improvement in accuracy, but also in computing time: the `sklearn` models took about 10-60 minutes to train, while the pre-trained network required only approximately 3 minutes. For this problem, the pre-trained CNN therefore proved to be much more efficient.