## Lab 2
**Team 1** 



Members: Kevin Chenier, Jeremie Bellegarde, Sebastien René


## Introduction

 For this lab, we must choose 8 classes to build a set of images that can be recognized using the concepts we learned in class.
 To do so, we use Jupyter Notebook to clearly separate each of the steps we need to complete.
 We first build a neural network ourselves, then use existing models to compare our results. This lets us draw conclusions about the strengths and weaknesses of the different models.
 

## Importing the required libraries

The libraries we need for image processing are the following:


In [26]:
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
import cv2
import numpy as np
import imutils
import random
import os
import pickle
from tqdm.notebook import tqdm
from glob import glob
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import preprocessing
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, mean_squared_error, f1_score
from sklearn.neural_network import MLPClassifier

## Data augmentation

Here is the code used for data augmentation. It creates new images from the existing ones by applying operations such as rotation. The goal is to have more images in our sample.

In [None]:
import Augmentor

def dataset_augmentation(path, nb_imgs):
    # Path to the image dataset
    p = Augmentor.Pipeline(str(path), output_directory=".")

    # Operations to be performed on the images (each applied with the given probability):
    p.rotate90(probability=0.2)
    p.rotate270(probability=0.2)
    p.flip_left_right(probability=0.30)
    p.flip_top_bottom(probability=0.30)
    p.skew_tilt(probability=0.25, magnitude=0.1)
    p.random_distortion(probability=1, grid_width=2, grid_height=2, magnitude=4)

    # Specifying the number of images to generate
    p.sample(nb_imgs)

    # Use an f-string: print({path}, ...) would print Python sets
    print(f"{path}: augmentation done for {nb_imgs} more images.")
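
To make the idea concrete, here is a minimal NumPy-only sketch of what rotations and flips do to an image array. It is an illustration under our own assumptions, not the Augmentor implementation: `toy_augment` and the 2x3 `image` are hypothetical names introduced for this example, and the rotation directions are those of `np.rot90`.

```python
import numpy as np

def toy_augment(image):
    """Return simple variants of a 2-D image array (hypothetical helper)."""
    return {
        "rotate90": np.rot90(image, k=-1),   # 90 degrees clockwise
        "rotate270": np.rot90(image, k=1),   # 270 degrees clockwise
        "flip_left_right": np.fliplr(image),
        "flip_top_bottom": np.flipud(image),
    }

# Toy 2x3 "image" so the effect of each operation is easy to read
image = np.array([[1, 2, 3],
                  [4, 5, 6]])
variants = toy_augment(image)
for name, variant in variants.items():
    print(name, variant.tolist())
```

Each variant keeps the same pixel values in a different arrangement, which is why the class label of an augmented image stays the same as that of its source image.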

## Dataset choice for set B

Here is our choice of images for set B. This set consists of 8 symbols:

In [27]:
# Define the base data directory path
data_dir = Path.cwd() / "EnsembleB_H2020"

# A list of string with all the categories/labels in your database, i.e., each class subfolder name
CLASSES = [
    {
        "LABEL": "Cercle2", 
        "GROUP": 0,
        "PATH": os.path.join(data_dir, 'Cercles', 'Cercle2')
    },
    {
        "LABEL": "Cercle3", 
        "GROUP": 1,
        "PATH": os.path.join(data_dir, 'Cercles', 'Cercle3')
    },
    {
        "LABEL": "Diamant2",
        "GROUP": 2,
        "PATH": os.path.join(data_dir, 'Diamants', 'Diamant2')
    },
    {
        "LABEL":"Diamant3", 
        "GROUP": 3,
        "PATH": os.path.join(data_dir, 'Diamants', 'Diamant3')
    },
    {
        "LABEL":"Hexagone2", 
        "GROUP": 4,
        "PATH": os.path.join(data_dir, 'Hexagones', 'Hexagone2')
    },
    {
        "LABEL":"Hexagone3", 
        "GROUP": 5,
        "PATH": os.path.join(data_dir, 'Hexagones', 'Hexagone3')
    },
    {
        "LABEL":"Triangle2", 
        "GROUP": 6,
        "PATH": os.path.join(data_dir, 'Triangles', 'Triangle2')
    },
    {
        "LABEL":"Triangle3", 
        "GROUP": 7,
        "PATH": os.path.join(data_dir, 'Triangles', 'Triangle3')
    }
]

## Image preprocessing

We start by resizing the images as required by the assignment, then convert them to grayscale and blur them. According to our tests, the grayscale and blur filters were the operations that improved our results the most. Each image is then thresholded, its contours are approximated by polygons, and the image is summarized by a 16-entry feature vector that counts the contours by number of vertices.

In [28]:
imageSize = (160, 160)

dataSet = []

# Read all the files and append to dataset
for CLASS in CLASSES:
    print(f"=> Reading files from class {CLASS['LABEL']}")
    for image in tqdm(glob(os.path.join(CLASS["PATH"], '*'))):
        # Read the image in grayscale
        gray = cv2.imread(image, cv2.IMREAD_GRAYSCALE)
        # Resize the image
        gray = cv2.resize(gray, imageSize)
        
        # Calculate contours
        blurred = cv2.GaussianBlur(gray, (5,5), 0)
        thresh = cv2.threshold(blurred, 150, 255, cv2.THRESH_BINARY)[1]
        contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        
        polygons = []
        
        # Save the polygons
        for contour in contours:
            perimeter = cv2.arcLength(contour, True)
            approximation = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
            polygons.append(len(approximation))
        
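        feature = [0] * 16
        
        # Create the feature vector from polygons: feature[k] counts the contours
        # approximated by k vertices (15 or more share the last bin, which avoids
        # an IndexError on very complex contours)
        for polygon in polygons:
            feature[min(polygon, 15)] += 1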
        
        # Create a data variable to add to dataSet
        data = {
            'image': gray,
            'label': CLASS['GROUP'],
            'polygons': polygons,
            'feature': feature
        }
        
        # Append data to dataSet
        dataSet.append(data)   

=> Reading files from class Cercle2
[progress bar: 184 files]
=> Reading files from class Cercle3
[progress bar: 175 files]
=> Reading files from class Diamant2
[progress bar: 130 files]
=> Reading files from class Diamant3
[progress bar: 145 files]
=> Reading files from class Hexagone2
[progress bar: 147 files]
=> Reading files from class Hexagone3
[progress bar: 129 files]
=> Reading files from class Triangle2
[progress bar: 164 files]
=> Reading files from class Triangle3
[progress bar: 128 files]
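
The feature vector built in the preprocessing loop is a histogram of vertex counts. A minimal pure-Python sketch, assuming a hypothetical helper `vertex_histogram` and made-up vertex counts (no OpenCV involved), shows the mapping:

```python
def vertex_histogram(polygons, n_bins=16):
    """Count how many contours have k vertices, for k = 0 .. n_bins - 1.

    Hypothetical helper: counts of n_bins - 1 or more share the last bin.
    """
    feature = [0] * n_bins
    for n_vertices in polygons:
        feature[min(n_vertices, n_bins - 1)] += 1
    return feature

# Made-up example: two triangles, one quadrilateral and one 8-vertex contour
polygons = [3, 4, 3, 8]
feature = vertex_histogram(polygons)
print(feature)
```

Two images with the same mix of shapes get the same vector regardless of where the shapes are located, which is what makes this representation compact but also lossy.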




In [4]:
# Shuffle pictures
random.shuffle(dataSet)

# Create X (feature vectors) and y (labels) from dataSet
X = np.array([data['feature'] for data in dataSet])
y = np.array([data['label'] for data in dataSet])

# Save the features and labels to disk so they can be reloaded later
with open("X.pickle", "wb") as pickle_out:
    pickle.dump(X, pickle_out)

with open("y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)
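
The saved files can be read back with `pickle.load`. The sketch below is a self-contained round trip on a temporary file with toy data; the file name `X_demo.pickle` and the values are assumptions made for illustration.

```python
import os
import pickle
import tempfile

X_demo = [[0, 0, 0, 2, 1], [0, 0, 0, 0, 3]]  # toy feature vectors

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "X_demo.pickle")

    # Write the object to disk
    with open(file_path, "wb") as pickle_out:
        pickle.dump(X_demo, pickle_out)

    # Read it back
    with open(file_path, "rb") as pickle_in:
        X_loaded = pickle.load(pickle_in)

print(X_loaded == X_demo)
```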

## Machine learning training

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=84)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=84)
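
Applying a 20% split twice leaves roughly 64% of the data for training, 16% for validation and 20% for testing. The short computation below shows the arithmetic; exact sample counts depend on how `train_test_split` rounds, so only fractions are shown.

```python
test_fraction = 0.2   # first split: test_size=0.2
val_fraction = 0.2    # second split: test_size=0.2 of the remaining data

remaining = 1.0 - test_fraction                # share left after removing the test set
val_share = remaining * val_fraction           # share of the full dataset used for validation
train_share = remaining * (1 - val_fraction)   # share of the full dataset used for training

print(f"train={train_share:.2f}, val={val_share:.2f}, test={test_fraction:.2f}")
```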

In [6]:
scaler = MinMaxScaler(feature_range=(0,1))
# Fit the scaler on the training set only, then reuse it for the other sets
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)
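
Min-max scaling maps each feature to [0, 1] using a minimum and maximum learned from data. The NumPy sketch below mimics that behaviour by hand (the helper names `fit_min_max` and `apply_min_max` are hypothetical, and the data is made up) to show what it means to learn the statistics on the training set and reuse them on another set:

```python
import numpy as np

def fit_min_max(X):
    """Learn the per-feature minimum and range from the training data."""
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    x_range[x_range == 0] = 1.0  # avoid dividing by zero for constant features
    return x_min, x_range

def apply_min_max(X, x_min, x_range):
    """Scale X with statistics learned elsewhere (no refitting)."""
    return (X - x_min) / x_range

X_train_demo = np.array([[0.0, 2.0], [4.0, 6.0], [2.0, 4.0]])
X_test_demo = np.array([[1.0, 8.0]])

x_min, x_range = fit_min_max(X_train_demo)
train_scaled = apply_min_max(X_train_demo, x_min, x_range)
test_scaled = apply_min_max(X_test_demo, x_min, x_range)  # may fall outside [0, 1]

print(train_scaled)
print(test_scaled)
```

A test value outside [0, 1] is not an error: it only means the sample lies outside the range seen during training, and all sets remain on the same scale.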

In [7]:
X_train = X_train.flatten()
X_test = X_test.flatten()
X_val = X_val.flatten()

In [8]:
X_train = X_train.reshape(len(y_train), len(X[0]))
X_test = X_test.reshape(len(y_test),len(X[0]))
X_val = X_val.reshape(len(y_val), len(X[0]))

## Neural network

Here is our neural network model, based on the theory seen in class. Once the model was in place, we tried different hyperparameters, such as a higher or lower learning rate or number of iterations (epochs). We kept the hyperparameters that seemed to give us the most success.

In [9]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    o = sigmoid(z)
    return o*(1-o)
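
The identity used by `sigmoid_prime`, namely `sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))`, can be checked numerically. The sketch below redefines both functions so that it runs on its own and compares the formula with a central finite difference; the step `h` and the test points are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    o = sigmoid(z)
    return o * (1 - o)

z = np.array([-2.0, 0.0, 3.0])
h = 1e-5

# Central finite difference approximation of the derivative
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_prime(z)

print(analytic)
print(np.allclose(numeric, analytic, atol=1e-8))
```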

In [40]:
class NeuralNetwork:

    def __init__(self, n_classes, n_features, n_hidden_units=15, epochs=300,
                 learning_rate=0.01, n_batches=1):
        self.n_classes = n_classes
        self.n_features = n_features
        self.n_hidden_units = n_hidden_units
        self.n_batches = n_batches
        self.w1, self.w2 = self.weights()
        self.learning_rate = learning_rate
        self.epochs = epochs
        
    def error(self, y, output):
        error = -np.sum(y*np.log(output))
        return 0.5 * np.mean(error)

    def weights(self):
        w1 = np.random.rand(self.n_hidden_units, self.n_features)
        w2 = np.random.rand(self.n_classes, self.n_hidden_units)
        return w1, w2

    def forward_step(self, X):
        z1 = np.dot(self.w1,X.T)
        hidden_output = sigmoid(z1)
        z2 = np.dot(self.w2,hidden_output)
        final_output = sigmoid(z2)
        return z1, hidden_output, z2, final_output

    def backward_step(self, X, z1, hidden_output, final_output, y):
        output_error = final_output - y
        output_delta = self.w2.T.dot(output_error) * sigmoid_prime(z1)
        grad1 = output_delta.dot(X)
        grad2 = output_error.dot(hidden_output.T)
        return grad1, grad2

    def backprop_step(self, X, y):
        z1, hidden_output, z2, final_output = self.forward_step(X)
        y = y.T
        grad1, grad2 = self.backward_step(X, z1, hidden_output, final_output, y)
        error = self.error(y, final_output)
        return error, grad1, grad2

    def fit(self, X, y):
        self.error_ = []
        lb = preprocessing.LabelBinarizer()
        y = lb.fit_transform(y)

        X_batches = np.array_split(X, self.n_batches)
        y_batches = np.array_split(y, self.n_batches)

        for i in range(self.epochs):
            epoch_errors = []
            for Xi, yi in zip(X_batches, y_batches):
                error, grad1, grad2 = self.backprop_step(Xi, yi)
                epoch_errors.append(error)
                self.w1 -= (self.learning_rate * grad1)
                self.w2 -= (self.learning_rate * grad2)
            self.error_.append(np.mean(epoch_errors))
            print(i, np.mean(epoch_errors))
        return self

    def predict(self, X):
        # The class with the largest pre-activation z2 also has the largest sigmoid output
        z1, hidden_output, z2, final_output = self.forward_step(X)
        return np.argmax(z2.T, 1)

    def score(self, X, y):
        y_hat = self.predict(X)
        return np.sum(y == y_hat, axis=0) / float(X.shape[0])
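
The forward and backward passes work on transposed data, which is easy to lose track of. This sketch traces the array shapes for a toy batch, assuming 16 features, 50 hidden units and 8 classes; the batch of 4 random samples, the labels and the seed are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden, n_classes = 4, 16, 50, 8

X_batch = rng.random((n_samples, n_features))
y_batch = np.array([0, 3, 7, 3])
y_one_hot = np.eye(n_classes)[y_batch]       # (4, 8): one row per sample

w1 = rng.random((n_hidden, n_features))      # (50, 16)
w2 = rng.random((n_classes, n_hidden))       # (8, 50)

# Forward pass: samples end up along the columns
z1 = w1 @ X_batch.T                          # (50, 4)
hidden_output = sigmoid(z1)                  # (50, 4)
z2 = w2 @ hidden_output                      # (8, 4)
final_output = sigmoid(z2)                   # (8, 4)

# Backward pass: the one-hot labels must be transposed to match final_output
output_error = final_output - y_one_hot.T    # (8, 4)
grad2 = output_error @ hidden_output.T       # (8, 50), same shape as w2
output_delta = (w2.T @ output_error) * hidden_output * (1 - hidden_output)  # (50, 4)
grad1 = output_delta @ X_batch               # (50, 16), same shape as w1

print(z1.shape, z2.shape, grad1.shape, grad2.shape)
```

Each gradient has the same shape as the weight matrix it updates, which is a quick sanity check when modifying the network.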

In [None]:
nn = NeuralNetwork(
    n_classes=8, 
    n_features=len(X[0]),
    n_hidden_units=50,
    epochs=3000,
    learning_rate=0.01,
    n_batches=30,
).fit(X_train, y_train);

0 23.538040882964893
1 27.3239590186413
2 27.164086963279107
3 27.00770483419973
4 26.85164402106612
5 26.693186846938975
6 26.5300037421995
7 26.36013302314223
8 26.18199499638753
9 25.99443277123421
10 25.796770822940115
11 25.588879568641186
12 25.37123088325509
13 25.14492697118385
14 24.911685092336608
15 24.673764987289974
16 24.433835135250778
17 24.194786973824375
18 23.95951948384248
19 23.73072545989361
20 23.510711624947287
21 23.301276807016897
22 23.103658600393693
23 22.918544340615604
24 22.74613143932929
25 22.586217370514888
26 22.438300416624074
27 22.30167659742598
28 22.175523692281892
29 22.05896824531043
30 21.95113510429928
31 21.851181266889437
32 21.758316844780648
33 21.671816180250005
34 21.591021901470352
35 21.51534423906886
36 21.44425740946385
37 21.377294390782456
38 21.314041014473183
39 21.254129979753568
40 21.197235162593174
41 21.14306642314829
42 21.091365001097916
43 21.04189951398537
44 20.994462528423973
45 20.948867649195424
46 20.9049470602806

439 16.096107470860844
440 16.089515908474205
441 16.082945023258464
442 16.076394756215965
443 16.069865046647603
444 16.063355832196752
445 16.05686704889282
446 16.050398631194458
447 16.04395051203233
448 16.037522622851522
449 16.031114893653484
450 16.024727253037522
451 16.018359628241853
452 16.012011945184145
453 16.005684128501652
454 15.999376101590723
455 15.993087786645914
456 15.986819104698574
457 15.980569975654792
458 15.974340318332983
459 15.968130050500804
460 15.961939088911581
461 15.95576734934019
462 15.949614746618373
463 15.943481194669497
464 15.937366606542799
465 15.931270894447007
466 15.925193969783477
467 15.919135743178709
468 15.913096124516354
469 15.907075022968636
470 15.901072347027245
471 15.895088004533656
472 15.889121902708933
473 15.88317394818292
474 15.877244047022995
475 15.871332104762166
476 15.86543802642675
477 15.859561716563435
478 15.85370307926585
479 15.847862018200631
480 15.842038436632947
481 15.836232237451537
482 15.8304433231

856 14.247989647757779
857 14.244408352719457
858 14.240828480904332
859 14.237250026149615
860 14.233672982320222
861 14.230097343308541
862 14.226523103034209
863 14.222950255443875
864 14.219378794511018
865 14.215808714235745
866 14.212240008644605
867 14.208672671790405
868 14.205106697752074
869 14.201542080634473
870 14.1979788145683
871 14.194416893709915
872 14.190856312241237
873 14.187297064369641
874 14.183739144327847
875 14.180182546373821
876 14.176627264790717
877 14.173073293886794
878 14.169520627995349
879 14.165969261474675
880 14.162419188708013
881 14.158870404103533
882 14.155322902094314
883 14.151776677138304
884 14.148231723718341
885 14.144688036342178
886 14.141145609542441
887 14.137604437876734
888 14.134064515927596
889 14.130525838302598
890 14.126988399634376
891 14.123452194580691
892 14.119917217824524
893 14.116383464074117
894 14.11285092806309
895 14.109319604550535
896 14.105789488321108
897 14.102260574185172
898 14.098732856978886
899 14.0952063

1221 13.012500421231417
1222 13.009341402708515
1223 13.0061841248936
1224 13.00302859122553
1225 12.999874805100296
1226 12.996722769871168
1227 12.993572488848857
1228 12.990423965301677
1229 12.98727720245568
1230 12.984132203494868
1231 12.980988971561365
1232 12.977847509755609
1233 12.97470782113653
1234 12.971569908721774
1235 12.968433775487915
1236 12.96529942437062
1237 12.962166858264935
1238 12.959036080025449
1239 12.955907092466559
1240 12.952779898362682
1241 12.949654500448478
1242 12.946530901419104
1243 12.943409103930444
1244 12.940289110599348
1245 12.937170924003889
1246 12.93405454668357
1247 12.93093998113962
1248 12.927827229835183
1249 12.924716295195628
1250 12.921607179608738
1251 12.91849988542499
1252 12.915394414957795
1253 12.912290770483732
1254 12.909188954242815
1255 12.906088968438727
1256 12.902990815239052
1257 12.899894496775529
1258 12.8968000151443
1259 12.893707372406134
1260 12.89061657058666
1261 12.887527611676612
1262 12.88444049763206

1580 11.99484156571019
1581 11.992298327130724
1582 11.98975632395576
1583 11.98721555213234
1584 11.984676007605266
1585 11.982137686317168
1586 11.979600584208546
1587 11.977064697217834
1588 11.974530021281492
1589 11.971996552334028
1590 11.969464286308074
1591 11.966933219134441
1592 11.964403346742191
1593 11.961874665058671
1594 11.959347170009591
1595 11.956820857519066
1596 11.954295723509665
1597 11.951771763902496
1598 11.949248974617214
1599 11.946727351572122
1600 11.944206890684171
1601 11.941687587869067
1602 11.939169439041269
1603 11.936652440114068
1604 11.934136586999639
1605 11.93162187560906
1606 11.929108301852391
1607 11.92659586163871
1608 11.924084550876135
1609 11.92157436547191
1610 11.919065301332417
1611 11.916557354363228
1612 11.914050520469157
1613 11.911544795554274
1614 11.909040175521984
1615 11.906536656275039
1616 11.904034233715572
1617 11.901532903745172
1618 11.899032662264883
1619 11.89653350517525
1620 11.894035428376366
1621 11.891538427767905

1946 11.11337031220195
1947 11.111006983309235
1948 11.108643441619998
1949 11.106279684310937
1950 11.103915708578304
1951 11.101551511637984
1952 11.099187090725653
1953 11.096822443096867
1954 11.09445756602717
1955 11.092092456812235
1956 11.089727112767939
1957 11.087361531230492
1958 11.084995709556559
1959 11.08262964512333
1960 11.08026333532865
1961 11.07789677759111
1962 11.075529969350153
1963 11.073162908066168
1964 11.070795591220612
1965 11.068428016316053
1966 11.066060180876324
1967 11.063692082446584
1968 11.061323718593412
1969 11.058955086904895
1970 11.056586184990728
1971 11.05421701048229
1972 11.051847561032742
1973 11.049477834317079
1974 11.047107828032251
1975 11.044737539897225
1976 11.042366967653043
1977 11.039996109062944
1978 11.0376249619124
1979 11.035253524009203
1980 11.032881793183543
1981 11.030509767288063
1982 11.028137444197945
1983 11.025764821810961
1984 11.02339189804756
1985 11.021018670850896
1986 11.018645138186919
1987 11.01627129804446

2302 10.253200633133284
2303 10.250767822456732
2304 10.248335458835625
2305 10.245903549514233
2306 10.243472101752513
2307 10.241041122825603
2308 10.238610620023364
2309 10.236180600649885
2310 10.233751072022983
2311 10.231322041473726
2312 10.228893516345924
2313 10.226465503995646
2314 10.224038011790704
2315 10.221611047110146
2316 10.219184617343762
2317 10.216758729891561
2318 10.21433339216326
2319 10.211908611577766
2320 10.209484395562662
2321 10.207060751553664
2322 10.204637686994149
2323 10.202215209334565
2324 10.19979332603195
2325 10.197372044549356
2326 10.194951372355378
2327 10.192531316923567
2328 10.190111885731914
2329 10.187693086262303
2330 10.185274925999988
2331 10.182857412433043
2332 10.180440553051817
2333 10.178024355348384
2334 10.17560882681602
2335 10.173193974948635
2336 10.170779807240235
2337 10.168366331184384
2338 10.16595355427363
2339 10.16354148399898
2340 10.16113012784935
2341 10.158719493310995
2342 10.156309587866966
2343 10.15390041899658

In [None]:
print('Train Accuracy: %.2f%%' % (nn.score(X_train, y_train) * 100))

In [None]:
def plot_error(model):
    plt.plot(range(len(model.error_)), model.error_)
    plt.xlabel('Epochs')
    plt.ylabel('Errors')
    plt.show()
    
plot_error(nn)

In [None]:
y_train_prediction = nn.predict(X_train)
y_test_prediction = nn.predict(X_test)
y_prediction = nn.predict(X_val)

### Confusion matrix for the "train" set

In [None]:
print(confusion_matrix(y_train,y_train_prediction))
print(classification_report(y_train,y_train_prediction))

### Confusion matrix for the "test" set

In [None]:
print(confusion_matrix(y_test,y_test_prediction))
print(classification_report(y_test,y_test_prediction))

### Confusion matrix for the "val" set

In [None]:
print(confusion_matrix(y_val,y_prediction))
print(classification_report(y_val,y_prediction))
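
To read these reports, recall that entry (i, j) of the confusion matrix counts the samples of true class i predicted as class j, following scikit-learn's convention. A minimal NumPy sketch with made-up labels (three classes instead of our eight; `toy_confusion_matrix` is a hypothetical helper) computes the matrix and the per-class recall:

```python
import numpy as np

def toy_confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

matrix = toy_confusion_matrix(y_true, y_pred, n_classes=3)
recall = matrix.diagonal() / matrix.sum(axis=1)  # correct / total, per true class

print(matrix)
print(recall)
```

Off-diagonal entries show which pairs of classes are confused with each other, which the overall accuracy alone does not reveal.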

## Support Vector Machine (linear SVM)

In [None]:
svc = SVC(cache_size=1500)
parameter = {'kernel': ['linear'], 'C': [0.001, 0.1, 1, 10], 'class_weight': ['balanced'], 'gamma': ['scale']}
classifier = GridSearchCV(svc, param_grid = parameter, cv = 10, n_jobs = 5, scoring = 'accuracy', verbose=4)
classifier.fit(X_train, y_train)
print("LINEAR : The best hyperparameters are %s with a score of %0.2f" % (classifier.best_params_, classifier.best_score_))

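We found that increasing the cache size did not change our results, so we kept it at 1500. This is consistent with the role of `cache_size`, which sets the size of the kernel cache (in MB) and mainly affects training time.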

In [None]:
SVMaccuracy = pd.DataFrame(classifier.cv_results_['mean_test_score'], index = [x['C'] for x  in classifier.cv_results_['params']], columns = ['SVM accuracy'])
print(SVMaccuracy)
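
The grid search varies `C`, which weights margin violations against margin width in the soft-margin objective `0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b)))`. The sketch below evaluates that objective for a made-up binary problem and a fixed hyperplane; it only illustrates the role of `C` and is not how `SVC` trains (the helper `soft_margin_objective` and all values are assumptions).

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective for labels y in {-1, +1}."""
    margins = y * (X @ w + b)
    hinge_losses = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * hinge_losses.sum()

# Made-up 2-D points; the last one lies on the wrong side of the hyperplane
X_demo = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y_demo = np.array([1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0

for C in [0.001, 0.1, 1, 10]:
    print(C, soft_margin_objective(w, b, X_demo, y_demo, C))
```

With a small `C` the misclassified point barely changes the objective, while with a large `C` it dominates, which pushes the optimizer toward fitting every training sample.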

## K-Nearest Neighbor (KNN)

In [None]:
# K-Nearest Neighbor 
def KNN_model (X_train, X_test, y_train, y_test, weights):
    
    KNN_accuracy = []
    KNN_f1 = []
    
    KNNParams = [3,5,10]

    for neighbors in KNNParams:

        model = KNeighborsClassifier(n_neighbors = neighbors, weights = weights)
        KNNmodel = model.fit(X_train, y_train)
        y_prediction = KNNmodel.predict(X_test)
        
        accuracy = accuracy_score(y_test, y_prediction)
        f1score = f1_score(y_test, y_prediction, average = 'weighted') 

        KNN_accuracy.append(accuracy)
        KNN_f1.append(f1score)
        
        # F1 Score
        print("F1 score : KNN")
        print(f1score)
        
        # Accuracy Score
        print("Accuracy score : KNN with k = " + str(neighbors) + " and weights = " + weights)
        print(accuracy)
        
        # Cross-validate with the same weighting scheme as the model above
        clf  = KNeighborsClassifier(n_neighbors = neighbors, weights = weights)
        
        ## KNN - Cross-validation (10 folds on the full dataset X, y)
        print("Cross-validation")
        KNN_scores = cross_val_score(clf , X, y, cv = 10)
        print(KNN_scores.mean())
        print('\n')

In [None]:
KNN_model(X_train, X_test, y_train, y_test, 'uniform')
KNN_model(X_train, X_test, y_train, y_test, 'distance')
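
The two `weights` options differ in how neighbours vote: with 'uniform' each of the k neighbours counts once, with 'distance' each vote is weighted by the inverse of its distance. A minimal NumPy sketch for a single query point (hypothetical helper `knn_vote`, made-up 1-D data, and no handling of zero distances) shows a case where the two disagree:

```python
import numpy as np

def knn_vote(X_train, y_train, query, k, weights):
    """Predict the label of one query point with a k-nearest-neighbour vote."""
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = {}
    for index in nearest:
        # 'uniform': every neighbour counts 1; 'distance': closer neighbours count more
        weight = 1.0 if weights == "uniform" else 1.0 / distances[index]
        label = int(y_train[index])
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

# One very close sample of class 0 and two distant samples of class 1
X_demo = np.array([[0.1], [2.0], [2.5]])
y_demo = np.array([0, 1, 1])
query = np.array([0.0])

uniform_prediction = knn_vote(X_demo, y_demo, query, k=3, weights="uniform")
distance_prediction = knn_vote(X_demo, y_demo, query, k=3, weights="distance")
print(uniform_prediction, distance_prediction)
```

The uniform vote follows the majority among the three neighbours, while the distance-weighted vote follows the single very close neighbour.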

## Neural networks (NN)

In [None]:
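mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
# Train on the scaled training set only, so that the test and validation sets stay unseen
mlp.fit(X_train, y_train)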

In [None]:
y_test_prediction = mlp.predict(X_test)
y_train_prediction = mlp.predict(X_train)
y_prediction = mlp.predict(X_val)

### Confusion matrix for the "train" set

In [None]:
print(confusion_matrix(y_train,y_train_prediction))
print(classification_report(y_train,y_train_prediction))

### Confusion matrix for the "test" set

In [None]:
print(confusion_matrix(y_test,y_test_prediction))
print(classification_report(y_test,y_test_prediction))

## Conclusion and interpretation of the results

In conclusion, we succeeded in building a machine learning model based on the concepts seen in class. We noticed that, depending on the model, the hyperparameters can vary and have different effects. This shows the importance of choosing a good model for the problem at hand, since a poor choice of model can be disastrous. This lab allowed us to apply and deepen the new knowledge acquired in class. It also taught us the importance of choosing models suited to the situation and of making sound decisions about hyperparameters.