#Preprocessing the CK+ dataset
This notebook generates the dataset used to train the networks for facial emotion recognition.

The dataset is augmented with image rotations and then balanced across classes.

The result is a dataset made of 4 files with the **npy** extension:
* **train_dataset** contains just over 1000 images for the **_train_** split,
* **train_labels** contains the **labels** for the train split, already one-hot encoded (each row looks like [0,0,0,1,0,0,0,0], with the 1 marking the class the image belongs to; there is one column per class actually present, so 7 columns when the neutral class is excluded),
* **test_dataset** contains the images for the **_test_** split (63 in the run shown below, which uses a 5% test split),
* **test_labels** same as above, for the test split.
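
As a rough illustration of how the four files fit together, the sketch below loads them and runs two sanity checks. The helper name `load_dataset` and its `folder` argument are hypothetical; the file names match those written by the save cell at the end of this notebook, and the checks assume one-hot label rows.

```python
import os
import numpy as np

def load_dataset(folder):
    """Load the four .npy files from `folder` (hypothetical helper)."""
    names = ["train_dataset", "train_labels", "test_dataset", "test_labels"]
    arrays = {name: np.load(os.path.join(folder, name + ".npy")) for name in names}

    for split in ("train", "test"):
        images = arrays[split + "_dataset"]
        labels = arrays[split + "_labels"]
        # There must be exactly one label row per image
        assert images.shape[0] == labels.shape[0], f"{split}: size mismatch"
        # A one-hot row contains exactly one 1
        assert np.all(labels.sum(axis=1) == 1), f"{split}: rows are not one-hot"
    return arrays
```

Calling `load_dataset("/content/dataset")` after the save cell has run would return a dictionary keyed by file name.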

##Imports and definitions
This section imports the libraries and defines the **constants, classes and functions** used in the rest of the notebook.

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
import os
import matplotlib.pyplot as plt 
import numpy as np 
import cv2
from time import time
import random
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

In [11]:
%cd /content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/Cohn-Kanade\ Database

/content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/Cohn-Kanade Database


In [6]:
path_emotion = "Emotion"
path_images = "cohn-kanade-images"
end_emotions = "_emotion.txt"

###Definition of DataUgumentator
Class containing the data augmentation methods; only rotations are applied when the dataset is created.

In [7]:
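# NOTE: `fill` is called by the shift and zoom methods but is never defined in
# this notebook. This version (resize back to the original size) is an assumption.
def fill(img, h, w):
  return cv2.resize(img, (w, h), interpolation=cv2.INTER_CUBIC)

class DataUgumentator:
  def __init__(self):
    pass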

  def horizontal_flip(self, img, flag = True):
    if flag:
        return cv2.flip(img, 1)
    else:
        return img
  
  def vertical_flip(self, img, flag = True):
    if flag:
        return cv2.flip(img, 0)
    else:
        return img

  def horizontal_shift(self, img, ratio=0.0):
    if ratio > 1 or ratio < 0:
        print('Value should be between 0 and 1')
        return img
    ratio = random.uniform(-ratio, ratio)
    h, w = img.shape[:2]
    to_shift = w*ratio
    # Slice only the first two axes so that grayscale (2D) images work too
    if ratio > 0:
        img = img[:, :int(w-to_shift)]
    if ratio < 0:
        img = img[:, int(-1*to_shift):]
    img = fill(img, h, w)
    return img

  def vertical_shift(self, img, ratio=0.0):
    if ratio > 1 or ratio < 0:
        print('Value should be between 0 and 1')
        return img
    ratio = random.uniform(-ratio, ratio)
    h, w = img.shape[:2]
    to_shift = h*ratio
    # Slice only the first two axes so that grayscale (2D) images work too
    if ratio > 0:
        img = img[:int(h-to_shift), :]
    if ratio < 0:
        img = img[int(-1*to_shift):, :]
    img = fill(img, h, w)
    return img

  def zoom(self, img, value):
    if value > 1 or value < 0:
        print('Value for zoom should be between 0 and 1')
        return img
    value = random.uniform(value, 1)
    h, w = img.shape[:2]
    h_taken = int(value*h)
    w_taken = int(value*w)
    h_start = random.randint(0, h-h_taken)
    w_start = random.randint(0, w-w_taken)
    # Slice only the first two axes so that grayscale (2D) images work too
    img = img[h_start:h_start+h_taken, w_start:w_start+w_taken]
    img = fill(img, h, w)
    return img
  
  def rotation(self, img, angle):
    # angle is in degrees; the actual rotation is drawn uniformly from [-angle, angle]
    angle = int(random.uniform(-angle, angle))
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((int(w/2), int(h/2)), angle, 1)
    img = cv2.warpAffine(img, M, (w, h))
    return img

def timed(func):
  def wrapper(*args, **kwds):
    start = time()
    val = func(*args, **kwds)
    print("Elapsed time: ", time() - start, "s")
    return val
  return wrapper
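
The `timed` decorator above replaces the wrapped function's name and docstring with those of `wrapper`. A minimal variant, offered as a sketch rather than a replacement for the notebook's code, keeps that metadata with `functools.wraps` and measures time with `time.perf_counter`, which is meant for timing short intervals. The names `timed_wraps` and `slow_square` are hypothetical.

```python
import functools
from time import perf_counter

def timed_wraps(func):
    """Print the elapsed time of each call while preserving func's metadata."""
    @functools.wraps(func)
    def wrapper(*args, **kwds):
        start = perf_counter()
        val = func(*args, **kwds)
        print(f"Elapsed time: {perf_counter() - start:.3f} s")
        return val
    return wrapper

@timed_wraps
def slow_square(x):
    """Return x squared (stand-in for an expensive function)."""
    return x * x
```

With this variant, `slow_square.__name__` is still `"slow_square"`, which keeps tracebacks and `help()` readable.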

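###Definition of the preprocess function
Performs the preprocessing of the dataset: for every sequence it reads the emotion label, loads the last three frames (and, if `neutral` is set, the first frame as a neutral example), converts them to grayscale unless `rgb` is set, resizes them to `img_dim` x `img_dim` and adds two rotated copies of each frame.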

In [8]:
@timed
def preprocess(img_dim, list_emotion, rgb=False, neutral=False):
  
  dataset = {
    0 : [], # neutral
    1 : [], # anger
    2 : [], # contempt
    3 : [], # disgust
    4 : [], # fear
    5 : [], # happy
    6 : [], # sadness
    7 : [], # surprise
  }

  for person in list_emotion:
    list_person_emotions = os.listdir(path_emotion + "/" + person)
    for emotion in list_person_emotions:
      file = os.listdir(path_emotion + "/" + person + "/" + emotion)
      if len(file) == 0:
        continue
      else:
        with open(path_emotion + "/" + person + "/" + emotion + "/" + file[0], 'r') as f:
          label = int(f.read().strip().split(".")[0])
          # Sort the frames: os.listdir returns names in arbitrary order
          list_images = sorted(os.listdir(path_images + "/" + person + "/" + emotion ))
          if len(list_images) < 4 :
            print("For path " + path_images + "/" + person + "/" + emotion + " we have fewer than 4 images")
            print("list imgs", list_images)
          else:
            if neutral:
              neutral_img = list_images[0]
              try:
                imn = cv2.imread(path_images + "/" + person + "/" + emotion + "/" + neutral_img)
                if not rgb:
                  imn = cv2.cvtColor(imn, cv2.COLOR_BGR2GRAY)  # cv2.imread returns BGR
                imn = cv2.resize(imn, (img_dim,img_dim))
                dataset[0].append(imn)
              except Exception as e:
                print(e)
                print("for path: " + path_images + "/" + person + "/" + emotion)
                print("img neutral:", neutral_img)
            for img in list_images[-3:]:
              try:
                im = cv2.imread(path_images + "/" + person + "/" + emotion + "/" + img)
                if not rgb:
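                  im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)  # cv2.imread returns BGR
                im = cv2.resize(im, (img_dim,img_dim))
                # Both calls draw a random angle in [-30, 30] degrees (see DataUgumentator.rotation)
                im30 = augument.rotation(im, 30)
                im_30 = augument.rotation(im, -30)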
                dataset[label].append(im)
                dataset[label].append(im30)
                dataset[label].append(im_30)
              except Exception as e:
                print(e)
                print("for path: " + path_images + "/" + person + "/" + emotion)
                print("img list:", list_images)
                print("img:", img)
                if img.startswith("."):
                  print("\n\tDeleting:", img)
                  os.remove(path_images + "/" + person + "/" + emotion + "/" + img)
                  print("\tdone\n")

  for key in dataset.keys():
    dataset[key] = np.array(dataset[key])

  return dataset
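
Two small steps inside `preprocess` are easier to see in isolation: parsing the label file and choosing which frames of a sequence to keep. The sketch below assumes that the label file holds a single number in float notation (for example `3.0000000e+00`), which is what the split on `.` suggests, and that frame names sort in temporal order. The helper names `parse_emotion_label` and `select_frames` are hypothetical.

```python
def parse_emotion_label(text):
    """Turn the content of an *_emotion.txt file into an integer class id."""
    # int(float(...)) also handles values written without a '.'
    return int(float(text.strip()))

def select_frames(frame_names, n_peak=3, min_frames=4):
    """Return (neutral_frame, peak_frames) for one sequence, or None if too short."""
    # Hidden files such as .DS_Store are not frames
    frames = sorted(name for name in frame_names if not name.startswith("."))
    if len(frames) < min_frames:
        return None
    # First frame: neutral face; last n_peak frames: peak of the expression
    return frames[0], frames[-n_peak:]
```

Filtering hidden files before indexing avoids having to delete them from disk when `cv2.imread` fails on them.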

## Preprocessing

In [23]:
list_emotions = os.listdir(path_emotion)
list_person_train, list_person_test = train_test_split(list_emotions, shuffle = True, test_size = 0.05)
augument = DataUgumentator()
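
The split above is made on the list of subjects rather than on individual images, so all frames of one person end up on the same side. Because no `random_state` is passed, each run produces a different split. The sketch below shows the same idea with only the standard library and a fixed seed; it is not scikit-learn's implementation, and `split_subjects`, the seed value and the subject ids are hypothetical.

```python
import random

def split_subjects(subjects, test_size=0.05, seed=0):
    """Shuffle subject ids with a fixed seed and split them into train/test."""
    shuffled = sorted(subjects)              # start from a stable order
    random.Random(seed).shuffle(shuffled)    # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up subject ids, for illustration only
subjects = [f"S{i:03d}" for i in range(1, 101)]
train_ids, test_ids = split_subjects(subjects)
```

With `train_test_split` itself, passing a fixed `random_state` gives the same repeatability.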

In [24]:
labels_train = {}
labels_test = {}

dataset_train = preprocess(224, list_person_train, rgb=False, neutral=False)
dataset_test = preprocess(224, list_person_test, rgb=False, neutral=False)

Elapsed time:  9.476033449172974 s
Elapsed time:  0.6481406688690186 s


In [25]:
print("\nTrain dataset\n")
for key in dataset_train.keys():
  print(f"\tOccurrences for label {key} :", dataset_train[key].shape[0])
  labels_train[key] = np.ones((dataset_train[key].shape[0], 1), dtype=int) * int(key)

print("\nTest dataset\n")
for key in dataset_test.keys():
  print(f"\tOccurrences for label {key} :", dataset_test[key].shape[0])
  labels_test[key] = np.ones((dataset_test[key].shape[0], 1), dtype=int) * int(key)


Train dataset

	Occurrences for label 0 : 0
	Occurrences for label 1 : 360
	Occurrences for label 2 : 153
	Occurrences for label 3 : 495
	Occurrences for label 4 : 216
	Occurrences for label 5 : 594
	Occurrences for label 6 : 243
	Occurrences for label 7 : 693

Test dataset

	Occurrences for label 0 : 0
	Occurrences for label 1 : 45
	Occurrences for label 2 : 9
	Occurrences for label 3 : 36
	Occurrences for label 4 : 9
	Occurrences for label 5 : 27
	Occurrences for label 6 : 9
	Occurrences for label 7 : 54


In [26]:
# Size of the smallest class; the neutral class (key 0) is skipped when it is empty
min_occ = np.min([d.shape[0] for d in dataset_train.values()][1:]) if len(dataset_train[0]) == 0 else np.min([d.shape[0] for d in dataset_train.values()])

for key in dataset_train.keys():
  dataset_train[key] = dataset_train[key][:min_occ]
  labels_train[key] = labels_train[key][:min_occ]

In [27]:
dataset_list = [d for d in dataset_train.values()][1:] if len(dataset_train[0]) == 0 else [d for d in dataset_train.values()]
labels_list = [d for d in labels_train.values()][1:] if len(dataset_train[0]) == 0 else [d for d in labels_train.values()]

dataset_tot = np.concatenate(dataset_list)
labels_tot = np.concatenate(labels_list)
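
The balancing step keeps the first `min_occ` samples of every class and then stacks the classes. A sketch of the same undersampling as a single function is shown below; unlike the cells above, it picks the samples at random with a seeded generator instead of taking the first ones, and it skips every empty class. `balance_by_class` and its arguments are hypothetical names.

```python
import numpy as np

def balance_by_class(dataset, seed=0):
    """Undersample every non-empty class to the size of the smallest one."""
    rng = np.random.default_rng(seed)
    non_empty = {k: v for k, v in dataset.items() if len(v) > 0}
    min_occ = min(len(v) for v in non_empty.values())

    images, labels = [], []
    for key, samples in non_empty.items():
        # Random subset of min_occ samples, without repetition
        idx = rng.choice(len(samples), size=min_occ, replace=False)
        images.append(samples[idx])
        labels.append(np.full((min_occ, 1), key, dtype=int))
    return np.concatenate(images), np.concatenate(labels)
```

Random selection matters here because the samples of a class are stored subject by subject, so taking the first `min_occ` favors the subjects listed first.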

In [28]:
# Size of the smallest class; the neutral class (key 0) is skipped when it is empty
min_occ_test = np.min([d.shape[0] for d in dataset_test.values()][1:]) if len(dataset_test[0]) == 0 else np.min([d.shape[0] for d in dataset_test.values()])

for key in dataset_test.keys():
  dataset_test[key] = dataset_test[key][:min_occ_test]
  labels_test[key] = labels_test[key][:min_occ_test]

In [29]:
dataset_test_list = [d for d in dataset_test.values()][1:] if len(dataset_test[0]) == 0 else [d for d in dataset_test.values()]
labels_test_list = [d for d in labels_test.values()][1:] if len(dataset_test[0]) == 0 else [d for d in labels_test.values()]

dataset_tot_test = np.concatenate(dataset_test_list)
labels_tot_test = np.concatenate(labels_test_list)

In [30]:
for i in range(8):
    print(f"n_occurrences({i}) train = {np.count_nonzero(labels_tot == i)}" )
    print(f"n_occurrences({i}) test = {np.count_nonzero(labels_tot_test == i)}" )
    print()

n_occurrences(0) train = 0
n_occurrences(0) test = 0

n_occurrences(1) train = 153
n_occurrences(1) test = 9

n_occurrences(2) train = 153
n_occurrences(2) test = 9

n_occurrences(3) train = 153
n_occurrences(3) test = 9

n_occurrences(4) train = 153
n_occurrences(4) test = 9

n_occurrences(5) train = 153
n_occurrences(5) test = 9

n_occurrences(6) train = 153
n_occurrences(6) test = 9

n_occurrences(7) train = 153
n_occurrences(7) test = 9
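
The per-class counts printed above can also be obtained in one call. This sketch uses `np.unique` with `return_counts=True` on a small made-up label array shaped like `labels_tot` (one column of integer class ids); the values are illustrative only.

```python
import numpy as np

# Illustrative labels with the same (n, 1) layout as labels_tot
labels = np.array([[1], [1], [3], [7], [7], [7]])

# np.unique flattens the array and returns the sorted ids with their counts
classes, counts = np.unique(labels, return_counts=True)
class_counts = {int(c): int(n) for c, n in zip(classes, counts)}
print(class_counts)  # {1: 2, 3: 1, 7: 3}
```

Classes that never occur, such as the neutral class in this run, simply do not appear in the result.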



In [31]:
encoder = MultiLabelBinarizer()
train_labels = encoder.fit_transform(labels_tot)
# Reuse the encoder fitted on the train labels so that train and test share the same columns
test_labels = encoder.transform(labels_tot_test)

In [32]:
train_labels.shape

(1071, 7)
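
The shape `(1071, 7)` follows from the data: 7 classes with 153 samples each, and one column per class that actually occurs, so the empty neutral class gets no column. The NumPy sketch below reproduces that kind of encoding and shows how to map a row back to its class id. It illustrates the idea and is not `MultiLabelBinarizer`'s implementation; `one_hot` is a hypothetical helper.

```python
import numpy as np

def one_hot(labels):
    """Encode an (n, 1) array of class ids as one-hot rows, one column per class seen."""
    classes, inverse = np.unique(labels.ravel(), return_inverse=True)
    return classes, np.eye(len(classes), dtype=int)[inverse]

# Made-up labels: only classes 1, 2 and 7 occur, so there are 3 columns
labels = np.array([[1], [2], [7], [2]])
classes, encoded = one_hot(labels)

# Column i corresponds to classes[i]; argmax recovers the original id
decoded = classes[encoded.argmax(axis=1)]
```

Because the column index is not the class id, the sorted class list (`encoder.classes_` in the cell above) is needed to interpret a prediction.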

###Saving the datasets

In [33]:
!rm -rf /content/dataset
!mkdir -p /content/dataset

np.save("/content/dataset/train_dataset.npy", dataset_tot)
np.save("/content/dataset/train_labels.npy", train_labels)
np.save("/content/dataset/test_dataset.npy", dataset_tot_test)
np.save("/content/dataset/test_labels.npy", test_labels)
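
The shell commands and the four `np.save` calls can also be written in plain Python, which keeps the step usable outside Colab. This is a sketch under that assumption; `save_dataset` and its parameters are hypothetical names.

```python
import os
import shutil
import numpy as np

def save_dataset(folder, train_x, train_y, test_x, test_y):
    """Recreate `folder` from scratch and write the four .npy files into it."""
    shutil.rmtree(folder, ignore_errors=True)   # like rm -r, silent if missing
    os.makedirs(folder)
    arrays = {
        "train_dataset": train_x,
        "train_labels": train_y,
        "test_dataset": test_x,
        "test_labels": test_y,
    }
    for name, array in arrays.items():
        np.save(os.path.join(folder, name + ".npy"), array)
    # Return the written file names as a quick confirmation
    return sorted(os.listdir(folder))
```

In this notebook the call would be `save_dataset("/content/dataset", dataset_tot, train_labels, dataset_tot_test, test_labels)`.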

###Saving to Google Drive
Mind the paths (change the folder names as needed).

In [34]:
!cp -av /content/dataset /content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/
!mv /content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset /content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset_resnet

'/content/dataset' -> '/content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset'
'/content/dataset/train_dataset.npy' -> '/content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset/train_dataset.npy'
'/content/dataset/train_labels.npy' -> '/content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset/train_labels.npy'
'/content/dataset/test_dataset.npy' -> '/content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset/test_dataset.npy'
'/content/dataset/test_labels.npy' -> '/content/gdrive/MyDrive/BDAMLproject/Cohn-Kanade/dataset/test_labels.npy'
