# Facemask detection dataset

The following code converts the nearly 11.8k images from this [Kaggle dataset](https://www.kaggle.com/ashishjangra27/face-mask-12k-images-dataset) into Hierarchical Data Format version 5 (HDF5) files so they can be processed by the neural network built for this project. The code requires h5py, NumPy and PIL (Pillow), and uses the `os` module to walk the image folders. Because the resize step overwrites the images in place, write permission on the dataset folders is also needed. This notebook should be run from the folder that contains the dataset's `Train/`, `Test/` and `Validation/` directories.



In [2]:
import h5py
import os 
import numpy as np
from PIL import Image

First, the images are resized in place to a square 64x64 px size (the original files are overwritten):

In [None]:
folders = ['./Test/WithMask/', './Test/WithoutMask/', './Train/WithMask/', './Train/WithoutMask/', './Validation/WithMask/', './Validation/WithoutMask/']

# Resize every image to 64x64 px, overwriting the original file
for folder in folders:
    for item in os.listdir(folder):
        path = os.path.join(folder, item)
        with Image.open(path) as img:
            resized = img.resize((64, 64))
        resized.save(path)
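Before the arrays are stacked, a quick check can confirm that every file really ended up as a 64x64 image. This is a minimal sketch: the helper name `find_unexpected_images` is hypothetical, and the expected `"RGB"` mode is an assumption (the file sizes reported later in this notebook are consistent with 3-channel images).

```python
import os
from PIL import Image

def find_unexpected_images(folders, size=(64, 64), mode="RGB"):
    """Return (path, size, mode) for every image whose size or mode differs from the expected one."""
    mismatches = []
    for folder in folders:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            # Image.open reads the header lazily, which is enough to get size and mode
            with Image.open(path) as img:
                if img.size != size or img.mode != mode:
                    mismatches.append((path, img.size, img.mode))
    return mismatches
```

Calling `find_unexpected_images(folders)` after the resize cell should return an empty list; any entry it reports would make the later `np.array(set_x)` call produce an object array instead of a 4-D image array.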


Then each image is processed as follows:
+ The image is opened and converted into a NumPy array.
+ The array is paired with a numeric label that indicates the image class (1 = mask, 0 = no mask), and the pair is appended to a list of pairs.
+ This list is converted into a NumPy array and randomly shuffled.
+ The shuffled array is split into an image array and a label array, which are stored as the two datasets of the final .h5 file.
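The pairing, shuffling and splitting steps above can also be written so that images and labels live in two separate arrays from the start and are shuffled with one shared permutation, which avoids the intermediate object array. This is an alternative sketch, not the code that produced the files below; the function name `pairs_to_xy` and its `seed` parameter are hypothetical.

```python
import numpy as np

def pairs_to_xy(pairs, seed=None):
    """Turn a list of [image_array, label] pairs into shuffled (x, y) arrays."""
    # Stack the images into one (N, H, W, C) array and the labels into an (N,) array
    x = np.stack([img for img, _ in pairs])
    y = np.array([label for _, label in pairs], dtype=np.int64)
    # A single permutation applied to both arrays keeps every image aligned with its label
    order = np.random.default_rng(seed).permutation(len(y))
    return x[order], y[order]
```

With a fixed `seed`, re-running the cell reproduces the same ordering, which makes the generated .h5 files repeatable.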

The datasets used to train, test and validate the model were trimmed to 1000, 100 and 100 images respectively (500, 50 and 50 per class) due to technical restrictions. To build a bigger dataset, remove the `if ... break` check in each loading cell or raise its limit.
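The `if ... break` counter can be replaced by a limit parameter. The following is a sketch assuming the folder layout used in this notebook; `load_class` is a hypothetical helper, and `limit=None` loads every file. Sorting the listing is an addition: `os.listdir` returns entries in arbitrary order, so without it the selected subset may differ between machines.

```python
import os
from itertools import islice

import numpy as np
from PIL import Image

def load_class(dir_name, label, limit=None):
    """Load up to `limit` images from dir_name as [array, label] pairs (all of them if limit is None)."""
    names = sorted(os.listdir(dir_name))
    pairs = []
    # islice(names, None) yields every name, so limit=None means no trimming
    for name in islice(names, limit):
        with Image.open(os.path.join(dir_name, name)) as img:
            pairs.append([np.array(img), label])
    return pairs
```

For example, `img_set = load_class('./Train/WithMask/', 1, limit=500) + load_class('./Train/WithoutMask/', 0, limit=500)` builds the same 1000-pair training list as the two cells below.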
    
**train_data.h5**:

In [7]:
dir_name = './Train/WithMask/'
img_set = []
act_dir = os.listdir(dir_name)
train_i = 0
print('Files in dir ' + dir_name + ' : ' + str(len(act_dir)))
for items in act_dir:
    if(train_i == 500):
        break
    path = os.path.join(dir_name, items)
    temp_img = np.array(Image.open(path))
    img_set.append([temp_img, 1])
    train_i += 1

Files in dir ./Train/WithMask/ : 5000


In [8]:
dir_name2 = './Train/WithoutMask/'
act_dir2 = os.listdir(dir_name2)
train_i2 = 0
print('Files in dir ' + dir_name2 + ' : ' + str(len(act_dir2)))
for items in act_dir2:
    if(train_i2 == 500):
        break
    path = os.path.join(dir_name2, items)
    temp_img = np.array(Image.open(path))
    img_set.append([temp_img, 0])
    train_i2 += 1

Files in dir ./Train/WithoutMask/ : 5000


In [9]:
print('Total files: ' + str(len(act_dir) + len(act_dir2)))

Total files: 10000


In [10]:
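# dtype=object: each row pairs an image array with an int label; NumPy >= 1.24 raises a ValueError for this ragged input otherwise
temp_array = np.array(img_set, dtype=object)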
print('Shape of array: ' + str(temp_array.shape))

Shape of array: (1000, 2)


In [11]:
np.random.shuffle(temp_array)

In [12]:
set_x = []
set_y = []
for elem in temp_array:
    set_x.append(elem[0])
    set_y.append(elem[1])

In [14]:
save_path = './Train/train_data_1000.h5'
# 'w' overwrites any previous file, so re-running the cell does not fail on already existing datasets
with h5py.File(save_path, 'w') as ds_file:
    ds_file.create_dataset('train_set_x', data=np.array(set_x))
    ds_file.create_dataset('train_set_y', data=np.array(set_y))
print('File size: %d bytes'%os.path.getsize(save_path))

File size: 12298048 bytes
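The reported size can be checked by hand. Assuming the images are 64x64 RGB `uint8` arrays and the labels are stored as 8-byte integers (NumPy's default integer on Linux and macOS), the raw payload is n × (64·64·3 + 8) bytes, and whatever remains is HDF5 metadata. The helper `hdf5_payload_bytes` is only illustrative.

```python
def hdf5_payload_bytes(n_images, height=64, width=64, channels=3, label_bytes=8):
    """Raw bytes of n uint8 images plus n labels, ignoring HDF5 metadata."""
    image_bytes = height * width * channels  # one byte per uint8 value
    return n_images * (image_bytes + label_bytes)

# Reported sizes of the train (1000 images) and test (100 images) files
for n, reported in [(1000, 12298048), (100, 1231648)]:
    payload = hdf5_payload_bytes(n)
    print(n, payload, reported - payload)
# prints:
# 1000 12296000 2048
# 100 1229600 2048
```

Both files carry the same 2,048 bytes beyond the payload, which is consistent with a fixed amount of HDF5 metadata for a file holding two datasets.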


**test_data.h5**:

In [15]:
dir_name = './Test/WithMask/'
img_set = []
act_dir = os.listdir(dir_name)
test_i = 0
print('Files in dir ' + dir_name + ' : ' + str(len(act_dir)))
for items in act_dir:
    if(test_i == 50):
        break
    path = os.path.join(dir_name, items)
    temp_img = np.array(Image.open(path))
    img_set.append([temp_img, 1])
    test_i += 1

Files in dir ./Test/WithMask/ : 483


In [16]:
dir_name2 = './Test/WithoutMask/'
act_dir2 = os.listdir(dir_name2)
test_i2 = 0
print('Files in dir ' + dir_name2 + ' : ' + str(len(act_dir2)))
for items in act_dir2:
    if(test_i2 == 50):
        break
    path = os.path.join(dir_name2, items)
    temp_img = np.array(Image.open(path))
    img_set.append([temp_img, 0])
    test_i2 += 1

Files in dir ./Test/WithoutMask/ : 509


In [17]:
print('Total files: ' + str(len(act_dir) + len(act_dir2)))

Total files: 992


In [18]:
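# dtype=object: each row pairs an image array with an int label; NumPy >= 1.24 raises a ValueError for this ragged input otherwise
temp_array = np.array(img_set, dtype=object)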
print('Shape of array: ' + str(temp_array.shape))

Shape of array: (100, 2)


In [19]:
np.random.shuffle(temp_array)
set_x = []
set_y = []
for elem in temp_array:
    set_x.append(elem[0])
    set_y.append(elem[1])

In [20]:
save_path = './Test/test_data_100.h5'
# 'w' overwrites any previous file, so re-running the cell does not fail on already existing datasets
with h5py.File(save_path, 'w') as ds_file:
    ds_file.create_dataset('test_set_x', data=np.array(set_x))
    ds_file.create_dataset('test_set_y', data=np.array(set_y))
print('File size: %d bytes'%os.path.getsize(save_path))

File size: 1231648 bytes
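The test and validation cells repeat the training cells with a smaller limit, so the whole per-split procedure (load, label, shuffle, save) can be folded into one function. This is a sketch, not the notebook's code: `build_split`, its parameters and the seeded shuffle are hypothetical, while the folder names, the 1/0 labels and the `<prefix>_set_x` / `<prefix>_set_y` dataset names follow the cells in this notebook.

```python
import os

import h5py
import numpy as np
from PIL import Image

def build_split(split_dir, prefix, per_class, save_path, seed=None):
    """Write <prefix>_set_x / <prefix>_set_y to save_path from split_dir/WithMask and split_dir/WithoutMask."""
    images, labels = [], []
    # Same class convention as the notebook: 1 = mask, 0 = no mask
    for sub, label in (("WithMask", 1), ("WithoutMask", 0)):
        folder = os.path.join(split_dir, sub)
        # Slicing with per_class=None keeps every file
        for name in sorted(os.listdir(folder))[:per_class]:
            with Image.open(os.path.join(folder, name)) as img:
                images.append(np.array(img))
            labels.append(label)
    x = np.stack(images)
    y = np.array(labels, dtype=np.int64)
    # Shuffle images and labels with the same permutation
    order = np.random.default_rng(seed).permutation(len(y))
    # 'w' truncates an existing file, so the function can be re-run safely
    with h5py.File(save_path, "w") as f:
        f.create_dataset(prefix + "_set_x", data=x[order])
        f.create_dataset(prefix + "_set_y", data=y[order])
    return x.shape, os.path.getsize(save_path)
```

For instance, `build_split('./Test', 'test', 50, './Test/test_data_100.h5')` would produce a 100-image test file like the one above.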


**validation_data.h5**:

In [21]:
dir_name = './Validation/WithMask/'
img_set = []
act_dir = os.listdir(dir_name)
val_i = 0
print('Files in dir ' + dir_name + ' : ' + str(len(act_dir)))
for items in act_dir:
    if(val_i == 50):
        break
    path = os.path.join(dir_name, items)
    temp_img = np.array(Image.open(path))
    img_set.append([temp_img, 1])
    val_i += 1

Files in dir ./Validation/WithMask/ : 400


In [23]:
dir_name2 = './Validation/WithoutMask/'
act_dir2 = os.listdir(dir_name2)
val_i2 = 0
print('Files in dir ' + dir_name2 + ' : ' + str(len(act_dir2)))
for items in act_dir2:
    if(val_i2 == 50):
        break
    path = os.path.join(dir_name2, items)
    temp_img = np.array(Image.open(path))
    img_set.append([temp_img, 0])
    val_i2 += 1

Files in dir ./Validation/WithoutMask/ : 400


In [24]:
print('Total files: ' + str(len(act_dir) + len(act_dir2)))

Total files: 800


In [25]:
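# dtype=object: each row pairs an image array with an int label; NumPy >= 1.24 raises a ValueError for this ragged input otherwise
temp_array = np.array(img_set, dtype=object)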
print('Shape of array: ' + str(temp_array.shape))

Shape of array: (100, 2)


In [26]:
np.random.shuffle(temp_array)
set_x = []
set_y = []
for elem in temp_array:
    set_x.append(elem[0])
    set_y.append(elem[1])

In [27]:
save_path = './Validation/validation_100.h5'
# 'w' overwrites any previous file, so re-running the cell does not fail on already existing datasets
with h5py.File(save_path, 'w') as ds_file:
    ds_file.create_dataset('validation_set_x', data=np.array(set_x))
    ds_file.create_dataset('validation_set_y', data=np.array(set_y))
print('File size: %d bytes'%os.path.getsize(save_path))

File size: 1231648 bytes
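Once the three files exist, they can be read back as NumPy arrays for the model. This is a sketch: the file paths and dataset names are the ones written above, while the `load_h5_split` helper and the scaling of pixels to [0, 1] are assumptions about how the network consumes the data.

```python
import h5py
import numpy as np

def load_h5_split(path, prefix):
    """Read <prefix>_set_x and <prefix>_set_y from an HDF5 file written by the cells above."""
    with h5py.File(path, "r") as f:
        # [()] reads the whole dataset into memory as a NumPy array
        x = f[prefix + "_set_x"][()]
        y = f[prefix + "_set_y"][()]
    # Assumed preprocessing: scale uint8 pixels to float32 values in [0, 1]
    return x.astype(np.float32) / 255.0, y
```

Usage: `train_x, train_y = load_h5_split('./Train/train_data_1000.h5', 'train')`, `test_x, test_y = load_h5_split('./Test/test_data_100.h5', 'test')` and `val_x, val_y = load_h5_split('./Validation/validation_100.h5', 'validation')`.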
