### Preprocessing stage

- Load train, validation and test datasets
- Convert and resize images
- Save new datasets

<img src='https://user-images.githubusercontent.com/74012107/130700197-a35e8979-caee-445f-a02d-fa4ec4c9e6e2.png' width='1000' height='600'>

### Datasets
The original dataset is available at [Kaggle](https://www.kaggle.com/ashishjangra27/face-mask-12k-images-dataset).
- Train dataset - 10 000 images (5000 face images with masks and 5000 without masks)  
- Validation dataset - 800 images (400 face images with masks and 400 without masks)
- Test dataset - 992 images (483 face images with masks and 509 without masks)
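
As a quick check of the class balance, the shares can be computed directly from the counts listed above. This is only an arithmetic sketch: the `counts` dictionary and the `class_shares` helper are illustrative names, not part of the original notebook.

```python
# Image counts as stated in the list above: (with mask, without mask)
counts = {
    'Train': (5000, 5000),
    'Validation': (400, 400),
    'Test': (483, 509),
}


def class_shares(with_mask, without_mask):
    """Return the percentage of each class, rounded to one decimal place."""
    total = with_mask + without_mask
    return (round(100 * with_mask / total, 1),
            round(100 * without_mask / total, 1))


for name, (with_mask, without_mask) in counts.items():
    share_with, share_without = class_shares(with_mask, without_mask)
    print(f'{name}: {with_mask + without_mask} images, '
          f'{share_with}% with mask, {share_without}% without mask')
```

The train and validation sets are exactly balanced. The test set is slightly skewed towards images without masks (483 of 992 against 509 of 992).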

In [1]:
import cv2
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import utils

In [2]:
categories = ['WithMask', 'WithoutMask']
labels = [0, 1]

label_dict = dict(zip(categories,labels))

print('Categories: ', categories)
print('Labels: ', labels)
print(label_dict)

Categories:  ['WithMask', 'WithoutMask']
Labels:  [0, 1]
{'WithMask': 0, 'WithoutMask': 1}


In [3]:
datasets = ['Train', 'Validation', 'Test']
paths = ['Dataset/Train', 'Dataset/Validation', 'Dataset/Test']

path_dict = dict(zip(datasets, paths))

print('Datasets: ', datasets)
print('Paths: ', paths)
print(path_dict)

Datasets:  ['Train', 'Validation', 'Test']
Paths:  ['Dataset/Train', 'Dataset/Validation', 'Dataset/Test']
{'Train': 'Dataset/Train', 'Validation': 'Dataset/Validation', 'Test': 'Dataset/Test'}
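
Before loading the images, it can help to confirm that every dataset folder contains both category subfolders. The sketch below assumes the `Dataset/<split>/<category>` layout used in the cells above. `count_images` is a hypothetical helper, and it counts every directory entry, not only image files.

```python
import os


def count_images(root, datasets, categories):
    """Count the entries in each <root>/<dataset>/<category> folder.

    Folders that do not exist are reported as None instead of raising.
    """
    result = {}
    for dataset in datasets:
        for category in categories:
            folder = os.path.join(root, dataset, category)
            if os.path.isdir(folder):
                result[(dataset, category)] = len(os.listdir(folder))
            else:
                result[(dataset, category)] = None
    return result


# 'Dataset' is the root folder assumed by the paths defined above
found = count_images('Dataset',
                     ['Train', 'Validation', 'Test'],
                     ['WithMask', 'WithoutMask'])
for (dataset, category), n in found.items():
    print(dataset, category, 'missing' if n is None else n)
```

A `missing` entry at this point usually means the archive was extracted to a different location than the paths expect.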


In [4]:
image_size = 100

# Load data and target sets
for dataset in datasets:
    data = []
    target = []

    for category in categories:
        folder_path = os.path.join(path_dict[dataset], category)
        image_names = os.listdir(folder_path)

        for image_name in image_names:
            image_path = os.path.join(folder_path, image_name)
            
            try:
                image = cv2.imread(image_path)

                # cv2.imread returns the image in BGR channel order. Convert it to grayscale,
                # because the color channels carry redundant information that is not needed
                # for face mask detection
                gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
                # Resize the grayscale image to 100x100 so that all inputs have the same shape
                resized = cv2.resize(gray, (image_size, image_size))

                # Append the image and its label
                data.append(resized)
                target.append(label_dict[category])
            
            except Exception as e:
                print('Failure: ', image_path)
                print('Exception: ',e)

    # Normalize the pixel values to the range [0, 1]
    data = np.array(data) / 255.0
    # Reshape the images into 'channels last' format (samples, height, width, color_depth)
    data = np.reshape(data, (data.shape[0], image_size, image_size, 1))

    # One-hot encode the integer labels (0 -> [1, 0], 1 -> [0, 1])
    target = np.array(target)
    new_target = utils.to_categorical(target)

    # Print dataset information
    print(dataset + ':')
    print('Shape: {}'.format(data.shape))
    num_with_mask = np.sum(target == 0)
    print('With Mask: {0} ({1}%)'.format(num_with_mask, round(num_with_mask / data.shape[0], 2) * 100))
    num_without_mask = np.sum(target == 1)
    print('Without Mask: {0} ({1}%)'.format(num_without_mask, round(num_without_mask / data.shape[0], 2) * 100))
    print()
   
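    # Save the new dataset; create the output folder first, because np.save does not create it
    os.makedirs('DatasetPreprocessed', exist_ok=True)
    np.save('DatasetPreprocessed/data_' + dataset, data)
    np.save('DatasetPreprocessed/target_' + dataset, new_target)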

Train:
Shape: (10000, 100, 100, 1)
With Mask: 5000 (50.0%)
Without Mask: 5000 (50.0%)

Validation:
Shape: (800, 100, 100, 1)
With Mask: 400 (50.0%)
Without Mask: 400 (50.0%)

Test:
Shape: (992, 100, 100, 1)
With Mask: 483 (49.0%)
Without Mask: 509 (51.0%)
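
The array handling in the cell above (scaling, reshaping, one-hot encoding, saving) can be tried on synthetic data without the image files. This is a minimal sketch with several assumptions: the random images and the `Sample` file names are made up for illustration, and `np.eye` is used as a NumPy stand-in for `to_categorical`, which should give the same layout for the labels 0 and 1.

```python
import os
import tempfile

import numpy as np

image_size = 100

# Hypothetical stand-in data: four random 8-bit grayscale images and their labels
rng = np.random.default_rng(0)
gray_images = rng.integers(0, 256, size=(4, image_size, image_size), dtype=np.uint8)
target = np.array([0, 1, 1, 0])

# Scale pixel values to [0, 1] and add a trailing channel axis ('channels last')
data = gray_images / 255.0
data = data.reshape(-1, image_size, image_size, 1)

# One-hot encode with NumPy: row i has a 1 in column target[i]
one_hot = np.eye(2, dtype=np.float32)[target]

# Save and reload to confirm that the arrays survive the round trip.
# np.save appends the '.npy' extension when the file name lacks it.
with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, 'data_Sample'), data)
    np.save(os.path.join(tmp_dir, 'target_Sample'), one_hot)
    data_loaded = np.load(os.path.join(tmp_dir, 'data_Sample.npy'))
    target_loaded = np.load(os.path.join(tmp_dir, 'target_Sample.npy'))

print('Data shape:', data_loaded.shape)
print('Target shape:', target_loaded.shape)
print('Labels recovered:', target_loaded.argmax(axis=1))
```

Applying `argmax` along the last axis turns the one-hot rows back into integer labels, which is useful when comparing predictions with the saved targets later on.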

