# Imports

In [65]:
import os
import numpy as np
from PIL import Image

# Constants

In [66]:
DATA_DIR = os.path.join(os.getcwd(), "data")
TRAIN_DIR = os.path.join(DATA_DIR, "train")
TEST_DIR = os.path.join(DATA_DIR, "test")

# Convolutional Neural Network

## Reading Data
The data are 48x48 grayscale images. Images of any other size are resized to 48x48 on load, and every image is converted to grayscale.  
Grayscale means there is only 1 channel (an RGB image has 3 channels).  
So the actual image is of size (48, 48, 1): (image_width, image_height, n_channels).  
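
The loaded arrays in this notebook have shape (n_images, 48, 48), without an explicit channel axis. A minimal sketch of how such an axis could be added, assuming a small placeholder batch (the `images` array below is hypothetical, not the real data):

```python
import numpy as np

# Hypothetical stand-in for a batch of loaded images: 5 grayscale images of 48x48.
images = np.zeros((5, 48, 48))

# Append a trailing channel axis so each image becomes (48, 48, 1).
images_with_channel = images[..., np.newaxis]

print(images_with_channel.shape)  # (5, 48, 48, 1)
```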

This section extracts the images and labels.

1. Read the images and remember the label of each one (e.g. happy, sad, ...). We now have the images and label_names lists.
2. Map the label names list so that each label name gets its own identifier (e.g. happy = 0, sad = 1, ...). We now have the labels list.
3. Convert the labels list into a 2D array of one-hot encoded vectors.

In [67]:
def get_data(data_dir: str, image_size=(48, 48)) -> tuple[np.ndarray, np.ndarray]:
    
    """
    Reading data from data_dir.
    Each image is processed, converted into a numpy array and then normalized (every value is divided by 255).
    Image is expected to be in grayscale (.convert('L')), and its size should be 48x48 (default).
    
    Expected directory tree from which the images are processed:
    - data_dir
        - facial_expression_dir1
            - image1
            - image2
            - ...
        - facial_expression_dir2 
        - facial_expression_dir3
        - ...
        - facial_expression_dirN
        
    :param data_dir: directory from which the images are processed
    :param image_size: size of the images, default is 48x48
    :return: two numpy arrays, first numpy array has stored all the images, and second numpy array has stored all the label names
    """
    
    images = []
    label_names = []
    
    for expression_dirname in sorted(os.listdir(data_dir)):  
        # get every directory, this directory contains images of facial expressions 
        
        expression_dir = os.path.join(data_dir, expression_dirname)
        
        if not os.path.isdir(expression_dir):  # process only directories, skip non-directories - files
            continue
            
        for expression_image in os.listdir(expression_dir):
            # get every image in the directory
            
            image_path = os.path.join(expression_dir, expression_image)
            
            try:
                image = Image.open(image_path).convert('L')  # L mode, because images are grayscaled
                image = image.resize(image_size)  # resize the image to expected 48x48
                
                # convert image into an array, image is a 2D array
                image_array = np.array(image)
                
                # normalize image values, values between 0 - 1
                image_array = image_array / 255.0

                images.append(image_array)
                label_names.append(expression_dirname)  # directory name is already a label name of a facial expression
                
            except Exception as e:
                print(f"Failed to process image: {image_path}")
                print(e)
    
    # convert the images and label names lists into numpy arrays and return them
    return np.array(images), np.array(label_names)            

In [68]:
train_images, train_label_names = get_data(TRAIN_DIR)
test_images, test_label_names = get_data(TEST_DIR)

print("Train Images:")
print(train_images.shape)
print(train_images, '\n')

print("Test Images:")
print(test_images.shape)
print(test_images, '\n')

print("Train Labels:")
print(train_label_names.shape)
print(train_label_names, '\n')

print("Test Labels:")
print(test_label_names.shape)
print(test_label_names, '\n')

Train Images:
(28709, 48, 48)
[[[0.70980392 0.70196078 0.69411765 ... 0.71372549 0.71372549 0.71372549]
  [0.70196078 0.69803922 0.69019608 ... 0.70196078 0.69411765 0.68627451]
  [0.70196078 0.69803922 0.69019608 ... 0.67843137 0.70196078 0.7254902 ]
  ...
  [0.76862745 0.70980392 0.74901961 ... 0.90196078 0.89411765 0.80392157]
  [0.76078431 0.72941176 0.78431373 ... 0.89019608 0.87058824 0.91372549]
  [0.77647059 0.77254902 0.83137255 ... 0.88627451 0.85882353 0.95294118]]

 [[0.08235294 0.07058824 0.10588235 ... 0.32941176 0.20392157 0.24705882]
  [0.08235294 0.08235294 0.10980392 ... 0.34509804 0.28235294 0.36078431]
  [0.09019608 0.10980392 0.12941176 ... 0.4        0.41176471 0.45882353]
  ...
  [0.99607843 0.99215686 1.         ... 0.63921569 0.61960784 0.61176471]
  [1.         1.         1.         ... 0.59607843 0.64705882 0.59607843]
  [0.99607843 1.         0.99215686 ... 0.61568627 0.57647059 0.56078431]]

 [[0.16078431 0.24705882 0.33333333 ... 0.13333333 0.1372549  0.12

In [69]:
print("Example of one Image:")
print(train_images[0].shape)
print(train_images[0])

Example of one Image:
(48, 48)
[[0.70980392 0.70196078 0.69411765 ... 0.71372549 0.71372549 0.71372549]
 [0.70196078 0.69803922 0.69019608 ... 0.70196078 0.69411765 0.68627451]
 [0.70196078 0.69803922 0.69019608 ... 0.67843137 0.70196078 0.7254902 ]
 ...
 [0.76862745 0.70980392 0.74901961 ... 0.90196078 0.89411765 0.80392157]
 [0.76078431 0.72941176 0.78431373 ... 0.89019608 0.87058824 0.91372549]
 [0.77647059 0.77254902 0.83137255 ... 0.88627451 0.85882353 0.95294118]]


### Map the label names into actual labels
Each label name should have an integer identifier.  
From the label names list we build a labels list, in which every label name is replaced by its identifier.

In [70]:
def map_label_names(label_names: np.array) -> np.array:
    
    """
    Map the label names so that each one gets its own integer identifier,
    then replace every label name with that identifier.
    :param label_names: array of label names
    :return: an array of labels, where the label names have been replaced by their unique identifiers.
    """
    
    mapped_labels = {}
    
    # map the unique label names
    for label, unique_label_name in enumerate(np.unique(label_names)):
        mapped_labels[unique_label_name] = label

    # replace each label name by its identifier
    labels = np.array([mapped_labels[label_name] for label_name in label_names])
    
    return labels

In [71]:
train_labels = map_label_names(train_label_names)
test_labels = map_label_names(test_label_names)

print("Train labels:")
print(train_labels.shape)
print(train_labels, '\n')

print("Test labels:")
print(test_labels.shape)
print(test_labels)

Train labels:
(28709,)
[0 0 0 ... 6 6 6] 

Test labels:
(7178,)
[0 0 0 ... 6 6 6]
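
The mapping can also be built without an explicit dictionary. The sketch below is one possible alternative, not the notebook's method: it uses `np.unique(..., return_inverse=True)` and reuses the training class order for the test set so that both sets share a single mapping. The label arrays are hypothetical placeholders.

```python
import numpy as np

# Hypothetical label names, standing in for train_label_names / test_label_names.
train_names = np.array(["angry", "happy", "happy", "sad"])
test_names = np.array(["sad", "angry"])

# np.unique returns the sorted class names; return_inverse gives, for every
# element, the index of its class in that sorted array.
class_names, train_ids = np.unique(train_names, return_inverse=True)

# Reuse the training class order for the test set so both sets share one mapping.
# np.searchsorted assumes every test name also occurs in class_names.
test_ids = np.searchsorted(class_names, test_names)

print(class_names)  # ['angry' 'happy' 'sad']
print(train_ids)    # [0 1 1 2]
print(test_ids)     # [2 0]
```

Sharing one mapping matters when a class is missing from one of the sets, because mapping each set independently would then assign different integers to the same label name.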


### Encode the labels into one hot vectors


In [72]:
def one_hot_encode(labels: np.array, num_classes: int):
    
    """
    Encode the 1D vector of labels as one-hot vectors.
    A one-hot encoded vector is all zeros except for a single 1.
    
    :param labels: 1D vector of labels 
        E.g.:
        happy: 0
        sad: 1
        angry: 2
        labels: [0, 0, 1, 1, 2]
        
    :param num_classes: number of unique classes, i.e. of unique labels (e.g. 3 - happy, sad, and angry)
    :return: a 2D one-hot encoded array.
    
            E.g.:
            labels = [0, 0, 1, 1, 2]
            one_hot = 
                [
                    [1 0 0]
                    [1 0 0]
                    [0 1 0]
                    [0 1 0]
                    [0 0 1]
                ]
            - shape of one_hot: (n_labels, unique_labels)
    """
    
    # an array full of zeros of shape (num_labels, num_classes) 
    one_hot = np.zeros((len(labels), num_classes))
    
    # set the 1 to appropriate labels
    for n_row, label in enumerate(labels):
        one_hot[n_row, label] = 1
    
    return one_hot

In [73]:
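# number of unique classes, derived from the training label names
num_classes = len(np.unique(train_label_names))

train_labels_one_hot = one_hot_encode(train_labels, num_classes)
test_labels_one_hot = one_hot_encode(test_labels, num_classes)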

print("Train labels one-hot:")
print(train_labels_one_hot.shape)
print(train_labels_one_hot, '\n')

print("Test labels one-hot:")
print(test_labels_one_hot.shape)
print(test_labels_one_hot, '\n')

Train labels one-hot:
(28709, 7)
[[1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]] 

Test labels one-hot:
(7178, 7)
[[1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]] 
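
The same encoding can be produced without a Python loop. This is a minimal sketch of an alternative, using the small example from the docstring rather than the real labels:

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the labels array builds the whole encoding at once.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)  # (5, 3)
```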



### Final Representation of Data


Training and testing sets:  
- image arrays: train_images and test_images
- one-hot encoded vector arrays: train_labels_one_hot, test_labels_one_hot

In [74]:
print("Train Images (first 5):")
print(train_images.shape)
print(train_images[:5], '\n')

print("Test Images (first 5):")
print(test_images.shape)
print(test_images[:5], '\n')

print("Train Labels one-hot:")
print(train_labels_one_hot.shape)
print(train_labels_one_hot, '\n')

print("Test Labels one-hot:")
print(test_labels_one_hot.shape)
print(test_labels_one_hot, '\n')

Train Images (first 5):
(28709, 48, 48)
[[[0.70980392 0.70196078 0.69411765 ... 0.71372549 0.71372549 0.71372549]
  [0.70196078 0.69803922 0.69019608 ... 0.70196078 0.69411765 0.68627451]
  [0.70196078 0.69803922 0.69019608 ... 0.67843137 0.70196078 0.7254902 ]
  ...
  [0.76862745 0.70980392 0.74901961 ... 0.90196078 0.89411765 0.80392157]
  [0.76078431 0.72941176 0.78431373 ... 0.89019608 0.87058824 0.91372549]
  [0.77647059 0.77254902 0.83137255 ... 0.88627451 0.85882353 0.95294118]]

 [[0.08235294 0.07058824 0.10588235 ... 0.32941176 0.20392157 0.24705882]
  [0.08235294 0.08235294 0.10980392 ... 0.34509804 0.28235294 0.36078431]
  [0.09019608 0.10980392 0.12941176 ... 0.4        0.41176471 0.45882353]
  ...
  [0.99607843 0.99215686 1.         ... 0.63921569 0.61960784 0.61176471]
  [1.         1.         1.         ... 0.59607843 0.64705882 0.59607843]
  [0.99607843 1.         0.99215686 ... 0.61568627 0.57647059 0.56078431]]

 [[0.16078431 0.24705882 0.33333333 ... 0.13333333 0.137

## Convolutional Layers
A convolutional layer consists of one or more filters (kernels).  
Each filter produces one output feature map.

A filter can be of any size, say f x f, where f is the number of pixels in its width and height.  

We apply <b>convolution operations</b> to the input image using the filter. We can also use techniques like <b>padding</b> on the input image, or specify a <b>stride</b> for the convolution operation.

### Convolutional Operations

For example, take a filter that detects vertical edges.  
The size of this filter is 3x3, so f = 3.

|   |   |    |
|---|---|----|
| 1 | 0 | -1 | 
| 1 | 0 | -1 |
| 1 | 0 | -1 |

And we have an input image:

|   |   |   |   |
|---|---|---|---|
| 2 | 3 | 2 | 3 |
| 1 | 3 | 5 | 1 |
| 4 | 5 | 1 | 1 |
| 5 | 0 | 0 | 1 |

Now that we have a filter and an input image, let's apply a convolution operation to the input image using our filter. The operation is written input_image * filter: the filter slides over the image, and at each position the overlapping values are multiplied element by element and summed into one output value.  

|   |   |   |   |
|---|---|---|---|
| 2 | 3 | 2 | 3 |
| 1 | 3 | 5 | 1 |   
| 4 | 5 | 1 | 1 |
| 5 | 0 | 0 | 1 |

- input image size: (4 x 4)

 |   |   |    |
 |---|---|----|
 | 1 | 0 | -1 | 
 | 1 | 0 | -1 |
 | 1 | 0 | -1 |
 
- filter size: (3 x 3)


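The result of the convolution is shown below. For example, the top-left value is (2 + 1 + 4) - (2 + 5 + 1) = -1: the sum of the first column of the top-left 3 x 3 window minus the sum of its third column.

|    |   |
|----|---|
| -1 | 6 |
| 4  | 5 |

- output image size: (2 x 2)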

```
Notice that the resulting output image (2 x 2) has a much lower resolution than the input image (4 x 4).
So besides extracting features (here, vertical edges), a convolution without padding also reduces the dimensionality and complexity of the image.
```
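
The sliding-window computation can be sketched in NumPy as follows. This is an illustrative implementation under simple assumptions (no padding, stride 1, and no kernel flipping, as is common in deep learning libraries); the function name `convolve2d_valid` is made up for this example. It uses the 4 x 4 image with rows (2 3 2 3), (1 3 5 1), (4 5 1 1), (5 0 0 1) and the vertical-edge filter.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    f = kernel.shape[0]
    out_rows = image.shape[0] - f + 1
    out_cols = image.shape[1] - f + 1
    output = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            # f x f window of the image currently covered by the kernel
            window = image[i:i + f, j:j + f]
            output[i, j] = np.sum(window * kernel)
    return output

image = np.array([
    [2, 3, 2, 3],
    [1, 3, 5, 1],
    [4, 5, 1, 1],
    [5, 0, 0, 1],
])

# vertical-edge filter: left column minus right column
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

result = convolve2d_valid(image, kernel)
print(result.shape)  # (2, 2)
print(result)        # values: -1 and 6 in the first row, 4 and 5 in the second
```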

Formula to get the size of the output image after the convolution:

```
f = size of the kernel
m = width of the input image
n = height of the input image

(m x n) * (f x f) = (m - f + 1) x (n - f + 1)
```
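
As a quick check of the formula, here is a small helper (the name `conv_output_size` is hypothetical) that assumes no padding and a stride of 1:

```python
def conv_output_size(m, n, f):
    """Output size of an (m x n) image convolved with an (f x f) filter."""
    return (m - f + 1, n - f + 1)

print(conv_output_size(4, 4, 3))    # (2, 2)
print(conv_output_size(48, 48, 3))  # (46, 46)
```

So a single 3 x 3 filter applied to one of the 48 x 48 images in this notebook would give a 46 x 46 output.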

An image can have multiple channels, c. An RGB image has 3 channels, meaning 3 matrices: one for red, one for green and one for blue.  
In the example above, there was only 1 channel. If the input image is RGB, the input image matrix is 3-dimensional: it consists of 3 matrices.  
Then the filter must also have 3 channels to be able to compute the output image matrix.  

For example, let's say we have an RGB image:

|   |   |   |   |
|---|---|---|---|
| 2 | 3 | 2 | 3 |
| 1 | 3 | 5 | 1 |   
| 4 | 5 | 1 | 1 |
| 5 | 0 | 0 | 1 |

- For red

|   |   |   |   |
|---|---|---|---|
| 0 | 3 | 2 | 3 |
| 0 | 1 | 5 | 0 |   
| 4 | 5 | 3 | 1 |
| 1 | 0 | 0 | 4 |

- For green

|   |   |   |   |
|---|---|---|---|
| 2 | 0 | 1 | 4 |
| 1 | 0 | 5 | 2 |   
| 5 | 0 | 8 | 2 |
| 3 | 2 | 0 | 3 |

- For blue

In this example the image is of size (4 x 4 x 3), where 4 x 4 is the size of the image and 3 is the number of channels.

Then the filter should also have 3 channels:

 |   |   |    |
 |---|---|----|
 | 1 | 0 | -1 | 
 | 1 | 0 | -1 |
 | 1 | 0 | -1 |
 
- 1st channel

 |    |    |   |
 |----|----|---|
 | 0  | 0  | 1 | 
 | 0  | -1 | 1 |
 | -1 | 0  | 1 |
 
- 2nd channel 

 |   |   |   |
 |---|---|---|
 | 1 | 0 | 1 | 
 | 0 | 1 | 0 |
 | 1 | 0 | 1 |
 
- 3rd channel

The convolution operation is the same as in the example with only 1 channel.  
The filter is now of size (3 x 3 x 3).

```
Images can have a depth - channels, for example for RGB image there are 3 channels, for grayscale image there is only 1 channel, etc...
```

Formula to get the size of the output image after the convolution, now with c channels:

```
c = number of channels
f = size of the kernel
m = width of the input image
n = height of the input image

(m x n x c) * (f x f x c) = (m - f + 1) x (n - f + 1) x 1

Notice that the products are summed over all c channels, so a single filter produces a single-channel output.
With k filters, the output has k channels: (m - f + 1) x (n - f + 1) x k.
```
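
A sketch of how one multi-channel filter is applied, using the RGB image and the 3-channel filter from the example above. It assumes no padding and a stride of 1, and the function name `convolve_multichannel` is made up for illustration. The products are summed over the window and over all channels, so a single filter gives a single 2D output.

```python
import numpy as np

def convolve_multichannel(image, kernel):
    """Apply one (f x f x c) kernel to an image with c channels."""
    f = kernel.shape[0]
    out_rows = image.shape[0] - f + 1
    out_cols = image.shape[1] - f + 1
    output = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            # window spans all channels; the sum runs over rows, columns and channels
            window = image[i:i + f, j:j + f, :]
            output[i, j] = np.sum(window * kernel)
    return output

red = np.array([[2, 3, 2, 3], [1, 3, 5, 1], [4, 5, 1, 1], [5, 0, 0, 1]])
green = np.array([[0, 3, 2, 3], [0, 1, 5, 0], [4, 5, 3, 1], [1, 0, 0, 4]])
blue = np.array([[2, 0, 1, 4], [1, 0, 5, 2], [5, 0, 8, 2], [3, 2, 0, 3]])
image = np.stack([red, green, blue], axis=-1)  # shape (4, 4, 3)

k1 = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
k2 = np.array([[0, 0, 1], [0, -1, 1], [-1, 0, 1]])
k3 = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
kernel = np.stack([k1, k2, k3], axis=-1)  # shape (3, 3, 3)

result = convolve_multichannel(image, kernel)
print(image.shape, kernel.shape, result.shape)  # (4, 4, 3) (3, 3, 3) (2, 2)
```

Stacking the outputs of several such filters along a new last axis would give an output with one channel per filter.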

```
As you can see, there is an infinite number of filters that could be applied. We cannot hard-code the filters and expect great performance from our model.
Therefore the model learns the filter values during training to get the best possible filters for the task.
```


### Padding
