$$
\Huge G \cup n \; \sqrt{-1} \; \ell \; e \; \emptyset
$$

<p style="text-align: center">A lipsync project, made by Nil Atabey, Leonardo Biason and Günak Yüzak</p>


---

<h2 align="center"><b>Table of Contents</b></h2>

1. [Code structure](#1-code-structure)
2. [Import of the Packages](#2-import-of-the-packages)
3. [Data Loading](#3-data-loading)
4. [Model Settings](#4-model-settings)
5. [Training + Testing](#5-training--testing)

$$
\newcommand{\goto}{\; \longrightarrow \;}
\newcommand{\tdconv}{\text{2D Convolution} }
\newcommand{\relu}{\text{ReLU} }
$$

---

## 1) Code Structure

The project is organised as follows:

```text
project
 ├ assets
 │  ├ cnn.py
 │  └ dataloader.py
 └ data
```

---

## 2) Import of the Packages

The standard packages below can be installed with either `conda` or `pip`:

In [1]:
# Pytorch imports
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch import nn
import torchmetrics
import torchinfo

# Utils imports
import numpy as np
import os
import matplotlib.pyplot as plt

Custom imports from our libraries:

In [2]:
from assets.gnldataloader import GNLDataLoader
from assets.cnn import LabialCNN
from assets.loops import train_loop, test_loop

---

## 3) Data Loading

To clean the data and make sure that every video has a matching label (and vice versa), the following code was run once:

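```python
# File names in the video and label folders
a, b = sorted(os.listdir("data/matching/fronts")), sorted(os.listdir("data/matching/labels"))
# Strip the 4-character video extension and the 5-character label extension
a_new, b_new = {item[:-4] for item in a}, {item[:-5] for item in b}
# Names that have both a video and a label
tot = a_new.intersection(b_new)
print(tot)

# Move every label that has a matching video into the matching-labels folder.
# `path_labels` must already point to the folder the labels are moved from.
for item in b:
    if item[:-5] in tot:
        prev_path = os.path.join(path_labels, item)
        os.rename(prev_path, os.path.join("data/matching/labels", item))

print(len(tot))
```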
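The same matching step can also be written without hard-coded extension lengths. The following is only a minimal sketch: the helper names `matched_stems` and `unmatched_files`, the throwaway directories and the `.json` label extension are assumptions made for illustration, not part of the project.

```python
import tempfile
from pathlib import Path


def matched_stems(video_dir: Path, label_dir: Path) -> set[str]:
    """Return the file stems (names without extension) present in both folders."""
    videos = {p.stem for p in video_dir.iterdir() if p.is_file()}
    labels = {p.stem for p in label_dir.iterdir() if p.is_file()}
    return videos & labels


def unmatched_files(directory: Path, keep: set[str]) -> list[Path]:
    """Return the files whose stem is not in `keep`, sorted by path."""
    return sorted(p for p in directory.iterdir() if p.is_file() and p.stem not in keep)


# Demo on temporary folders (hypothetical file names, ".json" labels assumed)
with tempfile.TemporaryDirectory() as tmp:
    fronts, labels = Path(tmp, "fronts"), Path(tmp, "labels")
    fronts.mkdir()
    labels.mkdir()
    for name in ("clip_a.mov", "clip_b.mov"):
        (fronts / name).touch()
    for name in ("clip_a.json", "clip_c.json"):
        (labels / name).touch()

    common = matched_stems(fronts, labels)
    orphans = [p.name for p in unmatched_files(fronts, common) + unmatched_files(labels, common)]

print(common)    # {'clip_a'}
print(orphans)   # ['clip_b.mov', 'clip_c.json']
```

Using `Path.stem` avoids the `[:-4]` and `[:-5]` slices, so the check keeps working if an extension changes length.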

In [7]:
# Create the dataloaders of our project
path_data = "data/matching/fronts" # "data/lombardgrid_front/lombardgrid/front"
path_labels = "data/matching/labels" # "data/lombardgrid_alignment/lombardgrid/alignment"

dataset = GNLDataLoader(path_labels, path_data, transform=None, debug=True)

# Test
print(
    f"[DEBUG] Items in the data folder: {len(sorted(os.listdir(path_data)))}",
    f"[DEBUG] Items in the labels folder: {len(sorted(os.listdir(path_labels)))}",
    sep="\n"
)
#print(dataset[0][1][0])
# print(dataset[0][1][0].shape)

dset_train = dataset[0:128]

dataloader_train = DataLoader(dset_train, batch_size=32, shuffle=True, num_workers=128)
dataloader_test = DataLoader(dataset[128:192], batch_size=32, shuffle=True, num_workers=128)

[DEBUG] The data dir has been recognized
[DEBUG] The label dir has been recognized
[DEBUG] Items in the data folder: 5129
[DEBUG] Items in the labels folder: 5129
[DEBUG] Index of the dataloader: slice(0, 128, None)
[DEBUG] Data folder: ['s10_l_bbat9p.mov', 's10_l_bbay5n.mov', 's10_l_bbbg8s.mov', 's10_l_bbbg9p.mov', 's10_l_bgin7n.mov', 's10_l_braczp.mov', 's10_l_brin2n.mov', 's10_l_bwbc1s.mov', 's10_l_bwbi7s.mov', 's10_l_bwbp9a.mov', 's10_l_bwiu3s.mov', 's10_l_bwwoza.mov', 's10_l_lbbt4a.mov', 's10_l_lgif6n.mov', 's10_l_lgwp6p.mov', 's10_l_lgwz8a.mov', 's10_l_lral6p.mov', 's10_l_lwix7n.mov', 's10_l_pbbj5n.mov', 's10_l_pbbq3s.mov', 's10_l_pbwf9p.mov', 's10_l_pbwm2p.mov', 's10_l_pbwq2a.mov', 's10_l_pbwz2n.mov', 's10_l_pgae7s.mov', 's10_l_pgba6s.mov', 's10_l_pgil3n.mov', 's10_l_pgir2a.mov', 's10_l_pgwe9p.mov', 's10_l_prax4n.mov', 's10_l_prayzp.mov', 's10_l_prws1p.mov', 's10_l_pwwh1p.mov', 's10_l_pwws8n.mov', 's10_l_sbbh6a.mov', 's10_l_sbib5a.mov', 's10_l_sgad4s.mov', 's10_l_sgbj5p.mov', 's

KeyboardInterrupt: 
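
The cell above builds the train and test sets by slicing the dataset directly. An alternative that may be worth trying is `torch.utils.data.Subset`, which wraps index ranges without asking the dataset to handle a `slice`. The sketch below uses a hypothetical `ToyClips` dataset as a stand-in for `GNLDataLoader`; the tensor shape and `num_workers=0` are assumptions chosen so the example runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset


class ToyClips(Dataset):
    """Hypothetical stand-in for GNLDataLoader: returns (clip, label) pairs."""

    def __init__(self, n_items: int = 192):
        self.n_items = n_items

    def __len__(self) -> int:
        return self.n_items

    def __getitem__(self, idx: int):
        clip = torch.zeros(1, 4, 8, 8)  # (channels, frames, height, width)
        label = torch.tensor(idx)
        return clip, label


dataset = ToyClips()

# Same index ranges as in the cell above: 0..127 for training, 128..191 for testing
train_set = Subset(dataset, range(0, 128))
test_set = Subset(dataset, range(128, 192))

# num_workers=0 loads data in the main process; larger values are usually
# kept at or below the number of CPU cores
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=0)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, num_workers=0)

clips, labels = next(iter(train_loader))
print(clips.shape, len(train_loader), len(test_loader))
```

With 128 training items and a batch size of 32, the training loader yields 4 batches per epoch, which agrees with the `Batch 1/4` lines in the training log.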

---

## 4) Model Settings

The following settings are applied:
> `device`: where the model is trained. CUDA is used if an Nvidia GPU is detected, otherwise the CPU;<br>
> `epochs`: the number of training epochs;<br>
> `batch_size`: the number of samples in each batch;<br>
> `learning_rate`: the step size of the optimizer, set to $10^{-4}$;<br>
> `loss_fn`: the loss function of the model, here `nn.CTCLoss`;<br>
> `optimizer`: the optimizer of the model. For now it is `AdamW`, which typically converges faster than plain `SGD`.

The model has the following layers:

$$
\underbrace{x}_{\text{input}} \goto \underbrace{st_0(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_0(1, \; 2, \; 2)}_{\text{Max Pool}} \goto \underbrace{st_1(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_1(1, \; 2, \; 2)}_{\text{Max Pool}} \goto
$$
$$
\goto \underbrace{st_2(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_2(1, \; 2, \; 2)}_{\text{Max Pool}} \goto \underbrace{y}_{\text{output}}
$$

In [5]:
device = "cuda" if torch.cuda.is_available() else "cpu"

model = LabialCNN(debug=False).to(device)

# Print the summary of the model
torchinfo.summary(model, (1,75, 100, 150), col_names = ("input_size", "output_size", "num_params", "kernel_size", "mult_adds"), verbose = 1)

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #                   Kernel Shape              Mult-Adds
LabialCNN                                [1, 75, 100, 150]         [75, 37]                  --                        --                        --
├─Sequential: 1-1                        [1, 75, 100, 150]         [32, 75, 6, 9]            --                        --                        --
│    └─Conv3d: 2-1                       [1, 75, 100, 150]         [8, 75, 50, 75]           608                       [3, 5, 5]                 18,240,000
│    └─ReLU: 2-2                         [8, 75, 50, 75]           [8, 75, 50, 75]           --                        --                        --
│    └─MaxPool3d: 2-3                    [8, 75, 50, 75]           [8, 75, 25, 37]           --                        [1, 2, 2]                 --
│    └─Conv3d: 2-4                       [8, 75, 25, 37]           [16, 75, 25, 37]          9,61
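
The shapes and the first parameter count in the summary can be checked by hand with the usual output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$. The sketch below is plain Python and does not read `cnn.py`: the stride `(1, 2, 2)` and padding `(1, 2, 2)` of the first convolution, and a pooling stride equal to the pooling kernel, are assumptions chosen because they reproduce the shapes printed above.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def out_shape(shape, kernel, stride, padding):
    """Apply out_size to each of the (frames, height, width) axes."""
    return tuple(
        out_size(n, k, s, p) for n, k, s, p in zip(shape, kernel, stride, padding)
    )


def conv3d_params(c_in: int, c_out: int, kernel) -> int:
    """Number of weights plus one bias per output channel."""
    kt, kh, kw = kernel
    return c_out * c_in * kt * kh * kw + c_out


# Conv3d 2-1: kernel (3, 5, 5); stride and padding are assumed values
conv1 = out_shape((75, 100, 150), kernel=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2))
# MaxPool3d 2-3: kernel (1, 2, 2); stride assumed equal to the kernel
pool1 = out_shape(conv1, kernel=(1, 2, 2), stride=(1, 2, 2), padding=(0, 0, 0))

print(conv1)                            # (75, 50, 75)
print(pool1)                            # (75, 25, 37)
print(conv3d_params(1, 8, (3, 5, 5)))   # 608
```

Note that pooling leaves the 75 frames untouched and only halves height and width, so the time axis survives down to the `[75, 37]` output.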


Here are the hyperparameters:

In [6]:
epochs = 2
batch_size = 32
learning_rate = 10 ** (-4)
dropout = 0.5

loss_fn = nn.CTCLoss(reduction="mean")
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
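
`nn.CTCLoss` expects log-probabilities of shape `(T, N, C)` together with the input and target lengths. The sketch below shows one way to call it on random data; it is not the project's `train_loop`. The shape `T = 75`, `C = 37` follows the `[75, 37]` output in the model summary, while the batch size, the target length and the use of index 0 as the blank symbol are assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

T, N, C = 75, 4, 37  # frames, batch size, classes (index 0 assumed to be the blank)
S = 10               # hypothetical target length per sample

logits = torch.randn(T, N, C)          # raw model scores, shape (T, N, C)
log_probs = logits.log_softmax(dim=2)  # CTCLoss expects log-probabilities

targets = torch.randint(1, C, (N, S), dtype=torch.long)  # labels never use the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss_fn = nn.CTCLoss(blank=0, reduction="mean")
loss = loss_fn(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```

With proper log-probabilities the CTC loss is a negative log-likelihood and therefore cannot be negative. Negative values in a training log may therefore indicate that raw scores, rather than the output of `log_softmax`, reach the loss function.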

---

## 5) Training + Testing

In [7]:
for epoch_ind in range(epochs):
    train_loop(device, dataloader_train, model, loss_fn, optimizer, epochs, epoch_ind, debug=True)
    test_loop(device, dataloader_test, model, loss_fn, debug=True)

print("===          The training has finished          ===")

→ Loss: -1.6055443286895752 [Batch 1/4, Epoch 1/2]
→ Loss: -1.599102258682251 [Batch 2/4, Epoch 1/2]
→ Loss: -1.5805745124816895 [Batch 3/4, Epoch 1/2]
→ Loss: -1.5423518419265747 [Batch 4/4, Epoch 1/2]
===     The epoch 1/2 has finished training     ===
===        The testing loop has finished        ===
→ Loss: -1.5854508876800537 [Batch 1/4, Epoch 2/2]
→ Loss: -1.576768159866333 [Batch 2/4, Epoch 2/2]
→ Loss: -1.534742832183838 [Batch 3/4, Epoch 2/2]
→ Loss: -1.568716049194336 [Batch 4/4, Epoch 2/2]
===     The epoch 2/2 has finished training     ===
===        The testing loop has finished        ===
===          The training has finished          ===
