$$
\Huge G \cup n \; \sqrt{-1} \; \ell \; e \; \emptyset
$$

<p style="text-align: center">A lipsync project, made by Nil Atabey, Leonardo Biason and Günak Yuzak</p>


---

<h2 align="center"><b>Table of Contents</b></h2>

1. [Code structure](#1-code-structure)
2. [Import of the Packages](#2-import-of-the-packages)
3. [Data Loading](#3-data-loading)
4. [Model Settings](#4-model-settings)
5. [Training + Testing](#5-training--testing)

$$
\newcommand{\goto}{\; \longrightarrow \;}
\newcommand{\tdconv}{\text{2D Convolution} }
\newcommand{\relu}{\text{ReLU} }
$$

---

## 1) Code Structure

The code structure is the following:

```text
project
 ├ assets
 │  ├ cnn.py
 │  └ dataloader.py
 └ data
```

---

## 2) Import of the Packages

The following standard packages are required; they can be installed with either `conda` or `pip`:

In [1]:
# Pytorch imports
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch import nn
import torchmetrics
import torchinfo

# Utils imports
import numpy as np
import os
import matplotlib.pyplot as plt

Custom imports from our libraries:

In [2]:
from assets.gnldataloader import GNLDataLoader
from assets.cnn import LabialCNN
from assets.loops import train_loop, test_loop

---

## 3) Data Loading

To clean the data and make sure that the number of videos matches the number of labels, the following code was run:

```python
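import os

# NOTE: `path_labels` must point to the folder the label files are moved from;
# it is not defined in this snippet.

# List the video files (a) and the label files (b)
a, b = sorted(os.listdir("data/matching/fronts")), sorted(os.listdir("data/matching/labels"))
# Strip the extensions (4 characters for videos, 5 for labels) and keep the names found in both folders
a_new, b_new = {item[:-4] for item in a}, {item[:-5] for item in b}
tot = a_new.intersection(b_new)
print(tot)

# Move every label that has a matching video into the labels folder
for item in b:
    if item[:-5] in tot:
        prev_path = os.path.join(path_labels, item)
        os.rename(prev_path, os.path.join("data/matching/labels", item))

print(len(tot))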
```
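The matching step can also be sketched without hard-coded extension lengths. The snippet below is a minimal illustration, not the project's code: `matching_stems` is a hypothetical helper, and the folder and file names in the demo are made up. It compares file stems with `pathlib`, so it keeps working if an extension has a different length.

```python
import tempfile
from pathlib import Path


def matching_stems(videos_dir, labels_dir):
    """Return the file stems present in both folders, ignoring extensions."""
    video_stems = {p.stem for p in Path(videos_dir).iterdir() if p.is_file()}
    label_stems = {p.stem for p in Path(labels_dir).iterdir() if p.is_file()}
    return video_stems & label_stems


# Demo on throwaway folders with made-up file names
with tempfile.TemporaryDirectory() as root:
    fronts, labels = Path(root, "fronts"), Path(root, "labels")
    fronts.mkdir()
    labels.mkdir()
    for name in ("clip_a.mov", "clip_b.mov"):
        (fronts / name).touch()
    for name in ("clip_b.json", "clip_c.json"):
        (labels / name).touch()
    common = matching_stems(fronts, labels)

print(sorted(common))  # ['clip_b']
```

Only `clip_b` has both a video and a label, so it is the only stem kept.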

In [3]:
# Create the dataloaders of our project
path_data = "data/matching/fronts" # "data/lombardgrid_front/lombardgrid/front"
path_labels = "data/matching/labels" # "data/lombardgrid_alignment/lombardgrid/alignment"

dataset = GNLDataLoader(path_labels, path_data, transform=None, debug=False)

# Test
print(
    f"[DEBUG] Items in the data folder: {len(sorted(os.listdir(path_data)))}",
    f"[DEBUG] Items in the labels folder: {len(sorted(os.listdir(path_labels)))}",
    sep="\n"
)
#print(dataset[0][1][0])
# print(dataset[0][1][0].shape)

dset_train = dataset[0:16]

dataloader_train = DataLoader(dset_train, batch_size=8, shuffle=True)
dataloader_test = DataLoader(dataset[16:24], batch_size=8, shuffle=True)

[DEBUG] Items in the data folder: 5129
[DEBUG] Items in the labels folder: 5129


The labels in the dataset are not tensorized, which may be the source of the problem.

Standard dataset:
- The index selects one piece of data (a sample)

Our dataset:
- The index selects the feature $\Longrightarrow$ it must be converted to per-sample indexing
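One way to picture the conversion is a thin wrapper that turns feature-major storage into per-sample indexing. This is a sketch under assumptions: `SampleIndexedView` is a hypothetical class, the data are placeholders, and it assumes the features are stored as equally long columns (for example all videos, then all labels). It only implements `__len__` and `__getitem__`, which is the protocol a map-style dataset is expected to provide.

```python
class SampleIndexedView:
    """Wrap feature-major storage so that an index selects one sample."""

    def __init__(self, features):
        # `features` is a sequence of columns, e.g. (videos, labels);
        # every column must contain one entry per sample
        lengths = {len(column) for column in features}
        if len(lengths) != 1:
            raise ValueError("all feature columns must have the same length")
        self._features = features
        self._length = lengths.pop()

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Gather the index-th entry of every feature into one sample tuple
        return tuple(column[index] for column in self._features)


videos = ["video_0", "video_1", "video_2"]  # placeholder data
labels = ["label_0", "label_1", "label_2"]
view = SampleIndexedView((videos, labels))
print(view[1])  # ('video_1', 'label_1')
```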

In [9]:
"""test_data = dataset[1:3]
dloader = DataLoader(test_data, batch_size = 1, shuffle = True)"""

for index, (x, y) in enumerate(dset_train):
    print(y.shape)

torch.Size([26])
torch.Size([23])
torch.Size([25])
torch.Size([26])
torch.Size([25])
torch.Size([25])
torch.Size([21])
torch.Size([24])
torch.Size([26])
torch.Size([26])
torch.Size([26])
torch.Size([28])
torch.Size([25])
torch.Size([23])
torch.Size([28])
torch.Size([29])
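The label lengths printed above vary between 21 and 29, so the labels cannot be stacked into a single batch as they are. A common workaround is to pad them to a shared length and keep the original lengths. The sketch below uses plain Python lists for illustration; `pad_labels` is a hypothetical helper and the label values are made up. With tensors, `torch.nn.utils.rnn.pad_sequence` does the padding, and such a function would typically be passed to the `DataLoader` as `collate_fn`. Whether `GNLDataLoader` or the training loops already handle this is not shown here.

```python
def pad_labels(labels, pad_value=0):
    """Pad variable-length label sequences to a common length.

    Returns the padded sequences together with the original lengths, which a
    CTC-style loss needs in order to ignore the padding.
    """
    lengths = [len(label) for label in labels]
    max_len = max(lengths, default=0)
    padded = [list(label) + [pad_value] * (max_len - len(label)) for label in labels]
    return padded, lengths


batch = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]  # made-up label indices
padded, lengths = pad_labels(batch)
print(padded)   # [[3, 1, 4, 0], [1, 5, 0, 0], [9, 2, 6, 5]]
print(lengths)  # [3, 2, 4]
```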


## 4) Model Settings

The following settings are applied:
> `device`: specifies where the model is trained. If an Nvidia GPU is detected, CUDA is used; otherwise the CPU is used;<br>
> `epochs`: the number of training epochs;<br>
> `batch_size`: the number of samples in each batch;<br>
> `learning_rate`: the step size used by the optimizer;<br>
> `loss_fn`: the loss function of the model;<br>
> `optimizer`: the optimizer of the model. For now it is `AdamW`, which usually converges faster than plain `SGD`.

The model has the following layers:

$$
\underbrace{x}_{\text{input}} \goto \underbrace{st_0(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_0(1, \; 2, \; 2)}_{\text{Normalization Pool}} \goto \underbrace{st_1(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_1(1, \; 2, \; 2)}_{\text{Normalization Pool}} \goto
$$
$$
\goto \underbrace{st_2(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_2(1, \; 2, \; 2)}_{\text{Normalization Pool}} \goto \underbrace{y}_{\text{Output}}
$$
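To get a feeling for how the stack above changes the size of a clip, the output shape can be computed layer by layer. The sketch below is only arithmetic and makes several assumptions that the diagram does not state: the input is read as 75 frames of 100 x 150 pixels, the convolutions use stride 1 with padding (1, 2, 2), and the pooling layers use a stride equal to their kernel. `out_size` and `layer_shape` are hypothetical helpers.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def layer_shape(shape, kernel, stride, padding):
    """Apply out_size to every axis of a (frames, height, width) shape."""
    return tuple(
        out_size(s, k, st, p) for s, k, st, p in zip(shape, kernel, stride, padding)
    )


shape = (75, 100, 150)  # (frames, height, width); assumed input size
for i in range(3):
    # st_i: kernel (3, 5, 5); stride 1 and padding (1, 2, 2) are assumptions
    shape = layer_shape(shape, (3, 5, 5), (1, 1, 1), (1, 2, 2))
    # p_i: kernel (1, 2, 2); a stride equal to the kernel is an assumption
    shape = layer_shape(shape, (1, 2, 2), (1, 2, 2), (0, 0, 0))
    print(f"after st_{i} and p_{i}: {shape}")
# after st_0 and p_0: (75, 50, 75)
# after st_1 and p_1: (75, 25, 37)
# after st_2 and p_2: (75, 12, 18)
```

Under these assumptions the number of frames is preserved, while each pooling step roughly halves the height and the width.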

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

model = LabialCNN(debug=False).to(device)

# Print the summary of the model
torchinfo.summary(
    model,
    (1, 75, 100, 150),
    col_names=("input_size", "output_size", "num_params", "kernel_size", "mult_adds"),
    verbose=1,
)

Here are the hyperparameters:

In [None]:
epochs = 2
batch_size = 32
learning_rate = 10 ** (-4)
dropout = 0.5

loss_fn = nn.CTCLoss(reduction="mean")
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
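
Since `nn.CTCLoss` aligns per-frame predictions with a shorter label sequence, a label can only be aligned if there are enough time steps: at least one per label, plus one blank between every pair of identical adjacent labels. The sketch below checks this condition in plain Python. `ctc_min_input_length` is a hypothetical helper, the targets are made up, and the value of 75 frames assumes that the time dimension of the model output equals the number of input frames.

```python
def ctc_min_input_length(target):
    """Smallest number of time steps that can emit `target` under CTC."""
    # Identical adjacent labels must be separated by a blank step
    repeats = sum(1 for prev, cur in zip(target, target[1:]) if prev == cur)
    return len(target) + repeats


frames = 75  # assumed number of time steps produced by the model
targets = [[2, 2, 5], [7, 1, 1, 1]]  # made-up label indices
for target in targets:
    needed = ctc_min_input_length(target)
    print(needed, needed <= frames)
# 4 True
# 6 True
```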

## 5) Training + Testing

In [None]:
for epoch_ind in range(epochs):
    train_loop(device, dataloader_train, model, loss_fn, optimizer, epochs, epoch_ind, debug=False)
    test_loop(device, dataloader_test, model, loss_fn, debug=False)

print("=== The training has finished ===")