$$
\Huge G \cup n \; \sqrt{-1} \; \ell \; e \; \emptyset
$$

<p style="text-align: center">A lipsync project, made by Nil Atabey, Leonardo Biason and Günak Yuzak</p>


---

<h2 align="center"><b>Table of Contents</b></h2>

1. [Code Structure](#1-code-structure)
2. [Import of the Packages](#2-import-of-the-packages)
3. [Data Loading](#3-data-loading)
4. [Model Settings](#4-model-settings)

$$
\newcommand{\goto}{\; \longrightarrow \;}
\newcommand{\tdconv}{\text{2D Convolution} }
\newcommand{\relu}{\text{ReLU} }
$$

---

## 1) Code Structure

The code structure is the following:

```text
project
 ├ assets
 │  ├ cnn.py
 │  └ gnldataloader.py
 └ data
```

---

## 2) Import of the Packages

The following standard packages are required; they can be installed with either `conda` or `pip`:

In [1]:
# Pytorch imports
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch import nn
import torchmetrics
import torchinfo

# Utils imports
import numpy as np
import os
import matplotlib.pyplot as plt

Custom imports from our libraries:

In [2]:
from assets.gnldataloader import GNLDataLoader
#from assets.cnn import LabialCNN

---

## 3) Data Loading

To clean the data and make sure that every video has a matching label (and vice versa), the following code was run once:

```python
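import os

# NOTE: path_labels is assumed to be the original alignment folder
# (the commented-out path in the next cell), not "data/matching/labels".
path_videos = "data/matching/fronts"
path_labels = "data/lombardgrid_alignment/lombardgrid/alignment"
videos, labels = sorted(os.listdir(path_videos)), sorted(os.listdir(path_labels))

# Compare file names without their extensions (".mov" and ".json")
video_ids = {os.path.splitext(item)[0] for item in videos}
label_ids = {os.path.splitext(item)[0] for item in labels}
matching = video_ids & label_ids
print(matching)

# Move each label that has a matching video into the shared labels folder
for item in labels:
    if os.path.splitext(item)[0] in matching:
        prev_path = os.path.join(path_labels, item)
        os.rename(prev_path, os.path.join("data/matching/labels", item))

print(len(matching))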
```
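
The same consistency check can also be run without moving any files. The following is a minimal sketch, assuming videos end in `.mov` and labels in `.json` as in the debug output of this notebook; the helper name `matching_stems` and the file `unpaired_example.json` are hypothetical and not part of the project.

```python
import tempfile
from pathlib import Path


def matching_stems(video_dir, label_dir):
    """Return (matched, videos_only, labels_only) sets of file stems.

    Hypothetical helper: it only compares file names and never moves files.
    """
    video_stems = {p.stem for p in Path(video_dir).glob("*.mov")}
    label_stems = {p.stem for p in Path(label_dir).glob("*.json")}
    return (
        video_stems & label_stems,   # present in both folders
        video_stems - label_stems,   # videos without a label
        label_stems - video_stems,   # labels without a video
    )


# Small demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as root:
    video_dir, label_dir = Path(root, "fronts"), Path(root, "labels")
    video_dir.mkdir()
    label_dir.mkdir()
    for name in ("s10_l_bbay5n.mov", "s10_l_bbbg8s.mov"):
        (video_dir / name).touch()
    for name in ("s10_l_bbay5n.json", "unpaired_example.json"):
        (label_dir / name).touch()
    matched, videos_only, labels_only = matching_stems(video_dir, label_dir)

print(sorted(matched), sorted(videos_only), sorted(labels_only))
```

Reporting the unmatched stems on both sides makes it easier to see why the two folder counts differ before any file is renamed.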

In [4]:
# Create the dataloaders of our project
path_data = "data/matching/fronts" # "data/lombardgrid_front/lombardgrid/front"
path_labels = "data/matching/labels" # "data/lombardgrid_alignment/lombardgrid/alignment"

dataLoader = GNLDataLoader(path_labels, path_data, transform=None, debug=True)

# Test
print(
    f"[DEBUG] Items in the data folder: {len(sorted(os.listdir(path_data)))}",
    f"[DEBUG] Items in the labels folder: {len(sorted(os.listdir(path_labels)))}",
    sep="\n"
)

part_of_dataset = dataLoader[1:3]

[DEBUG] The data dir has been recognized
[DEBUG] The label dir has been recognized
[DEBUG] Items in the data folder: 5319
[DEBUG] Items in the labels folder: 5319
[DEBUG] Index of the dataloader: slice(1, 3, None)
[DEBUG] Data folder: ['s10_l_bbay5n.mov', 's10_l_bbbg8s.mov']
[DEBUG] Labels folder: ['s10_l_bbay5n.json', 's10_l_bbbg8s.json']
[DEBUG] Trying to open the video at path data/matching/fronts/s10_l_bbay5n.mov


[DEBUG] Trying to open the video at path data/matching/fronts/s10_l_bbbg8s.mov
tensor([37,  2,  9, 14, 37,  2, 12, 21,  5, 37,  1, 20, 37, 25, 37,  6,  9, 22,
         5, 37, 14, 15, 23])
tensor([37,  2,  9, 14, 37,  2, 12, 21,  5, 37,  2, 25, 37,  7, 37,  5,  9,  7,
         8, 20, 37, 19, 15, 15, 14])
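
The two label tensors printed above look like character-level encodings of the spoken sentences. The following is a minimal sketch of a decoder, assuming that indices 1 to 26 map to the letters `a` to `z` and that 37 marks a word boundary. This mapping is inferred from the printed values only and is not taken from `GNLDataLoader`; the name `decode_label` is hypothetical.

```python
import string

WORD_BOUNDARY = 37  # assumption: inferred from the printed tensors


def decode_label(indices):
    """Turn a sequence of integer indices into a lowercase sentence."""
    chars = []
    for index in indices:
        if index == WORD_BOUNDARY:
            chars.append(" ")
        elif 1 <= index <= 26:
            chars.append(string.ascii_lowercase[index - 1])
        else:
            chars.append("?")  # index outside the assumed alphabet
    return "".join(chars).strip()


first = [37, 2, 9, 14, 37, 2, 12, 21, 5, 37, 1, 20, 37, 25, 37, 6, 9, 22,
         5, 37, 14, 15, 23]
second = [37, 2, 9, 14, 37, 2, 12, 21, 5, 37, 2, 25, 37, 7, 37, 5, 9, 7,
          8, 20, 37, 19, 15, 15, 14]

print(decode_label(first))   # bin blue at y five now
print(decode_label(second))  # bin blue by g eight soon
```

Under this assumed mapping, the decoded sentences are consistent with the initials in the file names `s10_l_bbay5n` and `s10_l_bbbg8s`.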


[Remaining output truncated: a tuple that begins with a list of NumPy arrays]

## 4) Model Settings

The following settings are applied:
> `device`: specifies where the model is trained. If an Nvidia GPU is detected, CUDA is used; otherwise the model falls back to the CPU;<br>
> `epochs`: the number of training epochs (full passes over the dataset);<br>
> `batch_size`: the number of samples in each batch;<br>
> `learning_rate`: the step size used by the optimizer, currently `0.0001`;<br>
> `loss_fn`: the loss function of the model, currently cross-entropy (`nn.CrossEntropyLoss`);<br>
> `optimizer`: the optimizer of the model. For now it is `AdamW`, which often converges faster than plain `SGD`.

The model has the following layers:

$$
\underbrace{x}_{\text{input}} \goto \underbrace{st_0(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_0(1, \; 2, \; 2)}_{\text{Normalization Pool}} \goto \underbrace{st_1(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_1(1, \; 2, \; 2)}_{\text{Normalization Pool}} \goto
$$
$$
\goto \underbrace{st_2(3, \; 5, \; 5)}_{\text{ST CNN}} \goto \underbrace{p_2(1, \; 2, \; 2)}_{\text{Normalization Pool}} \goto \underbrace{y}_{\text{Output}}
$$
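
To see what the kernel sizes in the chain above do to the tensor shape, the following sketch walks a (frames, height, width) shape through the three blocks. It assumes unpadded, stride-1 convolutions and non-overlapping pooling with floor rounding; the input size of 75 × 64 × 128 is only an illustrative value, and the actual `LabialCNN` may use different padding or strides.

```python
def conv_out(shape, kernel):
    """Output shape of an unpadded, stride-1 convolution."""
    return tuple(s - k + 1 for s, k in zip(shape, kernel))


def pool_out(shape, kernel):
    """Output shape of a non-overlapping pooling window (floor rounding)."""
    return tuple(s // k for s, k in zip(shape, kernel))


shape = (75, 64, 128)  # (frames, height, width): illustrative input only
trace = [shape]
for _ in range(3):  # the three st_i / p_i blocks of the chain
    shape = conv_out(shape, (3, 5, 5))
    trace.append(shape)
    shape = pool_out(shape, (1, 2, 2))
    trace.append(shape)

for step in trace:
    print(step)
```

Under these assumptions each convolution removes 2 frames and 4 pixels per spatial dimension, while each pooling step halves only the spatial dimensions and leaves the number of frames unchanged.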

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

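# NOTE: LabialCNN comes from assets.cnn; its import is commented out in In [2] and must be enabled first
model = LabialCNN(debug=True).to(device)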

epochs = 2
batch_size = 16
learning_rate = 0.0001

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
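
A minimal sketch of how these settings could fit together in a training loop. Since `LabialCNN` is not shown in this notebook, a small `nn.Linear` classifier and random tensors stand in for the model and the data; they are placeholders, not the project's actual model or dataset.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

epochs = 2
batch_size = 16
learning_rate = 0.0001

# Placeholder model and data (assumptions): 20 input features, 4 classes
model = nn.Linear(20, 4).to(device)
dataset = TensorDataset(torch.randn(64, 20), torch.randint(0, 4, (64,)))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

epoch_losses = []
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()                    # clear gradients of the previous step
        loss = loss_fn(model(inputs), targets)   # forward pass and loss
        loss.backward()                          # backpropagation
        optimizer.step()                         # weight update
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(loader))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

With 64 placeholder samples and a batch size of 16, each epoch runs four optimizer steps; swapping in the real model and dataset should only require replacing the two placeholder lines.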