In [6]:
# enable auto reload of modules
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Model Training

This notebook loads and preprocesses the data, then trains our own captioning model from scratch. If you only want to use the model, refer to the `main.py` file instead.

The notebook is modular, so you can run only the sections you need. Some sections depend on the output of earlier ones, so run those first if a variable or file is missing.

## Loading captions

In [29]:
from src.utils.data_utils import *

In [15]:
# load the annotation data
df = load_raw_captions_data("./data/captions.csv")

In [16]:
# generate the caption dictionary
captions_dic = generate_captions_dic(df)
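
`generate_captions_dic` lives in `src.utils.data_utils` and is not shown in this notebook. As a rough sketch of the idea only, and assuming the annotations reduce to (image name, caption) pairs, captions can be grouped per image as below. The name `group_captions` and the sample rows are hypothetical, not the project's code.

```python
from collections import defaultdict


def group_captions(rows):
    """Group (image_name, caption) pairs into {image_name: [captions]}.

    Hypothetical stand-in for generate_captions_dic; the real helper
    takes a DataFrame and may behave differently.
    """
    grouped = defaultdict(list)
    for image_name, caption in rows:
        grouped[image_name].append(caption)
    return dict(grouped)


# made-up sample rows, for illustration only
rows = [
    ("img_1.jpg", "A dog runs ."),
    ("img_1.jpg", "A brown dog is running ."),
    ("img_2.jpg", "Two kids play ."),
]
example_dic = group_captions(rows)
print(example_dic["img_1.jpg"])
```

A plain dictionary keyed by image name makes it cheap to look up all captions of one image, which is what the later cells rely on.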

In [17]:
# print info about the caption dictionary
n_images = len(captions_dic)
n_captions_per_image = len(next(iter(captions_dic.values())))  # taken from the first image
n_captions = sum(len(caps) for caps in captions_dic.values())  # exact even if counts differ per image

print(f"Number of images: {n_images}")
print(f"Number of captions per image: {n_captions_per_image}")
print(f"Total number of captions: {n_captions}")

Number of images: 31783
Number of captions per image: 5
Total number of captions: 158915


In [22]:
# clean the captions by removing any special characters and converting to lower case
captions_dic = clean_captions(captions_dic)

100%|██████████| 31783/31783 [00:00<00:00, 62322.14it/s]
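
The cleaning rules themselves are inside `clean_captions`, which is not shown here. A minimal sketch of what "removing special characters and converting to lower case" could look like for a single caption follows. The function `clean_caption` and its exact rules (letters only, whitespace collapsed) are assumptions; the project's helper may, for example, keep digits.

```python
import re


def clean_caption(caption):
    """Lowercase a caption and keep only ASCII letters and single spaces.

    Hypothetical sketch, not the project's clean_captions.
    """
    caption = caption.lower()
    # replace every character that is not a lowercase letter or whitespace
    caption = re.sub(r"[^a-z\s]", " ", caption)
    # collapse runs of whitespace and trim both ends
    return " ".join(caption.split())


print(clean_caption("Two men in green shirts are standing in a yard ."))
# prints: two men in green shirts are standing in a yard
```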


In [23]:
# print the captions for the first image
print("Captions for the first image:")
for cap in next(iter(captions_dic.values())):
    print(cap)

Captions for the first image:
two young guys with shaggy hair look at their hands while hanging out in the yard
two young white males are outside near many bushes
two men in green shirts are standing in a yard
a man in a blue shirt standing in a garden
two friends enjoy time spent together


In [27]:
# build the vocabulary
vocab = build_vocab(captions_dic)

# print the size of the vocabulary
print(f"Vocabulary size: {len(vocab)} words")

100%|██████████| 31783/31783 [00:00<00:00, 168730.30it/s]

Vocabulary size: 18288 words
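
`build_vocab` is also defined outside this notebook. One simple way to obtain a vocabulary from a caption dictionary is to count whitespace-separated words, as sketched below. The names `count_words` and `vocab_sketch` are hypothetical, and the real helper may add special tokens or drop rare words, which this sketch does not do.

```python
from collections import Counter


def count_words(captions_by_image):
    """Count how often each whitespace-separated word occurs."""
    counts = Counter()
    for captions in captions_by_image.values():
        for caption in captions:
            counts.update(caption.split())
    return counts


# made-up, already cleaned captions
sample = {
    "img_1.jpg": ["a dog runs", "a brown dog is running"],
    "img_2.jpg": ["two kids play"],
}
word_counts = count_words(sample)
vocab_sketch = sorted(word_counts)  # unique words, sorted for reproducibility
print(f"Vocabulary size: {len(vocab_sketch)} words")
```

Keeping the counts rather than only the set of words makes it easy to apply a minimum-frequency threshold later, if one is wanted.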





In [32]:
# save the captions dictionary
save_captions_dic(captions_dic, "./data/")
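
The on-disk format used by `save_captions_dic` is not visible from this notebook. If a human-readable file is acceptable, a JSON round trip is one option, sketched below. The helper names and the file name `captions_dic.json` are assumptions for illustration, not what the project writes.

```python
import json
import tempfile
from pathlib import Path


def save_dic_as_json(captions_by_image, out_dir, filename="captions_dic.json"):
    """Write the dictionary to out_dir/filename and return the path.

    Hypothetical helper; the file name is an assumption.
    """
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        json.dump(captions_by_image, f, ensure_ascii=False, indent=2)
    return out_path


def load_dic_from_json(path):
    """Read a dictionary previously written by save_dic_as_json."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


# round trip in a temporary directory so nothing is left behind
sample = {"img_1.jpg": ["a dog runs"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_dic_as_json(sample, tmp_dir)
    restored = load_dic_from_json(saved_path)
print(restored == sample)
```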