### Recipes 5k

* [Dataset](http://www.ub.edu/cvub/recipes5k/)

* [Original Paper](https://www.researchgate.net/publication/318729535_Food_Ingredients_Recognition_through_Multi-label_Learning)

In [None]:
%cd ..

### Setup Environment

In [2]:
import os
import pandas as pd

from src.classifiers import process_labels, split_data
from src.classifiers_base import preprocess_df

from transformers import BertTokenizer

from src.multimodal_data_loader import VQADataset
from torch.utils.data import DataLoader

from src.classifiers_base_cpu_metrics import calculate_memory

In [3]:
PATH = 'datasets/Recipes5k/'

In [4]:
text_path = os.path.join(PATH, 'labels.csv')
images_path = os.path.join(PATH, 'images')

## Get data

In [5]:
df = pd.read_csv(text_path)
df

| Unnamed: 0 | image | class | split | ingredients |
|---|---|---|---|---|
| 0 | onion_rings/0_einkorn_onion_rings_hostedLargeU... | onion_rings | val | yellow onion,flour,baking powder,seasoning sal... |
| 1 | onion_rings/1_crispy_onion_rings_hostedLargeUr... | onion_rings | train | white onion,panko,cornmeal,ground paprika,onio... |
| 2 | onion_rings/2_onion_rings_hostedLargeUrl.jpg | onion_rings | train | yellow onion,all-purpose flour,baking powder,l... |
| 3 | onion_rings/3_onion_rings_hostedLargeUrl.jpg | onion_rings | train | oil,pancake mix,spanish onion |
| 4 | onion_rings/4_onion_rings_hostedLargeUrl.jpg | onion_rings | train | peanut oil,sweet onion,flour,eggs,celery salt,... |
| ... | ... | ... | ... | ... |
| 4821 | chocolate_ice_cream/45_chocolate_ice_cream_hos... | chocolate_ice_cream | train | dark chocolate,whole milk,unsweetened cocoa po... |
| 4822 | chocolate_ice_cream/46_dark_chocolate_ice_crea... | chocolate_ice_cream | train | half & half,whole milk,heavy cream,sugar,sea s... |
| 4823 | chocolate_ice_cream/47_the_darkest_chocolate_i... | chocolate_ice_cream | train | unsweetened cocoa powder,brewed coffee,granula... |
| 4824 | chocolate_ice_cream/48_homemade_chocolate_ice_... | chocolate_ice_cream | train | unsweetened cocoa powder,sugar,firmly packed b... |
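The `ingredients` column stores each recipe as a single comma-separated string. As a minimal sketch of how such a cell could be turned into a list of ingredient names, the snippet below uses a hypothetical helper, `parse_ingredients`, which is not part of the `src` package. The first sample string is row 3 from the preview; the second is an invented example used only to show whitespace and case handling.

```python
from collections import Counter
from typing import List


def parse_ingredients(cell: str) -> List[str]:
    """Split a comma-separated ingredient string into clean, lowercase names."""
    return [item.strip().lower() for item in cell.split(",") if item.strip()]


# Row 3 from the preview, plus one made-up string (an assumption for illustration).
samples = [
    "oil,pancake mix,spanish onion",
    " Oil , sweet onion,, flour ",
]

parsed = [parse_ingredients(s) for s in samples]

# Count how often each ingredient appears across the sample recipes.
ingredient_counts = Counter(name for recipe in parsed for name in recipe)

print(parsed[0])
print(ingredient_counts.most_common(1))
```

The notebook itself passes the raw string to the tokenizer, so this parsing step is only useful for inspecting the ingredient vocabulary.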


## Data Preparation

In [6]:
# Select the text, image, and label columns
text_columns = 'ingredients'
image_columns = 'image'
label_columns = 'class'

df = preprocess_df(df, image_columns, images_path)

# Split the data
train_df, test_df = split_data(df)

# One-hot encode the training labels, then encode the test labels with the same columns
train_labels, mlb, train_columns = process_labels(train_df, col=label_columns)
test_labels = process_labels(test_df, col=label_columns, train_columns=train_columns)

100%|██████████| 4826/4826 [00:00<00:00, 13443.84it/s]
100%|██████████| 4826/4826 [00:00<00:00, 15914.57it/s]


Train Shape: (3409, 4)
Test Shape: (783, 4)
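The label step fits the encoding on the training set and reuses `train_columns` for the test set, so both splits share the same column order. The implementation of `process_labels` is not shown here, so the following is only a plain-Python sketch of that idea; `one_hot_labels` is a hypothetical name, and treating unseen test classes as all-zero rows is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def one_hot_labels(
    labels: Sequence[str], columns: Optional[Sequence[str]] = None
) -> Tuple[List[List[int]], List[str]]:
    """One-hot encode labels; derive the column order from the data unless given."""
    if columns is None:
        columns = sorted(set(labels))  # fit: column order comes from this split
    index = {name: i for i, name in enumerate(columns)}
    rows = []
    for label in labels:
        row = [0] * len(columns)
        if label in index:  # assumption: unseen classes become all-zero rows
            row[index[label]] = 1
        rows.append(row)
    return rows, list(columns)


# Fit on the training labels, then reuse the same columns for the test labels.
train_rows, train_cols = one_hot_labels(
    ["onion_rings", "chocolate_ice_cream", "onion_rings"]
)
test_rows, _ = one_hot_labels(
    ["chocolate_ice_cream", "onion_rings"], columns=train_cols
)

print(train_cols)
print(test_rows)
```

Reusing the training columns keeps the output dimension of the classifier fixed, regardless of which classes happen to appear in the test split.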


In [7]:
# Instantiate tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [11]:
train_dataset = VQADataset(train_df, text_columns, image_columns, label_columns, mlb, train_columns, tokenizer)
test_dataset = VQADataset(test_df, text_columns, image_columns, label_columns, mlb, train_columns, tokenizer)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=1)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=1)
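With `batch_size=64` and the default `drop_last=False`, a `DataLoader` yields full batches followed by one smaller final batch when the dataset size is not a multiple of 64. The sketch below reproduces that bookkeeping in plain Python, assuming each dataset has as many samples as the corresponding DataFrame reported above (3409 and 783 rows); `batch_sizes` is a hypothetical helper.

```python
from typing import List


def batch_sizes(n_samples: int, batch_size: int) -> List[int]:
    """Sizes of the batches produced when the last partial batch is kept."""
    full, rest = divmod(n_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


train_batches = batch_sizes(3409, 64)
test_batches = batch_sizes(783, 64)

# Number of batches per epoch and the size of the final, partial batch.
print(len(train_batches), train_batches[-1])
print(len(test_batches), test_batches[-1])
```

Under these assumptions an epoch has 54 training batches and 13 test batches, with the last batch of each split holding 17 and 15 samples respectively.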

### Models

In [12]:
output_size = len(mlb.classes_)
multilabel = False

In [10]:
calculate_memory(train_loader, test_loader, output_size)

Early fusion:
Average Memory per Batch in Train: 36.27 MB
Total Memory Usage per Epoch Train: 1958.83 MB (excluding model parameters)
Test:
Average Memory per Batch in Test: 8.33 MB
Total Memory Usage per Epoch Test: 108.31 MB (excluding model parameters)
Model: 
Model Memory Usage: 747.99 MB
Late fusion:
Average Memory per Batch in Train: 36.27 MB
Total Memory Usage per Epoch Train: 1958.83 MB (excluding model parameters)
Test:
Average Memory per Batch in Test: 8.33 MB
Total Memory Usage per Epoch Test: 108.31 MB (excluding model parameters)
Model: 
Model Memory Usage: 747.62 MB
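The per-epoch totals reported above are roughly the average batch size in MB multiplied by the number of batches per epoch. The sketch below checks that arithmetic, assuming the split sizes printed earlier (3409 and 783 rows) and a batch size of 64; how `calculate_memory` actually computes its totals is not shown, so `epoch_memory_mb` is a hypothetical approximation.

```python
import math


def epoch_memory_mb(n_samples: int, batch_size: int, avg_batch_mb: float) -> float:
    """Approximate data memory per epoch as batches per epoch times average batch size."""
    n_batches = math.ceil(n_samples / batch_size)
    return n_batches * avg_batch_mb


train_estimate = epoch_memory_mb(3409, 64, 36.27)  # 54 batches
test_estimate = epoch_memory_mb(783, 64, 8.33)     # 13 batches

# Compare the estimates with the reported totals (1958.83 MB and 108.31 MB).
print(round(train_estimate, 2), round(train_estimate - 1958.83, 2))
print(round(test_estimate, 2), round(test_estimate - 108.31, 2))
```

Both estimates land within a fraction of a megabyte of the reported totals; the small gap is consistent with the averages being rounded to two decimals.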
