## Setup Environment

In [1]:
from src.vlm_models import CLIP, BLIP2, LLAVA
from src.classifiers_base import preprocess_df
import pandas as pd
import os

2024-02-03 16:37:49.329156: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-03 16:37:49.366271: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Embeddings Generation

* **Dataframe:** Pandas dataframe with image path and text

* **Image Column:** Column with the path to the images

* **Text Column:** Column with text data

* **Batch Size:** Integer with the size of the batch
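
To make the batch size concrete: the number of batches is the row count divided by the batch size, rounded up. The following is a minimal sketch, assuming the final partial batch is still processed; `num_batches` is a hypothetical helper and not part of `src.vlm_models`.

```python
import math


def num_batches(n_rows: int, batch_size: int) -> int:
    """Return how many batches cover n_rows, counting a final partial batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return math.ceil(n_rows / batch_size)


# 12,468 is the DAQUAR row count shown in the progress bars later in this notebook.
print(num_batches(12468, 16))  # 780
print(num_batches(12468, 24))  # 520
```

A larger batch size means fewer iterations but more memory per step, so the value is usually chosen to fit the available hardware.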

In [2]:
#model_name = 'llava'
#model_name = 'clip'
model_name = 'blip2'

In [3]:
if model_name.lower() == 'clip':
    print('Creating Instance of CLIP model')
    model = CLIP()
elif model_name.lower() == 'blip2':
    print('Creating Instance of BLIP 2 model')
    model = BLIP2()
elif model_name.lower() == 'llava':
    print('Creating Instance of LLAVA model')
    model = LLAVA()
else:
    raise NotImplementedError('The model should be clip, blip2 or llava')

Creating Instance of BLIP 2 model


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## 1. DAQUAR

* **[DAQUAR Dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge#c7057)**:

The DAQUAR (DAtaset for QUestion Answering on Real-world images) dataset was created to advance research in visual question answering (VQA). It consists of indoor scene images, each accompanied by a set of questions about the scene's content. The dataset serves as a benchmark for training and evaluating models that must understand an image and answer questions about it.

We'll use the model's `get_embeddings` method to generate embeddings for the images in `datasets/daquar/images` and their questions, and store them in `Embeddings_vlm/daquar/embeddings_{model_name}.csv`.

In [None]:
batch_size = 16
dataset = 'daquar'
image_col = 'image_id'
text_col = 'question'
output_dir = f'Embeddings_vlm/{dataset}/'
output_file = f'embeddings_{model_name}.csv'

dataset_path = f'datasets/{dataset}/'
images_dir = 'images/'
labels = 'labels.csv'

images_path = os.path.join(dataset_path, images_dir)
labels_path = os.path.join(dataset_path, labels)

df = preprocess_df(df=pd.read_csv(labels_path), image_columns=image_col, images_path=images_path)

model.get_embeddings(dataframe=df, batch_size=batch_size, image_col_name=image_col, text_col_name=text_col, output_dir=output_dir, output_file=output_file)

100%|██████████| 12468/12468 [00:01<00:00, 9739.57it/s] 
100%|██████████| 12468/12468 [00:04<00:00, 2774.97it/s]


Processing batches:   0%|          | 0/520 [00:00<?, ?it/s]

It looks like you are trying to rescale already rescaled images. If the input images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again.

Batch 0
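
The warning above is emitted when pixel values that already lie between 0 and 1 are about to be multiplied by the rescale factor a second time. The following is a small sketch of the effect, assuming a rescale factor of 1/255 (the usual default for Hugging Face image processors). The `rescale` and `looks_rescaled` helpers are illustrative only and are not the code inside `src.vlm_models`.

```python
def rescale(pixels, divisor=255.0):
    """Map pixel intensities from the 0..255 range to the 0..1 range."""
    return [p / divisor for p in pixels]


def looks_rescaled(pixels):
    """Heuristic: True when every value already lies in the 0..1 range."""
    return all(0.0 <= p <= 1.0 for p in pixels)


raw = [0, 128, 255]       # 8-bit pixel intensities
once = rescale(raw)       # now in 0..1, which is what the model expects
twice = rescale(once)     # rescaled again: values collapse toward zero

print(looks_rescaled(raw))    # False
print(looks_rescaled(once))   # True
print(max(twice) < 0.004)     # True: the image is nearly black
```

If the images really are loaded as floats in the 0 to 1 range, passing `do_rescale=False` to the image processor, as the warning suggests, avoids the second rescaling.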


## 2. COCO-QA

* **[COCO-QA Dataset](https://www.cs.toronto.edu/~mren/research/imageqa/data/cocoqa/)**:

The COCO-QA (COCO Question-Answering) dataset is designed for visual question answering. It is built on images from COCO (Common Objects in Context), a large-scale dataset of images with object annotations, and pairs each image with a set of questions and corresponding answers.

We'll use the model's `get_embeddings` method to generate embeddings for the images in `datasets/coco-qa/images` and their questions, and store them in the `Embeddings_vlm/coco-qa` directory.

In [None]:
batch_size = 24
dataset = 'coco-qa'
image_col = 'image_id'
text_col = 'questions'
output_dir = f'Embeddings_vlm/{dataset}'
output_file = f'embeddings_{model_name}.csv'

dataset_path = f'datasets/{dataset}/'
images_dir = 'images/'
labels = 'labels.csv'

images_path = os.path.join(dataset_path, images_dir)
labels_path = os.path.join(dataset_path, labels)

df = preprocess_df(df=pd.read_csv(labels_path), image_columns=image_col, images_path=images_path)

model.get_embeddings(dataframe=df, batch_size=batch_size, image_col_name=image_col, text_col_name=text_col, output_dir=output_dir, output_file=output_file)


## 3. Fakeddit

* **[Fakeddit Dataset](https://fakeddit.netlify.app/)**:

Fakeddit is a large-scale multimodal dataset for fine-grained fake news detection. It consists of over 1 million samples from multiple categories of fake news, including satire, misinformation, and fabricated news. The dataset includes text, images, metadata, and comment data, making it a rich resource for developing and evaluating fake news detection models.

We'll use the model's `get_embeddings` method to generate embeddings for the images in `datasets/fakeddit/images` and their titles, and store them in the `Embeddings_vlm/fakeddit` directory.

In [None]:
batch_size = 24
dataset = 'fakeddit'
image_col = 'id'
text_col = 'title'
output_dir = f'Embeddings_vlm/{dataset}'
output_file = f'embeddings_{model_name}.csv'

dataset_path = f'datasets/{dataset}/'
images_dir = 'images/'
labels = 'labels_subset.csv'

images_path = os.path.join(dataset_path, images_dir)
labels_path = os.path.join(dataset_path, labels)

df = preprocess_df(df=pd.read_csv(labels_path), image_columns=image_col, images_path=images_path)

model.get_embeddings(dataframe=df, batch_size=batch_size, image_col_name=image_col, text_col_name=text_col, output_dir=output_dir, output_file=output_file)

## 4. Recipes5k

* **[Recipes5k Dataset](http://www.ub.edu/cvub/recipes5k/)**:

The Recipes5k dataset comprises 4,826 recipes, each with an image and the corresponding ingredient list. It contains 3,213 unique ingredients, which are simplified to 1,014 by removing overly descriptive particles. The recipes were extracted from Yummly, with about 50 recipes for each of the 101 food types from Food101, so the dataset offers a diverse collection of alternative preparations per food type and captures both intra- and inter-class variability. It is balanced across training, validation, and test splits.


We'll use the model's `get_embeddings` method to generate embeddings for the images in `datasets/Recipes5k/images` and their ingredient lists, and store them in the `Embeddings_vlm/Recipes5k` directory.

In [None]:
batch_size = 24
dataset = 'Recipes5k'
image_col = 'image'
text_col = 'ingredients'
output_dir = f'Embeddings_vlm/{dataset}'
output_file = f'embeddings_{model_name}.csv'

dataset_path = f'datasets/{dataset}/'
images_dir = 'images/'
labels = 'labels.csv'

images_path = os.path.join(dataset_path, images_dir)
labels_path = os.path.join(dataset_path, labels)

df = preprocess_df(df=pd.read_csv(labels_path), image_columns=image_col, images_path=images_path)

model.get_embeddings(dataframe=df, batch_size=batch_size, image_col_name=image_col, text_col_name=text_col, output_dir=output_dir, output_file=output_file)
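
The four dataset cells differ only in the dataset name, the column names, and the labels file. As a sketch, the shared path logic could be collected in one place. `build_run_config` is a hypothetical helper (not part of `src`), and it assumes the output file is named after the model, as in the DAQUAR cell.

```python
import os


def build_run_config(dataset, model_name, labels="labels.csv",
                     datasets_root="datasets", output_root="Embeddings_vlm"):
    """Build the input and output paths used by one embeddings run."""
    dataset_path = os.path.join(datasets_root, dataset)
    return {
        # Trailing slash kept to match the 'images/' value used in the cells above.
        "images_path": os.path.join(dataset_path, "images/"),
        "labels_path": os.path.join(dataset_path, labels),
        "output_dir": os.path.join(output_root, dataset),
        # Naming the file after the model keeps runs from overwriting each other.
        "output_file": f"embeddings_{model_name}.csv",
    }


cfg = build_run_config("fakeddit", "blip2", labels="labels_subset.csv")
print(cfg["labels_path"])
print(cfg["output_file"])  # embeddings_blip2.csv
```

The returned values can then be passed to `preprocess_df` and `model.get_embeddings` exactly as in the cells above.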