# Image Captioning using Attention Mechanism

In this project, I will develop an Image Captioning model that leverages an Attention mechanism to generate descriptive captions for images. The model's architecture will consist of two main components:

- Encoder: A Convolutional Neural Network (CNN) extracts high-level features from each input image, producing a set of feature maps that capture its essential visual information.

- Decoder: A Recurrent Neural Network (RNN) generates the caption from the Encoder's feature maps. An Attention mechanism lets the Decoder focus on different parts of the image as it generates each word, with the aim of improving the quality and relevance of the generated text.

By integrating these components, the model aims to provide accurate and contextually rich descriptions for a variety of images.
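
To make the attention step concrete, here is a minimal, framework-free sketch of how a decoder state can weight image regions. It assumes dot-product scoring, which is only one possible choice and not necessarily the scoring function used later in this notebook. The names `attend`, `regions`, and `state` are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Toy dot-product attention over image regions (an assumed scoring rule)."""
    # One relevance score per region: dot product with the decoder state
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    # Context vector: weighted sum of the region features
    dim = len(decoder_state)
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return context, weights


# Three hypothetical image regions, each with a 2-dimensional feature vector
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]  # hypothetical decoder hidden state
context, weights = attend(regions, state)
print(weights)  # regions 0 and 2 get equal weight, larger than region 1
```

The weights always sum to 1, so the context vector is a weighted average of the regions. The decoder would recompute it at every word, which is how the focus can move across the image.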

First, import all the necessary dependencies.

In [1]:
import os
import spacy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import zipfile

import torch
import torchtext; torchtext.disable_torchtext_deprecation_warning()
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as T
import torchvision.models as models

from PIL import Image
from collections import Counter

from torchtext.vocab import vocab
from torchtext.data.utils import get_tokenizer

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import Dataset, DataLoader

Next, extract the dataset from its ZIP archive.

In [3]:
def extract_zip(zip_path, extract_to_folder):
    """
    Extracts a zip file to a specific folder.

    Parameters:
    zip_path (str): Path to zip file
    extract_to_folder (str): Directory where the ZIP file should be extracted
    """
    # Create the destination directory if it does not exist
    os.makedirs(extract_to_folder, exist_ok=True)

    try:
        # Opening the zip file
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            # Extracting all the contents into the specified folder
            zip_ref.extractall(extract_to_folder)
        print(f"ZIP file extracted successfully to {extract_to_folder}.")

    except FileNotFoundError:
        print(f"Error: The file {zip_path} does not exist.")

    except zipfile.BadZipFile:
        print(f"Error: The file {zip_path} is not a ZIP file or it is corrupted.")

    except Exception as e:
        print(f"An unexpected error occurred: {e}")

In [4]:
# Path for the Zip file and Path for the Directory
zip_path = "Data/datasets/flickr8k.zip"
extract_to_folder = "Data"

In [5]:
# Extract the ZIP file
extract_zip(zip_path, extract_to_folder)

ZIP file extracted successfully to Data.
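
As an optional sanity check after extraction, the sketch below counts the files under the target folder and returns a few sample paths. The helper name `summarize_extraction` and its `limit` parameter are hypothetical additions, not part of the original notebook. The archive's internal layout is not assumed, so the helper simply walks whatever is there.

```python
import os


def summarize_extraction(folder, limit=5):
    """Count the files under `folder` and collect up to `limit` sample paths."""
    count = 0
    sample = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            count += 1
            if len(sample) < limit:
                sample.append(os.path.join(root, name))
    return count, sample


# "Data" is the extraction folder used above; a missing folder yields a count of 0
file_count, sample_paths = summarize_extraction("Data")
print(f"{file_count} files found under Data")
for path in sample_paths:
    print(path)
```

A count of zero here would suggest the archive was empty or was extracted somewhere else.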
