MJAHMADEE/Image_Captioning

Image Captioning with Neural Networks 🖼️🤖

Image Captioning with Neural Networks is a deep learning project that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to generate captions for images automatically. The implementation uses a pre-trained ResNet-18 model to extract image features and an LSTM network to turn those features into a textual description.

Features 🌟

  • Utilizes a pre-trained ResNet-18 model for efficient image feature extraction.
  • Employs an LSTM network for generating descriptive captions based on image features.
  • Supports training with and without fine-tuning of the ResNet model.
  • Includes functionality for both training and testing the model with a custom dataset.
  • Visualizes training loss and sample predictions to assess model performance.

Setup and Installation 🛠️

  1. Clone the repository from GitHub (https://github.com/MJAHMADEE/Image_Captioning).
  2. Navigate to the project directory.
  3. Install the dependencies listed in requirements.txt, for example with pip install -r requirements.txt.

Dataset 📁

The model is trained and tested on the Flickr8k dataset, which contains about 8,000 images, each paired with five different captions. Before training, the dataset is pre-processed into the format the model expects.
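The README does not spell out the pre-processing steps. One common step for caption data is building a vocabulary and converting each caption into a fixed-length list of token ids. The sketch below shows that idea under stated assumptions: the special tokens, the regex tokenizer, and the function names are hypothetical and may differ from what the project does.

```python
import re
from collections import Counter

# Hypothetical special tokens; their order fixes ids 0..3.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def tokenize(caption):
    """Lowercase the caption and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", caption.lower())


def build_vocab(captions, min_count=1):
    """Map every word seen at least min_count times to an integer id."""
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {word: idx for idx, word in enumerate(SPECIALS + words)}


def encode(caption, vocab, max_len=20):
    """Convert a caption to ids: <start>, words, <end>, then padding."""
    unk = vocab["<unk>"]
    ids = [vocab["<start>"]]
    # Truncate so that <start> and <end> always fit within max_len.
    ids += [vocab.get(tok, unk) for tok in tokenize(caption)][: max_len - 2]
    ids.append(vocab["<end>"])
    ids += [vocab["<pad>"]] * (max_len - len(ids))
    return ids
```

For example, a vocabulary built from "A dog runs." and "A dog jumps over a log." assigns ids 4 to 9 to the six distinct words in alphabetical order, and words that never appeared during vocabulary building map to `<unk>`.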

Training the Model 🚀

To train the model, run the training script. It starts the training process and saves the model weights periodically.
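The training script itself is not reproduced here. As a hedged sketch of what such a loop might look like, the function below trains with cross-entropy loss and saves a checkpoint every few epochs. It assumes a model whose forward pass takes `(images, captions)` and returns logits of shape `(batch, length, vocab)` aligned with the caption tokens; the function name `train`, the checkpoint file pattern, and the hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


def train(model, loader, epochs, pad_id=0, lr=1e-3, save_every=1,
          ckpt_path="caption_model_epoch{epoch}.pt"):
    """Train the model and return the mean loss per epoch."""
    # Padding positions do not contribute to the loss.
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    # Only optimize parameters that are not frozen.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    history = []
    for epoch in range(1, epochs + 1):
        model.train()
        total, batches = 0.0, 0
        for images, captions in loader:
            logits = model(images, captions)             # (B, T, V)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        history.append(total / max(batches, 1))
        # Periodically save the weights (hypothetical file naming).
        if epoch % save_every == 0:
            torch.save(model.state_dict(), ckpt_path.format(epoch=epoch))
    return history
```

The returned list of per-epoch losses can be plotted to visualize training progress, and a saved checkpoint can be restored later with `model.load_state_dict(torch.load(path))`.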

Testing the Model 🧪

After training, evaluate the model by running the testing script, which generates a caption for each image in the test dataset.
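The README does not say how captions are decoded at test time. A simple and widely used option is greedy decoding, where the most likely word is chosen at every step and fed back into the LSTM. The sketch below assumes the image feature has already been projected to the embedding size; the function name and argument layout are hypothetical.

```python
import torch


@torch.no_grad()
def greedy_caption(image_feature, embed, lstm, out, end_id, max_len=20):
    """Generate token ids one step at a time, always taking the argmax.

    image_feature: (1, E) tensor, already projected to the embedding size.
    embed, lstm, out: nn.Embedding, nn.LSTM(batch_first=True), nn.Linear.
    """
    inputs = image_feature.unsqueeze(1)     # (1, 1, E): image is the first input
    state = None                            # LSTM starts from a zero state
    tokens = []
    for _ in range(max_len):
        hidden, state = lstm(inputs, state)             # hidden: (1, 1, H)
        next_id = out(hidden[:, -1]).argmax(dim=-1)     # shape (1,)
        if next_id.item() == end_id:
            break                                       # stop at the end token
        tokens.append(next_id.item())
        inputs = embed(next_id).unsqueeze(1)            # feed the prediction back
    return tokens
```

The returned ids can be mapped back to words with the inverse of the vocabulary used during training. Beam search usually gives better captions than greedy decoding, at the cost of more computation.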

Results and Evaluation 📊

The model's performance can be evaluated based on the captions generated for the test images. A qualitative assessment involves comparing the predicted captions against the ground truth captions.
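To complement a side-by-side reading of predicted and ground-truth captions, a crude word-overlap score can help rank examples. The function below is only a sketch and is not part of the repository: it computes clipped unigram precision against the reference captions, which is a much simpler measure than standard metrics such as BLEU.

```python
from collections import Counter


def unigram_precision(predicted, references):
    """Fraction of predicted words that also occur in a reference caption.

    Each word is credited at most as often as it appears in the single
    reference that contains it most often (clipped counts).
    """
    pred_words = predicted.lower().split()
    if not pred_words:
        return 0.0
    pred_counts = Counter(pred_words)
    max_ref = Counter()
    for ref in references:
        for word, n in Counter(ref.lower().split()).items():
            max_ref[word] = max(max_ref[word], n)
    matched = sum(min(n, max_ref[word]) for word, n in pred_counts.items())
    return matched / len(pred_words)
```

A score near 1.0 means almost every predicted word appears in some reference caption. It says nothing about word order or fluency, so it should support a qualitative check rather than replace it.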

License 📜

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements 🙌

  • Thanks to the creators of the Flickr8k dataset for providing the resources necessary for training and testing the model.
  • Thanks to the authors of the PyTorch documentation for their comprehensive guides and tutorials.

Notebook and Copyright

@misc{MJImageCaptioning2023,
  author = {Mohammad Javad (MJ) Ahmadi},
  title = {Image Captioning},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/MJAHMADEE/Image_Captioning}}
}


For more information, please refer to the official repository.