Image Caption Generator using CNN-RNN with Xception Model

Overview

This project implements an image caption generator that uses the Xception model as a Convolutional Neural Network (CNN) backbone for feature extraction and a Recurrent Neural Network (RNN) to generate descriptive captions, allowing computers to understand and describe visual content.

Table of Contents

  1. Installation
  2. Usage
  3. Dataset
  4. Training
  5. Inference
  6. Results
  7. Contributing
  8. License

Installation

  1. Clone this repository:

    git clone https://github.com/Basim03/Image-caption-generator.git
    cd Image-caption-generator
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Download and preprocess the dataset (See Dataset section).

Usage

Training

To train the image caption generator, use the following command:

    python train.py

This will train the model using the specified dataset and save the trained model weights.
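
For orientation, below is a minimal sketch of what building and checkpointing such a model can look like in Keras. All sizes and names here (vocab_size, max_length, the models/ path) are illustrative assumptions, not values taken from this repository's train.py.

    # Minimal merge-style CNN-RNN captioning model in Keras. All sizes and
    # file names below are illustrative assumptions, not this repo's values.
    from tensorflow.keras.callbacks import ModelCheckpoint
    from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding, Input, add
    from tensorflow.keras.models import Model

    vocab_size = 7577  # assumed Flickr8k vocabulary size
    max_length = 32    # assumed maximum caption length in tokens

    # Image branch: 2048-d pooled Xception features -> dense projection.
    inputs1 = Input(shape=(2048,))
    fe = Dense(256, activation="relu")(Dropout(0.5)(inputs1))

    # Text branch: partial caption -> embedding -> LSTM.
    inputs2 = Input(shape=(max_length,))
    se = LSTM(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(inputs2)))

    # Decoder: merge both branches and predict the next word.
    decoder = Dense(256, activation="relu")(add([fe, se]))
    outputs = Dense(vocab_size, activation="softmax")(decoder)

    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam")

    # Save weights each time the training loss improves.
    checkpoint = ModelCheckpoint("models/model-{epoch:02d}.h5",
                                 monitor="loss", save_best_only=True, mode="min")
    # model.fit([X1, X2], y, epochs=20, callbacks=[checkpoint])  # data prepared elsewhere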

Inference

To generate captions for images using the trained model, use the following command:

    python generate_caption.py --image_path path/to/image.jpg

Replace path/to/image.jpg with the path to the image for which you want to generate a caption.

Dataset

This project uses the Flickr_8K dataset. Larger datasets such as Flickr_30K and MSCOCO generally support better models, but training on them can take weeks, so the smaller Flickr_8K dataset is used here.

Thanks to Jason Brownlee for providing a direct link to download the dataset (size: 1 GB). It comes in two parts:

  • Flicker8k_Dataset

  • Flickr_8k_text

The Flickr_8k_text folder contains Flickr8k.token, the main file of the dataset: it lists image names and their respective captions, one entry per line.
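
Parsing this file is straightforward. The sketch below assumes the standard Flickr8k layout, where each line is an image name with a #0–#4 caption-index suffix, a tab, and the caption; the file path is an assumption, not read from this repository.

    from collections import defaultdict

    # Parse Flickr8k.token into {image_name: [captions]}. The tab-separated
    # "image.jpg#n<TAB>caption" layout matches the standard Flickr8k release;
    # the path below is an assumption.
    def load_captions(token_path="Flickr_8k_text/Flickr8k.token.txt"):
        captions = defaultdict(list)
        with open(token_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                image_id, caption = line.split("\t", 1)
                captions[image_id.split("#")[0]].append(caption)  # drop "#0".."#4"
        return dict(captions)

    captions = load_captions()
    print(len(captions), "images with captions")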

Training

  • The model architecture is based on the Xception model for feature extraction and an RNN (LSTM or GRU) for generating captions (see the feature-extraction sketch after this list).
  • You can configure training hyperparameters in config.py.
  • Training data is loaded and preprocessed using the data pipeline defined in data_loader.py.
  • The model is trained using the train.py script.
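
As an illustration of the feature-extraction step, Xception with its classification head removed maps each 299×299 image to a single 2048-dimensional vector. A minimal sketch follows; the file path and function name are assumptions, not necessarily what data_loader.py does.

    import numpy as np
    from tensorflow.keras.applications.xception import Xception, preprocess_input
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    # include_top=False drops the classifier; pooling="avg" yields a 2048-d vector.
    feature_extractor = Xception(include_top=False, pooling="avg")

    def extract_features(image_path):
        image = load_img(image_path, target_size=(299, 299))  # Xception input size
        array = preprocess_input(img_to_array(image))         # scale pixels to [-1, 1]
        return feature_extractor.predict(np.expand_dims(array, axis=0))  # (1, 2048)

    features = extract_features("Flicker8k_Dataset/example.jpg")  # path is assumed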

Inference

  • Inference is performed using the generate_caption.py script.
  • Provide the path to an image, and the model will generate a descriptive caption for it (a greedy-decoding sketch follows this list).
  • You can modify the model architecture and weights as needed for inference.
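
Generation is typically a word-at-a-time loop: feed the image features and the caption so far, take the most probable next word, and stop at an end token or the length limit. Below is a greedy-decoding sketch assuming a trained model, a Keras Tokenizer, and "start"/"end" markers; all of these names are assumptions about this repo's conventions.

    import numpy as np
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Greedy decoding: repeatedly predict the most probable next word.
    def generate_caption(model, tokenizer, photo_features, max_length):
        caption = "start"
        for _ in range(max_length):
            seq = tokenizer.texts_to_sequences([caption])[0]
            seq = pad_sequences([seq], maxlen=max_length)
            probs = model.predict([photo_features, seq], verbose=0)
            word = tokenizer.index_word.get(int(np.argmax(probs)))
            if word is None or word == "end":
                break
            caption += " " + word
        return " ".join(caption.split()[1:])  # drop the leading "start" marker

Swapping the argmax for a beam search is a common way to trade inference speed for caption quality.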

Results

The trained model generates captions for images with reasonable accuracy. Some captions may still be incorrect, but tuning the hyperparameters can improve results.

Contributing

Contributions to this project are welcome. To contribute, follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix: git checkout -b feature-name
  3. Make your changes and commit them: git commit -m "Description of your changes"
  4. Push your branch to your forked repository: git push origin feature-name
  5. Create a pull request on the main repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.


