This project implements an image caption generator that combines a Convolutional Neural Network (CNN), the Xception model, for image feature extraction with a Recurrent Neural Network (RNN) for caption generation. Given an image, it produces a descriptive caption, allowing computers to understand and describe visual content.
- Clone this repository:

  ```bash
  git clone https://github.com/Basim03/Image-caption-generator.git
  cd image-caption-generator
  ```
- Install the required dependencies (a sample dependency file is sketched after these steps):

  ```bash
  pip install -r requirements.txt
  ```
- Download and preprocess the dataset (see the Dataset section).
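The exact dependencies are defined by the repository's `requirements.txt`. As a rough, assumed illustration (not copied from the repo), a dependency list for this kind of project might look like:

```
tensorflow
keras
numpy
pillow
tqdm
```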
To train the image caption generator, run:

```bash
python train.py
```
This will train the model using the specified dataset and save the trained model weights.
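Training hyperparameters are read from `config.py` (see Model Architecture below). The real variable names live in the repository; a hypothetical sketch of such a file, with assumed names and values, might look like:

```python
# config.py -- hypothetical example; the actual names and values
# are defined in the repository's config.py.
EPOCHS = 10            # number of training epochs
BATCH_SIZE = 32        # images per gradient step
EMBEDDING_DIM = 256    # word-embedding size
LSTM_UNITS = 256       # hidden units in the LSTM decoder
MAX_CAPTION_LEN = 32   # captions padded/truncated to this length
MODEL_DIR = "models/"  # where train.py saves trained weights
```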
To generate captions for images using the trained model, run:

```bash
python generate_caption.py --image_path path/to/image.jpg
```
Replace `path/to/image.jpg` with the path to the image you want to caption.
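Under the hood, caption generation is typically a greedy decoding loop: extract the Xception features once, then repeatedly feed the caption-so-far into the decoder until an end token appears. A minimal sketch of that idea follows; the file names `models/model.h5` and `tokenizer.p`, the `start`/`end` tokens, and the maximum length are assumptions, not necessarily what this repository saves.

```python
import numpy as np
from pickle import load
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.xception import Xception, preprocess_input
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.image import load_img, img_to_array

MAX_LEN = 32  # assumed maximum caption length used during training

def extract_features(image_path, cnn):
    # Resize to Xception's expected 299x299 input and scale pixels to [-1, 1].
    image = img_to_array(load_img(image_path, target_size=(299, 299)))
    image = preprocess_input(np.expand_dims(image, axis=0))
    return cnn.predict(image)  # shape (1, 2048)

def generate_caption(model, tokenizer, photo):
    caption = "start"  # assumed start-of-sequence token
    for _ in range(MAX_LEN):
        seq = tokenizer.texts_to_sequences([caption])[0]
        seq = pad_sequences([seq], maxlen=MAX_LEN)
        # Predict the next word given the image features and caption so far.
        yhat = np.argmax(model.predict([photo, seq]), axis=-1)[0]
        word = tokenizer.index_word.get(int(yhat))
        if word is None or word == "end":  # assumed end-of-sequence token
            break
        caption += " " + word
    return caption.replace("start ", "", 1)

cnn = Xception(include_top=False, pooling="avg")  # 2048-d feature extractor
model = load_model("models/model.h5")             # path is an assumption
tokenizer = load(open("tokenizer.p", "rb"))       # path is an assumption
print(generate_caption(model, tokenizer,
                       extract_features("path/to/image.jpg", cnn)))
```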
For the image caption generator we use the Flickr_8K dataset. Larger datasets such as Flickr_30K and MSCOCO generally yield better models, but training on them can take weeks, so we use the smaller Flickr_8K dataset instead.
Thanks to Jason Brownlee for providing a direct link to download the dataset (Size: 1GB).
The Flickr_8k_text folder contains Flickr8k.token, the main file of the dataset: it lists each image name together with its captions, one entry per line (separated by newline, "\n").
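To illustrate the format, here is a minimal sketch that loads the file into a dict mapping each image name to its list of captions. It assumes the common Flickr8k.token layout of `image_name#index`, a tab, then the caption on each line; the file path is inferred from the folder layout described above.

```python
def load_captions(token_path="Flickr_8k_text/Flickr8k.token"):
    """Map image file name -> list of its captions."""
    captions = {}
    with open(token_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_id, caption = line.split("\t", 1)  # "name.jpg#0" TAB caption
            image_name = image_id.split("#")[0]      # drop the "#0".."#4" index
            captions.setdefault(image_name, []).append(caption)
    return captions

captions = load_captions()
# Each image in Flickr_8K has five reference captions.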
- The model architecture is based on the Xception model for feature extraction and an RNN (LSTM or GRU) for generating captions (a minimal sketch follows this list).
- You can configure training hyperparameters in `config.py`.
- Training data is loaded and preprocessed using the data pipeline defined in `data_loader.py`.
- The model is trained using the `train.py` script.
- Inference is performed using the `generate_caption.py` script. Provide the path to an image, and the model will generate a descriptive caption for it.
- You can modify the model architecture and weights as needed for inference.
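As a rough illustration of the encoder-decoder wiring described above, here is a minimal Keras sketch. The layer sizes, vocabulary size, and caption length are placeholders, not the repository's actual values.

```python
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 7577   # placeholder: number of words in the tokenizer
MAX_LEN = 32        # placeholder: longest caption length

# Encoder: a 2048-d Xception feature vector, projected to the decoder size.
image_input = Input(shape=(2048,))
image_feats = Dense(256, activation="relu")(Dropout(0.5)(image_input))

# Decoder: embed the partial caption and run it through an LSTM.
caption_input = Input(shape=(MAX_LEN,))
caption_embed = Embedding(VOCAB_SIZE, 256, mask_zero=True)(caption_input)
caption_feats = LSTM(256)(Dropout(0.5)(caption_embed))

# Merge both streams and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([image_feats, caption_feats]))
output = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```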
In our experiments, the model generates accurate captions for most images. Captions are sometimes incorrect, but tuning the hyperparameters can improve accuracy.
Contributions to this project are welcome. To contribute, follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix:

  ```bash
  git checkout -b feature-name
  ```

- Make your changes and commit them:

  ```bash
  git commit -m "Description of your changes"
  ```

- Push your branch to your forked repository:

  ```bash
  git push origin feature-name
  ```

- Create a pull request on the main repository.
This project is licensed under the MIT License - see the LICENSE file for details.