This project implements an image caption generator that combines a Convolutional Neural Network (CNN), the Xception model, for image feature extraction with a Recurrent Neural Network (RNN) for caption generation. Given an image, it produces a descriptive caption, allowing computers to understand and describe visual content.
- Clone this repository:

  ```bash
  git clone https://github.com/Basim03/Image-caption-generator.git
  cd image-caption-generator
  ```
- Install the required dependencies (a sample dependency file is sketched after these steps):

  ```bash
  pip install -r requirements.txt
  ```
- Download and preprocess the dataset (see the Dataset section).
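The exact dependencies are defined by the repository's `requirements.txt`. As a rough, assumed illustration (not copied from the repo), a dependency list for this kind of project might look like:

```
tensorflow
keras
numpy
pillow
tqdm
```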
To train the image caption generator, run:

```bash
python train.py
```
This will train the model using the specified dataset and save the trained model weights.
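Training hyperparameters are read from `config.py` (see Model Architecture below). The real variable names live in the repository; a hypothetical sketch of such a file, with assumed names and values, might look like:

```python
# config.py -- hypothetical example; the actual names and values
# are defined in the repository's config.py.
EPOCHS = 10            # number of training epochs
BATCH_SIZE = 32        # images per gradient step
EMBEDDING_DIM = 256    # word-embedding size
LSTM_UNITS = 256       # hidden units in the LSTM decoder
MAX_CAPTION_LEN = 32   # captions padded/truncated to this length
MODEL_DIR = "models/"  # where train.py saves trained weights
```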
To generate captions for images using the trained model, run:

```bash
python generate_caption.py --image_path path/to/image.jpg
```
Replace `path/to/image.jpg` with the path to the image you want to caption.
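Under the hood, caption generation is typically a greedy decoding loop: extract the Xception features once, then repeatedly feed the caption-so-far into the decoder until an end token appears. A minimal sketch of that idea follows; the file names `models/model.h5` and `tokenizer.p`, the `start`/`end` tokens, and the maximum length are assumptions, not necessarily what this repository saves.

```python
import numpy as np
from pickle import load
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.xception import Xception, preprocess_input
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.image import load_img, img_to_array

MAX_LEN = 32  # assumed maximum caption length used during training

def extract_features(image_path, cnn):
    # Resize to Xception's expected 299x299 input and scale pixels to [-1, 1].
    image = img_to_array(load_img(image_path, target_size=(299, 299)))
    image = preprocess_input(np.expand_dims(image, axis=0))
    return cnn.predict(image)  # shape (1, 2048)

def generate_caption(model, tokenizer, photo):
    caption = "start"  # assumed start-of-sequence token
    for _ in range(MAX_LEN):
        seq = tokenizer.texts_to_sequences([caption])[0]
        seq = pad_sequences([seq], maxlen=MAX_LEN)
        # Predict the next word given the image features and caption so far.
        yhat = np.argmax(model.predict([photo, seq]), axis=-1)[0]
        word = tokenizer.index_word.get(int(yhat))
        if word is None or word == "end":  # assumed end-of-sequence token
            break
        caption += " " + word
    return caption.replace("start ", "", 1)

cnn = Xception(include_top=False, pooling="avg")  # 2048-d feature extractor
model = load_model("models/model.h5")             # path is an assumption
tokenizer = load(open("tokenizer.p", "rb"))       # path is an assumption
print(generate_caption(model, tokenizer,
                       extract_features("path/to/image.jpg", cnn)))
```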
For the image caption generator we use the Flickr_8K dataset. Larger datasets such as Flickr_30K and MSCOCO generally yield better models, but training on them can take weeks, so we use the smaller Flickr_8K dataset instead.
Thanks to Jason Brownlee for providing a direct link to download the dataset (Size: 1GB).
The Flickr_8k_text folder contains Flickr8k.token, the main file of the dataset: it lists each image name together with its captions, one entry per line (separated by newline, "\n").
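To illustrate the format, here is a minimal sketch that loads the file into a dict mapping each image name to its list of captions. It assumes the common Flickr8k.token layout of `image_name#index`, a tab, then the caption on each line; the file path is inferred from the folder layout described above.

```python
def load_captions(token_path="Flickr_8k_text/Flickr8k.token"):
    """Map image file name -> list of its captions."""
    captions = {}
    with open(token_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_id, caption = line.split("\t", 1)  # "name.jpg#0" TAB caption
            image_name = image_id.split("#")[0]      # drop the "#0".."#4" index
            captions.setdefault(image_name, []).append(caption)
    return captions

captions = load_captions()
# Each image in Flickr_8K has five reference captions.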
- The model architecture is based on the Xception model for feature extraction and an RNN (LSTM or GRU) for generating captions (a minimal sketch follows this list).
- You can configure training hyperparameters in `config.py`.
- Training data is loaded and preprocessed using the data pipeline defined in `data_loader.py`.
- The model is trained using the `train.py` script.
- Inference is performed using the `generate_caption.py` script. Provide the path to an image, and the model will generate a descriptive caption for it.
- You can modify the model architecture and weights as needed for inference.
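As a rough illustration of the encoder-decoder wiring described above, here is a minimal Keras sketch. The layer sizes, vocabulary size, and caption length are placeholders, not the repository's actual values.

```python
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 7577   # placeholder: number of words in the tokenizer
MAX_LEN = 32        # placeholder: longest caption length

# Encoder: a 2048-d Xception feature vector, projected to the decoder size.
image_input = Input(shape=(2048,))
image_feats = Dense(256, activation="relu")(Dropout(0.5)(image_input))

# Decoder: embed the partial caption and run it through an LSTM.
caption_input = Input(shape=(MAX_LEN,))
caption_embed = Embedding(VOCAB_SIZE, 256, mask_zero=True)(caption_input)
caption_feats = LSTM(256)(Dropout(0.5)(caption_embed))

# Merge both streams and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([image_feats, caption_feats]))
output = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```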
In our experiments, the model generates accurate captions for most images. Captions are sometimes incorrect, but tuning the hyperparameters can improve accuracy.
Contributions to this project are welcome. To contribute, follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix:

  ```bash
  git checkout -b feature-name
  ```

- Make your changes and commit them:

  ```bash
  git commit -m "Description of your changes"
  ```

- Push your branch to your forked repository:

  ```bash
  git push origin feature-name
  ```

- Create a pull request on the main repository.
This project is licensed under the MIT License - see the LICENSE file for details.