ImageCaptioning

About this project

This is the final assignment of the Deep Learning course of the MSc in AI.

In this project, you can train, evaluate and test an end-to-end Deep Learning Image Captioning model that consists of a CNN Encoder and an RNN Decoder.

Before you get started, make sure that you have downloaded the data and placed it in the correct location. See the "How to download and set up the data" section below.

train.py implements the model's training process.
demo.py launches a Gradio app in your browser that serves the best model. There you can upload images and see the captions the model generates.
models.py contains the models' architecture.
plotter.py and utils.py contain helper functions.
eda.ipynb is a notebook with the EDA of the captions.
analyse_test_set.ipynb retrieves the top K images of the test set with the highest and the lowest BLEU scores.
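
As a rough illustration of the top K selection mentioned above, the following sketch picks the highest- and lowest-scoring images from a mapping of image name to BLEU score. The function name `top_k_by_score` and the example scores are hypothetical and are not taken from the notebook.

```python
import heapq


def top_k_by_score(scores, k):
    """Return (best, worst): the k highest- and k lowest-scoring (name, score) pairs."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    worst = heapq.nsmallest(k, scores.items(), key=lambda item: item[1])
    return best, worst


# Hypothetical per-image BLEU scores, made up for illustration only.
example_scores = {"a.jpg": 0.71, "b.jpg": 0.12, "c.jpg": 0.45, "d.jpg": 0.03}
best, worst = top_k_by_score(example_scores, k=2)
```

Here `best` is ordered from the highest score down and `worst` from the lowest score up.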

How to train a model

Create a new Python environment. Note: Python >= 3.9 is recommended.
Run pip install -r requirements.txt to install the required packages.
Edit config.json to change the important parameters of the training process.
Run the train.py script to train, evaluate and save a model.
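
To inspect or tweak the training parameters programmatically, a minimal sketch along these lines may help. It only assumes that config.json is a JSON object; the helper names `load_config` and `override` are hypothetical, and the actual parameter names are whatever the project's config.json defines.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read the training parameters from a JSON file into a dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def override(config, **changes):
    """Return a copy of the config with the given values replaced."""
    return {**config, **changes}


if __name__ == "__main__" and Path("config.json").is_file():
    # List the parameter names that the project's config.json defines.
    print(sorted(load_config()))
```

`override` leaves the original dict untouched, so the values read from disk stay available for comparison.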

How to download and set up the data

Method 1:

You can download the dataset (adityajn105/flickr8k) directly from Kaggle.

Method 2:

Use the Kaggle API:

pip install kaggle
export KAGGLE_USERNAME=username
export KAGGLE_KEY=xxxxxxxxxxxxxx
kaggle datasets download adityajn105/flickr8k
unzip flickr8k.zip -d data

Create a folder named data and place the contents of the zip file inside (the unzip command in Method 2 already does this). The structure of your project should look like this:

ImageCaptioning
 ┣ data
 ┃ ┣ Images
 ┃ ┃ ┣ 1000268201_693b08cb0e.jpg
 ┃ ┃ ┣ 1001773457_577c3a7d70.jpg
 ┃ ┃ ┣ 431018958_84b2beebff.jpg
 ┃ ┃ ┣ 431282339_0aa60dd78e.jpg
 ┃ ┃ ┣ ....
 ┃ ┗ captions.txt
 ┣ final_assignment_data
 ┣ inference
 ┣ ....
 ┣ train.py
 ┗ utils.py
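
Before training, the layout above can be checked with a short script. This is a minimal sketch that only looks for the data/Images folder, at least one .jpg file in it, and data/captions.txt, as shown in the tree. The function name `check_data_layout` is hypothetical and is not part of the project.

```python
from pathlib import Path


def check_data_layout(root="data"):
    """Return a list of problems found in the dataset folder (empty if none)."""
    root = Path(root)
    images = root / "Images"
    captions = root / "captions.txt"
    problems = []
    if not images.is_dir():
        problems.append(f"missing folder: {images}")
    elif not any(images.glob("*.jpg")):
        problems.append(f"no .jpg files in: {images}")
    if not captions.is_file():
        problems.append(f"missing file: {captions}")
    return problems


if __name__ == "__main__":
    # Run from the project root; prints nothing when the layout looks right.
    for problem in check_data_layout():
        print(problem)
```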

VasilisStavrianoudakis/ImageCaptioning
