
LuongTuanAnh163002/Image_captioning


Image Captioning using Seq2seq model with attention


Table of Contents
  1. About The Project
  2. Project Structure
  3. Data Preparation
  4. How to run the repository
  5. Try with Google Colab
  6. Conclusion
  7. License
  8. Acknowledgements

In this project we build a model for image captioning: given an image, the model generates a short caption describing it. The model we use is a Seq2seq network with attention.

The project is tested on Python 3.10.12.

Results we want to show

Seq2seq model with RNN

Attention
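The attention module scores each encoder output against the current decoder state and builds a weighted context vector for the next word. Below is a minimal NumPy sketch of additive (Bahdanau-style) attention; the matrices, sizes, and function names are illustrative assumptions, not the repository's actual model.py code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_outputs, W_s, W_h, v):
    """Bahdanau-style attention: score each encoder position against the
    current decoder state, then return a weighted context vector."""
    # scores[i] = v . tanh(W_s @ s + W_h @ h_i)
    scores = np.array([v @ np.tanh(W_s @ decoder_state + W_h @ h)
                       for h in encoder_outputs])
    weights = softmax(scores)            # attention distribution over positions
    context = weights @ encoder_outputs  # convex combination of encoder outputs
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))   # 5 encoder positions, hidden size 8 (toy sizes)
s = rng.normal(size=8)          # current decoder hidden state
W_s = rng.normal(size=(8, 8)); W_h = rng.normal(size=(8, 8)); v = rng.normal(size=8)
ctx, w = additive_attention(s, enc, W_s, W_h, v)
print(ctx.shape, round(float(w.sum()), 6))  # -> (8,) 1.0
```

The weights always sum to 1, so the context vector is an average of encoder outputs biased toward the positions most relevant to the word being generated.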

Image_captioning
  │   train.py                      # Train script
  │   detect.py                     # Detect (inference) script
  ├───models
  │     model.py                    # Define Seq2seq model structure
  ├───data
  │     Flicks.yaml                 # Config data Flicks.yaml
  └───utils
        datasets.py                 # Processing datasets
        general.py                  # Various helper functions

You can download the dataset here or via the script in the next section

flickr8k.zip
Flickr8k
└───datasets
    ├───Images
    │   ├───file_name.jpg
    │   └───..............
    └───captions.txt
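In the common Kaggle distribution of Flickr8k, captions.txt is a CSV with an `image,caption` header and several captions per image. A small sketch of parsing such a file (the two sample rows are illustrative, and the repository's datasets.py may process it differently):

```python
import csv
import io
from collections import defaultdict

# A tiny stand-in for captions.txt (image,caption header, one caption per row).
sample = io.StringIO(
    "image,caption\n"
    "1000268201.jpg,A child in a pink dress is climbing up a set of stairs .\n"
    "1000268201.jpg,A girl going into a wooden building .\n"
)

def load_captions(fh):
    """Group captions by image file name."""
    captions = defaultdict(list)
    for row in csv.DictReader(fh):
        captions[row["image"]].append(row["caption"])
    return captions

caps = load_captions(sample)
print(len(caps["1000268201.jpg"]))  # -> 2
```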

1. For training

+Step1: Create a virtual environment and install the packages

  conda create --name image_caption python=3.10.12
  git clone https://github.com/LuongTuanAnh163002/Image_captioning.git
  cd Image_captioning
  conda activate image_caption
  pip install -r requirements.txt
  

+Step2: Download the dataset

  #for ubuntu/linux
  pip install gdown
  gdown "1P-32Vfy3-s8gaAxbLqTbjLAWlKDGzbTy&confirm=t"
  d="./flickr8k/"
  mkdir -p $d
  unzip -q flickr8k.zip -d $d
  rm flickr8k.zip
  #for windows
  pip install gdown
  gdown "1P-32Vfy3-s8gaAxbLqTbjLAWlKDGzbTy&confirm=t"
  tar -xf flickr8k.zip
  del flickr8k.zip
  

+Step3: Go to the "data" folder and edit the dataset path to point to your dataset
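A data config like Flicks.yaml typically just records where the images and captions live. The field names below are assumptions for illustration only; match them to the keys actually used in the repository's file:

```yaml
# Hypothetical layout of data/Flicks.yaml -- adapt keys to the real file
images: /path/to/flickr8k/datasets/Images          # folder of .jpg images
captions: /path/to/flickr8k/datasets/captions.txt  # image,caption pairs
```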

+Step4: Run the command below to train the model

python train.py --data data/Flicks.yaml --epochs 25 --batch_size 256 --device 0

After training finishes, all results are saved in runs/train/exp/...; the runs folder is created automatically.
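The runs/train/exp, exp2, exp3, ... naming is a common convention for keeping experiments separate. A sketch of how the next free experiment directory can be chosen; whether train.py does exactly this is an assumption:

```python
from pathlib import Path

def next_run_dir(root="runs/train", name="exp"):
    """Return the first non-existing directory: exp, then exp2, exp3, ..."""
    candidate = Path(root) / name
    n = 2
    while candidate.exists():
        candidate = Path(root) / f"{name}{n}"
        n += 1
    return candidate

print(next_run_dir("runs/train"))  # e.g. runs/train/exp on a fresh checkout
```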

2. For detection with your trained model

python detect.py --source file_name.jpg --weight runs/train/exp
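At inference time a Seq2seq captioner typically decodes greedily: start from a `<start>` token, repeatedly take the most likely next word, and stop at `<end>`. A toy sketch of that loop; the lookup table stands in for the trained decoder, and detect.py's actual implementation may differ:

```python
START, END = "<start>", "<end>"

# Hypothetical next-token table standing in for the trained decoder.
toy_step = {START: "a", "a": "dog", "dog": "runs", "runs": END}

def greedy_decode(step, max_len=20):
    """Generate tokens until <end> or max_len (guards against loops)."""
    tokens, cur = [], START
    for _ in range(max_len):
        cur = step[cur]  # in the real model: argmax over vocabulary logits
        if cur == END:
            break
        tokens.append(cur)
    return " ".join(tokens)

print(greedy_decode(toy_step))  # -> a dog runs
```

Beam search would keep the k best partial captions at each step instead of a single argmax; greedy decoding is just the k=1 special case.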

3. For detection with my pretrained model

+Step1: Download my model with the script below, or download it here

weight.zip
  #for ubuntu/linux
  curl -L -o weight_image_captioning.zip https://github.com/LuongTuanAnh163002/Image_captioning/releases/download/v1.0.0/weight_image_captioning.zip
  d="./weight_img_caption/"
  mkdir -p $d
  unzip -q weight_image_captioning.zip -d $d
  rm weight_image_captioning.zip
  #for windows
  pip install gdown
  gdown 15awWEiar47LKqHn9D4A5B_keWuxZlGTM
  tar -xf weight_image_captioning.zip
  del weight_image_captioning.zip
  

+Step2: Download an example image

curl -L -o test_predict.jpg 'https://drive.google.com/uc?id=1PWU1tw53Rv3J-T9i0OQBGFKLbMF4rg9J&confirm=t'

+Step3: Run detection

python detect.py --source test_predict.jpg --weight weight_img_caption/exp

1. For training and detection on the Flickr8k dataset

Open In Colab

We built a complete image-captioning project, but it still has some limitations:

Limitations

  • Training on CPU currently hits some errors, so if you run the project without a GPU you may encounter them; we will fix this bug in the near future
  • Multi-GPU training for acceleration is not supported yet
  • Only .jpg images are supported during training; in the future we will add support for more image file types
  • The model has not been exported to ONNX or TensorRT yet; in the near future we will add the conversion code
  • The model has only been trained on English; in the future we will experiment with other languages, especially Vietnamese
  • There is no evaluation metric yet; we are writing code to evaluate the model with the BLEU score and will add it in the near future
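Until the evaluation code lands, here is a toy illustration of the BLEU idea: unigram precision combined with a brevity penalty that punishes captions shorter than the reference. A real evaluation would use a full 1-to-4-gram implementation such as NLTK's corpus_bleu; this simplified function is for intuition only:

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Unigram BLEU with brevity penalty -- a simplified sketch."""
    cand, ref = candidate.split(), reference.split()
    clipped = Counter(cand) & Counter(ref)      # clipped unigram matches
    precision = sum(clipped.values()) / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a dog runs on grass", "a dog runs on the grass")
print(round(score, 3))  # -> 0.819
```

Every candidate word appears in the reference, so precision is 1.0, but the candidate is one word short of the reference, so the brevity penalty pulls the score below 1.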

See LICENSE for more information.