
LuongTuanAnh163002/Image_captioning


Image Captioning using Seq2seq model with attention


Table of Contents
  1. About The Project
  2. Project Structure
  3. Data Preparation
  4. How to run the repository
  5. Try with Google Colab
  6. Conclusion
  7. License
  8. Acknowledgements

In this project we build a model for image captioning: given an image, the model generates a short caption describing it. The model we use is a Seq2seq network with attention.

The project is tested on Python 3.10.12.

Results we want to show

Seq2seq model with RNN

Attention
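The attention module scores each encoder output against the current decoder state and builds a weighted context vector for the next word. Below is a minimal NumPy sketch of additive (Bahdanau-style) attention; the matrices, sizes, and function names are illustrative assumptions, not the repository's actual model.py code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_outputs, W_s, W_h, v):
    """Bahdanau-style attention: score each encoder position against the
    current decoder state, then return a weighted context vector."""
    # scores[i] = v . tanh(W_s @ s + W_h @ h_i)
    scores = np.array([v @ np.tanh(W_s @ decoder_state + W_h @ h)
                       for h in encoder_outputs])
    weights = softmax(scores)            # attention distribution over positions
    context = weights @ encoder_outputs  # convex combination of encoder outputs
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))   # 5 encoder positions, hidden size 8 (toy sizes)
s = rng.normal(size=8)          # current decoder hidden state
W_s = rng.normal(size=(8, 8)); W_h = rng.normal(size=(8, 8)); v = rng.normal(size=8)
ctx, w = additive_attention(s, enc, W_s, W_h, v)
print(ctx.shape, round(float(w.sum()), 6))  # -> (8,) 1.0
```

The weights always sum to 1, so the context vector is an average of encoder outputs biased toward the positions most relevant to the word being generated.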

Image_captioning
  │   train.py                      # Train script
  │   detect.py                     # Detect (inference) script
  ├───models
  │     model.py                    # Define Seq2seq model structure
  ├───data
  │     Flicks.yaml                 # Config data Flicks.yaml
  └───utils
        datasets.py                 # Processing datasets
        general.py                  # Various helper functions

You can download the dataset here or via the script in the next section

flickr8k.zip
Flickr8k
└───datasets
    ├───Images
    │   ├───file_name.jpg
    │   └───..............
    └───captions.txt
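In the common Kaggle distribution of Flickr8k, captions.txt is a CSV with an `image,caption` header and several captions per image. A small sketch of parsing such a file (the two sample rows are illustrative, and the repository's datasets.py may process it differently):

```python
import csv
import io
from collections import defaultdict

# A tiny stand-in for captions.txt (image,caption header, one caption per row).
sample = io.StringIO(
    "image,caption\n"
    "1000268201.jpg,A child in a pink dress is climbing up a set of stairs .\n"
    "1000268201.jpg,A girl going into a wooden building .\n"
)

def load_captions(fh):
    """Group captions by image file name."""
    captions = defaultdict(list)
    for row in csv.DictReader(fh):
        captions[row["image"]].append(row["caption"])
    return captions

caps = load_captions(sample)
print(len(caps["1000268201.jpg"]))  # -> 2
```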

1. For training

+Step1: Create a virtual environment and install the packages

  conda create --name image_caption python=3.10.12
  git clone https://github.com/LuongTuanAnh163002/Image_captioning.git
  cd Image_captioning
  conda activate image_caption
  pip install -r requirements.txt
  

+Step2: Download the dataset

  #for ubuntu/linux
  pip install gdown
  gdown "1P-32Vfy3-s8gaAxbLqTbjLAWlKDGzbTy&confirm=t"
  d="./flickr8k/"
  mkdir -p $d
  unzip -q flickr8k.zip -d $d
  rm flickr8k.zip
  #for windows
  pip install gdown
  gdown "1P-32Vfy3-s8gaAxbLqTbjLAWlKDGzbTy&confirm=t"
  tar -xf flickr8k.zip
  del flickr8k.zip
  

+Step3: Go to the "data" folder and edit the dataset path to point to your dataset
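A data config like Flicks.yaml typically just records where the images and captions live. The field names below are assumptions for illustration only; match them to the keys actually used in the repository's file:

```yaml
# Hypothetical layout of data/Flicks.yaml -- adapt keys to the real file
images: /path/to/flickr8k/datasets/Images          # folder of .jpg images
captions: /path/to/flickr8k/datasets/captions.txt  # image,caption pairs
```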

+Step4: Run the command below to train the model

python train.py --data data/Flicks.yaml --epochs 25 --batch_size 256 --device 0

After training finishes, all results are saved in runs/train/exp/...; the runs folder is created automatically.
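The runs/train/exp, exp2, exp3, ... naming is a common convention for keeping experiments separate. A sketch of how the next free experiment directory can be chosen; whether train.py does exactly this is an assumption:

```python
from pathlib import Path

def next_run_dir(root="runs/train", name="exp"):
    """Return the first non-existing directory: exp, then exp2, exp3, ..."""
    candidate = Path(root) / name
    n = 2
    while candidate.exists():
        candidate = Path(root) / f"{name}{n}"
        n += 1
    return candidate

print(next_run_dir("runs/train"))  # e.g. runs/train/exp on a fresh checkout
```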

2. For detection with your trained model

python detect.py --source file_name.jpg --weight runs/train/exp
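At inference time a Seq2seq captioner typically decodes greedily: start from a `<start>` token, repeatedly take the most likely next word, and stop at `<end>`. A toy sketch of that loop; the lookup table stands in for the trained decoder, and detect.py's actual implementation may differ:

```python
START, END = "<start>", "<end>"

# Hypothetical next-token table standing in for the trained decoder.
toy_step = {START: "a", "a": "dog", "dog": "runs", "runs": END}

def greedy_decode(step, max_len=20):
    """Generate tokens until <end> or max_len (guards against loops)."""
    tokens, cur = [], START
    for _ in range(max_len):
        cur = step[cur]  # in the real model: argmax over vocabulary logits
        if cur == END:
            break
        tokens.append(cur)
    return " ".join(tokens)

print(greedy_decode(toy_step))  # -> a dog runs
```

Beam search would keep the k best partial captions at each step instead of a single argmax; greedy decoding is just the k=1 special case.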

3. For detection with my pretrained model

+Step1: Download my model with the script below, or download it here

weight.zip
  #for ubuntu/linux
  curl -L -o weight_image_captioning.zip https://github.com/LuongTuanAnh163002/Image_captioning/releases/download/v1.0.0/weight_image_captioning.zip
  d="./weight_img_caption/"
  mkdir -p $d
  unzip -q weight_image_captioning.zip -d $d
  rm weight_image_captioning.zip
  #for windows
  pip install gdown
  gdown 15awWEiar47LKqHn9D4A5B_keWuxZlGTM
  tar -xf weight_image_captioning.zip
  del weight_image_captioning.zip
  

+Step2: Download an example image

curl -L -o test_predict.jpg 'https://drive.google.com/uc?id=1PWU1tw53Rv3J-T9i0OQBGFKLbMF4rg9J&confirm=t'

+Step3: Run detection

python detect.py --source test_predict.jpg --weight weight_img_caption/exp

1. For training and detection on the Flickr8k dataset

Open In Colab

We built a complete image-captioning project, but it still has some limitations:

Limitations

  • Training on CPU currently hits some errors, so if you run the project without a GPU you may encounter them; we will fix this bug in the near future
  • Multi-GPU training for acceleration is not supported yet
  • Only .jpg images are supported during training; in the future we will add support for more image file types
  • The model has not been exported to ONNX or TensorRT yet; in the near future we will add the conversion code
  • The model has only been trained on English; in the future we will experiment with other languages, especially Vietnamese
  • There is no evaluation metric yet; we are writing code to evaluate the model with the BLEU score and will add it in the near future
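Until the evaluation code lands, here is a toy illustration of the BLEU idea: unigram precision combined with a brevity penalty that punishes captions shorter than the reference. A real evaluation would use a full 1-to-4-gram implementation such as NLTK's corpus_bleu; this simplified function is for intuition only:

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Unigram BLEU with brevity penalty -- a simplified sketch."""
    cand, ref = candidate.split(), reference.split()
    clipped = Counter(cand) & Counter(ref)      # clipped unigram matches
    precision = sum(clipped.values()) / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a dog runs on grass", "a dog runs on the grass")
print(round(score, 3))  # -> 0.819
```

Every candidate word appears in the reference, so precision is 1.0, but the candidate is one word short of the reference, so the brevity penalty pulls the score below 1.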

See LICENSE for more information.