In this project we build a model for image captioning: given an image, the model generates a short textual description of it. The model we use is a Seq2seq architecture with Attention.
```
Image_captioning
│   train.py            # Train script
│   detect.py           # Detect/inference script
│
├───models
│       model.py        # Defines the Seq2seq model structure
│
├───data
│       Flicks.yaml     # Dataset config (Flicks.yaml)
│
└───utils
        datasets.py     # Dataset processing
        general.py      # Various helper functions
```
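For orientation, here is a minimal sketch of how an attention-based Seq2seq captioner is commonly assembled: a CNN encoder produces image feature vectors, an additive attention module weights them, and an LSTM decoder attends to the features at every step. This is only an illustrative outline under those assumptions, not the actual contents of models/model.py; every class name, argument name, and shape below is an assumption.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive (Bahdanau-style) attention over encoder image features."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feat_dim), hidden: (batch, hidden_dim)
        energy = torch.tanh(self.feat_proj(features) + self.hidden_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(energy).squeeze(-1), dim=1)   # attention weights
        context = (features * alpha.unsqueeze(-1)).sum(dim=1)          # weighted image context
        return context, alpha

class DecoderStep(nn.Module):
    """One decoding step: previous word + attended image context -> next-word logits."""
    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.attention = Attention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, features, state):
        h, c = state
        context, _ = self.attention(features, h)           # attend to image features
        emb = self.embedding(prev_word)                     # embed the previous word
        h, c = self.lstm(torch.cat([emb, context], dim=1), (h, c))
        return self.fc(h), (h, c)
```

During training such a decoder is usually driven with teacher forcing (the ground-truth previous word is fed in at each step); at inference time the model's own previous prediction is fed back in instead.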
You can download the dataset here or via the script in the next part.
flickr8k.zip
```
Flickr8k
└───datasets
    ├───Images
    │       file_name.jpg
    │       ..............
    └───captions.txt
```
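In the common Flickr8k distribution, captions.txt pairs each image filename with its reference captions (usually five per image), one pair per line. The snippet below is a small sketch of how such a file could be parsed into an image-to-captions mapping; the comma-separated "image,caption" layout with a header line is an assumption about the standard Kaggle Flickr8k format, not a description of this repository's loader in utils/datasets.py.

```python
import csv
from collections import defaultdict

def load_captions(path="flickr8k/datasets/captions.txt"):
    """Group captions by image filename, assuming a header line followed by
    one "image,caption" pair per line (standard Kaggle Flickr8k layout)."""
    captions = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the "image,caption" header line
        for row in reader:
            image_name, caption = row[0], ",".join(row[1:])
            captions[image_name].append(caption.strip())
    return captions

# e.g. captions["1000268201_693b08cb0e.jpg"] -> list of reference captions for that image
```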
+Step1: Install a virtual environment and the required packages
```bash
conda create --name image_caption python=3.10.12
git clone https://github.com/LuongTuanAnh163002/Image_captioning.git
cd Image_captioning
conda activate image_caption
pip install -r requirements.txt
```
+Step2: Download the dataset
```bash
#for ubuntu/linux
pip install gdown
gdown "https://drive.google.com/uc?id=1P-32Vfy3-s8gaAxbLqTbjLAWlKDGzbTy&confirm=t"
d="./flickr8k/"
mkdir -p $d
unzip -q flickr8k.zip -d $d
rm flickr8k.zip

#for windows
pip install gdown
gdown "https://drive.google.com/uc?id=1P-32Vfy3-s8gaAxbLqTbjLAWlKDGzbTy&confirm=t"
tar -xf flickr8k.zip
del flickr8k.zip
```
+Step3: Go to the "data" folder and modify the dataset path in Flicks.yaml to point to your own dataset
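As a quick sanity check before training, you can verify that the dataset path written in data/Flicks.yaml actually exists. The key name "path" in the snippet below is only an assumption about how the config stores the dataset location; adjust it to whatever keys Flicks.yaml really uses.

```python
import yaml               # pip install pyyaml
from pathlib import Path

with open("data/Flicks.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# "path" is a hypothetical key; replace it with the actual key used in Flicks.yaml.
dataset_root = Path(cfg.get("path", ""))
print("Dataset root:", dataset_root, "| exists:", dataset_root.is_dir())
```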
+Step4: Run the command below to start training
python train.py --data data/Flicks.yaml --epochs 25 --batch_size 256 --device 0
After training finishes, all results are saved in runs/train/exp/...; the runs folder is created automatically once training completes.
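The auto-created results folder typically follows the incrementing-experiment-directory pattern popularized by the YOLO repos (runs/train/exp, runs/train/exp2, ...). The helper below is only a generic sketch of that idea, not the exact logic in train.py.

```python
from pathlib import Path

def make_run_dir(root="runs/train", name="exp"):
    """Create runs/train/exp, then exp2, exp3, ... if earlier ones already exist.
    Generic sketch; the real train.py may number its folders differently."""
    run_dir = Path(root) / name
    idx = 2
    while run_dir.exists():
        run_dir = Path(root) / f"{name}{idx}"
        idx += 1
    run_dir.mkdir(parents=True)
    return run_dir
```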
Then run inference with the trained weights:

python detect.py --source file_name.jpg --weight runs/train/exp
+Step1: Download my pretrained model with the script below, or download it here
weight.zip
```bash
#for ubuntu/linux
curl -L -o weight_image_captioning.zip https://github.com/LuongTuanAnh163002/Image_captioning/releases/download/v1.0.0/weight_image_captioning.zip
d="./weight_img_caption/"
mkdir -p $d
unzip -q weight_image_captioning.zip -d $d
rm weight_image_captioning.zip

#for windows
pip install gdown
gdown 15awWEiar47LKqHn9D4A5B_keWuxZlGTM
tar -xf weight_image_captioning.zip
del weight_image_captioning.zip
```
+Step2: Download an example image
curl -L -o test_predict.jpg 'https://drive.google.com/uc?id=1PWU1tw53Rv3J-T9i0OQBGFKLbMF4rg9J&confirm=t'
+Step3: Detect
python detect.py --source test_predict.jpg --weight weight_img_caption/exp
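Conceptually, inference for this kind of captioner is a greedy decoding loop: encode the image once, then repeatedly feed the previously predicted word back into the decoder until an end token appears. The sketch below reuses the hypothetical DecoderStep from the earlier outline and assumes "<start>"/"<end>" tokens in the vocabulary; it is not the actual logic of detect.py.

```python
import torch

@torch.no_grad()
def greedy_caption(features, decoder, vocab, max_len=30):
    """Greedy decoding sketch.
    features: encoder output for one image, shape (1, num_regions, feat_dim)
    decoder:  the hypothetical DecoderStep module from the sketch above
    vocab:    dict mapping words to integer ids, including "<start>" and "<end>"."""
    id_to_word = {idx: word for word, idx in vocab.items()}
    hidden = (torch.zeros(1, decoder.lstm.hidden_size),
              torch.zeros(1, decoder.lstm.hidden_size))
    word = torch.tensor([vocab["<start>"]])
    caption = []
    for _ in range(max_len):
        logits, hidden = decoder(word, features, hidden)
        word = logits.argmax(dim=1)                 # feed the prediction back in
        token = id_to_word[word.item()]
        if token == "<end>":
            break
        caption.append(token)
    return " ".join(caption)
```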
We have built a complete image captioning project, but it still has some limitations.

Disadvantages
- Training on CPU currently runs into problems, so if you run the project without a GPU you may hit errors; we will fix this in the near future.
- Training cannot yet use multiple GPUs for acceleration.
- Only .jpg images are supported during training; support for more image file types will be added in the future.
- The model has not been exported to ONNX or TensorRT yet; we will add the conversion code for ONNX and TensorRT in the near future.
- The model has only been tried with English captions; in the future we will experiment with other languages, especially Vietnamese.
- There is no evaluation metric yet; we are writing code to evaluate the model with the BLEU score and will add it in the near future (a rough sketch of the idea follows below).
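As a rough illustration of that planned evaluation, BLEU can be computed with NLTK by comparing each generated caption against the image's reference captions. The snippet below is only a generic sketch of the idea (the function name and the use of nltk are assumptions), not code from this repository.

```python
# Generic BLEU evaluation sketch; requires: pip install nltk
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def evaluate_bleu(references, hypotheses):
    """references: per image, a list of tokenized reference captions,
                   e.g. [[["a", "dog", "runs"], ["dog", "running"]], ...]
    hypotheses:  per image, one tokenized generated caption,
                   e.g. [["a", "dog", "is", "running"], ...]"""
    smooth = SmoothingFunction().method1
    bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0),
                        smoothing_function=smooth)
    bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                        smoothing_function=smooth)
    return bleu1, bleu4
```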
See LICENSE for more information.
Our work would not be complete without the wonderful work of the following authors:
- Yolov7
- seq2seq-with-attention