Image-Captioning-via-YOLOv5-EncoderDecoderwithAttention

Archived : This code was just a fun project! Neither it was propely tuned nor it is properly maintained! No Updates/correction expected! Focusing on other areas, I am not a vision expert. This code runs properly for most of the people, please check if images are getting populated properly for you if there is an error!**

Use original Flickr8K dataset. - https://www.kaggle.com/datasets/adityajn105/flickr8k

PUT 'Images' directory and 'captions.txt' in the same directory as in root of this repo.

Attempt for Image Captioning using combination of object detection via YOLOv5 and Encoder Decoder LSTM model on Flickr8K dataset.

Run to make object crops via YOLOv5

python detect_object.py

Run to train - This just takes the Resnet embeddngs of object cropped images detected not any kind of text from YOLO labeller.

python train.py True

To evaluate on validation data

python train.py False

Sample predictions -

references -  [This is a black dog splashing in the water, A black lab with tags frolicks in the water ,A black dog running in the surf,The black dog runs through the water]

prediction- [['<SOS>'], ['a'], ['black'], ['dog'], ['is'], ['a'], ['a'], ['water'], ['.'], ['<EOS>']]

references -  [A black dog and a spotted dog are fighting, A black dog and a tri-colored dog playing with each other on the road,
A black dog and a white dog with brown spots are staring at each other in the street,Two dogs of different breeds looking at each other on the road]

prediction- [['<SOS>'], ['a'], ['black'], ['and'], ['white'], ['dog'], ['is'], ['running'], ['through'], ['a'], ['.'], ['<EOS>']]

Mean BLEU-4 score on validation data is quite low. Suggested Improvements : Use Adam and shuffling of data. Maybe minibatching. Increasing number of datapoints combining other datasets since 8k is quite low smaple size (No of parameters >> no of datapoints, not ideal for neural nets).

Citation (Flickr8K Dataset)

Hodosh, Micah, Peter Young, and Julia Hockenmaier. "Framing image description as a ranking task: Data, models and evaluation metrics." Journal of Artificial Intelligence Research 47 (2013): 853-899.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
test_images		test_images
LICENSE		LICENSE
README.md		README.md
captions.txt		captions.txt
detect_object.py		detect_object.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_images

test_images

LICENSE

LICENSE

README.md

README.md

captions.txt

captions.txt

detect_object.py

detect_object.py

train.py

train.py

Repository files navigation

Image-Captioning-via-YOLOv5-EncoderDecoderwithAttention

Citation (Flickr8K Dataset)

About

Releases

Packages

Languages

License

akjayant/Image-Captioning-via-YOLOv5-EncoderDecoderwithAttention

Folders and files

Latest commit

History

Repository files navigation

Image-Captioning-via-YOLOv5-EncoderDecoderwithAttention

Citation (Flickr8K Dataset)

About

Topics

Resources

License

Stars

Watchers

Forks

Languages