
Image captioning and poetry generation using CNN/LSTM/Transformers and GPT3


CMaxK/robo_romeo


Robo-Romeo


Project Description - can AI be creative?

Project exploring the creative abilities of AI: generating captions from images and turning the captions into romantic poetry.

Solution structure

  • Used a CNN (EfficientNetB0) to encode images into feature vectors, and an embedding layer to map the tokenized captions that accompany each image into vectors.
  • Built an LSTM model and trained it on Google Cloud Platform (GCP) Vertex AI to predict the next word of a sequence, generating whole sentences one word at a time.
  • Applied object-oriented programming (OOP) to design the batching of the training dataset.
  • Trained the model on more than 118k images and more than 500k captions.
  • Built a scoring function that uses doc2vec to convert sentences into vectors and computes their cosine similarity to evaluate image-captioning performance.
  • Used OpenAI's GPT-3 API to generate poetry from the information gathered from the images.
  • Developed a website with Streamlit to present both the poetry and the robot voice.
  • Used a text-to-speech API (Uberduck.io) to provide audio output for the poetry.
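
The next-word prediction step in the list above can be illustrated with a greedy decoding loop. This is a minimal sketch and not the project's code. `greedy_caption`, `toy_model`, `TOY_TABLE` and the `<start>`/`<end>` tokens are hypothetical names chosen for illustration, and the toy lookup table stands in for the trained LSTM decoder, which would also condition on the image features from the CNN encoder.

```python
from typing import Callable, Dict, List

# Assumed special tokens marking the start and end of a caption.
START, END = "<start>", "<end>"


def greedy_caption(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption word by word, always taking the most likely next word."""
    words = [START]
    for _ in range(max_len):
        probs = next_word_probs(words)
        best = max(probs, key=probs.get)
        if best == END:
            break
        words.append(best)
    # Drop the start token before returning the sentence.
    return " ".join(words[1:])


# Hypothetical stand-in for the trained decoder: a fixed table keyed on the
# previous word only. A real decoder would use the whole sequence so far
# plus the encoded image.
TOY_TABLE = {
    "<start>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.7, "beach": 0.3},
    "dog": {"on": 0.6, "<end>": 0.4},
    "on": {"the": 1.0},
    "the": {"beach": 0.8, "dog": 0.2},
    "beach": {"<end>": 1.0},
}


def toy_model(words: List[str]) -> Dict[str, float]:
    return TOY_TABLE[words[-1]]


caption = greedy_caption(toy_model)
print(caption)  # a dog on the beach
```

Greedy decoding is the simplest choice; alternatives such as beam search keep several candidate sentences alive at each step, at the cost of more computation.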

Bonus - Attention layer

A custom-coded layer was added to the model to introduce an attention mechanism, with the aim of improving the descriptive accuracy of the captions.
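
The README does not say which attention variant the custom layer implements. As an illustration of the general idea only, the sketch below uses scaled dot-product scoring, which is an assumption: the decoder state (`query`) is compared against each image-region feature vector, the scores are normalised with a softmax, and the weighted sum becomes the context vector. The names `attend`, `query` and `features` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: List[float], features: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Return (context vector, attention weights) for one decoder step."""
    dim = len(query)
    # Scaled dot-product score between the query and each region feature.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return context, weights


# Three made-up image regions; the first aligns best with the query.
query = [1.0, 0.0]
features = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
context, weights = attend(query, features)
print(weights)
print(context)
```

In a captioning model, the context vector would be recomputed at every decoding step, so each generated word can draw on a different part of the image.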

Screenshot 2022-06-21 at 17 34 46

Screenshot 2022-06-21 at 17 35 26

Datasets used

Screenshot 2022-06-22 at 16 44 04

ImageNet - image database designed for use in computer vision research

Output predictions

Screenshot 2022-06-22 at 16 50 16

Performance metrics

  • Doc2vec to convert sentences into vectors

  • Cosine similarity between the vectors as the score
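
The scoring step can be sketched as follows. Only the cosine similarity is shown. The two vectors are made-up placeholders for the doc2vec embeddings of a predicted caption and a reference caption, and `cosine_similarity` is a hypothetical helper, not the project's scoring function.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector scores 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder embeddings (assumed values, not real doc2vec output).
predicted_vec = [0.2, 0.7, 0.1]
reference_vec = [0.25, 0.6, 0.05]
score = cosine_similarity(predicted_vec, reference_vec)
print(round(score, 3))
```

A score close to 1 means the two sentence vectors point in nearly the same direction, while a score near 0 means they are close to orthogonal.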

Screenshot 2022-06-22 at 16 13 51

Final product

Our Robo-Romeo's Output

Screenshot 2022-06-21 at 17 24 45

Screenshot 2022-06-21 at 17 44 47
