Project exploring the creative abilities of AI: generating captions from images and turning the captions into romantic poetry.
- Used a CNN model (EfficientNetB0) to encode images into feature vectors and added an embedding layer to represent the tokenized captions paired with each image.
- Built an LSTM model and trained it on Google Cloud Platform (GCP) Vertex AI to predict the next word of a sequence and output whole sentences.
- Applied object-oriented programming (OOP) to design the batching of the training dataset.
- Trained the model on 118k+ images and 500k+ captions.
- Built a scoring function that uses doc2vec to convert sentences into vectors and computes their cosine similarity to evaluate image-captioning performance.
- Used OpenAI's GPT API to generate poetry from the information gathered from the images.
- Developed a website using Streamlit to present both poetry and robot voices.
- Used a text-to-speech API (Uberduck.io) to provide audio output for the poetry.
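
The next-word prediction step in the list above can be illustrated with a greedy decoding loop. This is only a minimal sketch, not the project's actual code: `greedy_caption`, `toy_model`, the `<start>`/`<end>` tokens, and the lookup table are all hypothetical stand-ins. In the real pipeline the probabilities would come from the trained LSTM, conditioned on the image features as well as the words generated so far.

```python
from typing import Callable, Dict, List


def greedy_caption(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    start_token: str = "<start>",
    end_token: str = "<end>",
    max_len: int = 20,
) -> str:
    """Build a caption word by word, always taking the most likely next word."""
    words = [start_token]
    for _ in range(max_len):
        probs = next_word_probs(words)      # distribution over the vocabulary
        best = max(probs, key=probs.get)    # greedy choice
        if best == end_token:
            break
        words.append(best)
    return " ".join(words[1:])              # drop the start token


# Hypothetical toy "model": a fixed table keyed on the last generated word.
TOY_TABLE = {
    "<start>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.8, "<end>": 0.2},
    "dog": {"runs": 0.6, "<end>": 0.4},
    "runs": {"<end>": 1.0},
}


def toy_model(words: List[str]) -> Dict[str, float]:
    return TOY_TABLE[words[-1]]


print(greedy_caption(toy_model))  # prints "a dog runs"
```

Greedy decoding is the simplest choice; beam search is a common alternative when the single most likely word at each step leads to poor sentences.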
A custom-coded layer was added to the model to introduce an attention mechanism and improve the descriptive accuracy of the captions.
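
As an illustration of what an attention step computes, here is a minimal dot-product attention sketch in plain Python. It is an assumption for illustration only: the project's custom layer may use a different formulation (for example an additive, learned scoring function), and the names `attend`, `query`, and `regions` are hypothetical. The idea is that the decoder state (`query`) scores each image region, and the context vector is the weighted average of the regions.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: List[float], regions: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Return (context vector, attention weights) for one decoding step."""
    # Score each region by its dot product with the query.
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    # Context = weighted sum of region vectors.
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return context, weights


# Two hypothetical image regions; the query is aligned with the first one.
context, weights = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(weights)   # the first region receives the larger weight
print(context)
```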
ImageNet - image database designed for use in computer vision research
- Doc2vec for transferring sentences to vectors
- Cosine similarities as scores
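
The cosine-similarity score can be sketched as follows. This is a minimal example under stated assumptions: the two input vectors are taken to be sentence embeddings already produced by a doc2vec model (for instance via gensim's `Doc2Vec.infer_vector`, which is an assumption about the tooling), and the function name and sample vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of a generated caption and a reference caption.
generated = [0.2, 0.7, 0.1]
reference = [0.25, 0.6, 0.05]
print(cosine_similarity(generated, reference))
```

A score near 1 means the generated caption points in nearly the same direction as the reference in embedding space; the score ignores vector length, so only direction matters.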
- Link to Streamlit : https://awesome-github-readme-profile.netlify.app
- Link to Streamlit GitHub : https://github.com/CMaxK/robo_romeo_streamlit
- Link to demo presentation slides : https://docs.google.com/presentation/d/19MzJlfLe1qM_8c3-CEjDwYxT5BYAXQVz09pqFLd45gA/edit#slide=id.g134fb78e839_0_201