Image captioning is the task of generating a natural language description of an image. It sits at the intersection of computer vision and natural language processing: the goal is to produce a coherent, fluent sentence that accurately describes the image content.

An image captioning system typically consists of two main components:

- an image encoder (here, an EfficientNet CNN) that extracts visual features from the input image, and
- a caption decoder (here, a Transformer) that generates the sentence word by word, conditioned on those features.
This project uses Streamlit to demo the results of an EfficientNet + Transformer model (trained for 11 epochs) and connects to PostgreSQL to save information about each picture, along with some metadata, to a database. A rough sketch of the encoder/decoder split follows.
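As a rough illustration only, here is a minimal Keras sketch of the encoder half, loosely following the Keras image captioning tutorial linked below. The `build_image_encoder` helper and the 299x299 input size are illustrative assumptions, not the exact code in this repository.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sketch (not the repo's exact code): the encoder turns an
# image into a sequence of feature vectors that the Transformer decoder
# attends over while generating the caption token by token.
def build_image_encoder(image_size=299):
    # Frozen EfficientNetB0, pretrained on ImageNet, extracts visual features.
    base = keras.applications.EfficientNetB0(
        include_top=False,
        weights="imagenet",
        input_shape=(image_size, image_size, 3),
    )
    base.trainable = False
    # Flatten the spatial grid (H x W x C) into a sequence of C-dim vectors.
    features = layers.Reshape((-1, base.output.shape[-1]))(base.output)
    return keras.Model(base.input, features, name="image_encoder")

encoder = build_image_encoder()
dummy_image = tf.random.uniform((1, 299, 299, 3))  # one random test image
print(encoder(dummy_image).shape)  # (1, 100, 1280): 100 patch feature vectors
```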
First you will need to install Anaconda, PostgreSQL, and Python 3. Depending on your OS, there may be several ways to install them; for this project I installed everything on Ubuntu, so here are some video tutorials:
PostgreSQL + pgAdminIII: https://www.youtube.com/watch?v=-LwI4HMR_Eg
Python 3: https://www.youtube.com/watch?v=z3Hdewxuuoo
Anaconda: https://www.youtube.com/watch?v=5kuqIFDouXY
After installing these, follow the steps below.
git clone https://github.com/TomatoFT/Image-Captioning-with-Transformer
cd Image-Captioning-with-Transformer
conda create --name image-captioning
conda activate image-captioning
conda install -c anaconda pip
pip install -r requirements.txt
Read this document from Streamlit: https://docs.streamlit.io/knowledge-base/tutorials/databases/postgresql#add-username-and-password-to-your-local-app-secrets. Then open pgAdmin III, add a connection to the server, and fill in the form.

In the .streamlit/secrets.toml file, change these values to YOUR PostgreSQL information (a connection sketch follows the config below):
[postgres]
host="localhost"
port=5432
user="postgres"
password="12345"
database="postgres"
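With the secrets in place, the app can open a connection the same way the Streamlit tutorial above does. This is only a hedged sketch: the `images` table and its columns are made-up examples, not necessarily what web.py actually uses.

```python
import psycopg2
import streamlit as st

# Connect using the credentials from .streamlit/secrets.toml, as in the
# Streamlit PostgreSQL tutorial linked above.
conn = psycopg2.connect(**st.secrets["postgres"])

# Hypothetical example: save one row of picture metadata. The "images"
# table and its columns are assumptions for illustration only.
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO images (filename, caption) VALUES (%s, %s)",
        ("dog.jpg", "a dog running on the beach"),
    )
```

Then launch the app: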
streamlit run web.py
Demo video: demo_image_captioning.mp4
conda deactivate
The model training notebook is here: https://colab.research.google.com/drive/1K2ZFaAUNIYV0L92XEsV56HSYaXi4DMDh?usp=sharing. I used it to train the model and save its weights to my local computer for deployment in Streamlit (you can find them at model/model_IC.h5).
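Because only the weights are saved, they are presumably restored by rebuilding the architecture first. In this rough sketch, `build_caption_model` is a hypothetical helper standing in for code that recreates the notebook's architecture:

```python
# build_caption_model() is a hypothetical placeholder: it must recreate the
# exact architecture trained in the Colab notebook, since load_weights()
# restores parameters only and the layer structure has to match.
caption_model = build_caption_model()
caption_model.load_weights("model/model_IC.h5")
```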
The model tutorial: https://keras.io/examples/vision/image_captioning/
You can read the submitted report to understand how I carried out this project.

Feel free to clone my code and use it.