An intelligent image captioning system that generates descriptive captions and relevant hashtags for uploaded images using state-of-the-art AI models. The system combines computer vision and natural language processing to create Instagram-style captions.
- Image Captioning: Generates multiple descriptive captions for uploaded images
- Hashtag Generation: Automatically creates relevant hashtags from captions
- Cloud Storage: Stores images in AWS S3 bucket
- Database Integration: Maintains a record of all processed images and their captions
- User-friendly Interface: Simple web interface built with Streamlit
- RESTful API: Backend service built with Flask
- Frontend: Streamlit
- Backend: Flask
- AI Models:
- Vision Encoder-Decoder (ViT-GPT2) for image captioning
- Transformers pipeline for hashtag generation
- Database: MySQL
- Cloud Storage: AWS S3
- Additional Libraries:
- NLTK for text processing
- PIL for image handling
- Boto3 for AWS integration
- Python 3.x
- MySQL Server
- AWS Account with S3 bucket
- Required Python packages (see requirements.txt)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd image-caption
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up MySQL database:

  ```sql
  CREATE DATABASE `image-caption`;
  USE `image-caption`;
  CREATE TABLE `image_data` (
    `image_id` varchar(255) NOT NULL,
    `captions` text NOT NULL,
    `hashtags` text NOT NULL,
    PRIMARY KEY (`image_id`)
  );
  ```

- Configure AWS credentials:
- Create an AWS account if you don't have one
- Create an S3 bucket named 'imagecaptionbucket-1'
- Configure AWS credentials in your environment
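One common way to supply those credentials is through environment variables; the key values below are placeholders, and a shared credentials file or IAM role works equally well:

```shell
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_DEFAULT_REGION="us-east-1"   # use your bucket's region
```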
- Start the backend server:

  ```bash
  cd server
  python app.py
  ```

- Start the frontend client:

  ```bash
  cd client
  streamlit run index.py
  ```

- Access the application at http://localhost:8501
- Open the web interface in your browser
- Click "Choose a file" to select an image
- Click "Upload" to process the image
- View the generated captions and hashtags
image-caption/
├── client/
│ └── index.py # Streamlit frontend
├── server/
│ └── app.py # Flask backend
├── requirements.txt # Python dependencies
└── README.md # This file
- Purpose: Upload and process an image
- Input: Image file
- Output: JSON containing captions and hashtags
- Response Format:
  ```json
  {
    "captions": ["caption1", "caption2", ...],
    "hashtags": "#tag1 #tag2 ..."
  }
  ```
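A client can consume a response of that shape like this; the payload values here are illustrative, not real model output:

```python
import json

# Example payload matching the documented response format.
raw = '{"captions": ["a dog running on the beach"], "hashtags": "#dog #running #beach"}'

data = json.loads(raw)
captions = data["captions"]   # list of caption strings
hashtags = data["hashtags"]   # single space-separated hashtag string
print(captions[0], "->", hashtags)
```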
- Image Captioning Model: ViT-GPT2
- Vision Transformer (ViT) for image encoding
- GPT-2 for text generation
- Generates multiple captions per image
- Maximum caption length: 16 tokens
- Beam search with 7 beams
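A minimal sketch of that generation step, assuming the widely used `nlpconnect/vit-gpt2-image-captioning` checkpoint (the repo's exact checkpoint and function names may differ):

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

CHECKPOINT = "nlpconnect/vit-gpt2-image-captioning"  # assumed checkpoint

def generate_captions(image_path: str, num_captions: int = 3) -> list[str]:
    """Encode the image with ViT, then beam-search GPT-2 captions."""
    model = VisionEncoderDecoderModel.from_pretrained(CHECKPOINT)
    processor = ViTImageProcessor.from_pretrained(CHECKPOINT)
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

    pixel_values = processor(
        images=Image.open(image_path).convert("RGB"), return_tensors="pt"
    ).pixel_values
    output_ids = model.generate(
        pixel_values,
        max_length=16,           # matches the 16-token cap above
        num_beams=7,             # matches the 7-beam search above
        num_return_sequences=num_captions,
    )
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in output_ids]
```

Because `num_return_sequences` must not exceed `num_beams`, up to 7 captions can be requested per image with this configuration.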
- Hashtag Generation:
- Uses text summarization pipeline
- Removes stopwords
- Formats output as Instagram-style hashtags
Feel free to submit issues and enhancement requests!