<a href="https://colab.research.google.com/github/MathurUtkarsh/Image_Caption_Generator/blob/main/Image_Caption_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing Required Libraries

In [4]:
# Install Required libraries
!pip install transformers
# This notebook calls openai.ChatCompletion, which was removed in openai 1.0
!pip install "openai<1.0"

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [5]:
from google.colab import drive
from transformers import AutoProcessor, AutoModelForCausalLM
import os
from PIL import Image, ImageEnhance
import openai

In [6]:
# Mount Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


## Importing the Model from Hugging Face

In [7]:
# Set up the model and processor
processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")


## Getting Test Images Stored in Google Drive

In [8]:
# Specify the path to the images folder in the Google Drive
drive_path = '/content/drive/MyDrive/LISTED_ASSIGNMENT/Test_Images'

In [9]:
# List all image files in the folder
image_files = os.listdir(drive_path)
image_files = sorted(image_files)
image_files

['Image1.png', 'Image2.png', 'Image3.png']
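
`os.listdir` returns every entry in the folder, not only images, so a stray file or subfolder would reach `Image.open` later and raise an error. The following is a minimal sketch of an extension filter; the helper name `list_image_files` and the extension set are assumptions made for illustration and are not part of the original notebook.

```python
import os

# Assumed set of extensions; adjust it to the formats actually stored in the folder.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_image_files(folder):
    """Return the sorted names of regular files in `folder` with an image extension."""
    names = []
    for name in os.listdir(folder):
        extension = os.path.splitext(name)[1].lower()
        is_file = os.path.isfile(os.path.join(folder, name))
        if is_file and extension in IMAGE_EXTENSIONS:
            names.append(name)
    return sorted(names)
```

With this helper, `image_files = list_image_files(drive_path)` could stand in for the `os.listdir` and `sorted` calls above.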

In [10]:
# List to store captions
text_descriptions = []

## Text Descriptions of the Images Generated by My Model

In [11]:
def generate_image_captions(image_files):
    text_descriptions = []
    for image_file in image_files:
        image_path = os.path.join(drive_path, image_file)
        # Convert to RGB so that palette-mode or RGBA PNGs can be sharpened and processed
        image = Image.open(image_path).convert("RGB")

        # Enhance the image
        enhanced_image = ImageEnhance.Sharpness(image).enhance(2.0)

        # Convert the enhanced image to pixel values
        pixel_values = processor(images=enhanced_image, return_tensors="pt").pixel_values

        # Generate captions using the model
        generated_ids = model.generate(pixel_values=pixel_values, max_length=50, num_return_sequences=1)
        generated_captions = processor.batch_decode(generated_ids, skip_special_tokens=True)

        # Append only the caption string to the text_descriptions list
        text_descriptions.append(generated_captions[0])

    return text_descriptions

## Image Text

In [12]:
text_descriptions = generate_image_captions(image_files)
# Print the generated text descriptions
text_descriptions

['football player has been a key player for football team',
 'two horses standing in a field with a cloudy sky in the background',
 'we are all about our new logo.']

## Passing the Text Descriptions Above to the OpenAI API
The goal is to get a catchy, creative, and engaging caption
instead of a plain description of the picture.

In [13]:
# Set up OpenAI API credentials
openai.api_key = "YOUR-OPEN-AI-KEY"
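
Pasting the key into a cell makes it easy to leak when the notebook is shared. One common alternative is to read it from an environment variable. The sketch below assumes the key is stored under the name `OPENAI_API_KEY`; the helper `load_api_key` is hypothetical and not part of the original notebook.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or empty."""
    # `env` defaults to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return key
```

The assignment above would then become `openai.api_key = load_api_key()`.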

## My Custom Prompt to the Chat Completion Model for Refining the Captions

In [14]:
def get_final_caption(text, num_captions=1):
    # Use OpenAI's ChatCompletion API to generate final captions
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": (
                    "You are a social media manager expert who is an expert in writing viral "
                    "social media captions with a minimum of 10 words and a maximum of 20 words. "
                    f"Please use this vague description of my image: {text}, "
                    "and make sure that the caption is relevant and compelling."
                ),
            }
        ],
        temperature=0.7,
        n=num_captions
    )

    return response
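
Because the prompt is built inline, it can only be inspected by calling the API. A small sketch of pulling the prompt into its own function is shown here so the message can be printed and checked offline. The names `PROMPT_TEMPLATE` and `build_caption_messages` are hypothetical; the prompt text is the one used in the cell above.

```python
# Same wording as the prompt in get_final_caption, with a {text} placeholder.
PROMPT_TEMPLATE = (
    "You are a social media manager expert who is an expert in writing viral "
    "social media captions with a minimum of 10 words and a maximum of 20 words. "
    "Please use this vague description of my image: {text}, "
    "and make sure that the caption is relevant and compelling."
)


def build_caption_messages(description):
    """Return the single-message chat payload for one image description."""
    return [{"role": "user", "content": PROMPT_TEMPLATE.format(text=description)}]
```

The result could be passed as the `messages` argument, for example `messages=build_caption_messages(text)`, leaving the rest of `get_final_caption` unchanged.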

## Image Captions Generated

In [15]:
# Set the number of captions we want to get from the model
num_captions = [2, 2, 2]

# Generate captions for each description
caption_objects = [get_final_caption(description, num) for description, num in zip(text_descriptions, num_captions)]

# Print the captions
for i, caption_object in enumerate(caption_objects):
    print(f'Image {i+1}:')
    choices = caption_object['choices']
    for j, choice in enumerate(choices):
        caption = choice['message']['content']
        print(f"Caption {j+1}: {caption}")
    print()

Image 1:
Caption 1: "Game after game, he's proven himself as the driving force behind our team's success. 🏈 #football #keyplayer #teamspirit"
Caption 2: "From the field to our hearts, this football player has proven to be an unbeatable asset to our team. #gamechanger #teamspirit 🏈"

Image 2:
Caption 1: "Nature's beauty at its finest - two majestic horses basking in the glory of a cloudy sky."
Caption 2: "Nature's symphony - Two majestic horses standing tall amidst the serene fields, as the cloudy sky sets the perfect background."

Image 3:
Caption 1: "Say hello to our brand new look! Our logo is the perfect representation of who we are and what we stand for. Exciting things are on the horizon!"
Caption 2: "New logo, new vibes. We're excited to share our fresh look with you!"
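
The prompt asks for 10 to 20 words, but nothing in the notebook checks that the model complies. A rough check is sketched below, assuming that whitespace-separated tokens are an acceptable definition of a word (so hashtags, emoji, and a lone dash each count as one). The helper names are hypothetical.

```python
def caption_word_count(caption):
    """Count whitespace-separated tokens after removing surrounding quotes."""
    return len(caption.strip().strip('"').split())


def within_word_limits(caption, minimum=10, maximum=20):
    """Return True if the caption length falls inside the requested range."""
    return minimum <= caption_word_count(caption) <= maximum
```

Captions that fail `within_word_limits` could be dropped or regenerated before they are shown.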

