# Image captioning app 🎨

## Introduction
This notebook demonstrates how to build an Image Captioning application using Hugging Face and Gradio.

## Load Environment Variables and Libraries
In this section, we load the necessary environment variables and import the required libraries.

### Setup and Installation

In [None]:
%pip install python-dotenv gradio requests pillow torch

### Loading API Keys and Libraries

In [None]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

import os # Provides a way of using operating system-dependent functionality
import io # Provides core tools for working with streams of data
import IPython.display # Used for displaying rich content (e.g., images, HTML) in Jupyter Notebooks
from PIL import Image # Python Imaging Library for opening, manipulating, and saving image files
import base64 # Encodes and decodes data in base64 format
import requests
import json
import torch
import torch.nn as nn
import warnings
import gradio as gr

# Ignore specific UserWarnings related to max_length in transformers
warnings.filterwarnings("ignore", message=".*Using the model-agnostic default `max_length`.*")

# Read settings from the environment (populated from the .env file by load_dotenv above)
hf_api_key = os.getenv('HF_API_KEY')
#hf_api_key = os.environ['HF_API_KEY']
endpoint_url = os.getenv('HF_API_ITT_BASE')
port1 = int(os.getenv('PORT1', 7860))

# Uncomment the following lines to print the HF API key, endpoint URL, and port
#print("HF API Key:", hf_api_key)
#print("Endpoint URL:", endpoint_url)
#print("Port:", port1)

`HF_API_ITT_BASE` holds the URL of the image-to-text endpoint. Its value depends on the model chosen from the image-to-text section of the Hugging Face website. Here, the selected model is `Salesforce/blip-image-captioning-large`, a widely used captioning model. The corresponding API URL is: "https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-large"
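
As a minimal sketch, the URL above appears to follow the pattern of a fixed prefix followed by the model ID. The helper name `build_endpoint_url` and the constants `API_BASE` and `MODEL_ID` are hypothetical, introduced only for illustration; the printed value is what `HF_API_ITT_BASE` would hold in the `.env` file.

```python
# Hypothetical helper: compose the endpoint URL from a model ID.
# API_BASE is the prefix of the URL quoted above; it is an assumption that
# other models follow the same pattern.
API_BASE = "https://api-inference.huggingface.co/models"
MODEL_ID = "Salesforce/blip-image-captioning-large"


def build_endpoint_url(model_id: str, base: str = API_BASE) -> str:
    """Join the base URL and the model ID with exactly one slash between them."""
    return f"{base.rstrip('/')}/{model_id.strip('/')}"


endpoint = build_endpoint_url(MODEL_ID)
print(endpoint)
```

Switching to a different image-to-text model would then only require changing `MODEL_ID`, assuming that model is served under the same prefix.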

### Helper Functions

In [3]:
# Image-to-text endpoint
def get_completion(inputs, parameters=None, endpoint_url=endpoint_url):
    """POST `inputs` (an image URL or a base64 string) to the endpoint and return the parsed JSON."""
    headers = {
        "Authorization": f"Bearer {hf_api_key}",
        "Content-Type": "application/json"
    }
    data = {"inputs": inputs}
    if parameters is not None:
        data["parameters"] = parameters
    response = requests.post(endpoint_url, headers=headers, data=json.dumps(data), timeout=60)
    response.raise_for_status()  # Fail early on HTTP errors instead of returning an error payload
    return response.json()

def get_generation(model, processor, image, dtype):
    inputs = processor(image, return_tensors="pt").to(dtype)
    out = model.generate(**inputs)
    return processor.decode(out[0], skip_special_tokens=True)

def load_image(img_url):
    image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
    return image

### Building the Image Captioning App

In [None]:
image_url = 'https://free-images.com/lg/9e46/white_bengal_tiger_tiger_0.jpg'
image = load_image(image_url)
display(image.resize((500, 350)))

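caption = get_completion(image_url)
print(caption)

def caption_image(image_path):
    # With type="filepath", Gradio passes the path of the uploaded image as a string
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return get_completion(encoded)[0]["generated_text"]

iface = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="filepath"),
    outputs="text",
    title="Image Captioning"
)
iface.launch(server_port=port1)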


### Gradio Interface

In [None]:
import base64
import gradio as gr
import requests

def caption_image(image_url):
    # Download the image from the URL
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()  # Ensure the request was successful

    # A PIL image is not JSON-serializable, so send the raw bytes as a base64 string
    encoded = base64.b64encode(response.content).decode("utf-8")
    result = get_completion(encoded)
    return result[0]["generated_text"]

iface = gr.Interface(
    fn=caption_image,
    inputs=gr.Textbox(label="Image URL"),  # Input as a URL
    outputs="text",
    title="Image Captioning"
)

iface.launch()


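def caption_image(image_path):
    # type="filepath" passes the uploaded image's temporary path as a string
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return get_completion(encoded)[0]["generated_text"]

iface = gr.Interface(fn=caption_image, inputs=gr.Image(type="filepath"), outputs="text", title="Image Captioning")
iface.launch(server_port=port1)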

### Loading Example Images

In [None]:
#Load Images 

#Image1
image1 = Image.open('./Building_Generative_AI_Applications_with_Gradio/Image_Captioning_app/white_bengal_tiger.jpg')
image1

In [None]:
#Image2
image2 = Image.open('./Building_Generative_AI_Applications_with_Gradio/Image_Captioning_app/cow.jpeg')
image2

In [None]:
#Image3
image3 = Image.open('./Building_Generative_AI_Applications_with_Gradio/Image_Captioning_app/bird_flight.jpeg')
image3

In [None]:
def image_to_base64_str(pil_image):
    byte_arr = io.BytesIO()
    pil_image.save(byte_arr, format='PNG')
    byte_arr = byte_arr.getvalue()
    return str(base64.b64encode(byte_arr).decode('utf-8'))

def captioner(image):
    base64_image = image_to_base64_str(image)
    result = get_completion(base64_image)
    return result[0]['generated_text']

gr.close_all()
demo = gr.Interface(fn=captioner,
                    inputs=[gr.Image(label="Upload image", type="pil")],
                    outputs=[gr.Textbox(label="Caption")],
                    title="Image Captioning with BLIP",
                    description="Caption any image using the BLIP model",
                    allow_flagging="never",
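                    # Gradio expects image examples as file paths or URLs
                    examples=[
                        './Building_Generative_AI_Applications_with_Gradio/Image_Captioning_app/white_bengal_tiger.jpg',
                        './Building_Generative_AI_Applications_with_Gradio/Image_Captioning_app/cow.jpeg',
                        './Building_Generative_AI_Applications_with_Gradio/Image_Captioning_app/bird_flight.jpeg',
                    ])

# Reuse port1, which falls back to 7860 when PORT1 is not set
demo.launch(share=True, inline=True, server_port=port1)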


In [None]:
gr.close_all()

## Building an image captioning app 

Here we'll be using an [Inference Endpoint](https://huggingface.co/inference-endpoints) for `Salesforce/blip-image-captioning-base`, a 14M parameter captioning model.
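
The helper code in this notebook reads the caption as `result[0]['generated_text']`, so a successful response is assumed to be a list containing one dictionary. The following is a small sketch of defensive parsing under that assumption. The function name `extract_caption` is hypothetical, and the `{"error": ...}` shape for failed requests is an assumption about the endpoint rather than something this notebook confirms.

```python
def extract_caption(result):
    """Return the caption from an image-to-text response, or raise a clear error."""
    # Assumed failure shape: a dictionary with an "error" key.
    if isinstance(result, dict) and "error" in result:
        raise RuntimeError(f"Endpoint returned an error: {result['error']}")
    # Assumed success shape: [{"generated_text": "..."}]
    if (
        isinstance(result, list)
        and result
        and isinstance(result[0], dict)
        and "generated_text" in result[0]
    ):
        return result[0]["generated_text"]
    raise ValueError(f"Unexpected response shape: {result!r}")


# Example with a hand-written response (not real model output).
print(extract_caption([{"generated_text": "a placeholder caption"}]))
```

Wrapping the lookup this way would surface a readable message in the Gradio textbox instead of a `KeyError` or `TypeError` when the endpoint does not return a caption.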

The free images are available on: https://free-images.com/

## Captioning with `gr.Interface()`

#### gr.Image()
- The `type` parameter is the format that the `fn` function expects to receive as its input.  If `type` is `numpy` or `pil`, `gr.Image()` will convert the uploaded file to this format before sending it to the `fn` function.
- If `type` is `filepath`, `gr.Image()` will temporarily store the image and provide a string path to that image location as input to the `fn` function.
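- `gr.Image()` is part of the `gradio` package, which must be installed. This notebook also reads a custom `PORT1` environment variable to choose the server port passed to `launch()`; `gr.Image()` itself does not depend on it.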
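
To make the `filepath` case concrete: with `type="filepath"`, `fn` receives a plain string path, so the function has to read the file itself. The sketch below uses only the standard library to turn such a path into a base64 string, similar in spirit to what `image_to_base64_str` does for PIL images in this notebook. The name `filepath_to_base64` is hypothetical, and the temporary file merely stands in for an uploaded image.

```python
import base64
import os
import tempfile


def filepath_to_base64(image_path: str) -> str:
    """Read the file at image_path and return its bytes as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Stand-in for the temporary file Gradio would create for an upload.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"abc")  # placeholder bytes, not a real image
    tmp_path = tmp.name

encoded = filepath_to_base64(tmp_path)
os.remove(tmp_path)  # clean up the stand-in file
print(encoded)
```

Unlike the `pil` route, this sends the file's original bytes without re-encoding the image, which may matter for large uploads.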

## Conclusion
This notebook demonstrated how to build an Image Captioning application using Hugging Face and Gradio. For more information, refer to the Hugging Face Documentation and Gradio Documentation.