<h1 align="center"><font color="yellow">L2: Image captioning app 🖼️📝</font></h1>

<font color="yellow">Data Scientist: Dr. Eddy Giusepe Chirinos Isidro</font>

Load your `HF API` key and the relevant `Python` libraries.

In [1]:
import os
import io
import IPython.display
from PIL import Image
import base64

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read the local .env file into os.environ

# Hugging Face access token, taken from the .env file
#hf_api_key = os.environ['HF_API_KEY']
Eddy_API_KEY_HuggingFace = os.environ["HUGGINGFACEHUB_API_TOKEN"]
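If `HUGGINGFACEHUB_API_TOKEN` or `HF_API_ITT_BASE` is missing from the `.env` file, the cells in this notebook stop with a bare `KeyError`. A minimal sketch of a fail-fast check that reports every missing variable at once is shown below; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or raise one error naming every missing one."""
    # Treat unset and empty values alike as "missing"
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + " (add them to your .env file)"
        )
    return {name: os.environ[name] for name in names}


# Hypothetical usage: check both variables this notebook reads before calling the API.
# config = require_env("HUGGINGFACEHUB_API_TOKEN", "HF_API_ITT_BASE")
```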

In [None]:
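# Helper functions
import json

import requests

# Image-to-text endpoint.
# The endpoint URL is looked up when the function is called (not when this cell
# runs), so a missing HF_API_ITT_BASE only fails if the helper is actually used.
def get_completion(inputs, parameters=None, ENDPOINT_URL=None):
    if ENDPOINT_URL is None:
        ENDPOINT_URL = os.environ["HF_API_ITT_BASE"]
    headers = {
        "Authorization": f"Bearer {Eddy_API_KEY_HuggingFace}",
        "Content-Type": "application/json",
    }

    # Request body: the input (here, an image as a base64 string) plus optional parameters
    data = {"inputs": inputs}
    if parameters is not None:
        data.update({"parameters": parameters})

    response = requests.request("POST",
                                ENDPOINT_URL,
                                headers=headers,
                                data=json.dumps(data))
    return json.loads(response.content.decode("utf-8"))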
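The body the helper POSTs is plain JSON: an `inputs` field plus an optional `parameters` object. The sketch below isolates that serialization step as a pure function so the payload can be inspected without calling the endpoint. `build_payload` is a hypothetical name, and the `{"k": 1}` parameters in the test are placeholders, since which parameters an endpoint accepts depends on how it is deployed.

```python
import json
from typing import Any, Optional


def build_payload(inputs: Any, parameters: Optional[dict] = None) -> str:
    """Serialize the JSON body in the same shape get_completion sends."""
    data: dict[str, Any] = {"inputs": inputs}
    # Only include "parameters" when the caller supplies them
    if parameters is not None:
        data["parameters"] = parameters
    return json.dumps(data)


print(build_payload("aGVsbG8="))
```

With `requests`, passing `json=data` to `requests.post` performs the same serialization and sets the `Content-Type: application/json` header automatically.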


# Building an image captioning app

Here, we will use an inference endpoint for `Salesforce/blip-image-captioning-base`, a base-size BLIP captioning model with roughly 250 million parameters.


The code would be very similar if you ran the model locally instead of calling an API. See the [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) documentation page.

In [4]:
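from transformers import pipeline

# Local alternative to the API helper: this rebinds get_completion to a
# transformers pipeline, so the cells below run the model locally.
get_completion = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def summarize(image):
    # The pipeline returns [{'generated_text': ...}]; keep only the caption string
    output = get_completion(image)
    return output[0]['generated_text']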



Free images are available at [https://free-images.com/](https://free-images.com/).

In [5]:
image_url = "https://free-images.com/sm/9596/dog_animal_greyhound_983023.jpg"

display(IPython.display.Image(url=image_url))
get_completion(image_url)





[{'generated_text': 'a dog wearing a santa hat and a red scarf'}]
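As the output above shows, the result is a list with one dictionary per caption, keyed by `generated_text`. When `get_completion` points at the hosted API instead, a failed request (for example, while the model is still loading) can come back as a JSON object with an `error` key rather than a list. The sketch below handles both shapes under that assumption; `extract_caption` is a hypothetical helper, not part of the original notebook.

```python
from typing import Any


def extract_caption(result: Any) -> str:
    """Return the first caption from a pipeline or API result, or raise a readable error."""
    # Assumed error shape: the hosted API may answer with {"error": "..."} instead of a list
    if isinstance(result, dict) and "error" in result:
        raise RuntimeError(f"Captioning failed: {result['error']}")
    # Expected success shape, as in the output above: [{"generated_text": "..."}]
    if not (
        isinstance(result, list)
        and result
        and isinstance(result[0], dict)
        and "generated_text" in result[0]
    ):
        raise ValueError(f"Unexpected captioning output: {result!r}")
    return result[0]["generated_text"]


# Applied to the output printed above:
print(extract_caption([{"generated_text": "a dog wearing a santa hat and a red scarf"}]))
# prints: a dog wearing a santa hat and a red scarf
```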

# Captioning with `gr.Interface()`

In [None]:
import gradio as gr

def image_to_base64_str(pil_image):
    # Save the PIL image as PNG into an in-memory buffer, then base64-encode it as text
    byte_arr = io.BytesIO()
    pil_image.save(byte_arr, format='PNG')
    byte_arr = byte_arr.getvalue()
    return base64.b64encode(byte_arr).decode('utf-8')

def captioner(image):
    # Gradio passes the upload as a PIL image (type="pil" below)
    base64_image = image_to_base64_str(image)
    result = get_completion(base64_image)
    return result[0]['generated_text']

gr.close_all()  # shut down demo servers left over from earlier runs
demo = gr.Interface(fn=captioner,
                    inputs=[gr.Image(label="Upload image", type="pil")],
                    outputs=[gr.Textbox(label="Caption")],
                    title="Image Captioning with BLIP",
                    description="Caption any image using the BLIP model",
                    allow_flagging="never",
                    # these example files must exist in the notebook's working directory
                    examples=["christmas_dog.jpeg", "bird_flight.jpeg", "cow.jpeg"])

#demo.launch(share=True, server_port=int(os.environ['PORT1']))

demo.launch(share=True)  # share=True also creates a temporary public link
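`captioner` chains three steps: encode the uploaded image as a base64 PNG string, pass it to `get_completion`, and pull out `generated_text`. The sketch below exercises that flow offline, taking already-encoded PNG bytes and a stub completion function in place of the real model. `png_bytes_to_base64_str`, `caption_png_bytes`, and `fake_completion` are hypothetical names, and skipping PIL is a simplification so the example runs with the standard library alone.

```python
import base64
from typing import Callable

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def png_bytes_to_base64_str(png_bytes: bytes) -> str:
    """Same encoding step as image_to_base64_str, minus the PIL save."""
    return base64.b64encode(png_bytes).decode("utf-8")


def caption_png_bytes(png_bytes: bytes, completion: Callable[[str], list]) -> str:
    """Mirror captioner(): encode, call the completion function, return the caption."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("expected PNG-encoded bytes")
    result = completion(png_bytes_to_base64_str(png_bytes))
    return result[0]["generated_text"]


def fake_completion(b64: str) -> list:
    """Stub standing in for the model: checks the payload decodes back to PNG bytes."""
    assert base64.b64decode(b64).startswith(PNG_SIGNATURE)
    return [{"generated_text": "a stub caption"}]


print(caption_png_bytes(PNG_SIGNATURE + b"rest-of-file", fake_completion))
# prints: a stub caption
```

Injecting the completion function keeps the encoding logic testable without a network call or a model download; in the notebook itself, `get_completion` plays that role.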



In [13]:
gr.close_all()

Closing server running on port: 7860
