# Responses API - chat with your own image

The Responses API provides a structured response format that lets a model call multiple tools while keeping context across interactions. It supports:

- Tool calling in one API call: developers can integrate AI tools within a single request.
- Computer use: use the computer use tool within the Responses API to automate software interactions.
- File search: query enterprise data and extract relevant information.
- Function calling: define and invoke custom functions to extend the model's capabilities.
- Chaining responses into conversations: link responses together by their unique response IDs to keep continuity across turns.
- Enterprise-grade data privacy: built on Azure's security and compliance standards to protect organizational data.
  
> https://azure.microsoft.com/en-us/blog/announcing-the-responses-api-and-computer-using-agent-in-azure-ai-foundry/?msockid=2e39c66c693c66a5151fd200687567d0
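
Response chaining from the list above can be sketched as follows: every request after the first carries the ID of the response that preceded it. The helper name `build_request`, the deployment name, and the response ID are hypothetical placeholders; `previous_response_id` is the parameter this notebook passes to `client.responses.create`.

```python
def build_request(model, prompt, previous_response_id=None):
    """Build keyword arguments for client.responses.create (hypothetical helper)."""
    params = {"model": model, "input": [{"role": "user", "content": prompt}]}
    # Only link to an earlier response when there is one; the first turn has none.
    if previous_response_id:
        params["previous_response_id"] = previous_response_id
    return params


# "my-deployment" and "resp_123" are placeholder values, not real identifiers.
first = build_request("my-deployment", "Describe this image.")
follow_up = build_request("my-deployment", "What colour is the car?", "resp_123")
print("previous_response_id" in first, follow_up["previous_response_id"])
```

Because the service keeps the earlier turns, the follow-up request only needs the new prompt plus the ID, not the whole transcript.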

In [1]:
import base64
import gradio as gr
import os
import sys

from openai import AzureOpenAI
from datetime import datetime
from dotenv import load_dotenv

In [2]:
sys.version

'3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]'

In [3]:
print(f"Today is {datetime.today().strftime('%d-%b-%Y %H:%M:%S')}")

Today is 18-Apr-2025 07:30:26


In [4]:
load_dotenv("azure.env")

True

In [5]:
def get_client():
    """
    Retrieves the deployment name and initializes the Azure OpenAI client.

    This function fetches the necessary configuration details from environment variables
    and creates an instance of the AzureOpenAI client.

    Returns:
        tuple: A tuple containing the deployment name (str) and the AzureOpenAI client instance.
    """
    deployment = os.environ["AZURE_OPENAI_API_MODEL"]
    
    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.environ["AZURE_OPENAI_API_VERSION"],
        azure_endpoint=os.environ["AZURE_OPENAI_API_ENDPOINT"])

    return deployment, client
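
`get_client` raises a bare `KeyError` if any setting is missing from `azure.env`. A minimal sketch of a pre-flight check, assuming the four variable names used above are the only ones required; `missing_settings` is a hypothetical helper, not part of the notebook:

```python
import os

# Names taken from get_client(); the helper itself is illustrative only.
REQUIRED_SETTINGS = (
    "AZURE_OPENAI_API_MODEL",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_ENDPOINT",
)


def missing_settings(env=None):
    """Return the required setting names that are absent or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with a partial, made-up configuration
print(missing_settings({"AZURE_OPENAI_API_KEY": "placeholder"}))
```

Calling `missing_settings()` with no argument checks the real environment, so it can run right after `load_dotenv("azure.env")`.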

In [6]:
deployment, client = get_client()

previous_response_id = None

In [7]:
def encode_image(image_path):
    """
    Encodes an image file to a base64 string.

    This function reads an image file from the specified path, encodes its content
    to base64, and returns the encoded string.

    Args:
        image_path (str): The path to the image file to be encoded.

    Returns:
        str: The base64 encoded string of the image content.
    """
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
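
The API receives the image as a data URL, which prefixes the base64 payload with a MIME type. A small sketch of building that URL with the type guessed from the file name; `to_data_url` is a hypothetical helper, and the guess relies on the file extension rather than the file contents:

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Return a data URL for the image, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(image_path)
    # Reject anything that does not look like an image by its extension.
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

For a file named `photo.jpg` the result starts with `data:image/jpeg;base64,`, and for `photo.png` with `data:image/png;base64,`.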

In [8]:
def chat_stream(user_prompt, history, file_path):
    """
    Manages a streaming conversation with the Azure OpenAI client.

    This function initializes the conversation history, processes the user prompt,
    and handles streaming responses from the Azure OpenAI client. It supports
    image uploads by encoding them to base64 and including them in the input payload.

    Args:
        user_prompt (str): The user's input prompt for the conversation.
        history (list): The conversation history, which is a list of message dictionaries.
        file_path (str): The path to an image file to be included in the conversation, if any.

    Yields:
        tuple: The updated conversation history twice, once for the Chatbot
            display and once for the State component.

    Globals:
        deployment (str): The deployment name for the Azure OpenAI model.
        client (AzureOpenAI): The Azure OpenAI client instance.
        previous_response_id (str): The ID of the previous response, read for
            context and updated on every call.
    """
    # Ensure the history list is initialized
    if history is None:
        history = []

    # Add the user prompt to the conversation history with appropriate role
    history.append({"role": "user", "content": user_prompt})

    # Create and add a placeholder for the assistant's reply
    assistant_message = {"role": "assistant", "content": ""}
    history.append(assistant_message)

    # Yield initial state to update the UI
    yield history, history

    # Prepare parameters for the API call, including model name and streaming flag
    global previous_response_id
    params = {
        "model": deployment,
        "input": [{
            "role": "user",
            "content": user_prompt
        }],
        "stream": True
    }

    # Attach the previous response ID for context if available
    if previous_response_id:
        params["previous_response_id"] = previous_response_id

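    # If an image file was uploaded, encode it to base64 and add it to the input payload
    if file_path is not None:
        base64_image = encode_image(file_path)
        # The file picker accepts .jpg, .jpeg, and .png, so match the MIME type to the extension
        extension = os.path.splitext(file_path)[1].lower()
        mime_type = "image/png" if extension == ".png" else "image/jpeg"
        params["input"].append({
            "role": "user",
            "content": [{
                "type": "input_image",
                "image_url": f"data:{mime_type};base64,{base64_image}"
            }]
        })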

    # Initiate the streaming conversation using the client
    stream = client.responses.create(**params)

    # Process each event from the stream to build the complete assistant message
    for event in stream:
        # Record the response id from the first event
        if event.type == 'response.created':
            previous_response_id = event.response.id

        # Append new text received in the stream to the assistant message
        if event.type == 'response.output_text.delta':
            assistant_message["content"] += event.delta
            yield history, history
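
The event loop in `chat_stream` can be exercised without calling the service. The sketch below replays stand-in events shaped like the two event types handled above; `collect_stream` and the fake events are illustrative assumptions, not SDK objects:

```python
from types import SimpleNamespace


def collect_stream(events):
    """Return (response_id, text) assembled from a stream of Responses API events."""
    response_id, text = None, ""
    for event in events:
        if event.type == "response.created":
            # The first event carries the ID used to chain the next turn.
            response_id = event.response.id
        elif event.type == "response.output_text.delta":
            # Each delta is a fragment of the assistant's reply.
            text += event.delta
    return response_id, text


# Stand-in events; real ones come from client.responses.create(..., stream=True).
fake_events = [
    SimpleNamespace(type="response.created", response=SimpleNamespace(id="resp_demo")),
    SimpleNamespace(type="response.output_text.delta", delta="A red "),
    SimpleNamespace(type="response.output_text.delta", delta="car."),
    SimpleNamespace(type="response.completed"),
]
print(collect_stream(fake_events))
```

Events of any other type, such as the final `response.completed`, are ignored, which matches how `chat_stream` only reacts to creation and text deltas.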

In [9]:
def clear_chat():
    """
    Clears the conversation history and resets the previous response ID.

    This function resets the global `previous_response_id` to `None` and returns
    empty lists for the chatbot display and the conversation state, along with `None`
    for the file upload.

    Returns:
        tuple: A tuple containing three elements:
            - An empty list for the Chatbot component.
            - An empty list for the conversation history State.
            - None for the file picker.
    """
    global previous_response_id

    previous_response_id = None
    return [], [], None

In [10]:
def clear_textbox():
    """
    Clears the content of a textbox.

    This function returns an empty string, effectively clearing any text
    that might be present in a textbox.

    Returns:
        str: An empty string.
    """
    return ""

In [11]:
# Build the Gradio Blocks interface for the chat demo
with gr.Blocks() as webapp:
    # Header Markdown text for the demo UI, centered
    gr.Markdown(
        "<h2 style='text-align: center;'>Chat with your image</h2>"
    )
    # Chatbot component to display messages stored in a list of role-content dictionaries
    chatbot = gr.Chatbot(height=500, type="messages")
    # State to maintain the conversation history between messages
    state = gr.State([])
    # Textbox for user input with a placeholder message
    msg = gr.Textbox(show_label=False,
                     placeholder="🚀 Type your query and press Enter")
    # Row containing the Submit and Clear buttons
    with gr.Row():
        submit_btn = gr.Button("🔥Submit")
        clear_btn = gr.Button("Clear")
    # File upload control for image inputs (placed below the buttons)
    file_picker = gr.File(label="✅ Upload an image file",
                          file_count="single",
                          type="filepath",
                          file_types=[".jpg", ".jpeg", ".png"],
                          height=140)
    # Bind the Textbox submit action to the stream processing function and clear the textbox after submission
    msg.submit(fn=chat_stream,
               inputs=[msg, state, file_picker],
               outputs=[chatbot, state]).then(clear_textbox, None, msg)
    # Also bind the submit button to the same functionality as the Textbox submit
    submit_btn.click(fn=chat_stream,
                     inputs=[msg, state, file_picker],
                     outputs=[chatbot, state]).then(clear_textbox, None, msg)
    # Bind the clear button to reset the chat and clear the file upload
    clear_btn.click(fn=clear_chat,
                    inputs=[],
                    outputs=[chatbot, state, file_picker])

In [12]:
webapp.launch(share=True)

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://0708cbd3f465a768c0.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


