<a href="https://colab.research.google.com/github/dayody/Build_Generative_AI_APP/blob/main/Visual_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction
This guide will walk you through creating a web application that can answer questions about an image you upload. We will use the following tools:

Streamlit: To create the web interface for our application.

GPT-4o: OpenAI's multimodal large language model, which interprets the image together with your question to produce an answer.

Google Colab/Jupyter Notebook: As the environment to write and run our code.

ngrok: To create a public URL for our Streamlit app, so you can access it from your browser.




## Install Necessary Libraries

In [1]:
!pip install streamlit openai pyngrok Pillow

Collecting streamlit
  Downloading streamlit-1.47.1-py3-none-any.whl.metadata (9.0 kB)
Collecting pyngrok
  Downloading pyngrok-7.2.12-py3-none-any.whl.metadata (9.4 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.47.1-py3-none-any.whl (9.9 MB)
Downloading pyngrok-7.2.12-py3-none-any.whl (26 kB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (

## Load OpenAI API Credentials

In [2]:
from google.colab import userdata

api_creds = {
    'OPENAI_API_KEY': userdata.get('OPENAI_API_KEY')
}

# You can access the key like this:
# openai_key = api_creds['OPENAI_API_KEY']



In [3]:
import os

os.environ['OPENAI_API_KEY'] = api_creds['OPENAI_API_KEY']
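The cells above rely on `google.colab`, which only exists inside Colab. If you run the notebook in plain Jupyter instead, a fallback along the following lines may help. This is a minimal sketch, not part of the original notebook: the helper name `load_secret` is hypothetical, and it assumes the key is either stored as a Colab secret or already exported as an environment variable.

```python
import os


def load_secret(name):
    """Return a secret from Colab's secret store, else from the environment.

    `load_secret` is a hypothetical helper for illustration only.
    """
    value = None
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Colab raises if the secret is missing or access is not granted;
            # in that case we fall through to the environment variable.
            value = None

    if not value:
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} was not found in Colab or the environment")
    return value
```

With this helper, `os.environ['OPENAI_API_KEY'] = load_secret('OPENAI_API_KEY')` would behave the same in Colab and raise a clear error elsewhere when the key is missing.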

## Write the Frontend (UI) Code and Save It to app.py

In [11]:
%%writefile app.py
import streamlit as st
from PIL import Image
import base64
import os
import requests

# --- Page Setup ---
st.set_page_config(
    page_title="Visual QA Bot",
    page_icon="🖼️",
    layout="centered",
    initial_sidebar_state="auto",
)

st.title("🖼️ Visual QA Bot")
st.write("Upload an image and ask a question about it.")

# --- OpenAI API Key ---
# Use the key typed into the box; otherwise fall back to the OPENAI_API_KEY
# environment variable, if it is set in the app's environment.
openai_api_key = st.text_input("Enter your OpenAI API Key:", type="password") or os.environ.get("OPENAI_API_KEY", "")

# --- Image Upload and Question ---
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
question = st.text_input("Ask a question about the image:")

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    # use_column_width is deprecated in recent Streamlit releases
    st.image(image, caption="Uploaded Image.", use_container_width=True)

# --- Function to encode the image ---
def encode_image(image_file):
    # The file uploader widget in streamlit returns a file-like object
    # that needs to be read.
    file_bytes = image_file.getvalue()
    return base64.b64encode(file_bytes).decode('utf-8')

# --- Main Logic ---
if st.button("Get Answer"):
    if not openai_api_key:
        st.warning("Please enter your OpenAI API key.")
    elif uploaded_file is None:
        st.warning("Please upload an image.")
    elif not question:
        st.warning("Please enter a question.")
    else:
        with st.spinner("Analyzing the image..."):
            try:
                # Getting the base64 string
                base64_image = encode_image(uploaded_file)

                headers = {
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {openai_api_key}"
                }

                payload = {
                    "model": "gpt-4o",
                    "messages": [
                        {
                            "role": "user",
                            "content": [
                                {
                                    "type": "text",
                                    "text": question
                                },
                                {
                                    "type": "image_url",
                                    "image_url": {
                                        # Use the uploaded file's real MIME type (e.g. image/png) instead of assuming JPEG
                                        "url": f"data:{uploaded_file.type};base64,{base64_image}"
                                    }
                                }
                            ]
                        }
                    ],
                    "max_tokens": 300
                }

                # A timeout keeps the spinner from hanging forever if the API does not respond
                response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload, timeout=60)
                response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code

                answer = response.json()['choices'][0]['message']['content']
                st.success("Here's the answer:")
                st.write(answer)

            except requests.exceptions.HTTPError as err:
                st.error(f"HTTP error occurred: {err.response.text}")
            except Exception as e:
                st.error(f"An error occurred: {e}")




Overwriting app.py
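The request logic in `app.py` is easier to check when it is pulled out of the Streamlit callbacks. The sketch below restates the same steps (base64-encode the image, build the chat payload, read the answer from the response) as plain functions. The function names `to_data_url`, `build_payload`, and `extract_answer` are hypothetical and do not appear in the app; the payload shape simply mirrors the one used above.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(question, data_url, model="gpt-4o", max_tokens=300):
    """Build the chat-completions request body used by the app."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def extract_answer(response_json):
    """Pull the assistant's text out of a chat-completions response body."""
    return response_json["choices"][0]["message"]["content"]
```

Because these functions make no network calls, you can exercise them with a few bytes and a hand-written response dictionary before spending API credits.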


## Start the Frontend

In [12]:
!streamlit run app.py --server.port=8989 &>./logs.txt &
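Because the command above starts Streamlit in the background, the cell returns before the server is necessarily ready. One way to confirm that port 8989 is accepting connections is to poll it, as in this minimal sketch. The helper name `wait_for_port` and its default timings are assumptions for illustration, not part of the notebook.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False.

    Retries every `interval` seconds until `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 8989)` should return `True` when the app is up; if it returns `False`, `logs.txt` is the place to look for the startup error.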

## Load the ngrok Authtoken and Open a Tunnel

In [13]:
from pyngrok import ngrok
from google.colab import userdata
import time

# Terminate any tunnels left open by a previous run
ngrok.kill()
time.sleep(5)  # Give the ngrok process a moment to shut down

# Setting the authtoken
# Get your authtoken from Colab secrets
NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPS tunnel to port 8989, the port passed to `streamlit run` via --server.port
ngrok_tunnel = ngrok.connect(8989)
print("Streamlit App:", ngrok_tunnel.public_url)

Streamlit App: https://00071a6cc2b3.ngrok-free.app
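The public URL stays reachable for as long as the tunnel is open, so it is worth closing it when you are done. The sketch below disconnects every open tunnel and then stops the ngrok process. The helper name `close_all_tunnels` is hypothetical; it takes the `ngrok` module as an argument and uses pyngrok's `get_tunnels`, `disconnect`, and `kill` calls.

```python
def close_all_tunnels(ngrok_client):
    """Disconnect every open tunnel, stop ngrok, and return the closed URLs.

    `ngrok_client` is expected to be the `pyngrok.ngrok` module (or any object
    exposing get_tunnels(), disconnect(public_url), and kill()).
    """
    closed = []
    for tunnel in ngrok_client.get_tunnels():
        ngrok_client.disconnect(tunnel.public_url)
        closed.append(tunnel.public_url)
    # Stop the background ngrok process once no tunnels remain.
    ngrok_client.kill()
    return closed


# Usage in the notebook (assumes pyngrok is installed, as in the first cell):
# from pyngrok import ngrok
# print("Closed:", close_all_tunnels(ngrok))
```

Passing the module in as a parameter is only a convenience for this sketch; calling `ngrok.disconnect(ngrok_tunnel.public_url)` followed by `ngrok.kill()` directly would do the same for the single tunnel opened above.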
