<a href="https://colab.research.google.com/github/dayody/Build_Generative_AI_APP/blob/main/Visual_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction
This guide will walk you through creating a web application that can answer questions about an image you upload. We will use the following tools:

Streamlit: To create the web interface for our application.

GPT-4o: OpenAI's multimodal large language model, which interprets the image together with your question and returns an answer.

Google Colab/Jupyter Notebook: As the environment to write and run our code.

ngrok: To create a public URL for our Streamlit app, so you can access it from your browser.




Install Necessary Libraries

In [14]:
!pip install streamlit langchain langchain-openai langchain-community pyngrok Pillow


Load OpenAI API Credentials

In [32]:
from google.colab import userdata

api_creds = {
    'OPENAI_API_KEY': userdata.get('OPENAI_API_KEY')
}

# You can access the key like this:
# openai_key = api_creds['OPENAI_API_KEY']



In [33]:
import os

os.environ['OPENAI_API_KEY'] = api_creds['OPENAI_API_KEY']
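
Before launching the app, it can help to confirm that the key actually reached the environment without printing the secret itself. The following is a minimal sketch and not part of the original notebook; `describe_secret` is a hypothetical helper name, and it reports only whether the variable is set and how long its value is.

```python
import os


def describe_secret(name, env=None):
    """Return a short status string for an environment variable
    without revealing its value."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        return f"{name}: missing"
    # Report only the length, never the secret itself.
    return f"{name}: set ({len(value)} characters)"


print(describe_secret("OPENAI_API_KEY"))
```

If the output says `missing`, the Streamlit app started later will show its own "OPENAI_API_KEY is not set" error, since it reads the same environment variable.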

Write the frontend (UI) code and save it to a .py file

In [34]:
%%writefile app.py
import streamlit as st
from PIL import Image
import base64
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# --- Page Setup ---
st.set_page_config(
    page_title="Visual QA BOT",
    page_icon="🔗",
    layout="centered",
    initial_sidebar_state="auto",
)

st.title("🤖 Visual QA BOT")
st.write("🚀 Visual question answering with multimodal LLMs.")
st.header("🔮 Upload an image to test responses.")

# --- Image Upload and Question ---
question = st.text_input("Ask any question about the image:")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image.", use_container_width=True)

# --- Function to encode the image ---
def encode_image(image_file):
    # The file uploader widget in streamlit returns a file-like object
    # that needs to be read.
    file_bytes = image_file.getvalue()
    return base64.b64encode(file_bytes).decode('utf-8')

# --- Main Logic ---
if st.button("Get Answer"):
    if not os.environ.get("OPENAI_API_KEY"):
        st.error("OPENAI_API_KEY is not set. Please set it in your environment.")
    elif uploaded_file is None:
        st.warning("Please upload an image.")
    elif not question:
        st.warning("Please enter a question.")
    else:
        with st.spinner("Analyzing the image..."):
            try:
                # Initialize the ChatOpenAI model. It will automatically use the API key from the environment.
                llm = ChatOpenAI(model="gpt-4o")

                # Getting the base64 string
                base64_image = encode_image(uploaded_file)

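                # Create the HumanMessage for multimodal input, labelling the data URL
                # with the upload's actual MIME type (e.g. image/png) rather than assuming JPEG
                mime_type = uploaded_file.type or "image/jpeg"
                message = HumanMessage(
                    content=[
                        {"type": "text", "text": question},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
                        },
                    ]
                )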

                # Invoke the model
                response = llm.invoke([message])

                st.success("Here's the answer:")
                st.write(response.content)

            except Exception as e:
                st.error(f"An error occurred: {e}")



Overwriting app.py
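
The app sends the image to the model as a base64 data URL of the form `data:<mime type>;base64,<payload>`. The standalone sketch below shows how such a URL can be built outside Streamlit, guessing the MIME type from the file name with the standard library. `to_data_url` and its `default_mime` parameter are hypothetical names introduced for illustration; they do not appear in `app.py`.

```python
import base64
import mimetypes


def to_data_url(file_bytes, filename, default_mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URL.

    The MIME type is guessed from the file extension; anything that is
    not recognised as an image falls back to `default_mime`.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    encoded = base64.b64encode(file_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Tiny placeholder payload; a real call would pass the bytes of an image file.
print(to_data_url(b"hello", "photo.png"))
```

Guessing from the extension is only a heuristic: a mislabelled file would still get the wrong type, so inspecting the bytes (for example with Pillow) may be preferable when the file name cannot be trusted.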


Start the frontend

In [35]:
!streamlit run app.py --server.port=8989 &>./logs.txt &
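
Because the command above starts Streamlit in the background, the server may not yet be accepting connections when the next cell runs. One way to check, sketched here under the assumption that the app listens on 127.0.0.1 at the port passed to `--server.port`, is to poll the port with the standard library. `wait_for_port` is a hypothetical helper, not part of Streamlit or pyngrok.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as something accepts the connection,
    or False if nothing does before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port(8989)` before opening the tunnel returns `True` once the server is reachable. If it returns `False`, `logs.txt` is the first place to look for a startup error.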

Load ngrok Authtoken Credentials and Open the Tunnel

In [36]:
from pyngrok import ngrok
from google.colab import userdata
import time

# Terminate any tunnels left open by a previous run
ngrok.kill()
time.sleep(5)  # Give the ngrok process a moment to shut down

# Read the ngrok authtoken from Colab secrets and register it
NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPS tunnel to port 8989, the port passed to `streamlit run` via --server.port
ngrok_tunnel = ngrok.connect(8989)
print("Streamlit App:", ngrok_tunnel.public_url)

Streamlit App: https://936554369860.ngrok-free.app
