
# Multi‑Modal AI Chatbot — **Task 3 (NullClass Internship)**
**Modes:**  
- **Image → Text** (Gemini 1.5)  
- **Text → Image** (OpenAI `gpt-image-1`)  
- **Text → Image** (Stability AI SDK)

> **Note:** This notebook is intended as the required `.ipynb` deliverable. It demonstrates how to run the app, documents dependencies, and provides evaluation placeholders. API calls require valid keys and internet access in your environment.



## Submission Requirements Mapping
- ✅ `.ipynb` notebook (this file) — **included**
- ✅ `requirements.txt` — **ensure in repo**
- ✅ GUI (Streamlit) — **provided as `app.py`**
- ✅ README with setup & screenshots — **add in repo**
- ⬜ Evaluation notes (qualitative/quantitative) — **section provided below**
- ⬜ Originality & disclaimer notes — **add in README/UI**


## 1. Install Dependencies

```python
# Run this once in a clean environment (uncomment as needed)
# !pip install --upgrade pip
# !pip install streamlit google-generativeai openai pillow requests stability-sdk pandas
```
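The requirements mapping asks for a `requirements.txt` in the repo. Here is a minimal sketch that writes one from the packages in the install line above. It also includes `pandas`, which the evaluation cell imports. The helper name `write_requirements` is hypothetical. Versions are deliberately left unpinned; pinning them after a successful run (for example from `pip freeze`) would make the deliverable more reproducible.

```python
from pathlib import Path

# Package names from the install cell, plus pandas for the evaluation table.
# Unpinned on purpose (assumption): pin versions once the notebook runs cleanly.
REQUIREMENTS = [
    "streamlit",
    "google-generativeai",
    "openai",
    "pillow",
    "requests",
    "stability-sdk",
    "pandas",
]


def write_requirements(path: str = "requirements.txt") -> Path:
    """Write one package name per line and return the path that was written."""
    out = Path(path)
    out.write_text("\n".join(REQUIREMENTS) + "\n", encoding="utf-8")
    return out
```

Calling `write_requirements()` once creates `requirements.txt` in the notebook's working directory. Commit that file alongside `app.py`.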


## 2. Imports

```python
import os
import io
from PIL import Image
import requests

# Imported here only to confirm the app's dependencies are installed;
# app.py imports what it needs itself. Each line raises ImportError
# if the corresponding package is missing.
import streamlit as st  # noqa: F401
import google.generativeai as genai  # noqa: F401
import openai  # noqa: F401

from stability_sdk import client  # noqa: F401
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation  # noqa: F401
```
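Each import above stops at the first missing package. For a full report, a sketch built on the standard library's `importlib.util.find_spec` can list every missing package without importing any of them. The `DEPENDENCIES` mapping (import name to pip name) is assembled from the install cell. It is an assumption that has to be kept in sync with that cell.

```python
import importlib.util

# Import name -> pip package name (mirrors the install cell; keep in sync).
DEPENDENCIES = {
    "streamlit": "streamlit",
    "google.generativeai": "google-generativeai",
    "openai": "openai",
    "PIL": "pillow",
    "requests": "requests",
    "stability_sdk": "stability-sdk",
    "pandas": "pandas",
}


def missing_packages(deps: dict[str, str] = DEPENDENCIES) -> list[str]:
    """Return the pip names of packages whose import name cannot be located."""
    missing = []
    for module_name, pip_name in deps.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # find_spec raises for a dotted name whose parent package is absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


print("Missing packages:", missing_packages() or "none")
```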


## 3. API Keys (Environment Variables Recommended)

```python
# Set via environment variables or pass securely from a .env file
# os.environ['GEMINI_API_KEY'] = '...'
# os.environ['OPENAI_API_KEY'] = '...'
# os.environ['STABILITY_API_KEY'] = '...'

GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
STABILITY_API_KEY = os.getenv('STABILITY_API_KEY')

print('Gemini Key set:', bool(GEMINI_API_KEY))
print('OpenAI Key set:', bool(OPENAI_API_KEY))
print('Stability Key set:', bool(STABILITY_API_KEY))
```
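The cell above only reports whether each key is present. A small, hypothetical `require_key` helper could instead stop a cell immediately with a readable message. Without such a check, the provider SDK typically fails later with its own authentication error.

```python
import os


def require_key(env_var: str) -> str:
    """Return the API key stored in env_var (whitespace stripped), or raise a clear error."""
    value = os.getenv(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or .env file "
            "before running the API cells."
        )
    return value
```

For example, `genai.configure(api_key=require_key("GEMINI_API_KEY"))` fails fast in the Gemini placeholder cell when the key is missing.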


## 4. Streamlit App Code (written to `app.py`)

```python
app_code = r"""
import base64
import io
import os

import streamlit as st
import google.generativeai as genai
import openai
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

# -------------------- UI HEADER --------------------
st.set_page_config(page_title="MultiModal AI Tool", page_icon="🤖", layout="centered")
st.title("🖼️ MultiModal AI Tool")
st.write("Convert images to text (Gemini) and text to images (OpenAI gpt-image-1 / Stability AI)")

# -------------------- API KEY INPUTS --------------------
# A key typed in the sidebar takes precedence; otherwise fall back to the
# environment variables used in the notebook.
st.sidebar.header("🔑 API Keys")
gemini_api_key = st.sidebar.text_input("Gemini API Key", type="password") or os.getenv("GEMINI_API_KEY")
openai_api_key = st.sidebar.text_input("OpenAI API Key", type="password") or os.getenv("OPENAI_API_KEY")
stability_api_key = st.sidebar.text_input("Stability AI API Key", type="password") or os.getenv("STABILITY_API_KEY")

# -------------------- MODE SELECTION --------------------
mode = st.radio("Select Mode", ["Image to Text (Gemini)", "Text to Image (OpenAI)", "Text to Image (Stability AI)"])

# -------------------- GEMINI: IMAGE TO TEXT --------------------
if mode == "Image to Text (Gemini)":
    uploaded_image = st.file_uploader("Upload an Image", type=["jpg", "jpeg", "png"])
    if not gemini_api_key:
        st.info("Enter a Gemini API key in the sidebar to continue.")
    elif uploaded_image:
        image = Image.open(uploaded_image)
        st.image(image, caption="Uploaded Image", use_column_width=True)

        if st.button("Generate Description"):
            with st.spinner("Generating description..."):
                try:
                    genai.configure(api_key=gemini_api_key)
                    model = genai.GenerativeModel("gemini-1.5-flash")
                    response = model.generate_content([image, "Describe this image in detail."])
                    st.subheader("Image Description:")
                    st.write(response.text)
                except Exception as e:
                    st.error(f"Error: {e}")

# -------------------- OPENAI: TEXT TO IMAGE --------------------
elif mode == "Text to Image (OpenAI)":
    prompt = st.text_area("Enter your prompt")
    if not openai_api_key:
        st.info("Enter an OpenAI API key in the sidebar to continue.")
    elif prompt and st.button("Generate Image (OpenAI)"):
        with st.spinner("Generating image..."):
            try:
                openai.api_key = openai_api_key
                result = openai.images.generate(
                    model="gpt-image-1",
                    prompt=prompt,
                    size="1024x1024",
                )
                # gpt-image-1 returns base64-encoded image data, not a URL
                image_bytes = base64.b64decode(result.data[0].b64_json)
                img = Image.open(io.BytesIO(image_bytes))
                st.image(img, caption="Generated Image (gpt-image-1)", use_column_width=True)
            except Exception as e:
                st.error(f"Error: {e}")

# -------------------- STABILITY AI: TEXT TO IMAGE --------------------
elif mode == "Text to Image (Stability AI)":
    prompt = st.text_area("Enter your prompt")
    if not stability_api_key:
        st.info("Enter a Stability AI API key in the sidebar to continue.")
    elif prompt and st.button("Generate Image (Stability AI)"):
        with st.spinner("Generating image..."):
            try:
                # Create the gRPC client only when a request is actually made
                stability_api = client.StabilityInference(key=stability_api_key, verbose=True)
                answers = stability_api.generate(
                    prompt=prompt,
                    steps=30,
                    cfg_scale=8.0,
                    width=512,
                    height=512,
                    samples=1,
                    sampler=generation.SAMPLER_K_DPMPP_2M,
                )
                for resp in answers:
                    for artifact in resp.artifacts:
                        if artifact.finish_reason == generation.FILTER:
                            st.warning("The request triggered the API's safety filter; no image was returned.")
                        elif artifact.type == generation.ARTIFACT_IMAGE:
                            img = Image.open(io.BytesIO(artifact.binary))
                            st.image(img, caption="Generated Image (Stability AI)", use_column_width=True)
            except Exception as e:
                st.error(f"Error: {e}")
"""

# Write app.py to the current working directory, overwriting any existing copy
with open('app.py', 'w', encoding='utf-8') as f:
    f.write(app_code)

print("app.py written successfully.")
```
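Because `app_code` is a plain string, a typo inside it only surfaces when Streamlit runs the app. One way to catch syntax errors earlier is to parse the written file with the standard `ast` module. This checks syntax without executing anything, so no imports or API calls happen. `check_python_syntax` is a hypothetical helper name.

```python
import ast
from pathlib import Path


def check_python_syntax(path: str) -> tuple[bool, str]:
    """Parse the file without executing it and return (ok, message)."""
    source = Path(path).read_text(encoding="utf-8")
    try:
        ast.parse(source, filename=path)
    except SyntaxError as err:
        return False, f"{path}:{err.lineno}: {err.msg}"
    return True, f"{path}: syntax OK"
```

Run `print(check_python_syntax("app.py")[1])` right after writing the file. Note that this catches only syntax errors, not misspelled names or missing packages.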


## 5. How to Run the App

```python
# From a terminal in this folder:
# streamlit run app.py
#
# Inside Jupyter (may not work in all environments; the cell blocks until the server is stopped):
# !streamlit run app.py --server.headless true --server.port 8501
```
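Running `!streamlit run` in a cell blocks the notebook until the server stops. The sketch below instead starts the same headless server as a background process, using `subprocess.Popen` and `python -m streamlit` with the kernel's own interpreter. The function names are hypothetical. The port matches the example above.

```python
import subprocess
import sys


def streamlit_command(script: str = "app.py", port: int = 8501) -> list[str]:
    """Build the headless `streamlit run` command using this kernel's Python interpreter."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.headless", "true",
        "--server.port", str(port),
    ]


def launch_streamlit(script: str = "app.py", port: int = 8501) -> subprocess.Popen:
    """Start Streamlit in the background; call .terminate() on the result to stop it."""
    return subprocess.Popen(streamlit_command(script, port))


# proc = launch_streamlit()   # then open http://localhost:8501 in a browser
# proc.terminate()            # stop the server when finished
```

Using `sys.executable` ensures the server runs with the same environment the notebook's packages were installed into.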


## 6. Optional: Minimal API Call Placeholders (Run only with valid keys)

```python
# GEMINI (Image → Text): requires a key and internet access
# import google.generativeai as genai
# genai.configure(api_key=GEMINI_API_KEY)
# model = genai.GenerativeModel("gemini-1.5-flash")
# img = Image.open('your_image.jpg')
# resp = model.generate_content([img, "Describe this image in detail."])
# print(resp.text)

# OPENAI (Text → Image): gpt-image-1 returns base64 data rather than a URL,
# and 512x512 is not a supported size (use 1024x1024, 1536x1024, 1024x1536 or "auto")
# import base64
# import openai
# openai.api_key = OPENAI_API_KEY
# out = openai.images.generate(model='gpt-image-1', prompt='A futuristic city', size='1024x1024')
# img = Image.open(io.BytesIO(base64.b64decode(out.data[0].b64_json)))
# img
```
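Because `gpt-image-1` returns base64 data rather than a URL, you have to write the decoded bytes to a file to keep a result (for the README screenshots, for instance). This standard-library sketch decodes the payload and picks a file extension from the image's magic bytes. The helper names are hypothetical. The set of formats checked (PNG, JPEG, WebP) is an assumption about what the API may return.

```python
import base64


def decode_b64_image(b64_data: str) -> tuple[bytes, str]:
    """Decode a base64 image payload and infer its format from the leading bytes."""
    raw = base64.b64decode(b64_data, validate=True)
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return raw, "png"
    if raw.startswith(b"\xff\xd8\xff"):
        return raw, "jpeg"
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return raw, "webp"
    raise ValueError("Decoded data is not a PNG, JPEG or WebP image")


def save_b64_image(b64_data: str, stem: str = "generated") -> str:
    """Write the decoded image to <stem>.<format> in the working directory; return the filename."""
    raw, fmt = decode_b64_image(b64_data)
    filename = f"{stem}.{fmt}"
    with open(filename, "wb") as f:
        f.write(raw)
    return filename


# Usage with the OpenAI response from the cell above (requires a real API call):
# print(save_b64_image(out.data[0].b64_json, stem="futuristic_city"))
```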



## 7. Evaluation (Add Observations/Numbers)
Suggested criteria (qualitative/quantitative):
- **Latency**: wall-clock response time per request, for each mode.
- **Relevance/Coherence**: human judgment (1–5 scale) over 10 prompts/images.
- **Image Quality** (generation modes only): rate visual appeal and prompt alignment (1–5 scale).
- **Failure Handling**: whether clear error messages appear when keys are missing or API limits are exceeded.

> Create a small table of 10 test cases and record scores per mode.
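To fill the latency column and record failures without interrupting a batch of test cases, a hypothetical `timed` wrapper can run any call once. It captures the elapsed wall-clock time plus any exception. It uses `time.perf_counter`, which is intended for measuring short intervals. The measured latency includes network time, so it approximates what a user of the app would wait.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float, str]:
    """Run fn once and return (result or None, elapsed seconds, error message or '')."""
    start = time.perf_counter()
    try:
        result, error = fn(*args, **kwargs), ""
    except Exception as exc:  # record the failure instead of stopping the evaluation loop
        result, error = None, f"{type(exc).__name__}: {exc}"
    return result, time.perf_counter() - start, error
```

For example, with the Gemini model from section 6: `resp, latency, err = timed(model.generate_content, [img, "Describe this image in detail."])`. A non-empty `err` is a data point for the Failure Handling criterion.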


```python
# Example evaluation table scaffold (fill in manually after running real tests).
# Five example rows are shown; extend it to the 10 cases suggested above.
import pandas as pd

df = pd.DataFrame({
    'case_id': range(1, 6),
    'mode': ['Image->Text', 'Image->Text', 'Text->Image(OpenAI)', 'Text->Image(Stability)', 'Text->Image(OpenAI)'],
    'prompt_or_image': ['sample1.jpg', 'sample2.jpg', 'A cozy cabin', 'A neon dragon', 'A robot in a garden'],
    'latency_sec': [None]*5,
    'relevance_score_1to5': [None]*5,
    'image_quality_1to5': [None]*5,  # leave empty for Image->Text rows
    'notes': ['']*5
})
df
```
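Once the table is filled in, per-mode averages are a natural summary for the evaluation notes. Below is a standard-library sketch that works on `df.to_dict("records")` and skips cells that are still empty (`None`, or `NaN` after pandas converts a partly filled column). `summarize_by_mode` is a hypothetical name. The default columns match the scaffold, and you can pass others (such as `image_quality_1to5`) explicitly.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional

DEFAULT_COLUMNS = ("latency_sec", "relevance_score_1to5")


def summarize_by_mode(
    rows: Iterable[dict], columns: tuple = DEFAULT_COLUMNS
) -> dict[str, dict[str, Optional[float]]]:
    """Average each numeric column per mode, skipping None and NaN placeholders."""
    grouped: dict = defaultdict(lambda: {col: [] for col in columns})
    for row in rows:
        bucket = grouped[row["mode"]]  # register the mode even if it has no scores yet
        for col in columns:
            value = row.get(col)
            if value is not None and value == value:  # NaN != NaN, so NaN is skipped
                bucket[col].append(value)
    return {
        mode: {col: (mean(vals) if vals else None) for col, vals in bucket.items()}
        for mode, bucket in grouped.items()
    }
```

Usage: `summarize_by_mode(df.to_dict("records"))` returns one dictionary per mode. Columns with no recorded scores come back as `None`.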



## 8. Ethical & Usage Notes
- Outputs are **AI-generated** and may be inaccurate. Verify before use in critical contexts.
- Respect content policies and copyright for generated/processed media.
- Add attribution where required by API providers.
