<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_t81_559_class9.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative AI
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 9 Assignment: MultiModal Models**

**Student Name: Your Name**

# Google CoLab Instructions

If you are using Google CoLab, you must mount your GDrive so that the submit function can read your notebook file. Running the following code maps your GDrive to `/content/drive`.

In [None]:
import os

try:
  from google.colab import drive, userdata
  drive.mount('/content/drive', force_remount=True)
  COLAB = True
  print("Note: using Google CoLab")
except ImportError:
  print("Note: not using Google CoLab")
  COLAB = False

# Assignment Submission Key - This was sent to you during the first week of class.
# If you are in both classes, this is the same key.
if COLAB:
  # For Colab, add to your "Secrets" (key icon at the left)
  key = userdata.get('T81_559_KEY')
else:
  # If not colab, enter your key here, or use an environment variable.
  # (this is only an example key, use yours)
  key = ""

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain openai langchain_openai

Mounted at /content/drive
Note: using Google CoLab

# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems.

**It is unlikely that you will need to modify this function.**

In [None]:
import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io
from typing import List, Union

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# course - The course that you are in, currently t81-558 or t81-559.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
#               The number must match your assignment number.  For example "_class2" for class assignment #2.

def submit(
    data: List[Union[pd.DataFrame, PIL.Image.Image]],
    key: str,
    course: str,
    no: int,
    source_file: str = None
) -> None:
    if source_file is None and '__file__' not in globals():
        raise Exception("Must specify a filename when in a Jupyter notebook.")
    if source_file is None:
        source_file = __file__

    suffix = f'_class{no}'
    if suffix not in source_file:
        raise Exception(f"{suffix} must be part of the filename.")

    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb', '.py']:
        raise Exception(f"Source file is {ext}; must be .py or .ipynb")

    with open(source_file, "rb") as file:
        encoded_python = base64.b64encode(file.read()).decode('ascii')

    payload = []
    for item in data:
        if isinstance(item, PIL.Image.Image):
            buffered = io.BytesIO()
            item.save(buffered, format="PNG")
            payload.append({'PNG': base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif isinstance(item, pd.DataFrame):
            payload.append({'CSV': base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
        else:
            raise ValueError(f"Unsupported data type: {type(item)}")

    response = requests.post(
        "https://api.heatonresearch.com/wu/submit",
        headers={'x-api-key': key},
        json={
            'payload': payload,
            'assignment': no,
            'course': course,
            'ext': ext,
            'py': encoded_python
        }
    )

    if response.status_code == 200:
        print(f"Success: {response.text}")
    else:
        print(f"Failure: {response.text}")
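Before calling `submit`, it can help to check the source file name locally, since the function rejects names that lack the `_class{no}` suffix or a `.py`/`.ipynb` extension. The sketch below mirrors those two checks; `check_source_name` is a hypothetical helper, not part of the course code.

```python
import os

def check_source_name(source_file: str, no: int) -> list:
    """Return a list of problems with the file name (empty if it looks fine)."""
    problems = []

    # Same suffix rule as submit(): "_class9" for assignment 9, and so on.
    suffix = f"_class{no}"
    if suffix not in source_file:
        problems.append(f"{suffix} must be part of the filename.")

    # Same extension rule as submit(): only notebooks and Python scripts.
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in [".ipynb", ".py"]:
        problems.append(f"Source file is {ext}; must be .py or .ipynb")

    return problems

print(check_source_name("assignment_yourname_t81_559_class9.ipynb", 9))
print(check_source_name("assignment_yourname_t81_559.txt", 9))
```

An empty list means the name should pass the checks inside `submit`; the file must still exist at that path for the upload to work.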

# Assignment Instructions

For this assignment you are provided with 10 image files containing WebCam pictures taken at the [Venice Sidewalk Cafe](https://www.westland.net/beachcam/), a WebCam that has been in operation since 1996.  You can find the 10 images here:

* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk1.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk2.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk3.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk4.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk5.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk6.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk7.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk8.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk9.jpg
* https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk10.jpg

You can see a sample of the WebCam here:

![alt text](https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk1.jpg)


Your submitted data frame should contain the following columns:

* image - The image number, 1 through 10.
* crowded - Is this image crowded with people? (1=yes, 0=no)
* cars - Are there cars in this image? (1=yes, 0=no)
* bikes - Are there bikes in this image? (1=yes, 0=no)

Your submitted data frame should also contain a column that identifies which image generated each row.  This column should be named **image** and contain integer numbers between 1 and 10.  There should be 10 rows in total.  The complete data frame should look something like this (not necessarily exactly these numbers).

|image|crowded|cars|bikes|
|-|-|-|-|
|1|0|0|1|
|2|0|1|1|
|3|1|0|0|
|...|...|...|...|
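
One way to catch formatting problems before submitting is to check the rows against the layout above. The following sketch uses plain dictionaries so it runs without pandas; `validate_rows` is a hypothetical helper, and the rules it enforces (ten rows, image numbers 1 through 10, each flag 0 or 1) are taken from the column descriptions above.

```python
def validate_rows(rows):
    """Return a list of problems found in the rows (empty if none)."""
    problems = []

    # The assignment expects exactly one row per image.
    if len(rows) != 10:
        problems.append(f"expected 10 rows, found {len(rows)}")

    # The image column must cover the numbers 1 through 10.
    seen = [row.get("image") for row in rows]
    if set(seen) != set(range(1, 11)):
        problems.append("image column must contain each of 1 through 10")

    # Each flag column must hold 0 or 1.
    for row in rows:
        for column in ("crowded", "cars", "bikes"):
            if row.get(column) not in (0, 1):
                problems.append(f"image {row.get('image')}: {column} must be 0 or 1")

    return problems

# Placeholder values only; real values come from the model.
rows = [{"image": i, "crowded": 0, "cars": 0, "bikes": 1} for i in range(1, 11)]
print(validate_rows(rows))
```

With pandas, `df.to_dict("records")` produces rows in this shape, so the same check can be run on a finished data frame.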



### Example MultiModal Code

You should use a MultiModal model to obtain the data for each of the 10 images. You should be able to construct a single prompt that gets you the three needed values for each item. I suggest you use the "gpt-4o-mini" model with a temperature of 0.1. You will need to develop a prompt that looks for each of the requested values.
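
As a sketch of the single-prompt approach, the prompt can ask for a JSON object so that the reply is easy to parse. The prompt wording and the `parse_reply` helper below are illustrative assumptions, not a required solution; the parser also tolerates extra text around the JSON object, which chat models sometimes add.

```python
import json
import re

# Hypothetical prompt wording; adjust it to suit your own results.
PROMPT = (
    "Is the image crowded with people? Are there any cars? Are there any bikes? "
    'Answer only with JSON such as {"crowded": 0, "cars": 1, "bikes": 0}, '
    "using 1 for yes and 0 for no."
)

def parse_reply(text):
    """Extract the three 0/1 values from a model reply containing a JSON object."""
    # Grab everything from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in reply: {text!r}")
    data = json.loads(match.group(0))
    # Convert each value to int so that true/false also become 1/0.
    return {name: int(data[name]) for name in ("crowded", "cars", "bikes")}

print(parse_reply('Here is the answer: {"crowded": 1, "cars": 0, "bikes": 1}'))
```

Parsing with `json.loads` rather than `eval` avoids executing whatever text the model returns.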

The following code shows an example of running a MultiModal model with a prompt on the first image.

In [None]:
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
import base64
import httpx
import textwrap

MODEL = "gpt-4o-mini"
image_url = 'https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk1.jpg'
prompt = "Describe this image."

# Initialize the GPT model (the assignment suggests a temperature of 0.1)
model = ChatOpenAI(model=MODEL, temperature=0.1)

# Fetch image data and encode it in base64
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

# Create a message with both text and the image
message = HumanMessage(
    content=[
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
    ],
)

# Send the message to the model and get its response
response = model.invoke([message])

# Wrap the text output to avoid scrolling off the screen in Colab
wrapped_output = textwrap.fill(response.content, width=80)
print(wrapped_output)

The image depicts a sunny day at a beach scene, likely at a popular coastal
area. In the foreground, there is a path along the beach where people are
walking and engaging in various activities. Palm trees line the area, providing
a tropical vibe.   To the left, there appears to be some beach equipment and
possibly a food stand, indicated by a white canopy. The sandy beach stretches
out towards the water, which is visible in the background. A few individuals are
seen sitting on benches or walking along the path, and there are some small
structures or tents set up nearby. The overall atmosphere looks lively and
casual, typical of a beach setting.
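
The example sends the image as a base64 data URL with the MIME type hardcoded to `image/jpeg`, which matches the `.jpg` files used here. If other formats were ever needed, a sketch like the following could derive the type from the file name; `to_data_url` is a hypothetical helper, and the bytes below are placeholder data rather than a real image.

```python
import base64
import mimetypes

def to_data_url(filename, raw_bytes):
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {filename}")
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Placeholder bytes stand in for the downloaded image content.
print(to_data_url("sidewalk1.jpg", b"abc"))  # data:image/jpeg;base64,YWJj
```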


### Solution

In [None]:
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
import base64
import httpx
import json
import pandas as pd

MODEL = "gpt-4o-mini"
model = ChatOpenAI(model=MODEL, temperature=0.1)

# You must identify your source file.  (modify for your local setup)
# file="/content/drive/My Drive/Colab Notebooks/assignment_solution_t81_559_class9.ipynb"  # Google CoLab
# file='C:\\Users\\jeffh\\projects\\t81_559_deep_learning\\assignments\\assignment_yourname_t81_559_class9.ipynb'  # Windows
# file='/Users/jheaton/projects/t81_559_deep_learning/assignments/assignment_yourname_t81_559_class9.ipynb'  # Mac/Linux

file="/content/drive/My Drive/Colab Notebooks/assignment_ZhijiangLi_t81_559_class9.ipynb"

# URLs of the 10 images, sidewalk1.jpg through sidewalk10.jpg
image_urls = [
    f"https://data.heatonresearch.com/data/t81-558/sidewalk/sidewalk{i}.jpg"
    for i in range(1, 11)
]

# A single prompt that requests all three values as JSON
prompt = (
    "Is the image crowded with people? Are there any cars? Are there any bikes? "
    "Answer only in this exact JSON format: "
    '{"crowded": 1 or 0, "cars": 1 or 0, "bikes": 1 or 0}.'
)


def analyze_image(image_url):
    """Ask the model about one image and return its answers as a dict."""
    image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
    message = HumanMessage(content=[
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
    ])
    response = model.invoke([message])
    # Parse only the JSON object, in case the model wraps it in extra text.
    # json.loads is used instead of eval, which would execute the model's reply.
    text = response.content
    start, end = text.find("{"), text.rfind("}")
    return json.loads(text[start:end + 1])


results = []
for i, url in enumerate(image_urls, start=1):
    try:
        result = analyze_image(url)
        result["image"] = i
        results.append(result)
    except Exception as e:
        # Keep a placeholder row so the data frame still has 10 rows, and report the failure.
        print(f"Image {i} failed: {e}")
        results.append({"image": i, "crowded": None, "cars": None, "bikes": None})


# The assignment requires the columns image, crowded, cars, and bikes.
df = pd.DataFrame(results)
df = df[["image", "crowded", "cars", "bikes"]]


## Submit assignment

submit(source_file=file,data=[df],key=key,course='t81-559',no=9)



Success: Submitted Assignment 9 (t81-559) for l.zhijiang:
You have submitted this assignment 9 times. (this is fine)
