In [1]:
prompt = """For each screenshot, extract the following. Activity is the activity being performed in the screenshot. Description is a more detailed explanation of what is going on. There are only 7 activity classifications: each screenshot must be labeled as one of [Coding, Browsing, Meeting, Communicating, Scheduling, Chatting, Off-Topic].

Communicating is when someone has an application like Microsoft Teams open to the chat bar, or Slack messages, or Discord. Meeting is when the user appears to be on Zoom or in some other video conference. Scheduling is when a calendar-type app is open. Chatting is when an AI assistant like ChatGPT or Claude is open. For Coding, note in the description the name of the project folder that is open and the name of the file being edited.


EXAMPLES: 


Example input 1:

{screenshot.png} (imagine this example image is a screenshot of someone editing helloworld.py in VS Code, but the file is only half complete)


Output:

Activity: Coding

Description: Visual Studio Code is open and helloworld.py is being written. It appears to be in progress.



Example input 2:

{screenshot2.png} (imagine this is a screenshot of a Stack Overflow page about fixing a ValueError)


Output:

Activity: Browsing

Description: Stack Overflow is open in the web browser. The current page explains how to fix a ValueError in Python."""
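
The prompt restricts Activity to a fixed list, but nothing guarantees the model will stay inside it. A minimal sketch of a post-hoc check follows; `ALLOWED_ACTIVITIES` and `normalize_activity` are hypothetical names that are not part of the original notebook, and the labels are copied from the prompt above.

```python
# Hypothetical helper, not part of the original notebook.
# The allowed labels are copied from the prompt text.
ALLOWED_ACTIVITIES = {
    "Coding", "Browsing", "Meeting", "Communicating",
    "Scheduling", "Chatting", "Off-Topic",
}


def normalize_activity(label):
    """Return the canonical label, or None if it is not in the allowed set."""
    cleaned = label.strip().lower()
    for allowed in ALLOWED_ACTIVITIES:
        if allowed.lower() == cleaned:
            return allowed
    return None


print(normalize_activity(" coding "))     # matches an allowed label
print(normalize_activity("Programming"))  # not in the list
```

Returning `None` rather than raising leaves it to the caller to decide whether to re-prompt, discard the result, or fall back to Off-Topic.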

In [2]:
import base64
import requests
import os

# OpenAI API Key
api_key = os.environ.get("OPENAI_API_KEY")

def explain_images(image_paths, prompt, api_key):
    # Function to encode a single image to base64
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    # Encoding all images
    base64_images = [encode_image(path) for path in image_paths]

    # Setting up headers for the API request
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    # Constructing the payload with multiple images
    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    }
                # The example screenshots are PNG files, so declare them as image/png
                ] + [{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image}"}} for image in base64_images]
            }
        ],
        "max_tokens": 300
    }

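    # Sending the request and returning the parsed JSON body.
    # requests.post returns a Response object, which cannot be indexed directly,
    # so decode it with .json() and let the caller pull out the message content.
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    response.raise_for_status()
    return response.json()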


# Example usage
image_paths = ["./screen1.png", "./screen2.png", "screen3.png", "screen4.png", "screen5.png"]
response = explain_images(image_paths, prompt, api_key)

# Do something with the response, like printing it
print(response)


FileNotFoundError: [Errno 2] No such file or directory: './screen1.png'
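
The error above comes from opening a path that does not exist relative to the notebook's working directory. One way to surface this before any encoding or API call is to check the paths up front. This is a minimal sketch; `split_existing` is a hypothetical helper, not something defined in the original notebook.

```python
from pathlib import Path


def split_existing(image_paths):
    """Partition paths into (found, missing) based on whether each is a file."""
    found, missing = [], []
    for path in image_paths:
        if Path(path).is_file():
            found.append(path)
        else:
            missing.append(path)
    return found, missing


# Example usage with the first path from the cell above
found, missing = split_existing(["./screen1.png"])
if missing:
    print(f"Missing files (cwd is {Path.cwd()}): {missing}")
```

Printing the current working directory alongside the missing paths usually makes it clear whether the files are absent or the notebook was simply started from a different folder.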

In [38]:
print(response["choices"][0]["message"]["content"])

For the first image:

Activity: Coding
Description: Visual Studio Code is open with a file named "test.ipynb". The user appears to have just started typing with the content "bob is funbasn adsm," which does not seem to be code but more likely just input testing.

For the second image:

Activity: Communicating
Description: Microsoft Teams is open with the chat interface visible. The user is currently in a chat with "Gastaldi, Lorenzo" but has not yet written a message.

For the third image:

Activity: Browsing
Description: A YouTube video is playing, titled "Chill Night Drives" by the user "SilverReaper." It looks like the video is about driving with lo-fi music for a relaxing experience.

For the fourth image:

Activity: Browsing
Description: The web browser is displaying Instagram with the user's profile logged in. There is a photograph shown in the main feed of a person celebrating in a swimming pool.

For the fifth image:

Activity: Browsing
Description: The Nike online store is ope
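
The output above stops mid-word, which is consistent with the reply hitting the `max_tokens` limit of 300 set in the payload. The Chat Completions response reports this through `finish_reason`, which is `"length"` when generation stopped at the token limit. The sketch below checks for it; `is_truncated` is a hypothetical helper and `sample` is a hand-written stand-in, not real API output.

```python
def is_truncated(response_json):
    """True if the first choice stopped because it hit the token limit."""
    choice = response_json["choices"][0]
    return choice.get("finish_reason") == "length"


# Hand-written stand-in for a parsed API response (assumed shape, not real output)
sample = {
    "choices": [
        {
            "finish_reason": "length",
            "message": {"content": "Description: The Nike online store is ope"},
        }
    ]
}

if is_truncated(sample):
    print("Reply was cut off; consider raising max_tokens or sending fewer images.")
```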

In [6]:
import re 
import json
input_string = """For the first image:

Activity: Coding
Description: Visual Studio Code is open with a file named "test.ipynb". The user appears to have just started typing with the content "bob is funbasn adsm," which does not seem to be code but more likely just input testing.

For the second image:

Activity: Communicating
Description: Microsoft Teams is open with the chat interface visible. The user is currently in a chat with "Gastaldi, Lorenzo" but has not yet written a message.

For the third image:

Activity: Browsing
Description: A YouTube video is playing, titled "Chill Night Drives" by the user "SilverReaper." It looks like the video is about driving with lo-fi music for a relaxing experience.

For the fourth image:

Activity: Browsing
Description: The web browser is displaying Instagram with the user's profile logged in. There is a photograph shown in the main feed of a person celebrating in a swimming pool.

For the fifth image:

Activity: Browsing
Description: The Nike online store is open in a web browser with the Nike Metcon 9 Men's Workout Shoes product page displayed. The page is offering an extra 30% off as a daily offer for members."""



pattern = r'Activity: (.*?)\nDescription: (.*?)(?=\n\n|\Z)'
matches = re.findall(pattern, input_string, re.DOTALL)
activities = [{"Activity": activity, "Description": desc.strip()} for activity, desc in matches]
json_output = json.dumps(activities, indent=4)
print(json_output)


[
    {
        "Activity": "Coding",
        "Description": "Visual Studio Code is open with a file named \"test.ipynb\". The user appears to have just started typing with the content \"bob is funbasn adsm,\" which does not seem to be code but more likely just input testing."
    },
    {
        "Activity": "Communicating",
        "Description": "Microsoft Teams is open with the chat interface visible. The user is currently in a chat with \"Gastaldi, Lorenzo\" but has not yet written a message."
    },
    {
        "Activity": "Browsing",
        "Description": "A YouTube video is playing, titled \"Chill Night Drives\" by the user \"SilverReaper.\" It looks like the video is about driving with lo-fi music for a relaxing experience."
    },
    {
        "Activity": "Browsing",
        "Description": "The web browser is displaying Instagram with the user's profile logged in. There is a photograph shown in the main feed of a person celebrating in a swimming pool."
    },
    {
      