In [18]:
import base64
import mimetypes
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables (including OPENAI_API_KEY) from a .env file
load_dotenv()

# Fail early with a clear message if the API key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")


class GPTModel:
    def __init__(self):
        self.model = "gpt-4o"
        # OpenAI() reads OPENAI_API_KEY from the environment
        self.client = OpenAI()

    def encode_image(self, image_path):
        """Read a local image file and return its contents as a base64 string."""
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")

    def forward(self, image, prompt):
        """Send a text prompt plus one image (URL or local path) and return the first choice."""
        if image.startswith(("http://", "https://")):
            image_url = image
        else:
            # Guess the MIME type from the file extension (e.g. image/png); fall back to JPEG
            mime_type = mimetypes.guess_type(image)[0] or "image/jpeg"
            image_url = f"data:{mime_type};base64,{self.encode_image(image)}"
        user_content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
        messages = [{"role": "user", "content": user_content}]
        response = self.client.chat.completions.create(model=self.model, messages=messages)
        return response.choices[0]

In [19]:
gpt = GPTModel()

In [22]:
print(gpt.forward("images/calendar.png", "Look at my calendar and output all the events that I have, when they start, and when they end.").message.content)

Here are the events from your calendar for September 7th to September 10th, 2024:

### Saturday, September 7th, 2024
- **Event:** Can’t Stop! (Rush/Mmmpie Knar En)
  - **Start Time:** 1 AM
  - **End Time:** 2 AM
  
- **Event:** Weekly Reflection
  - **Start Time:** 4 AM
  - **End Time:** 4:45 AM

### Sunday, September 8th, 2024
There are no events on this day.

### Monday, September 9th, 2024
- **Event:** Yamusan
  - **Start Time:** 8 AM
  - **End Time:** 12 PM
  
- **Event:** AI Goes Local
  - **Start Time:** 11 AM
  - **End Time:** 8 PM

- **Event:** Calvin
  - **Start Time:** 3:30 PM
  - **End Time:** 4:30 PM
  - **Location:** 1600 Mission St

- **Event:** SF Startup Series
  - **Start Time:** 5 PM
  - **End Time:** 6 PM
  - **Location:** 441 Oak St

- **Event:** Arya
  - **Start Time:** 7 PM
  - **End Time:** 8 PM

- **Event:** Housewarming Dinner
  - **Start Time:** 7:30 PM
  - **End Time:** 9:30 PM

### Tuesday, September 10th, 2024
- **Event:** Eif w/ Deniz
  - **Start Time:** 8
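
The reply above follows a regular pattern (a `###` heading per day, then `- **Event:**` bullets with indented fields), so it can be turned into structured data. The following is a minimal sketch under that assumption; `parse_events` and `sample` are hypothetical names introduced for illustration, and the model is not guaranteed to use the same layout on every call.

```python
import re


def parse_events(markdown_text):
    """Parse '### <day>' headings and '- **Key:** value' bullets into a list of dicts."""
    events = []
    current_day = None
    current_event = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if line.startswith("### "):
            # A new day heading starts a fresh group of events
            current_day = line[4:].strip()
            current_event = None
            continue
        match = re.match(r"^- \*\*(.+?):\*\*\s*(.*)$", line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key == "Event":
            current_event = {"day": current_day, "name": value}
            events.append(current_event)
        elif current_event is not None:
            # e.g. "Start Time" becomes the key "start_time"
            current_event[key.lower().replace(" ", "_")] = value
    return events


# Sample copied from the reply above
sample = """\
### Monday, September 9th, 2024
- **Event:** Calvin
  - **Start Time:** 3:30 PM
  - **End Time:** 4:30 PM
  - **Location:** 1600 Mission St
"""

events = parse_events(sample)
print(events)
```

Asking the model for JSON directly would likely be more robust than parsing markdown; this sketch only shows that the current output is machine-readable as is.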

In [20]:
print(gpt.forward("images/evals.png", "Parse all the information in the image and output it as text or markdown. Do not truncate information.").message.content)

## Main Evaluation Results

### Metrics:
- **Avg Score:** The average score on all VLM Benchmarks (normalized to 0 - 100, the higher the better).
- **Avg Rank:** The average rank on all VLM Benchmarks (the lower the better).
- **Avg Score & Rank** are calculated based on the selected benchmark. When results for some selected benchmarks are missing, Avg Score / Rank will be None.
- By default, the overall evaluation results are based on 8 VLM benchmarks, sorted by the descending order of Avg Score.
  - The following datasets are included in the main results: MMBench_V11, MMStar, MMMU_VAL, MathVista, OCRBench, AID2, HallusionBench, MMVet.
  - Detailed evaluation results for each dataset (included or not included in main) are provided in the consequent tabs.

### Evaluation Dimension

| Avg Score | Avg Rank | MMBench_V11 | MMStar | MME | MMMU_VAL | MathVista | OCRBench | AID2 | HallusionBench | SEEDBench_IMG | LLaVA-Bench | CCBench | RealWorldQA | POPE | ScienceQA_TEST | SEEDBench2_Plus |

In [21]:
prompt = """
You're a helpful assistant that takes a screenshot of a UI and explains everything going on in the interface.

For this image, explain each graph you see. Write a few details on how the runs have progressed.

Imagine that you're explaining this to someone who will never see the image.

Explain each graph and what each run's performance looks like on it.
"""

print(gpt.forward("images/wandb.png", prompt).message.content)

The screenshot shows a dashboard for visualizing machine learning experiments in a project titled "**hw2p2**" within the workspace of a user called "Denizbirlikci" on a platform that appears to be Weights & Biases (W&B).

Key elements of the screenshot:

   
2. **Project Header**: The project is named "raj_deniz_josh_simrit," and the specific sub-project or notebook in use is "hw2p2."

3. **Workspace Overview**: 

    - The workspace is showing multiple runs (62 total in this example) with a table of the different runs on the left side, each with unique names (e.g., "raj-rewritten-convnxt," "raj-adam-with-warmup," "deniz:viShNet:AdamW-RETRY").
    - Various icons indicate the status, tags, and other attributes of these runs.

4. **Charts/Visualizations**: Six main charts illustrate the results of different runs:
   
    - **Train Accuracy (train_Acc)**
    - **Train Loss (train_loss)**
    - **Validation Accuracy (validation_Acc)**
    - **Validation Loss (validation_loss)**
    - **Le

In [None]:
print(gpt.forward("images/calendar.png", "Parse all the information in the image and output it as text or markdown. Do not truncate information.").message.content)

In [14]:
gpt.forward("images/evals.png", "Parse all the information in the image and output it as text or markdown. Do not truncate information.")

Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='# Main Evaluation Results\n\n## Metrics:\n- **Avg Score**: The average score on all VLM Benchmarks (normalized to 0 - 100, the higher the better).\n- **Avg Rank**: The average rank on all VLM Benchmarks (the lower the better).\n- **Avg Score & Rank** are calculated based on selected benchmark. When results for some selected benchmarks are missing, Avg Score / Rank will be None!!\n\nBy default, we present the overall evaluation results based on 8 VLM benchmarks, sorted by the descending order of Avg Score.\n- The following datasets are included in the main results: MMBench_V11, MMStar, MME, MMU_VAL, MathVista, OCRBench, AID2, HallusionBench, SEEDBench, MMVet.\n- Detailed evaluation results for each dataset (included or not included in main) are provided in the consequent tabs.\n\n## Evaluation Dimension\n- Avg Score\n- Avg Rank\n- MMBench_V11\n- MMStar\n- MME\n- MMU_VAL\n- MathVista\n- OCRBench\n
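
The `Choice` object above carries a `finish_reason` field alongside the message. A value of `'stop'` means the model ended the reply on its own, while `'length'` means it ran into the token limit. The sketch below checks that field before using the text; `content_or_raise` is a hypothetical helper, and `fake_choice` is a stand-in object used so the example runs without an API call.

```python
from types import SimpleNamespace


def content_or_raise(choice):
    """Return the message text, raising if the reply was cut off by the token limit."""
    if choice.finish_reason == "length":
        raise RuntimeError(
            "Response was cut off by the token limit; consider raising max_tokens."
        )
    return choice.message.content


# Stand-in with the same attributes as the Choice shown above (not a real API response)
fake_choice = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content="# Main Evaluation Results"),
)

text = content_or_raise(fake_choice)
print(text)
```

With the real class, the call would look like `content_or_raise(gpt.forward("images/evals.png", "..."))`.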

In [15]:
print(_12.message.content)

Here’s a parsed representation of the information presented in the image:

## Main Evaluation Results

### Metrics
- **Avg Score**: The average score on all VLM Benchmarks (normalized to 0 - 100, the higher the better).
- **Avg Rank**: The average rank on all VLM Benchmarks (the lower the better).
- **Avg Score & Rank**: Calculated based on selected benchmarks. If results for some benchmarks are missing, Avg Score/Rank will be `None`.

### Evaluation Dimension
- The results are organized by descending order of Avg Score.
- The following datasets are included:
  - MMBench_V11
  - MMStar
  - MME
  - MMU_VAL
  - MathVista
  - OCRBench
  - AID2
  - HallusionBench
  - SEEDBench_Plus
  - MMT-Bench_VAL
  - BLINK

### Model Size
- Options are available for <4B, 4B-10B, 10B-20B, 20B-40B, >40B, Unknown.

### Table: Model Performance
| Rank | Method               | Params (B) | Language Model     | Vision Model       | Avg Score | Avg Rank | MMBench_V11 | MMStar | MMU_VAL | MathVista | OCR |
|---