# Lab 02: Software Development with o-series models

This notebook explores the code-generation capabilities of o-series models (here, o3-mini) and compares their reliability with that of a GPT model (GPT-4o).

In [7]:
"""Run this model in Python

> pip install openai
"""
import os
from openai import OpenAI
from dotenv import load_dotenv
from IPython.display import display, Markdown, HTML

# Load environment variables from .env file
load_dotenv()

# To authenticate with the model, generate a personal access token (PAT) in your GitHub settings
# and store it in the .env file under the name "githubtoken".
# Instructions for creating a PAT: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["githubtoken"],
)

# Test the connection with a simple question
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    model="o1-mini"
)

print(response.choices[0].message.content)

The capital of France is **Paris**.
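
The setup cell reads the token with `os.environ["githubtoken"]`, which raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier lookup follows; `require_env` is a hypothetical helper introduced here for illustration, not part of `openai` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical message; adjust to match how you store secrets.
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it before running the notebook."
        )
    return value
```

With this in place, the client could be created with `api_key=require_env("githubtoken")` after `load_dotenv()` has run.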


## Step 1: Crafting a prompt to generate a complete HTML page for a Space Invaders game

In [8]:
game_dev_prompt = ("Make a full HTML page containing a playable Space Invaders game written in Javascript using solely SVG graphics defined inline on the page. "
               "Make the aliens colorful and look like pixelated bugs. "
               "Make the ship that shoots the bullets colorful and look like a little spaceship. "
               "When the game ends, just restart the game automatically.")

## Step 2: Retrieving a Response from GPT-4o

In [9]:
response_4o = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": game_dev_prompt,
    }],
)

## Step 3: Saving the GPT-4o Response as an HTML File

In [10]:
# Assumes response_4o.choices[0].message.content contains the HTML page
# (the reply is written to disk as-is, including any surrounding text)
html_content_gpt4o = response_4o.choices[0].message.content

# Define the file path where you want to save the HTML content
file_path = 'output_gpt4o.html'

# Write the HTML content to the file
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(html_content_gpt4o)

print(f"HTML content has been written to {file_path}")

HTML content has been written to output_gpt4o.html
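
The save step assumes the reply is pure HTML, but chat models often wrap code in a Markdown fence with some explanatory text around it. A minimal sketch of one way to strip that wrapper before saving follows; `extract_html` is a hypothetical helper, and the heuristic (take the first fenced block that looks like a page) is an assumption that may not suit every reply.

```python
import re

# Three backticks, built programmatically so the pattern stays readable.
_FENCE = "`" * 3
_FENCED_BLOCK = re.compile(_FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)" + _FENCE, re.DOTALL)


def extract_html(text: str) -> str:
    """Return the first fenced block that looks like HTML, else the stripped text."""
    for match in _FENCED_BLOCK.finditer(text):
        body = match.group(1).strip()
        lowered = body.lower()
        if "<html" in lowered or "<!doctype" in lowered or "<svg" in lowered:
            return body
    # No fenced HTML found: assume the whole reply is the page.
    return text.strip()
```

Calling `extract_html(response_4o.choices[0].message.content)` before writing the file would keep the explanatory prose out of `output_gpt4o.html`.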


## Step 4: Retrieving a Response from o3-mini

In [11]:
# Same prompt as in Step 1, redefined here so that this step can be run on its own
game_dev_prompt = ("Make a full HTML page containing a playable Space Invaders game written in Javascript using solely SVG graphics defined inline on the page. "
               "Make the aliens colorful and look like pixelated bugs. "
               "Make the ship that shoots the bullets colorful and look like a little spaceship. "
               "When the game ends, just restart the game automatically.")

In [12]:
response_o3 = client.chat.completions.create(
    model="o3-mini",
    messages=[{
        "role": "user",
        "content": game_dev_prompt,
    }],
)
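
The GPT-4o and o3-mini requests differ only in the model name. A minimal sketch of a shared helper follows; `generate` is a hypothetical name, and it assumes a client object exposing `chat.completions.create` as used in this notebook.

```python
def generate(client, model: str, prompt: str) -> str:
    """Send a single user message to `model` and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The content can be None in some cases; fall back to an empty string.
    return response.choices[0].message.content or ""
```

Both replies could then be fetched with `generate(client, "gpt-4o", game_dev_prompt)` and `generate(client, "o3-mini", game_dev_prompt)`, which keeps the two calls identical apart from the model.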

## Step 5: Saving the o3-mini Response as an HTML File

In [22]:
# Assumes response_o3.choices[0].message.content contains the HTML page
# (the reply is written to disk as-is, including any surrounding text)
html_content_o3 = response_o3.choices[0].message.content

# Define the file path where you want to save the HTML content
file_path = 'output_o3.html'

# Write the HTML content to the file
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(html_content_o3)

print(f"HTML content has been written to {file_path}")

HTML content has been written to output_o3.html


## Step 6: Evaluating the code from GPT-4o and o3-mini using GPT-4o (the combined prompt is long)

In [20]:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "I am trying to evaluate the game development code responses from GPT-4o and o3-mini. "
            f"Which is the better code in terms of reliability for the game dev prompt: {game_dev_prompt}\n\n"
            f"o3-mini output:\n\n{html_content_o3}\n\n"
            f"gpt-4o output:\n\n{html_content_gpt4o}"
        ),
    }]
)
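
The evaluation prompt pastes two long HTML documents into one string, so clear boundaries help the judge tell them apart. A minimal sketch of a prompt builder with explicit delimiters follows; `build_eval_prompt` and the `=== ... ===` markers are assumptions made for illustration, not a format the API requires.

```python
def build_eval_prompt(task: str, candidates: dict[str, str]) -> str:
    """Assemble a judge prompt with one clearly delimited section per candidate."""
    parts = [
        "I am trying to evaluate code generated by several models for the task below. "
        "Which is the better code in terms of reliability?",
        "",
        "=== TASK ===",
        task,
    ]
    for name, code in candidates.items():
        # Dicts preserve insertion order, so candidates appear as given.
        parts += [
            "",
            f"=== BEGIN {name} output ===",
            code.strip(),
            f"=== END {name} output ===",
        ]
    return "\n".join(parts)
```

For this lab the call would look like `build_eval_prompt(game_dev_prompt, {"o3-mini": html_content_o3, "gpt-4o": html_content_gpt4o})`.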

In [21]:
# Banner before and after the rendered Markdown (stray </hr> closing tags removed; <hr> is a void element)
display(HTML('<div style="background-color: #f0fff8; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"><h2>🔽 &nbsp; Markdown Output – Beginning</h2></div>'))
display(Markdown(response.choices[0].message.content))
display(HTML('<div style="background-color: #fff4f4; padding: 10px; border-radius: 5px; border: 1px solid #d3d3d3;"><h2>🔼 &nbsp; Markdown Output – End</h2></div>'))

In terms of **reliability for game development** and adhering to the constraints of the prompt (Space Invaders, inline SVG rendering, colorful visuals, and automatic game restart), here is an evaluation of the two responses:

---

### **Evaluation Framework**
- **Code Completeness**: Does the code fully implement a playable Space Invaders game (aliens, bullets, movement, collision, victory reset)?
- **Adherence to Prompt**: Are the requirements met — inline SVG graphics, colorful aliens resembling pixelated bugs, and automatic game restart?
- **Efficiency & Simplicity**: Is the code structured efficiently and easy to modify, maintain, and debug?
- **Gameplay Quality**: Does the game offer a smooth playable experience with proper mechanics?
- **Reliability**: Is the implementation resilient to issues such as edge cases, bugs, or performance degradation?

---

### **Breakdown of Each Output**

---

#### **1. `o3-mini` Output**
**Key Strengths:**
- The `o3-mini` output shows remarkable attention to detail:
  1. **Pixelated and Randomly Colored Bugs:** The aliens are implemented using a 5×5 grid pattern, giving them a pixelated bug-like appearance, which satisfies this part of the prompt better than `gpt-4o`.
  2. **Ship Design:** The player ship is drawn as a compact polygonally defined spaceship, fulfilling the requirement of a colorful spaceship.
  3. **Inline SVG Rendering:** All game elements (aliens, spaceship, bullets) are rendered inline using the `<svg>` element with dynamically generated sub-elements. This meets the SVG-only rendering requirement perfectly.
  4. **Automatic Reset:** It handles game reset by checking for a "game-over" state (either all aliens are destroyed or one alien reaches the ship zone) and restarts the game automatically after a delay, which aligns with the prompt's requirements.

**Weaknesses:**
- **Performance at Scale:** The use of individual `<rect>` elements for each cell in the aliens' grid (5×5 per alien) might make the game sluggish, especially with a grid of 10×5 aliens (50 aliens × 25 rectangles = 1,250 objects to render and update).
- **Complexity of Code**: Although comprehensive, the code is verbose and can be challenging for beginners to navigate or tweak.
- **Limited Bullet-Ship Interaction:** The code allows only a single active bullet at any given time, limiting gameplay fluidity.
- **Minor Bug with Alien Movement:** The alien bottom collision detection is somewhat tied to their position update logic and may cause small inaccuracies (delayed detection).

**Gameplay Quality:**
- The game feels relatively well-defined and consistent. The aliens move side-to-side and drop down as expected, bullets interact with aliens, and game-ending conditions are appropriately handled.

---

#### **2. `gpt-4o` Output**
**Key Strengths:**
- **Conciseness:** The code is more concise compared to `o3-mini`, making it easier to read, maintain, and modify. It nicely encapsulates the essential mechanics.
- **Moving Aliens and Automatic Restart:** Aliens move horizontally and shift down when reaching edges, and the game restarts when all aliens are destroyed or bullets collide with all.
- **Simplified Rendering:** Instead of rendering each alien as a pixelated bug, the use of a `<rect>` element decorated with vibrant hues (`hsl` color) keeps rendering lightweight without sacrificing colorful visuals.

**Weaknesses:**
- **Aliens Are Not Pixelated Bugs:** Aliens are implemented as simple colored rectangles, which does not fulfill the prompt's emphasis on "pixelated bug-like" visuals.
- **Bullet-Ship Interaction:** Similar to `o3-mini`, this implementation supports firing only one bullet at a time, limiting gameplay dynamics.
- **Less SVG Customization:** While rendering is done purely in SVG, the game could use more creative spaceship or alien designs (e.g., detailed geometry like in `o3-mini`).

**Gameplay Quality:**
- Faster and slightly simpler gameplay compared to `o3-mini`. However, it doesn't feel as detailed or polished visually. Most actions (collision, alien movement) work as expected without major bugs.

---

### **Comparison Table**

| **Criteria**              | **o3-mini**                                                    | **gpt-4o**                                                   |
|---------------------------|----------------------------------------------------------------|-------------------------------------------------------------|
| **Code Completeness**     | Fully implements Space Invaders with pixelated bugs, ship, and bullets. | Fully implements core game logic but lacks pixelated bugs. |
| **Adherence to Prompt**   | Fully adheres to the inline SVG requirement, colorful visuals (bugs/ship), and auto-restart behavior. | Mostly adheres but fails to create pixelated aliens (uses rectangles). |
| **Efficiency**            | Alien rendering (5x5 grid) is resource-heavy for larger grids. | Simplified SVG design ensures better efficiency.            |
| **Simplicity**            | Verbose and harder for beginners to tweak or maintain.         | Concise, readable, and beginner-friendly.                  |
| **Gameplay Quality**      | Smooth gameplay but limited to one bullet at a time; aliens move properly. | Smooth gameplay, efficient logic, but lacks detailed visuals. |
| **Reliability**           | Rare minor bugs with alien movement; handles reset logic robustly. | Very reliable; minimal edge case handling needed.           |

---

### **Conclusion**

- **Best for Reliability & Adherence to Prompt:** 
  - The **`o3-mini` output** stands out for its **attention to detail** in creating pixelated, multi-colored bug-like aliens and a well-drawn spaceship, both customizable and artistically closer to Space Invaders. It surpasses in visual fidelity while maintaining auto-restart mechanics.
  - It does, however, suffer from minor performance concerns and verbosity.

- **Best for Simplicity & Performance:** 
  - The **`gpt-4o` output** is better for **performance** and simplicity, with a faster-moving alien grid and concise implementation. However, it fails to meet a key visual requirement — creating pixelated aliens, which makes it less compliant with the prompt.

---

### **Final Recommendation**:
If **fidelity to the prompt and visual authenticity** are more important, **choose `o3-mini`**.  
If **performance, simplicity, and readability** are more critical, **choose `gpt-4o`**.
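
Some of the judge's claims, such as SVG-only rendering, can be partly spot-checked without playing the game. A rough sketch of a static check on the saved HTML follows; `static_svg_report` is a hypothetical helper, and a plain text search only sees markup present in the file, not elements the script creates at runtime.

```python
import re


def static_svg_report(html: str) -> dict:
    """Collect simple text-level signals about how a page renders its graphics."""
    lowered = html.lower()
    return {
        "has_inline_svg": "<svg" in lowered,
        "uses_canvas": "<canvas" in lowered,
        "uses_img": "<img" in lowered,
        # Hint that SVG nodes are built dynamically in JavaScript.
        "creates_svg_elements_in_js": "createelementns" in lowered,
        # Only counts <rect> tags written directly in the markup.
        "static_rect_count": len(re.findall(r"<rect\b", lowered)),
    }
```

Running `static_svg_report(html_content_o3)` and `static_svg_report(html_content_gpt4o)` gives a quick side-by-side view, which is best treated as a sanity check alongside opening both files in a browser.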