In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [2]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Google API Key exists and begins AI


In [3]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Gemini and Ollama, we can reuse the OpenAI Python client,
# because both expose endpoints compatible with the OpenAI API
# and the OpenAI client lets you change the base_url

gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
ollama_url = "http://localhost:11434/v1"

gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)
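Because every client created above shares the same interface, the call-and-read pattern repeated in the cells that follow can be wrapped in one small helper. This is only a sketch: `ask` is a hypothetical name, not part of the OpenAI library, and it assumes the client exposes `chat.completions.create` the way the clients above do.

```python
def ask(client, model, messages, **kwargs):
    """Send messages to an OpenAI-compatible client and return the reply text.

    `ask` is a hypothetical helper for illustration only. Extra keyword
    arguments (for example reasoning_effort) are passed straight through
    to the underlying API call.
    """
    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    return response.choices[0].message.content
```

With a `messages` list defined, a call such as `ask(ollama, "llama3.2", messages)` would return the text that the cells below pass to `Markdown`.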

In [11]:
tell_a_joke = [
    {"role": "user", 
     "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [6]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Sure! Here's a joke for an aspiring LLM engineer:

Why did the LLM engineer bring a ladder to the training session?

Because they heard the model was going to *scale* up!

In [12]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the aspiring LLM engineer bring a ladder to the training run? Because the model kept insisting it was going to take its learning to the next level of abstraction.

In [8]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Of course! Here you go:

A junior LLM engineer, looking completely exhausted, slumps down next to a senior engineer at lunch.

The junior says, "I've been at it for 8 hours. I've tweaked the temperature, adjusted the top-p, perfected the system prompt, and provided three flawless few-shot examples. The model *still* won't give me the output in the correct JSON format!"

The senior engineer takes a slow sip of coffee, looks over their glasses and asks, "Have you tried telling it that you'd be very, very grateful?"

## Training vs Inference-Time Scaling

In [13]:
easy_puzzle = [
    {"role": "user",
     "content": "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [None]:
# reasoning_effort="low" | "medium" | "high" | "minimal"

response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/3

In [15]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [16]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3

In [None]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))
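# No reasoning_effort is passed here, so Gemini uses its default thinking behaviour.
# Google's OpenAI-compatible endpoint documents a reasoning_effort option for Gemini 2.5 models;
# check the current Gemini docs for the supported values.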

2/3
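The models disagree above, so it helps to check the puzzle by brute force. The sketch below enumerates the four equally likely outcomes and assumes the usual reading of the prompt, namely that "one of them is heads" means "at least one coin is heads". Under a different reading (for example, "the first coin is heads") the answer would be 1/2.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))

# Condition (assumed reading): at least one of the coins shows heads
at_least_one_head = [o for o in outcomes if "H" in o]

# Event: the other coin is tails, i.e. the outcome also contains a T
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```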

## A spicy challenge to test the competitive spirit

In [20]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" — if both of you choose this, you each win $1,000.
Defect: Choose "Steal" — if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [21]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=dilemma, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

I choose Share (Cooperate).

In [22]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=dilemma, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

Steal.

Reason: If the other shares, stealing gives you $2,000 instead of $1,000. If the other steals, you still get $0 by sharing, but $0 is better (or equal) than being the only one who shared. The dominant strategy in this setup is to defect.

In [23]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=dilemma, reasoning_effort="high")
display(Markdown(response.choices[0].message.content))

Steal.

Reason: In a one-shot version of this game, stealing is the dominant rational choice. If the other person shares, stealing pays 2,000 vs 1,000. If the other person steals, both get 0 whether you share or steal. So Steal is at least as good as Share in all cases, and better in the case where the other shares. If you could rely on the other to also share (e.g., in repeated rounds), cooperation could be better overall, but in a single round the rational move is Steal.

In [24]:
response = openai.chat.completions.create(model="gpt-5", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Share

In [25]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=dilemma)
display(Markdown(response.choices[0].message.content))

This is a classic game theory problem known as the Prisoner's Dilemma. There isn't a single "correct" answer, as the best choice depends on your goals, your ethics, and what you assume about your partner.

Let's break down the logic:

*   **The Case for Stealing (The Rational Choice):** From a purely logical, self-interested standpoint, "Steal" is the dominant strategy.
    *   If your partner chooses **Share**, your best move is to **Steal** ($2,000 for you vs. $1,000).
    *   If your partner chooses **Steal**, your best move is also to **Steal** ($0 for you vs. $0).
    *   In either scenario, you are better off or equal by choosing to steal. It protects you from getting nothing while your partner walks away with everything (the "sucker's payoff").

*   **The Case for Sharing (The Cooperative Choice):** This choice relies on trust.
    *   The best *overall* outcome for the team is for both people to share, resulting in a total prize of $2,000 ($1,000 each). Every other outcome results in a lower total prize ($2,000 or $0).
    *   Choosing "Share" is an act of trust. You are trusting your partner to make the same cooperative decision so you can both benefit. It's the only path that allows both of you to win.

Given that I have to pick one, I will make a choice based on achieving the best possible mutual outcome, even at the risk of personal loss.

My choice is: **Share**.

**Reasoning:** The $2,000 prize for stealing is only possible if my partner chooses to trust me. By choosing to "Share," I am creating the possibility for the best collective outcome. If we both think rationally and selfishly, we both walk away with nothing. By choosing to cooperate, I am betting on trust and the potential for a win-win scenario. While this makes me vulnerable, it's the only choice that allows us both to leave as winners.
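The dominance argument that several of the replies make can be checked directly from the payoffs stated in the prompt. The sketch below encodes the game as a small table and tests whether "Steal" is never worse than "Share" for one player. The variable and function names are illustrative only.

```python
# Payoffs in dollars as (mine, partner's), indexed by (my_choice, partner_choice),
# taken from the dilemma prompt
payoffs = {
    ("Share", "Share"): (1000, 1000),
    ("Share", "Steal"): (0, 2000),
    ("Steal", "Share"): (2000, 0),
    ("Steal", "Steal"): (0, 0),
}

def my_payoff(mine, partner):
    """Return my own winnings for a given pair of choices."""
    return payoffs[(mine, partner)][0]

choices = ["Share", "Steal"]

# Steal is weakly dominant if it is never worse and at least once strictly better
never_worse = all(my_payoff("Steal", p) >= my_payoff("Share", p) for p in choices)
sometimes_better = any(my_payoff("Steal", p) > my_payoff("Share", p) for p in choices)
always_better = all(my_payoff("Steal", p) > my_payoff("Share", p) for p in choices)

print(never_worse and sometimes_better)  # True: Steal weakly dominates Share
print(always_better)                     # False: it is not strictly dominant
```

Unlike the classic Prisoner's Dilemma, this game show variant pays $0 to a sharer and to a stealer alike when the partner steals, so the dominance is only weak. That tie is one reason "Share" remains a defensible answer.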

## Going Local (Ollama)

In [26]:
requests.get("http://localhost:11434/").content

# If Ollama is not running, start it with `ollama serve` at a command line

b'Ollama is running'
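If the server is not running, the `requests.get` call above raises a connection error rather than returning a message. A more forgiving check might look like the sketch below. It uses the standard library's `urllib` instead of `requests` so that it runs without extra packages, and `ollama_is_running` is a hypothetical helper name, not part of Ollama.

```python
import urllib.error
import urllib.request

def ollama_is_running(url="http://localhost:11434/", timeout=2.0):
    """Return True if an HTTP server answers at url with status 200.

    Hypothetical helper: it only checks that something responds at the URL,
    not that the responder is actually Ollama.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            return reply.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False
```

A notebook cell could then print a hint to run `ollama serve` when `ollama_is_running()` returns `False`.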

In [27]:
!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠧ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠇ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠏ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff:   0% ▕               

In [28]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [29]:
response = ollama.chat.completions.create(model="llama3.2", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I choose... "Share".

## Gemini Client Library
We have been using the OpenAI Python client library throughout, but the other providers have their own client libraries too.

In [36]:
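# Requires Google's google-genai package (for example: pip install google-genai);
# the ImportError below appears when it is not installed in the active environment
from google import genai

# With no arguments, the client reads the API key from the environment
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Describe the color Blue to someone who's never been able to see in 1 sentence",
)
print(response.text)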

ImportError: cannot import name 'genai' from 'google' (unknown location)

## LangChain (quite heavyweight) & LiteLLM (lightweight)