<a href="https://colab.research.google.com/github/FeshkaPelmeshka/INSEAD-AI-Venture-Lab/blob/main/Copy_of_4_LLM_TestBed.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# To run this notebook, click File -> Save a Copy in Drive

After a few seconds, a copy named "Copy of {notebook name}" should open in a new tab.

In [3]:
#@title Install Dependencies

!pip install openai requests pymupdf pydantic



In [4]:
#@markdown The OpenAI client also works with other providers, such as Google and Anthropic (Claude). For now we will use it with OpenRouter to get easy access to most LLMs.

#@markdown The API key we are using is just for this class. If you want to continue testing afterwards, be sure to switch to your own key, which you can create here: https://openrouter.ai/settings/keys

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="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",
)

# helper function for later
def print_request_info(response, time_elapsed):
    # --- Tokens ---
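    # "provider" is an extra field added by OpenRouter; fall back if it is absent
    print("Provider Used: ", getattr(response, "provider", "unknown"))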
    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)

    # --- Latency ---
    print("Request latency (seconds):", time_elapsed)
    print()

    # --- Output text ---
    print("Response:", response.choices[0].message.content)


# Models you can test out

## Closed Source Models

- openai/gpt-5
- openai/gpt-5-mini
- openai/gpt-5-nano
- anthropic/claude-sonnet-4
- x-ai/grok-4
- google/gemini-2.5-flash

## Open Source Models
- moonshotai/kimi-k2:free
- deepseek/deepseek-chat-v3.1:free
- z-ai/glm-4.5
- qwen/qwen3-235b-a22b-2507


These are just a few options you can use. Feel free to check [OpenRouter](https://openrouter.ai/models) and try any of the other models you see there.

Many models are offered for free, which makes them great candidates for testing and experimenting with LLMs.

In [None]:
#@title # 1. Basic calling

import time

start = time.time()  # measure request start

model_name = "moonshotai/kimi-k2:free" # @param {"type":"string","placeholder":"moonshotai/kimi-k2:free"}
system_prompt = "You are a helpful assistant named Andrew." # @param {"type":"string","placeholder":"You are a helpful assistant named Andrew."}
user_message = "Who are you?" # @param {"type":"string","placeholder":"Who are you?"}

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": user_message}]
)

end = time.time()  # measure request end

print_request_info(response, end-start)


# 2. Image inputs

You will often want to include images in your request, which means you will need a VLM (Vision Language Model). These are LLMs with the added ability to see images.

For our use case, we will convert a PDF into images so the model can see what is in it. Converting a PDF into text is surprisingly nontrivial, so many people pass it as images instead, as we do here.

You will see that inputting entire PDFs causes the number of input tokens to rise dramatically. As you test this, think of ways you could reduce the number of input tokens needed.
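One lever worth experimenting with is the rendering resolution. As a rough sketch, the snippet below computes the pixel dimensions a page would have at a given DPI, using the PDF convention of 72 points per inch. The helper name `page_pixels` and the US Letter page size are illustrative assumptions, and how pixels translate into tokens differs between providers, so treat the ratio as a rough guide only.

```python
# Hypothetical helper: estimate the rendered image size of a PDF page at a given DPI.
# PDF page sizes are expressed in points, with 72 points per inch.
POINTS_PER_INCH = 72

def page_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)

# US Letter is 612 x 792 points (8.5 x 11 inches); assumed here for illustration.
high = page_pixels(612, 792, dpi=200)  # 200 DPI is what pdf_to_imgs uses in this notebook
low = page_pixels(612, 792, dpi=100)

# Ratio of total pixels between the two renderings
pixel_ratio = (high[0] * high[1]) / (low[0] * low[1])
print(high, low, pixel_ratio)  # (1700, 2200) (850, 1100) 4.0
```

Halving the DPI cuts the pixel count to a quarter, at the cost of legibility for small text. Other options to try include sending only the pages relevant to the question, or cropping to the region of interest.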

## Vision models
Here is a sampling of VLMs that you can try.

### Closed Source Models

- openai/gpt-5
- openai/gpt-5-mini
- openai/gpt-5-nano
- anthropic/claude-sonnet-4
- x-ai/grok-4
- google/gemini-2.5-flash

### Open Source Models
- z-ai/glm-4.5v
- qwen/qwen2.5-vl-32b-instruct

You can see that open source is lacking compared to closed source when it comes to VLM options. There are more, but they are rather lackluster, which is why they are not listed here.

In [None]:
#@title Helper function to convert pdfs to images

import requests, fitz, base64

def pdf_to_imgs(url):
    # 1. Download the PDF and save it locally
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()  # fail early on a bad URL or HTTP error
    with open("invoice.pdf", "wb") as f:
        f.write(resp.content)

    # 2. Convert each page to a base64-encoded PNG data URL (PyMuPDF)
    doc = fitz.open("invoice.pdf")
    imgs = []
    for page in doc:
        pix = page.get_pixmap(dpi=200)
        imgs.append("data:image/png;base64," + base64.b64encode(pix.tobytes("png")).decode("utf-8"))

    return imgs

In [None]:
model_name = "openai/gpt-5-nano" # @param {"type":"string","placeholder":"openai/gpt-5-nano"}
system_prompt = "Your job is to explain papers and answer questions about them" # @param {"type":"string","placeholder":"Your job is to explain papers and answer questions about them"}
user_message = "What is the paper's title?" # @param {"type":"string","placeholder":"What is the paper's title?"}

print("Converting pdf to images for LLM")

pdf_url = "https://arxiv.org/pdf/2405.12345.pdf" # @param {"type":"string","placeholder":"https://arxiv.org/pdf/2405.12345.pdf"}
image_urls = pdf_to_imgs(pdf_url)

print("Done converting images")

messages = [
    {"role": "system", "content": system_prompt},
    {
    "role": "user",
    "content": [
        {"type": "text", "text": user_message},
        *[{"type": "image_url", "image_url":{"url": img}} for img in image_urls],
    ],
}]

start = time.time()  # measure request start

response = client.chat.completions.create(
    model=model_name,
    messages=messages
)

end = time.time()  # measure request end

print_request_info(response, end-start)


# 3. Structured output

Often you will want to extract information from a given document and have it converted into an object you can use later in your code in a predictable manner.

That is where structured outputs come in. You tell the LLM the format you want its response to be in, and the LLM returns its response as JSON in that format, allowing for easy parsing so we can use the response in the rest of our code.

In our example below, we extract a user's name, age, and skills from a natural-language message and format them so that we can easily add them to a database.

Most modern LLMs support this functionality, so the models you tried above should work here as well.
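To make the idea concrete, here is a minimal sketch of what happens on the receiving end, written with only the standard library rather than Pydantic. The JSON string is a made-up stand-in for a model response (not real output), and `PersonRecord` and `parse_person` are illustrative names that do not appear elsewhere in this notebook.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PersonRecord:
    name: str
    age: Optional[int] = None
    skills: List[str] = field(default_factory=list)

def parse_person(raw: str) -> PersonRecord:
    """Parse a JSON string into a PersonRecord, checking the fields we rely on."""
    data = json.loads(raw)
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    age = data.get("age")
    if age is not None and (not isinstance(age, int) or age < 0):
        raise ValueError("'age' must be a non-negative integer or null")
    skills = data.get("skills", [])
    if not all(isinstance(s, str) for s in skills):
        raise ValueError("'skills' must be a list of strings")
    return PersonRecord(name=data["name"], age=age, skills=list(skills))

# Made-up example of the JSON a model might return for this schema.
raw_response = '{"name": "Lina", "age": 29, "skills": ["Python", "product management"]}'
person = parse_person(raw_response)
print(person.name, person.age, person.skills)
```

Because the result is a typed object rather than free text, downstream code can read `person.age` directly. Pydantic performs this kind of validation for you based on the class definition.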

In [None]:
from typing import List, Optional
from pydantic import BaseModel, Field

model_name = "openai/gpt-5-nano" # @param {"type":"string","placeholder":"openai/gpt-5-nano"}
system_prompt = "Your job is to extract the data from the user message and format it into the given response format" # @param {"type":"string","placeholder":"Your job is to extract the data from the user message and format it into the given response format"}
user_message = "I'm Lina. I'm 29 and I do Python and product management" # @param {"type":"string","placeholder":"I'm Lina. I'm 29 and I do Python and product management"}


class Person(BaseModel):
    name: str
    age: Optional[int] = Field(None, ge=0)
    skills: List[str] = []

start = time.time()
response = client.beta.chat.completions.parse(
    model=model_name,
    messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": user_message}],
    # Just pass the Pydantic class as the format that we want the model to return
    response_format=Person
)

end = time.time()

print_request_info(response, end-start)

# 4. LLM as a Judge

When using models in production, you will want to monitor the model's output, but you won't have the time to review every response yourself.

For this, you can use another LLM as a judge: pass the first LLM's response to a second LLM along with a rubric, and have the second LLM grade the response against that rubric.

Your rubric can be as simple or as complicated as you want. The simpler the rubric, the smaller (and cheaper) the model you can use to monitor the outputs. You usually don't need a very powerful model for your judge, as long as your rubric is well defined.

We use structured outputs here for the judge so we can easily extract the final score for the model's response.
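The example rubric used in this section (+5 points for each occurrence of "mango", capped at 100) is simple enough to compute without a model, which gives you a baseline for checking how closely the judge follows its rubric. The sketch below is one possible reading of that rubric: the function name `mango_rubric_score`, the regular expression, and the sample text are all illustrative assumptions.

```python
import re

def mango_rubric_score(text: str, points_per_hit: int = 5, max_score: int = 100) -> int:
    """Count 'mango', 'mangos', and 'mangoes' (case-insensitive) and apply the capped rubric."""
    hits = re.findall(r"\bmango(?:e?s)?\b", text, flags=re.IGNORECASE)
    return min(max_score, points_per_hit * len(hits))

# Made-up sample text standing in for the first model's response.
sample = (
    "Mangoes are tropical fruits. The mango tree is native to South Asia, "
    "and mangos are now grown worldwide."
)
print(mango_rubric_score(sample))  # 3 matches -> 15
```

Comparing this number with the judge's `score` on the same text is a quick sanity check. For rubrics that cannot be computed deterministically, a small hand-labelled sample can serve the same purpose.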

In [None]:
import time

start = time.time()  # measure request start

# INITIAL MESSAGE AND RESPONSE

model_one_name = "moonshotai/kimi-k2:free" # @param {"type":"string","placeholder":"moonshotai/kimi-k2:free"}
model_one_system_prompt = "You are a helpful assistant named Andrew." # @param {"type":"string","placeholder":"You are a helpful assistant named Andrew."}
model_one_user_message = "Can you explain what mangos are to me and where they are from?" # @param {"type":"string","placeholder":"Can you explain what mangos are to me and where they are from?"}

input_messages = [
    {"role": "system", "content": model_one_system_prompt},
    {"role": "user", "content": model_one_user_message}
]

response = client.chat.completions.create(
    model=model_one_name,
    messages=input_messages
)

end = time.time()  # measure request end

print_request_info(response, end-start)
print()

# JUDGEMENT

class RubricScore(BaseModel):
    score: int = Field(ge=0, le=100)

start = time.time()  # measure request start

judge_model_name = "openai/gpt-5-nano" # @param {"type":"string","placeholder":"openai/gpt-5-nano"}
judge_system_prompt = "Your job is to review the given message and every time you see the word mango (or any variations of it), you should give the model +5 points, up to 100" # @param {"type":"string"}

judge_messages = [
    {"role": "system", "content": judge_system_prompt},
    {"role": "user", "content": response.choices[0].message.content}
]

response = client.beta.chat.completions.parse(
    model=judge_model_name,  # use the judge model, not model_name left over from an earlier cell
    messages=judge_messages,
    response_format=RubricScore
)

end = time.time()  # measure request end

print_request_info(response, end-start)


