# 📘 Kahoot VLM Quiz Solver – Notebook Overview

This notebook automates the process of **joining a Kahoot game**, **taking screenshots**, **extracting quiz questions and choices**, and **submitting the correct answer** using a **Vision-Language Model (VLM)** hosted locally (e.g., via Ollama at `localhost:11434`).


## 1️⃣ Setup & Configuration

This section imports the required Python packages and initializes key variables:

- Imports libraries for browser automation (`playwright`), image handling (`PIL`), HTTP requests, and base64 encoding.
- Creates a `debug/` folder to store screenshots and intermediate crops.
- Defines **bounding boxes** to crop out the question and answer regions from the Kahoot screen.
- Sets up **click coordinates** to simulate mouse clicks on each answer choice.
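
The click coordinates used in the setup cell sit close to the center of each answer box, so they could be derived from the boxes rather than tuned by hand. A minimal sketch, assuming the boxes follow the `(left, top, right, bottom)` convention of `PIL.Image.crop`; `box_center` and `derived_clicks` are hypothetical names, not part of the notebook:

```python
# Hypothetical helper: derive a click point from a crop box.
# Boxes follow PIL's (left, top, right, bottom) convention.
def box_center(box):
    left, top, right, bottom = box
    return ((left + right) // 2, (top + bottom) // 2)

# Answer boxes copied from the notebook's box_dict.
answer_boxes = {
    "choice1": (20, 510, 635, 580),
    "choice2": (635, 510, 1270, 580),
    "choice3": (20, 570, 635, 650),
    "choice4": (635, 570, 1270, 650),
}

# Map each choice tag to the integer center of its box.
derived_clicks = {tag: box_center(box) for tag, box in answer_boxes.items()}
print(derived_clicks)
```

The derived points land within a few pixels of the hand-picked values in `choice_dict`, which suggests that keeping only the boxes as the source of truth would be enough if the layout ever changes.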

## ✅ Requirements

Before using this notebook, make sure:

- You’ve installed Playwright and run `playwright install`
- A local VLM server (e.g., [Ollama](https://ollama.com)) is running at `http://localhost:11434`
- The model you are using supports **image + text input** (e.g., `qwen2.5vl`, `llava`, or `bakllava`)
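
The server and model requirements can be checked before starting a game. The sketch below assumes the server is Ollama, whose `/api/tags` endpoint lists locally pulled models; the helper names (`model_names`, `list_local_models`, `is_model_available`) are hypothetical and only the standard library is used:

```python
import json
import urllib.request

def model_names(tags_response):
    """Extract model names from a parsed /api/tags response body."""
    return [m.get("name", "") for m in tags_response.get("models", [])]

def list_local_models(endpoint="http://localhost:11434"):
    """Query the (assumed) Ollama server and return the names it reports."""
    with urllib.request.urlopen(f"{endpoint}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))

def is_model_available(wanted, names):
    """True if the exact model tag appears in the list of names."""
    return wanted in names

# Usage (requires a running server):
# names = list_local_models()
# print(is_model_available("qwen2.5vl:7b-fp16", names))
```

If the call raises a connection error, the server is probably not running; if the model is missing from the list, it likely still needs to be pulled.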



In [None]:
from playwright.async_api import async_playwright
from datetime import datetime
from IPython.display import Image as IPyImage, display 
from PIL import Image
import os
import requests
import base64
from io import BytesIO

DEBUG_DIR = "debug"
os.makedirs(DEBUG_DIR, exist_ok=True)

MODEL = "qwen2.5vl:7b-fp16"

box_dict = {
    "choice1": (20, 510, 635, 580),
    "choice2": (635, 510, 1270, 580),
    "choice3": (20, 570, 635, 650),
    "choice4": (635, 570, 1270, 650),
    "question": (20, 100, 1270, 510)
}

choice_dict={
    "choice1": (325, 545),
    "choice2": (950, 545),
    "choice3": (325, 610),
    "choice4": (950, 610)
}

async def click_browser_by_choice(page, answer):
    # `answer` is the model's reply; strip whitespace so "3 " still maps to "choice3".
    x, y = choice_dict[f"choice{str(answer).strip()}"]
    print(f"click {x},{y} for answer {answer}")
    await page.mouse.move(x, y)
    await page.mouse.click(x, y)

## 2️⃣ Image Cropping & LLM Prompting Utilities

This section defines a set of helper functions that:

- Encode an image into base64 format suitable for API input.
- Construct a valid prompt payload for multimodal inference (image + question).
- Crop specific regions (question or choices) from a screenshot, save to `debug/`, and send to the VLM for text extraction.
- Aggregate the cropped texts into a structured prompt and get the answer from the model (this step is defined in the next section and returns only the choice number: 1-4).
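
The multimodal message format can be illustrated without any image library. The sketch below builds an OpenAI-style user message from raw PNG bytes, which would avoid re-encoding the saved crops as JPEG. `png_bytes_to_data_url` and `build_vision_message` are hypothetical helpers, and whether a given server accepts PNG data URLs is an assumption to verify:

```python
import base64

def png_bytes_to_data_url(png_bytes):
    """Wrap raw PNG bytes in a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f"data:image/png;base64,{encoded}"

def build_vision_message(png_bytes, prompt):
    """Build one user message carrying an image part and a text part."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": png_bytes_to_data_url(png_bytes)}},
            {"type": "text", "text": prompt},
        ],
    }

# Usage with a crop saved by the notebook (the path is illustrative):
# with open("debug/screenshot_question.png", "rb") as f:
#     message = build_vision_message(
#         f.read(), "Extract text in the image and respond with only the text"
#     )
```

The message would go inside the `messages` list of the chat payload, next to `model`, `temperature`, and `max_tokens`.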


In [None]:
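def encode_image_to_base64(image_path):
    """Return the image at `image_path` as a base64-encoded JPEG string."""
    with Image.open(image_path) as img:
        buffered = BytesIO()
        # JPEG has no alpha channel; convert so RGBA screenshots do not raise on save.
        img.convert("RGB").save(buffered, format="JPEG")
        return base64.b64encode(buffered.getvalue()).decode("utf-8")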

    
def create_payload(image_path, model_id, question):
    image_b64= encode_image_to_base64(image_path)
    return {
        "model": model_id,
        "messages": [
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": question}
            ]}
        ],
        "temperature": 0.2,
        "max_tokens": 512
    }


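def extract_text_with_llm(imageFn, endpoint='http://localhost:11434'):
    """Ask the VLM to transcribe the text in the image at `imageFn`."""
    question = "Extract text in the image and respond with only the text"
    payload = create_payload(imageFn, MODEL, question)
    chat_url = f"{endpoint}/v1/chat/completions"
    # A generous timeout keeps a stalled server from hanging the notebook.
    response = requests.post(chat_url, json=payload, timeout=120)
    response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
    return response.json()['choices'][0]['message']['content'].strip()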

def crop_and_extract_with_llm(inputImageFn, tag, endpoint='http://localhost:11434'):
    """Crop the region named `tag` from the screenshot, save it to DEBUG_DIR, and transcribe it."""
    box = box_dict[tag]
    basename = os.path.splitext(os.path.basename(inputImageFn))[0]
    outputFn = os.path.join(DEBUG_DIR, f"{basename}_{tag}.png")
    # Open the screenshot in a context manager so the file handle is released.
    with Image.open(inputImageFn) as im:
        im.crop(box).save(outputFn, 'png')
    return extract_text_with_llm(outputFn, endpoint)

## 3️⃣ VLM-Based Answering Logic

This section contains the core answering strategy:

- **Text-based mode** (`get_answer_with_llm`): crops and extracts each component (question + choices) separately, then forms a prompt.

The function returns the model's reply, which the system prompt restricts to a single digit (1-4) with **no explanation**.
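
A model may still wrap the digit in extra words or punctuation despite the system prompt, so the reply is worth validating before it is used to look up a click position. A minimal sketch; `parse_choice` is a hypothetical helper, not part of the notebook:

```python
import re

def parse_choice(reply):
    """Return the first standalone digit 1-4 in the reply, or None if absent."""
    # The lookarounds reject digits embedded in longer numbers such as "12".
    match = re.search(r"(?<!\d)[1-4](?!\d)", reply)
    return int(match.group()) if match else None

print(parse_choice("3"))
print(parse_choice("Choice 2."))
print(parse_choice("I am not sure"))
```

Returning `None` for an unusable reply lets the caller skip the click rather than fail on a missing dictionary key.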


In [None]:
def get_answer_with_llm(filename,endpoint='http://localhost:11434'):
    """Transcribe the question and four choices, then ask the model for the choice number."""
    question=crop_and_extract_with_llm(filename,'question',endpoint)
    choice1=crop_and_extract_with_llm(filename,'choice1',endpoint)
    choice2=crop_and_extract_with_llm(filename,'choice2',endpoint)
    choice3=crop_and_extract_with_llm(filename,'choice3',endpoint)
    choice4=crop_and_extract_with_llm(filename,'choice4',endpoint)
    model_id=MODEL
    prompt=f"{question}. Choice 1={choice1}, Choice 2={choice2}, Choice 3={choice3}, Choice 4={choice4}"
    print(prompt)
    payload = {
        "model": model_id,  # set via the MODEL constant in the setup cell
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a Kahoot quiz competitor. Your goal is to select the most accurate answer "
                    "as fast as possible, based only on the question and the available choices. "
                    "Always respond with the correct choice number only (e.g., '3') without explanation. "
                    "Avoid any extra words or reasoning — be concise and immediate."
                )
            },
            {
                "role": "user",
                "content": (
                    prompt
                )
            }
        ],
        "temperature": 0,
        "max_tokens": 512,
        "stop": ["\n"]
    }
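    chat_url=f"{endpoint}/v1/chat/completions"
    response = requests.post(chat_url, json=payload, timeout=120)
    response.raise_for_status()  # fail loudly on HTTP errors
    answer = response.json()['choices'][0]['message']['content']
    return answer.strip()  # drop stray whitespace so the reply maps cleanly to a choice key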

## 4️⃣ Kahoot Game Automation

This is the main function that connects everything together. It:

- Launches a browser session and joins a Kahoot game using the given **PIN** and **nickname**.
- Continuously listens for your input:
  - `1` → Screenshot → extract text → send to VLM → auto-click the answer
  - `q` → Exit the session
- Measures and prints the time taken to compute each answer.
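
The notebook measures elapsed time with `datetime.now()`. For intervals, `time.perf_counter()` is a monotonic alternative that is unaffected by system clock adjustments. A small sketch, where `timed` is a hypothetical wrapper for synchronous calls such as `get_answer_with_llm`:

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage inside the answer loop (names taken from the notebook):
# answer, elapsed = timed(get_answer_with_llm, filename)
# print(f"Answer: {answer} ({elapsed:.2f} seconds)")
```

This measures only the model round trips; the screenshot and click would need to be timed separately because they are awaited coroutines.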


In [None]:
async def join_kahoot_game(pin: int, nickname: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # box_dict and choice_dict appear to be tuned for a 1280x720 page, Playwright's default viewport.
        page = await browser.new_page()
        print("Navigating to Kahoot...")
        await page.goto("https://kahoot.it/")
        print("Entering Game PIN...")
        await page.fill('input[name="gameId"]', str(pin))
        await page.press('input[name="gameId"]', 'Enter')
        await page.wait_for_selector('input[type="text"]', timeout=10000)
        print("Entering Nickname...")
        await page.fill('input[type="text"]', nickname)
        await page.press('input[type="text"]', 'Enter')
        try:
            await page.wait_for_selector("text=You're in!", timeout=10000)
            print("✅ Joined successfully")
        except Exception:  # e.g. Playwright's TimeoutError when the text never appears
            print("⚠️ Join failed or timed out.")
            
        while True:
            cmd = input("Type '1' to answer question or 'q' to quit: ").strip()
            if cmd == "1":
                print("Option 1: Answer question")
                # Start the timer and name the screenshot after the current time.
                start_time = datetime.now()
                timestamp = start_time.strftime("%Y%m%d_%H%M%S")
                filename = os.path.join(DEBUG_DIR, f"screenshot_{timestamp}.png")
                await page.screenshot(path=filename)
                print(f"Screenshot saved to {filename}")
                display(IPyImage(filename))
                answer = get_answer_with_llm(filename, 'http://localhost:11434')
                await click_browser_by_choice(page, answer)
                # Stop the timer once the click has been sent.
                end_time = datetime.now()
                elapsed = (end_time - start_time).total_seconds()
                print(f"Answer: {answer} ({elapsed:.2f} seconds)")
            elif cmd.lower() == "q":
                print("Exiting session.")
                break
            else:
                print("Unknown input. Please type '1' or 'q'.")
        await browser.close()

## 🚀 How to Play

Run the final cell and follow the prompts:

1. Enter the **Kahoot Game PIN**.
2. Enter your **nickname**.
3. During the game, type:
   - `1`: Answer using the **text-based method**
   - `q`: Exit when the game is over

⚠️ **Do not type `q` in the middle of a quiz round**, or you won’t be able to rejoin.


In [None]:
pin = int(input("Enter Kahoot Game PIN: "))
nickname = input("Enter your nickname: ")
await join_kahoot_game(pin, nickname)

