# Summary

I'm using this notebook to reproduce interesting parts of the DeepLearning.ai course [Reinforcement Fine-Tuning LLMs with GRPO](https://learn.deeplearning.ai/courses/reinforcement-fine-tuning-llms-grpo/lesson/sjbja/introduction).

# Lesson three: Build an AI that actually plays Wordle, using an LLM

Lesson three is the first lesson in the course that includes code.

## Setup

In [15]:
import os
from dotenv import load_dotenv
from openai import OpenAI

from predibase import Predibase

In [21]:
_ = load_dotenv(override=True) # reload ok here

# Predibase provides an OpenAI-compatible chat completions v1 endpoint, which is what the tutorial uses.
# The model URL differs per user; I created my own following the docs at https://docs.predibase.com/user-guide/inference/migrate-openai
client = OpenAI(
    base_url=os.environ['PREDIBASE_QWEN_MODEL_URL'], 
    api_key=os.environ['PREDIBASE_API_KEY'],
)

In [3]:
from transformers import AutoTokenizer

base_model_id = 'Qwen/Qwen2.5-7B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

In [4]:
SYSTEM_PROMPT = """
You are playing Wordle, a word-guessing game.

### Game Rules:
- You have **6 tries** to guess a secret **5-letter** word.
- Each guess must be a valid **5-letter English word**.
- After each guess, you will receive feedback indicating how close 
your guess was.

### Feedback Format:
Each letter in your guess will receive one of three symbols:
1. ✓ : The letter is in the word and in the CORRECT position.
2. - : The letter is in the word but in the WRONG position.
3. x : The letter is NOT in the word.

### Example:
Secret Word: BRISK

Guess 1: STORM → Feedback: S(-) T(x) O(x) R(-) M(x)
Guess 2: BRAVE → Feedback: B(✓) R(✓) A(x) V(x) E(x)
Guess 3: BRISK → Feedback: B(✓) R(✓) I(✓) S(✓) K(✓)

### Response Format:
Think through the problem and feedback step by step. Make sure to 
first add your step by step thought process within <think> </think> 
tags. Then, return your guessed word in the following format: 
<guess> guessed-word </guess>.
"""

In [5]:
from dataclasses import dataclass
from enum import Enum
from typing import List

class LetterFeedback(Enum):
    CORRECT = "✓"
    WRONG_POS = "-"
    WRONG_LETTER = "x"

@dataclass
class GuessWithFeedback:
    guess: str
    feedback: List[LetterFeedback]

    def __repr__(self) -> str:
        "A human (and LLM)-readable string showing the guess and letter-by-letter feedback"
        feedback_str = " ".join(f"{letter}({fb.value})" for letter, fb in zip(self.guess, self.feedback))
        return f"{self.guess} -> Feedback: {feedback_str}"
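
To check the representation quickly, the sketch below builds a `GuessWithFeedback` from a compact symbol string. `feedback_from_symbols` is a hypothetical helper of mine, not part of the course code, and the two classes are repeated only so the snippet runs on its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class LetterFeedback(Enum):
    CORRECT = "✓"
    WRONG_POS = "-"
    WRONG_LETTER = "x"

@dataclass
class GuessWithFeedback:
    guess: str
    feedback: List[LetterFeedback]

    def __repr__(self) -> str:
        feedback_str = " ".join(f"{letter}({fb.value})" for letter, fb in zip(self.guess, self.feedback))
        return f"{self.guess} -> Feedback: {feedback_str}"

def feedback_from_symbols(symbols: str) -> List[LetterFeedback]:
    # Hypothetical helper: Enum lookup by value maps each symbol to its member
    return [LetterFeedback(symbol) for symbol in symbols]

example = GuessWithFeedback("CRANE", feedback_from_symbols("✓✓✓xx"))
print(example)  # CRANE -> Feedback: C(✓) R(✓) A(✓) N(x) E(x)
```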

In [22]:
def render_user_prompt(past_guesses: List[GuessWithFeedback]) -> str:
    "Creates a user-facing prompt that includes past guesses and the matching feedback."
    prompt = "Make a new 5-letter word guess."
    if past_guesses:
        prompt += "\n\nHere is some previous feedback:"
        for i, guess in enumerate(past_guesses):
            prompt += f"\nGuess {i+1}: {guess}"
    return prompt

In [23]:
def render_prompt(past_guesses: List[GuessWithFeedback]) -> str:
    """Formats a full chat prompt using a system message, user prompt, and assistant 
    preamble to start the model's step-by-step reasoning."""
    messages = [
        {
            "role": "system",
            "content": SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": render_user_prompt(past_guesses)
        },
        {
            "role": "assistant",
            "content": "Let me solve this step by step.\n<think>"
        }
    ]

    return tokenizer.apply_chat_template(messages, tokenize=False, continue_final_message=True)

In [24]:
def generate_stream(prompt: str, adapter_id: str = "") -> str:
    "Streams a model-generated response from a prompt in real time, printing as tokens arrive."
    response = client.completions.create(
        model=adapter_id,
        prompt=prompt,
        temperature=0.0, # deterministic, for evaluation
        max_tokens=2048,
        stream=True,
    )

    completion = ""
    for chunk in response:
        if chunk.choices[0].text is not None:
            content = chunk.choices[0].text
            print(content, end="", flush=True)
            completion += content
    print()  # finish the streamed output with a newline

    return completion

## Play a single Wordle turn, using the LLM

In [25]:
secret_word = "CRAFT"

past_guesses = [
    GuessWithFeedback(
        "CRANE", [
            LetterFeedback.CORRECT,
            LetterFeedback.CORRECT,
            LetterFeedback.CORRECT,
            LetterFeedback.WRONG_LETTER,
            LetterFeedback.WRONG_LETTER,
        ]),
    GuessWithFeedback(
        "CRASH", [
            LetterFeedback.CORRECT,
            LetterFeedback.CORRECT,
            LetterFeedback.CORRECT,
            LetterFeedback.WRONG_LETTER,
            LetterFeedback.WRONG_LETTER,
        ]),
]

prompt = render_prompt(past_guesses)
print(prompt)

<|im_start|>system

You are playing Wordle, a word-guessing game.

### Game Rules:
- You have **6 tries** to guess a secret **5-letter** word.
- Each guess must be a valid **5-letter English word**.
- After each guess, you will receive feedback indicating how close 
your guess was.

### Feedback Format:
Each letter in your guess will receive one of three symbols:
1. ✓ : The letter is in the word and in the CORRECT position.
2. - : The letter is in the word but in the WRONG position.
3. x : The letter is NOT in the word.

### Example:
Secret Word: BRISK

Guess 1: STORM → Feedback: S(-) T(x) O(x) R(-) M(x)
Guess 2: BRAVE → Feedback: B(✓) R(✓) A(x) V(x) E(x)
Guess 3: BRISK → Feedback: B(✓) R(✓) I(✓) S(✓) K(✓)

### Response Format:
Think through the problem and feedback step by step. Make sure to 
first add your step by step thought process within <think> </think> 
tags. Then, return your guessed word in the following format: 
<guess> guessed-word </guess>.
<|im_end|>
<|im_start|>user
Make

In [26]:
base_completion = generate_stream(prompt)

From the feedback, we know:
- C is in the correct position (1st letter)
- R is in the correct position (2nd letter)
- A is in the correct position (3rd letter)
- N is not in the word
- E is not in the word
- S is not in the word
- H is not in the word

This means the word must contain the letters C, R, and A, and the other two letters must be from the set {B, D, F, G, I, J, K, L, M, O, P, Q, T, U, V, W, X, Y, Z}.

Let's try a word that includes C, R, and A, and uses other letters from the available set.</think>
<guess>CRANE</guess>

At this point the tutorial calls `generate_stream` against a model fine-tuned with RFT and shows much better results. I haven't trained that model yet (that happens in what they call lesson eight), so I can't run the cell below. I also think custom fine-tuned models aren't shared the way the base model I'm using above is, so I can't use the one they trained.

In [29]:
# prompt fine-tuned model
# ft_completion = generate_stream(prompt, adapter_id="wordle-dlai/2")

## Playing a complete game of Wordle

In [30]:
import re

def get_feedback(guess: str, secret_word: str) -> List[LetterFeedback]:
    "Generate a list of tripartite feedback for each letter of the guess."
    valid_letters = set(secret_word)
    feedback = []
    for letter, secret_letter in zip(guess, secret_word):
        if letter == secret_letter:
            feedback.append(LetterFeedback.CORRECT)
        elif letter in valid_letters:
            feedback.append(LetterFeedback.WRONG_POS)
        else:
            feedback.append(LetterFeedback.WRONG_LETTER)
    return feedback
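
One caveat: the function above marks a letter as wrong-position whenever it occurs anywhere in the secret word, so repeated letters can be over-reported compared with the standard Wordle rules. Below is a minimal sketch of a two-pass variant, assuming the standard rules are wanted. `get_feedback_strict` is a hypothetical name, and plain symbol strings stand in for `LetterFeedback` so the snippet runs standalone.

```python
from collections import Counter
from typing import List

def get_feedback_strict(guess: str, secret_word: str) -> List[str]:
    feedback = ["x"] * len(guess)
    # Count the secret's letters that were not matched in the exact position
    remaining = Counter(s for g, s in zip(guess, secret_word) if g != s)
    # First pass: exact matches
    for i, (g, s) in enumerate(zip(guess, secret_word)):
        if g == s:
            feedback[i] = "✓"
    # Second pass: wrong-position matches, each consuming one unmatched letter
    for i, (g, s) in enumerate(zip(guess, secret_word)):
        if g != s and remaining[g] > 0:
            feedback[i] = "-"
            remaining[g] -= 1
    return feedback

print(get_feedback_strict("CRANE", "BRICK"))  # ['-', '✓', 'x', 'x', 'x']
print(get_feedback_strict("EERIE", "CRANE"))  # ['x', 'x', '-', 'x', '✓']
```

For "EERIE" against "CRANE", the original function would report both leading E's as wrong-position, even though the single E in the secret is already matched in the last slot.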

In [33]:
def next_turn(
    past_guesses: List[GuessWithFeedback],
    secret_word: str,
    adapter_id = ""
):
    """Do a full turn: build a prompt incorporating previous guesses, 
    use the LLM to get a new guess, and check and score it."""
    prompt = render_prompt(past_guesses)
    completion = generate_stream(prompt, adapter_id)
    match = re.search(r"<guess>\s*(.*?)\s*</guess>", completion, re.DOTALL)
    if not match:
        print(completion)
        raise RuntimeError("Invalid guess - no guess found in response.")

    guess = match.group(1).upper()
    feedback = get_feedback(guess, secret_word)
    past_guesses.append(GuessWithFeedback(guess, feedback))
    print("\n\n")
    print(("-" * 100) + "\n")
    for past_guess in past_guesses:
        print(past_guess)

    if guess == secret_word:
        print("SUCCESS - win")
    elif len(past_guesses) >= 6:
        print("LOSS - ran out of guesses")
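
`next_turn` accepts whatever text sits inside the `<guess>` tags. As a sketch of a stricter parse, the hypothetical helper `extract_guess` below reuses the same regular expression and also rejects anything that is not five ASCII letters. The five-letter check is my assumption about what counts as a usable guess; it does not check against a dictionary.

```python
import re
from typing import Optional

GUESS_PATTERN = re.compile(r"<guess>\s*(.*?)\s*</guess>", re.DOTALL)

def extract_guess(completion: str) -> Optional[str]:
    "Return the upper-cased guess, or None if it is missing or malformed."
    match = GUESS_PATTERN.search(completion)
    if not match:
        return None
    guess = match.group(1).upper()
    # Reject guesses that are not exactly five ASCII letters
    if len(guess) != 5 or not (guess.isascii() and guess.isalpha()):
        return None
    return guess

print(extract_guess("some reasoning</think>\n<guess> crane </guess>"))  # CRANE
print(extract_guess("<guess>CRANES</guess>"))  # None
```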

In [49]:
secret_word = "BRICK"
# secret_word = "STAIR"
past_guesses = []
adapter_id = ""

In [50]:
next_turn(past_guesses, secret_word, adapter_id)

 
- The word is 5 letters long.
- We have 6 tries to guess the word.
- We need to use valid 5-letter English words.
- We should try to use words that cover a wide range of letters to maximize our chances of getting feedback.
- A good starting word could be a common word that includes a mix of vowels and consonants, such as "CRANE".
</think>
<guess> CRANE </guess>


----------------------------------------------------------------------------------------------------

CRANE -> Feedback: C(-) R(✓) A(x) N(x) E(x)


In [51]:
next_turn(past_guesses, secret_word, adapter_id)

From the feedback, we know:
- R is in the correct position (2nd letter).
- C, A, N, and E are not in the word at all.
- The word is 5 letters long and must include R.

Given these constraints, we can try to guess a word that includes R and avoids the letters C, A, N, and E. A good candidate could be a common 5-letter word that includes R and doesn't use these letters. 

Some possible words could be "BRICK", "BRINE", "BRIDE", etc. Let's try "BRICK" as it is a common word and fits the criteria.</think>
<guess>BRICK</guess>


----------------------------------------------------------------------------------------------------

CRANE -> Feedback: C(-) R(✓) A(x) N(x) E(x)
BRICK -> Feedback: B(✓) R(✓) I(✓) C(✓) K(✓)
SUCCESS - win
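
The cells above advance the game one manual call at a time. A rough sketch of the same flow as a loop follows, with a scripted function standing in for the LLM so it runs without an endpoint. `play_game`, `score`, and `guess_fn` are hypothetical names of mine, and `score` applies the same rule as `get_feedback` but returns a compact symbol string.

```python
from typing import Callable, List, Tuple

MAX_TURNS = 6
History = List[Tuple[str, str]]  # (guess, feedback symbols) pairs

def score(guess: str, secret_word: str) -> str:
    # Same rule as get_feedback, returned as a compact symbol string
    valid_letters = set(secret_word)
    return "".join(
        "✓" if g == s else "-" if g in valid_letters else "x"
        for g, s in zip(guess, secret_word)
    )

def play_game(secret_word: str, guess_fn: Callable[[History], str]) -> Tuple[bool, History]:
    "Ask guess_fn for guesses until the word is found or the turns run out."
    history: History = []
    for _ in range(MAX_TURNS):
        guess = guess_fn(history).upper()
        history.append((guess, score(guess, secret_word)))
        if guess == secret_word:
            return True, history
    return False, history

# Scripted stand-in for the LLM: replays the two guesses from the run above
scripted = iter(["CRANE", "BRICK"])
won, history = play_game("BRICK", lambda past: next(scripted))
print(won, history)
```

In the notebook itself, `guess_fn` would wrap `render_prompt`, `generate_stream`, and the `<guess>` parsing from `next_turn`.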


With the fine-tuned model, I could rerun the same game, and judging from the course video it would do well. As noted above, I don't have that model yet, so the cells below stay commented out.

In [37]:
# secret_word = "BRICK"
# past_guesses = []
# adapter_id = "wordle-dlai/2"

In [None]:
# next_turn(past_guesses, secret_word, adapter_id)