# 🧩✨ Extra Tree‑/Graph‑of‑Thought Puzzles + LLM‑Scoring Examples

*Supplementary Lab – Day 5*

Explore **three** additional classic puzzles with both heuristic *and* **LLM‑driven** scoring options:

1. **Eight‑Puzzle** (3×3 sliding tiles)  
2. **Tower of Hanoi** (minimal-move search)  
3. **Word Ladder** (single‑letter transformations)  

Each demo is implemented with a generic `ToTSolver` (beam search) and shows how to plug in OpenAI scoring so the language model evaluates partial states.

> 💡 Set the `OPENAI_API_KEY` environment variable for LLM scoring. If it is unset, each `llm_score_*` function falls back to its heuristic.


## 🔧 0. Setup

In [None]:
%pip -q install --upgrade openai networkx matplotlib
import os, random, copy, math, itertools, collections
import openai, networkx as nx, matplotlib.pyplot as plt
openai.api_key = os.getenv("OPENAI_API_KEY")  # optional
MODEL = "gpt-4o-mini"


### Generic ToT Beam Solver (reuse)

Same as earlier notebook – beam search with pluggable `expand_fn` & `score_fn`.

In [None]:
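class ToTSolver:
    """Beam search over states with pluggable expansion and scoring functions."""
    def __init__(self, expand_fn, score_fn, beam=5, max_depth=30, verbose=False):
        self.expand=expand_fn      # state -> list of child states
        self.score=score_fn        # state -> number (higher is better)
        self.beam=beam             # states kept per depth level
        self.max_depth=max_depth   # maximum number of expansion rounds
        self.verbose=verbose
    def solve(self, init_state, goal_test):
        if goal_test(init_state):
            return init_state
        frontier=[init_state]
        for depth in range(self.max_depth):
            if self.verbose:
                print(f"Depth {depth}: frontier {len(frontier)}")
            new=[]
            for state in frontier:
                for child in self.expand(state):
                    # Test children as they are generated, so a goal reached in the
                    # final round is not missed and is never sent for scoring.
                    if goal_test(child):
                        return child
                    new.append((child, self.score(child)))
            if not new:
                return None  # dead end: nothing left to expand
            # Keep the `beam` highest-scoring children.
            frontier=[s for s,_ in sorted(new, key=lambda x:-x[1])[:self.beam]]
        return None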

---

## 1️⃣ Eight‑Puzzle (Sliding Tiles)

**State**: tuple of 9 ints, 0 = blank  
**Expand**: slide blank up/down/left/right  
**Heuristic Score**: Negative Manhattan distance to goal.  
**LLM Score** (optional): Ask the model to rate closeness to goal.
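
Model replies are not guaranteed to be a bare integer, so a tolerant parser can make LLM scoring fall back to the heuristic less often. The sketch below assumes a hypothetical helper `parse_score` (not part of this notebook) that pulls the first integer out of a reply and clamps it to the requested range.

```python
import re

def parse_score(reply, low, high, default):
    """Extract the first integer from an LLM reply and clamp it to [low, high].

    Hypothetical helper: returns `default` when the reply contains no integer.
    """
    # Some models emit a Unicode minus sign; normalise it to ASCII first.
    text = reply.replace("\u2212", "-")
    match = re.search(r"-?\d+", text)
    if match is None:
        return default
    return max(low, min(high, int(match.group())))

print(parse_score("Score: -7", -30, 0, default=-30))
print(parse_score("-45", -30, 0, default=-30))
print(parse_score("no idea", -30, 0, default=-30))
```

A scoring function could call `parse_score(resp.choices[0].message.content, -30, 0, default=...)` instead of `int(...)`, passing the heuristic value as the default.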


In [None]:
GOAL=(1,2,3,4,5,6,7,8,0)

def pos(idx): return divmod(idx,3)

moves=[(-1,0),(1,0),(0,-1),(0,1)]
def expand_8p(state):
    idx=state.index(0)
    r,c=pos(idx)
    children=[]
    for dr,dc in moves:
        nr,nc=r+dr,c+dc
        if 0<=nr<3 and 0<=nc<3:
            nidx=nr*3+nc
            lst=list(state)
            lst[idx],lst[nidx]=lst[nidx],lst[idx]
            children.append(tuple(lst))
    return children

def manhattan(state):
    dist=0
    for i,val in enumerate(state):
        if val==0: continue
        goal_r,goal_c=pos(val-1)
        cur_r,cur_c=pos(i)
        dist+=abs(goal_r-cur_r)+abs(goal_c-cur_c)
    return -dist

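def llm_score_8p(state):
    # Without an API key, fall back to the heuristic.
    if not openai.api_key: return manhattan(state)
    prompt=(f"Eight‑Puzzle board {state}, read row by row with 0 as the blank. "
            "Rate on a scale from -30 (far) to 0 (goal) how close this board is to solved "
            "(1 2 3 / 4 5 6 / 7 8 _). Return a single integer.")
    try:
        # openai>=1.0 removed openai.ChatCompletion; use the chat.completions interface.
        resp=openai.chat.completions.create(model=MODEL,messages=[{"role":"user","content":prompt}],temperature=0)
        return int(resp.choices[0].message.content.strip())
    except Exception:
        # API errors or non-integer replies fall back to the heuristic.
        return manhattan(state)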

start=(1,2,3,4,5,6,0,7,8)
print("Start",start)
solver8=ToTSolver(expand_8p,llm_score_8p,beam=10,max_depth=30)
solution=solver8.solve(start, lambda s:s==GOAL)
print("Solved?", solution)

#### 📝 Try‑it

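* Scramble `start` by applying random legal moves from `GOAL`; only half of all tile permutations are solvable, so a plain random shuffle may have no solution.  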
* Compare `manhattan` vs. `llm_score_8p`.  
* Does the LLM score ever outperform heuristics?
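
One way to generate harder but still solvable boards is a random walk backwards from the goal. The sketch below is self-contained and uses hypothetical names (`GOAL_8P`, `slide_moves`, `scramble`, `is_solvable`) that are not defined elsewhere in this notebook; the parity check relies on the standard result that a 3×3 board is solvable exactly when its inversion count is even.

```python
import random

GOAL_8P = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # hypothetical name, same layout as GOAL

def slide_moves(state):
    """Return all boards reachable by sliding one tile into the blank (0)."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    boards = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = row + dr, col + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            tiles = list(state)
            target = nr * 3 + nc
            tiles[blank], tiles[target] = tiles[target], tiles[blank]
            boards.append(tuple(tiles))
    return boards

def scramble(steps=20, seed=None):
    """Walk `steps` random legal moves away from the goal (always solvable)."""
    rng = random.Random(seed)
    state, previous = GOAL_8P, None
    for _ in range(steps):
        # Skip the move that would immediately undo the previous one.
        options = [b for b in slide_moves(state) if b != previous]
        state, previous = rng.choice(options), state
    return state

def is_solvable(state):
    """True when the inversion count (ignoring the blank) is even."""
    tiles = [t for t in state if t != 0]
    inversions = sum(a > b for i, a in enumerate(tiles) for b in tiles[i + 1:])
    return inversions % 2 == 0

start = scramble(steps=20, seed=0)
print(start, is_solvable(start))
```

Larger `steps` values tend to give boards further from the goal, though a random walk can wander back, so the true solution length may be shorter than `steps`.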

---

## 2️⃣ Tower of Hanoi (3 rods)

Goal: move the entire stack from rod A to rod C, one disc at a time, never placing a larger disc on a smaller one.

We demonstrate ToT beam search with **LLM pruning**: the model rates how close each partial state is to the goal, and the beam keeps only the top‑scoring states.
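
Each LLM rating costs an API call, and beam search can reach the same configuration more than once, so memoizing scores may save calls. The sketch below uses `functools.lru_cache` with a stand-in scorer, `fake_llm_score` (hypothetical, no network access), to show the effect; Hanoi states are tuples of tuples, so they are hashable as required.

```python
from functools import lru_cache

calls = 0  # counts how often the stand-in "model" is actually queried

def fake_llm_score(state):
    """Stand-in for an LLM call: negated count of discs not yet on rod C."""
    global calls
    calls += 1
    return -sum(len(rod) for rod in state[:2])

@lru_cache(maxsize=None)
def cached_score(state):
    # Repeated states are answered from the cache without a new call.
    return fake_llm_score(state)

states = [((3, 2, 1), (), ()), ((3, 2), (1,), ()), ((3, 2, 1), (), ())]
scores = [cached_score(s) for s in states]
print(scores, "model calls:", calls)
```

The same wrapper would not work directly for list-based states such as Word Ladder paths; those would need converting to tuples first.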

In [None]:
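NUM=4  # discs
# State: three tuples (rods A, B, C), each listing discs bottom to top; larger number = larger disc.
# Note: this rebinds GOAL from the Eight‑Puzzle cell, so re-run that cell before re-running its solver.
GOAL=((),(),tuple(range(NUM,0,-1)))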

def expand_hanoi(state):
    rods=[list(r) for r in state]
    children=[]
    for i in range(3):
        if not rods[i]: continue
        disc=rods[i][-1]
        for j in range(3):
            if i==j: continue
            if rods[j] and rods[j][-1]<disc: continue
            new_rods=[list(r) for r in rods]
            new_rods[i].pop()
            new_rods[j].append(disc)
            children.append(tuple(tuple(r) for r in new_rods))
    return children

def heuristic_h(state):
    return -sum(len(r)*(idx!=2) for idx,r in enumerate(state))  # more on rod C = better

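def llm_score_h(state):
    # Without an API key, fall back to the heuristic.
    if not openai.api_key: return heuristic_h(state)
    prompt=(f"Tower of Hanoi state {state}: three rods (A, B, C), discs listed bottom to top, larger number = larger disc. "
            "How close is this to all discs on rod C (goal)? Reply with a single integer from -20 (far) to 0 (goal).")
    try:
        # openai>=1.0 removed openai.ChatCompletion; use the chat.completions interface.
        out=openai.chat.completions.create(model=MODEL,messages=[{"role":"user","content":prompt}],temperature=0)
        return int(out.choices[0].message.content.strip())
    except Exception:
        # API errors or non-integer replies fall back to the heuristic.
        return heuristic_h(state)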

init=(tuple(range(NUM,0,-1)),(),())
solverH=ToTSolver(expand_hanoi,llm_score_h,beam=15,max_depth=40)
sol=solverH.solve(init, lambda s:s==GOAL)
print("Success:", sol==GOAL)

#### 📝 Challenge

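* Raise `NUM` to 5 or 6 discs and raise `max_depth` to match: the shortest solutions take 31 and 63 moves, so the `max_depth=40` used above is too shallow for 6 discs. A fixed‑width beam can also prune the only path to the goal; can LLM scores guide the search better?  
* Compare the depth at which the solver succeeds (pass `verbose=True` to print it) with the theoretical minimum of 2^n‑1 moves.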
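
For a reference point, the classic recursive solution produces an optimal move list that can be replayed on the same state representation. The sketch below is self-contained; `hanoi_moves` and `apply_move` are hypothetical helper names, with rods indexed 0, 1, 2 for A, B, C.

```python
def hanoi_moves(n, src=0, dst=2, via=1):
    """Return the optimal move list [(from_rod, to_rod), ...] for n discs."""
    if n == 0:
        return []
    # Move n-1 discs out of the way, move the largest, then move n-1 back on top.
    return (hanoi_moves(n - 1, src, via, dst)
            + [(src, dst)]
            + hanoi_moves(n - 1, via, dst, src))

def apply_move(state, move):
    """Apply one move to a state of three tuples (discs listed bottom to top)."""
    src, dst = move
    rods = [list(r) for r in state]
    disc = rods[src].pop()
    assert not rods[dst] or rods[dst][-1] > disc, "illegal move"
    rods[dst].append(disc)
    return tuple(tuple(r) for r in rods)

for n in range(1, 7):
    moves = hanoi_moves(n)
    state = (tuple(range(n, 0, -1)), (), ())
    for move in moves:
        state = apply_move(state, move)
    # The replay must end with every disc on rod C.
    assert state == ((), (), tuple(range(n, 0, -1)))
    print(f"n={n}: {len(moves)} moves, 2^n-1 = {2**n - 1}")
```

Any depth reported by the beam solver can be compared against `len(hanoi_moves(NUM))`.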

---

## 3️⃣ Word Ladder (LLM‑scored)

Transform *start* ➜ *target* by changing **one letter** at a time; each intermediate must be an English word.

An LLM can score partial ladders by judging how promising each path looks for reaching the target.

In [None]:
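import requests, string
# Download a public word list and keep only five-letter words (start and target are both five letters).
resp=requests.get('https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt',timeout=30)
resp.raise_for_status()  # fail loudly if the download did not succeed
wordlist=[w.strip() for w in resp.text.splitlines() if len(w.strip())==5]
DICT=set(wordlist)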

def neighbors(word):
    out=[]
    for i,ch in enumerate(word):
        for l in string.ascii_lowercase:
            if l==ch: continue
            new=word[:i]+l+word[i+1:]
            if new in DICT:
                out.append(new)
    return out

def expand_ladder(state):
    path=state
    last=path[-1]
    return [path+[n] for n in neighbors(last) if n not in path]

target="crown"; start="stone"

def heuristic_ladder(path):
    last=path[-1]
    matches=sum(a==b for a,b in zip(last,target))
    return matches

def llm_score_ladder(path):
    # Without an API key, fall back to the heuristic.
    if not openai.api_key: return heuristic_ladder(path)
    prompt=f"We are playing Word Ladder. Current path: {path}. Target word: '{target}'. Rate how promising this path is (0‑5, 5=almost there). Return a single number."
    try:
        # openai>=1.0 removed openai.ChatCompletion; use the chat.completions interface.
        r=openai.chat.completions.create(model=MODEL,messages=[{"role":"user","content":prompt}],temperature=0)
        return float(r.choices[0].message.content.strip())
    except Exception:
        # API errors or non-numeric replies fall back to the heuristic.
        return heuristic_ladder(path)

solverW=ToTSolver(expand_ladder,llm_score_ladder,beam=25,max_depth=10)
solution=solverW.solve([start], lambda p:p[-1]==target)
print("Solution path:", solution)

#### 📝 Experiments

* Change `start`/`target` pair.  
* Observe how LLM scoring differs from naive letter‑match heuristic – does it favor meaningful intermediates?
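
A breadth‑first search gives the shortest ladder and makes a useful baseline for judging either scoring function. The sketch below is self-contained and runs on a tiny inline word set chosen for illustration (an assumption, not the downloaded dictionary); `bfs_ladder` is a hypothetical helper name.

```python
from collections import deque
import string

def bfs_ladder(start, target, words):
    """Return a shortest ladder from start to target, or None if unreachable."""
    words = set(words)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == target:
            return path
        # Try every single-letter substitution of the last word.
        for i in range(len(last)):
            for letter in string.ascii_lowercase:
                candidate = last[:i] + letter + last[i + 1:]
                if candidate in words and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None

# Tiny illustrative word set; swap in DICT to search the real dictionary.
toy_words = {"cold", "cord", "card", "ward", "warm", "word", "worm", "corm"}
path = bfs_ladder("cold", "warm", toy_words)
print(path, "steps:", len(path) - 1)
```

Since "cold" and "warm" differ in all four letters, no ladder can be shorter than four steps, which is what BFS finds here. A beam‑search ladder longer than the BFS result shows where the scoring function led the search astray.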

---

## 🔗 References & Ideas

* Silver et al., “Mastering the Game of Go with Deep Neural Networks and Tree Search”, 2016 – classic example of MCTS  
* Yao et al., 2023 – ToT with game & math puzzles  
* Long & Bosch, 2024 – Graph‑of‑Thought memory graphs

Happy puzzling!