## Background

A team of researchers recently [published a paper](https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf) claiming that large reasoning models (LRMs) are not capable of "thinking" or making logical choices using some simple logical puzzles. In the paper, LLMs (Large Language Models) and LRMs are both capable of solving the puzzles when low-complexity versions of the puzzles are presented, but when high-complexity versions are presented, both LRMs and LLMs are incapable of logic. 

In this post, I'll be referring to the Tower of Hanoi puzzle, one of the puzzles used in the paper. The rules and an explanation of the game can be found on [Wikipedia](https://en.wikipedia.org/wiki/Tower_of_Hanoi). 

In this post, I'll attempt to modify the experiment design to account for two possible flaws in the paper's methodology: 
1. Perhaps the language models were failing simply because the context is too small
2. Perhaps the language models were failing because they did not have a way to update and understand the "current state" of the problem. 

I had some personal experience with the second potential flaw when, in an attempt to prove that I am smarter than a machine, I tried to write out all of the moves necessary to solve the Tower of Hanoi with 9 rings. I ended up spending a lot of time trying to go back through moves and remember what the "current state" was. However, when I solved the puzzle in a way that I could always see the "current state", I was a lot faster. 




## Game setup

Okay, here we will set up some functions so that the LLM can play our game. 

In [17]:
import numpy as np
import re
STACK_OFFSET = 3 ## An offset from the "current state" string to identify where the stacks start
STACK_LETTERS = "ABC" ## All of the letters in stacks
INCORRECT_MOVE_LANGUAGE = "Invalid move. "

def process_move(move, previous_state):
    """
    Take a move an either return the state after the move was created or return None.
    """
    if len(move) != 2:
        return None
    move_disc = move[0]
    move_stack = move[1]
    
    
    if not move_disc.isdigit():
        return None
    if move_stack not in STACK_LETTERS:
        return None
    previous_state_list = previous_state.split("\n")
    final_discs = [previous_state_list[STACK_OFFSET + i].strip()[-1] for i in range(3)]
    if move_disc not in final_discs:

        return None
    final_disc_stack = previous_state_list[STACK_OFFSET + STACK_LETTERS.index(move_stack)]
    
    if final_discs[STACK_LETTERS.index(move_stack)].isdigit() and (move_disc > final_discs[STACK_LETTERS.index(move_stack)]):
        return None
    
    previous_state_list[STACK_OFFSET + final_discs.index(move_disc)] = previous_state_list[STACK_OFFSET + final_discs.index(move_disc)][:-2]
    previous_state_list[STACK_OFFSET + STACK_LETTERS.index(move_stack)] += " " + move_disc
    
    return "\n".join(previous_state_list)
    
    
    

def get_next_state(previous_state, response):
    """
    Get the next state from a move, calling process_move. Parse the move from a LLM response, and 
    """
    
    move = re.sub("[^\w\s]", "", response).strip().split("\n")[-1].split(" ")[-1]
    next_state = process_move(move, previous_state)
    if next_state is not None:
        return next_state.replace(INCORRECT_MOVE_LANGUAGE, "")
    else:
        return INCORRECT_MOVE_LANGUAGE + previous_state.replace(INCORRECT_MOVE_LANGUAGE, "")
    
    
def get_valid_moves(current_state):
    """
    Get a list of possible valid moves based on the current staate of the game
    """
    end_chars = [x.strip()[-1] for x in current_state.split("\n")[STACK_OFFSET:STACK_OFFSET+3]]
    valid_moves = []
    for i, disc in enumerate(end_chars):
        if disc.isdigit():
            for j in range(3):
                if j == i:
                    continue
                if end_chars[j] <= disc:
                    continue
                valid_moves.append(disc + STACK_LETTERS[j])
    return valid_moves

def get_random_move(current_state):
    """Choose a random move"""
    return np.random.choice(get_valid_moves(current_state))

def get_full_stack(num_discs):
    """Get a full stack of discs together in the proper order. To be used in building a system prompt."""
    full_stack = ""
    for i in range(num_discs):
        full_stack = " " + str(i) + full_stack
    return full_stack


def get_game_start_end(num_discs):
    """
    Get starting and final states of the game. 
    """
    full_stack = get_full_stack(num_discs)
    starting_game = f"""The current state of the game is:

```
A | {full_stack}
B | 
C | 
```"""
    
    ending_game = f"""The current state of the game is:

```
A | 
B | 
C | {full_stack}
```"""
    return starting_game, ending_game
    
                
    
    
    

I used this chunk to debug the gameplay and demonstrate how long it takes for a model to solve the puzzle at random.

In [18]:
current_state, ending_state = get_game_start_end(2)
num_iterations = 0
num_discs = 3

current_state, ending_state = get_game_start_end(num_discs)

## Looking at 
while current_state != ending_state:
    num_iterations += 1
    if num_iterations > 20:
        break
    move = get_random_move(current_state)
    current_state = get_next_state(current_state, move)
    
    print(current_state)
    print(get_valid_moves(current_state))
    
print(f"Completed the game at {num_iterations} iterations")

The current state of the game is:

```
A |  2 1
B |  0
C | 
```
['1C', '0A', '0C']
The current state of the game is:

```
A |  2 1 0
B | 
C | 
```
['0B', '0C']
The current state of the game is:

```
A |  2 1
B | 
C |  0
```
['1B', '0A', '0B']
The current state of the game is:

```
A |  2 1 0
B | 
C | 
```
['0B', '0C']
The current state of the game is:

```
A |  2 1
B | 
C |  0
```
['1B', '0A', '0B']
The current state of the game is:

```
A |  2 1
B |  0
C | 
```
['1C', '0A', '0C']
The current state of the game is:

```
A |  2 1 0
B | 
C | 
```
['0B', '0C']
The current state of the game is:

```
A |  2 1
B |  0
C | 
```
['1C', '0A', '0C']
The current state of the game is:

```
A |  2 1
B | 
C |  0
```
['1B', '0A', '0B']
The current state of the game is:

```
A |  2
B |  1
C |  0
```
['1A', '0A', '0B']
The current state of the game is:

```
A |  2 0
B |  1
C | 
```
['0B', '0C', '1C']
The current state of the game is:

```
A |  2 0
B | 
C |  1
```
['0B', '0C', '1B']
The current state of t

Now, we're ready to set up a function that calls the LLM to play the game. 

In [29]:
from config import OPENAI_API_KEY
import os
from openai import OpenAI
import re
import time

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

def llm_plays_tower_hanoi(num_discs, model, restrict_token_length = False, 
                          max_num_iterations = 50, seconds_sleep_per_step = 1, temperature = 1, starting_current_state = None):
    """Have an openAI model play the tower of hanoi
    
    Args:
        num_discs: The number of discs to include in the game
        model: The model to use
        restrict_token_length: Whether to restrict the model's input to only include the most recent game state. If True, the model will only receive the current state
        max_num_iterations: the maximum number of iterations. 
        seconds_sleep_per_step: The number of seconds to sleep for each move in gameplay. Useful if you are hitting rate limits. 
        temperature: The model's temperature
    """
    sys_prompt = f"""
You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A | {get_full_stack(num_discs)}
B | 
C | 

There are three stacks: A, B, and C. 

There are {num_discs} discs on the stacks, represented as numbers, with 0 being the smallest and {num_discs - 1} being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A | {get_full_stack(num_discs)}
B | 
C | 

To win, you must move all the discs to stack C, so the puzzle will look like:

A | 
B | 
C | {get_full_stack(num_discs)}

Your role is to plan ahead and make the best move based on the current state of the puzzle, so that the puzzle moves closer to the winning state. Think step by step and when you are ready, return your move as the last line of your response. 

To make a move, state the number of the disk and the stack you want to move it to in the following manner:
```
1B
```

The move `1B` moves the disc 1 to the top of stack B. The move `0C` moves the disc 0 to the top of stack C.

Think step by step about your move, considering how your move will enable future moves to reach the winning state.

When you are ready, return your chosen move as the last line of your response.
"""

    print(sys_prompt)


    client = OpenAI()



    current_state, ending_state = get_game_start_end(num_discs)
    
    if starting_current_state is not None:
        current_state = starting_current_state
        
    num_iterations = 0

    messages = [
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": current_state}
            ]


    while current_state != ending_state:
        valid_moves = "\nValid Moves are: " + "[`" + "`, `".join(get_valid_moves(current_state)) + "`]"
        
        num_iterations += 1
        if num_iterations >=  max_num_iterations:
            break


        response = client.chat.completions.create(
            model=model,
            temperature=temperature,
            messages=messages
        )
        move = response.choices[0].message.content
        print("-"*20)
        print(current_state + valid_moves)
        print("-"*20)
        print(move)

        
        messages.append({"role": "assistant", "content": move})
        
        previous_state = current_state + valid_moves
        previous_move = move
        current_state = get_next_state(current_state, move)
        
        if restrict_token_length:
            ## Only pass in the current state of the game
            messages = [
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": previous_state.replace(INCORRECT_MOVE_LANGUAGE, "")},
            {"role": "user", "content": previous_move},
            {"role": "user", "content": current_state + valid_moves}
            ]
        else:
            # Append the new game state as a user message
            messages.append({"role": "user", "content": current_state + valid_moves})
            
        time.sleep(seconds_sleep_per_step) ## Prevent myself from hitting rate limits.
    if current_state == ending_state:
        
        print(f"Completed the game at {num_iterations} iterations")
    else:
        print(f"Stoped gameplay by reaching the max number of iterations ({max_num_iterations})")

One note about tokenization: I put spaces between the "disc" numbers in the stack to ensure that those numbers are broken up into different tokens, but the actual moves (e.g. "3C") don't need a space between them because the letters and numbers are broken into different tokens automatically. 

If you're curious about how text gets tokenized for openAI models, you can check out this [cool tool](https://platform.openai.com/tokenizer) that visualizes how things get tokenized.

## Experiment 1a: an LLM plays the game with context of all previous moves. 

In [30]:
llm_plays_tower_hanoi(num_discs = 3, model = "gpt-4.1")




You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A |  2 1 0
B | 
C | 

There are three stacks: A, B, and C. 

There are 3 discs on the stacks, represented as numbers, with 0 being the smallest and 2 being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A |  2 1 0
B | 
C | 

To win, you must move all the discs to stack C, so the puzzle will look li

--------------------
The current state of the game is:

```
A |  2 1
B |  0
C | 
```
Valid Moves are: [`1C`, `0A`, `0C`]
--------------------
Let's analyze the situation:

Current state:  
A |  2 1  
B |  0  
C |  

Valid moves: [`1B`, `0A`, `0B`]

Disc positions:
- Disk 0 is on B.
- Disk 1 is on top of 2 on A.

We want to eventually move all disks to C.

Let's consider each move:

- `1B`: This would attempt to move disk 1 from A to B, but disk 0 (smaller) is already on B. Rules prohibit placing a larger disk on top of a smaller one. So, this is not a legal move.
- `0A`: Move disk 0 from B back to A. This would make stack A: 2 1 0, B: empty, and C: empty — returning to the previous state.
- `0B`: Disk 0 is already on B; moving it to B changes nothing.

The only move that will eventually let us progress (by unblocking B so we can move disk 1 there on the next turn) is:
```
0A
```
--------------------
The current state of the game is:

```
A |  2 1 0
B | 
C | 
```
Valid Moves are: [`0B`,

--------------------
The current state of the game is:

```
A |  2 1 0
B | 
C | 
```
Valid Moves are: [`0B`, `0C`]
--------------------
Let's analyze the current state and options:

Current state:
```
A |  2 1 0
B | 
C | 
```

Valid Moves: [`1C`, `0A`, `0C`]

- `1C`: Disk 1 is not on top (disk 0 is), so this is not a valid move now.
- `0A`: Disk 0 is already on A; this changes nothing.
- `0C`: Move disk 0 from A to C, which is the classic first move in the Tower of Hanoi for 3 disks and sets up all future moves.

Therefore, the optimal move is:

```
0C
```
--------------------
The current state of the game is:

```
A |  2 1
B | 
C |  0
```
Valid Moves are: [`1B`, `0A`, `0B`]
--------------------
Let's analyze the current state and valid moves:

Current state:
A |  2 1  
B |  
C |  0  

Valid Moves: [`0B`, `0C`]

- Disk 0 is currently on C.
- To progress, we need to move disks 1 and 2 to C, but they can't be placed on C until it's empty.

Considering the available moves:
- `0C`: Moving 

--------------------
The current state of the game is:

```
A |  2 1 0
B | 
C | 
```
Valid Moves are: [`0B`, `0C`]
--------------------
Let's analyze the situation:

Current state:
A |  2 1 0  
B |  
C |  

Valid Moves: [`1C`, `0A`, `0C`]

- `1C`: Disk 1 cannot be moved (disk 0 is on top).
- `0A`: Disk 0 is already on A; no effect.
- `0C`: Move disk 0 from A to C — this is the classic first move and opens the path for future progress.

So, the best and only meaningful move is:
```
0C
```
--------------------
The current state of the game is:

```
A |  2 1
B | 
C |  0
```
Valid Moves are: [`1B`, `0A`, `0B`]
--------------------
Let's consider the current state:

A |  2 1  
B |  
C |  0  

Valid Moves: [`0B`, `0C`]

- Disk 0 is on C. To move the larger disks (1 and 2) to C, we must first clear C.
- `0C` would simply leave disk 0 on C (no change).
- `0B` would move disk 0 to B, opening up C for the next necessary moves.

To make progress, the correct move is:
```
0B
```
------------------

The model frequently made moves that would not possibly help move to a winning state. In fact, the model only moved disc 0 the whole game!

The model does follow the directions to "think step by step", but it is not capable of making correct logical deductions as part of that process. 

## Experiment 1b: an LLM plays the game with only context from one previous move. 


In [32]:
llm_plays_tower_hanoi(num_discs = 3, model = "gpt-4.1", restrict_token_length  = True)


You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A |  2 1 0
B | 
C | 

There are three stacks: A, B, and C. 

There are 3 discs on the stacks, represented as numbers, with 0 being the smallest and 2 being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A |  2 1 0
B | 
C | 

To win, you must move all the discs to stack C, so the puzzle will look li

--------------------
Invalid move. The current state of the game is:

```
A |  2
B |  1 0
C | 
```
Valid Moves are: [`2C`, `0A`, `0C`]
--------------------
Let's look carefully at the current state and valid moves:

```
A |  2
B |  1 0
C | 
```

- Stack A: 2 (2 is at the top)
- Stack B: 1 0 (0 is at the top)
- Stack C: (empty)

**Valid Moves:**
- `2C`: Move disk 2 from A to C
- `0A`: Move disk 0 from B to A
- `0C`: Move disk 0 from B to C

**Immediate goal:** Move all discs to C.

Let's consider each move:

### Move `2C`  
Move disk 2 from A to C  
Resulting state:
```
A | 
B |  1 0
C |  2
```
Now, B's top is 0, and we can move it to either A or C. This move helps get the largest disk to its destination stack.

### Move `0A`  
Move 0 from B to A  
Resulting state:
```
A |  2 0
B |  1
C | 
```
Now, A has 2 on the bottom and 0 on top. Moving 1 next is blocked since 1 cannot go on top of 0.

### Move `0C`  
Move 0 from B to C  
Resulting state:
```
A |  2
B |  1
C |  0
```
Now, 2 can go t

Here, the model successfully completed the game in 12 iterations (only 5 more moves than the 7 minimum needed to complete the game with 3 moves. However, there were moves where there was a clear "correct" move (disc 2 was at the top of one stack and stack B was empty) and the model did not take the move. 

## Experiment 2a: a reasoning model solves the puzzle, with full context

I unfortunately only have access to o4-mini (my openAI developer account isn't connected to an organization) so this is the highest-quality LRM I have access to. 

In [33]:
llm_plays_tower_hanoi(num_discs = 3, model = "o4-mini")


You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A |  2 1 0
B | 
C | 

There are three stacks: A, B, and C. 

There are 3 discs on the stacks, represented as numbers, with 0 being the smallest and 2 being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A |  2 1 0
B | 
C | 

To win, you must move all the discs to stack C, so the puzzle will look li

--------------------
The current state of the game is:

```
A | 
B |  1
C |  2 0
```
Valid Moves are: [`1A`, `0A`, `0B`]
--------------------
0B
--------------------
The current state of the game is:

```
A | 
B |  1 0
C |  2
```
Valid Moves are: [`0A`, `0C`, `2A`]
--------------------
We need to clear disk 0 off peg B so that disk 1 can move onto disk 2 at C. The next optimal move is:

0A
--------------------
The current state of the game is:

```
A |  0
B |  1
C |  2
```
Valid Moves are: [`0B`, `0C`, `1C`]
--------------------
We need to stack disk 1 onto disk 2 at C next—but disk 0 on A is in the way of that optimal move. Moving 0 onto C (on top of 2) clears A and sets up the final transfer of disk 1. 

Therefore, the move is:

0C
--------------------
The current state of the game is:

```
A | 
B |  1
C |  2 0
```
Valid Moves are: [`1A`, `0A`, `0B`]
--------------------
We need to clear disk 0 off of peg C so that disk 1 on B can move onto disk 2 at C. Of the three listed moves, onl

This time, the model completed the game, which is an improvement. However, I am not convinced that the model is truly "reasoning" about the next step.

## Experiment 2b: using a reasoning model with limited context

In [34]:
llm_plays_tower_hanoi(num_discs = 3, model = "o4-mini", restrict_token_length  = True)


You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A |  2 1 0
B | 
C | 

There are three stacks: A, B, and C. 

There are 3 discs on the stacks, represented as numbers, with 0 being the smallest and 2 being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A |  2 1 0
B | 
C | 

To win, you must move all the discs to stack C, so the puzzle will look li

This time, the reasoning model never deviated from the optimal 7-move solution. 

## Experiment 3a: using a reasoning to complete a game with a large number of discs. 

In the paper on LRM reasoning, both LLMs and LRMs had a "complete collapse" in reasoning after 8-10 discs were added to the puzzle. It may be that this breakdown is simply due to the model running out of tokens, but it's also possible that the model's "reasoning ability" is incapable of handling the complexity resulting from a large number of discs being present. So here, I'll start the model with a nearly-completed puzzle with 10 disks, and evaluate whether the model is able to complete the puzzle in its nearly-finished state. 

Note that this puzzle can be completed with 15 moves (7 to move the 2-1-0 stack to B, 1 to move disc 3 to stack C, and 7 to move the 2-1-0 stack to C). This is well within the 50-move maximum, and will not exceed the model's context. 

In [35]:
starting_state = """The current state of the game is:

```
A |  3
B | 
C |  9 8 7 6 5 4 2 1 0
```"""

llm_plays_tower_hanoi(num_discs = 10, model = "o4-mini", starting_current_state = starting_state)


You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A |  9 8 7 6 5 4 3 2 1 0
B | 
C | 

There are three stacks: A, B, and C. 

There are 10 discs on the stacks, represented as numbers, with 0 being the smallest and 9 being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A |  9 8 7 6 5 4 3 2 1 0
B | 
C | 

To win, you must move all the discs to stack 

--------------------
The current state of the game is:

```
A |  3 1 0
B | 
C |  9 8 7 6 5 4 2
```
Valid Moves are: [`0B`, `0C`, `2B`]
--------------------
0C
--------------------
The current state of the game is:

```
A |  3 1
B | 
C |  9 8 7 6 5 4 2 0
```
Valid Moves are: [`1B`, `0A`, `0B`]
--------------------
0B
--------------------
The current state of the game is:

```
A |  3 1
B |  0
C |  9 8 7 6 5 4 2
```
Valid Moves are: [`1C`, `0A`, `0C`]
--------------------
We need to free disk 2 on peg C so that disk 3 can move there. The correct move is to shift disk 1 from A onto peg C (placing it atop disk 2):  
1C
--------------------
The current state of the game is:

```
A |  3
B |  0
C |  9 8 7 6 5 4 2 1
```
Valid Moves are: [`0A`, `0C`, `1A`]
--------------------
We need to clear disk 1 off peg C so that disk 2 can move. The only legal spot for disk 1 is onto disk 3 on peg A. Hence the next move is:

1A
--------------------
The current state of the game is:

```
A |  3 1
B |  0
C |

## Experiment 3b: using a reasoning to complete a game with a large number of discs, removing previous context

In experiment 2, the reasoning model did a lot better when it had its context restricted. Let's see if restricting its context here improves performance.

In [36]:
llm_plays_tower_hanoi(num_discs = 10, model = "o4-mini", starting_current_state = starting_state, restrict_token_length  = True)


You are a puzzle master completing the tower of Hanoi puzzle. 

This puzzle will be represented to you in this fashion:

A |  9 8 7 6 5 4 3 2 1 0
B | 
C | 

There are three stacks: A, B, and C. 

There are 10 discs on the stacks, represented as numbers, with 0 being the smallest and 9 being the largest. Each digit is a disc

Your goal is to move all of the discs from stack A to stack C, in the proper order. 

You must follow the following rules: 
1. Only one disc may be moved at a time.
2. Each move consists of taking the top disc from one of the stacks and placing it on the top of another stack or on an empty stack.
3. No disc may be placed on top of a disc that is smaller than it. (so, disk 1 cannot be placed on top of disk 0)

The game will be represented to you in an intermediate state, where the right-most disc represents the "top" of each stack. 


At the start of the game, the puzzle looked like:

A |  9 8 7 6 5 4 3 2 1 0
B | 
C | 

To win, you must move all the discs to stack 

--------------------
The current state of the game is:

```
A |  3 2 0
B | 
C |  9 8 7 6 5 4 1
```
Valid Moves are: [`0B`, `0C`, `1B`]
--------------------
To free up peg C so we can later move disk 3 there, we must first remove disk 1 from C. The only spot it can go is onto the empty peg B:

1) Move disk 1 from C to B.

So the move is:

1B
--------------------
The current state of the game is:

```
A |  3 2 0
B |  1
C |  9 8 7 6 5 4
```
Valid Moves are: [`0B`, `0C`, `1C`]
--------------------
We need to free up peg B to build the three-disk stack (disks 0,1,2) there so disk 3 can move to C. Right now B is occupied by disk 1, so our very next move should be to clear disk 1 off of B. The only legal place to move it is onto C (top of C is disk 4). After that, peg B will be empty and ready to receive disk 0, then disk 2, then disk 1 in the proper order. 

Thus the optimal move now is:

1C
--------------------
The current state of the game is:

```
A |  3 2 0
B | 
C |  9 8 7 6 5 4 1
```
Va

--------------------
Invalid move. The current state of the game is:

```
A |  1 0
B |  2
C |  9 8 7 6 5 4 3
```
Valid Moves are: [`0B`, `0C`, `2C`]
--------------------
Let’s evaluate each legal option from the current position:

A | 1 0  
B | 2  
C | 9 8 7 6 5 4 3  

1) 0B  
   – Resulting state:  
     A | 1  
     B | 2 0  
     C | 9 … 3  
   – From here, you can move 1→C (1<3) then 0→C (0<1), but after that disk 2 on B cannot go onto disk 0 (2>0). Dead end.

2) 0C  
   – Resulting state:  
     A | 1  
     B | 2  
     C | 9 … 3 0  
   – Neither 1 (needs to go onto >1) nor 2 (needs to go onto >2) can be placed on top of 0. Dead end.

3) 2C  
   – Resulting state:  
     A | 1 0  
     B |  
     C | 9 8 7 6 5 4 3 2  
   – Now you can solve the remaining two disks on A by the usual 3‐move sequence:  
     0→B, 1→C, 0→C.

Only move “2C” preserves a path to the final solution.  

2C
--------------------
The current state of the game is:

```
A |  1 0
B | 
C |  9 8 7 6 5 4 3 2
```
V

I'm a bit surprised at how bad the LLM is at playing the tower of hanoi, so as a sanity check, here I'll replicate Apple's paper.

In [57]:
def replicate_apple_procedure(num_discs, model = "gpt-4.1"):

    apple_replication_sys_prompt = f"""You are a helpful assistant. Solve this puzzle for me.
    There are three pegs and {num_discs} disks of different sizes stacked on the first peg. The disks are numbered from 1 (smallest) to n (largest). Disk moves in this puzzle should follow:
    1. Only one disk can be moved at a time.
    2. Each move consists of taking the upper disk from one stack and placing it on top of another stack.
    3. A larger disk may not be placed on top of a smaller disk.
    The goal is to move the entire stack to the third peg.
    Example: With 3 disks numbered 1 (smallest), 2, and 3 (largest), the initial state is [[3, 2, 1], [], []], and a solution might be:
    moves = [[1, 0, 2], [2, 0, 1], [1, 2, 1], [3, 0, 2], [1, 1, 0], [2, 1, 2], [1, 0, 2]]

    Thismeans: Move disk  1 from peg 0 to peg 2, then move disk 2 from peg 0 to peg 1, and so on.
    Requirements:
    • When exploring potential solutions in your thinking process, always include the corresponding complete list of moves.
    • The positions are 0-indexed (the leftmost peg is 0).
    • Ensure your final answer includes the complete list of moves in the format:
           moves = [[disk id, from peg, to peg], ...]
    """

    apple_replication_user_prompt = f"""
    The initial state of the game is: [[{",".join([str(num_discs - i ) for i in range(num_discs)])}], [], []]

    The winning state of the game is: [[], [], [{",".join([str(num_discs - i ) for i in range(num_discs)])}]]

    Share your complete list of moves to move from the initial state to the winning state
    """


    messages = [
                {"role": "system", "content": apple_replication_sys_prompt},
        {"role": "user", "content": apple_replication_user_prompt}
    ]


    client = OpenAI()

    response = client.chat.completions.create(
                model=model,
                messages=messages
            )

    print(apple_replication_sys_prompt)
    print("-"*20)
    print(apple_replication_user_prompt)
    print("-"*20)
    print(response.choices[0].message.content)



In [58]:
replicate_apple_procedure(num_discs = 3)

You are a helpful assistant. Solve this puzzle for me.
    There are three pegs and 3 disks of different sizes stacked on the first peg. The disks are numbered from 1 (smallest) to n (largest). Disk moves in this puzzle should follow:
    1. Only one disk can be moved at a time.
    2. Each move consists of taking the upper disk from one stack and placing it on top of another stack.
    3. A larger disk may not be placed on top of a smaller disk.
    The goal is to move the entire stack to the third peg.
    Example: With 3 disks numbered 1 (smallest), 2, and 3 (largest), the initial state is [[3, 2, 1], [], []], and a solution might be:
    moves = [[1, 0, 2], [2, 0, 1], [1, 2, 1], [3, 0, 2], [1, 1, 0], [2, 1, 2], [1, 0, 2]]

    Thismeans: Move disk  1 from peg 0 to peg 2, then move disk 2 from peg 0 to peg 1, and so on.
    Requirements:
    • When exploring potential solutions in your thinking process, always include the corresponding complete list of moves.
    • The positions are

And here, I'll do one final demonstration to show that I am in fact smarter than a machine. I bet the model cannot solve the 9-disc puzzle that I did

In [59]:
replicate_apple_procedure(num_discs = 9)

You are a helpful assistant. Solve this puzzle for me.
    There are three pegs and 9 disks of different sizes stacked on the first peg. The disks are numbered from 1 (smallest) to n (largest). Disk moves in this puzzle should follow:
    1. Only one disk can be moved at a time.
    2. Each move consists of taking the upper disk from one stack and placing it on top of another stack.
    3. A larger disk may not be placed on top of a smaller disk.
    The goal is to move the entire stack to the third peg.
    Example: With 3 disks numbered 1 (smallest), 2, and 3 (largest), the initial state is [[3, 2, 1], [], []], and a solution might be:
    moves = [[1, 0, 2], [2, 0, 1], [1, 2, 1], [3, 0, 2], [1, 1, 0], [2, 1, 2], [1, 0, 2]]

    Thismeans: Move disk  1 from peg 0 to peg 2, then move disk 2 from peg 0 to peg 1, and so on.
    Requirements:
    • When exploring potential solutions in your thinking process, always include the corresponding complete list of moves.
    • The positions are

## Conclusions:

This was a fun way to do a modest replication/extension the paper on language model reasoning. 
1. This replication reduces my confidence in LLMs and LRMs ability to "reason" through problems more than the original paper did. If LLMs and LRMs are not able to reliably choose the next step to solve the Tower of Hanoi when the current state is presented to them, I am not confident that there is any logic/reasoning happening. 
2. From my limited experience, it does seem that LRMs have a better ability to "reason" than LLMs do. Perhaps a stronger reasoning model would have more "reasoning" ability than the one I used and be more able to solve the puzzle when prompted in this fashion
3. LLMs seem to be better at giving solutions to models all at once, rather than looking at the current state of the problem and making incremental fixes. This does seem to be a fundamental weakness in LLMs, as a lot of real-world work that happens takes place in incremental fashions, where you are not exactly at the start state or end state of a problem.
4. It seems that restricting context available to a reasoning model helps it to improve reasoning performance and avoid repeating poor moves. 


## Future steps:

1. It would be interesting to do more trials and explore more quantitatively how LLMs and LRMs compare, and how the context provided influences performance. 
2. It would be interesting to see how this would perform on a better reasoning model than o4-mini. If any of you are curious and want to extend this work with a a better reasoning model, I would be interested to hear about it!
3. It might be interesting to extend the length of the context for the models that are provided with the limited context window (i.e. give them 5 previous context states). 
4. It would be interesting to see if a voting system with multiple agents would be more likely to return the correct answer. 
5. The system prompt could probably be improved to improve model responsiveness. 

