# DX 704 Week 5 Project

This week's project will test your understanding of this week's concepts by asking you to simulate various algorithms by hand.
You will apply search, minimax and dynamic programming concepts to solve a variety of small planning problems.

The full project description, a template notebook and supporting materials are available on GitHub: [Project 5 Materials](https://github.com/bu-cds-dx704/dx704-project-05).

## Example Code

This week's assignment does not involve any coding.

## Part 1: Searching vs Games

Consider the state space illustrated below.
Each terminal state is marked with a reward for reaching that state.
Each non-terminal state has two possible actions represented by the two outgoing arrows to later (lower) states.
The only rewards are for reaching the terminal states. There is no discounting (i.e. $\gamma=1$), and there is no randomness, so each action leads deterministically to the state its arrow points to.

![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part1.png?raw=true)

Solve for the value of each non-terminal state according to the following three scenarios.

1. Search: There is one agent that picks all actions with the goal of maximizing the final reward.
2. Minimax: There are two agents P1 and P2. P1 controls the actions for the 1st and 3rd rows (i.e. the states marked A and D-G) while P2 controls the actions for the 2nd and 4th rows (i.e. the states B-C and H-J). P1 seeks to maximize the final reward, and P2 seeks to minimize the final reward.
3. Maximin: P1 and P2 control the same states as before, but P1 seeks to minimize the final reward, and P2 seeks to maximize the final reward.
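Written as backup rules, the three scenarios differ only in whether each row takes a max or a min over its successors. Here $V(t)$ is the reward of each terminal state $t$, and $\mathrm{succ}(s)$ (our notation, not the assignment's) denotes the two states reachable from $s$:

$$
\begin{aligned}
V_{\text{search}}(s) &= \max_{s' \in \mathrm{succ}(s)} V_{\text{search}}(s') \\
V_{\text{minimax}}(s) &= \begin{cases} \max_{s' \in \mathrm{succ}(s)} V_{\text{minimax}}(s') & s \in \{A, D, E, F, G\} \\ \min_{s' \in \mathrm{succ}(s)} V_{\text{minimax}}(s') & s \in \{B, C, H, I, J\} \end{cases} \\
V_{\text{maximin}}(s) &= \begin{cases} \min_{s' \in \mathrm{succ}(s)} V_{\text{maximin}}(s') & s \in \{A, D, E, F, G\} \\ \max_{s' \in \mathrm{succ}(s)} V_{\text{maximin}}(s') & s \in \{B, C, H, I, J\} \end{cases}
\end{aligned}
$$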

Save your results in a file "values-1.tsv" with a column state containing the labels A-J and columns search_value, minimax_value, and maximin_value that correspond to the three scenarios, respectively.

Hint: Print out the image above and compute the values by hand in a bottom-up fashion.
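The same bottom-up computation can be scripted to double-check hand-computed values. The sketch below runs on a small hypothetical toy tree (`TOY_CHILDREN`, `TOY_LEAVES` and `TOY_DEPTH` are illustrative, not the tree in part1.png, whose arrows are only available in the image) and assumes P1 owns the even-indexed rows counting from 0, which matches the row assignment in scenario 2. The memo dictionary matters when states share successors, as rows 3 and 4 may.

```python
# Hypothetical toy tree (NOT the tree in part1.png): internal states map to their
# two successors; terminal states map to their rewards.
TOY_CHILDREN = {"A": ["B", "C"], "B": ["t1", "t2"], "C": ["t3", "t4"]}
TOY_LEAVES = {"t1": 3, "t2": -2, "t3": 5, "t4": 0}
# Row index of each internal state (0 = top row); P1 owns even rows, P2 odd rows.
TOY_DEPTH = {"A": 0, "B": 1, "C": 1}


def solve(children, leaves, depth, scenario):
    """Return {state: value} for every internal state under 'search', 'minimax' or 'maximin'."""
    memo = {}  # caches values so states shared by several parents are solved once

    def backup(state, vals):
        p1_turn = depth[state] % 2 == 0
        if scenario == "search":
            return max(vals)  # one agent maximizes everywhere
        if scenario == "minimax":
            return max(vals) if p1_turn else min(vals)
        if scenario == "maximin":
            return min(vals) if p1_turn else max(vals)
        raise ValueError(f"unknown scenario: {scenario}")

    def value(state):
        if state in leaves:
            return leaves[state]  # terminal: value is its reward
        if state not in memo:
            memo[state] = backup(state, [value(c) for c in children[state]])
        return memo[state]

    for state in children:
        value(state)
    return memo


for scenario in ("search", "minimax", "maximin"):
    print(scenario, solve(TOY_CHILDREN, TOY_LEAVES, TOY_DEPTH, scenario))
```

To reuse it, replace the toy dictionaries with the arrows and rewards read off the image.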

Submit the file "values-1.tsv" in Gradescope.

```python
import pandas as pd

# Hand-computed values for each non-terminal state:
# (state, search_value, minimax_value, maximin_value)
rows = [
    ("A", 9,   1,   0),
    ("B", 9,  -1,   1),
    ("C", 1,   1,   0),
    ("D", 9,  -1,   1),
    ("E", 3,   2,   1),
    ("F", 1,   1,  -1),
    ("G", 1,   1,   0),
    ("H", 9,  -1,   9),
    ("I", 1, -20,   1),
    ("J", 3,   2,   3),
]

df = pd.DataFrame(rows, columns=["state", "search_value", "minimax_value", "maximin_value"])

df.to_csv("values-1.tsv", sep="\t", index=False)
print("File saved as values-1.tsv")
```

File saved as values-1.tsv


## Part 2: Picking Up Sticks

The state space illustrated below corresponds to a variation of the game [Nim](https://en.wikipedia.org/wiki/Nim).
States labeled with a prefix of "p1_" correspond to states where player P1 chooses the action while states labeled with a prefix of "p2_" correspond to states where player P2 chooses the action.
The number in the suffix is the number of "sticks" remaining.
The players take turns choosing actions, and each action corresponds to removing one or two sticks.
When there are no more sticks, the player who would have picked an action loses.


![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part2.png?raw=true)

For example, in the state labeled "p1_1" there is one stick left: player P1 removes the last stick, and player P2 loses.
The loss for P2 is represented by a final reward of +1.
A loss for P1 is represented by a final reward of -1.
Player P1 tries to maximize the final reward, and player P2 tries to minimize the final reward.
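A useful cross-check: when each move removes one or two sticks, the player to move loses exactly when the number of sticks is a multiple of 3 (with 1 or 2 sticks you can take them all; with 3 you must leave 1 or 2 for your opponent). The sketch below recomputes the values with a memoized recursion, using `functools.lru_cache` rather than a hand-written memo dictionary, and compares them with that closed form. The names `nim_value` and `closed_form` are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def nim_value(player, k):
    """Value (+1 = P1 wins, -1 = P1 loses) when `player` must move with k sticks left."""
    if k == 0:
        # The player who would have to move with no sticks left loses.
        return -1 if player == "p1" else 1
    moves = [k - 1] + ([k - 2] if k >= 2 else [])  # remove one or two sticks
    if player == "p1":
        return max(nim_value("p2", nk) for nk in moves)  # P1 maximizes
    return min(nim_value("p1", nk) for nk in moves)  # P2 minimizes


def closed_form(player, k):
    """The player to move loses exactly when k is a multiple of 3."""
    mover_loses = k % 3 == 0
    if player == "p1":
        return -1 if mover_loses else 1
    return 1 if mover_loses else -1


for k in range(1, 6):
    for player in ("p1", "p2"):
        print(f"{player}_{k}", nim_value(player, k), closed_form(player, k))
```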

Solve for the value of each of the non-terminal states.
Save the results in a file "values-2.tsv" with columns state and value.

Submit the file "values-2.tsv" in Gradescope.

```python
import pandas as pd

def solve_values(max_sticks=5):
    """
    States: p1_k (P1 to move), p2_k (P2 to move), k sticks left.
    Moves: remove 1 or 2 sticks (if available).
    Terminal: if it's your turn and k == 0, you lose.
      -> value = +1 if P1 wins, -1 if P1 loses.
    """
    memo = {}

    def V(player, k):
        if (player, k) in memo:
            return memo[(player, k)]

        if k == 0:
            # The player who must move with no sticks left loses.
            memo[(player, k)] = -1 if player == "p1" else +1
            return memo[(player, k)]

        # Removing 1 stick is always possible; removing 2 only if k >= 2.
        next_ks = [k - 1] + ([k - 2] if k >= 2 else [])
        if player == "p1":
            # P1 maximizes over the P2 states it can move to.
            val = max(V("p2", nk) for nk in next_ks)
        else:
            # P2 minimizes over the P1 states it can move to.
            val = min(V("p1", nk) for nk in next_ks)

        memo[(player, k)] = val
        return val

    rows = []
    for k in range(1, max_sticks + 1):
        rows.append((f"p1_{k}", V("p1", k)))
        rows.append((f"p2_{k}", V("p2", k)))
    return rows

rows = solve_values(5)

df = pd.DataFrame(rows, columns=["state", "value"]).sort_values("state")
df.to_csv("values-2.tsv", sep="\t", index=False)
print("Wrote values-2.tsv")
```

Wrote values-2.tsv


## Part 3: Searching a Maze

Consider the following maze.

![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part3.png?raw=true)

State C is a terminal state giving reward +100.
The remaining states have a reward of -1 when they are reached.
So state F has a value of +99, due to the reward of -1 at state F plus the +100 from the optimal next action of moving to state C.
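In equation form, writing $N(s)$ for the states reachable from $s$ in one move (our notation), the values satisfy the following recursion with $\gamma = 1$, and the F example is its first step:

$$
V(C) = 100, \qquad V(s) = -1 + \max_{s' \in N(s)} V(s') \quad (s \neq C), \qquad V(F) = -1 + V(C) = 99.
$$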

Compute the values for states A-J and S and save them in a file "values-3.tsv" with columns state and value.
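Because moves are deterministic and every non-terminal state costs the same -1, the best plan from any state is a shortest path to C, so the value is $100 - d(s)$, where $d(s)$ is the number of moves from $s$ to C. The sketch below computes that with breadth-first search backwards from C, as an alternative to value iteration. It assumes the adjacency `maze_neighbors` is the one transcribed from part3.png in the notebook cell for this part; `bfs_values` is an illustrative name.

```python
from collections import deque

# Adjacency as transcribed from part3.png (assumed correct): state -> states reachable in one move.
maze_neighbors = {
    "A": ["S"], "S": ["A", "B"], "B": ["S", "D", "E"], "D": ["B", "E", "G"],
    "E": ["B", "D", "H"], "G": ["D", "I"], "H": ["E", "I"], "I": ["G", "H", "J"],
    "J": ["I", "F"], "F": ["J", "C"], "C": [],
}


def bfs_values(neighbors, goal="C", goal_reward=100):
    # Reverse every edge so the search can run backwards from the goal.
    preds = {s: [] for s in neighbors}
    for s, nxts in neighbors.items():
        for t in nxts:
            preds[t].append(s)
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        t = queue.popleft()
        for s in preds[t]:
            if s not in dist:
                dist[s] = dist[t] + 1
                queue.append(s)
    # A shortest path of d moves visits d non-terminal states (s included), each worth -1,
    # before collecting goal_reward at the goal.
    return {s: goal_reward - d for s, d in dist.items()}


print(bfs_values(maze_neighbors))
```

States that cannot reach C would simply be missing from the result; in this maze every state reaches it.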

Submit "values-3.tsv" in Gradescope.

```python
import pandas as pd

# Adjacency read off part3.png: neighbors[s] lists the states reachable from s in one move.
neighbors = {
    "A": ["S"],
    "S": ["A", "B"],
    "B": ["S", "D", "E"],
    "D": ["B", "E", "G"],
    "E": ["B", "D", "H"],
    "G": ["D", "I"],
    "H": ["E", "I"],
    "I": ["G", "H", "J"],
    "J": ["I", "F"],
    "F": ["J", "C"],
    "C": []
}

reward = {s: -1 for s in neighbors}
reward["C"] = 100

# Value iteration with gamma = 1: V(s) = reward(s) + max over next states t of V(t).
V = {s: 0.0 for s in neighbors}
V["C"] = reward["C"]  # C is terminal, so its value is just its reward.
for _ in range(200):
    delta = 0.0
    for s in neighbors:
        if s == "C":
            continue
        nxts = neighbors[s]
        new_v = reward[s] + (max(V[t] for t in nxts) if nxts else 0.0)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-9:
        break

# Report every state A-J and S, including the terminal state C.
order = sorted(neighbors)  # ["A", "B", ..., "J", "S"]
df = pd.DataFrame([(s, V[s]) for s in order], columns=["state", "value"])
df.to_csv("values-3.tsv", sep="\t", index=False)
print("Wrote values-3.tsv")
```

Wrote values-3.tsv


## Part 4: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation of what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.

```python
from pathlib import Path

ACK_FILE = Path("acknowledgments.txt")
ACK_TEXT = "My dad helped me with this"

def main():
    ACK_FILE.write_text(ACK_TEXT)
    print(f"Saved acknowledgments to: {ACK_FILE.resolve()}")

if __name__ == "__main__":
    main()
```

Saved acknowledgments to: /workspaces/dx704-project-05/acknowledgments.txt
