# DX 704 Week 5 Project

This week's project will test your understanding of this week's concepts by asking you to simulate various algorithms by hand.
You will apply search, minimax and dynamic programming concepts to solve a variety of small planning problems.

The full project description, a template notebook and supporting materials are available on GitHub: [Project 5 Materials](https://github.com/bu-cds-dx704/dx704-project-05).

## Example Code

This week's assignment does not involve any coding.

## Part 1: Searching vs Games

Consider the state space illustrated below.
Each terminal state is marked with a reward for reaching that state.
Each non-terminal state has two possible actions represented by the two outgoing arrows to later (lower) states.
The only rewards are for reaching the terminal states, future rewards are not discounted (i.e. $\gamma=1$), and there is no randomness, so actions may be freely chosen.

![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part1.png?raw=true)

Solve for the value of each non-terminal state according to the following three scenarios.

1. Search: There is one agent that picks all actions with the goal of maximizing the final reward.
2. Minimax: There are two agents P1 and P2. P1 controls the actions for the 1st and 3rd rows (i.e. the states marked A and D-G) while P2 controls the actions for the 2nd and 4th rows (i.e. the states B-C and H-J). P1 seeks to maximize the final reward, and P2 seeks to minimize the final reward.
3. Maximin: P1 and P2 control the same states as before, but P1 seeks to minimize the final reward, and P2 seeks to maximize the final reward.

Save your results in a file "values-1.tsv" with a column state containing the labels A-J and columns search_value, minimax_value, and maximin_value that correspond to the three scenarios, respectively.

Hint: Print out the image above and compute the values by hand in a bottom-up fashion.
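
The bottom-up procedure in the hint can be sketched in code. The tree below is a small hypothetical example, not the graph in the figure: the state names, rewards, and the `solve` helper are all assumptions made for illustration. Values are filled in from the deepest row upward, and the only difference between the three scenarios is which of `max` or `min` each row applies.

```python
# Hypothetical two-row tree; NOT the graph from part1.png.
children = {
    "root": ["left", "right"],
    "left": ["t1", "t2"],
    "right": ["t3", "t4"],
}
terminal_reward = {"t1": 3, "t2": 7, "t3": 5, "t4": 1}

# Row index of each non-terminal state (0 = top row).
depth = {"root": 0, "left": 1, "right": 1}


def solve(first_mover, second_mover):
    """Compute values bottom-up.

    Even-numbered rows apply first_mover (max or min),
    odd-numbered rows apply second_mover.
    """
    values = dict(terminal_reward)
    # Visit the deepest non-terminal states first so that every
    # child value is known before its parent is evaluated.
    for state in sorted(depth, key=depth.get, reverse=True):
        pick = first_mover if depth[state] % 2 == 0 else second_mover
        values[state] = pick(values[child] for child in children[state])
    return {state: values[state] for state in depth}


search = solve(max, max)    # one agent maximizes everywhere
minimax = solve(max, min)   # top row maximizes, next row minimizes
maximin = solve(min, max)   # top row minimizes, next row maximizes

print(search)
print(minimax)
print(maximin)
```

The same row-by-row ordering should carry over to a graph where several states share a successor, since each state's value depends only on states in later rows.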

Submit the file "values-1.tsv" in Gradescope.

state	search_value	minimax_value	maximin_value
A	20	1	0
B	20	1	9
C	1	1	0
D	20	1	9
E	5	5	3
F	1	1	-1
G	1	1	0
H	9	-1	9
I	20	1	20
J	3	2	3


In [3]:
import os
import pandas as pd

# ---- values for Part 1 ----
data = [
    ("A", 20, 1, 0),
    ("B", 20, 1, 9),
    ("C",  1, 1, 0),   # minimax(C) = min(1,1) = 1
    ("D", 20, 1, 9),
    ("E",  5, 5, 3),
    ("F",  1, 1, -1),
    ("G",  1, 1, 0),
    ("H",  9, -1, 9),
    ("I", 20, 1, 20),
    ("J",  3, 2, 3),
]

df = pd.DataFrame(data, columns=["state","search_value","minimax_value","maximin_value"])

# pick a write location that exists
out_dir = "/mnt/data" if os.path.isdir("/mnt/data") else "."
out_path = os.path.join(out_dir, "values-1.tsv")

df.to_csv(out_path, sep="\t", index=False)
print(f"Saved to: {out_path}")
print(df)


Saved to: ./values-1.tsv
  state  search_value  minimax_value  maximin_value
0     A            20              1              0
1     B            20              1              9
2     C             1              1              0
3     D            20              1              9
4     E             5              5              3
5     F             1              1             -1
6     G             1              1              0
7     H             9             -1              9
8     I            20              1             20
9     J             3              2              3


## Part 2: Picking Up Sticks

The state space illustrated below corresponds to a variation of the game [Nim](https://en.wikipedia.org/wiki/Nim).
States labeled with a prefix of "p1_" correspond to states where player P1 chooses the action while states labeled with a prefix of "p2_" correspond to states where player P2 chooses the action.
The number in the suffix is the number of "sticks" remaining.
The players take turns choosing actions, and each action corresponds to removing one or two sticks.
When there are no more sticks, the player who would have picked an action loses.


![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part2.png?raw=true)

For example, from the state labeled "p1_1", there is one stick left, player P1 removes the last stick, and player P2 loses.
The loss for P2 is represented by a final reward of +1.
A loss for P1 is represented by a final reward of -1.
Player P1 tries to maximize the final reward, and player P2 tries to minimize the final reward.

Solve for the value of each of the non-terminal states.
Save the results in a file "values-2.tsv" with columns state and value.
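
The rules above can be summarized as a recurrence. The notation $V(\cdot)$ for the value of a state is introduced here for illustration, and the terminal state names p1_0 and p2_0 are an assumption about how the figure labels the states with no sticks left.

$$
\begin{aligned}
V(\text{p1\_0}) &= -1, \qquad V(\text{p2\_0}) = +1, \\
V(\text{p1\_1}) &= V(\text{p2\_0}), \qquad V(\text{p2\_1}) = V(\text{p1\_0}), \\
V(\text{p1\_}n) &= \max\bigl(V(\text{p2\_}(n-1)),\; V(\text{p2\_}(n-2))\bigr) \quad \text{for } n \ge 2, \\
V(\text{p2\_}n) &= \min\bigl(V(\text{p1\_}(n-1)),\; V(\text{p1\_}(n-2))\bigr) \quad \text{for } n \ge 2.
\end{aligned}
$$

The second line matches the example above: from p1_1 the only move leads to p2_0, so $V(\text{p1\_1}) = +1$.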

In [4]:
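import csv
from functools import lru_cache

MAX = 5  # largest number of sticks to evaluate

@lru_cache(None)
def V(p, n):
    # No sticks left: the player to move (p) loses.
    if n == 0:
        return -1 if p == 1 else 1
    # Legal moves remove one stick, or two when at least two remain.
    ns = [n - 1] + ([n - 2] if n > 1 else [])
    # P1 maximizes, P2 minimizes; 3 - p switches to the other player.
    return (max if p == 1 else min)(V(3 - p, m) for m in ns)

rows = [(f"p1_{n}", V(1, n)) for n in range(1, MAX + 1)]
rows += [(f"p2_{n}", V(2, n)) for n in range(1, MAX + 1)]
with open("values-2.tsv", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows([["state", "value"], *rows])
print("\n".join(f"{s}\t{v}" for s, v in rows))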


p1_1	1
p1_2	1
p1_3	-1
p1_4	1
p1_5	1
p2_1	-1
p2_2	-1
p2_3	1
p2_4	-1
p2_5	-1


In [5]:
import csv
from functools import lru_cache

@lru_cache(None)
def V(player, n):
    if n == 0:  # player to move loses
        return -1 if player == 1 else 1
    nxt = (n-1, n-2) if n > 1 else (n-1,)
    choose = max if player == 1 else min
    return choose(V(3 - player, m) for m in nxt)

rows = [("state","value")]
rows += [(f"p1_{n}", V(1, n)) for n in range(1, 6)]
rows += [(f"p2_{n}", V(2, n)) for n in range(1, 6)]

with open("values-2.tsv", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

print("Wrote values-2.tsv")


Wrote values-2.tsv
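
As a sanity check on the values printed earlier, the sketch below compares them with a closed-form rule: in a take-one-or-two game where the player who cannot move loses, the player to move loses exactly when the number of sticks is a multiple of 3. The helper `nim_value` and the `reported` dictionary are introduced here for illustration; the numbers in `reported` are copied from the output above.

```python
def nim_value(player, sticks):
    """Closed-form value: the player to move loses iff sticks % 3 == 0."""
    mover_loses = sticks % 3 == 0
    if player == 1:
        return -1 if mover_loses else 1
    # A loss for P2 is scored as +1, a win for P2 as -1.
    return 1 if mover_loses else -1


# Values copied from the notebook output above.
reported = {
    "p1_1": 1, "p1_2": 1, "p1_3": -1, "p1_4": 1, "p1_5": 1,
    "p2_1": -1, "p2_2": -1, "p2_3": 1, "p2_4": -1, "p2_5": -1,
}

mismatches = []
for state, value in reported.items():
    prefix, sticks = state.split("_")
    player = int(prefix[1])  # "p1" -> 1, "p2" -> 2
    if nim_value(player, int(sticks)) != value:
        mismatches.append(state)

print("mismatches:", mismatches)
```

An empty list of mismatches means the recursive computation and the closed-form rule agree on every listed state.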


Submit the file "values-2.tsv" in Gradescope.

## Part 3: Searching a Maze

Consider the following maze.

![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part3.png?raw=true)

State C is a terminal state giving reward +100.
The remaining states have a reward of -1 when they are reached.
So moving to state F has a value of +99 due to the reward of -1 at state F and the optimal action of moving to state C for the reward of +100 afterwards.

Compute the values for states A-J and S and save them in a file "values-3.tsv" with columns state and value.

state	value
A	92
B	94
C	100
D	95
E	95
F	99
G	96
H	96
I	97
J	98
S	93


In [6]:
import csv
from collections import deque

# Directed edges from the figure
edges = {
    "A": ["S"],
    "B": ["D", "S"],
    "C": [],                 # terminal (+100)
    "D": ["B", "E", "G"],
    "E": ["D", "H"],
    "F": ["C", "J"],
    "G": ["D", "I"],
    "H": ["E", "I"],
    "I": ["G", "H", "J"],
    "J": ["F", "I"],
    "S": ["A", "B"],
}

# Build reverse graph and BFS from C to get shortest steps TO C
rev = {n: [] for n in edges}
for u, nbrs in edges.items():
    for v in nbrs:
        rev[v].append(u)

dist = {n: float("inf") for n in edges}
q = deque(["C"])
dist["C"] = 0
while q:
    v = q.popleft()
    for u in rev[v]:
        if dist[u] > dist[v] + 1:
            dist[u] = dist[v] + 1
            q.append(u)

# Value = 100 - distance-to-C
order = ["A","B","C","D","E","F","G","H","I","J","S"]
rows = [("state","value")] + [(n, int(100 - dist[n])) for n in order]

with open("values-3.tsv", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

print("Wrote values-3.tsv")


Wrote values-3.tsv
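
As a cross-check of the breadth-first approach, the same values can be obtained by a dynamic programming sweep that repeatedly applies "value of a state = -1 for reaching it, plus the best neighbor's value" until nothing changes. This is a minimal sketch: the `edges` dictionary is copied from the cell above and has not been re-verified against the figure, and the names `maze_values`, `TERMINAL`, `TERMINAL_REWARD`, and `STEP_REWARD` are introduced here for illustration.

```python
import math

# Copied from the notebook cell above (assumed to match the figure).
edges = {
    "A": ["S"],
    "B": ["D", "S"],
    "C": [],
    "D": ["B", "E", "G"],
    "E": ["D", "H"],
    "F": ["C", "J"],
    "G": ["D", "I"],
    "H": ["E", "I"],
    "I": ["G", "H", "J"],
    "J": ["F", "I"],
    "S": ["A", "B"],
}

TERMINAL = "C"
TERMINAL_REWARD = 100
STEP_REWARD = -1


def maze_values(graph):
    """Sweep over all states until no value improves (gamma = 1)."""
    values = {state: -math.inf for state in graph}
    values[TERMINAL] = TERMINAL_REWARD
    changed = True
    while changed:
        changed = False
        for state, neighbors in graph.items():
            if state == TERMINAL:
                continue
            # Reward for reaching this state plus the best follow-up value.
            candidate = STEP_REWARD + max(values[n] for n in neighbors)
            if candidate > values[state]:
                values[state] = candidate
                changed = True
    return values


values = maze_values(edges)
for state in sorted(values):
    print(state, values[state])
```

Because every non-terminal step costs the same, the sweep and the shortest-path computation should agree; the sweep would also handle unequal step rewards, which the distance formula would not.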


Submit "values-3.tsv" in Gradescope.

## Part 4: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation of what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.

none