# DX 704 Week 5 Project

This week's project will test your understanding of this week's concepts by asking you to simulate various algorithms by hand.
You will apply search, minimax and dynamic programming concepts to solve a variety of small planning problems.

The full project description, a template notebook and supporting materials are available on GitHub: [Project 5 Materials](https://github.com/bu-cds-dx704/dx704-project-05).

## Example Code

This week's assignment does not involve any coding.

## Part 1: Searching vs Games

Consider the state space illustrated below.
Each terminal state is marked with a reward for reaching that state.
Each non-terminal state has two possible actions represented by the two outgoing arrows to later (lower) states.
The only rewards are for reaching the terminal states, there are no diminishing returns (i.e. $\gamma=1$), and there is no randomness so actions may be freely chosen.

![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part1.png?raw=true)

Solve for the value of each non-terminal state according to the following three scenarios.

1. Search: There is one agent that picks all actions with the goal of maximizing the final reward.
2. Minimax: There are two agents P1 and P2. P1 controls the actions for the 1st and 3rd rows (i.e. the states marked A and D-G) while P2 controls the actions for the 2nd and 4th rows (i.e. the states B-C and H-J). P1 seeks to maximize the final reward, and P2 seeks to minimize the final reward.
3. Maximin: P1 and P2 control the same states as before, but P1 seeks to minimize the final reward, and P2 seeks to maximize the final reward.
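
As a rough illustration of how the three scenarios differ, here is a minimal sketch on a hypothetical two-level tree (not the graph in the figure). The `TREE` dictionary and the `solve` helper are made-up names for this example; the only thing that changes between scenarios is which operator (max or min) is applied at even and odd depths.

```python
# Hypothetical toy tree, not the assignment graph: X is the root (row 1),
# Y and Z are in row 2, and the integers are terminal rewards.
TREE = {"X": ["Y", "Z"], "Y": [3, 5], "Z": [2, 9]}


def solve(node, pick_even, pick_odd, depth=0):
    """Back up terminal rewards; pick_even/pick_odd choose at even/odd depths."""
    if not isinstance(node, str):  # a terminal reward
        return node
    child_values = [solve(c, pick_even, pick_odd, depth + 1) for c in TREE[node]]
    pick = pick_even if depth % 2 == 0 else pick_odd
    return pick(child_values)


search_value = solve("X", max, max)   # one agent maximizes everywhere
minimax_value = solve("X", max, min)  # P1 (even depths) max, P2 (odd depths) min
maximin_value = solve("X", min, max)  # P1 (even depths) min, P2 (odd depths) max

print(search_value, minimax_value, maximin_value)  # prints: 9 3 5
```

On this toy tree the root is worth 9 under search, 3 under minimax, and 5 under maximin, which shows that the same rewards can back up to different values depending on who controls each row.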

Save your results in a file "values-1.tsv" with a column state containing the labels A-J and columns search_value, minimax_value, and maximin_value that correspond to the three scenarios, respectively.

Hint: Print out the image above and compute the values by hand in a bottom-up fashion.

In [15]:
from pathlib import Path
from collections import defaultdict, deque
from functools import lru_cache
import re, csv

DOT_PATH = Path("part1.dot")  # change if needed

# --- Parse the DOT (edges + labels) ---
edge_re  = re.compile(r'(\w+)\s*->\s*(\w+)\s*;')
node_decl_re = re.compile(r'(\w+)\s*\[(.*?)\]', re.DOTALL)
label_re = re.compile(r'label\s*=\s*"([^"]+)"')

raw = DOT_PATH.read_text()

adj = defaultdict(list)
nodes = set()
for a,b in edge_re.findall(raw):
    adj[a].append(b)
    nodes.add(a); nodes.add(b)

id_to_label = {}
for node_id, attrs in node_decl_re.findall(raw):
    m = label_re.search(attrs)
    if m:
        id_to_label[node_id] = m.group(1)
    nodes.add(node_id)

# prefer a node whose label is 'A' as root; otherwise zero-indegree fallback
label_to_id = {v:k for k,v in id_to_label.items()}
root = label_to_id.get("A")
if root is None:
    indeg = {u:0 for u in nodes}
    for u, vs in adj.items():
        for v in vs:
            indeg[v] = indeg.get(v, 0) + 1
            indeg.setdefault(u, 0)
    zero_in = [u for u,d in indeg.items() if d == 0]
    if not zero_in:
        raise RuntimeError("No root found.")
    root = zero_in[0]

# --- Depths (row numbers) ---
depth = {root: 0}
dq = deque([root])
while dq:
    u = dq.popleft()
    for v in adj.get(u, []):
        if v not in depth:
            depth[v] = depth[u] + 1
            dq.append(v)

# Leaves have numeric labels like +5, -1, 0, etc.
def is_num(lbl): 
    return re.fullmatch(r'[+-]?\d+', lbl.strip()) is not None

leaf_values = {}
for n in nodes:
    if len(adj.get(n, [])) == 0:
        lbl = id_to_label.get(n, "")
        if not is_num(lbl):
            raise RuntimeError(f"Leaf {n} does not have a numeric label: '{lbl}'")
        leaf_values[n] = int(lbl)

# --- Bottom-up evaluation helpers ---
@lru_cache(None)
def children(n): 
    return tuple(adj.get(n, []))

@lru_cache(None)
def value_search(n):
    # Search: every internal node is MAX over children
    if n in leaf_values: 
        return leaf_values[n]
    return max(value_search(c) for c in children(n))

@lru_cache(None)
def value_minimax(n):
    # Minimax: rows 1 & 3 (depth 0 and 2) are P1=MAX; rows 2 & 4 (depth 1 and 3) are P2=MIN
    if n in leaf_values:
        return leaf_values[n]
    d = depth[n]
    vals = [value_minimax(c) for c in children(n)]
    return max(vals) if d % 2 == 0 else min(vals)

@lru_cache(None)
def value_maximin(n):
    # Maximin: same row control, but P1=MIN (even depths), P2=MAX (odd depths)
    if n in leaf_values:
        return leaf_values[n]
    d = depth[n]
    vals = [value_maximin(c) for c in children(n)]
    return min(vals) if d % 2 == 0 else max(vals)

def role_for_minimax(n):
    if n in leaf_values: 
        return "LEAF"
    return "MAX" if depth[n] % 2 == 0 else "MIN"  # per spec: rows 1&3 MAX, rows 2&4 MIN

# --- Build TSV rows ---
rows = []
for n in sorted(nodes, key=lambda x: (depth.get(x, 1_000_000), id_to_label.get(x, x))):
    rows.append({
        "node_id": n,
        "label": id_to_label.get(n, n),
        "depth": depth.get(n, -1),
        "role_minimax": role_for_minimax(n),
        "value_search": value_search(n),
        "value_minimax": value_minimax(n),
        "value_maximin": value_maximin(n),
    })

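# The assignment asks for one row per non-terminal state (A-J) with the columns
# state, search_value, minimax_value, and maximin_value.
with open("values-1.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["state", "search_value", "minimax_value", "maximin_value"])
    for r in rows:
        if r["role_minimax"] == "LEAF":
            continue  # terminal states are not part of the requested output
        w.writerow([r["label"], r["value_search"], r["value_minimax"], r["value_maximin"]])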

print("Root:", root, "label:", id_to_label.get(root, root))
print("Search(A) =", value_search(root))
print("Minimax(A) =", value_minimax(root))
print("Maximin(A) =", value_maximin(root))
print("Wrote values-1.tsv with", len(rows), "rows")


Root: n1 label: A
Search(A) = 20
Minimax(A) = 1
Maximin(A) = 0
Wrote values-1.tsv with 21 rows


Submit the file "values-1.tsv" in Gradescope.

## Part 2: Picking Up Sticks

The state space illustrated below corresponds to a variation of the game [Nim](https://en.wikipedia.org/wiki/Nim).
States labeled with a prefix of "p1_" correspond to states where player P1 chooses the action while states labeled with a prefix of "p2_" correspond to states where player P2 chooses the action.
The number in the suffix is the number of "sticks" remaining.
The players take turns choosing actions, and each action corresponds to removing one or two sticks.
When there are no more sticks, the player who would have picked an action loses.


![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part2.png?raw=true)

For example, from the state labeled "p1_1", there is one stick left, player P1 removes the last stick, and player P2 loses.
The loss for P2 is represented by a final reward of +1.
A loss for P1 is represented by a final reward of -1.
Player P1 tries to maximize the final reward, and player P2 tries to minimize the final reward.
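
The rules above can also be sketched directly as a recursion on the number of sticks, without reading the graph. This is a minimal sketch under the assumption that the figure follows exactly the rules stated here; `stick_value` is a hypothetical helper name, and its return value is the final reward from P1's point of view.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stick_value(player: int, sticks: int) -> int:
    """Final reward under optimal play when `player` (1 or 2) moves with `sticks` left."""
    if sticks == 0:
        # The player who would have moved loses: -1 if P1 loses, +1 if P2 loses.
        return -1 if player == 1 else +1
    other = 2 if player == 1 else 1
    # Each action removes one or two sticks (never more than remain).
    outcomes = [stick_value(other, sticks - take) for take in (1, 2) if take <= sticks]
    # P1 maximizes the final reward; P2 minimizes it.
    return max(outcomes) if player == 1 else min(outcomes)


for sticks in range(1, 4):
    print(f"p1_{sticks}", stick_value(1, sticks), f"p2_{sticks}", stick_value(2, sticks))
```

With one stick left and P1 to move, the sketch returns +1, matching the "p1_1" example above; with three sticks left, the player to move loses under optimal play.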

Solve for the value of each of the non-terminal states.
Save the results in a file "values-2.tsv" with columns state and value.

In [16]:
from pathlib import Path
from collections import defaultdict
from functools import lru_cache
import re, csv

DOT_PATH = Path("part2.dot")  # change if needed

# --- parse DOT edges ---
def parse_dot_edges(dot_path: Path):
    text = dot_path.read_text()
    edge_pat = re.compile(r'(\w+)\s*->\s*(\w+)\s*;')
    adj = defaultdict(list); nodes = set()
    for a,b in edge_pat.findall(text):
        adj[a].append(b)
        nodes.add(a); nodes.add(b)
    return dict(adj), nodes

adj, nodes = parse_dot_edges(DOT_PATH)

# states look like p1_k or p2_k
def is_state(s: str):
    return re.fullmatch(r'p[12]_\d+', s) is not None

states = sorted([s for s in nodes if is_state(s)],
                key=lambda s: (int(s.split('_')[1]), s.split('_')[0]))  # sort by k then player

# Terminal rewards from P1's perspective: the player who would have moved loses.
TERMINAL = {"p1_0": -1, "p2_0": +1}

@lru_cache(None)
def minimax(s: str) -> int:
    """Final reward under optimal play: P1 maximizes it, P2 minimizes it."""
    if s in TERMINAL:
        return TERMINAL[s]
    succ = adj.get(s, [])
    if not succ:
        # Shouldn't happen if graph is well-formed; neutral fallback.
        return 0
    vals = [minimax(s2) for s2 in succ]
    # TERMINAL already stores P1-perspective rewards, so no sign flip is needed
    # (negating them, as negamax does, would give p1_1 = -1 instead of +1).
    return max(vals) if s.startswith("p1_") else min(vals)

# write TSV: only state, value
with open("values-2.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["state", "value"])
    for s in states:
        w.writerow([s, minimax(s)])

print("Wrote values-2.tsv with", len(states), "rows")




Wrote values-2.tsv with 12 rows


Submit the file "values-2.tsv" in Gradescope.

## Part 3: Searching a Maze

Consider the following maze.

![](https://github.com/bu-cds-dx704/dx704-project-05/blob/main/part3.png?raw=true)

State C is a terminal state giving reward +100.
The remaining states have a reward of -1 when they are reached.
So moving to state F has a value of +99 due to the reward of -1 at state F and the optimal action of moving to state C for the reward of +100 afterwards.
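
The same backup can be sketched on a hypothetical four-state corridor S - A - F - C (not the maze in the figure). The names `NEIGHBORS`, `TERMINAL_REWARD`, and `STEP_REWARD` are assumptions made for this example; the sketch repeatedly applies "value = -1 + best neighbor value" until nothing changes.

```python
import math

# Hypothetical corridor S - A - F - C, not the maze in the figure.
NEIGHBORS = {"S": ["A"], "A": ["S", "F"], "F": ["A", "C"], "C": []}
TERMINAL_REWARD = {"C": 100}
STEP_REWARD = -1

# Start from the terminal value and -inf everywhere else, then sweep until stable.
values = {s: TERMINAL_REWARD.get(s, -math.inf) for s in NEIGHBORS}
changed = True
while changed:
    changed = False
    for state, nbrs in NEIGHBORS.items():
        if state in TERMINAL_REWARD:
            continue  # terminal values are fixed
        best = STEP_REWARD + max(values[n] for n in nbrs)
        if best > values[state]:
            values[state] = best
            changed = True

print(values)  # {'S': 97, 'A': 98, 'F': 99, 'C': 100}
```

Each step away from C costs one more point, so in this corridor the values are 99, 98, and 97 for F, A, and S.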

Compute the values for states A-J and S and save them in a file "values-3.tsv" with columns state and value.

In [17]:
from pathlib import Path
from collections import defaultdict, deque
import re, csv, math

DOT_PATH = Path("part3.dot")  # point this at the MDP graph (with S, A, B, ..., C:+100)

edge_re       = re.compile(r'(\w+)\s*->\s*(\w+)\s*;')
node_decl_re  = re.compile(r'(\w+)\s*\[(.*?)\]', re.DOTALL)
label_re      = re.compile(r'label\s*=\s*(?:"([^"]+)"|<([^>]+)>|([^\s,\]]+))', re.DOTALL)

raw = DOT_PATH.read_text()

# edges + labels
adj = defaultdict(list); nodes = set()
for a,b in edge_re.findall(raw):
    adj[a].append(b); nodes.add(a); nodes.add(b)

id_to_label = {}
for node_id, attrs in node_decl_re.findall(raw):
    m = label_re.search(attrs)
    if m:
        id_to_label[node_id] = (m.group(1) or m.group(2) or m.group(3)).strip()
    nodes.add(node_id)

# find terminal C (prefer label mentioning +100, else label starting with 'C')
def is_C_label(lbl: str):
    t = lbl.strip()
    return ('+100' in t) or (t == 'C') or t.startswith('C:')

C_ids = [nid for nid,lbl in id_to_label.items() if is_C_label(lbl)]
if not C_ids:
    raise RuntimeError("Couldn't find terminal C (looked for label 'C' or '+100').")
C = C_ids[0]

# reverse graph + BFS from C to get shortest steps to C
rev = defaultdict(list)
for u, vs in adj.items():
    for v in vs:
        rev[v].append(u)

dist = {n: math.inf for n in nodes}
dq = deque([C]); dist[C] = 0
while dq:
    x = dq.popleft()
    for p in rev.get(x, []):
        if dist[p] == math.inf:
            dist[p] = dist[x] + 1
            dq.append(p)

# compute V*(s) = 100 - steps (and V*(C)=100)
def is_numeric_label(lbl: str) -> bool:
    return re.fullmatch(r'[+-]?\d+', lbl.strip()) is not None

rows = []
for nid in nodes:
    lbl = id_to_label.get(nid, nid)
    if is_numeric_label(lbl):
        continue  # ignore any numeric-only helper nodes if present
    if nid == C:
        val = 100
    else:
        if dist[nid] == math.inf:
            # unreachable -> -inf (loops hurt with gamma=1 and step cost -1)
            # if you want blanks instead, set val = "" here
            val = "-inf"
        else:
            val = 100 - dist[nid]
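    # Labels such as "C: +100" carry the reward; keep only the state name for the TSV.
    rows.append((lbl.split(":")[0].strip(), val))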

rows.sort(key=lambda t: t[0])  # alphabetical by state label

with open("values-3.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(["state", "value"])
    for s,v in rows:
        w.writerow([s, v])

print("C id:", C, "label:", id_to_label.get(C, C))
print("Wrote values-3.tsv with", len(rows), "rows")



C id: C label: C: +100
Wrote values-3.tsv with 11 rows


Submit "values-3.tsv" in Gradescope.

## Part 4: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

None

If you used any libraries not mentioned in this module's content, please list them with a brief explanation of what you used them for. If you did not use any other libraries, simply write none below.

None

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.

Here is the conversation I had with ChatGPT, which helped me go over my work:

https://chatgpt.com/share/68e1d9c3-46d4-800d-9203-4fd1db4f6195