---

# CSCI 3202, Fall 2022
# Final Project
# Project Due: Thursday December 8, 2022 at 6:00 PM
## Proposals Due: Friday November 18, 2022 at 6:00 PM


You have two options for completing your final project for this course. 

#### Option 1 ####
The first option is presented in this notebook and involves implementing a Connect Four game with AB pruning and A* as player strategies. 

#### Option 2 ####
The second option is to design your own project that includes any of the algorithms we've discussed throughout the semester, or an algorithm that you're interested in learning that we haven't discussed in class. Your project also needs to include some kind of analysis of how it performed on a specific problem. If you're interested in the design your own project option, you need to discuss your idea with one of the course instructors to get approval. If you do a project without getting approval, you will receive a 0 regardless of the quality of the project. 

**The rules:**

1. Choose EITHER the given problem to submit OR choose your own project topic. 

2. If you choose your own project topic, please adhere to the following guidelines:
- Send an email to the course instructors before Friday, November 18 at 6pm, with a paragraph description of your project. We will respond within 24 hours with feedback.
- The project can include an algorithm we've discussed throughout the semester or an algorithm that you're been curious to learn. Please don't recycle a project that you did in another class. 
- If you do your own project without prior approval, you will receive a 0 for this project.
- Your project code, explanation, and results must all be contained in a Jupyter notebook. 

3. All work, code and analysis must be **your own**.
4. You may use your course notes, posted lecture slides, textbook, in-class notebooks and homework solutions as resources.  You may also search online for answers to general knowledge questions, like the form of a probability distribution function, or how to perform a particular operation in Python. You may not use entire segments of code as solutions to any part of this project, e.g. if you find a Python implementation of policy iteration online, you can't use it.
5. You may **not** post to message boards or other online resources asking for help.
6. **You may not collaborate with classmates or anyone else.**
7. This is meant to be like a coding portion of an exam. So, we will be much less helpful than we typically are with homework. For example, we will not check answers, help debug your code, and so on.
8. If you have a question, post it first as a **private** Piazza message. If we decide that it is appropriate for the entire class, then we will make it a public post (and anonymous).
9. If any part of the given project or your personal project is left open-ended, it is because we intend for you to code it up however you want. Our primary concern is with the plots/analysis that your code produces. Feel free to ask clarifying questions though.

Violation of these rules will result in an **F** and a trip to the Honor Code council.

---
**By writing your name below, you agree to abide by these rules:**

**Your name:** Ty VanEssen

---


---

Some useful packages and libraries:



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors
from collections import deque
import heapq
import unittest
from scipy import stats
import copy as cp
from time import time

---

## Problem 1: Game Theory - Playing "intelligent" Connect Four

Connect Four is a two-player game where the objective is to get four pieces in a row - horizontally, vertically, or diagonally. Check out this video if you're unfamiliar with the game: https://www.youtube.com/watch?v=utXzIFEVPjA.

### (1a)   Defining the Connect Four class structure

We've provided the humble beginnings of a Connect Four game. You need to fill in this class structure for Connect Four using what we did during class as a guide, and then implement min-max search with AB pruning, and A* search with at least one heuristic function. The class structure includes the following:

* `moves` is a list of columns to represent which moves are available. Recall that we are using matrix notation for this, where the upper-left corner of the board, for example, is represented at (1,1).
* `result(self, move, state)` returns a ***hypothetical*** resulting `State` object if `move` is made when the game is in the current `state`. Note that when a 'move' is made, the column must have an open slot and the piece must drop to the lowest row. 
* `compute_utility(self, move, state)` calculates the utility of `state` that would result if `move` is made when the game is in the current `state`. This is where you want to check to see if anyone has gotten `nwin` in a row
* `game_over(self, state)` returns `True` if the game in the given `state` has reached a terminal state, and `False` otherwise.
* `utility(self, state, player)` returns the utility of the current state if the player is Red and $-1 \cdot$ utility if the player is Black.
* `display(self)` is a method to display the current game `state`. You get it for free because this would be super frustrating without it.
* `play_game(self, player1, player2)` returns an integer that is the utility of the outcome of the game (+1 if Red wins, 0 if draw, -1 if Black wins). `player1` and `player2` are functional arguments that we will deal with in parts **1b** and **1d**.

Some notes:
* Assume Red always goes first.
* Do **not** hard-code for 6x7 boards.
* You may add attributes and methods to these classes as needed for this problem.

In [342]:

from enum import Enum
from typing import Callable

class Player(Enum):
    RED = "R"
    BLACK = "B"
    EMPTY = "."

class State:
    def __init__(self, board_dimensions: tuple[int, int], utility: int = 0, board: dict[tuple[int,int], Player] = {}):
        self.to_move = Player.RED
        self.utility = utility
        self.dimensions = board_dimensions
        self.board : dict[tuple[int,int], Player] = board
        self._moves = None
        self._adjacent = None

    def moves(self) -> list[tuple[int, int]]:
        # Lazy Cache
        if (self._moves != None): 
            return self._moves

        available_moves : dict[int, int] = {col: self.dimensions[0] for col in range(1, self.dimensions[1] + 1)}
        for taken_row, taken_col in self.board:
            if taken_col in available_moves:
                next_row = taken_row - 1
                if next_row <= 0:
                    available_moves.pop(taken_col)
                elif next_row < available_moves[taken_col]:
                    available_moves[taken_col] = next_row
        self._moves = [ (available_moves[col], col) for col in available_moves]
        return self._moves


    def get_adjacent(self, move: tuple[int,int], n_adjacent: int, include_move: bool = True) -> list[list[Player]]:
        surrounding: dict[tuple[int,int], list[Player]] = {
            (iter_row, iter_col): []
                for iter_row in range(-1,2)
                for iter_col in range(-1,2)
                if (iter_row, iter_col) != (0,0)
        }
        lower_idx = 0 if include_move else 1

        for levels_out in range(lower_idx,n_adjacent):
            remove_direction : list[tuple[int,int]] = []
            for direction in surrounding:
                check_state = (move[0] + direction[0] * levels_out, move[1] + direction[1] * levels_out)
                check_state_value : Player | None = self.board.get(check_state)

                if check_state_value != None:
                    surrounding[direction].append(check_state_value)
                else:
                    remove_direction.append(direction)
            for direction in remove_direction:
                surrounding.pop(direction)
                
        return [ surrounding[direction] for direction in surrounding]

class ConnectFour:
    def __init__(self, nrow=6, ncol=7, nwin=4):
        self.nrow: int = nrow
        self.ncol: int = ncol
        self.nwin: int = nwin
        # moves: list[tuple[int, int]] = [ (0, col) for col in range(1, self.ncol + 1)]
        # moves: list[tuple[int, int]] = [(row,col) for row in range(1, nrow + 1) for col in range(1, ncol + 1)]
        self.state = State((nrow, ncol))

    def result(self, move: tuple[int,int], state: State) -> State:
        '''
        What is the hypothetical result of move `move` in state `state` ?
        move  = (row, col) tuple where player will put their mark (R or B)
        state = a `State` object, to represent whose turn it is and form
                the basis for generating a **hypothetical** updated state
                that will result from making the given `move`
        '''
        
        # your code goes here...
        if not move in state.moves():
            raise ValueError('Move {} is not in moves {}'.format(move, state.moves()))
        #? Should this be +=
        utility = self.compute_utility(move, state)
        new_state = State((self.nrow, self.ncol), utility, {**state.board, move: state.to_move})
        new_state.to_move = Player.BLACK if state.to_move == Player.RED else Player.RED

        return new_state

    def compute_utility(self, move: tuple[int,int], state: State) -> int:
        '''
        What is the utility of making move `move` in state `state`?
        If 'R' wins with this move, return 1;
        if 'B' wins return -1;
        else return 0.
        '''        
        
        # your code goes here...
        utility = 0
        adjacent = [ [state.to_move, *direction] for direction in state.get_adjacent(move, self.nwin, False)]
        for direction in adjacent:
            #? Should this be more robust counting the whole board
            if direction.count(Player.RED) == self.nwin:
                utility += 1
            elif direction.count(Player.BLACK) == self.nwin:
                utility -= 1
        
        # if utility not in range(-1,2):
        #     self.display()
        #     raise ValueError("Computed utility is out of bounds for {}, utility {} not in {}".format(move, utility, range(-1,2)))

        return utility

    def game_over(self, state: State) -> bool:
        '''game is over if someone has won (utility!=0) or there
        are no more moves left'''

        # your code goes here... 
        return state.utility != 0 or len(state.moves()) == 0

    def utility(self, state: State, player: Player) -> int:
        '''Return the value to player; 1 for win, -1 for loss, 0 otherwise.'''
        # your code goes here...
        #? This should be a toggle that allows for multi 4match completes
        red_utility: int = -1 if state.utility < 0 else int(state.utility > 0)
        return red_utility if player == Player.RED else -1 * red_utility

    def display(self, state: State = None) -> None:
        board = state.board if state != None else self.state.board

        for row in range(1, self.nrow + 1):
            for col in range(1, self.ncol + 1):
                print(board.get((row, col), Player.EMPTY).value, end=' ')
            print()

    def play_game(self, player1, player2, verbose: bool = False) -> int:
        '''Play a game of Connect Four!'''
        
        # your code goes here...
        player_actors: dict[Player, Callable[ConnectFour, tuple[int,int]]] = {
            Player.RED: player1,
            Player.BLACK: player2
        }

        while not self.game_over(self.state):
            actor = player_actors[self.state.to_move]
            move: tuple[int,int] = actor(self)
            new_state = self.result(move, self.state)
        
            relative_utility = self.utility(new_state, self.state.to_move)

            if verbose: print("{} places a piece at {}".format(self.state.to_move.name, move))
            
            print("Current Utility {}".format(relative_utility))
            if relative_utility != 0 and verbose:
                print("Player {} Won!".format(self.state.to_move.name if relative_utility > 0 else new_state.to_move))
            self.state = new_state
            
            if verbose: self.display()
        
        return self.utility(self.state, Player.RED)

### (1b) Define a random player

Define a function `random_player` that takes a single argument of the `ConnectFour` class and returns a random move out of the available legal moves in the `state` of the `ConnectFour` game.

In your code for the `play_game` method above, make sure that `random_player` could be a viable input for the `player1` and/or `player2` arguments.

In [343]:
def random_player(game: ConnectFour) -> tuple[int,int]:
    '''A player that chooses a legal move at random out of all
    available legal moves in ConnectFour state argument'''
    
    # your code goes here...
    avaliable_moves = game.state.moves()
    selected_move : int = np.random.choice(range(len(avaliable_moves)))
    return avaliable_moves[selected_move]


We know from experience and/or because I'm telling you right now that if two `random_player`s play many games of ConnectFour against one another, whoever goes first will win about 55% of the time.  Verify that this is the case by playing at least 1,000 games between two random players. Report the proportion of the games that the first player has won.**(Chris: is this true for TicTacToe, or Connect Four)**

**"Unit tests":** If you are wondering how close is close enough to 55%, I simulated 100 tournaments of 1,000 games each. The min-max range of win percentage by the first player was 52-59%.

In [99]:
# 1000 games between two random players

# Your code here
winners : dict[Player, int] = {p: 0 for p in Player}
iterations = 10000
for i in range(iterations):
    game_result = ConnectFour().play_game(random_player, random_player)
    winner = Player.EMPTY
    if game_result < 0:
        winner = Player.BLACK
    elif game_result > 0:
        winner = Player.RED

    winners[winner] += 1

print("Player {} won {:.4f}% of the time".format(Player.RED.name, winners[Player.RED] / iterations))

Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 1
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 0
Current Utility 1
Current Utility 0
Current Utility 0
Current Ut

### (1c) What about playing randomly on different-sized boards?

What does the long-term win percentage appear to be for the first player in a 10x10 ConnectFour tournament, where 4 marks must be connected for a win?  Support your answer using a simulation and printed output, similar to **1b**.

**Also:** The win percentage should have changed substantially. Did the decrease in wins turn into more losses for the first player or more draws? Write a few sentences explaining the behavior you observed.  *Hint: think about how the size of the state space has changed.*

In [5]:
# 1000 games between two random players
#todo @TV These numbers are not significantly different
# Your code here
winners : dict[Player, int] = {p: 0 for p in Player}
iterations = 10000

for i in range(iterations):
    game_result = ConnectFour(10, 10, 4).play_game(random_player, random_player)
    winner = Player.EMPTY
    if game_result < 0:
        winner = Player.BLACK
    elif game_result > 0:
        winner = Player.RED

    winners[winner] += 1

print(winners)
print("Player {} won {:.4f}% of the time".format(Player.RED.name, winners[Player.RED] / iterations))

KeyboardInterrupt: 

### (1d) Define an alpha-beta player

Alright. Let's finally get serious about our Connect Four game.  No more fooling around!

Craft a function called `alphabeta_player` that takes a single argument of a `ConnectFour` class object and returns the minimax move in the `state` of the `ConnectFour` game. As the name implies, this player should be implementing alpha-beta pruning as described in the textbook and lecture.

Note that your alpha-beta search for the minimax move should include function definitions for `max_value` and `min_value` (see the aggressively realistic pseudocode from the lecture slides).

In your code for the `play_game` method above, make sure that `alphabeta_player` could be a viable input for the `player1` and/or `player2` arguments.

In [400]:
# Your code here
#For AB Pruning, we are aware that most people are getting long runtimes. For now, reduce the board size and nwin parameters to reduce the complexity (try a 3x3 board and nwin=3 for example). We are working on an update to this part of the assignment, but you should be able to implement the AB pruning functionality with the reduced game size.  
from collections import namedtuple
from functools import reduce
import math

verbose = False
pruned_state_ctr = 0
seen_states = {}
discount_factor = .99
UtilityMove = namedtuple("UtilityMove", ["utility", "move"])

def alphabeta_player(game: ConnectFour) -> tuple[int,int]:
    print("INIT")

    # if game.state.to_move == Player.RED:
    #     utility = max_value(game, game.state, float("-inf"), float("inf"))
    # else:
    #     utility = min_value(game, game.state, float("-inf"), float("inf"))
    
    utility = alphabeta_value(game, game.state, float("-inf"), float("inf"), game.state.to_move == Player.RED)

    if verbose:
        print("Pruned {} or {:.3f}% of states".format(pruned_state_ctr, 100 * pruned_state_ctr / math.factorial(game.ncol * game.nrow)))
    
    print(utility)
    return utility.move

def max_value(game: ConnectFour, state: State, alpha: float, beta: float) -> UtilityMove:
    if game.game_over(state):
        return UtilityMove(state.utility, None)

    moveValue = UtilityMove(-np.inf, None)

    for action in state.moves():
        child_utility, child_move = min_value(game, game.result(action, state), alpha, beta)
        moveValue = max(moveValue, UtilityMove(child_utility, action))
        if moveValue.utility >= beta:
            return moveValue

        alpha = max(alpha, moveValue.utility)
    return moveValue

def min_value(game: ConnectFour, state: State, alpha: float, beta: float) -> UtilityMove:
    if game.game_over(state):
        return UtilityMove(state.utility, None)

    moveValue = UtilityMove(np.inf, None)
    for action in state.moves():
        child_utility, child_move = max_value(game, game.result(action, state), alpha, beta)
        moveValue = min(moveValue, UtilityMove(child_utility, action))
        if moveValue.utility <= alpha:
            return moveValue

        beta = min(beta, moveValue.utility)
    return moveValue


def alphabeta_value(game: ConnectFour, state: State, alpha: int, beta: int, maximize: bool, indent: int = 0) -> UtilityMove:
    indent_str = "-"*indent
    global verbose, pruned_state_ctr, discount_factor

    if game.game_over(state):
        if verbose: print(indent_str, "[ X ] Terminal State", state.utility)
        return UtilityMove(state.utility, None)

    beta_pointer, alpha_pointer = [beta], [alpha]

    if maximize:
        utility = UtilityMove(-np.inf, None)
        comparison_fn = max
        child_rechable_boundary = alpha_pointer
        reachable_bound = beta_pointer
        symbol = float.__ge__
        early_return = lambda util: util >= beta_pointer[0]
    else:
        utility = UtilityMove(np.inf, None)
        comparison_fn = min
        child_rechable_boundary = beta_pointer
        reachable_bound = alpha_pointer
        symbol = float.__le__
        early_return = lambda util: util <= alpha_pointer[0]
    
    #! Room to mess with the order we evaluate moves, might improve runtime
    moves = state.moves()
    for move_idx in range(len(moves)):
        move = moves[move_idx]

        if verbose: print(indent_str, "Exploring Move:", move, not maximize)
        args = (game.result(move, state), alpha_pointer[0], beta_pointer[0], not maximize)
        child_utility, child_move = alphabeta_value(game, *args, indent + 1)
        child_utility *= discount_factor

        if verbose: print(indent_str, move, child_utility, utility, alpha_pointer, beta_pointer, maximize)
        utility = comparison_fn(utility, UtilityMove(child_utility, move))
        # print(indent_str, utility)

        # if early_return(utility.utility):
        if symbol(utility.utility, reachable_bound[0]):
            if True:
                remaining_to_analyze = (len(moves) - (move_idx + 1))
                remaining_moves = reduce(lambda x,y: x*y, state.dimensions) - len(state.board)
                pruned_states = math.factorial(remaining_moves - 1) * remaining_to_analyze
                pruned_state_ctr += pruned_states
                print("Moves:", moves)
                print(indent_str, "{} states pruned from {} remaining options and {} remaining moves".format(pruned_states, moves[move_idx + 1:], remaining_moves))
                print(indent_str, "[ ! ] Early Exit")
            return utility
        child_rechable_boundary[0] = comparison_fn(utility.utility, child_rechable_boundary[0])
        # print(indent_str, "Util {}, child {}, fn {}".format(utility.utility, child_rechable_boundary[0], comparison_fn.__name__))
    if verbose:
        print(indent_str, "a {}, b {}".format(alpha_pointer[0], beta_pointer[0]))
        print(indent_str, "[ ! ] Explored all moves")
    return utility

Verify that your alpha-beta player code is working appropriately through the following tests, using a standard 6x7 ConnectFour board. Run **10 games for each test**, and track the number of wins, draws and losses. Report these results for each case.

1. An alpha-beta player who plays first should never lose to a random player who plays second.
2. Two alpha-beta players should always draw. One player is the max player and the other player is the min player.

**Nota bene:** Test your code with fewer games between the players to start, because the alpha-beta player will require substantially more compute time than the random player.  This is why I only ask for 10 games, which still might take a minute or two. You are welcome to run more than 10 tests if you'd like. 

In [401]:
# Your code here
# alphabeta_player(ConnectFour(3,3,2))
from copy import deepcopy


game = ConnectFour(3,3,3)
# game.state = State((3,3), 0, {
#     (3,3): Player.RED,
#     (2,3): Player.RED,
#     (3,2): Player.BLACK,
#     # (1,3): Player.BLACK
# })
# game.state.to_move = Player.BLACK

# game.state.to_move = Player.RED
game.play_game(alphabeta_player, alphabeta_player, True)
# save = deepcopy(game.state)

# game.state.get_adjacent((3,3), 3, False)
# assert save.__eq__(game.state), "\n{}\n{}".format(save.__reduce__(), game.state.__reduce__())
# game.display()
# print(game.compute_utility((1,1), game.state))

# nm = alphabeta_player(game)
# ns = game.result(nm, game.state)
# game.display()
# print("--")
# game.display(ns)
#Todo @TV Still Buggy

INIT
Moves: [(1, 3)]
-------- 0 states pruned from [] remaining options and 1 remaining moves
-------- [ ! ] Early Exit
Moves: [(1, 2), (2, 3)]
------ 2 states pruned from [(2, 3)] remaining options and 3 remaining moves
------ [ ! ] Early Exit
Moves: [(2, 3)]
------- 0 states pruned from [] remaining options and 2 remaining moves
------- [ ! ] Early Exit
Moves: [(1, 2), (1, 3)]
------- 1 states pruned from [(1, 3)] remaining options and 2 remaining moves
------- [ ! ] Early Exit
Moves: [(2, 2), (2, 3)]
----- 6 states pruned from [(2, 3)] remaining options and 4 remaining moves
----- [ ! ] Early Exit
Moves: [(1, 2)]
-------- 0 states pruned from [] remaining options and 1 remaining moves
-------- [ ! ] Early Exit
Moves: [(1, 3)]
-------- 0 states pruned from [] remaining options and 1 remaining moves
-------- [ ! ] Early Exit
Moves: [(1, 2)]
-------- 0 states pruned from [] remaining options and 1 remaining moves
-------- [ ! ] Early Exit
Moves: [(2, 2), (1, 3)]
------ 0 states pruned 

1

### (1e) What has pruning ever done for us?

Calculate the number of number of states expanded by the minimax algorithm, **with and without pruning**, to determine the minimax decision from the initial 6x7 ConnectFour board state.  This can be done in many ways, but writing out all the states by hand is **not** one of them (as you will find out!).

Then compute the percent savings that you get by using alpha-beta pruning. i.e. Compute $\frac{\text{Number of nodes expanded with pruning}}{\text{Number of nodes expanded with minimax}} $

Write a sentence or two, commenting on the difference in number of nodes expanded by each search.

In [None]:
# Your code here.

### (2) A* Search

In Part II of this project, you need to implement a player strategy to employ A* Search in order to win at ConnectFour. To test your A* player, play 10 games against the random player and 10 games against the AB pruning player. 

Write an explanation of this strategy's implementation and performance in comparison to the random player and the AB Pruning player from **1d**.

A lot of the code that you wrote for the minimax player and the Connect Four game structure can be reused for the A* player. However, you will need to write a new utility function for A* that considers the path cost and heuristic cost.


### (2a) Define a heuristic function
Your A* player will need to use a heuristic function. You have two options for heurstics: research an existing heuristic for Connect Four, or games similar to Connect Four, and use that. Or, design your own heuristic. Write a one-paragraph description of the heuristic you're using, including a citation if you used an existing heuristic.

### (2b) Compare A* to other algorithms
Next, play 10 games of Connect Four using your A* player and a random player and 10 games against the AB pruning player. In four or five paragraphs, report on the outcome. Did one player win more than the other? How often was the game a draw? How many moves did each player make? Were there situations where one player appeared to do better than the other? Given the outcome, are there other heuristics you would like to implement?

In [None]:
# Your code here.