# Introduction

In the previous tutorial, you learned how to build an agent with one-step lookahead.  This agent performs reasonably well, but definitely still has room for improvement!  For instance, consider the potential moves in the figure below.

NEED TO FIX FIGURE
<center>
<img src="https://i.imgur.com/Zi77Qf5.png" width=80%><br/>
</center>

With one-step lookahead, the red player picks either column 4 or column 5, each with 50% probability.  But column 5 is the better move, since it guarantees that the player can win the game in only one more turn.  (_So, ideally, the agent only selects column 5 from this board._)  Unfortunately, the agent doesn't know this, because it can only look one move into the future.  

In this tutorial, you'll use the **minimax algorithm** to help the agent look farther into the future and make better-informed decisions.

# Minimax

We'd like to leverage information from deeper in the game tree.  For now, assume we work with a depth of 3.  This way, when deciding its move, the agent considers all possible game boards that can result from  
1. the agent's move, 
2. the opponent's move, and 
3. the agent's next move.  
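
To get a sense of how many boards this involves, here is a rough sketch that enumerates every sequence of three column choices.  It assumes a standard board with 7 columns (the tutorial's `config.columns` may differ) and ignores full columns and games that end early, so the count is only an upper bound.

```python
from itertools import product

COLUMNS = 7   # assumption: standard Connect Four width
DEPTH = 3     # agent's move, opponent's move, agent's next move

# Each tuple is one sequence of column choices: (agent, opponent, agent).
sequences = list(product(range(COLUMNS), repeat=DEPTH))

print(len(sequences))   # 343
print(sequences[0])     # (0, 0, 0)
```

The count grows as `COLUMNS ** DEPTH`, which is why the depth is kept small here.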

We'll work with a visual example.  For simplicity, we assume that at each turn, both the agent and opponent have only two possible moves.  Each of the blue rectangles in the figure below corresponds to a different game board.

<center>
<img src="https://i.imgur.com/Vgf1OwI.png" width=80%><br/>
</center>

As before, the current game state is at the top of the figure, and we've recorded the number of points assigned to each board at the bottom of the tree.  The agent's goal is to end up with a score that's as high as possible. 

But notice that the agent no longer has complete control over its score -- after the agent makes its move, the opponent selects its own move.  And, the opponent's selection can prove disastrous for the agent!  In particular, the opponent can ensure the agent never receives a score of +10.  
- If the agent chooses the left branch, the opponent can force a score of -10.  
- If the agent chooses the right branch, the opponent can force a score of 0.  

This is depicted in the figure below, where the agent's selection and the opponent's response are marked as (1) and (2), respectively.

<center>
<img src="https://i.imgur.com/x6AGOQf.png" width=80%><br/>
</center>

With this in mind, you might argue that the right branch is the better choice for the agent, since it is the less risky option.  Sure, it gives up the possibility of getting the large score (100) that can only be accessed on the left branch, but it also completely avoids the worst-case scenario of the very low score (-100).

This is the main idea behind the **minimax algorithm**: when assessing potential moves, the agent assumes that its opponent will always choose a move in response that is most damaging for the agent.  That is, the agent assumes its opponent has access to the same game tree and heuristic scores that are available to the agent, and that the opponent always chooses its moves so that the agent gets the lowest possible score.  

Given that the opponent uses this strategy, the agent can then plan its best moves.  For instance, in the example above, the minimax agent will not pick the left branch, since it assumes that the opponent would respond with a move that forces the agent to a score of -100.  Instead, it picks the right branch: in practice, ...
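
The idea can be sketched on a tiny tree, independent of the game itself.  The nested lists and leaf scores below are hypothetical (they are not the values from the figures), and `minimax_tree` is an illustrative helper rather than the function used later in this tutorial.

```python
def minimax_tree(node, maximizing):
    """Return the minimax value of a tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: the score assigned to this board
    child_values = [minimax_tree(child, not maximizing) for child in node]
    # The agent maximizes; the opponent minimizes.
    return max(child_values) if maximizing else min(child_values)

# Hypothetical scores: agent moves, then opponent, then agent again.
toy_tree = [
    [[9, 5], [-4, 1]],   # left branch (contains the tempting 9)
    [[2, 0], [3, 2]],    # right branch
]

# Value of each first move, assuming the opponent replies as damagingly as possible.
branch_values = [minimax_tree(branch, maximizing=False) for branch in toy_tree]
best_branch = branch_values.index(max(branch_values))

print(branch_values)  # [1, 2]
print(best_branch)    # 1
```

Even though the highest leaf sits on the left, the opponent can steer away from it, so the right branch has the better guaranteed score.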

# Code

...

In [None]:
#$HIDE_INPUT$
import random
import numpy as np

In [None]:
#$HIDE_INPUT$
# Helper function for score_move: checks if window satisfies heuristic conditions
def check_window(window, num_discs, piece, config):
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)
    
# Helper function for score_move: counts number of windows satisfying specified heuristic conditions
def count_windows(grid, num_discs, piece, config):
    num_windows = 0
    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    return num_windows

In [None]:
# Uses minimax to calculate value of dropping piece in selected column
def score_move(grid, col, mark, config, nsteps):
    next_grid = drop_piece(grid, col, mark, config)
    score = minimax(next_grid, nsteps-1, False, mark, config)
    return score

# Calculates value of heuristic for selected grid
def get_heuristic(grid, mark, config):
    num_threes = count_windows(grid, 3, mark, config)
    num_fours = count_windows(grid, 4, mark, config)
    num_threes_opp = count_windows(grid, 3, mark%2+1, config)
    num_fours_opp = count_windows(grid, 4, mark%2+1, config)
    score = num_threes - 1e2*num_threes_opp - 1e4*num_fours_opp + 1e6*num_fours 
    return score

# Gets board at next step if drop piece in selected column
def drop_piece(grid, col, piece, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            break
    next_grid[row][col] = piece
    return next_grid

# Helper function for minimax: checks if agent or opponent has four in a row in the window
def is_terminal_window(window, config):
    return window.count(1) == config.inarow or window.count(2) == config.inarow

# Helper function for minimax: checks if game has ended
def is_terminal_node(grid, config):
    # horizontal 
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if is_terminal_window(window, config):
                return True
    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if is_terminal_window(window, config):
                return True
    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if is_terminal_window(window, config):
                return True
    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if is_terminal_window(window, config):
                return True
    # Check for draw: the game is over once the top row has no empty cells
    if list(grid[0, :]).count(0) == 0:
        return True
    return False

# Returns list of valid moves
def get_valid_moves(grid, config):
    return [c for c in range(config.columns) if grid[0][c] == 0]

# Minimax implementation
def minimax(node, depth, maximizingPlayer, mark, config):
    is_terminal = is_terminal_node(node, config)
    valid_moves = get_valid_moves(node, config)
    # Stop at the depth limit, when the game is over, or when no moves remain
    if depth == 0 or is_terminal or not valid_moves:
        return get_heuristic(node, mark, config)
    if maximizingPlayer:
        # Agent's turn: keep the highest value among the child boards
        value = -np.inf
        for col in valid_moves:
            child = drop_piece(node, col, mark, config)
            value = max(value, minimax(child, depth-1, False, mark, config))
        return value
    else:
        # Opponent's turn: assume it picks the move that is worst for the agent
        value = np.inf
        for col in valid_moves:
            child = drop_piece(node, col, mark%2+1, config)
            value = min(value, minimax(child, depth-1, True, mark, config))
        return value
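
To get a feel for the weights used in `get_heuristic`, the short sketch below applies the same formula to a few hypothetical window counts.  The function name `heuristic_score` and the counts are made up for illustration; only the weights come from the code above.

```python
def heuristic_score(num_threes, num_fours, num_threes_opp, num_fours_opp):
    # Same weighting as in get_heuristic, applied to window counts directly.
    return num_threes - 1e2*num_threes_opp - 1e4*num_fours_opp + 1e6*num_fours

print(heuristic_score(2, 0, 0, 0))  # 2.0
print(heuristic_score(2, 0, 1, 0))  # -98.0
print(heuristic_score(3, 1, 2, 0))  # 999803.0
```

A single opponent three outweighs a couple of the agent's own threes, and a winning four dominates everything else.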

The minimax agent is implemented in the code cell below.

In [None]:
N_STEPS = 3

# Minimax agent
def agent1(obs, config):
    # Get list of valid moves
    valid_moves = [c for c in range(config.columns) if obs.board[c] == 0]
    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    # Use the heuristic to assign a score to each possible board in the next step
    scores = dict(zip(valid_moves, [score_move(grid, col, obs.mark, config, N_STEPS) for col in valid_moves]))
    # Get a list of columns (moves) that maximize the heuristic
    max_cols = [key for key in scores.keys() if scores[key] == max(scores.values())]
    # Select at random from the maximizing columns
    return random.choice(max_cols)
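
The agent checks `obs.board[c] == 0` on the flat board, while the helper functions index a 2D grid.  The plain-Python sketch below illustrates why the two agree, assuming a 6-by-7 board stored row by row with row 0 at the top.  The board contents are hypothetical.

```python
ROWS, COLUMNS = 6, 7   # assumption: standard Connect Four dimensions

# Hypothetical flat board: 0 = empty, 1 and 2 = the two players' discs.
board = [0] * (ROWS * COLUMNS)
board[5 * COLUMNS + 0] = 1             # bottom-left cell (row 5, column 0)
for row in range(ROWS):                # fill column 3 completely
    board[row * COLUMNS + 3] = 1 + row % 2

# Row-major layout: cell (row, col) lives at index row * COLUMNS + col.
grid = [board[r * COLUMNS:(r + 1) * COLUMNS] for r in range(ROWS)]

# A column is playable if its top cell is empty; both views give the same answer.
valid_from_flat = [c for c in range(COLUMNS) if board[c] == 0]
valid_from_grid = [c for c in range(COLUMNS) if grid[0][c] == 0]

print(valid_from_flat)  # [0, 1, 2, 4, 5, 6]
```

The first `COLUMNS` entries of the flat board are exactly the top row of the grid, so either check yields the same list of valid moves.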

then they play against the agent

# Your turn

Continue to **[...link...](#$NEXT_NOTEBOOK_URL$)** ...