# Introduction

Even if you're new to Connect Four, you've likely developed several strategies for playing the game.  In this tutorial, you'll learn how to use a **heuristic** to share your knowledge with the agent.  

# Game trees

As a human player, how do you think about how to play the game?  How do you weigh alternative moves?

You likely do a bit of forecasting.  For each potential move, you predict what your opponent is likely to do in response, along with how you'd then respond, and what the opponent is likely to do then, and so on.  Then, you choose the move where you think you're most likely to win.

We can formalize this idea and represent all possible outcomes in a **(complete) game tree**.  

<center>
<img src="https://i.imgur.com/EZKHxyy.png"><br/>
</center>

The game tree represents each possible move (by player and opponent), starting with an empty board.  Then, the first row shows all possible moves the red player can make.  Next, we record each move the yellow player can make in response, and so on, until each branch reaches the end of the game.  (_The game tree for Connect Four is quite large, so we show only a small preview in the image above_.)

Once we can see every way the game can possibly end, we can pick the move that is most likely to lead to a win.
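
To get a feel for how quickly the tree grows, here is a minimal sketch that counts the move sequences reachable from an empty board down to a given depth. The helper names (`valid_columns`, `drop`, `count_sequences`) are illustrative and not part of the tutorial's agent, and the encoding (0 for empty, 1 and -1 for the two players) is an assumption. The sketch does not check for wins, so it is only accurate for the first six moves, before any player can have four discs.

```python
import numpy as np

NUM_ROWS, NUM_COLS = 6, 7

def valid_columns(board):
    # A column accepts a disc while its top cell is still empty (0).
    return [c for c in range(NUM_COLS) if board[0, c] == 0]

def drop(board, col, piece):
    # Return a copy of the board with `piece` in the lowest empty cell of `col`.
    nxt = board.copy()
    row = max(r for r in range(NUM_ROWS) if nxt[r, col] == 0)
    nxt[row, col] = piece
    return nxt

def count_sequences(board, depth, piece=1):
    # Count the move sequences of length `depth` that start from `board`.
    if depth == 0:
        return 1
    return sum(count_sequences(drop(board, c, piece), depth - 1, -piece)
               for c in valid_columns(board))

empty = np.zeros((NUM_ROWS, NUM_COLS), dtype=int)
counts = [count_sequences(empty, d) for d in range(1, 4)]
print(counts)  # [7, 49, 343]
```

Each extra level multiplies the number of branches by seven in the early game, which is why only a small part of the tree can be explored in practice.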

# Heuristics

The complete game tree for Connect Four has over [4 trillion](https://oeis.org/A212693) different boards!  So in practice, our agent can only work with a small subset when planning a move. 

To make sure the incomplete tree is still useful to the agent, we will use a **heuristic** (or **heuristic function**).  The heuristic assigns scores to different game boards, where we estimate that boards with higher scores are more likely to result in the agent winning the game.  You will design the heuristic based on your knowledge of the game.

For instance, one heuristic that might work reasonably well for Connect Four looks at each group of four adjacent locations in a (horizontal, vertical, or diagonal) line and assigns:
- **100 points** if the agent has four discs in a row (the agent won), 
- **1 point** if the agent filled three spots, and the remaining spot is empty (the agent wins if it fills in the empty spot), 
- **-10 points** if the opponent filled three spots, and the remaining spot is empty (the opponent wins by filling in the empty spot), and
- **-100 points** if the opponent has four discs in a row (the opponent won).

This is also represented in the image below.

<center>
<img src="https://i.imgur.com/UlUYhgr.png" width=50%><br/>
</center>
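
The scoring rules listed above can be sketched for a single window of four locations as follows. This is only an illustration: the function name `score_window` and the encoding (0 for empty, 1 for the agent, -1 for the opponent) are assumptions, and windows that match none of the four patterns are assumed to score 0.

```python
def score_window(window, agent=1, opponent=-1, empty=0):
    # Score one window of four locations using the point values from the list.
    if window.count(agent) == 4:
        return 100       # the agent won
    if window.count(opponent) == 4:
        return -100      # the opponent won
    if window.count(agent) == 3 and window.count(empty) == 1:
        return 1         # the agent is one disc away from winning
    if window.count(opponent) == 3 and window.count(empty) == 1:
        return -10       # the opponent is one disc away from winning
    return 0             # assumed: any other pattern is neutral

print(score_window([1, 1, 1, 0]))     # 1
print(score_window([-1, 0, -1, -1]))  # -10
print(score_window([1, -1, 1, 1]))    # 0
```

A score for the whole board would then be the sum of `score_window` over every horizontal, vertical, and diagonal window.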

So, how exactly will the agent use the heuristic?  Suppose it's the agent's turn, and it's trying to plan a move for the game board shown at the top of the figure below.  There are seven possible moves (one for each column).

<center>
<img src="https://i.imgur.com/lGfMNBK.png" width=100%><br/>
</center>

The heuristic assigns the first board (where the agent plays in column 0) a score of 20.  The second board is assigned a score of 10, and so on.  The first board receives the highest score, and so the agent will select this move.  It's also the best outcome for the agent, since it has a guaranteed win in just one more move.  

The heuristic works really well for this specific example, since it matches the best move with the highest score!  In general, if you're not sure how to design your heuristic (i.e., how to score different game states, or which scores to assign to different conditions), often the best thing to do is to simply take an initial guess at the heuristic and then play against your agent.  This will let you identify specific cases when your agent makes bad moves, which you can then fix by modifying the heuristic.

# Code

Our **one-step lookahead** agent will:
- use the heuristic to assign a score to each possible valid move, and
- select the move that gets the highest score.  (_If multiple moves tie for the highest score, we select one of them at random._)

"One-step lookahead" refers to the fact that the agent looks only one step (or move) into the future, instead of deeper into the game tree.  This is implemented in the code cell below.

In [None]:
#$HIDE_INPUT$
import random
import numpy as np

In [None]:
class OneStep_Player:
    def __init__(self, num_rows=6, num_cols=7, in_a_row=4):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.in_a_row = in_a_row
        
    def get_valid_moves(self, state):
        # A column is playable while its top cell (row 0) is still empty.
        is_valid_by_idx = [state[0][i]==0 for i in range(self.num_cols)]
        return np.where(is_valid_by_idx)[0]
    
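    def drop_piece(self, state, col):
        # Return a copy of the board with the agent's disc (1) dropped into `col`.
        next_state = state.copy()
        # Scan from the bottom row upward to find the lowest empty cell.
        for row in range(self.num_rows-1, -1, -1):
            if next_state[row][col] == 0:
                break
        next_state[row][col] = 1
        return next_state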
    
    def act(self, state):
        state = state.reshape(self.num_rows, self.num_cols)
        valid_moves = self.get_valid_moves(state)
        scores = dict(zip(valid_moves, [self.score_valid_move(state, col) for col in valid_moves]))
        max_val = max(scores.values())
        max_keys = [key for key in scores.keys() if scores[key] == max_val]
        col = random.choice(max_keys)
        return col
    
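    def score_valid_move(self, state, col):
        # Simulate the move, then score the resulting board with the heuristic.
        next_state = self.drop_piece(state, col)
        num_threes = self.count_windows(next_state, 3, 1)
        num_fours = self.count_windows(next_state, 4, 1)
        num_threes_opp = self.count_windows(next_state, 3, -1)
        # Reward the agent's threes and fours; penalize threes left open to the opponent (-1).
        score = num_threes - 1e3*num_threes_opp + 1e6*num_fours
        return score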
    
    def check_window(self, window, num_discs, piece):
        # True if the window holds exactly `num_discs` of `piece` and is otherwise empty.
        return (window.count(piece) == num_discs and window.count(0) == self.in_a_row-num_discs)
    
    def count_windows(self, state, num_discs=3, piece=1):
        num_windows = 0
        # horizontal
        for row in range(self.num_rows):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[row,col:col+self.in_a_row])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # vertical
        for row in range(self.num_rows-(self.in_a_row-1)):
            for col in range(self.num_cols):
                window = list(state[row:row+self.in_a_row,col])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # positive diagonal
        for row in range(self.num_rows-(self.in_a_row-1)):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[range(row, row+self.in_a_row), range(col, col+self.in_a_row)])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # negative diagonal
        for row in range(self.in_a_row-1, self.num_rows):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[range(row, row-self.in_a_row, -1), range(col, col+self.in_a_row)])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        return num_windows

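Remember that the `act()` method is the most important one.  It gets the set of valid moves, scores each of them, and then selects at random from the moves that maximize the score.  Note that `score_valid_move()` uses its own weights (1 per agent three, 1e6 per agent four, and -1e3 per opponent three), which differ from the illustrative point values listed earlier.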
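
The selection step in `act()` can be isolated in a small sketch to make the tie-breaking behavior easier to see. The function name `pick_best_move` and the scores in the example are hypothetical; they are not taken from the agent above.

```python
import random

def pick_best_move(scores):
    # `scores` maps each valid column to its heuristic score.
    max_val = max(scores.values())
    # Collect every column that ties for the highest score.
    best_cols = [col for col, val in scores.items() if val == max_val]
    # Break ties uniformly at random, as act() does.
    return random.choice(best_cols), best_cols

# Hypothetical scores for four valid columns; columns 0 and 3 tie.
scores = {0: 2, 1: 1, 3: 2, 6: -1000}
col, best_cols = pick_best_move(scores)
print(best_cols)  # [0, 3]
```

Because ties are broken at random, repeated calls on the same board may return different columns.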

Next, play against the agent to see how it performs.

In [None]:
# code here

# Your turn

Continue to **[...link...](#$NEXT_NOTEBOOK_URL$)** ...