# Introduction

...

# Game trees

A game tree mirrors how a human generally thinks about how to play: "I have three possible moves available to me. If I make the first move, then my opponent is most likely to do this, and then I'd make this move in response, and then she's likely to do that ... and in the end, if my predictions are correct, then I'm most likely to win."

As humans, we can only follow this reasoning for a few alternatives. A game tree, by contrast, represents all possible moves (by both the player and the opponent). If the tree begins with the starting board, we refer to it as "complete".

To build a complete understanding of all possible moves in any game, we can build what's known as a **game tree**. Part of the game tree for **[tic-tac-toe](https://en.wikipedia.org/wiki/Tic-tac-toe)** is shown below. Only a portion of the tree is drawn, but the full tree represents every potential move, along with whether we won or lost each game.
<center>
<img src="https://i.imgur.com/EZKHxyy.png"><br/>
</center>

It begins with the starting board at the base of the tree (in this case, at the top of the figure above).  
- Then, we consider each possible move that the first player can make and record the resulting boards (in the game of tic-tac-toe, the player can choose to move in a corner, on a side, or in the center).  
- Next, for each of the boards, we consider every possible move that the second player can select in response, and record those resulting boards.  
- Then, we continue alternating between the players and recording all of the available moves and the resulting boards, until each "branch" reaches the end of the game.

This way, if we also record whether we won or lost in each of the branches, we can use the game tree to look into the future, all the way to the end of each game, as if into each parallel theoretical universe. This can help us select the move that is most likely to let us win.

# Heuristics

The trouble is that tic-tac-toe is a very small game in comparison to Connect Four, and we still can't draw its full game tree. For Connect Four, the full tree won't fit in memory, and building it would be incredibly inefficient. The game tree is still the main idea, but we'll have to make it more efficient with some approximations. In general, we only have enough space for a very shallow game tree, so we won't be able to know for sure whether a move is more likely to win or lose.
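
To get a feel for the scale, here is a back-of-the-envelope calculation. It computes loose upper bounds on the number of distinct games (branches), not exact tree sizes: real games end early once someone wins, and Connect Four columns fill up, so both figures overcount. The variable names are chosen for illustration only.

```python
import math

# Tic-tac-toe: 9 choices for the first move, 8 for the second, and so on,
# so there are at most 9! move orderings.
tic_tac_toe_bound = math.factorial(9)

# Connect Four on a 6x7 board: at most 7 columns to choose from on each of
# at most 42 turns, so there are at most 7**42 move sequences.
num_rows, num_cols = 6, 7
connect_four_bound = num_cols ** (num_rows * num_cols)

print(tic_tac_toe_bound)                 # 362880
print(f"{float(connect_four_bound):.1e}")  # on the order of 10**35
```

The gap between the two bounds is what makes a full game tree for Connect Four impractical.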

We will use a **heuristic** (or **heuristic function**) to determine the best move from each possible game state.  The function will take the game state as input, and give each potential move a score.  (Then, we'll select the move with the highest score.)  For instance, one heuristic that might work reasonably well for Connect Four is shown below.

We assign:
- 100 points for every location where we get four in a row,
- 1 point for every group of four where we have ...
- -10 points for every ...

Consider the game state shown at the top of the figure below, where our agent is the red player. To determine how to move, we consider all seven possible moves that we can make.

(TODO: flip the colors in the game board figure.)

<center>
<img src="https://i.imgur.com/jW0s2ke.png"><br/>
</center>

Then, we look at the resulting boards and score each of them. The board with the highest score corresponds to the move we will make.
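
The selection step can be sketched on its own, assuming the heuristic has already produced one score per candidate column. The function name `pick_best_move` and the scores in `example_scores` are hypothetical values made up for this illustration; ties are broken at random.

```python
import random
from typing import Dict

def pick_best_move(scores: Dict[int, float], rng: random.Random) -> int:
    """Return a column with the highest score, choosing at random among ties."""
    best = max(scores.values())
    best_moves = [move for move, score in scores.items() if score == best]
    return rng.choice(best_moves)

# Hypothetical heuristic scores for the seven columns (not taken from the figure).
example_scores = {0: 2, 1: -10, 2: 5, 3: 5, 4: 0, 5: 1, 6: -3}
move = pick_best_move(example_scores, random.Random(0))
print(move)  # 2 or 3, since both columns share the top score
```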

If we choose a good heuristic, this approach yields an agent that is reasonably intelligent, and it takes significantly less computation than building the entire game tree! The trouble is defining a good heuristic.

# Code

An agent that selects moves as we've described, looking only at its immediate move and the resulting board to decide how to move, is called a **one-step lookahead** agent. We build the agent below.

In [None]:
#$HIDE_INPUT$
import random
import numpy as np

In [None]:
class OneStep_Player:
    """One-step lookahead agent: our discs are 1, the opponent's are -1, empty cells are 0."""
    def __init__(self, num_rows=6, num_cols=7, in_a_row=4):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.in_a_row = in_a_row
        
    def get_valid_moves(self, state):
        # A column is playable while its top cell (row 0) is still empty
        is_valid_by_idx = [state[0][i]==0 for i in range(self.num_cols)]
        return np.where(is_valid_by_idx)[0]
    
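    def drop_piece(self, state, col):
        # Work on a copy so the current board is left untouched
        next_state = state.copy()
        # Scan from the bottom row upward and stop at the first empty cell
        for row in range(self.num_rows-1, -1, -1):
            if next_state[row][col] == 0:
                break
        next_state[row][col] = 1
        return next_state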
    
    def act(self, state):
        state = state.reshape(self.num_rows, self.num_cols)
        valid_moves = self.get_valid_moves(state)
        # Score the board that results from each valid move
        scores = dict(zip(valid_moves, [self.score_valid_move(state, col) for col in valid_moves]))
        max_val = max(scores.values())
        # Break ties at random among the highest-scoring moves
        max_keys = [key for key in scores.keys() if scores[key] == max_val]
        col = random.choice(max_keys)
        # Cast the NumPy integer to a plain Python int
        return int(col)
    
    def score_valid_move(self, state, col):
        next_state = self.drop_piece(state, col)
        num_threes = self.count_windows(next_state, 3, 1)
        num_fours = self.count_windows(next_state, 4, 1)
        num_threes_opp = self.count_windows(next_state, 3, -1)
        # Winning dominates, then avoiding the opponent's threes, then making our own threes
        score = num_threes - 1e3*num_threes_opp + 1e6*num_fours
        return score
    
    def check_window(self, window, num_discs, piece):
        # True if the window holds exactly num_discs of piece and is otherwise empty
        return (window.count(piece) == num_discs and window.count(0) == self.in_a_row-num_discs)
    
    def count_windows(self, state, num_discs=3, piece=1):
        num_windows = 0
        # horizontal
        for row in range(self.num_rows):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[row,col:col+self.in_a_row])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # vertical
        for row in range(self.num_rows-(self.in_a_row-1)):
            for col in range(self.num_cols):
                window = list(state[row:row+self.in_a_row,col])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # positive diagonal
        for row in range(self.num_rows-(self.in_a_row-1)):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[range(row, row+self.in_a_row), range(col, col+self.in_a_row)])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # negative diagonal
        for row in range(self.in_a_row-1, self.num_rows):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[range(row, row-self.in_a_row, -1), range(col, col+self.in_a_row)])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        return num_windows

Remember that the `act()` method is the most important one. It gets the set of valid moves, scores each of them, and then selects at random from the moves that maximize the score.
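
The scores that `act()` compares all come down to the window test in `check_window()`. The snippet below is a standalone copy of that test, written as a plain function purely for illustration (the `in_a_row` default of 4 is an assumption matching the class defaults), so it can be tried without building a board.

```python
from typing import List

def check_window(window: List[int], num_discs: int, piece: int, in_a_row: int = 4) -> bool:
    """True if `window` holds exactly `num_discs` of `piece` and is otherwise empty."""
    return window.count(piece) == num_discs and window.count(0) == in_a_row - num_discs

print(check_window([1, 1, 1, 0], num_discs=3, piece=1))     # True: an open three
print(check_window([1, 1, 1, -1], num_discs=3, piece=1))    # False: blocked by the opponent
print(check_window([1, 1, 1, 1], num_discs=4, piece=1))     # True: four in a row
print(check_window([-1, 0, -1, -1], num_discs=3, piece=-1)) # True: the opponent's open three
```

Requiring the remaining cells to be empty means a three that the opponent has already blocked earns no points.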

(TODO: have the reader play against the agent.)

# Your turn

Continue to **[...link...](#$NEXT_NOTEBOOK_URL$)** ...