# Introduction

In the previous tutorial, you learned how to build an agent with one-step lookahead.  This agent performs reasonably well, but definitely still has room for improvement!  For instance, consider the potential moves in the figure below.

<center>
<img src="https://i.imgur.com/Zi77Qf5.png" width=80%><br/>
</center>

With one-step lookahead, the red player picks column 4 or column 5, each with 50% probability.  Column 5 is the better move, however, because it puts the player in a position to win the game with certainty in only one more turn.  (_So, ideally, the agent would select only column 5 from this board._)  Unfortunately, the agent doesn't know this, because it can only look one move into the future.
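
The tie-breaking behavior described above can be sketched in a few lines. This is only an illustration: the `scores` dictionary is hypothetical (the values are made up so that columns 4 and 5 tie for the best score), and the actual one-step lookahead agent from the previous tutorial may be organized differently.

```python
import random

# Hypothetical heuristic scores for each column; columns 4 and 5 tie for the top score.
scores = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0, 7: 0}

# Keep every column that reaches the best score, then pick one uniformly at random.
best_score = max(scores.values())
candidates = [col for col, score in scores.items() if score == best_score]
choice = random.choice(candidates)

print(candidates)  # [4, 5]
print(choice)      # either 4 or 5, each with probability 1/2
```

Because the heuristic only sees one move ahead, nothing in the scores separates the two columns, so the agent has no way to prefer column 5.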

In this tutorial, you'll use the **minimax algorithm** to help the agent look farther into the future and make better-informed decisions.

# N-step lookahead

We'd like to leverage information from deeper in the game tree.  For now, assume we work with a depth of 3.  This way, when deciding its move, the agent considers all possible game boards that can result from  
1. the agent's move, 
2. the opponent's move, and 
3. the agent's next move.  
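
To get a feel for how quickly the number of boards grows, here is a rough count of the move sequences at depth 3. It assumes a board with 7 columns (an assumption, not stated above), that no column fills up, and that the game does not end early, so it gives an upper bound rather than an exact count.

```python
from itertools import product

N_COLUMNS = 7  # assumption: the board has 7 columns
DEPTH = 3      # agent's move, opponent's move, agent's next move

# Each sequence is a tuple (agent move, opponent move, agent's next move).
sequences = list(product(range(N_COLUMNS), repeat=DEPTH))

print(len(sequences))  # 343, i.e. 7 ** 3
print(sequences[0])    # (0, 0, 0)
```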

This agent will be more complex than the one-step lookahead agent.  To see this, we'll work with an example.  For simplicity, we assume that at each turn, both the agent and opponent have only two possible moves.

<center>
<img src="https://i.imgur.com/Vgf1OwI.png" width=80%><br/>
</center>

As before, the current game state is at the top of the figure, and we've recorded the number of points assigned to each board at the bottom of the tree.  The agent's goal is to end up with a score that's as high as possible. 

But notice that the agent no longer has complete control over its score -- after the agent makes its move, the opponent selects its own move.  And, the opponent's selection can prove disastrous for the agent!  In particular, the opponent can ensure the agent never receives a score of +10.  
- If the agent chooses the left branch, the opponent can force a score of -10.  
- If the agent chooses the right branch, the opponent can force a score of 0.  

This is depicted in the figure below (_where the agent's selection and the opponent's response are marked as [1] and [2], respectively_).

<center>
<img src="https://i.imgur.com/x6AGOQf.png" width=80%><br/>
</center>
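
The opponent's ability to force an outcome in each branch can be checked with a short calculation. The numbers in `outcomes` are hypothetical: they are chosen only to agree with the text (a best case of +10, and forced scores of -10 and 0), and are not read from the figure.

```python
# Hypothetical scores the agent ends up with for each opponent reply,
# grouped by the agent's first move.
outcomes = {
    "left": [10, -10],
    "right": [0, 5],
}

# The opponent picks the reply that is worst for the agent in each branch.
worst_case = {move: min(scores) for move, scores in outcomes.items()}

# The agent then prefers the branch with the best worst case.
best_move = max(worst_case, key=worst_case.get)

print(worst_case)  # {'left': -10, 'right': 0}
print(best_move)   # right
```

Even though the left branch contains the +10 board, the agent cannot count on reaching it, so the right branch is the safer choice.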

Since the agent's goal (to win the game) is at odds with the opponent's goal, it's not obvious how the agent should come up with a long-term strategy that the opponent can't ruin.  Thankfully, an algorithm exists for this, called the **minimax algorithm**.

# Minimax

For now, we assume the opponent uses the same heuristic as the agent, and the opponent selects moves to *minimize* the score.  This is a reasonable assumption, since the heuristic is designed to attach higher scores to boards where the agent is more likely to win the game. 

If this assumption is true, then the agent's best strategy for selecting moves is given by the minimax algorithm.  The pseudocode can be [found on Wikipedia](https://en.wikipedia.org/wiki/Minimax) and is shown below.

```
function minimax(node, depth, maximizingPlayer) is
    if depth = 0 or node is a terminal node then
        return the heuristic value of node
    if maximizingPlayer then
        value := −∞
        for each child of node do
            value := max(value, minimax(child, depth − 1, FALSE))
        return value
    else (* minimizing player *)
        value := +∞
        for each child of node do
            value := min(value, minimax(child, depth − 1, TRUE))
        return value
```
```
(* Initial call *)
minimax(origin, depth, TRUE)
```
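
As a minimal sketch, the pseudocode translates to Python roughly as follows. The game tree here is a toy: it is written as nested lists whose leaves are numbers standing in for heuristic values, and the leaf values are hypothetical. A real agent would generate child boards from the game state and call its heuristic instead.

```python
import math

def minimax(node, depth, maximizing_player):
    """Return the minimax value of `node` in a toy tree of nested lists."""
    if not isinstance(node, list):
        return node  # leaf: the number itself plays the role of the heuristic value
    if depth == 0:
        # The toy tree stores scores only at its leaves, so it cannot be cut off early.
        raise ValueError("depth is smaller than the height of the toy tree")
    if maximizing_player:
        value = -math.inf
        for child in node:
            value = max(value, minimax(child, depth - 1, False))
        return value
    # Minimizing player
    value = math.inf
    for child in node:
        value = min(value, minimax(child, depth - 1, True))
    return value

# Hypothetical depth-3 tree with two moves per turn:
# agent's move -> opponent's move -> agent's next move -> score.
tree = [
    [[10, 3], [-10, -20]],  # agent chooses the left branch
    [[0, -5], [5, 2]],      # agent chooses the right branch
]

# Value of each first move, assuming the opponent replies to minimize the score.
branch_values = [minimax(child, 2, False) for child in tree]
best_value = minimax(tree, 3, True)  # initial call

print(branch_values)  # [-10, 0]
print(best_value)     # 0
```

With these values, the left branch is worth -10 and the right branch is worth 0, so the agent's best guaranteed score is 0.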

# Code

```python
#$HIDE_INPUT$
import random
import numpy as np
```

then they play against the agent

# Your turn

Continue to **[...link...](#$NEXT_NOTEBOOK_URL$)** ...