# Introduction

In the tutorial, you learned how to define a simple heuristic that the agent used to select moves.  In this exercise, you'll check your understanding and make the heuristic more complex.

To get started, run the code cell below to set up our feedback system.

In [None]:
from learntools.core import binder
binder.bind(globals())
from learntools.rl.ex2 import *

Run the code cell below to import some of the functions from the tutorial.  These will be useful for completing the exercise!

In [None]:
import random
import numpy as np

# Helper function for score_move: gets board at next step if agent drops piece in selected column
def drop_piece(grid, col, mark, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            break
    next_grid[row][col] = mark
    return next_grid

# Helper function for get_heuristic: checks if window satisfies heuristic conditions
def check_window(window, num_discs, piece, config):
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)
    
# Helper function for get_heuristic: counts number of windows satisfying specified heuristic conditions
def count_windows(grid, num_discs, piece, config):
    num_windows = 0
    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    return num_windows

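To make the window logic concrete, here is a minimal sketch that applies `check_window` to a hand-built grid. It is an illustration rather than part of the tutorial code: the `SimpleNamespace` config is an assumed stand-in for the environment configuration (a standard 6x7 board with four in a row), and `count_horizontal` is a hypothetical helper that reproduces only the horizontal scan of `count_windows`.

```python
from types import SimpleNamespace
import numpy as np

# Assumed stand-in for the environment config: 6 rows, 7 columns, four in a row
config = SimpleNamespace(rows=6, columns=7, inarow=4)

def check_window(window, num_discs, piece, config):
    # Same test as the helper above: exactly num_discs of `piece`, all other cells empty
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)

def count_horizontal(grid, num_discs, piece, config):
    # Hypothetical helper: the horizontal part of count_windows only
    num_windows = 0
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
    return num_windows

grid = np.zeros((config.rows, config.columns), dtype=int)
grid[5, 1:4] = 1  # player 1 holds columns 1-3 of the bottom row

print(count_horizontal(grid, 3, 1, config))  # 2: windows starting at columns 0 and 1
print(count_horizontal(grid, 2, 1, config))  # 1: window starting at column 2
```

Note that a single run of three discs is counted twice, because two overlapping windows each contain three discs and one empty cell. A window that also contains an opponent disc is never counted, since `check_window` requires the remaining cells to be empty.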
### 1) A more complex heuristic

The heuristic from the tutorial looks at all groups of four adjacent grid locations on the same row, column, or diagonal and assigns points for each occurrence of the following patterns:

<center>
<img src="https://i.imgur.com/vQ8b1aX.png" width=60%><br/>
</center>

In the image above, we assume that the agent is the red player, and the opponent plays yellow discs.  For reference, here is the `get_heuristic()` function from the tutorial:
```python
def get_heuristic(grid, mark, config):
    num_threes = count_windows(grid, 3, mark, config)
    num_fours = count_windows(grid, 4, mark, config)
    num_threes_opp = count_windows(grid, 3, mark%2+1, config)
    num_fours_opp = count_windows(grid, 4, mark%2+1, config)
    score = num_threes - 1e2*num_threes_opp - 1e4*num_fours_opp + 1e6*num_fours
    return score
```

In the `get_heuristic()` function, `num_fours`, `num_threes`, `num_threes_opp`, and `num_fours_opp` count the windows in the game grid that are each worth 1000000, 1, -100, and -10000 points, respectively.
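
As a quick check of the arithmetic, the sketch below applies the same weights to hand-picked window counts. The function name `tutorial_score` and the example counts are made up for illustration; only the weights come from `get_heuristic()`.

```python
def tutorial_score(num_threes, num_fours, num_threes_opp, num_fours_opp):
    # Same weights as get_heuristic() above, applied to counts supplied by hand
    return num_threes - 1e2*num_threes_opp - 1e4*num_fours_opp + 1e6*num_fours

# Two of our own threes against one opponent three: 2 - 100
print(tutorial_score(2, 0, 1, 0))  # -98.0

# One four of our own against three opponent threes: 1000000 - 300
print(tutorial_score(0, 1, 3, 0))  # 999700.0
```

The weights are spaced far enough apart that, for realistic window counts, a higher-priority pattern outweighs the lower-priority ones: a winning four dominates everything else, and an opponent's three costs far more than one of the agent's own threes gains.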
    
In this exercise, you'll change the heuristic to the following:

<center>
<img src="https://i.imgur.com/bIS2qbw.png" width=80%><br/>
</center>
    
Define the new heuristic in the `get_heuristic_q1()` function.

In [None]:
def get_heuristic_q1(grid, col, mark, config):
    num_twos = count_windows(grid, 2, mark, config)
    num_threes = count_windows(grid, 3, mark, config)
    num_fours = count_windows(grid, 4, mark, config)
    num_twos_opp = count_windows(grid, 2, mark%2+1, config)
    num_threes_opp = count_windows(grid, 3, mark%2+1, config)
    # Your code here: Calculate the score
    score = ____
    return score

# Check your answer
q_1.check()

In [None]:
#%%RM_IF(PROD)%%
def score_move_q1(grid, col, mark, config):
    next_grid = drop_piece(grid, col, mark, config)
    num_twos = count_windows(next_grid, 2, mark, config)
    num_threes = count_windows(next_grid, 3, mark, config)
    num_fours = count_windows(next_grid, 4, mark, config)
    num_twos_opp = count_windows(next_grid, 2, mark%2+1, config)
    num_threes_opp = count_windows(next_grid, 3, mark%2+1, config)
    score = num_threes - 1e3*num_threes_opp + 1e6*num_fours
    return score

q_1.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_1.hint()
#_COMMENT_IF(PROD)_
q_1.solution()

Run the code cell below to create and play against the agent.

In [None]:
# Calculates score if agent drops piece in selected column: do not change this function!
def score_move_q1(grid, col, mark, config):
    next_grid = drop_piece(grid, col, mark, config)
    # Pass col so the call matches the get_heuristic_q1(grid, col, mark, config) signature
    score = get_heuristic_q1(next_grid, col, mark, config)
    return score

# Improved agent: do not change this function!
def agent_q1(obs, config):
    # Get list of valid moves
    valid_moves = [c for c in range(config.columns) if obs.board[c] == 0]
    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    # Use the heuristic to assign a score to each possible board in the next step
    scores = dict(zip(valid_moves, [score_move_q1(grid, col, obs.mark, config) for col in valid_moves]))
    # Get a list of columns (moves) that maximize the heuristic
    max_cols = [key for key in scores.keys() if scores[key] == max(scores.values())]
    # Select at random from the maximizing columns
    return random.choice(max_cols)

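The two list comprehensions in `agent_q1` can be tried out in isolation. The sketch below is illustrative only: the board contents and the scores are invented, and it assumes the flat board is stored row by row with the top row first, which is what the `obs.board[c] == 0` check and the `reshape` call imply.

```python
import random

rows, columns = 6, 7

# Invented flat board: every cell empty except column 3, which is full
board = [0] * (rows * columns)
for row in range(rows):
    board[row * columns + 3] = 1 + row % 2

# A column is playable if its cell in the top row (the first `columns` entries) is empty
valid_moves = [c for c in range(columns) if board[c] == 0]
print(valid_moves)  # [0, 1, 2, 4, 5, 6]

# Invented heuristic scores for each valid move
scores = {0: 1.0, 1: 5.0, 2: 5.0, 4: -98.0, 5: 0.0, 6: 5.0}

# Keep every column tied for the best score, then break the tie at random
best_score = max(scores.values())
max_cols = [col for col, score in scores.items() if score == best_score]
print(max_cols)  # [1, 2, 6]

choice = random.choice(max_cols)
```

Breaking ties at random keeps the agent from always preferring the leftmost column when several moves look equally good to the heuristic.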
### 2) Does the agent win?

Does the heuristic from the previous question seem like a good option?  Consider the game board below.  

<center>
<img src="https://i.imgur.com/AlnaQ3J.png" width=30%><br/>
</center>

Say the agent uses red discs, and it's the agent's turn.  
- If the agent uses the heuristic **_that you just implemented_**, does it win or lose the game?
- If the agent uses the heuristic **_from the tutorial_**, does it win or lose the game?

In [None]:
#_COMMENT_IF(PROD)_
q_2.hint()

In [None]:
#_COMMENT_IF(PROD)_
q_2.solution()

### 3) Submit to the competition

Now, it's time to submit to the competition!

PUT INSTRUCTIONS HERE

In [None]:
# Run this code cell after you have successfully made a submission!
q_3.check()

# Keep going

Move on to **[develop a longer-term strategy](#$NEXT_NOTEBOOK_URL$)** with the minimax algorithm.