# Introduction

You have seen how to define a random agent.  In this exercise, you'll make a few improvements.

To get started, run the code cell below to set up our feedback system.

In [None]:
from learntools.core import binder
binder.bind(globals())
from learntools.rl.ex1 import *

### 1) A smarter agent

We can improve the agent's performance without devising a complicated strategy: simply select a winning move whenever one is available.

Define an agent that:
- Selects a winning move, if one is available.  (_If more than one move lets the agent win the game, the agent can select any of them._)
- Otherwise, selects a random move.

To help you with this exercise, we have provided an agent with a very useful helper method:
- The `check_winning_move` method takes two required arguments: `state` (the current game state) and `col` (the column of any valid move).  It returns `True` if dropping the agent's piece in that column wins the game, and `False` otherwise.  (There's a third, optional argument that we'll discuss in the next question.)
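
To make the idea concrete, here is a minimal standalone sketch of what checking a winning move involves: drop a piece into the lowest empty cell of a column, then look for `in_a_row` matching pieces in any direction. It does not use the class below; `drop_piece`, `has_four`, and the module-level constants are illustrative stand-ins. It assumes the same board conventions as the provided code: a 6-by-7 NumPy array with row 0 at the top, `0` for empty cells, `1` for the agent, and `-1` for the opponent.

```python
import numpy as np

NUM_ROWS, NUM_COLS, IN_A_ROW = 6, 7, 4

def drop_piece(state, col, piece):
    """Return a copy of `state` with `piece` in the lowest empty row of `col`.

    Row 0 is the top of the board. Assumes the column is not full.
    """
    next_state = state.copy()
    for row in range(NUM_ROWS - 1, -1, -1):
        if next_state[row, col] == 0:
            next_state[row, col] = piece
            break
    return next_state

def has_four(state, piece):
    """True if `piece` fills IN_A_ROW consecutive cells in some direction."""
    # (row step, col step): horizontal, vertical, and the two diagonals
    directions = [(0, 1), (1, 0), (1, 1), (-1, 1)]
    for row in range(NUM_ROWS):
        for col in range(NUM_COLS):
            for dr, dc in directions:
                cells = [(row + k * dr, col + k * dc) for k in range(IN_A_ROW)]
                # Bounds are checked before indexing, so no wrap-around occurs
                if all(0 <= r < NUM_ROWS and 0 <= c < NUM_COLS and state[r, c] == piece
                       for r, c in cells):
                    return True
    return False

def check_winning_move(state, col, agent=True):
    """Sketch: does dropping the player's piece in `col` win immediately?"""
    piece = 1 if agent else -1
    return has_four(drop_piece(state, col, piece), piece)

board = np.zeros((NUM_ROWS, NUM_COLS), dtype=int)
board[5, 0:3] = 1  # the agent has three pieces along the bottom row
print(check_winning_move(board, 3))  # True: completes four in the bottom row
print(check_winning_move(board, 4))  # False: leaves a gap at column 3
```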

**To complete this exercise, finish the `act()` method.  To do this, you'll need to use the `check_winning_move` method.**

The other methods support `check_winning_move`: it calls `drop_piece` and `count_windows`, and `count_windows` in turn calls `check_window` on every horizontal, vertical, and diagonal window of the board.  Feel free to examine the details below, but you won't need to call these methods yourself to solve the exercise.
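
For intuition about the window helpers: a window is a list of `in_a_row` consecutive cells, and `check_window` accepts it only when exactly `num_discs` cells hold `piece` and every other cell is empty. With `num_discs` equal to `in_a_row`, that means a complete line, which is how a win is detected. The sketch below copies that test as a standalone function and applies it to a single row; `count_row_windows` is a hypothetical, horizontal-only simplification of `count_windows`.

```python
IN_A_ROW = 4

def check_window(window, num_discs, piece):
    # Exactly `num_discs` copies of `piece`, and every other cell empty (0)
    return window.count(piece) == num_discs and window.count(0) == IN_A_ROW - num_discs

def count_row_windows(row, num_discs, piece):
    """Count the length-IN_A_ROW windows of a single row that pass check_window."""
    return sum(
        check_window(list(row[start:start + IN_A_ROW]), num_discs, piece)
        for start in range(len(row) - IN_A_ROW + 1)
    )

print(check_window([1, 1, 1, 0], 3, 1))   # True: three pieces plus one empty cell
print(check_window([1, 1, 1, -1], 3, 1))  # False: the fourth cell is blocked
print(check_window([1, 1, 1, 1], 4, 1))   # True: a complete line of four
print(count_row_windows([1, 1, 1, 0, 0, 0, -1], 3, 1))  # 1
```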

In [None]:
import random
import numpy as np

# Random agent that selects a winning move, if one is available
class BetterRandom_Player:
    def __init__(self, num_rows=6, num_cols=7, in_a_row=4):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.in_a_row = in_a_row
        
    def get_valid_moves(self, state):
        is_valid_by_idx = [state[0][i]==0 for i in range(self.num_cols)]
        return np.where(is_valid_by_idx)[0]
    
    def drop_piece(self, state, col, piece=1):
        # Return a copy of the board with `piece` placed in the lowest empty row of `col`
        # (row 0 is the top of the board; assumes `col` is a valid move)
        next_state = state.copy()
        for row in range(self.num_rows-1, -1, -1):
            if next_state[row][col] == 0:
                break
        next_state[row][col] = piece
        return next_state
    
    def act(self, state):
        state = state.reshape(self.num_rows, self.num_cols)
        valid_moves = self.get_valid_moves(state)
        # Currently, the agent selects a random move.  Change this!
        col = random.choice(valid_moves)
        return col
    
    def check_winning_move(self, state, col, agent=True):
        # The agent plays 1s and the opponent plays -1s; drop the matching piece
        piece = 1 if agent else -1
        next_state = self.drop_piece(state, col, piece)
        # The move wins if it completes at least one line of `in_a_row` matching pieces
        num_fours = self.count_windows(next_state, self.in_a_row, piece)
        is_winning_move = (num_fours > 0)
        return is_winning_move
    
    def check_window(self, window, num_discs, piece):
        return (window.count(piece) == num_discs and window.count(0) == self.in_a_row-num_discs)
    
    def count_windows(self, state, num_discs=4, piece=1):
        num_windows = 0
        # horizontal
        for row in range(self.num_rows):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[row,col:col+self.in_a_row])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # vertical
        for row in range(self.num_rows-(self.in_a_row-1)):
            for col in range(self.num_cols):
                window = list(state[row:row+self.in_a_row,col])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # positive diagonal
        for row in range(self.num_rows-(self.in_a_row-1)):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[range(row, row+self.in_a_row), range(col, col+self.in_a_row)])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        # negative diagonal
        for row in range(self.in_a_row-1, self.num_rows):
            for col in range(self.num_cols-(self.in_a_row-1)):
                window = list(state[range(row, row-self.in_a_row, -1), range(col, col+self.in_a_row)])
                if self.check_window(window, num_discs, piece):
                    num_windows += 1
        return num_windows

# Check your answer
q_1.check()

In [None]:
#%%RM_IF(PROD)%%

q_1.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_1.hint()
#_COMMENT_IF(PROD)_
q_1.solution()

### 2) An even smarter agent

In the previous question, you created an agent that selects winning moves.  In this question, you'll amend the code to create an agent that can also block its opponent from winning.  In particular, your agent should:
- Select a winning move, if one is available.
- Otherwise, select a move that blocks the opponent from winning, if the opponent has a winning move.
- If neither the agent nor the opponent can win on the next move, select a random move.

We encourage you to start from the agent you wrote in the previous question.

**To check whether the opponent has a winning move, use the `check_winning_move` method with `agent` (the third argument) set to `False`.**  The default value of `agent` is `True`, in which case the method checks whether the agent has a winning move.
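
The rules above form a simple priority order: win, else block, else play randomly. Blocking works by playing in the opponent's winning column, because your piece lands in the same cell the opponent's piece would have occupied. As a rough sketch, assuming a callback `is_winning_move(col, agent)` (a hypothetical name, not part of the provided code) that answers the same question as `check_winning_move(state, col, agent)` for the current state, the decision could look like this:

```python
import random

def choose_move(valid_moves, is_winning_move, rng=random):
    """Pick a column: win if possible, else block, else play randomly.

    `is_winning_move(col, agent)` is a hypothetical callback that answers the
    same question as `check_winning_move(state, col, agent)` for a fixed state.
    """
    # 1) Take an immediate win for the agent, if there is one
    for col in valid_moves:
        if is_winning_move(col, True):
            return col
    # 2) Otherwise, occupy a column where the opponent would win next turn
    for col in valid_moves:
        if is_winning_move(col, False):
            return col
    # 3) Otherwise, fall back to a uniformly random valid move
    return rng.choice(list(valid_moves))

# Stub callback: the agent could win in column 5, the opponent in column 2
winning_moves = {(5, True), (2, False)}

def stub(col, agent):
    return (col, agent) in winning_moves

print(choose_move([0, 2, 5], stub))  # 5: winning takes priority over blocking
print(choose_move([0, 2], stub))     # 2: block the opponent
```

Inside the class's `act()` method, the callback might be written as `lambda col, agent: self.check_winning_move(state, col, agent)`.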

In [None]:
# Your code here: ...

# Check your answer
q_2.check()

In [None]:
#%%RM_IF(PROD)%%

q_2.assert_check_passed()

In [None]:
# Lines below will give you a hint or solution code
#_COMMENT_IF(PROD)_
q_2.hint()
#_COMMENT_IF(PROD)_
q_2.solution()

### 3) Looking ahead

So far, you have built an agent that always selects a winning move when one is available, and that can also block the opponent's immediate winning move.

Will this produce an agent that always either wins or draws the game?  If not, where (specifically) does the agent have room for improvement?

In [None]:
# play the agent

In [None]:
#_COMMENT_IF(PROD)_
q_3.hint()

In [None]:
#_COMMENT_IF(PROD)_
q_3.solution()

### 4) Submit to the competition!

...

# Keep going

Learn how to **[use heuristics](#$NEXT_NOTEBOOK_URL$)** to improve your agent.