# Connect4 Game using Reinforcement Learning
### AI Project

# Connect4Game Class:

*   init: Initializes the game environment with a 6x7 grid board and sets the current player to 1.
*   render: Renders the current state of the game board.
*   reset: Resets the game board and sets the current player back to 1.
*   get_available_moves: Returns a list of available columns where a player can drop a token.
*   check_game_done: Checks whether the game is over after the given player's move, detecting horizontal, vertical, diagonal, or anti-diagonal wins, as well as draws (a full board).
*   make_move: Drops the current player's token into the chosen column, updates the game board, and returns the observation (a copy of the current board state) and the reward based on the outcome of the move.
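
The win detection described for check_game_done can be illustrated on its own. The following is a minimal sketch, assuming a hypothetical standalone function `has_four_in_a_row` (not part of the original class) that scans length-4 slices and 4x4 windows of a 6x7 NumPy board; unlike check_game_done, it does not treat a full board as game over.

```python
import numpy as np

def has_four_in_a_row(board: np.ndarray, player: int) -> bool:
    """Return True if `player` has four connected tokens on the board."""
    rows, cols = board.shape
    # Horizontal: every length-4 slice within a row.
    for r in range(rows):
        for c in range(cols - 3):
            if np.all(board[r, c:c + 4] == player):
                return True
    # Vertical: every length-4 slice within a column.
    for r in range(rows - 3):
        for c in range(cols):
            if np.all(board[r:r + 4, c] == player):
                return True
    # Diagonals: both diagonals of every 4x4 window.
    for r in range(rows - 3):
        for c in range(cols - 3):
            window = board[r:r + 4, c:c + 4]
            if np.all(window.diagonal() == player):
                return True
            if np.all(np.fliplr(window).diagonal() == player):
                return True
    return False

# Example: player 1 holds an anti-diagonal rising from the bottom-left corner.
board = np.zeros((6, 7), dtype=int)
for i in range(4):
    board[5 - i, i] = 1

print(has_four_in_a_row(board, 1))
print(has_four_in_a_row(board, 2))
```

Flipping each window left to right with np.fliplr turns its anti-diagonal into the main diagonal, so a single `.diagonal()` call covers both directions.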

# DQNAgent Class:
* init: Initializes the DQN agent with parameters such as the state size, action size, memory buffer, discount rate (gamma), exploration rate (epsilon), and learning rate.
* _build_model: Builds the neural network model for the DQN with two hidden layers of 24 neurons each and a linear output layer.
* remember: Stores the experiences (state, action, reward, next_state, done) in the agent's memory buffer.
* act: Selects an action based on the current state either randomly (exploration) or by using the model's prediction (exploitation).
* replay: Trains the agent's neural network using experiences sampled from the memory buffer via Q-learning.
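
The Q-learning update used by replay can be shown without a neural network. The following is a minimal sketch, assuming a hypothetical helper `q_learning_target` and made-up Q-value arrays standing in for the model's predictions; only the entry for the action that was taken is overwritten, and the remaining entries keep the current predictions.

```python
import numpy as np

def q_learning_target(q_state, action, reward, q_next_state, done, gamma=0.95):
    """Return a copy of q_state with the entry for `action` replaced by the target."""
    if done:
        target = reward  # terminal transition: no future value to bootstrap from
    else:
        target = reward + gamma * np.max(q_next_state)
    updated = np.array(q_state, dtype=float)
    updated[action] = target
    return updated

# Illustrative values only: one Q-value per column.
q_s = np.array([0.1, 0.4, 0.0, 0.2, 0.3, 0.0, 0.1])
q_next = np.array([0.0, 0.2, 0.6, 0.1, 0.0, 0.0, 0.3])
print(q_learning_target(q_s, action=3, reward=0.0, q_next_state=q_next, done=False))
```

Because the untouched entries equal the current predictions, fitting the model to this vector with a mean-squared-error loss only pushes the Q-value of the chosen action toward its target.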

# Training the Agent:
1. The agent plays Connect4 for a specified number of episodes (episodes) and updates its Q-values through the training process.
2. During each episode, the agent selects actions, makes moves, remembers experiences, and replays experiences to train the neural network.
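3. The agent's model weights are saved to checkpoint files (agent.model.save_weights) whenever the episode index is a multiple of the checkpoint interval; with the interval of 10000 used in the code and 10000 episodes, only episode 0 is checkpointed.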
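
How quickly exploration fades depends on the decay settings. The following is a minimal sketch, assuming a hypothetical helper `replays_until_min_epsilon` that mirrors the multiplicative decay applied at the end of each replay call (start 1.0, floor 0.01, factor 0.995) and counts the calls needed to reach the floor.

```python
def replays_until_min_epsilon(epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
    """Count how many replay calls it takes for epsilon to reach epsilon_min or below."""
    calls = 0
    while epsilon > epsilon_min:
        epsilon *= epsilon_decay  # same update rule as in replay
        calls += 1
    return calls, epsilon

calls, final_epsilon = replays_until_min_epsilon()
print(calls, final_epsilon)
```

With these defaults the floor is reached after roughly 900 replay calls. Since replay runs once per move as soon as the memory holds more than one batch, most of a long training run then uses almost purely greedy actions.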

# Demonstration:
After training, the agent plays a game of Connect4 against itself: in the demonstration loop, agent.act chooses the move for both players.
The game is rendered after each move, showing the current state of the board until the game is finished.
The code follows a typical reinforcement learning setup with a game environment and an agent interacting with that environment to learn optimal policies through Q-learning.
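
During the demonstration it can help to restrict the agent to columns that still have room, since act may return a column that is already full. The following is a minimal sketch, assuming a hypothetical helper `greedy_legal_action` (not part of the original DQNAgent) that receives the model's Q-values for one state and the list returned by get_available_moves.

```python
import numpy as np

def greedy_legal_action(q_values, available_moves):
    """Pick the available column with the highest Q-value."""
    if len(available_moves) == 0:
        raise ValueError("no legal moves left")
    q_values = np.asarray(q_values, dtype=float)
    # Full columns keep -inf so argmax can never select them.
    masked = np.full_like(q_values, -np.inf)
    masked[available_moves] = q_values[available_moves]
    return int(np.argmax(masked))

# Illustrative values only: column 0 has the best Q-value but is full.
q = [0.9, 0.1, 0.5, 0.2, 0.0, 0.3, 0.4]
print(greedy_legal_action(q, available_moves=[1, 2, 3, 4, 5, 6]))
```

The same masking could be applied to the random branch of an epsilon-greedy policy by sampling from the available columns instead of from all seven.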


# Create our environment (Connect-4 game)

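Connect-4 is a game in which a player wins by connecting four of their own tokens horizontally, vertically, or diagonally. In the reinforcement learning setting, the intended rewards are 1 for a win, -1 for a loss, and 0.5 for a draw. In the code below, make_move returns 1 or -1 when the game ends (depending on which player moved last) and 0.5 for every move that leaves the game in progress, and the training loop treats any reward other than 0.5 as the end of an episode. The game engine object provides the following five methods.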

Render: Shows the board state as a grid of 0s (empty cells), 1s, and 2s (player tokens).

Reset: Clears the board so the game can be played over and over.

Get available moves: Scans the board state and returns the columns that still have room.

Check game done: Based on which player is making the move, checks whether that player has won or the game has ended in a draw.

Make move: Records the current player's move and returns the observation and reward.
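
The reward scheme of 1 for a win, -1 for a loss, and 0.5 for a draw can be kept apart from the "game still running" case by returning an explicit done flag. The following is a minimal sketch, assuming a hypothetical helper `score_move` that is not part of the original class, rewards expressed from player 1's point of view, and a reward of 0 for moves that do not end the game (that last value is an assumption, not taken from the original).

```python
def score_move(player, won, board_full):
    """Map the outcome of a move to (reward, done) from player 1's point of view."""
    if won:
        # The player who just moved completed four in a row.
        return (1.0 if player == 1 else -1.0), True
    if board_full:
        # No winner and no empty cell left: a draw.
        return 0.5, True
    # Assumed neutral reward while the game continues.
    return 0.0, False

print(score_move(player=1, won=True, board_full=False))
print(score_move(player=2, won=False, board_full=True))
print(score_move(player=2, won=False, board_full=False))
```

With a separate done flag, the training loop would no longer need to infer the end of an episode from the reward value.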

In [None]:
import numpy as np
import random
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

class Connect4Game:
    def __init__(self):
        self.board = np.zeros((6, 7)) # 6x7 grid for the Connect-4 board
        self.current_player = 1 # Player 1 starts

    def render(self):
        # Render the board state (0 = empty, 1 and 2 = player tokens)
        print(self.board)

    def reset(self):
        # Reset the board
        self.board = np.zeros((6, 7))
        self.current_player = 1

    def get_available_moves(self):
        # Scan the board state and give available moves
        return [col for col in range(7) if self.board[0][col] == 0]

    def check_game_done(self, player):
        # Check if the game is done based on the player's move
        for row in range(6):
            for col in range(4):
                if np.all(self.board[row, col:col+4] == player):
                    return True # Horizontal win
        for row in range(3):
            for col in range(7):
                if np.all(self.board[row:row+4, col] == player):
                    return True # Vertical win
        for row in range(3):
            for col in range(4):
                if np.all(self.board[row:row+4, col:col+4].diagonal() == player):
                    return True # Diagonal win
                if np.all(np.fliplr(self.board[row:row+4, col:col+4]).diagonal() == player):
                    return True # Anti-diagonal win
        if len(self.get_available_moves()) == 0:
            return True # Draw
        return False

    def make_move(self, column):
        # Record the move by players and return observation and reward
        row = 5
        while row >= 0:
            if self.board[row][column] == 0:
                self.board[row][column] = self.current_player
                break
            row -= 1
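        if self.check_game_done(self.current_player):
            # Game over (win or full board): +1 if player 1 moved last, -1 otherwise
            if self.current_player == 1:
                reward = 1
            else:
                reward = -1
        else:
            reward = 0.5  # game still in progress; callers treat 0.5 as "not done"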
        observation = np.copy(self.board)
        self.current_player = 3 - self.current_player # Switch player
        return observation, reward

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        # `learning_rate` replaces the deprecated `lr` argument in current Keras releases
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])  # returns action

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = reward + self.gamma * np.amax(self.model.predict(next_state)[0])
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=10, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

if __name__ == "__main__":
    # Initialize game environment and agent
    env = Connect4Game()
    state_size = 42  # 6x7 grid
    action_size = 7  # 7 possible columns to drop a token
    agent = DQNAgent(state_size, action_size)

    # Training the agent
    episodes = 10000
    batch_size = 32
    for e in range(episodes):
        env.reset()  # start every episode from an empty board
        state = np.reshape(env.board, [1, state_size])
        for time in range(100):
            action = agent.act(state)
            next_state, reward = env.make_move(action)
            next_state = np.reshape(next_state, [1, state_size])
            agent.remember(state, action, reward, next_state, reward != 0.5)
            state = next_state
            if reward != 0.5:
                print("Episode: {}, Score: {}".format(e, reward))
                break
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
        if e % 10000 == 0:
            agent.model.save_weights("checkpoint_{}.h5".format(e))

    # Demonstration of trained agent playing the game
    env.reset()
    state = np.reshape(env.board, [1, state_size])
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward = env.make_move(action)
        env.render()
        state = np.reshape(next_state, [1, state_size])
        done = reward != 0.5



Streaming output truncated to the last 5000 lines.
Episode: 5114, Score: -1
Episode: 5115, Score: 1
Episode: 5116, Score: -1
Episode: 5117, Score: 1
Episode: 5118, Score: -1
Episode: 5119, Score: 1
Episode: 5120, Score: -1
Episode: 5121, Score: 1
Episode: 5122, Score: -1
Episode: 5123, Score: 1
Episode: 5124, Score: -1
Episode: 5125, Score: 1
Episode: 5126, Score: -1
Episode: 5127, Score: 1
Episode: 5128, Score: -1
Episode: 5129, Score: 1
Episode: 5130, Score: -1
Episode: 5131, Score: 1
Episode: 5132, Score: -1
Episode: 5133, Score: 1
Episode: 5134, Score: -1
Episode: 5135, Score: 1
Episode: 5136, Score: -1
Episode: 5137, Score: 1
Episode: 5138, Score: -1
Episode: 5139, Score: 1
Episode: 5140, Score: -1
Episode: 5141, Score: 1
Episode: 5142, Score: -1
Episode: 5143, Score: 1
Episode: 5144, Score: -1
Episode: 5145, Score: 1
Episode: 5146, Score: -1
Episode: 5147, Score: 1
Episode: 5148, Score: -1
Episode: 5149, Score: 1
Episode: 5150, Score: -1
Episode: 5151, Score: 1
Epis