# RL Agent for Game Playing

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand the key concepts of tabular Q-learning for game playing
- Apply Q-learning using Python code examples
- Practice on a small game environment (Tic-Tac-Toe)

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 09, Unit 5** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---


## AIAT 123 - Reinforcement Learning

## Learning Objectives

- Build an RL agent for game playing
- Implement Q-learning for games
- Train an agent on a simple board game (Tic-Tac-Toe here, as a stepping stone to Connect 4)
- Evaluate agent performance

## Real-World Context

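Reinforcement learning is used in game AI development, strategy games, and competitive AI, where an agent has to learn good moves from the outcomes of play.

**Industry Impact**: RL-based systems power game AI for chess, Go, and video games.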

In [1]:
%pip install numpy -q
import numpy as np
print('✅ Setup complete!')

Note: you may need to restart the kernel to use updated packages.


✅ Setup complete!


## Part 1: Simple Game Environment (Tic-Tac-Toe)


In [2]:
class TicTacToe:
    """Simple Tic-Tac-Toe game environment"""
    def __init__(self):
        self.reset()
    
    def reset(self):
        self.board = np.zeros((3, 3), dtype=int)
        self.current_player = 1
        return self.board.copy()
    
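    def step(self, action):
        """Apply `action` (flat index 0-8) for the current player.

        Returns (board, reward, done, info); the reward is from the
        point of view of the player who just moved.
        """
        row, col = action // 3, action % 3
        if self.board[row, col] != 0:
            return self.board.copy(), -10, True, {}  # Invalid move: penalize and end the episode
        
        self.board[row, col] = self.current_player
        
        # Check win (check_win tests the player who just moved)
        if self.check_win():
            return self.board.copy(), 10, True, {}
        
        # Check draw: board is full and nobody won
        if np.all(self.board != 0):
            return self.board.copy(), 0, True, {}
        
        # Game continues: hand the turn to the other player
        self.current_player = -self.current_player
        return self.board.copy(), 0, False, {}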
    
    def check_win(self):
        """Check if current player won"""
        player = self.current_player
        # Check rows, columns, diagonals
        for i in range(3):
            if np.all(self.board[i] == player) or np.all(self.board[:, i] == player):
                return True
        if np.all(np.diag(self.board) == player) or np.all(np.diag(np.fliplr(self.board)) == player):
            return True
        return False

print('✅ Game environment created')

✅ Game environment created
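
The environment takes a flat action index (0-8) and decodes it with `action // 3, action % 3`, and the agent later expects a list of `available_actions`, but the notebook does not show how that list is built. The sketch below is one way to do it. The helper names `legal_actions` and `decode_action` are hypothetical and not part of the original notebook.

```python
import numpy as np

def legal_actions(board):
    """Return the flat indices (0-8) of the empty cells on a 3x3 board."""
    return [int(i) for i in np.flatnonzero(board.flatten() == 0)]

def decode_action(action):
    """Map a flat index to (row, col), mirroring `action // 3, action % 3` in step()."""
    return divmod(action, 3)

board = np.zeros((3, 3), dtype=int)
board[1, 1] = 1     # player 1 takes the centre (action 4)
board[0, 2] = -1    # player -1 takes the top-right corner (action 2)

print(legal_actions(board))   # [0, 1, 3, 5, 6, 7, 8]
print(decode_action(5))       # (1, 2)
```

Passing only legal actions to the agent means the `-10` invalid-move penalty in `step` should never fire during normal play; it stays as a safety net.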


## Part 2: Q-Learning Agent
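
The agent in this part applies the standard tabular Q-learning update. As a reference, the rule can be written as below, where $\alpha$ corresponds to `learning_rate`, $\gamma$ to `discount`, $s'$ to `next_state`, and the maximum runs over the empty cells of $s'$. The two worked examples use the agent's default values ($\alpha = 0.1$, $\gamma = 0.95$); the starting Q-values in them are made up for illustration.

```latex
\begin{aligned}
Q(s,a) &\leftarrow Q(s,a) + \alpha\,\bigl[\,\text{target} - Q(s,a)\,\bigr], \\
\text{target} &=
\begin{cases}
r & \text{if the episode ended (done)} \\
r + \gamma \max_{a'} Q(s',a') & \text{otherwise}
\end{cases} \\[4pt]
\text{Example (non-terminal): } & Q(s,a) = 0,\ r = 0,\ \max_{a'} Q(s',a') = 0.5 \\
& \text{target} = 0 + 0.95 \times 0.5 = 0.475, \quad Q(s,a) \leftarrow 0 + 0.1 \times (0.475 - 0) = 0.0475 \\
\text{Example (winning move): } & Q(s,a) = 0,\ r = 10 \\
& \text{target} = 10, \quad Q(s,a) \leftarrow 0 + 0.1 \times (10 - 0) = 1.0
\end{aligned}
```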


In [3]:
class QLearningAgent:
    """Q-learning agent for game playing"""
    def __init__(self, learning_rate=0.1, discount=0.95, epsilon=0.1):
        self.q_table = {}
        self.lr = learning_rate
        self.gamma = discount
        self.epsilon = epsilon
    
    def get_state_key(self, board):
        """Convert board to hashable key"""
        return tuple(board.flatten())
    
    def get_q_value(self, state, action):
        """Get Q-value for state-action pair"""
        key = (self.get_state_key(state), action)
        return self.q_table.get(key, 0.0)
    
    def update_q_value(self, state, action, reward, next_state, done):
        """Update Q-value using Q-learning: Q += lr * (target - Q)"""
        key = (self.get_state_key(state), action)
        current_q = self.q_table.get(key, 0.0)
        
        if done:
            # Terminal transition: there is no future value to bootstrap from
            target_q = reward
        else:
            # Bootstrap from the best legal action (empty cell) in next_state
            flat_next = next_state.flatten()
            next_actions = [i for i in range(9) if flat_next[i] == 0]
            if next_actions:
                max_next_q = max(self.get_q_value(next_state, a) for a in next_actions)
                target_q = reward + self.gamma * max_next_q
            else:
                target_q = reward
        
        self.q_table[key] = current_q + self.lr * (target_q - current_q)
    
    def select_action(self, state, available_actions):
        """Select action using epsilon-greedy"""
        if np.random.random() < self.epsilon:
            return np.random.choice(available_actions)
        
        q_values = [self.get_q_value(state, a) for a in available_actions]
        return available_actions[np.argmax(q_values)]

print('✅ Q-learning agent implemented')

✅ Q-learning agent implemented
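
The objectives mention training and evaluating the agent, which the cells above do not show. The following is a minimal, self-contained sketch of both steps under several assumptions that are not in the original notebook: the agent always moves first as player 1, the opponent plays uniformly random legal moves and is treated as part of the environment, a loss is rewarded with `-10`, and the board is stored as a 9-tuple instead of a NumPy array so it can be used directly as a dictionary key. The names `play_turn`, `train`, `evaluate`, and `greedy` are hypothetical, and the episode count is an arbitrary choice. In the notebook itself you would likely reuse `TicTacToe` and `QLearningAgent` in place of these compact stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def legal(board):
    return [i for i in range(9) if board[i] == 0]

def place(board, action, player):
    cells = list(board)
    cells[action] = player
    return tuple(cells)

def random_move(board):
    actions = legal(board)
    return actions[rng.integers(len(actions))]

def play_turn(board, action):
    """Agent (player 1) moves, then the random opponent (player -1) replies.
    Returns (next_board, reward, done) from the agent's point of view."""
    board = place(board, action, 1)
    if winner(board) == 1:
        return board, 10.0, True
    if not legal(board):
        return board, 0.0, True           # draw: board is full
    board = place(board, random_move(board), -1)
    if winner(board) == -1:
        return board, -10.0, True         # assumed loss penalty
    return board, 0.0, False              # agent moves first, so a cell is still free

def greedy(q, board):
    """Pick the legal action with the highest Q-value (unseen pairs count as 0)."""
    return max(legal(board), key=lambda a: q.get((board, a), 0.0))

def train(episodes=20000, lr=0.1, gamma=0.95, epsilon=0.1):
    q = {}
    for _ in range(episodes):
        board, done = (0,) * 9, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit
            action = random_move(board) if rng.random() < epsilon else greedy(q, board)
            nxt, reward, done = play_turn(board, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in legal(nxt))
            old = q.get((board, action), 0.0)
            q[(board, action)] = old + lr * (reward + gamma * best_next - old)
            board = nxt
    return q

def evaluate(q, games=1000):
    """Play greedily (no exploration) and return (win, draw, loss) rates."""
    tally = {10.0: 0, 0.0: 0, -10.0: 0}
    for _ in range(games):
        board, done = (0,) * 9, False
        while not done:
            board, reward, done = play_turn(board, greedy(q, board))
        tally[reward] += 1
    return tally[10.0] / games, tally[0.0] / games, tally[-10.0] / games

untrained = evaluate({})   # baseline: empty Q-table, always picks the first free cell
q_table = train()
trained = evaluate(q_table)
print("untrained (win, draw, loss):", untrained)
print("trained   (win, draw, loss):", trained)
```

The exact rates depend on the seed, the number of episodes, and the opponent. A random opponent is a weak benchmark, so a high win rate here does not imply strong play against a skilled opponent.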


## Real-World Applications

- **Chess/Go**: DeepMind AlphaZero
- **Video Games**: Dota 2 (OpenAI Five), StarCraft II (DeepMind AlphaStar)
- **Board Games**: Connect 4, Checkers
- **Puzzle Games**: Rubik's Cube solving

---

**End of Notebook**