Develop a Reinforcement Learning (RL) agent capable of effectively playing Connect 4 by:
- Implementing and testing various RL algorithms.
- Comparing the performance of these approaches.
Connect 4 is a two-player, turn-based strategy game. The objective is to be the first to connect four of your pieces vertically, horizontally, or diagonally.
- Board Dimensions: 6 rows × 7 columns.
- Players: Each player uses distinct tokens (e.g., red and yellow).
- Turns: Players take turns dropping tokens into columns.
- Winning: Form a line of four tokens.
- Draw: If the board is full and no one connects four.
- Game Board: Represented as a 6 × 7 grid with 42 blocks.
  - 0: Empty block.
  - 1: Filled by Player 1.
  - 2: Filled by Player 2.
- Actions:
  - Action Space: 7 discrete actions (columns 0-6).
  - Invalid actions (e.g., selecting a full column) incur penalties.
- Rewards:
  - Winning Move: +10
  - Blocking Opponent: +3
  - Illegal Move: -50
  - Draw: 0
  - Ongoing Game: 0
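As an illustration only (not the project's actual code, and with hypothetical names), the board encoding and reward scheme above can be sketched as:

```python
import numpy as np

# Board encoding: 0 = empty, 1 = Player 1, 2 = Player 2.
# Row 5 is assumed to be the bottom of the 6x7 board.
def drop_token(board, column, player):
    """Drop a token into `column`; it settles in the lowest empty row.
    Returns the row it landed in, or None if the column is full."""
    empty_rows = np.where(board[:, column] == 0)[0]
    if empty_rows.size == 0:
        return None  # illegal move: column already full
    row = empty_rows.max()
    board[row, column] = player
    return row

# Reward scheme; the outcome labels here are hypothetical, not the project's API.
REWARDS = {
    "win": 10,       # winning move
    "block": 3,      # blocking the opponent's winning move
    "illegal": -50,  # e.g., dropping into a full column
    "draw": 0,       # board full, no four-in-a-row
    "ongoing": 0,    # any other non-terminal move
}
```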
The Connect 4 environment is implemented as a custom Gym environment. Key features include:
- Action Space: Seven discrete actions representing columns.
- Observation Space: A 6 × 7 grid indicating the board state.
- Pygame Rendering: The game board is visually rendered using Pygame.
- Random Opponent: The opponent is random by default, but it can be replaced with a heuristic agent or any other trained agent.
- Reward System: Rewards and penalties guide the agent's learning process.
The Heuristic Agent employs domain-specific knowledge to make decisions. Key features include:
- Winning Move Priority: Prioritizes actions leading to an immediate win.
- Blocking Opponent: Detects and blocks the opponent's winning moves.
- Central Column Preference: Encourages moves near the center column for strategic positioning.
- Upper Confidence Bound (UCB): Balances exploration and exploitation during gameplay.
- Scoring Mechanism: Evaluates potential moves based on heuristics such as chain formation and board evaluation.
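The first three rules can be sketched as follows (a simplified illustration with hypothetical function names; the actual agent additionally scores chain formation and applies UCB):

```python
import numpy as np

def check_win(board, player):
    """Return True if `player` has four in a row (any direction)."""
    rows, cols = board.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < rows and 0 <= cc < cols
                       and board[rr, cc] == player for rr, cc in cells):
                    return True
    return False

def heuristic_move(board, player, opponent):
    """Pick a winning move, else block the opponent, else play centrally."""
    legal = [c for c in range(board.shape[1]) if board[0, c] == 0]
    for who in (player, opponent):               # 1) win, 2) block
        for col in legal:
            trial = board.copy()
            row = np.where(trial[:, col] == 0)[0].max()
            trial[row, col] = who                # simulate the drop
            if check_win(trial, who):
                return col
    return min(legal, key=lambda c: abs(c - 3))  # 3) prefer the center
```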
The Minimax Agent uses a game-tree search algorithm with the following features:
- Recursive Search: Evaluates potential moves up to a configurable depth.
- Alpha-Beta Pruning: Optimizes the search by eliminating branches that do not influence the final decision.
- Winning and Blocking Detection: Identifies and prioritizes winning or blocking moves.
- Terminal State Evaluation: Determines outcomes such as wins, losses, or draws at leaf nodes.
- Strategic Play: Calculates optimal moves for both the agent and opponent to maximize the agent's chances of winning.
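The core search can be sketched as a generic depth-limited minimax with alpha-beta pruning; `get_moves`, `apply_move`, and `evaluate` are hypothetical callbacks standing in for the Connect 4 specifics (legal drops, board updates, and terminal/heuristic scoring):

```python
def minimax(state, depth, alpha, beta, maximizing,
            get_moves, apply_move, evaluate):
    """Return (value, best_move) for `state`, searching `depth` plies."""
    moves = get_moves(state)
    if depth == 0 or not moves:          # leaf node: evaluate the position
        return evaluate(state), None
    best_move = None
    if maximizing:
        value = float("-inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1,
                               alpha, beta, False,
                               get_moves, apply_move, evaluate)
            if score > value:
                value, best_move = score, move
            alpha = max(alpha, value)
            if alpha >= beta:            # prune: opponent avoids this branch
                break
    else:
        value = float("inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1,
                               alpha, beta, True,
                               get_moves, apply_move, evaluate)
            if score < value:
                value, best_move = score, move
            beta = min(beta, value)
            if alpha >= beta:
                break
    return value, best_move
```

In the Connect 4 setting, `evaluate` would score wins, losses, and draws at terminal nodes and heuristic board features at the depth limit.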
The play function allows you to interact with the Connect 4 environment. You can either play as a human against an agent or have two agents compete.
- Parameters:
  - `opponent`: The agent you want to play against.
  - `player` (optional): An agent to play as Player 1 (default is human input).
- How to Run:
  - Human vs. Agent: Run `play(agent)`.
  - Agent vs. Agent: Provide both `player` and `opponent` agents.
The test function evaluates an agent's performance against an opponent over multiple episodes.
- Parameters:
  - `player`: The agent to be tested.
  - `opponent` (optional): The agent acting as the opponent (default is random).
  - `mode`: Rendering mode (e.g., "human" or "stdout").
- How to Run:
  - Use `test(player, opponent)` to compute win rates.
- Example:
- Evaluate an ensemble agent against random opponents:

  ```python
  players = [PPO.load(f"./saved_agents/agent{i}") for i in range(15)]
  ensemble_player = EnsembleAgent(players)

  NUM_GAMES = 10000
  wins = 0
  for _ in range(NUM_GAMES):
      wins += test(ensemble_player)

  winrate = wins / NUM_GAMES
  print("Win rate:", winrate)
  ```
The train function is used to train an RL agent on the Connect 4 environment.
- Parameters:
  - `agent_algo`: The RL algorithm to use (e.g., PPO, DQN, A2C).
  - `total_timesteps`: Total timesteps for training.
  - `opp_path` (optional): Path to a pre-trained opponent agent.
  - `opp_epsilon`: Exploration factor for the opponent.
  - `tb_dir`: Directory for TensorBoard logs.
  - `tb_log_name`: Name for TensorBoard logs.
- How to Run:
- Train a PPO agent:
  ```python
  agent = train(agent_algo="PPO", total_timesteps=100_000,
                tb_dir="./logs", tb_log_name="PPO_agent")
  ```
- Train an agent with a pre-trained opponent:
  ```python
  agent = train(agent_algo="PPO", total_timesteps=100_000,
                opp_path="./saved_agents/agent0", opp_algo="PPO",
                tb_dir="./logs", tb_log_name="PPO_vs_agent0")
  ```
- Gymnasium: For creating the Connect 4 environment.
- Stable-Baselines3 (SB3): For RL algorithms such as PPO, DQN, and A2C.
- Pygame: For UI development and gameplay visualization.
- tqdm: For progress bar visualization during training and evaluation.
- Install dependencies:
  ```shell
  pip install gymnasium stable-baselines3 pygame tqdm
  ```

- Clone the repository and navigate to the project directory.
- Run training scripts for specific algorithms.
- Visualize results using TensorBoard.
