Creating agents that learn to play Connect4 using Reinforcement learning.

Vilijan/Connect4

Graduation thesis

This project was my graduation thesis, so you can find the whole documentation in Macedonian on this link

Connect4 agent using Reinforcement Learning

In this repository I created an intelligent agent that learned how to play the game Connect4 using reinforcement learning. I experimented with the DQN, minimax-DQN, and DDQN algorithms.

The ConnectX competition on Kaggle motivated me to learn how to create these kinds of agents. The best-trained agents based on the code in this repository managed to achieve a top-15% ranking in the competition.

Algorithm description

In the following part I briefly explain the agent's learning process and the code sections that make up the algorithm.

  • Environment - this object represents the dynamics of the Connect4 game. Each game of Connect4 is one episode in the environment. The state of the game is represented as a 6x7 matrix in which each element has one of three values: 0 - empty position on the board, 1 - mark placed by the first player, and 2 - mark placed by the second player. The environment object has two main functions:
    • reset() - resets the environment so that a new game can be played.
    • step(action) - executes the given action in the environment. After each executed action the agent receives the new state and a reward for that action.
  • Model - represents the neural network that decides which actions the agent should execute. The goal of the whole project is to train this network in order for the agent to win more often in the Connect4 game. This network represents the Q function of the DQN algorithm.
  • Experience - this object stores all the games that the agent has played which means that it represents the agent's experience of the game. Based on this experience the agent learns how to get better in the game.
  • Exploration strategy - an implementation of an algorithm that tackles the trade-off between exploration and exploitation in RL problems. I implemented the simplest such algorithm, the Epsilon Greedy Strategy.
  • Connect4 Learner - this object encapsulates the Model, Experience, and Exploration Strategy. Additionally, it implements minimax-DQN and minimax-DDQN, the main algorithms for training the policy_network. This object has one main function, fit(EPOCHS), which trains the policy network on the stored experience for the given number of epochs.
  • Self-play - this represents a function where an agent plays an episode in the environment against itself. The experience that is gathered during the episode is stored in the Experience object.
  • Hyperparameters - constants representing all the hyperparameters.
  • Agent Evaluation - evaluates the agent performance against a random agent and a negamax agent.
  • Agent submission - creates a submission file for the Kaggle competition. Additionally, it adds a two-step lookahead to improve the agent's performance. The function encodes the weights of the policy network as a string using base64 encoding.
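The Environment described above can be sketched as a small class. The 6x7 state matrix, the 0/1/2 encoding, and the reset()/step(action) interface come from the description; the class name, win check, and the reward of 1 for a winning move are illustrative assumptions, not the repository's exact code.

```python
import numpy as np

class Connect4Env:
    """Minimal sketch of the Connect4 environment (names and rewards assumed)."""
    ROWS, COLS = 6, 7

    def __init__(self):
        self.reset()

    def reset(self):
        # 0 = empty, 1 = first player's mark, 2 = second player's mark
        self.board = np.zeros((self.ROWS, self.COLS), dtype=np.int8)
        self.current_player = 1
        return self.board.copy()

    def step(self, action):
        # Drop the current player's mark into the lowest empty cell of the column.
        column = self.board[:, action]
        if column[0] != 0:
            raise ValueError("Column is full")
        row = int(np.max(np.where(column == 0)))
        self.board[row, action] = self.current_player
        done = self._is_win(row, action)
        reward = 1.0 if done else 0.0  # illustrative reward scheme
        self.current_player = 3 - self.current_player
        return self.board.copy(), reward, done

    def _is_win(self, row, col):
        # Count marks in a line through (row, col) in all four directions.
        player = self.board[row, col]
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                r, c = row + sign * dr, col + sign * dc
                while (0 <= r < self.ROWS and 0 <= c < self.COLS
                       and self.board[r, c] == player):
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= 4:
                return True
        return False
```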
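The Experience object is a classic replay buffer: transitions from played games are stored and later sampled as random mini-batches for training. A minimal sketch, assuming a bounded capacity and a (state, action, reward, next_state, done) transition layout:

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class Experience:
    """Sketch of the replay memory; capacity and field names are assumptions."""

    def __init__(self, capacity=50_000):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform random mini-batch, as in standard experience replay.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```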
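The Epsilon Greedy Strategy can be sketched as follows: with probability epsilon the agent picks a random valid column (exploration), otherwise the column with the highest Q-value (exploitation). The exponential decay schedule and its parameter names are assumptions.

```python
import math
import random

class EpsilonGreedyStrategy:
    """Sketch of the exploration strategy; decay schedule is an assumption."""

    def __init__(self, start=1.0, end=0.01, decay=0.001):
        self.start, self.end, self.decay = start, end, decay

    def exploration_rate(self, step):
        # Epsilon decays exponentially from `start` toward `end`.
        return self.end + (self.start - self.end) * math.exp(-self.decay * step)

    def select_action(self, q_values, valid_actions, step):
        if random.random() < self.exploration_rate(step):
            return random.choice(valid_actions)              # explore
        return max(valid_actions, key=lambda a: q_values[a])  # exploit
```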
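The key idea that separates minimax-DQN from vanilla DQN is the bootstrap target: after the agent moves, the next state belongs to the opponent, so instead of taking the max over next-state Q-values the target takes the value of the move that is worst for the agent. A sketch of the two targets (function names and the gamma value are assumptions):

```python
import numpy as np

def dqn_target(reward, next_q, done, gamma=0.99):
    """Vanilla DQN target: bootstrap with the best move in the next state."""
    return reward if done else reward + gamma * float(np.max(next_q))

def minimax_dqn_target(reward, opponent_next_q, done, gamma=0.99):
    """Minimax-DQN target (sketch): the next state is the opponent's turn,
    so bootstrap with the opponent's best reply, i.e. the minimum of the
    next-state Q-values from the agent's perspective."""
    return reward if done else reward + gamma * float(np.min(opponent_next_q))
```

In the DDQN variants the network that selects the bootstrap action and the network that evaluates it are decoupled, but the min-instead-of-max structure stays the same.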
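The self-play step above reduces to a short loop: one agent plays both sides of an episode and every transition is pushed into the Experience buffer. A sketch under the interface assumed in the environment example (step returns state, reward, done):

```python
def self_play_episode(env, select_action, experience):
    """Sketch of self-play: play one episode against itself and record it.
    `select_action` maps a state to a column; all names are illustrative."""
    state = env.reset()
    done = False
    while not done:
        action = select_action(state)
        next_state, reward, done = env.step(action)
        # Store the transition for later replay-based training.
        experience.push(state, action, reward, next_state, done)
        state = next_state
```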
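The base64 trick in the submission step works because a Kaggle submission is a single self-contained Python file: the trained weights are serialized to bytes and embedded in the file as a text string, then decoded back at load time. A minimal sketch (pickle as the serializer is an assumption; the repository may use a different one):

```python
import base64
import pickle

def encode_weights(weights):
    """Serialize the policy network's weights and return them as a
    base64 text string that can be pasted into the submission file."""
    return base64.b64encode(pickle.dumps(weights)).decode("ascii")

def decode_weights(encoded):
    """Inverse of encode_weights: recover the weights inside the submission."""
    return pickle.loads(base64.b64decode(encoded.encode("ascii")))
```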
