
Reinforcement_Learning_Demo

Reinforcement Learning implementation on a 2D game using the PyTorch and Pygame frameworks.

Game Description and objective:

The game is played on an 8x8 grid of blocks (each block is 50x50 pixels). The main objective is for the Player (the Agent) to first collect a Key and then unlock the Door to complete the game level, while avoiding the Fire tile, which causes an instant game over.
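The grid layout above can be sketched with a few constants; the names and the coordinate helper below are illustrative assumptions, not identifiers from the repository:

```python
# Hypothetical constants matching the description: an 8x8 grid of 50x50 px blocks.
GRID_SIZE = 8
BLOCK_PX = 50
WINDOW_PX = GRID_SIZE * BLOCK_PX  # 400x400 pixel game window

def to_pixels(col, row):
    """Convert a grid coordinate to the top-left pixel of its block."""
    return col * BLOCK_PX, row * BLOCK_PX
```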

Game Components:

- The Environment: the game grid and all game components operating within it; different actions performed by the Agent result in different environment states.
- The Agent: the purple tile. It can move through black spaces and detects collisions with all other game assets, which result in different environment states.
- The Key: the yellow tile. Once the Agent collides with it, it disappears from the game grid; at this point the Agent is ready to head for the Exit.
- The Exit: the green tile. Once the Agent collides with it while holding the Key, the game level is completed.
- The Fire: the orange tile. Once the Agent collides with it, the game is over.

(Figure: game elements and level layout)

Optimal Policy

At each game iteration the Agent has been trained to choose the action that best follows the optimal policy of the game: reach the Key tile and then head for the level Exit in the fewest possible moves.
The optimal policy is shaped by a reward system that awards positive and negative points to the Agent for each of its actions, based on the environment state it produces. The maximum reward is given for completing the level, a smaller reward for collecting the Key, and a negative reward for colliding with the Fire tile. All other types of collisions give no reward.
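The reward scheme described above can be sketched as a simple lookup; the exact reward values here are illustrative assumptions, not numbers taken from the repository:

```python
# Assumed reward values; only the ordering (completion > key > 0 > fire)
# reflects the scheme described in the text.
REWARDS = {
    "exit_with_key": 10.0,  # maximum reward: level completed
    "key": 5.0,             # smaller reward: key collected
    "fire": -10.0,          # negative reward: game over
}

def reward_for(event):
    """Return the reward for a collision event; all other moves give 0."""
    return REWARDS.get(event, 0.0)
```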

The DQN model architecture

The model used for the RL task described in the sections above is a DQN. It is a feed-forward deep neural network that takes as input the current state of the environment at a given game iteration, passes it through 2 hidden layers (with ReLU activations), and ends in an output layer that produces a tensor with 4 values. The next move is decided by the maximum of the 4 values.

(Figure: DQN model architecture)
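A minimal PyTorch sketch of the architecture just described; the hidden-layer width and the input size (one feature per grid cell) are assumptions, since the text only specifies the number of hidden layers and the 4-value output:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Feed-forward DQN: state in, 2 ReLU hidden layers, 4 Q-values out
    (one per possible move). Layer widths are assumed, not from the repo."""

    def __init__(self, state_dim, hidden=128, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

# Greedy action selection: the move with the maximum of the 4 output values.
model = DQN(state_dim=64)  # e.g. one input per cell of the 8x8 grid
q_values = model(torch.zeros(1, 64))
action = q_values.argmax(dim=1).item()
```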

For reference and comparison, a manual version of the game is also provided, which takes user input instead of using the automatic agent.
