
Reinforcement_Learning_Demo

Reinforcement Learning implementation on a 2D game using the PyTorch and Pygame frameworks.

Game Description and objective:

The game is played on an 8x8 grid of blocks (each block is 50x50 pixels). The main objective is for the Player (the Agent) to first collect a Key and then unlock the Door to complete the game level, while avoiding the Fire tile, which causes an instant game over.
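The grid layout above can be sketched with a few constants; the names and the coordinate helper below are illustrative assumptions, not identifiers from the repository:

```python
# Hypothetical constants matching the description: an 8x8 grid of 50x50 px blocks.
GRID_SIZE = 8
BLOCK_PX = 50
WINDOW_PX = GRID_SIZE * BLOCK_PX  # 400x400 pixel game window

def to_pixels(col, row):
    """Convert a grid coordinate to the top-left pixel of its block."""
    return col * BLOCK_PX, row * BLOCK_PX
```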

Game Components:

- The Environment: the game grid and all game components operating within it; different actions performed by the Agent result in different environment states.
- The Agent: the purple tile. It can move through black spaces and detects collisions with all other game assets, which result in different environment states.
- The Key: the yellow tile. Once the Agent collides with it, it disappears from the game grid; at this point the Agent is ready to head for the Exit.
- The Exit: the green tile. Once the Agent collides with it while holding the Key, the game level is completed.
- The Fire: the orange tile. Once the Agent collides with it, the game is over.

(Figure: game elements and level layout)

Optimal Policy

At each game iteration the Agent has been trained to choose the action that best follows the optimal policy of the game: reach the Key tile and then head for the level Exit in the fewest possible moves.
The optimal policy is shaped by a reward system that awards positive and negative points to the Agent for each of its actions, based on the environment state it produces. The maximum reward is given for completing the level, a smaller reward for collecting the Key, and a negative reward for colliding with the Fire tile. All other types of collisions give no reward.
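The reward scheme described above can be sketched as a simple lookup; the exact reward values here are illustrative assumptions, not numbers taken from the repository:

```python
# Assumed reward values; only the ordering (completion > key > 0 > fire)
# reflects the scheme described in the text.
REWARDS = {
    "exit_with_key": 10.0,  # maximum reward: level completed
    "key": 5.0,             # smaller reward: key collected
    "fire": -10.0,          # negative reward: game over
}

def reward_for(event):
    """Return the reward for a collision event; all other moves give 0."""
    return REWARDS.get(event, 0.0)
```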

The DQN model architecture

The model used for the RL task described in the sections above is a DQN. It is a feed-forward deep neural network that takes as input the current state of the environment at a given game iteration, passes it through 2 hidden layers (with ReLU activations), and ends in an output layer that produces a tensor with 4 values. The next move is decided by the maximum of the 4 values.

(Figure: DQN model architecture)
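A minimal PyTorch sketch of the architecture just described; the hidden-layer width and the input size (one feature per grid cell) are assumptions, since the text only specifies the number of hidden layers and the 4-value output:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Feed-forward DQN: state in, 2 ReLU hidden layers, 4 Q-values out
    (one per possible move). Layer widths are assumed, not from the repo."""

    def __init__(self, state_dim, hidden=128, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

# Greedy action selection: the move with the maximum of the 4 output values.
model = DQN(state_dim=64)  # e.g. one input per cell of the 8x8 grid
q_values = model(torch.zeros(1, 64))
action = q_values.argmax(dim=1).item()
```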

For reference and comparison, a manual version of the game is also provided, which takes user input instead of using the automatic agent.
