This is my second attempt at implementing Q-Learning on 2048, following a previous failed project (https://github.com/Bloodaxe90/2048-Q-Learning) where I used a tabular approach (more details in that repository). Thankfully, this attempt was much more successful!
This project's UI was built using PySide6 with Qt Designer, and TensorBoard is used to log and review results.
- Activate a virtual environment.
- Run `pip install -r requirements.txt` to install the dependencies.
- Either:
  - Run `main.py` to train a model.
  - Run `application.py` to watch a trained model play 2048 or to play 2048 yourself.
Hyperparameters found in `main.py`:

- `EPISODES` (int): The number of episodes to train for.
- `HIDDEN_NEURONS` (tuple[int]): Defines the number of neurons in each hidden layer. The number of hidden layers is `len(HIDDEN_NEURONS) - 1`. For example, `(128, 64, 32)` results in two hidden layers: the first with 128 input and 64 output neurons, and the second with 64 input and 32 output neurons.
- `REPLAY_CAPACITY` (int): The capacity of the replay buffer.
- `BATCH_SIZE` (int): The number of experiences used in each training step.
- `ALPHA` (float): The learning rate.
- `GAMMA` (float): The discount factor.
- `TRIAL_NAME` (str): The name of the current experiment, used as part of the filename for the TensorBoard logs.
- `MAIN_UPDATE_COUNT` (int): The number of training steps performed on the main network each time it is updated.
- `MAIN_UPDATE_FREQ` (int): The frequency (in episodes) at which the main network is updated.
- `TARGET_UPDATE_FREQ` (int): The frequency (in episodes) at which the target network is updated from the main network.
- `MODEL_SAVE_NAME` (str): The name to save the trained model under. Leave as an empty string if the model should not be saved.
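To make the `HIDDEN_NEURONS` convention concrete, here is a minimal sketch, assuming a PyTorch multilayer perceptron; `build_mlp`, the input size of 16, and the 4 actions are illustrative assumptions, not the repository's actual code:

```python
import torch.nn as nn

def build_mlp(n_inputs: int, hidden_neurons: tuple, n_actions: int) -> nn.Sequential:
    # HIDDEN_NEURONS = (128, 64, 32) yields len(HIDDEN_NEURONS) - 1 = 2 hidden
    # layers, Linear(128 -> 64) and Linear(64 -> 32), wrapped by an input layer
    # and an output layer producing one Q-value per action.
    layers = [nn.Linear(n_inputs, hidden_neurons[0]), nn.ReLU()]
    for in_n, out_n in zip(hidden_neurons, hidden_neurons[1:]):
        layers += [nn.Linear(in_n, out_n), nn.ReLU()]
    layers.append(nn.Linear(hidden_neurons[-1], n_actions))
    return nn.Sequential(*layers)

net = build_mlp(16, (128, 64, 32), 4)  # flattened 4x4 board -> 4 moves
```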
Hyperparameters found in `application.py`:

- `MODEL_LOAD_NAME` (str): The name of the model to load and use for playing 2048.
- `MODEL_LOAD_HIDDEN_NEURONS` (tuple[int]): The hidden-layer structure of the model being loaded. Follows the same format as `HIDDEN_NEURONS` described above.
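The loaded model's architecture must match the one used at save time, which is why `MODEL_LOAD_HIDDEN_NEURONS` has to be set correctly. A rough sketch of that round trip, assuming PyTorch; the `nn.Sequential` layout, file name, and board/action sizes are illustrative assumptions, not the repository's actual code:

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_net():
    # Explicit layout for HIDDEN_NEURONS = (128, 64, 32) on a flattened 4x4 board.
    return nn.Sequential(
        nn.Linear(16, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 4),
    )

# Saving, as a training script would after training ...
path = os.path.join(tempfile.gettempdir(), "dqn_2048.pth")
torch.save(make_net().state_dict(), path)

# ... and loading, as the application would. The architectures must match,
# otherwise load_state_dict raises a shape-mismatch error.
model = make_net()
model.load_state_dict(torch.load(path))
model.eval()  # switch layers like dropout/batch-norm to inference behaviour
```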
- Default Radio Button (disabled while the agent is autoplaying):
Allows you to play 2048 manually.
- Arrow Keys: Move the number tiles in the corresponding direction.
- Q-AI Radio Button:
Enables the agent to play automatically.
- S Key: Starts or stops the agent autoplaying 2048.
- Space Bar: Resets the game (Disabled while the agent is autoplaying).
These baseline results were obtained with the agent playing under a random policy. The original image can be found in the Experiment Notebook.
After a lot of testing, I trained my model for 30,000 episodes, which took about 4 days. These results show a dramatic improvement over the baseline, with the agent even reaching the 2048 tile occasionally. I am confident that with more training an agent would be able to reach 2048 consistently. The original image of the results can be found in the Inference Notebook.
Screenshot of the final UI:


