This is my second attempt at implementing Q-Learning on 2048, following a previous failed project (https://github.com/Bloodaxe90/2048-Q-Learning) where I used a tabular approach (more details in that repository). Thankfully, this attempt was much more successful!
This project's UI was built using PySide6 with Qt Designer, and TensorBoard is used to log and review results.
- Activate a virtual environment.
- Run `pip install -r requirements.txt` to install the dependencies.
- Either:
  - Run `main.py` to train a model.
  - Run `application.py` to watch a trained model play 2048 or to play 2048 yourself.
Hyperparameters found in `main.py`:

- `EPISODES` (int): The number of episodes to train for.
- `HIDDEN_NEURONS` (tuple[int]): Defines the number of neurons in each hidden layer. The number of hidden layers is `len(HIDDEN_NEURONS) - 1`. For example, `(128, 64, 32)` results in two hidden layers: the first with 128 input and 64 output neurons, and the second with 64 input and 32 output neurons.
- `REPLAY_CAPACITY` (int): The capacity of the replay buffer.
- `BATCH_SIZE` (int): The number of experiences used in each training step.
- `ALPHA` (float): The learning rate.
- `GAMMA` (float): The discount factor.
- `TRIAL_NAME` (str): The name of the current experiment, used as part of the filename for the TensorBoard logs.
- `MAIN_UPDATE_COUNT` (int): The number of training steps performed on the main network each time it is updated.
- `MAIN_UPDATE_FREQ` (int): The frequency (in episodes) at which the main network is updated.
- `TARGET_UPDATE_FREQ` (int): The frequency (in episodes) at which the target network is updated from the main network.
- `MODEL_SAVE_NAME` (str): The name to save the trained model under. Leave as an empty string if the model should not be saved.
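To make the `HIDDEN_NEURONS` convention concrete, here is a minimal sketch, assuming a PyTorch multilayer perceptron; `build_mlp`, the input size of 16, and the 4 actions are illustrative assumptions, not the repository's actual code:

```python
import torch.nn as nn

def build_mlp(n_inputs: int, hidden_neurons: tuple, n_actions: int) -> nn.Sequential:
    # HIDDEN_NEURONS = (128, 64, 32) yields len(HIDDEN_NEURONS) - 1 = 2 hidden
    # layers, Linear(128 -> 64) and Linear(64 -> 32), wrapped by an input layer
    # and an output layer producing one Q-value per action.
    layers = [nn.Linear(n_inputs, hidden_neurons[0]), nn.ReLU()]
    for in_n, out_n in zip(hidden_neurons, hidden_neurons[1:]):
        layers += [nn.Linear(in_n, out_n), nn.ReLU()]
    layers.append(nn.Linear(hidden_neurons[-1], n_actions))
    return nn.Sequential(*layers)

net = build_mlp(16, (128, 64, 32), 4)  # flattened 4x4 board -> 4 moves
```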
Hyperparameters found in `application.py`:

- `MODEL_LOAD_NAME` (str): The name of the model to load and use for playing 2048.
- `MODEL_LOAD_HIDDEN_NEURONS` (tuple[int]): The hidden-layer structure of the model being loaded. Follows the same format as `HIDDEN_NEURONS` described above.
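The loaded model's architecture must match the one used at save time, which is why `MODEL_LOAD_HIDDEN_NEURONS` has to be set correctly. A rough sketch of that round trip, assuming PyTorch; the `nn.Sequential` layout, file name, and board/action sizes are illustrative assumptions, not the repository's actual code:

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_net():
    # Explicit layout for HIDDEN_NEURONS = (128, 64, 32) on a flattened 4x4 board.
    return nn.Sequential(
        nn.Linear(16, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 4),
    )

# Saving, as a training script would after training ...
path = os.path.join(tempfile.gettempdir(), "dqn_2048.pth")
torch.save(make_net().state_dict(), path)

# ... and loading, as the application would. The architectures must match,
# otherwise load_state_dict raises a shape-mismatch error.
model = make_net()
model.load_state_dict(torch.load(path))
model.eval()  # switch layers like dropout/batch-norm to inference behaviour
```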
- Default Radio Button (disabled while the agent is autoplaying):
Allows you to play 2048 manually.
- Arrow Keys: Move the number tiles in the corresponding direction.
- Q-AI Radio Button:
Enables the agent to play automatically.
- S Key: Starts or stops the agent autoplaying 2048.
- Space Bar: Resets the game (Disabled while the agent is autoplaying).
These baseline results were obtained with the agent playing under a random policy. The original image can be found in the Experiment Notebook.
After a lot of testing, I trained my model for 30,000 episodes, which took about 4 days. These results show a dramatic improvement over the baseline, with the agent even reaching the 2048 tile occasionally. I am confident that with more training an agent would be able to reach 2048 consistently. The original image of the results can be found in the Inference Notebook.
Screenshot of the final UI:


