# Deep Q Learning with the game 2048

This repository trains a deep Q-learning network on the game 2048 and plots a performance graph. The game logic is based on the implementation by Georg Wiese (https://github.com/georgwiese/2048-rl). The deep Q-learning code is loosely based on the implementation in https://github.com/keon/deep-q-learning, but was improved and adapted to the game 2048.

## Overview

### The game 2048

2048 is a single-player sliding block puzzle game developed by Gabriele Cirulli in 2014. The game is played on a 4 × 4 grid in which the value of each cell is a power of 2. An action can be any of the four movements: up, down, left, or right. When an action is performed, all cells move in the chosen direction. Any two adjacent cells with the same value along this direction merge into a single cell whose value is the sum of the two (i.e. the next power of 2). The objective of the game is to combine cells until a cell with the value 2048 is reached.
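The merge rule can be illustrated on a single row. The following is a minimal sketch that is independent of the code in `gamelogic/game.py`, which may represent the board differently; the function name `merge_row_left` and the use of 0 for an empty cell are assumptions made for this example. It follows the standard 2048 rule that a tile created by a merge cannot merge again in the same move.

```python
def merge_row_left(row):
    """Slide one row to the left and merge equal neighbours once per move.

    `row` is a list of tile values where 0 marks an empty cell (an assumption
    of this sketch). Returns the new row and the sum of the merged tiles.
    """
    tiles = [v for v in row if v != 0]   # slide: drop the empty cells
    merged, gained, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # two equal tiles become the next power of 2
            gained += tiles[i] * 2
            i += 2                       # a merged tile cannot merge again this move
        else:
            merged.append(tiles[i])
            i += 1
    # pad with empty cells so the row keeps its length
    return merged + [0] * (len(row) - len(merged)), gained


print(merge_row_left([2, 2, 4, 0]))   # ([4, 4, 0, 0], 4)
print(merge_row_left([2, 2, 2, 2]))   # ([4, 4, 0, 0], 8)
```

The other three directions can be obtained by reversing or transposing the rows before applying the same function.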

### Artificial Neural Network

The deep network is a standard artificial neural network consisting of two fully connected hidden layers with 256 nodes each. ReLU was used as the activation function for all layers, which helps to mitigate vanishing gradients. The loss was computed using mean squared error (MSE), and Adam was used as the optimizer. Other choices of loss function and optimizer did not yield substantially different results.
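To get a feeling for the size of this network, the number of trainable parameters can be computed by hand. The sketch below is plain arithmetic and does not touch Keras; the helper name `dense_param_count` is made up for this example. Note that the `_build_model` method in the training code places an additional 24-unit layer in front of the two 256-unit layers, so both variants are shown.

```python
def dense_param_count(layer_sizes):
    """Count the weights and biases of a fully connected network.

    `layer_sizes` lists the width of every layer, input first, output last.
    Each dense layer has n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 16 board cells in, two hidden layers of 256 units, 4 actions out
print(dense_param_count([16, 256, 256, 4]))      # 71172

# the same network with the extra 24-unit layer used in _build_model
print(dense_param_count([16, 24, 256, 256, 4]))  # 73628
```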


The hyperparameters can be set in the file parameters.py.

The most successful run was achieved using the following configuration:

- `gamma = 0.00001` (discount rate)
- `epsilon_decay = 0.9995`
- `learning_rate = 0.001`
- `batch_size = 32`
- two hidden layers with 256 nodes each
- reward = score of the game, defined as the highest tile achieved. For example, if tile 256 was reached, the score of that game was 256.
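The value of `epsilon_decay` determines how long the agent keeps exploring. As a rough sketch, the loop below counts how many decay steps are needed to go from the initial exploration rate of 1.0 down to the minimum of 0.01, the two values set in `DQNAgent.__init__`; the function name `steps_until_min` is an assumption for this example. Since the training code decays epsilon once per call to `replay`, and `replay` is called once per episode, each step corresponds to one episode.

```python
def steps_until_min(epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.9995):
    """Count multiplicative decay steps until epsilon is no longer above epsilon_min."""
    steps = 0
    while epsilon > epsilon_min:
        epsilon *= epsilon_decay   # same update rule as at the end of replay()
        steps += 1
    return steps


print(steps_until_min())
```

With the configuration above this comes out at roughly 9,200 steps, so for most of a run of 100,000 episodes the agent acts almost greedily.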



To start training, execute all cells. To start plotting the graph, execute all cells of plot.ipynb, located in the root folder DQN-2048.

The repo consists of two parts: the learning part and a fully programmed game of 2048.
The game logic of 2048 can be found in the file game.py in the folder gamelogic.

First, let's import the libraries and the game logic.

In [None]:
import numpy as np
import random
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import json
import time
from shutil import copyfile
import parameters
import os

from gamelogic.game import Game

Next, define the number of episodes to train the agent for, followed by the agent class and the training loop.

In [2]:
EPISODES = 100000

path  = os.getcwd()

class DQNAgent:
    def __init__(self):
        self.state_size = 16
        self.action_size = 4 # (up, down, right, left)
        self.memory = deque(maxlen=5000000)
        self.gamma = parameters.gamma    # discount rate
        self.epsilon = 1.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = parameters.epsilon_decay
        self.learning_rate = parameters.learning_rate
        self.model = self._build_model()
        self.batch_size = parameters.batch_size
        self.is_max_value_reward = parameters.is_max_value_reward
        self.max_value_reward_threshold = parameters.max_value_reward_threshold
        self.max_value_reward_amount = parameters.max_value_reward_amount
        self.output_name = parameters.output_name


    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(self.action_size, activation='relu'))
        model.compile(loss='mse',
                      # `lr` is the legacy argument name; Keras 2.3+ uses `learning_rate`
                      optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        """Stores one transition in the replay memory.
        The network tends to forget earlier experiences as it is trained on new ones,
        so the model is later re-trained on samples from this memory."""
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.choice(game.available_actions())
        # forward pass: predict the Q-values for the current state
        act_values = self.model.predict(state)
        # set the Q-values of unavailable actions to -100 so they are never chosen
        available = game.available_actions()
        for i in range(self.action_size):
            if i not in available:
                act_values[0][i] = -100
        # return the action with the highest Q-value
        return np.argmax(act_values[0])

    def replay(self, batch_size):
        """trains the neural net with experiences from memory (minibatches)"""
        # sample a minibatch from memory
        minibatch = random.sample(self.memory, batch_size)
        #for each memory
        for state, action, reward, next_state, done in minibatch:
            #if its final state set target to the reward
            target = reward
            if not done:
                #set target according to formula
                target = (reward + self.gamma * np.amax(self.model.predict(next_state)[0]))
            #gets all 4 predictions from current state
            target_f = self.model.predict(state)
            #takes the one action which was selected in batch
            target_f[0][action] = target
            #trains the model
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)



if __name__ == "__main__":
    game = Game()
    agent = DQNAgent()
    # agent.load("./save/file")
    done = False
    batch_size = agent.batch_size
    debug = False
    save_maxvalues = True
    output_list = []


    for e in range(EPISODES):
        game.new_game()
        state = game.state()
        state = np.reshape(state, [1, agent.state_size])
        while not game.game_over():
            action = agent.act(state)
            reward = (game.do_action(action))**2
            if(agent.is_max_value_reward):
                reward = 0
                temp = game.state()
                temp_reshaped = np.reshape(temp, [1, agent.state_size])
                temp_max_value = np.amax(temp_reshaped[0])
                if temp_max_value > agent.max_value_reward_threshold:
                    reward = agent.max_value_reward_amount
            next_state = game.state()
            actions_available = game.available_actions()
            if len(actions_available) == 0: 
                done = True
            else:
                done = False
            next_state = np.reshape(next_state, [1, agent.state_size])
            agent.remember(state, action, reward, next_state, done)
            state = next_state

            if done:
                if (debug): print("no action available")
                states = game.state()
                states = np.reshape(states, [1, agent.state_size])
                max_value = np.amax(states[0])
                # np.asscalar is gone from recent NumPy releases; .item() is the equivalent call
                output_list.append([e, max_value.item(), game.score().item(), agent.epsilon])
                if(debug):print("max_value: " + str(max_value))
                break
        print("episodes: " + str(e))

        #save copy of configuration and the episode_maxvalue_data
        if save_maxvalues:
            if e == 100:
                src = path + "/learn.py"
                dst = path + "/data/"+agent.output_name+"config.py"
                copyfile(src, dst)
                output_list.insert(0, "gamma: "+str(parameters.gamma)+" | epsilon decay: "+str(parameters.epsilon_decay)+" | learning rate: "+str(parameters.learning_rate)+"\n batch size: "+str(parameters.batch_size)+" | reward = maxVal: "+str(parameters.is_max_value_reward)+" | reward amount: "+str(parameters.max_value_reward_amount)+" | reward threshold: "+str(parameters.max_value_reward_threshold))
            if e % 100 == 0:
                with open(path + "/data/"+agent.output_name+"output.txt", "w") as outfile:
                    json.dump(output_list, outfile)

        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
        if e % 10000 == 0:
            timenow = time.strftime("%Y-%m-%d_%H-%M-%S")
            savepath = path + "/data/agent"+agent.output_name+timenow+"_Epi"+str(e)
            agent.save(savepath)


episodes: 0
episodes: 1
episodes: 2
episodes: 3
episodes: 4
episodes: 5
episodes: 6
episodes: 7
episodes: 8
episodes: 9
episodes: 10
episodes: 11
episodes: 12
episodes: 13
episodes: 14
episodes: 15
episodes: 16
episodes: 17
episodes: 18


KeyboardInterrupt: 
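
The target value computed in `replay` can be checked by hand. The following is a small sketch in plain Python that mirrors the formula used there, `reward + gamma * max(Q(next_state))`, without involving the Keras model; the function name `q_target` and the example Q-values are made up for illustration.

```python
def q_target(reward, next_q_values, done, gamma=0.00001):
    """Q-learning target: the reward plus the discounted best next Q-value.

    For a terminal transition (`done` is True) the target is the reward alone.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# hypothetical Q-values predicted for the four actions in the next state
next_q = [3.0, 7.5, 0.0, 1.2]

print(q_target(16, next_q, done=False))  # 16 plus a correction of 0.000075
print(q_target(16, next_q, done=True))   # 16
```

With a discount rate as small as `gamma = 0.00001`, the target is almost identical to the immediate reward, so the network in the most successful configuration effectively learns to predict the reward of the next move rather than a long-term return.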