<a href="https://colab.research.google.com/github/bdadeveloper1/MachineLearningProjects/blob/main/2048GameFinalTrainingData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# World Data Science Institute Internship
# 2048 Game Implementation using Deep Reinforcement Learning (Training Data Creation)

By Brandon Oppong-Antwi



---


Important Notes on Installations:

Python Version: Python 3.8.3

Tensorflow Version: Tensorflow 2.3.0

Operating System: Windows 10 

It is important to install the correct versions of TensorFlow and Python to get GPU support when running this program. Additional installation and GPU support instructions can be found at: https://www.tensorflow.org/install/gpu



---



Note: For a quick test run that still produces usable data, you can set M = 1000. I used M = 200001, which was enough for the agent to reach the 2048 tile. That run took more than two days to complete on my Lenovo.


### Required Imported Libraries

---



In [None]:
import tensorflow.compat.v1 as tf #TF1-style API (placeholders, sessions) running on TensorFlow 2.x
tf.disable_v2_behavior()
import numpy as np
from copy import deepcopy #keep memory for dictionary of objects
import matplotlib.pyplot as plt
import random
import math
from google.colab import files #only available inside Google Colab; remove this import when running locally

Instructions for updating:
non-resource variables are not supported in the long term


### Game Theory and Logic

For those who have not played, 2048 is played on a 4 x 4 grid. The goal is to slide the tiles so that equal tiles collide and merge until a tile with the value 2048 appears.

The player can move in four directions (up, down, left, right). After every move a new tile appears in a random empty cell: it is a 2 or a 4, with probabilities of 0.90 and 0.10 respectively. A move is legal if at least one tile can slide into an empty cell or if two equal tiles can merge in the chosen direction. The game ends when the player has no legal moves left.
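
The rules above can be illustrated on a single row. The following is a minimal sketch, not the notebook's implementation: the helper names `slide_row_left` and `is_legal` are hypothetical, and it only handles a left move on one row.

```python
def slide_row_left(row):
    """Slide one row to the left and merge equal neighbours once per move."""
    tiles = [v for v in row if v != 0]        # drop the empty cells
    merged, gained, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)       # two equal tiles become one
            gained += tiles[i] * 2            # the merged value is added to the score
            i += 2                            # a merged tile cannot merge again
        else:
            merged.append(tiles[i])
            i += 1
    merged += [0] * (len(row) - len(merged))  # pad back to the full width
    return merged, gained


def is_legal(row):
    """A left move on this row is legal only if it changes the row."""
    return slide_row_left(row)[0] != list(row)


print(slide_row_left([2, 2, 4, 0]))   # ([4, 4, 0, 0], 4)
print(slide_row_left([2, 2, 2, 2]))   # ([4, 4, 0, 0], 8)
print(is_legal([2, 4, 8, 16]))        # False
```

A full-board move is legal when at least one of its rows (or columns, for vertical moves) changes.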

In [None]:
#Initialize a new game: newGame only creates the empty n x n grid. The first tiles are added afterwards with addTwo.

def newGame(n): #new game
    matrix = np.zeros([n,n]) #grid will be represented in the form a matrix
    return matrix

#Add a 2 or a 4 to a random empty cell. Called twice at the start of each game and once after every move.
def addTwo(mat): 
    emptyTiles = []
    for i in range(len(mat)):
        for j in range(len(mat[0])):
            if(mat[i][j]==0):
                emptyTiles.append((i,j)) #record the coordinates of each empty cell
    if(len(emptyTiles)==0):
        return mat
    
    indexPair = random.choice(emptyTiles) #pick one empty cell uniformly at random
    
    probability = random.random() #90% chance of getting a 2 and 10% chance of getting a 4
    if(probability>=0.9):
        mat[indexPair[0]][indexPair[1]]=4
    else:
        mat[indexPair[0]][indexPair[1]]=2
    return mat

#check the state of the game and where the game is currently at
def gameState(mat):
    #if 2048 in mat:
    #    return 'win'
    
    for i in range(len(mat)-1): #skip the last row and column here; they are checked separately below
        for j in range(len(mat[0])-1): #compare each cell with the cell below it and the cell to its right
            if mat[i][j]==mat[i+1][j] or mat[i][j+1]==mat[i][j]:
                return 'not done'
            
    for i in range(len(mat)): #check for any zero entries
        for j in range(len(mat[0])):
            if mat[i][j]==0:
                return 'not done'
            
    for k in range(len(mat)-1): #to check the left/right entries on the last row
        if mat[len(mat)-1][k]==mat[len(mat)-1][k+1]:
            return 'not done'
        
    for j in range(len(mat)-1): #check up/down entries on last column
        if mat[j][len(mat)-1]==mat[j+1][len(mat)-1]:
            return 'not done'
        
    return 'game over' #no empty cell and no equal neighbours, so no legal move is left


#Game Functionality

def transpose(mat): #Swap rows and columns so that vertical moves can reuse the horizontal logic
    return np.transpose(mat)

def reverse(mat): #Mirror each row so that a move to the right can reuse the move-left logic
    new=[]
    for i in range(len(mat)):
        new.append([])
        for j in range(len(mat[0])):
            new[i].append(mat[i][len(mat[0])-j-1])
    return new

def coverUp(mat): #Slide all non-zero tiles of each row to the left; done is True if any tile moved
    new = [[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]]
    done = False
    for i in range(4):
        count = 0
        for j in range(4):
            if mat[i][j]!=0:
                new[i][count] = mat[i][j]
                if j!=count:
                    done=True
                count+=1
    return (new,done)


def merge(mat): #Merge equal neighbouring tiles in each row; returns the board, whether anything merged, and the score gained
    done=False
    score = 0
    for i in range(4):
        for j in range(3):
            if mat[i][j]==mat[i][j+1] and mat[i][j]!=0:
                mat[i][j]*=2
                score += mat[i][j]   
                mat[i][j+1]=0
                done=True
    return (mat,done,score)

##### Game Controls
*   The four movement functions (up, down, left, right), built from the game functionality helpers above
*   A function that counts the empty cells, which is used in the reward
*   A function that converts the board values into the input format of the network

In [None]:


#Movement for the up direction
def up(game):
        game = transpose(game)
        game,done = coverUp(game)
        temp = merge(game)
        game = temp[0]
        done = done or temp[1]
        game = coverUp(game)[0]
        game = transpose(game)
        return (game,done,temp[2])

#Movement for the down direction
def down(game):
        game=reverse(transpose(game))
        game,done=coverUp(game)
        temp=merge(game)
        game=temp[0]
        done=done or temp[1]
        game=coverUp(game)[0]
        game=transpose(reverse(game))
        return (game,done,temp[2])

#Movement for the left direction
def left(game):
        game,done=coverUp(game)
        temp=merge(game)
        game=temp[0]
        done=done or temp[1]
        game=coverUp(game)[0]
        return (game,done,temp[2])

#Movement for the right direction
def right(game):
        game=reverse(game)
        game,done=coverUp(game)
        temp=merge(game)
        game=temp[0]
        done=done or temp[1]
        game=coverUp(game)[0]
        game=reverse(game)
        return (game,done,temp[2])

In [None]:
controls = {0:up,1:left,2:right,3:down}

In [None]:
#One-hot encode the board: cell (i,j) gets a 1 in channel k when it holds the tile 2**k; channel 0 marks an empty cell.
def changeValues(X):
    powerMatrix = np.zeros(shape=(1,4,4,16),dtype=np.float32)
    for i in range(4):
        for j in range(4):
            if(X[i][j]==0):
                powerMatrix[0][i][j][0] = 1.0
            else:
                power = int(math.log2(X[i][j])) #log2 is exact for powers of two
                powerMatrix[0][i][j][power] = 1.0
    return powerMatrix        

#count the empty cells that remain in the game matrix.
def findemptyCell(mat):
    count = 0
    for i in range(len(mat)):
        for j in range(len(mat)):
            if(mat[i][j]==0):
                count+=1
    return count
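
To make the encoding concrete, here is a small sketch of what happens to a single tile. It assumes the same 16 channels as above; the helper name `one_hot_tile` is hypothetical and it works on plain lists rather than NumPy arrays.

```python
import math


def one_hot_tile(value, depth=16):
    """Return a one-hot vector of length `depth` for a single tile value."""
    vec = [0.0] * depth
    # An empty cell uses channel 0; a tile holding 2**k uses channel k.
    index = 0 if value == 0 else int(math.log2(value))
    vec[index] = 1.0
    return vec


for tile in (0, 2, 4, 2048):
    print(tile, one_hot_tile(tile).index(1.0))
```

With 16 channels the largest tile that can be represented is 2**15 = 32768.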

Hyper Parameters used for training data

Hyper-parameterization: Hyperparameters are set before training and directly control how training behaves, so tuning them plays a large role in getting good results from the algorithm. The values I chose are listed below. The two most important are the learning rate, which affects how fast the network learns, and the network size, which determines how many layers and units the network has.


In [None]:
#Learning rate - can be tuned, but this value is small enough not to overshoot and still needs a reasonable number of epochs.
learningRate = 0.0005

#gamma for Q-learning - a number in [0,1] used to discount future rewards relative to immediate ones
gamma = 0.9

#epsilon greedy approach - initial probability of playing a random move instead of the predicted best move
epsilon = 0.9

#to store states and labels of the game for training
#states of the game
replayMemory = list()

#labels of the states
replayLabels = list()

#capacity of memory
mem_capacity = 6000
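
The learning rate above is only the starting value: the optimizer cell later wraps it in `tf.train.exponential_decay` with 1000 decay steps, a decay rate of 0.90 and `staircase=True`. The sketch below reproduces that schedule in plain Python to show its effect; the function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(step, base_rate=0.0005, decay_steps=1000, decay_rate=0.90):
    """Staircase exponential decay: multiply by `decay_rate` once every `decay_steps` steps."""
    return base_rate * decay_rate ** (step // decay_steps)


for step in (0, 999, 1000, 5000):
    print(step, decayed_learning_rate(step))
```

The rate stays at 0.0005 for the first 1000 optimizer steps and then drops by 10% at every further multiple of 1000.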

Network Architecture

---





Convolutional Neural Network Architecture

In this section we use forward propagation through our policy network, which controls the actions in 2048. For each board the network outputs one score per move, and exploration follows an ϵ-greedy algorithm: with probability ϵ the agent plays a random legal move, otherwise it plays the legal move with the highest score.
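
The ϵ-greedy selection can be sketched in a few lines. This is an illustration under simplifying assumptions, not the training loop itself: the function names are hypothetical, the Q-values are made up, and the decay factor 1.005 is the one used later in the training cell.

```python
import random


def epsilon_greedy(q_values, legal_moves, epsilon, rng=random):
    """Pick a random legal move with probability epsilon, else the best legal move."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)                        # explore
    return max(legal_moves, key=lambda move: q_values[move])  # exploit


def decay_epsilon(epsilon, factor=1.005):
    """Shrink epsilon slightly so the agent explores less over time."""
    return epsilon / factor


q_values = [0.2, 1.5, -0.3, 0.9]   # illustrative scores for up, left, right, down
print(epsilon_greedy(q_values, legal_moves=[0, 2, 3], epsilon=0.0))  # 3
```

In the example, move 1 has the highest score but is not legal, so the greedy choice falls back to move 3.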


![network architecture](https://github.com/navjindervirdee/2048-deep-reinforcement-learning/raw/master/Architecture/Architecture.JPG?raw=true)




In [None]:
#first convolution layer depth - using more hidden units than inputs tends to give better results on a number of tasks
firstDepth = 128

#second convolution layer depth - CNNs tend to perform better as layers are added; for this model a 3-layer network with
#two hidden layers worked well
secondDepth = 128

#batch size for batch gradient descent - affects the resource requirements, the speed and the number of iterations of training in a non-trivial way
batchSize = 512

#input units
input_units = 16

#fully connected layer neurons
hidden_units = 256

#output neurons = number of moves
output_units = 4
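
These depths determine the size of the flattened feature vector that reaches the fully connected layer (`expand_size` in the model cell). The sketch below recomputes that size from the shapes, assuming the 1x2 and 2x1 kernels, stride 1 and 'VALID' padding used in the model cell; the helper name `valid_conv_shape` is hypothetical.

```python
def valid_conv_shape(height, width, kernel_h, kernel_w):
    """Output height and width of a stride-1 convolution with 'VALID' padding."""
    return height - kernel_h + 1, width - kernel_w + 1


first_depth = second_depth = 128
kernels = [(1, 2), (2, 1)]          # the two kernel shapes used in each layer

flat_size = 0
for k1 in kernels:
    h1, w1 = valid_conv_shape(4, 4, *k1)
    flat_size += h1 * w1 * first_depth        # flattened layer-1 activation
    for k2 in kernels:
        h2, w2 = valid_conv_shape(h1, w1, *k2)
        flat_size += h2 * w2 * second_depth   # flattened layer-2 activation

print(flat_size)  # 7424
```

The total matches `2*4*secondDepth*2 + 3*3*secondDepth*2 + 4*3*firstDepth*2` with both depths set to 128.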

Activation Function and Optimizers


*   Activation - ReLU
  
    Following each of the two hidden layers, we use the ReLU activation function h : x → max(0, x), which helps avoid vanishing gradients.

![picture](https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcS8qyQ46SFPNxoXPQWJKJ8fMA7rnSA3jWOQog&usqp=CAU)


*  Optimizer - RMSProp

    This optimizer is a gradient-based technique that adapts the step size for each weight. It divides every gradient by a running average of its recent magnitude, which decreases the step for large gradients to avoid exploding and increases the step for small gradients to avoid vanishing.

![picture](https://miro.medium.com/max/1240/1*Y2KPVGrVX9MQkeI8Yjy59Q.gif)
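
Both ideas fit in a few lines of plain Python. This is a textbook sketch for a single scalar weight, not TensorFlow's implementation: the function names are hypothetical, and the `decay` and `eps` values are assumed defaults that may differ in detail from what `tf.train.RMSPropOptimizer` uses.

```python
import math


def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def rmsprop_step(weight, grad, cache, lr=0.0005, decay=0.9, eps=1e-10):
    """One textbook RMSProp update for a single scalar weight."""
    cache = decay * cache + (1.0 - decay) * grad * grad   # running average of grad^2
    weight -= lr * grad / (math.sqrt(cache) + eps)        # step normalised by gradient size
    return weight, cache


for grad in (100.0, 0.01):
    new_weight, _ = rmsprop_step(weight=1.0, grad=grad, cache=0.0)
    print(grad, 1.0 - new_weight)
```

The two printed step sizes are nearly identical even though the gradients differ by a factor of 10,000, which is the normalising effect described above.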




In [None]:
#input data
tfBatchDataset = tf.placeholder(tf.float32,shape=(batchSize,4,4,16))
tfBatchLabels  = tf.placeholder(tf.float32,shape=(batchSize,output_units))

datasetSingle   = tf.placeholder(tf.float32,shape=(1,4,4,16))


#CONV LAYERS
#conv layer1 weights
conv1_layer1_weights = tf.Variable(tf.truncated_normal([1,2,input_units,firstDepth],mean=0,stddev=0.01))
conv2_layer1_weights = tf.Variable(tf.truncated_normal([2,1,input_units,firstDepth],mean=0,stddev=0.01))

#conv layer2 weights
conv1_layer2_weights = tf.Variable(tf.truncated_normal([1,2,firstDepth,secondDepth],mean=0,stddev=0.01))
conv2_layer2_weights = tf.Variable(tf.truncated_normal([2,1,firstDepth,secondDepth],mean=0,stddev=0.01))



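#FULLY CONNECTED LAYERS
#size of the concatenated, flattened activations: four layer-2 maps (4x2, 2x4 and two 3x3) plus two layer-1 maps (4x3 and 3x4)
expand_size = 2*4*secondDepth*2 + 3*3*secondDepth*2 + 4*3*firstDepth*2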
fc_layer1_weights = tf.Variable(tf.truncated_normal([expand_size,hidden_units],mean=0,stddev=0.01))
fc_layer1_biases = tf.Variable(tf.truncated_normal([1,hidden_units],mean=0,stddev=0.01))
fc_layer2_weights = tf.Variable(tf.truncated_normal([hidden_units,output_units],mean=0,stddev=0.01))
fc_layer2_biases = tf.Variable(tf.truncated_normal([1,output_units],mean=0,stddev=0.01))


#model
def model(dataset):
    #layer1
    conv1 = tf.nn.conv2d(dataset,conv1_layer1_weights,[1,1,1,1],padding='VALID') 
    conv2 = tf.nn.conv2d(dataset,conv2_layer1_weights,[1,1,1,1],padding='VALID') 
    
    #layer1 relu activation
    relu1 = tf.nn.relu(conv1)
    relu2 = tf.nn.relu(conv2)
    
    #layer2
    conv11 = tf.nn.conv2d(relu1,conv1_layer2_weights,[1,1,1,1],padding='VALID') 
    conv12 = tf.nn.conv2d(relu1,conv2_layer2_weights,[1,1,1,1],padding='VALID') 

    conv21 = tf.nn.conv2d(relu2,conv1_layer2_weights,[1,1,1,1],padding='VALID') 
    conv22 = tf.nn.conv2d(relu2,conv2_layer2_weights,[1,1,1,1],padding='VALID') 

    #layer2 relu activation
    relu11 = tf.nn.relu(conv11)
    relu12 = tf.nn.relu(conv12)
    relu21 = tf.nn.relu(conv21)
    relu22 = tf.nn.relu(conv22)
    
    #get shapes of all activations
    shape1 = relu1.get_shape().as_list()
    shape2 = relu2.get_shape().as_list()
    
    shape11 = relu11.get_shape().as_list()
    shape12 = relu12.get_shape().as_list()
    shape21 = relu21.get_shape().as_list()
    shape22 = relu22.get_shape().as_list()

    #expansion
    hidden1 = tf.reshape(relu1,[shape1[0],shape1[1]*shape1[2]*shape1[3]])
    hidden2 = tf.reshape(relu2,[shape2[0],shape2[1]*shape2[2]*shape2[3]])
    
    hidden11 = tf.reshape(relu11,[shape11[0],shape11[1]*shape11[2]*shape11[3]])
    hidden12 = tf.reshape(relu12,[shape12[0],shape12[1]*shape12[2]*shape12[3]])
    hidden21 = tf.reshape(relu21,[shape21[0],shape21[1]*shape21[2]*shape21[3]])
    hidden22 = tf.reshape(relu22,[shape22[0],shape22[1]*shape22[2]*shape22[3]])

    #concatenation
    hidden = tf.concat([hidden1,hidden2,hidden11,hidden12,hidden21,hidden22],axis=1)

    #fully connected layers
    hidden = tf.matmul(hidden,fc_layer1_weights) + fc_layer1_biases
    hidden = tf.nn.relu(hidden)

    #output layer
    output = tf.matmul(hidden,fc_layer2_weights) + fc_layer2_biases
    
    #return output
    return output

#for single example
single_output = model(datasetSingle)

#for batch data
logits = model(tfBatchDataset)

#loss: half the batch mean of the summed squared error between the labels and the predicted Q-values
loss = tf.square(tf.subtract(tfBatchLabels,logits))
loss = tf.reduce_sum(loss,axis=1,keepdims=True)
loss = tf.reduce_mean(loss)/2.0

#optimizer
global_step = tf.Variable(0)  # count the number of steps taken.
learning_rate = tf.train.exponential_decay(float(learningRate), global_step, 1000, 0.90, staircase=True)
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss, global_step=global_step)

Instructions for updating:
keep_dims is deprecated, use keepdims instead
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


In [None]:
#loss
J = []

#scores
scores = []

#to store final parameters
final_parameters = {}

#number of episodes - one episode is a sequence of states, actions and rewards that ends in a terminal state, i.e. one complete game.
#A large M matters because training needs to see a wide variety of board positions
M = 100

### Create Training Data Set

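The reward function is as follows:

reward = number of merges + 0.1 * log(new max, 2)

The log term is only added when the move creates a new maximum tile. The training label for the chosen move is this reward plus gamma times the largest Q-value predicted for the next state.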
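
As a worked example, the label that the training loop assigns to the chosen move can be written as one small function. This is a sketch that mirrors the arithmetic in the training cell (including the 0.1 scaling of the log term used there); the function name `q_target` and the sample numbers are hypothetical.

```python
import math


def q_target(prev_max, next_max, empty_before, empty_after, next_q_values, gamma=0.9):
    """Training label for the chosen move: reward plus the discounted best next Q-value."""
    # Bonus only when the move produced a new largest tile.
    max_bonus = 0.1 * math.log2(next_max) if next_max != prev_max else 0.0
    # Each merge frees exactly one cell, so the gain in empty cells counts the merges.
    merges = empty_after - empty_before
    return max_bonus + merges + gamma * max(next_q_values)


print(q_target(prev_max=64, next_max=128, empty_before=5, empty_after=6,
               next_q_values=[0.5, 1.0, -0.2, 0.3]))
```

Here one merge creates a new 128 tile, so the label is 0.1 * 7 + 1 + 0.9 * 1.0, which is about 2.6.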

In [None]:
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    
    global epsilon
    global replayLabels
    global replayMemory

    #for episode with max score
    maximum = -1
    episode = -1
    
    #total_iters 
    total_iters = 1
    
    #number of back props
    back=0
    
    for ep in range(M):
        global board
        board = newGame(4)
        addTwo(board)
        addTwo(board)
        
        #whether episode finished or not
        finish = 'not done'
        
        #total_score of this episode
        total_score = 0
        
        #iters per episode
        local_iters = 1
        
        while(finish=='not done'):
            prev_board = deepcopy(board)
            
            #gets the required move for this state
            state = deepcopy(board)
            state = changeValues(state)
            state = np.array(state,dtype = np.float32).reshape(1,4,4,16)
            feed_dict = {datasetSingle:state}
            control_scores = session.run(single_output,feed_dict=feed_dict)
            
            #find the move with max Q value
            control_buttons = np.flip(np.argsort(control_scores),axis=1)
            
            #copy the Q-values as labels
            labels = deepcopy(control_scores[0])
            
            #generate random number for epsilon greedy approach
            num = random.uniform(0,1)
            
            #store prev max
            prev_max = np.max(prev_board)
            
            #if num is less than epsilon, play a random legal move
            if(num<epsilon):
                #find legal moves
                legal_moves = list()
                for i in range(4):
                    temp_board = deepcopy(prev_board)
                    temp_board,_,_ = controls[i](temp_board)
                    if(np.array_equal(temp_board,prev_board)):
                        continue
                    else:
                        legal_moves.append(i)
                if(len(legal_moves)==0):
                    finish = 'lose'
                    continue
                
                #generate random move.
                con = random.sample(legal_moves,1)[0]
                
                #apply the move
                temp_state = deepcopy(prev_board)
                temp_state,_,score = controls[con](temp_state)
                total_score += score
                finish = gameState(temp_state)
                
                #get number of merges
                empty1 = findemptyCell(prev_board)
                empty2 = findemptyCell(temp_state)
                
                if(finish=='not done'):
                    temp_state = addTwo(temp_state)

                board = deepcopy(temp_state)

                #get next max after applying the move
                next_max = np.max(temp_state)
                
                #reward math.log(next_max,2)*0.1 if next_max is higher than prev max
                labels[con] = (math.log(next_max,2))*0.1
                
                if(next_max==prev_max):
                    labels[con] = 0
                
                #reward is also the number of merges
                labels[con] += (empty2-empty1)
                
                #get the next state max Q-value
                temp_state = changeValues(temp_state)
                temp_state = np.array(temp_state,dtype = np.float32).reshape(1,4,4,16)
                feed_dict = {datasetSingle:temp_state}
                temp_scores = session.run(single_output,feed_dict=feed_dict)
                    
                max_qvalue = np.max(temp_scores)
                
                #final labels add gamma*max_qvalue
                labels[con] = (labels[con] + gamma*max_qvalue)
            
            #Generates the maximum predicted move
            else:
                for con in control_buttons[0]:
                    prev_state = deepcopy(prev_board)
                    
                    #apply the legal move with the highest Q-value
                    temp_state,_,score = controls[con](prev_state)
                    
                    #if illegal move label = 0
                    if(np.array_equal(prev_board,temp_state)):
                        labels[con] = 0
                        continue
                        
                    #Calculates the number of merges the computer took
                    empty1 = findemptyCell(prev_board)
                    empty2 = findemptyCell(temp_state)

                    
                    temp_state = addTwo(temp_state)
                    board = deepcopy(temp_state)
                    total_score += score

                    next_max = np.max(temp_state)
                    
                    #Reward process
                    labels[con] = (math.log(next_max,2))*0.1
                    if(next_max==prev_max):
                        labels[con] = 0
                    
                    labels[con] += (empty2-empty1)

                    #get next max qvalue
                    temp_state = changeValues(temp_state)
                    temp_state = np.array(temp_state,dtype = np.float32).reshape(1,4,4,16)
                    feed_dict = {datasetSingle:temp_state}
                    temp_scores = session.run(single_output,feed_dict=feed_dict)

                    max_qvalue = np.max(temp_scores)

                    #final labels
                    labels[con] = (labels[con] + gamma*max_qvalue)
                    break
                    
                if(np.array_equal(prev_board,board)):
                    finish = 'lose'
            
            #decrease the epsilon value
            if((ep>10000) or (epsilon>0.1 and total_iters%2500==0)):
                epsilon = epsilon/1.005
                
           
            #change the matrix values and store them in memory
            prev_state = deepcopy(prev_board)
            prev_state = changeValues(prev_state)
            prev_state = np.array(prev_state,dtype=np.float32).reshape(1,4,4,16)
            replayLabels.append(labels)
            replayMemory.append(prev_state)
            
            
            #back-propagation
            if(len(replayMemory)>=mem_capacity):
                back_loss = 0
                batch_num = 0
                z = list(zip(replayMemory,replayLabels))
                np.random.shuffle(z) #one in-place shuffle is enough to randomise the batch order
                replayMemory,replayLabels = zip(*z)
                
                for i in range(0,len(replayMemory),batchSize):
                    if(i + batchSize>len(replayMemory)):
                        break
                        
                    batch_data = deepcopy(replayMemory[i:i+batchSize])
                    batch_labels = deepcopy(replayLabels[i:i+batchSize])
                    
                    batch_data = np.array(batch_data,dtype=np.float32).reshape(batchSize,4,4,16)
                    batch_labels = np.array(batch_labels,dtype=np.float32).reshape(batchSize,output_units)
                
                    feed_dict = {tfBatchDataset: batch_data, tfBatchLabels: batch_labels}
                    _,l = session.run([optimizer,loss],feed_dict=feed_dict)
                    back_loss += l 
                    
                    print("Mini-Batch - {} Back-Prop : {}, Loss : {}".format(batch_num,back,l))
                    batch_num +=1
                back_loss /= batch_num
                J.append(back_loss)
                
                #store the parameters in a dictionary
                #The convolutional layers have no biases: I removed them from the architecture because they negatively affected my final output
                final_parameters['conv1_layer1_weights'] = session.run(conv1_layer1_weights)
                final_parameters['conv1_layer2_weights'] = session.run(conv1_layer2_weights)
                final_parameters['conv2_layer1_weights'] = session.run(conv2_layer1_weights)
                final_parameters['conv2_layer2_weights'] = session.run(conv2_layer2_weights)
                final_parameters['fc_layer1_weights'] = session.run(fc_layer1_weights)
                final_parameters['fc_layer2_weights'] = session.run(fc_layer2_weights)
                final_parameters['fc_layer1_biases'] = session.run(fc_layer1_biases)
                final_parameters['fc_layer2_biases'] = session.run(fc_layer2_biases)
                
                #number of back-props
                back+=1
                
                #make new memory 
                replayMemory = list()
                replayLabels = list()
                
            
            if(local_iters%400==0):
                print("Episode : {}, Score : {}, Iters : {}, Finish : {}".format(ep,total_score,local_iters,finish))
            
            local_iters += 1
            total_iters += 1
            
        scores.append(total_score)
        print("Episode {} finished with score {}, result : {} board : {}, epsilon  : {}, learning rate : {} ".format(ep,total_score,finish,board,epsilon,session.run(learning_rate)))
        print()
        
        if((ep+1)%1000==0):
            print("Maximum Score : {} ,Episode : {}".format(maximum,episode))    
            if(len(J)>0): #no loss is recorded until the first back-propagation pass
                print("Loss : {}".format(J[-1]))
            print()
            
        if(maximum<total_score):
            maximum = total_score
            episode = ep
    print("Maximum Score : {} ,Episode : {}".format(maximum,episode))     

Initialized
Episode 0 finished with score 1364.0, result : lose board : [[  2.   4.   2.   4.]
 [  4.  16. 128.  16.]
 [  8.  64.  16.   4.]
 [  4.  32.   8.   2.]], epsilon  : 0.9, learning rate : 0.0005000000237487257 

Episode 1 finished with score 2480.0, result : lose board : [[2, 4.0, 8, 2], [4.0, 2.0, 64.0, 16.0], [8.0, 64.0, 16.0, 2.0], [4.0, 256.0, 2.0, 4.0]], epsilon  : 0.9, learning rate : 0.0005000000237487257 

Episode 2 finished with score 1152.0, result : lose board : [[  2.   8.  32.   4.]
 [  4.  16. 128.  16.]
 [  8.  32.  16.   8.]
 [  4.   2.   8.   2.]], epsilon  : 0.9, learning rate : 0.0005000000237487257 

Episode 3 finished with score 1656.0, result : lose board : [[  2.   4.  64.   2.]
 [ 16.   8.  32.   8.]
 [  4. 128.   8.  64.]
 [  2.   8.  16.   4.]], epsilon  : 0.9, learning rate : 0.0005000000237487257 

Episode 4 finished with score 660.0, result : lose board : [[ 2.  8.  2.  8.]
 [ 4.  2.  8. 64.]
 [ 2.  4. 32.  8.]
 [ 4.  2. 16. 32.]], epsilon  : 0.9,

**Store the training weight data in a file**

Make sure the folder given in path exists on your computer before running this cell. These are the files that the gameplay notebook reads to load the trained weights.

In [None]:
import os

path = r'C:\Users\brandono\2048-deep-reinforcement-learning\Weights'
weights = ['conv1_layer1_weights','conv1_layer2_weights','conv2_layer1_weights','conv2_layer2_weights','fc_layer1_weights','fc_layer1_biases','fc_layer2_weights','fc_layer2_biases']
for w in weights:
    flatten = final_parameters[w].reshape(-1,1)
    #one CSV file per parameter tensor; the with-block closes the file even if a write fails
    with open(os.path.join(path, w + '.csv'),'w') as f:
        f.write('Sno,Weight\n')
        for i in range(flatten.shape[0]):
            f.write(str(i) +',' +str(flatten[i][0])+'\n')
    print(w + " written!")

conv1_layer1_weights written!
conv1_layer2_weights written!
conv2_layer1_weights written!
conv2_layer2_weights written!
fc_layer1_weights written!
fc_layer1_biases written!
fc_layer2_weights written!
fc_layer2_biases written!
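
Reading the weights back is the reverse of the loop above. The following is a minimal round-trip sketch using only the standard library; the helper names, the file name `demo_weights.csv` and the sample values are hypothetical, and only the `Sno,Weight` column layout is taken from the cell above.

```python
import csv
import os
import tempfile


def write_weights(path, values):
    """Write a flat list of weights in the 'Sno,Weight' layout."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sno", "Weight"])
        for index, value in enumerate(values):
            writer.writerow([index, value])


def read_weights(path):
    """Read the weights back as a flat list of floats, in file order."""
    with open(path, newline="") as handle:
        return [float(row["Weight"]) for row in csv.DictReader(handle)]


# Round trip inside a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo_weights.csv")
    write_weights(demo_path, [0.25, -1.5, 3.0])
    restored = read_weights(demo_path)

print(restored)  # [0.25, -1.5, 3.0]
```

Because each tensor was flattened with `reshape(-1,1)`, the flat list has to be reshaped to the original shape of that layer (for example `[1,2,16,128]` for `conv1_layer1_weights`) before it can be used in the network.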


Works Cited and Additional Resources

Amar, J., & Dedieu, A. (n.d.). Deep Reinforcement Learning for 2048. Retrieved August 6, 2020, from http://www.mit.edu/~amarj/files/2048.pdf


Science Institute, W. (n.d.). 2Deep Learning and FeedForward Networks (Simply Explained) .pptx. Retrieved August 07, 2020, from https://docs.google.com/presentation/d/e/2PACX-1vTLeO6zMBVQYICIoQLvVEf2IPJJ31H4vc15nCbuTeSDYib6bLpRCxo5I0iPD8yA5A/pub?start=false


Adaptation for code mainly from: Navjinder Virdee (Jun 10, 2018). 2048-deep-reinforcement-learning (Old Code) [Conv-2048.ipynb]. https://github.com/navjindervirdee/2048-deep-reinforcement-learning/blob/master/Code/Old%20Code/Conv-2048.ipynb.


Tjwei (Director). (n.d.). [Video file]. Retrieved August 09, 2020, from https://github.com/tjwei/2048-NN


