# Final Project - 2048 Solver with Expectimax Algorithm

## Alexandra Pukhova

### Rules of 2048

2048 is a single-player game, normally played on a 4x4 board. At the start, two of the 16 cells are occupied by tiles with a value of either 2 or 4 (i.e. two 2s, two 4s, or one of each). On each turn, the player slides all existing tiles across the board in one direction. Once a direction is picked, every tile moves that way until it reaches an obstacle, i.e. another tile or the edge of the board. If two tiles of the same value collide, they merge into a single tile whose value is the sum of the two. After the player's move, a new tile with a value of either 2 or 4 appears in a random empty cell. Once a tile reaches the value 2048, the player wins and can either stop or continue playing in an attempt to reach a higher value. In this implementation of the game, 2048 is the terminal state. Thus, the game ends when there are no possible slides left or when the terminal state is reached.
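
The sliding and merging rule can be illustrated on a single row. The following is a minimal standalone sketch, separate from the notebook's own `merge` function; the helper name `slide_row_left` is hypothetical, and it assumes the usual rule that a tile can take part in at most one merge per move.

```python
def slide_row_left(row):
    """Slide one row to the left, merging each tile at most once per move."""
    tiles = [v for v in row if v != 0]   # drop the empty cells
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)  # two equal neighbours merge
            i += 2                       # skip the tile that was absorbed
        else:
            merged.append(tiles[i])
            i += 1
    # pad with empty cells so the row keeps its length
    return merged + [0] * (len(row) - len(merged))


print(slide_row_left([2, 2, 4, 0]))  # [4, 4, 0, 0]
print(slide_row_left([2, 2, 2, 2]))  # [4, 4, 0, 0]
print(slide_row_left([4, 0, 4, 8]))  # [8, 8, 0, 0]
```

Sliding right, up, or down can be reduced to the same operation by reversing the row or by working on columns instead of rows.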

### AI Solver

The adversary is not optimal in this case: the computer places a new tile at random after every move the player makes. I therefore used expectimax search to model the unpredictable behavior of the "opponent". Compared with minimax, the min nodes are replaced by chance nodes that take "random steps". In expectimax, a maximization step over all possible moves, based on their utility, is followed by an expectation step that weights the outcomes by the probability of each new tile appearing (in this case, 90% for 2s and 10% for 4s).
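
To make the alternation of maximization and expectation steps concrete, here is a minimal sketch of expectimax on a tiny hand-built tree. The tuple encoding of the tree and the leaf utilities are assumptions made for illustration only; they are not taken from the notebook's code.

```python
def expectimax(node):
    """Evaluate a tiny game tree.

    Hypothetical encoding (for illustration only):
      number                          -> leaf utility
      ("max", [child, ...])           -> the player picks the best child
      ("chance", [(p, child), ...])   -> random tile spawn, p sums to 1
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(child) for child in children)
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind}")


# Two candidate moves; after each one, a 2 spawns with p=0.9 or a 4 with p=0.1.
tree = ("max", [
    ("chance", [(0.9, 10), (0.1, 0)]),   # 0.9*10 + 0.1*0  = 9.0
    ("chance", [(0.9, 6), (0.1, 20)]),   # 0.9*6  + 0.1*20 = 7.4
])
print(expectimax(tree))
```

The max node prefers the first move because its expected utility is higher, even though the second move has the single best outcome (20).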


One shortcoming of the expectimax algorithm is that standard alpha-beta pruning is unavailable: because the adversary is not playing optimally, the value of a chance node depends on all of its children, so none of them can be skipped. As a result, expectimax is slow, and we have to use depth-limited search to compensate.
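
A rough count shows why the depth has to be limited. The sketch below is a back-of-the-envelope upper bound, not a measurement of this implementation: it assumes up to 4 moves per max node, 2 tile values per empty cell at each chance node, and (unrealistically) a constant number of empty cells. The function name `max_leaf_count` is hypothetical.

```python
def max_leaf_count(plies, empty_cells, moves=4, tile_kinds=2):
    """Upper bound on the number of leaves after `plies` (move, spawn) pairs.

    Simplifying assumption: the number of empty cells stays constant,
    which does not hold in a real game.
    """
    branching = moves * tile_kinds * empty_cells
    return branching ** plies


for plies in (1, 2, 3):
    print(plies, max_leaf_count(plies, empty_cells=8))
# 1 64
# 2 4096
# 3 262144
```

Each additional (move, spawn) pair multiplies the work by the full branching factor, which is the exponential growth that depth-limited search keeps in check.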

### Heuristics

The following heuristics are used in this implementation of expectimax to compute the score for each configuration. A combination of those scores is then used to determine the optimal steps for the max nodes to take.

1. **Monotonicity**: This heuristic rewards tiles arranged in descending order along rows and columns. It also gives extra points to configurations where the maximum-value tile sits in the top-left corner, which is consistent with the descending-order goal.
2. **Smoothness**: This heuristic rewards small value differences in neighbouring cells.

The two heuristics are most effective when applied together: whereas the monotonicity heuristic ensures that the tiles are well-positioned for merging, i.e. organized in descending order of value, the smoothness heuristic ensures that neighbouring values are as close as possible to each other, so that merging can actually happen.
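
The complementarity of the two heuristics can be seen on small examples. The sketch below is a simplified pure-Python version of the two ideas on 2x2 boards: a smoothness penalty (sum of absolute log2 differences between neighbours) and a monotonicity score (log2 values weighted by a geometric sequence that is largest at the top-left cell). The function names and the example boards are assumptions for illustration, and the corner bonus is left out.

```python
from math import log2


def smoothness_penalty(board):
    """Sum of |log2 a - log2 b| over adjacent cells; empty cells count as 0.
    Lower means smoother."""
    logs = [[log2(v) if v else 0 for v in row] for row in board]
    n = len(logs)
    total = 0
    for r in range(n):
        for c in range(n - 1):
            total += abs(logs[r][c] - logs[r][c + 1])  # neighbours in row r
            total += abs(logs[c][r] - logs[c + 1][r])  # neighbours in column r
    return total


def monotonicity_score(board):
    """Weighted sum of log2 tile values, averaged over the occupied cells.
    Assumes at least one tile is on the board."""
    flat = [v for row in board for v in row]
    logs = [log2(v) if v else 0 for v in flat]
    weights = [2 ** k for k in range(len(flat), 0, -1)]  # 16, 8, 4, 2 for 2x2
    occupied = sum(1 for v in flat if v)
    return sum(l * w for l, w in zip(logs, weights)) / occupied


ordered = [[16, 8], [4, 2]]     # penalty 6.0,  score 24.5
reversed_ = [[2, 4], [8, 16]]   # penalty 6.0,  score 13.0
checker = [[16, 2], [2, 16]]    # penalty 12.0, score 21.0

for name, b in [("ordered", ordered), ("reversed", reversed_), ("checker", checker)]:
    print(name, smoothness_penalty(b), monotonicity_score(b))
```

Smoothness alone cannot tell the ordered board from the reversed one, and the monotonicity score alone rates the checkerboard fairly highly even though none of its neighbours can merge. Only the combination singles out the ordered board.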

In [1]:
# Make the necessary imports

import numpy as np
import random
from numpy.random import choice
import copy
import heapq
import itertools
from itertools import accumulate
import time
import operator
from operator import mul
import math
from math import log
from cachetools import cached
from IPython.display import clear_output
import cProfile

In [2]:
class Board:
    
    '''
    This is a node class. It captures the state of
    the 2048 grid and enables the user to print
    the grid state. It also has the methods that
    allow us to check whether two board configurations
    are equal, to check whether there are any moves
    possible left on the board, and to insert new tiles.
    
    Inputs
    
        n (int) Single grid dimension.
            Default: 4.
        
        state (arr) Current state of the board of dimensions n-by-n.
            Default: None.
    '''
    
    def __init__(self, n=4, state=None):
        self.n = n 
        
        # Originating the board state if none given
        if state is None:
            self.state = np.zeros((self.n,self.n),dtype = int)
            while np.count_nonzero(self.state) < 2:
                self.insert_tile()
        else:
            self.state = state
    
    def __str__(self):     
        '''Returns the grid as a string, one row per line.'''
        # __str__ must return a string; print() returns None
        return '\n'.join(str(row) for row in self.state)
    
    def equal(self, other):
        '''A method for checking whether the two board states are the same.'''
        return(np.array_equal(self.state, other.state))
    
    def can_move(self):
        
        n = len(self.state)
        truth_vals = []
        
        # Checking for a zero tile
        for row in range(n):
            for i in range(n):
                if self.state[row,i]==0:
                    return True
                else:
                    # Otherwise, check whether a neighboring tile has the same value (or is empty)
                    indices = [[0,-1],[0,1],[-1,0],[1,0]]
                    neighbour_vals = []

                    for sublist in indices:
                        new_pos = [row+sublist[0],i+sublist[1]]
                        if ((0<=new_pos[0] and new_pos[0]<=(n-1)) and (0<=new_pos[1] and new_pos[1]<=(n-1))):
                            neighbour_vals.append(self.state[row+sublist[0],i+sublist[1]])
                    if any(x==self.state[row,i] or x==0 for x in neighbour_vals):
                        truth_vals.append(True) 
                    else: truth_vals.append(False)
        return any(i for i in truth_vals)
    
    def is_terminal_state(self):
        return(2048 in self.state)
    
    def insert_tile(self):
        '''A method that randomly inserts a new tile after each move.'''
        
        empty_cells = np.argwhere(self.state == 0)
        
        # if no empty tiles left
        if len(empty_cells)==0: return
    
        # pick one empty cell uniformly at random; 2 w/ p=0.9, 4 w/ p=0.1
        row, col = empty_cells[np.random.randint(len(empty_cells))]
        self.state[row,col]=choice([2, 4],p=[0.9,0.1])

### Step & Merge functions

In [3]:
def merge(lst, direction):
    '''Slides and merges one row/column; direction > 0 means down/right.'''
    n=len(lst)
    lst = [int(a) for a in lst if a != 0] # drop the empty cells

    if len(lst)==0: # if no tiles in this row/column
        return [0]*n
    
    if direction >0: # if moving down/right, reverse lst
        lst = lst[::-1] 
    i=0
    while i<len(lst)-1:
        if lst[i]==lst[i+1]:
            lst[i]*=2 # new tile w/ 2x value
            del lst[i+1] # delete the old value
        i+=1
    lst = lst + [0]*(n-len(lst)) # pad with empty cells
    
    if direction >0:
        lst = lst[::-1] # restore the original orientation
    return lst

In [4]:
def step(current_state, direct):
    n = len(current_state)
    direction = np.asarray(direct)
    # index of the non-zero component: 0 -> up/down, 1 -> left/right
    move_axis = int(np.flatnonzero(direction)[0])
    next_state = np.zeros((n,n),dtype = int)

    if not move_axis: # if 0, move up/down
        cols = range(0,n) 
        for col in cols:
            lst = current_state[:, col]
            next_state[:, col] = merge(lst, direction[move_axis])
    
    else: # move left/right
        rows = range(0,n)
        for row in rows:
            lst = current_state[row, :]
            next_state[row, :] = merge(lst, direction[move_axis])

    return(next_state)

### Expectimax

In [5]:
DEPTH_LIMIT = 2 
# The total depth = 2 x DEPTH_LIMIT, since both the max and exp steps are executed twice.
# In this case, the depth is 4. 

def max_value(state, curr_depth=0):
    '''The maximization step.'''
    val = -np.inf
    
    potential_moves = {'up':[-1,0],'down':[1,0],'right':[0,1],'left':[0,-1]}
    next_states_temp = [step(state, potential_moves[direct]) for direct in potential_moves] # all next states
    
    # Exclude impossible next_states
    next_states = [k for k in next_states_temp if not np.array_equal(state,k)]
    
    curr_depth +=1
    bestmove = state
    
    if len(next_states) == 0:
        return (0, bestmove) # No legal moves left (dead end): return val = 0 instead of -inf
        
    for next_state in next_states:
        candidatevalue, pos_state = exp_value(next_state,curr_depth)
        if candidatevalue > val: 
            val = candidatevalue
            bestmove = pos_state
    return (val, bestmove)

def exp_value(state, curr_depth):
    '''The expectation step.'''
    
    if curr_depth == DEPTH_LIMIT or (2048 in state):
        return (h_sum(state), state)
    
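    # Note: the sum below weights each outcome by the tile probability (0.9 / 0.1)
    # but is not divided by the number of empty cells, so it is a scaled
    # expectation that grows with the number of empty cells.
    val = 0
    zero_positions = np.argwhere(state == 0) # get zero positions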

    for pos in zero_positions:
        state_copy = copy.deepcopy(state)
        state_copy[pos[0],pos[1]]=2
        val += max_value(state_copy,curr_depth)[0]*0.9
        state_copy[pos[0],pos[1]]=4
        val += max_value(state_copy,curr_depth)[0]*0.1
    return (val, state)

In [6]:
class Game():
    '''
    This is a game class that initiates the
    board of class Board() and uses expectimax 
    algorithm until the board reaches the winning state
    or an unmovable board configuration.
    '''
    
    def __init__(self):
        self.board = Board()
        
    def play(self):
        start = time.time()
        
        while not self.board.is_terminal_state():
            
            clear_output()            
            print(self.board.state, flush=True)
            
            if not self.board.can_move():
                finish = time.time()
                print('Time taken: ', (finish-start))
                return('Game over :(')
            
            # expectimax step - returns the next state after the optimal move
            move = max_value(state=self.board.state)[1]
            
            # update the board state
            self.board.state = move
    
            # randomly insert new tile
            self.board.insert_tile()
            
        finish = time.time()
        
        print('Time taken to achieve the goal: ', (finish-start))
        print(self.board.state)
        return('You won!')

### Heuristics

In [7]:
# Using @cached decorator to memoize the outputs of the heuristics
# and optimize the run time

@cached(cache={}, key=lambda state: tuple(state.flatten()))
def h_monoton(state):
    '''Ensures that the tile values are in an ascending/descending order.'''
    lin_board = (state).flatten()
    logs = [log(i,2) if i !=0 else 0 for i in lin_board]
    gp = list(accumulate([2]*(len(lin_board)), mul))[::-1] # geometric sequence w/ common ratio 2
    
    # Compute the dot product of board values and gp.
    # I took logs with base two for the board values,
    # given the high magnitude of pure values.
    # Took the ordination idea from (Pezzotti, 2014).
    m_val = np.dot([logs], gp)/np.count_nonzero(state)
    
    max_corner = 0
    if max(lin_board) == lin_board[0]: 
        max_corner = (log(lin_board[0],2) * gp[0])
    
    corner_2 = 0
    if max_corner != 0:
        corner_2 = (log(lin_board[0],2) * gp[0])
    
    if max_corner == 0:
        max_corner += 1
    return m_val, max_corner, corner_2
    
@cached(cache={}, key=lambda state: tuple(state.flatten()))
def h_smoothness(state):
    '''Minimizes the difference between adjacent cells.'''
    
    sum_per_row = 0
    sum_per_col = 0

    for idx in range(len(state)):
        row = [log(i,2) if i!=0 else 0 for i in state[idx]]
        sum_per_row += sum(np.abs(np.diff(row)))

        col = [log(i,2) if i!=0 else 0 for i in state[:,idx]]
        sum_per_col += sum(np.abs(np.diff(col)))

    total_sum = sum_per_row + sum_per_col
    return total_sum

def h_sum(state):
    '''Sums all the heuristic values per configuration.'''
    smooth_val = h_smoothness(state)
    smoothness = 1/smooth_val if smooth_val else 0
    mon = h_monoton(state)
    
    # The weights for the sum were chosen arbitrarily,
    # based on several runs and their outcomes. In theory,
    # we could run Monte Carlo simulations to find the optimal values.
    # However, it would be very computationally expensive.
    return mon[0]*0.4 + log(mon[1]) + mon[2] + smoothness

### Testing performance

In [8]:
# Prints out the last two board states if the winning configuration is reached

g = Game()
g.play()

# cProfile.run("g.play()", sort=True)

[[1024    0    0    0]
 [1024   32    2    0]
 [  64    8    2    4]
 [   8    2   64    2]]
Time taken to achieve the goal:  29.449549198150635
[[2048   32    4    4]
 [  64    8   64    2]
 [   8    2    0    0]
 [   0    0    0    2]]


'You won!'

### Potential Improvements

* **Speed improvement**. The running time of the algorithm grows exponentially with the search depth, making even the depth 6 search considerably slower than the depth 4 one. Thus, speed-up techniques such as dynamic programming, applied in as many steps of the game as possible, would be preferable. The time complexity with DP is given by the number of unique states multiplied by the time taken to compute each unique state.

    * When I ran depth 6, the algorithm reached the winning configuration much more frequently. However, given the limited computational power (plus exponentially increasing time), I chose to keep depth 4 for game demonstration purposes. Ideally, I would compare the percentage of games that reach the winning state across a range of different depths. Such an experiment, however, would be very computationally expensive.
        

* **Reinforcement learning**. One curious improvement that I read about is reinforcement learning. This approach does not require the use of heuristics. Instead, the algorithm 'learns' the utility of certain steps over time, i.e. as more time passes since the start. The problem with that implementation is that it requires a lot of memory and calls for extra efficient sorting methods (Lui, Prost & Lee, 2017).
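
Regarding the speed improvement above, the dynamic-programming idea amounts to evaluating each unique board only once. A minimal sketch using the standard library's `functools.lru_cache` follows; the `evaluate` function is a placeholder heuristic and `as_key` is a hypothetical helper, neither of which comes from the notebook. Boards are converted to tuples of tuples because cache keys must be hashable.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def evaluate(board):
    """Placeholder heuristic (sum of log2 tile values).
    `board` must be hashable, so it is passed as a tuple of tuples."""
    return sum(log2(v) for row in board for v in row if v)


def as_key(rows):
    """Convert a list-of-lists board into a hashable tuple of tuples."""
    return tuple(tuple(row) for row in rows)


board = [[2, 4], [0, 8]]
first = evaluate(as_key(board))    # computed: 1 + 2 + 3
second = evaluate(as_key(board))   # served from the cache
print(first, second, evaluate.cache_info().hits)
```

The same pattern could in principle be applied to whole search subtrees (keyed by board and remaining depth), at the cost of the extra memory needed to store the table.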

### References

1. CS188 Spring 2014. (2014). Lecture 7: Expectimax [Video file]. Retrieved from https://www.youtube.com/watch?v=jaFRyzp7yWw


2. Lui, M. H., Prost, A., Lee, C. W. (2017). COMP3211 Final Project Report: Solving the Game ‘2048’ Using Various Machine-Learning Agents. Hong Kong University of Science and Technology. Retrieved from https://home.cse.ust.hk/~yqsong/teaching/comp3211/projects/2017Fall/G13.pdf


3. Pezzotti, N. (2014). An Artificial Intelligence for the 2048 game. Retrieved from https://nicola17.github.io/2014/03/26/an-artificial-intelligence-for-2048-game.html


4. Shoaran, M. (n.d.). Expectimax Search. University of Victoria. Retrieved from https://web.uvic.ca/~maryam/AISpring94/Slides/06_ExpectimaxSearch.pdf


5. Xiao, R. (2014). What is the optimal algorithm for the game 2048?. Stack Overflow. Retrieved from https://stackoverflow.com/a/22498940