# Blackjack AI using Monte Carlo Simulation and Bayesian Statistics

### Part 1: Blackjack Strategy using Monte Carlo Simulation


In [327]:
#Part 1: Generate a blackjack strategy using Monte Carlo simulation. For each starting hand and dealer up card we repeatedly simulate games and apply a Bayesian-style update to the estimated probability of winning with each action. The sampler below is hand-written; it borrows the idea of an ensemble of walkers from the Python package emcee rather than calling it. Each simulated result feeds back into the strategy, and the process repeats until the estimates settle. This repeated random sampling is what makes the method Monte Carlo.

#Importing Libraries
import numpy as np
import random

#Defining the deck of cards, the value of the cards and evaluating the value of the hand
"""
Each number card is worth the number on the card and each face card is worth 10. An ace is worth 11 as long as that keeps the total value of the hand at or below 21; otherwise it is worth 1. A hand in which an ace counts as 11 is called a soft hand, and a hand in which every ace counts as 1 (or that has no ace) is called a hard hand. The value of the hand is the sum of the values of the cards in the hand.
"""
#Defining the deck of cards
def deck():
    deck = []
    for i in range(1, 14):
        for j in range(4):
            if i > 10:
                deck.append(10)
            else:
                deck.append(i)
    return deck

#Defining dealer's shoe. This simulation uses a shoe of 6 decks, a common casino configuration.
def dealer_shoe():
    shoe = []
    for i in range(6):
        shoe.extend(deck())
    random.shuffle(shoe)
    return shoe

#Define the value of the hand
def value(hand: list):
    values = []
    for i in hand:
        value = 0
        aces = 0
        for card in i:
            if card == 1:
                aces += 1
                value+=11
            else:
                value += card
        while value > 21 and aces:
            value -= 10
            aces -= 1
        values.append(value)
    return values
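
As a quick check of the ace rule described above, the sketch below re-implements the valuation for a single hand and evaluates a few examples. The name `hand_value` is illustrative and not part of the notebook; aces are encoded as 1, as in `deck()`.

```python
def hand_value(hand):
    """Return the best blackjack total for one hand; aces are encoded as 1."""
    total = 0
    aces = 0
    for card in hand:
        if card == 1:
            aces += 1
            total += 11  # count every ace as 11 first (soft)
        else:
            total += card
    # Demote aces from 11 to 1, one at a time, while the hand would bust
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


print(hand_value([1, 6]))        # 17 (soft: the ace counts as 11)
print(hand_value([1, 6, 10]))    # 17 (hard: the ace is demoted to 1)
print(hand_value([1, 1]))        # 12 (one ace counts 11, the other 1)
print(hand_value([10, 10, 5]))   # 25 (bust, no ace to demote)
```

The notebook's `value` function applies the same rule to a list of hands and returns one total per hand.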



### Singular blackjack game

In [None]:
#Rules of the game
"""
Each blackjack game starts with the dealer holding one face-up card and one face-down card, while the player holds two face-up cards, so the player can see only one of the dealer's cards. The player can either hit or stand. If the player hits, the player gets another card. If the player stands, the player's turn ends. The player can hit as many times as the player wants until the hand goes over 21.
The player wins if the value of the player's hand is greater than the value of the dealer's hand and is less than or equal to 21, or if the dealer goes over 21. A hand worth 21 with only two cards is called blackjack; the player wins automatically with blackjack unless the dealer also has blackjack.
The player loses if the value of the player's hand is less than the value of the dealer's hand and the value of the dealer's hand is less than or equal to 21. The player also loses if the dealer has blackjack and the player does not, or if the value of the player's hand is greater than 21.
There is a tie if the value of the player's hand is equal to the value of the dealer's hand.
"""

#Defining the rules of the game

#check_victory compares each player hand with the dealer's hand. It sets the result to 1 if the player wins, -1 if the dealer wins and 0 if there is a tie. Busts and blackjacks are settled before this function is called, because they decide the hand regardless of the dealer's remaining cards, so a hand that has already bust keeps the result it was given.
def check_victory(player, dealer_hand,results):
    player_hand_value = player.value
    dealer_hand_value = value(dealer_hand)[0]
    
    for i,ith_player_hand_value in enumerate(player_hand_value):
        if ith_player_hand_value>21:
            #a bust hand has already lost; its high total must not count as a win
            continue
        if ith_player_hand_value>dealer_hand_value:
            results[i] = 1
        elif ith_player_hand_value<dealer_hand_value:
            results[i] = -1
        else:
            #the case there is a tie
            results[i] = 0
    return results

#Action of the player
"""
The player can have different actions, we define the actions of the player as follows:

1. Hit: The player gets another card.
2. Stand: The player does not get another card.
3. Double: The player doubles the bet and gets another card. The player can only double the bet if the player has two cards. The player cannot get another card after doubling the bet.
4. Split: The player splits the cards into two hands. The player can only split the cards if the player has two cards of the same value. The player can split the cards again if the player has two cards of the same value. The player has access to the other actions after splitting the cards.
5. Surrender: The player surrenders the game and loses half of the bet. The player can only surrender if the player has two cards. (This action is not implemented in this code)
"""

#Defining the hand class and the actions of the player

class playerHand:
    def __init__(self, hands, bets):
        self.hands = hands
        self.value = value(hands)
        self.bets = bets
    
    def hit(self, shoe, hand_index):
        self.hands[hand_index].append(shoe.pop()) #adds card to the hand_index-th hand
        self.value = value(self.hands) #updates the value of the hand
    
    def stand(self,hand_index):
        pass

    def double(self, shoe,hand_index):
        self.hands[hand_index].append(shoe.pop()) #adds card to the hand_index-th hand
        self.value = value(self.hands) #updates the value of the hand
        self.bets[hand_index] *= 2 #doubles the bet for the hand_index-th hand
    
    def split(self, shoe, hand_index): #splits the hand but does not play out either of the new hands
        if self.hands[hand_index][0] == self.hands[hand_index][1]:  # Check for identical cards
            original_hand=[self.hands[hand_index][0], shoe.pop()] 
            new_hand = [self.hands[hand_index][1], shoe.pop()]
            self.hands[hand_index] = original_hand
            self.hands.insert(hand_index+1, new_hand)
            self.value = value(self.hands)
            self.bets.insert(hand_index+1, self.bets[hand_index])

        else:
            raise ValueError("Cannot split non-identical cards") 

#Defining the game function
def blackjack_game(player_hand, dealer_hand,action):
    SHOE = dealer_shoe()
    #initialise player's bankroll
    #Unlike casino blackjack, the player starts with a bankroll of 0, because only the net result of the game matters here. To keep the bookkeeping simple, the bets are collected during play and the player's win/loss is calculated once at the end of the game.
    BANKROLL = 0
    #initialise player's bet
    #The bets are stored as a list with one entry per hand. A split adds a new entry, the result of each hand is collected in a second list, and the net win/loss is calculated from both at the end of the game.
    BETS = [1]
    

    #initialise player and dealer hands
    if player_hand in ['5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20']:
        player_value = int(player_hand)
        player_hand=[[player_value-2,2]]
    elif player_hand in ['a2','a3','a4','a5','a6','a7','a8','a9']:
        player_hand=[[1,int(player_hand[1])]]
    elif player_hand in ['aa']:
        player_hand=[[1,1]]
    elif player_hand in ['22','33','44','55','66','77','88','99']:
        player_hand=[[int(player_hand[0]),int(player_hand[1])]]
    else:
        player_hand=[[10,10]]
    
    if dealer_hand in ['2','3','4','5','6','7','8','9','10']:
        dealer_hand = [[int(dealer_hand), SHOE.pop()]]
    elif dealer_hand in ['a']:
        dealer_hand = [[1, SHOE.pop()]]

    #for debugging purposes
    # player_hand = [[10, 10]]
    # dealer_hand = [[5, SHOE.pop()]]

    #initialise player class
    player = playerHand(player_hand, BETS)

    
    #check for blackjack, the game ends if there is a blackjack
    blackjack = check_blackjack(player_hand, dealer_hand)
    if blackjack == 1: #Player blackjack, games end with player win
        win_loss = 1.5*BETS[0] #Blackjack pays 3:2, adjust the ratio according to the casino rules
        BANKROLL += win_loss
        return BANKROLL
    
    elif blackjack == -1: #Dealer blackjack, game ends with player loss
        win_loss = -BETS[0] #For dealer blackjack, the player loses the bet
        BANKROLL += win_loss
        return BANKROLL
    
    elif value(player_hand)[0] == 21 and value(dealer_hand)[0] == 21: #Player and dealer blackjack, game ends with a tie (value returns a list, so compare its first entry)
        return BANKROLL #No win/loss for player and dealer blackjack
    
    play_game(player, SHOE, 0, action)

    #check player bust
    RESULTS = [1 for i in range(len(player.hands))]
    for i,ith_player_hand_value in enumerate(player.value):
        if ith_player_hand_value>21:
            RESULTS[i] = -1

    all_bust = all(element == -1 for element in RESULTS)
    if all_bust:
        BANKROLL = sum(np.array(RESULTS)*np.array(BETS))
        return BANKROLL
    
    #dealer draws
    dealer_hand = dealer_draws(dealer_hand, SHOE)

    #check dealer bust
    if value(dealer_hand)[0] > 21:
        BANKROLL = sum(np.array(RESULTS)*np.array(BETS))
        return BANKROLL

    #check for victory
    RESULTS = check_victory(player, dealer_hand, RESULTS)
    BANKROLL = sum(np.array(RESULTS)*np.array(BETS))

    return BANKROLL

#check for blackjack
def check_blackjack(player_hand, dealer_hand):
    player_value = value(player_hand)[0]
    dealer_value = value(dealer_hand)[0]
    #check if player has blackjack
    if player_value == 21 and dealer_value != 21 :
        blackjack = 1
        return blackjack
    #check if dealer has a blackjack
    elif dealer_value == 21 and player_value != 21:
        blackjack = -1
        return blackjack
    else:
        blackjack = 0
        return blackjack


#dealer plays
def dealer_draws(dealer_hand, shoe):
    dealer_value = value(dealer_hand)[0]
    while dealer_value < 17:
        dealer_hand[0].append(shoe.pop())
        dealer_value = value(dealer_hand)[0]
    return dealer_hand

#check for bust (aces are stored as 1, so the plain sum is the lowest possible total of the hand)
def check_bust(hand):
    return sum(hand)>21

#Applying the player's action to one hand
def play_game(player, shoe, hand_index,action):
    
    
    #initialise move
    move = action
        
    if move == 'stand':
        return
    
    elif move == 'hit':
        player.hit(shoe, hand_index)

    elif move == 'double':
        player.double(shoe, hand_index)
        return

    elif move == 'split':
        player.split(shoe, hand_index)
        play_game(player, shoe, hand_index,'stand')
        play_game(player, shoe, hand_index+1,'stand')
        return
            
    # print("Player hand: ", player.hands)
    if player.value[hand_index]>21:
        return

#Creating strategy using Monte Carlo Simulation
#Creating a matrix to store the probability of winning given a certain hand
#34 rows for the player's hand
#10 columns for the dealer's up card
#4 entries per cell for the player's action (hit, stand, double, split)
strategy = np.zeros([34,10,4]) 
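
To illustrate the Monte Carlo idea behind the game functions above, here is a minimal, self-contained sketch that estimates how often the dealer busts for a given up card. The helper names (`six_deck_shoe`, `hand_total`, `dealer_bust_rate`) are hypothetical and not part of the notebook. The sketch follows the same dealer rule (draw until the total is at least 17) and, as an assumption of its own, removes the up card from the shoe before dealing.

```python
import random


def six_deck_shoe(rng):
    """Build and shuffle a six-deck shoe; face cards count 10, aces are stored as 1."""
    one_deck = [min(rank, 10) for rank in range(1, 14) for _ in range(4)]
    shoe = one_deck * 6
    rng.shuffle(shoe)
    return shoe


def hand_total(hand):
    """Best total of a hand: count one ace as 11 when that does not bust."""
    total = sum(hand)
    if 1 in hand and total + 10 <= 21:
        total += 10
    return total


def dealer_bust_rate(up_card, n_games=3000, seed=0):
    """Estimate the fraction of games in which the dealer busts."""
    rng = random.Random(seed)
    busts = 0
    for _ in range(n_games):
        shoe = six_deck_shoe(rng)
        shoe.remove(up_card)              # assumption: the up card leaves the shoe
        hand = [up_card, shoe.pop()]
        while hand_total(hand) < 17:      # dealer draws until reaching 17 or more
            hand.append(shoe.pop())
        busts += hand_total(hand) > 21
    return busts / n_games


print("bust rate with a 6 showing: ", dealer_bust_rate(6))
print("bust rate with a 10 showing:", dealer_bust_rate(10))
```

The estimates fluctuate with the seed and the number of games, but a weak up card such as 6 should show a noticeably higher bust rate than a 10.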

## Creating strategy using Monte Carlo Simulation

### Creating the functions from Bayesian Statistics

In [329]:

# Defining the probability of winning given a certain hand
# Creating a matrix to store the probability of winning given a certain hand
# 34 rows for the player's hand
# 10 columns for the dealer's up card
# 4 entries per cell for the player's action (hit, stand, double, split)
# The first 16 rows are for the player's hard hand values 5 to 20
# The next 8 rows are for the player's hand when the player has an ace, except when the player has a pair of aces
# The last 10 rows are for the player's hand when the player has a pair
rows = ['5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','a2','a3','a4','a5','a6','a7','a8','a9','22','33','44','55','66','77','88','99','1010','aa']
columns = ['2','3','4','5','6','7','8','9','10','a']

#Initialise the probability of winning given a certain hand. This is only a rough initial guess, stored as four weights per cell in the order (hit, stand, double, split).
def create_prior_strategy():
    prior_strategy = np.empty((34, 10), dtype=object)
    #initialise the probability of winning given a certain hand
    #Hard totals 5 to 9 favour hitting, 10 and 11 favour doubling, a pair of tens favours standing and a pair of aces favours splitting. The remaining hands start with equal weights for each move, and hard totals give no weight to splitting.
    for i in range(34):
        for j in range(10):
            if i < 5:
                prior_strategy[i , j]  = [0.6,0.2,0.2,0]
            elif i < 7 and i>4:
                prior_strategy[i , j]  = [0.2,0.2,0.6,0]
            elif i>6 and i<16:
                prior_strategy[i , j]  = [1/3,1/3,1/3,0]
            elif i>15 and i<32:
                prior_strategy[i, j] = [1/4,1/4,1/4,1/4]
            elif i == 32:
                prior_strategy[i, j] = [0.2,0.6,0.2,0]
            elif i == 33:
                prior_strategy[i, j] = [0.1,0.1,0.1,0.7]
    return prior_strategy


# Defining the likelihood of winning given a certain hand. This acts as a reward/penalty function: a move that suits the situation is rewarded more after a win and penalised less after a loss.

likelihood_array = np.ones((16, 4))
likelihood_array_multiplier = np.ones((16, 4))
#invalid moves
likelihood_array[0,:]= [0,0,0,0]
likelihood_array[8,:]= [0,0,0,0]
for i in range(16):
    if i%2==1:
        likelihood_array[i,3]=0


likelihood_array_multiplier = np.array([[0,0,0,0],
                                       [-0.1,0.2,-0.2,0],
                                       [-0.2,0.2,-0.3,-0.4],
                                       [-0.2,0.2,-0.3,0],
                                       [0.1,-0.1,-0.2,0.5],
                                       [0.2,-0.2,-0.3,0],
                                       [0.2,-0.1,-0.3,-0.4],
                                       [0.2,-0.1,-0.3,0],
                                       [0,0,0,0],
                                       [-0.1,0,0.1,0],
                                       [-0.3,0,-0.4,0],
                                       [-0.3,0.3,-0.4,0],
                                       [0.1,0,-0.1,0.5],
                                       [0.1,-0.1,0.3,0],
                                       [0.1,-0.1,-0.1,0.2],
                                       [-0.1,0.3,-0.3,0]]
                                       )

# Create the win and lose likelihood matrices for the different cases
win_likelihood_matrix=np.ones((16, 4))*likelihood_array + likelihood_array_multiplier
lose_likelihood_matrix=np.ones((16, 4))*likelihood_array - likelihood_array_multiplier

#Defining the likelihood function
def likelihood_function(player_hand:str,dealer_hand:str,result:int, action:str, win_prob):
    """
    This function updates and returns the win probability for one (player hand, dealer up card, action) cell, given the result of a simulated game.
    player_hand: The player's hand label from the list of rows
    dealer_hand: The dealer's up card label from the list of columns
    result: The result of the game. Positive for a player win, negative for a player loss and 0 for a push
    action: The action taken by the player: 'hit', 'stand', 'double' or 'split'
    win_prob: The array of win probabilities indexed by player hand and dealer up card, with one entry per action. It is updated in place and keeps changing until equilibrium is reached.
    """

    player_hand_index = int(rows.index(player_hand))
    dealer_hand_index = int(columns.index(dealer_hand))
    
    if player_hand in ['5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20']:
        player_hand_cards = [[int(player_hand)-2,2]]
    elif player_hand in ['a2','a3','a4','a5','a6','a7','a8','a9']:
        player_hand_cards = [[1,int(player_hand[1])]]
    elif player_hand in ['aa']:
        player_hand_cards = [[1,1]]
    elif player_hand in ['22','33','44','55','66','77','88','99']:
        player_hand_cards = [[int(player_hand[0]),int(player_hand[1])]]
    else:
        player_hand_cards = [[10,10]]
    if dealer_hand in ['2','3','4','5','6','7','8','9','10']:
        dealer_hand_cards = [[int(dealer_hand),random.randint(1,10)]]
    elif dealer_hand in ['a']:
        dealer_hand_cards = [[1,random.randint(1,10)]]

    if action == 'hit':
        action_index = int(0)
    elif action == 'stand':
        action_index = int(1)
    elif action == 'double':
        action_index = int(2)
    else:
        action_index = int(3)
    

    if result>0:
        result = 1
    elif result<0:
        result = -1
    else:
        #tie
        win_prob[player_hand_index][dealer_hand_index][action_index] = win_prob[player_hand_index][dealer_hand_index][action_index]*0.5
        return win_prob[player_hand_index][dealer_hand_index][action_index]

    if player_hand in ['22','33','44','55','66','77','88','99','1010','aa']:
        can_split = True
    else:
        can_split = False

    if player_hand in ['a2','a3','a4','a5','a6','a7','a8','a9','aa']:
        has_ace = True
    else:
        has_ace = False

    if player_hand in ['17','18','19','20','a6','a7','a8','a9','1010']:
        player_strong = True
    else:
        player_strong = False

    if dealer_hand in ['7','8','9','10','a']:
        dealer_strong = True
    else:
        dealer_strong = False


    #Each combination of the four flags selects one row of the likelihood matrices:
    #row = 8*(dealer not strong) + 4*(player not strong) + 2*(no ace) + 1*(cannot split)
    #Rows 0 and 8 (strong hand with an ace that can also be split) never occur.
    case_index = 8*(not dealer_strong) + 4*(not player_strong) + 2*(not has_ace) + (not can_split)
    current_prob = win_prob[player_hand_index][dealer_hand_index][action_index]

    if result == 1:
        updated_prob = current_prob*win_likelihood_matrix[case_index, action_index]
    else:
        updated_prob = (1-current_prob)*lose_likelihood_matrix[case_index, action_index]

    win_prob[player_hand_index][dealer_hand_index][action_index] = updated_prob
    return updated_prob
        
#Defining the posterior function

def posterior(theta,action,prior_strategy):
    player_hand, dealer_hand = theta
    player_hand_index = rows.index(player_hand)
    dealer_hand_index = columns.index(dealer_hand)

    #check validity of action
    can_split = player_hand in ['22','33','44','55','66','77','88','99','1010','aa']
    if not can_split and action == 'split':
        return 0
    
    game_outcome = blackjack_game(player_hand, dealer_hand,action)

    win_prob = likelihood_function(player_hand, dealer_hand, game_outcome, action, prior_strategy)
    
    return win_prob
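
The likelihood update above can be hard to follow, so here is a small sketch of the two ideas it relies on: how four yes/no flags select one of the 16 rows of the likelihood matrices, and how a win or a loss rescales a single probability. The function names are hypothetical, and only row 15 of the notebook's tables (dealer weak, player weak, no ace, no pair) is reproduced.

```python
# Row 15 of the notebook's tables, in the order (hit, stand, double, split)
BASE_ROW_15 = [1.0, 1.0, 1.0, 0.0]          # split is invalid without a pair
MULTIPLIER_ROW_15 = [-0.1, 0.3, -0.3, 0.0]


def case_row(dealer_strong, player_strong, has_ace, can_split):
    """Map the four flags to a row index between 0 and 15."""
    return (8 * (not dealer_strong) + 4 * (not player_strong)
            + 2 * (not has_ace) + (not can_split))


def updated_probability(prob, won, action_index):
    """Apply the win or loss update for a hand that falls in row 15."""
    base = BASE_ROW_15[action_index]
    shift = MULTIPLIER_ROW_15[action_index]
    if won:
        return prob * (base + shift)      # win likelihood = base + multiplier
    return (1 - prob) * (base - shift)    # lose likelihood = base - multiplier


print(case_row(True, True, True, False))      # 1
print(case_row(False, False, False, False))   # 15

# Standing (action index 1) with a starting probability of 1/3
print(round(updated_probability(1 / 3, True, 1), 3))    # 0.433
print(round(updated_probability(1 / 3, False, 1), 3))   # 0.467
```

Because the values are rescaled rather than normalised, the four entries of a cell are best read as relative scores and need not sum to one.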

### Metropolis-Hastings Algorithm

1. Initialise the prior for state $s$
2. Calculate the posterior for a proposed state $s'$
3. If $P(s')>P(s)$, update the state: $s=s'$
4. Otherwise, update the state $s=s'$ only if $P(s')/P(s)>r$, where $r$ is a random number such that $0 \le r \le 1$
5. Repeat steps 2-4 until equilibrium has been reached
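
The steps above can be sketched on a toy problem that is unrelated to blackjack. The example below is a generic random-walk Metropolis-Hastings sampler, not the notebook's `MCMC` function; the target density, the step size and all names are assumptions chosen for illustration.

```python
import random


def target(x):
    """Unnormalised density on (0, 1), proportional to a Beta(3, 2) density."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x ** 2 * (1.0 - x)


def metropolis_hastings(n_steps, step_size=0.2, start=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric uniform proposal."""
    rng = random.Random(seed)
    state = start
    samples = []
    for _ in range(n_steps):
        proposal = state + rng.uniform(-step_size, step_size)
        ratio = target(proposal) / target(state)
        # Always accept an uphill move; accept a downhill move with probability ratio
        if ratio >= 1.0 or rng.random() < ratio:
            state = proposal
        samples.append(state)
    return samples


samples = metropolis_hastings(20000)
kept = samples[2000:]                 # discard the early samples as burn-in
print(sum(kept) / len(kept))          # should be close to 0.6, the mean of Beta(3, 2)
```

Keeping the old state when a proposal is rejected, and recording it again, is what makes the chain spend time in each region in proportion to the target density.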

In [357]:
def MCMC (no_of_walkers,window_size,window_limit):
    """
    This function runs the MCMC simulation to generate the blackjack strategy.

    no_of_walkers: The number of walkers in the MCMC simulation, i.e. how many chains are sampled side by side for each hand. More walkers give more samples per hand but make the simulation take longer to run.

    window_size: The number of most recent samples per walker that are used when the simulation checks for equilibrium. A larger window makes the check more reliable but makes the simulation take longer to run.

    window_limit: The limit on the correlation coefficient used in the equilibrium check. The simulation for a hand stops once the absolute correlation between the windows of the first two walkers falls below this limit. A lower limit is stricter and makes the simulation take longer to run.
    """

    
    prior_strategy = create_prior_strategy()
    #single iteration of the MCMC
    #example posterior(['5','2'],'hit')
    for action_index in range(4):
        action = ['hit','stand','double','split'][action_index]

        #rows and columns are the module-level label lists defined earlier

        #run the sampler for every player hand and dealer up card
        for player_hand in rows:
            for dealer_hand in columns:
                print(player_hand, dealer_hand)
                position = [player_hand, dealer_hand]

                counter = 0
                initial_probability = posterior(position, action, prior_strategy)
                if initial_probability == 0:
                    continue
                equilibrium = False
                walkers_ensemble = [[initial_probability] for i in range(no_of_walkers)]
                
                while equilibrium == False:
                    for walker in range(no_of_walkers):
                            
                        old_probability = prior_strategy[rows.index(position[0]), columns.index(position[1])][action_index]
                        if old_probability >10:
                            equilibrium = True
                        new_probability = posterior(position, action, prior_strategy)

                        if new_probability > old_probability:
                            walkers_ensemble[walker].append(new_probability)
                            
                        else:
                            acceptance_probability = new_probability/old_probability
                            if random.random() < acceptance_probability:
                                walkers_ensemble[walker].append(new_probability)
                            else:                    
                                walkers_ensemble[walker].append(old_probability)
                                prior_strategy[rows.index(position[0]), columns.index(position[1])][action_index] = old_probability
                        
                    
                    #check for equilibrium
                    if counter > window_size:
                        window = [sublist[-window_size:] for sublist in walkers_ensemble]
                        window_corr = np.corrcoef(window)[0,1]
                        if abs(window_corr) < window_limit:
                            equilibrium = True
                    else:
                        counter += 1
            
    return prior_strategy
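
The equilibrium check inside `MCMC` compares the most recent window of samples from the first two walkers. The sketch below isolates that check so it can be tried on known inputs. The function name `window_uncorrelated` and the explicit guard for constant windows are assumptions added here, not part of the notebook.

```python
import numpy as np


def window_uncorrelated(chains, window_size, limit):
    """True if the last `window_size` samples of the first two chains have |r| < limit."""
    window = np.array([chain[-window_size:] for chain in chains[:2]], dtype=float)
    if np.any(window.std(axis=1) == 0):
        return False  # the correlation is undefined when a window is constant
    r = np.corrcoef(window)[0, 1]
    return bool(abs(r) < limit)


# Two identical trending chains are perfectly correlated
trend = list(np.linspace(0.0, 1.0, 40))
print(window_uncorrelated([trend, trend], 20, 0.1))   # False

# Two chains built to be orthogonal over any window of 20 samples
a = [1, -1] * 20          # period 2
b = [1, 1, -1, -1] * 10   # period 4
print(window_uncorrelated([a, b], 20, 0.1))           # True

# A chain that never moves cannot be tested this way
print(window_uncorrelated([[0.5] * 40, a], 20, 0.1))  # False
```

Only the first two walkers enter the check, so the remaining walkers add samples but do not influence when sampling stops.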

In [385]:
strategy = MCMC(30,20,0.1)

5 2
5 3
5 4
5 5
5 6
5 7
5 8
5 9
5 10
5 a
11
...

### Converting results into the optimal move

In [404]:
optimal_strategy = np.zeros((34,10),dtype=object)
for i in range(34):
    for j in range(10):
            max_value = max(strategy[i][j])
            print(i,j)
            k=strategy[i][j].index(max_value)
            optimal_strategy[i][j]=['hit','stand','double','split'][k]

0 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
...


In [405]:
optimal_strategy

array([['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'stand', 'stand', 'stand', 'hit', 'hit', 'hit',
        'hit', 'hit'],
       ['stand', 'stand', 'sta

In [407]:
np.savetxt("optimal_strategy.csv", optimal_strategy, delimiter=",",fmt="%s")

In [409]:
np.savetxt("optimal_strategy_numbers.csv", strategy.reshape(-1), delimiter=",", fmt="%s")
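
The conversion and export steps above can be tried on a small made-up example. The sketch below picks the highest-scoring action per cell with `np.argmax` and writes the result with `np.savetxt`, then reads it back. The score values are hypothetical, not results from the notebook's run, and a temporary directory is used so that no existing file is overwritten.

```python
import os
import tempfile

import numpy as np

ACTIONS = ['hit', 'stand', 'double', 'split']

# Hypothetical 2x3 grid of per-action scores (2 player hands, 3 dealer up cards)
scores = np.array([
    [[0.6, 0.2, 0.2, 0.0], [0.1, 0.7, 0.2, 0.0], [0.2, 0.2, 0.6, 0.0]],
    [[0.1, 0.1, 0.1, 0.7], [0.3, 0.4, 0.3, 0.0], [0.5, 0.3, 0.2, 0.0]],
])

# Index of the best action in each cell, then the matching action name
best = np.argmax(scores, axis=2)
moves = np.array(ACTIONS)[best]
print(moves)

# Round trip through a CSV file, as the notebook does for the full table
path = os.path.join(tempfile.mkdtemp(), "moves.csv")
np.savetxt(path, moves, delimiter=",", fmt="%s")
loaded = np.loadtxt(path, delimiter=",", dtype=str)
print((loaded == moves).all())   # True
```

`np.argmax` returns the first index when several actions tie, which matches the behaviour of `list.index(max(...))` used in the notebook.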

In [None]:
#Part 2: After the blackjack strategy is generated, we implement it in a neural network that plays blackjack. The network is first trained on the strategy generated by the Monte Carlo simulation. It then uses the results of the games it plays to update the strategy over time, because the probability of winning with a given hand changes as the number of cards left in the deck falls. This process is called reinforcement learning.