### Simple BlackJack Bot through Q-Learning
mr. fish blesses you again with this blackjack bot!

This bot plays the game of BlackJack using trained q-tables to make decisions on whether to "HIT" or "STAND". The q-tables are based on the sum of the player's hand and the dealer's visible card. For example, if the player has a hand value of 16 and the dealer's visible card is a 10, the q-tables will output the scores for "HIT" and "STAND". If the score for "HIT" is higher, the bot will choose to draw another card. If the score for "STAND" is higher, the bot will choose to keep its current hand.
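
The decision rule described above can be sketched in a few lines. This is only an illustration: `choose_action` is a hypothetical helper name, and the two scores are the values printed by `evaluate(16, 10)` later in this notebook.

```python
# Scores for one state: player total 16, dealer showing 10
# (values taken from the evaluate(16, 10) output in this notebook).
hit_score = 0.2568
stand_score = 0.2218


def choose_action(hit_score, stand_score):
    """Hypothetical helper: pick the action with the higher score.

    Equal scores fall through to STAND, like the else branch in evaluate().
    """
    return "HIT" if hit_score > stand_score else "STAND"


print(choose_action(hit_score, stand_score))  # HIT
```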

Please note that this bot does not implement card counting and treats aces as having a value of 1. As a result, it is not possible to get a blackjack hand that pays 3:2. If you want to simulate the probability of hitting a blackjack (which is about 4.8%), you can use an if statement in the code to keep track of the balance over a certain number of hands. Since hitting a blackjack is largely a matter of luck, there is no need to have a q-table for this scenario.
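
The 4.8% figure can be checked by counting: a blackjack is an ace plus a ten-valued card in the first two cards, in either order. The sketch below assumes a single 52-card deck for the first number and, for comparison, the draw-with-replacement model that the Player class below uses (13 equally likely ranks). The variable names are only illustrative.

```python
from fractions import Fraction

# Single 52-card deck, no replacement:
# ace then ten-valued card, or ten-valued card then ace.
single_deck = 2 * Fraction(4, 52) * Fraction(16, 51)

# Draw-with-replacement model: 13 equally likely ranks, 4 of them worth 10.
with_replacement = 2 * Fraction(1, 13) * Fraction(4, 13)

print(float(single_deck))       # about 0.0483
print(float(with_replacement))  # about 0.0473
```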

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
from tqdm import tqdm

#### Creating the player (and the dealer)
The Player class has several functions that are used to play a game of blackjack:

* __\_\_init\_\_(self, dealer = False)__: This is the constructor of the class. It initializes the instance of the Player class with a deck of card values and an empty list of cards for the player. The dealer flag records whether this instance is the dealer (True) or a regular player (False, the default).


* __getHand(self, cards = None)__: This function is used to draw the starting cards. If the player is not the dealer, two cards are drawn from the deck at random. If the player is the dealer, only one card (the visible card) is drawn; the dealer's other cards are drawn later by dealer_hit. If the cards parameter is provided, these cards will be used instead of drawing from the deck.


* __hit(self):__ __(ONLY USE ONCE IF THE PLAYER IS A DEALER, THEN USE DEALER_HIT WHICH IS EXPLAINED BELOW)__ This function is used to draw additional cards for the player when they choose to "hit" in the game. The additional card is drawn at random from the deck.


* __stand(self)__: This function is used when the player chooses to "stand" and not draw any more cards. Simply does nothing.


* __dealer_hit(self)__: __(ONLY USE IF THE PLAYER IS A DEALER)__ This function is used by the dealer to draw additional cards until their hand value is at least 17.


* __showHand(self)__: This function returns the total value of the player's hand and a list of the individual card values.


* __reset(self, retrack = False)__: This function is used to reset the player's hand. If the retrack parameter is True, the function will remove all but one of the dealer's cards and all but two of the player's cards. If retrack is False, the player's hand is completely reset to an empty list.
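
The dealer's draw-to-17 rule can also be tried out on its own before reading the class. The sketch below is a stand-alone approximation, not the class itself: it uses the standard library `random` module instead of NumPy, and `dealer_final_total` and `bust_rate` are hypothetical helper names. Cards are drawn with replacement and aces count as 1, as in the notebook.

```python
import random

# Same card values as the notebook's deck (aces count as 1)
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def dealer_final_total(upcard, rng):
    """Draw with replacement until the dealer's total is at least 17."""
    total = upcard
    while total < 17:
        total += rng.choice(DECK)
    return total


def bust_rate(upcard, rounds=10_000, seed=0):
    """Fraction of simulated rounds in which the dealer ends above 21."""
    rng = random.Random(seed)
    busts = sum(dealer_final_total(upcard, rng) > 21 for _ in range(rounds))
    return busts / rounds


for upcard in (2, 6, 10):
    print(upcard, bust_rate(upcard))
```

Because the dealer stops as soon as the total reaches 17, the final total always lies between 17 and 26 in this model.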

In [2]:
class Player():
    
    def __init__(self, dealer = False):
        # Initialize the deck of cards and the game state
        self.dealer = dealer
        self.deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
        self.cards = []
        
    def getHand(self, cards = None):
        # Draw the first card, or use the one supplied in cards
        self.card_one = np.random.choice(self.deck) if cards is None else cards[0]
        self.cards.append(self.card_one)

        # Only a regular player gets a second starting card
        if not self.dealer:
            self.card_two = np.random.choice(self.deck) if cards is None else cards[1]
            self.cards.append(self.card_two)

    
    def hit(self):
        self.card_hit = np.random.choice(self.deck)
        self.cards.append(self.card_hit)
        
    def stand(self):
        pass
        
    def dealer_hit(self):
        # Until 17 hit
        while self.showHand()[0] < 17:
            self.hit()
            #print(self.showHand())
    
    
    def showHand(self):
        return sum(self.cards), self.cards
    
    def reset(self, retrack = False):
        self.retrack = retrack
        
        if self.retrack:
            
            rem_cards = 1 if self.dealer else 2
            while not len(self.cards) == rem_cards:
                self.cards.pop()       
        else:       
            self.cards = []      

#### Training the Q-Tables
The train function is used to train the q-tables for the blackjack bot. It takes two optional parameters:

choice: This is either "hit" or "stand". It determines whether the q-table being trained is for the "HIT" or "STAND" action.

rounds: This is the number of rounds to be played in order to train the q-table.

The function works as follows:

It initializes a zero-filled q-table with 10 rows (dealer's card values) and 21 columns (player's hand values). Entries are indexed by value minus 1, so the first column is never filled.

It loops over all possible dealer's card values (1 to 10) and player's hand values (2 to 21). For each combination, it creates a new Player instance (representing the player) and a Player instance with dealer = True (representing the dealer).

It sets up the starting hands: the player receives two equal cards, each worth half of the player's hand value, and the dealer receives the chosen visible card. Calling reset with retrack = True restores these starting hands after every round.

It plays the number of rounds given by rounds. In each round the player takes the action given by choice exactly once (one hit or a stand), and the dealer then keeps drawing cards until its hand value is at least 17.

After each round, it updates the win count based on the outcome. The count is incremented when the dealer busts or when the player's hand is at least as high as the dealer's, so a tie is counted as a win during training. If the player busts or the dealer finishes higher, the count is not changed. The q-table score for the combination is the win count divided by the number of rounds.

It returns the trained q-table when all iterations are complete.
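
To make the steps above concrete, here is a minimal stand-alone sketch of how a single q-table cell could be estimated. It is not the notebook's train function: `estimate_win_rate` is a hypothetical name, it uses the standard library `random` module instead of NumPy and the Player class, and it assumes the same scoring as the training code (a tie counts as a win).

```python
import random

# Same card values as the notebook's deck (aces count as 1)
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def estimate_win_rate(player_total, dealer_card, choice="hit", rounds=1000, seed=0):
    """Estimate the score of one (dealer card, player total) state."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        # The player takes exactly one action from the fixed starting total
        player = player_total
        if choice == "hit":
            player += rng.choice(DECK)

        # The dealer draws from the visible card until reaching at least 17
        dealer = dealer_card
        while dealer < 17:
            dealer += rng.choice(DECK)

        # A player bust scores nothing; a dealer bust or a tie counts as a win
        if player <= 21 and (dealer > 21 or dealer <= player):
            wins += 1
    return wins / rounds


print(estimate_win_rate(16, 10, "hit"), estimate_win_rate(16, 10, "stand"))
```

Two edge cases are easy to reason about under these assumptions: standing on 21 always scores 1.0 (the dealer either busts or ends at 21 or lower), and hitting on 21 always scores 0.0 (every card adds at least 1).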

In [3]:

def train(choice = "hit", rounds = 1000):
    
    print("")
    q_table = np.zeros((10, 21))
    for dealer_card in range(1, 11):
        print("|", end ="")
        for player_hand in range(2, 22):
        
            # Create the Player and the Dealer
            player = Player()
            dealer = Player(dealer=True)
            
            # Use two equal (possibly fractional) cards that have the sum of your hand
            first_card, second_card = player_hand/2, player_hand/2
            
            # Get the initial hands
            player.getHand([first_card, second_card])
            dealer.getHand([dealer_card])

            dealer.reset(retrack = True)
            player.reset(retrack = True)

            step = 0
            win = 0
            # Play the rounds
            for i in range(rounds):
                
                if choice == "hit":
                    player.hit()
                else:
                    player.stand()
                
                # After the player, the dealer keeps hitting until reaching at least 17
                dealer.dealer_hit()
                step += 1
                
                # If the player is already busted, no rewards
                if player.showHand()[0] > 21:
                    pass
                
                # If the player didn't get busted but the dealer did, rewards
                elif dealer.showHand()[0] > 21:
                    win += 1
                # If the dealer did not get busted but the player has a higher or equal hand, rewards
                elif dealer.showHand()[0] <= player.showHand()[0]:
                    win += 1
                # If the dealer did not get busted and the dealer has a higher hand, no rewards
                elif dealer.showHand()[0] > player.showHand()[0]:
                    pass



                #print(player.showHand(), dealer.showHand(), win)

                dealer.reset(retrack = True)
                player.reset(retrack = True)

            score = win/step
            q_table[dealer_card-1][player_hand-1] = score
                
    #print(dealer_card, player_hand, score)
    return q_table

In [4]:
# Train the tables here
q_hit = train(choice = "hit", rounds = 10000)
q_stand = train(choice = "stand", rounds = 10000)


||||||||||
||||||||||

In [5]:
# Checking probs of one hand and printing them out
my_hand = 3  # Sum of your hand (2A in this case)
dealer_hand = 6 # Dealer's card (6 in this case)

q_hit[dealer_hand-1][my_hand-1], q_stand[dealer_hand-1][my_hand-1], 

(0.4414, 0.4579)

#### Playing a single Hand
The function below simulates a single hand of blackjack. 

It takes an optional argument bet, which represents the amount of money that the player is betting on the hand. The default value for bet is 10.

The function creates two player objects: one for the player and one for the dealer. It then deals the player and dealer their initial hands by calling the getHand method on each player object.

The function then enters a loop where the player can choose to "hit" (take an additional card) or "stand" (keep their current hand). The player will continue to hit as long as the value of their hand is less than 21 and the expected value of hitting (as determined by the "HIT" q-table) is greater than the expected value of standing (as determined by the "STAND" q-table).

After the player stands or busts, the dealer begins to hit until their hand is at least 17. If the player's hand is greater than the dealer's hand (or the dealer busts), the player is paid out according to the bet. If the player's hand is less than the dealer's hand (or the player busts), the player loses their bet. If the player's hand is equal to the dealer's hand, the bet is a push and the player gets their money back.
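
The payout rules in the paragraph above can be written as one small function. This is a sketch under the same assumptions as play_hand (the returned amount includes the stake, since the bet is subtracted from the balance beforehand); `settle` is a hypothetical helper name that does not appear in the notebook.

```python
def settle(player_total, dealer_total, bet=10):
    """Amount returned to the player for a finished hand, stake included."""
    if player_total > 21:
        return 0        # player busts and loses the stake
    if dealer_total > 21 or player_total > dealer_total:
        return bet * 2  # stake back plus even-money winnings
    if player_total == dealer_total:
        return bet      # push: stake returned
    return 0            # dealer has the higher hand


print(settle(20, 18))  # 20
print(settle(18, 18))  # 10
print(settle(22, 25))  # 0
```

The player's bust is checked first, so a hand where both sides go over 21 is still a loss for the player.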

In [6]:
def play_hand(bet = 10):
    player = Player()
    dealer = Player(dealer = True)

    player.getHand()
    player_hand = player.showHand()[0]

    dealer.getHand()
    dealer_hand = dealer.showHand()[0]

    while player.showHand()[0] < 21 and q_hit[dealer_hand-1][player_hand-1] > q_stand[dealer_hand-1][player_hand-1]:
        player.hit()
        player_hand = player.showHand()[0]

    dealer.dealer_hit()
    #print(player.showHand(), dealer.showHand(), end ="")
    
    # If the player is already busted, no rewards
    if player.showHand()[0] > 21:
        reward = 0   
        
    # If the player didn't get busted but the dealer did, rewards
    elif dealer.showHand()[0] > 21:
        reward = bet * 2
        
    # If the dealer did not get busted but player has a higher hand, rewards
    elif dealer.showHand()[0] < player.showHand()[0]:
        reward = bet * 2
        
    # Push
    elif dealer.showHand()[0] == player.showHand()[0]:
        reward = bet
        
    # If the dealer did not get busted and the dealer has a higher hand, no rewards
    elif dealer.showHand()[0] > player.showHand()[0]:
        reward = 0
            
    #print(player.showHand(), dealer.showHand(), reward)
            
    return reward

#### Simulating the Q-Tables with a theoretical chance of hitting BlackJack

This code simulates n number of hands of blackjack using the play_hand function.

The variables balance, wins, loses, and ties are used to keep track of the player's balance, the number of wins, the number of losses, and the number of ties, respectively. The history list is used to store the player's balance after each hand.

The code starts by setting the initial balance to 0 and the initial bet to 10. It then enters a loop where it simulates each hand of blackjack.

For each iteration of the loop, the player's balance is decreased by the value of their bet, which is initially set to 10. The code then draws two random integers from 0 to 999, one for the player and one for the dealer. A value below 48 means that side was dealt a blackjack, which corresponds to a 4.8% chance each. If both have a blackjack, the hand is a push: the bet is returned and ties is incremented. If only the player has a blackjack, the bet is returned together with a 3:2 payout (bet + bet + bet/2) and wins is incremented. If only the dealer has a blackjack, the bet is lost and loses is incremented. Otherwise, the code simulates a regular hand using the play_hand function, adds the reward to the balance, and updates wins, loses, or ties by comparing the reward with the bet.

After each hand, the player's balance is added to the history list. At the end of the loop, the final balance, number of wins, number of losses, and number of ties are printed.
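
The blackjack check at the start of each hand can be isolated as follows. This is a sketch, not the notebook's loop: `settle_naturals` and `BJ_THRESHOLD` are hypothetical names, and the standard library `random` module stands in for `np.random.randint(0, 1000)`, which also yields integers from 0 to 999.

```python
import random

BJ_THRESHOLD = 48  # 48 outcomes out of 1000, i.e. a 4.8% chance


def settle_naturals(player_roll, dealer_roll, bet=10):
    """Payout (stake included) when either side has blackjack, else None."""
    player_bj = player_roll < BJ_THRESHOLD
    dealer_bj = dealer_roll < BJ_THRESHOLD
    if player_bj and dealer_bj:
        return bet        # push: stake returned
    if player_bj:
        return bet * 2.5  # stake back plus 3:2 winnings
    if dealer_bj:
        return 0          # stake lost
    return None           # no blackjack: play the hand normally


# Empirical check that a roll below 48 happens about 4.8% of the time
rng = random.Random(0)
rolls = [rng.randrange(1000) for _ in range(100_000)]
print(sum(r < BJ_THRESHOLD for r in rolls) / len(rolls))
```

Returning None for the regular case keeps the blackjack branches separate from the q-table play, which mirrors the final else branch of the loop.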

In [13]:
# How many hands do you want to play?
nr_hands = 1000

# What is your current balance? (Advised to keep it at 0 to see how down you can go)
balance = 0

# How much will you be betting every hand?
init_bet = 10





# Run the code after setting all the params above





history = []
wins, loses, ties = 0,0,0

player_bet = init_bet
for i in range(nr_hands):
    
    # Place your bet
    balance -= player_bet
    
    player_bj_chance = np.random.randint(0, 1000)
    dealer_bj_chance = np.random.randint(0, 1000)
    
    # If both player and dealer hit BlackJack -> Push
    if (player_bj_chance < 48) and (dealer_bj_chance < 48):
        balance += player_bet
        ties += 1
    # If player hits blackjack -> Wins
    elif player_bj_chance < 48:
        balance += player_bet + player_bet + player_bet/2
        wins += 1
    # If dealer hits blackjack -> Loses 
    elif dealer_bj_chance < 48:
        loses +=1
       
    # If the game continues
    else:
        reward = play_hand(bet = player_bet)
        balance += reward

        if reward < player_bet:
            loses += 1
        elif reward == player_bet:
            ties += 1
        else:
            wins += 1
        #print("BJ", bj_chance)
        
        
    #print(balance)
    history.append(balance)
    
print("Final Balance:", balance, " - Wins:", wins," - Loses:" , loses, " - Ties:", ties)

Final Balance: -210.0  - Wins: 440  - Loses: 486  - Ties: 74


In [14]:
# Plot the balance history, it will always go down in the long run trust me.
px.line(history)

#### Evaluate your hand against the dealer's hand
This part is pretty self-explanatory. Just put the sum of your hand and the dealer's hand like I did below and run it. It will yield the best mathematical move. Do not forget to have some faith in the dealer since there is nothing else you can do.

In [9]:
def evaluate(my_hand, dealer_hand):

    if q_hit[dealer_hand-1][my_hand-1] > q_stand[dealer_hand-1][my_hand-1]:
        print(q_hit[dealer_hand-1][my_hand-1], q_stand[dealer_hand-1][my_hand-1], "HIT")
    else:
        print(q_hit[dealer_hand-1][my_hand-1], q_stand[dealer_hand-1][my_hand-1], "STAND")
        

In [10]:
evaluate(16, 10)

0.2568 0.2218 HIT


### Moral of the Story
This Python notebook simulates the game of BJ and gives you the basic strategy (what to play in a certain situation) without card counting. I did not implement card counting for two reasons:
* The way the cards are dealt in the Netherlands is always random. They take the cards out of a machine that shuffles 5 decks all the time. (I do not know for certain if this is true; this is what the dealer told me.)
* I do not know how counting works.

Please don't gamble or use this as a method to make money. If it worked, trust me, I would be doing that already, but guess what? I am sharing this with you guys instead.