# Q-Learning and the Game of Blackjack

Q-Learning is a reinforcement learning technique that is useful in risk-and-reward environments, such as games. It's attractive for stochastic situations in which the environment is only partially observed, because it doesn't have to calculate the probability of every possible outcome in a given state. Instead, it continuously collects and stores the outcomes of the actions it has already taken and derives a policy from them. The more it plays, the more it learns, and the better it gets at choosing the right action.
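
As a rough illustration of how stored outcomes turn into a policy, here is a minimal sketch of the standard tabular Q-learning update. The function name `q_update`, the state and action labels, and the values chosen for the learning rate `alpha` and the discount `gamma` are assumptions made for this example; none of them are defined in this notebook.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    # Best value currently estimated for the state we ended up in
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the old estimate a fraction alpha toward the observed target
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical example: staying on 15 points ended the round with a win (+1)
q = defaultdict(float)          # unseen (state, action) pairs start at 0.0
actions = ("hit", "stay")
q_update(q, state=15, action="stay", reward=1.0, next_state="done", actions=actions)
print(q[(15, "stay")])
```

Each call nudges one table entry toward the outcome that was just observed, so no card probabilities ever need to be computed up front.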

To start, let's write some routines to simulate a game of Blackjack.

* Blackjack is played with a normal deck of playing cards. To make things interesting, games are often played using more than one deck.

In [1]:
import random
#
# Cards are dealt from a shoe, or a chute containing a number
# of decks of cards which are randomly shuffled.
# For this exercise, there's no need to differentiate between
# individual cards, so we'll keep instances of point value instead.
#
# There are 52 cards in a deck. Each deck contains 4 suits of 13 cards each:
#     numbered cards with value 2-10
#     face cards: Jack, Queen, and King, valued at 10 each
#     the Ace, valued at 11 or 1
#
class Shoe:
    _suit = [2,3,4,5,6,7,8,9,10,10,10,10,11]
    _shoeSize = 0
    
    def __init__(self,deckCount):
        self._shoeSize = deckCount
        self._shuffle()

    def _shuffle(self):
        # [re]fill the shoe and shuffle the order of cards in it
        self.cards = []
        for i in range(self._shoeSize*4):
            self.cards += self._suit
        random.shuffle(self.cards)

    def draw(self):
        # Reshuffle first if fewer than 20% of the cards remain,
        # then give out one card from the end of the shoe
        if len(self.cards) < ((self._shoeSize*52) * .20):
            self._shuffle()
            
        return self.cards.pop()


Let's take a look at a shoe that contains 2 decks. The cards have been loaded into the shoe and then shuffled, or randomized.

In [2]:
shoe = Shoe(2)
print(shoe.cards)

[5, 10, 2, 9, 2, 8, 4, 3, 3, 2, 4, 9, 3, 8, 5, 3, 10, 10, 10, 11, 5, 3, 2, 5, 9, 9, 2, 6, 11, 7, 9, 11, 3, 4, 10, 4, 9, 6, 6, 10, 9, 10, 7, 4, 10, 3, 4, 10, 8, 3, 8, 7, 11, 6, 5, 10, 10, 10, 10, 8, 10, 2, 10, 10, 2, 10, 11, 11, 11, 10, 10, 5, 5, 10, 4, 10, 5, 10, 4, 10, 6, 8, 10, 7, 10, 7, 10, 7, 8, 6, 10, 11, 8, 10, 6, 9, 10, 2, 7, 6, 10, 7, 10, 10]


Notice that we're not really interested in the suits of the cards; they don't matter to the game. Only the point value of each card matters. Points correspond to the number of pips on each card, except for face cards, which are worth 10 points. The Ace is a special exception: it can be either 1 point or 11, and the player gets to choose which. Usually, it's counted as 11 unless that would put the hand over 21.
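
To see what this point-value view means for the composition of a deck, here is a small sketch that counts how often each value occurs. The names `SUIT_VALUES` and `value_counts` are made up for this example; the list itself repeats the 13 point values of one suit used by the `Shoe` class.

```python
from collections import Counter

# Point values of the 13 cards in one suit: 2-10, three face cards, one Ace
SUIT_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]

def value_counts(deck_count):
    # 4 suits per deck, deck_count decks in the shoe
    return Counter(SUIT_VALUES * 4 * deck_count)

counts = value_counts(1)
print(counts[10], counts[11], sum(counts.values()))  # prints: 16 4 52
```

Because the 10, Jack, Queen, and King all count as 10, nearly a third of every deck (16 of 52 cards) is worth 10 points.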

The draw() method of the Shoe class removes a card from the end of the list. Just like in a casino, if the shoe gets close to empty (here, fewer than 20% of its cards remaining), it gets reloaded and all the cards are shuffled together.

In [3]:
len(shoe.cards)
for i in range(5):
    print(shoe.draw())
print(shoe.cards)

10
10
7
10
6
[5, 10, 2, 9, 2, 8, 4, 3, 3, 2, 4, 9, 3, 8, 5, 3, 10, 10, 10, 11, 5, 3, 2, 5, 9, 9, 2, 6, 11, 7, 9, 11, 3, 4, 10, 4, 9, 6, 6, 10, 9, 10, 7, 4, 10, 3, 4, 10, 8, 3, 8, 7, 11, 6, 5, 10, 10, 10, 10, 8, 10, 2, 10, 10, 2, 10, 11, 11, 11, 10, 10, 5, 5, 10, 4, 10, 5, 10, 4, 10, 6, 8, 10, 7, 10, 7, 10, 7, 8, 6, 10, 11, 8, 10, 6, 9, 10, 2, 7]
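
The 20% check in `draw()` determines how deep into the shoe play goes before the cards are reshuffled. The helper below, `draws_before_reshuffle` (a hypothetical name, not part of the notebook), is a small sketch that mirrors the same comparison without building a shoe.

```python
def draws_before_reshuffle(deck_count, threshold=0.20):
    # Count the cards that can be drawn before the check
    # len(cards) < total * threshold first triggers a reshuffle
    total = deck_count * 52
    remaining = total
    draws = 0
    while remaining >= total * threshold:
        remaining -= 1
        draws += 1
    return draws

print(draws_before_reshuffle(1))  # prints: 42
print(draws_before_reshuffle(2))  # prints: 84
```

With the 2-deck shoe above, 84 of the 104 cards are dealt before the shoe is refilled, so the last 20 cards of each shuffle are never seen.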


* There are at least 2 players. In Blackjack, there is one Dealer and one or more Players. The Dealer is in control of the game; she hands out the cards. All other players go head-to-head with the Dealer. They are in competition with each other only in the sense that there is one pool of cards from which they can draw.
* The object of the game is to reach a hand worth as close to 21 points as possible without going over. A hand of 21 points using only 2 cards is called Blackjack.

In [8]:
# %load player.py
# Player class

class Player:
    def __init__(self):
        self.reset()

    def reset(self):
        self.hand = []
        
    def receive(self,card):
        self.hand += [card]
        
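    def getPoints(self):
        points = sum(self.hand)
        if points <= 21:
            return points

        # Over 21: demote Aces from 11 to 1, one at a time, until the
        # hand is at 21 or below or no 11-point Ace is left.
        # Note that this rewrites the card values stored in self.hand.
        while (points > 21) and (11 in self.hand):
            self.hand[self.hand.index(11)] = 1
            points = sum(self.hand)

        return points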
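
The Ace handling in `getPoints()` can also be written as a function that leaves the hand untouched. The sketch below is an alternative for illustration, not the notebook's method; `hand_value` is a hypothetical name.

```python
def hand_value(cards):
    # Count every Ace as 11 first, then drop one Ace at a time
    # from 11 to 1 (a difference of 10) while the hand is over 21
    points = sum(cards)
    soft_aces = cards.count(11)
    while points > 21 and soft_aces > 0:
        points -= 10
        soft_aces -= 1
    return points

print(hand_value([11, 10]))     # prints: 21
print(hand_value([11, 11, 9]))  # prints: 21
print(hand_value([10, 5, 10]))  # prints: 25
```

In the second hand only one of the two Aces has to count as 1; the other stays at 11, giving 11 + 1 + 9 = 21.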
        


* A round starts when the Dealer deals 2 cards to each player, including herself, one at a time in succession. Each player then decides whether to draw more cards or stay, depending on how many points have already accumulated in the hand and the value of the one card that the Dealer is showing from her hand.
* When all players have finished their turns, the Dealer plays out her hand. Dealers follow a fixed rule: they must continue to draw cards until they have reached at least 17 points, and they must stay at 17 points or more.
* After the Dealer finishes, her points are compared with each player's. A player who goes over 21 loses. Otherwise, the player wins if the Dealer goes over 21 or if the player has more points than the Dealer. The Dealer wins if she has more points, and equal points are a draw.
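
The Dealer's rule and the final comparison can be sketched as two small functions. This is a minimal sketch that assumes equal points count as a draw and that a player who goes over 21 loses regardless of the Dealer's hand, which is the order of checks used in the round played below. The names `dealer_should_hit` and `settle` are hypothetical.

```python
def dealer_should_hit(points):
    # The Dealer must draw below 17 and must stay at 17 or more
    return points < 17

def settle(player_points, dealer_points):
    # Returns "player", "dealer", or "draw" for one finished round
    if player_points > 21:
        return "dealer"      # player busted
    if dealer_points > 21:
        return "player"      # dealer busted
    if player_points > dealer_points:
        return "player"
    if dealer_points > player_points:
        return "dealer"
    return "draw"

print(dealer_should_hit(16), dealer_should_hit(17))  # prints: True False
print(settle(15, 25))                                # prints: player
print(settle(18, 18))                                # prints: draw
```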

In [23]:
# Playing a basic round of Blackjack

def hit(p,s):
    p.receive(s.draw())
    print("New hand: {} ({})".format(p.hand,p.getPoints()))

def newHand(d,p,s):
    p.reset()
    d.reset()
    deal(d,p,s)

# Deal
def deal(d,p,s):
    # Alternate cards between dealer and player, two cards each
    d.receive(s.draw())
    p.receive(s.draw())
    d.receive(s.draw())
    p.receive(s.draw())
    # Report the hands that were passed in, not the global objects
    print("Dealer's hand: {} ({})".format(d.hand,d.getPoints()))
    print("Player's hand: {} ({})".format(p.hand,p.getPoints()))

#
# Start up
#

shoe = Shoe(1)
dealer = Player()
player = Player()

deal(dealer,player,shoe)

if dealer.getPoints() == 21:
    print("Dealer wins! Player loses :-(")
elif player.getPoints() == 21:
    print("Player wins! Dealer loses!")
else:
    # Naive strategy: hit while below 21 and behind the dealer's current total
    while player.getPoints() < 21 and player.getPoints() < dealer.getPoints():
        print("Player hits...")
        hit(player,shoe)
    print("Player stays.")
    while dealer.getPoints() < 17:
        print("Dealer hits...")
        hit(dealer,shoe)
    print("Dealer stays.")
    if player.getPoints() > 21:
        print("Player busted :-(")
    elif dealer.getPoints() > 21:
        print("Dealer busted :-(")
    elif player.getPoints() > dealer.getPoints():
        print("Player wins!")
    elif dealer.getPoints() > player.getPoints():
        print("Dealer wins :-(")
    else:
        print("Hand is a draw: no winner.")


Dealer's hand: [10, 5] (15)
Player's hand: [4, 11] (15)
Player stays.
Dealer hits...
New hand: [10, 5, 10] (25)
Dealer stays.
Dealer busted :-(


This is a pretty bad strategy. Let's learn a better one.