# Teaching A Computer To Play Blackjack

*by Kelsey Cribari, Brandt Reutimann, and Courtney Schulze - December, 2017*

## Introduction
*The What. The Why. The How.*

It all started when a member of the iconic trio went to Las Vegas for a weekend during the semester. She had never gambled before, but everyone would soon find out that Courtney loved blackjack ... a lot. Although her wallet came back empty, Courtney left Vegas rich in experience and blackjack knowledge. Born out of that trip (and that lost money) came an idea: an idea for a CS440 final project.

For those of you reading this report who might not know the game: in blackjack, each player is dealt two cards and may then take more cards, trying to get a card total as close to twenty-one as possible without going over. If the player’s total goes over twenty-one, the player busts and loses their bet. If the dealer does not bust and ends with a higher total than the player, the player also loses their bet. Really, the player tends to lose their money quite a lot. However, there is a widely accepted basic blackjack strategy. If players follow this strategy, the dealer has only about a 0.5% advantage. In other words, “if you are playing for $100 per hand, you can expect to lose about 50 cents each hand.” A chart of this strategy is shown below:

![Blackjack basic strategy chart](https://www.blackjackclassroom.com/wp-content/uploads/2017/02/Blackjack-Basic-Strategy-Chart.png "Basic Blackjack Strategy")
(Photo courtesy of [Blackjack Classroom](https://www.blackjackclassroom.com/blackjack-basic-strategy-charts))

Because basic strategy is so well established, we thought it would be a fascinating project to train a computer to play blackjack using basic artificial intelligence principles, namely reinforcement learning. Would the computer naturally learn the strategy that professional blackjack players accept as “proper”? That’s what we set out to find out.

### Overview of Methods:
* Temporal Difference Reinforcement Learning: In the initial implementation of the game, we used a reinforcement of 1 if the player beats the dealer, 0 for a push (a tie with the dealer), and -1 if the player loses.
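As a minimal sketch of that reinforcement scheme, the function below maps the end of a hand to +1, 0, or -1. It assumes each hand is summarized by its final card totals, ignores the distinction between a natural blackjack and an ordinary 21, and applies the standard rule that a player who busts loses even if the dealer would also bust. The name `reinforcement` is hypothetical and is not taken from our source code.

```python
def reinforcement(player_total, dealer_total):
    """Return +1 for a win, 0 for a push, -1 for a loss (hypothetical helper)."""
    if player_total > 21:
        return -1  # player busts: a loss, even if the dealer would also bust
    if dealer_total > 21:
        return 1   # dealer busts while the player is still in the hand
    if player_total > dealer_total:
        return 1
    if player_total == dealer_total:
        return 0   # push
    return -1
```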

### Overview of Results:
** put results here **

## Method
*The steps we took. The resources we used. The work we shared.*

### Phase One:
The first phase consisted of creating deck, player, and dealer representations for the game. Feel free to run the code below.

#### Deck:
*Main Author: Brandt Reutimann*

The deck is a relatively standard deck of cards, except that suits are ignored by default. All face cards are represented as 10s (in blackjack, a king is worth the same as a queen, which is worth the same as a ten), and aces appear as `'A'`.

A new deck is shuffled by default. Shuffling can be turned off with `deck = BlackJackDeck(shuffleCards = False)`.

Suits and faces can be turned on with `deck = BlackJackDeck(SuitsAndFaces = True)`.
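For readers without our `src/Deck.py`, here is a minimal sketch of the default (suitless) representation described above. It assumes a single 52-card deck and draws from the end of a list; `build_deck` and `draw_card` are hypothetical names, and this is an illustration rather than our `BlackJackDeck` class.

```python
import random

def build_deck(shuffle_cards=True, seed=None):
    """Build one suitless deck: aces as 'A', J/Q/K collapsed to 10 (hypothetical sketch)."""
    deck = []
    for rank in ['A', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K']:
        value = 10 if rank in ('J', 'Q', 'K') else rank
        deck.extend([value] * 4)  # one card per suit; the suits themselves are dropped
    if shuffle_cards:
        random.Random(seed).shuffle(deck)
    return deck

def draw_card(deck):
    """Remove and return the top card (the end of the list)."""
    return deck.pop()
```

Collapsing faces at construction time means the rest of the game never has to special-case jacks, queens, or kings; only aces keep a symbolic value, because they can count as 1 or 11.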

```python
from src.Deck import BlackJackDeck
```

```python
print("Example of drawing 10 random cards without suits or faces:")
deck = BlackJackDeck()
for i in range(10):
    print(deck.drawCard())
```

```
Example of drawing 10 random cards without suits or faces:
10
6
10
8
2
2
2
10
5
9
```


```python
print("Example of drawing 10 random cards with suits and faces:")
deckSuits = BlackJackDeck(SuitsAndFaces = True)
for i in range(10):
    print(deckSuits.drawCard())
```

```
Example of drawing 10 random cards with suits and faces:
(4, 'S')
(9, 'S')
(3, 'H')
(10, 'C')
(4, 'C')
('J', 'C')
(6, 'H')
(4, 'D')
(9, 'D')
('Q', 'S')
```


#### Player:
*Main Author: Courtney Schulze*

A player consists of a hand and a current card count (the total value of the cards in their hand). A player also has a list of valid moves they can make, such as stand, hit, and double. When a player hits, a card is drawn from the deck and added to the player's hand. If the card count after the new card is greater than 21, the player busts.
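The card count is the interesting part, because an ace can be worth 1 or 11. A common way to compute it is to count every ace as 1 and then upgrade one ace to 11 if that does not cause a bust. The sketch below assumes aces are stored as `'A'` and every other card as its numeric value (as in the hands printed later in this report); `hand_value` and `is_bust` are hypothetical helpers, not our `Player` methods.

```python
def hand_value(hand):
    """Best blackjack total for a hand such as [3, 4, 'A'] (hypothetical helper)."""
    total = sum(1 if card == 'A' else card for card in hand)
    # At most one ace can count as 11 without busting (11 + 11 = 22),
    # so upgrade a single ace by 10 when that keeps the total at 21 or below.
    if 'A' in hand and total + 10 <= 21:
        total += 10
    return total

def is_bust(hand):
    """A hand busts when even its best total is over 21."""
    return hand_value(hand) > 21
```

For example, `hand_value([8, 6, 'A'])` returns 15, because counting that ace as 11 would push the total to 25.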

```python
from src.Player import Player
```

```python
# create both deck and player
deck = BlackJackDeck()
player = Player()

# put first two cards in player's hand
player.addCardToHand(deck.drawCard())
player.addCardToHand(deck.drawCard())

print("The player's current hand is: " + str(player.getHand()))
print("The cards in the player's hand total: " + str(player.getCardCount()))
print("Valid moves: " + str(player.validMoves()))
```

```
The player's current hand is: [5, 10]
The cards in the player's hand total: 15
Valid moves: ['stand', 'hit', 'double']
```


```python
print("The player takes a card. Here are the results: " + str(player.hit(deck)))
print("The player busted based on the previous hit: " + str(player.bust))
```

```
The player takes a card. Here are the results: [5, 8, 10]
The player busted based on the previous hit: True
```


What happens if the player keeps hitting until they get to a total greater than 21?

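```python
# start a fresh hand, since the previous player has already busted
player = Player()
player.addCardToHand(deck.drawCard())
player.addCardToHand(deck.drawCard())

# keep hitting until the card total goes over 21
while not player.bust:
    player.hit(deck)
    print("Player's hand: " + str(player.getHand()))
print("Player busted! Card total was: " + str(player.getCardCount()))
```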

#### Dealer:
*Main Author: Kelsey Cribari*

A dealer is much like a player, but instead of choosing moves freely, the dealer must follow a fixed rule about when to hit and when to stand.
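Casinos commonly use the rule "hit below 17, stand on 17 or more." The sketch below plays out a dealer hand under that kind of rule, with the threshold as a parameter (`stand_on`, default 17) because our `dealerValidMoves` may use different details. It assumes aces are stored as `'A'`; `hand_total` and `dealer_play` are hypothetical names.

```python
def hand_total(hand):
    """Best total with aces stored as 'A': one ace counts as 11 if that does not bust."""
    total = sum(1 if card == 'A' else card for card in hand)
    return total + 10 if 'A' in hand and total + 10 <= 21 else total

def dealer_play(hand, draw_card, stand_on=17):
    """Fixed dealer rule: keep hitting while the total is below stand_on, then stand."""
    hand = list(hand)  # copy so the caller's list is not modified
    while hand_total(hand) < stand_on:
        hand.append(draw_card())
    return hand

upcoming = iter([8, 5])
print(dealer_play([10, 2], lambda: next(upcoming)))  # [10, 2, 8]
```

Because the dealer has no decisions to make, the function takes only the hand and a card source; passing the card source in as a function keeps the rule easy to test with a fixed sequence of cards.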

```python
from src.Dealer import BlackJackDealer
```

```python
# create both dealer and deck
deck = BlackJackDeck()
dealer = BlackJackDealer()

# deal first card to dealer (face down)
dealer.hand.append(deck.drawCard())
# deal second card to dealer (face up)
dealerFaceCard = deck.drawCard()
dealer.hand.append(dealerFaceCard)
# keep track of the face-up card so the player knows what to base their moves on
dealer.faceUpCard = dealerFaceCard

print("Dealer's faceup card: " + str(dealer.faceUpCard))
print("Dealer's hand: " + str(dealer.hand))
print("The dealer has to: " + str(dealer.dealerValidMoves()))
```

```
Dealer's faceup card: 2
Dealer's hand: [10, 2]
The dealer has to: hit
```


### Phase Two:
This phase consisted of building the game representation and writing the `trainQ` and `testQ` methods.

#### Game
*Main Authors: Kelsey Cribari and Brandt Reutimann*

** Kelsey is going to write things here **

```python
from src.Game import BlackJackGame
```

#### TrainQ and TestQ
*Main Authors: Kelsey Cribari and Brandt Reutimann*

Here's the part where we trained our player! The reinforcement for each combination of the player's hand and the dealer's face-up card was positive only when the player won, negative when the player lost, and 0 for a push. Take a gander at the code below: the learning rate is 0.6, the epsilon decay factor is 0.8, and Q is trained over 100,000 iterations.
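To make the roles of the learning rate and epsilon concrete, here is a minimal sketch of a tabular Q update with epsilon-greedy action selection. It assumes states are `(player total, dealer up card)` pairs, only `stand` and `hit` as actions, and an undiscounted SARSA-style temporal-difference target `reward + next_q`. The names are hypothetical, and our `trainQ` may differ in its details.

```python
import random

ACTIONS = ('stand', 'hit')

def epsilon_greedy(Q, state, epsilon, rng=random):
    """With probability epsilon explore randomly; otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: Q.get((state, action), 0.0))

def td_update(Q, state, action, reward, next_q, learning_rate):
    """Move Q(state, action) a fraction learning_rate of the way toward reward + next_q.

    For the final move of a hand, next_q is 0 and reward is +1, 0, or -1;
    for earlier moves, reward is 0 and next_q is Q of the next state-action pair.
    """
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + learning_rate * (reward + next_q - old)
    return Q[(state, action)]
```

For example, with a learning rate of 0.6, a single losing hit from state `(15, 10)` moves that entry from 0 to -0.6, after which the greedy choice for that state is `stand`. A larger learning rate makes each hand's outcome overwrite more of what was learned before, which matters in a game as noisy as blackjack.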

```python
game = BlackJackGame()
# train for 100,000 iterations with learningRate = 0.6 and epsilonDecayFactor = 0.8
Q = game.trainQ(100000, .6, .8)
print('calling testQ')
# test over 1000 games; the third argument turns on printing of sample hands
winRate = game.testQ(Q, 1000, True)
print('Win rate was: {}'.format(winRate))
```

```
calling testQ
Initial Hand: [6, 9], Player: [6, 9], Dealer: [8, 2, 9], Result: loss
Initial Hand: [4, 10], Player: [4, 10], Dealer: [6, 3, 10], Result: loss
Initial Hand: [3, 4], Player: [3, 4, 'A'], Dealer: [9, 3, 10], Result: win
Initial Hand: [7, 10], Player: [7, 10], Dealer: [2, 9, 8], Result: loss
Initial Hand: [10, 10], Player: [10, 10], Dealer: [10, 9], Result: win
Initial Hand: [10, 10], Player: [10, 10], Dealer: [6, 10, 'A'], Result: win
Initial Hand: [2, 5], Player: [2, 5, 'A'], Dealer: [9, 3, 3], Result: win
Initial Hand: [6, 10], Player: [6, 10], Dealer: [5, 5, 10], Result: loss
Initial Hand: [3, 6], Player: [3, 6, 7], Dealer: [8, 6, 'A'], Result: win
Initial Hand: [4, 10], Player: [4, 10, 'A'], Dealer: [4, 2, 2, 8], Result: loss
Win rate was: 44.4
```


```python
def averageWinRate(learningRate, epsilonDecayFactor, nRuns=10):
    # train and test a fresh game nRuns times, then average the win rates
    sumWin = 0.0
    for i in range(nRuns):
        game = BlackJackGame()
        Q = game.trainQ(100000, learningRate, epsilonDecayFactor)
        winRate = game.testQ(Q, 1000)
        sumWin += winRate
    return sumWin / nRuns

print("Average win rate for learningRate = 0.6 and epsilonDecayFactor = 0.8: {:.2f}".format(averageWinRate(0.6, 0.8)))
```

```
Average win rate for learningRate = 0.6 and epsilonDecayFactor = 0.8: 43.99
```


In most runs of the average win rate above, the win rate comes out to about 44 to 45%. This is in the range we expected. Note that the house's roughly 0.5% edge under basic strategy describes the average money lost per hand, not the fraction of hands won: a basic-strategy player still loses more hands than they win, and makes up most of the difference through 3:2 payouts on blackjacks and the extra money won by doubling and splitting. According to [Wizard of Odds](https://wizardofodds.com/games/blackjack/appendix/4/), the probability of a net win in blackjack is 42.42%; if ties are ignored, that rises to about 46.35%. Our results fall between those two figures. However, we wondered whether we could improve that win rate by adjusting the learning rate and epsilon decay factor while keeping training at 100,000 iterations. So Courtney put on her investigation hat to see how high we could push that win rate.
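The two Wizard of Odds figures are linked by the push rate: ignoring ties divides the win probability by the share of hands that are actually decided. The short calculation below uses only the two numbers quoted above to back out the implied push and loss rates.

```python
win_rate = 42.42          # % of all hands won (Wizard of Odds, quoted above)
win_rate_no_ties = 46.35  # % won among hands that are not pushes

# win_rate_no_ties = win_rate / decided_share, so solve for decided_share
decided_share = win_rate / (win_rate_no_ties / 100)  # % of hands won or lost
implied_push_rate = 100 - decided_share
implied_loss_rate = decided_share - win_rate

print("Decided hands: {:.2f}%".format(decided_share))      # 91.52%
print("Implied pushes: {:.2f}%".format(implied_push_rate))  # 8.48%
print("Implied losses: {:.2f}%".format(implied_loss_rate))  # 49.10%
```

So even a player using perfect basic strategy loses noticeably more hands (about 49%) than they win (about 42%); the small house edge comes from how much is paid on certain wins, not from the player winning half the hands.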

```python
def testDifferentValues():
    # (learningRate, epsilonDecayFactor) pairs to compare
    settings = [(0.5, 0.5), (0.99, 0.3), (0.99, 0.8), (0.3, 0.3), (0.3, 0.99)]
    for learningRate, epsilonDecayFactor in settings:
        print("Testing learningRate = {} and epsilonDecayFactor = {}".format(
            learningRate, epsilonDecayFactor))
        print("Average win rate for learningRate = {} and epsilonDecayFactor = {}: {}".format(
            learningRate, epsilonDecayFactor, averageWinRate(learningRate, epsilonDecayFactor)))

print("Please be prepared to wait a while.")
testDifferentValues()
```

```
Please be prepared to wait a while.
Testing learningRate = 0.5 and epsilonDecayFactor = 0.5
Average win rate for learningRate = 0.5 and epsilonDecayFactor = 0.5: 44.64
Testing learningRate = 0.99 and epsilonDecayFactor = 0.3
Average win rate for learningRate = 0.99 and epsilonDecayFactor = 0.3: 40.71
Testing learningRate = 0.99 and epsilonDecayFactor = 0.8
Average win rate for learningRate = 0.99 and epsilonDecayFactor = 0.8: 41.4
Testing learningRate = 0.3 and epsilonDecayFactor = 0.3
Average win rate for learningRate = 0.3 and epsilonDecayFactor = 0.3: 44.39
Testing learningRate = 0.3 and epsilonDecayFactor = 0.99
Average win rate for learningRate = 0.3 and epsilonDecayFactor = 0.99: 45.77
```


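Looking at these results, a lower learning rate clearly helps: the two settings with learningRate = 0.99 produced the lowest win rates (40.71% and 41.4%). The best combination we tried was learningRate = 0.3 with epsilonDecayFactor = 0.99 (45.77%), which suggests that decaying epsilon slowly, so the player keeps exploring for longer, may also help. Because we tried only five combinations and averaged each over ten runs, these trends are suggestive rather than conclusive.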
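One way to see why the epsilon decay factor matters so much is to count how many games pass before exploration effectively stops. The sketch below assumes epsilon starts at 1 and is multiplied by the decay factor once per game, and treats epsilon below 0.01 as essentially greedy; both are assumptions about `trainQ`, not a description of it, and `games_until_greedy` is a hypothetical helper.

```python
def games_until_greedy(decay_factor, threshold=0.01, epsilon=1.0):
    """Count multiplicative decay steps until epsilon falls below threshold."""
    games = 0
    while epsilon >= threshold:
        epsilon *= decay_factor
        games += 1
    return games

for factor in (0.3, 0.5, 0.8, 0.99):
    print("epsilonDecayFactor = {}: {} games".format(factor, games_until_greedy(factor)))
```

Under these assumptions, a factor of 0.3 stops exploring after just 4 games, 0.8 after 21, and even 0.99 after roughly 460 of the 100,000 training games. That is consistent with the slowest decay giving the best result above.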

## Results
*The results we got.*

## Conclusions
*The things we learned.*