### Mauer Cory
# CS 614 Assignment 4: Gaming
## Blackjack card-counting agent

## Pitch: 
For this project I chose to build a gaming agent that plays blackjack. Most electronic blackjack games deal randomly selected cards. I instead wanted to model an actual deck of cards, to see whether I could train a model to count cards using that extra state. The goal of this agent is to increase the odds of winning at blackjack, and it could be incorporated into a competitive online blackjack game.

## Data source:
The data source for this project is a blackjack game that I implemented myself. At a high level, a single deck of cards is modeled. Cards are dealt to a player and a dealer until 17 cards remain in the deck, at which point the deck is reshuffled. The game's API provides users and models with the following state:

* the player's sum of cards
* the dealer's sum of cards
* whether the player has an ace
* an array with the number of cards of each value seen so far: 1 (ace), 2, ..., 10
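The dealing and reshuffling rule above can be illustrated with a small deck model. This is a hypothetical sketch, not the author's `BlackJack` class. The names `CountingDeck`, `new_deck`, and `make_state` are invented here. It also assumes two things: the reshuffle happens once 17 cards remain, and the seen-card counts reset on every reshuffle.

```python
import random

RESHUFFLE_AT = 17  # assumed: reshuffle once this many cards remain in the deck


def new_deck():
    """One 52-card deck as rank indices: 0 = ace, 1-8 = two..nine, 9 = ten-valued (10, J, Q, K)."""
    return [rank for rank in range(9) for _ in range(4)] + [9] * 16


class CountingDeck:
    """Hypothetical single deck that counts how many cards of each rank have been dealt."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.shuffle()

    def shuffle(self):
        self.cards = new_deck()
        self.rng.shuffle(self.cards)
        self.seen = [0] * 10  # assumed: counts reset on every reshuffle

    def draw(self):
        # Reshuffle before dealing if the deck has run down to the threshold
        if len(self.cards) <= RESHUFFLE_AT:
            self.shuffle()
        rank = self.cards.pop()
        self.seen[rank] += 1
        return rank


def make_state(player_sum, dealer_sum, usable_ace, seen):
    """Flatten into 13 values: player sum, dealer sum, ace flag, then 10 seen-card counts."""
    return [player_sum, dealer_sum, usable_ace] + list(seen)


deck = CountingDeck(seed=1)
first_card = deck.draw()
print(make_state(0, 0, False, deck.seen))
```

Counting dealt ranks inside the deck object keeps the seen-card array consistent with what was actually dealt. The array always sums to the number of cards drawn since the last shuffle.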

## Model and data justification:
To build my agent I used a deep Q-network implemented with TensorFlow (Keras). Each blackjack move can effectively be treated as an independent hand/game, so there was no need to model any temporal aspects. This led to the relatively simple model below.
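Deep Q-learning trains the network toward Bellman targets. The NumPy sketch below shows the textbook target computation. It is not taken from the `DQNAgent` class used in this notebook, whose internals are not shown. The function names are hypothetical, and the discount factor `gamma = 0.99` is an assumed value. In the textbook formulation, the output layer is linear and trained with a regression loss such as mean squared error.

```python
import numpy as np


def dqn_targets(q_next, rewards, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q(s', a'), with no bootstrap on terminal steps."""
    q_next = np.asarray(q_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * q_next.max(axis=1) * (1.0 - dones)


def training_targets(q_current, actions, targets):
    """Copy the current predictions and overwrite only the entry of the action taken."""
    y = np.array(q_current, dtype=float)
    y[np.arange(len(actions)), np.asarray(actions)] = targets
    return y


# Two transitions: a mid-hand step (reward 0) and a finished hand that was won
t = dqn_targets([[0.2, 0.5], [0.1, -0.3]], rewards=[0.0, 1.0], dones=[False, True])
y = training_targets([[0.1, 0.2], [0.3, 0.4]], actions=[1, 0], targets=t)
```

When a hand ends, the target is just the final reward. Blackjack hands are only a few moves long, so most targets are close to the game outcome itself.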

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow's C++ log output
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import tensorflow as tf

state_size = 14   # size of the input vector
action_state = 2  # one output per action: stay or hit
lr = 0.001        # Adam learning rate

# Two hidden ReLU layers followed by one output per action
model = Sequential()
model.add(Dense(32, input_dim=state_size, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(action_state, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=lr))

model.summary()
```


```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 32)                480       
                                                                 
 dense_1 (Dense)             (None, 128)               4224      
                                                                 
 dense_2 (Dense)             (None, 2)                 258       
                                                                 
Total params: 4,962
Trainable params: 4,962
Non-trainable params: 0
_________________________________________________________________
```
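Each `Dense` layer has one weight per input/unit pair plus one bias per unit. Its parameter count is therefore (inputs + 1) × units. Applied to the three layers of this model, that formula reproduces the summary:

```latex
\begin{aligned}
\text{params}(\text{Dense}) &= (n_{\text{in}} + 1)\, n_{\text{units}} \\
(14 + 1) \times 32 &= 480 \\
(32 + 1) \times 128 &= 4224 \\
(128 + 1) \times 2 &= 258 \\
480 + 4224 + 258 &= 4962
\end{aligned}
```

The later cells use a 13-value state. With that input size, the first layer would instead have (13 + 1) × 32 = 416 parameters.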


## Commented examples:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from black_jack import BlackJack
from black_jack_deepq import DQNAgent

# Initialize the game agent with the trained model
state_size = 13   # player's sum, dealer's sum, usable-ace flag, and 10 seen-card counts
action_size = 2   # Stay (0) or Hit (1)
batch_size = 128  # number of steps used per training batch
agent = DQNAgent(state_size, action_size, batch_size, model="./black_jack_model", verbose=False)

# Start a new game of blackjack
env = BlackJack(cardState=True)
state, _ = env.reset()  # reset starts a new game and returns the initial state
print("Game State:")
print(f"player sum: {state[0]} isAcePresent: {state[2]}")  # index 2 is the ace flag
print(f"dealer card value: {state[1]}")
print("cards played: \n ace, 2, 3, 4, 5, 6, 7, 8, 9, 10 (10, jack, queen, king)")
print(f"  {state[3:]}")  # indices 3-12 are the seen-card counts
```

```
Game State:
player sum: 12 isAcePresent: 0
dealer card value: 2
cards played: 
 ace, 2, 3, 4, 5, 6, 7, 8, 9, 10 (10, jack, queen, king)
  [0, 1, 1, 0, 0, 0, 0, 0, 1, 0]
```
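The states printed in this notebook have 13 entries:

* index 0: the player's sum
* index 1: the dealer's sum
* index 2: the ace flag
* indices 3 to 12: the seen-card counts, from aces to ten-valued cards

A small helper that names these fields can make such printouts less error-prone. The `decode_state` function below is hypothetical, not part of the `BlackJack` API, and its layout is inferred from the printed states.

```python
RANK_LABELS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10/J/Q/K"]


def decode_state(state):
    """Split a flat 13-value observation into named fields (layout inferred from printed states)."""
    state = list(state)
    if len(state) != 13:
        raise ValueError(f"expected 13 values, got {len(state)}")
    return {
        "player_sum": state[0],
        "dealer_sum": state[1],
        "usable_ace": bool(state[2]),
        "seen": dict(zip(RANK_LABELS, state[3:])),
    }


# The final state printed after the agent's move in this example
info = decode_state([12, 26, False, 0, 1, 1, 1, 0, 0, 0, 0, 1, 2])
print(info["player_sum"], info["dealer_sum"], info["usable_ace"])
```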


```python
# Get the agent's move
action = agent.act(state)
print(f"action : 0- stay, 1- hit \n {action}")
```

```
action : 0- stay, 1- hit 
 0
```
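The implementation of `agent.act` is not shown. One common way for a deep Q agent to turn the network's two outputs into a move is epsilon-greedy selection. The sketch below assumes that scheme and uses a hypothetical `choose_action` helper. With `epsilon=0`, as is typical when evaluating a trained model, it simply picks the highest-scoring action.

```python
import numpy as np


def choose_action(q_values, epsilon=0.0, rng=None):
    """Epsilon-greedy choice over [stay, hit] scores; epsilon=0 is purely greedy."""
    if rng is None:
        rng = np.random.default_rng()
    scores = np.asarray(q_values, dtype=float).ravel()  # accepts shape (2,) or (1, 2)
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))  # explore: uniformly random action
    return int(np.argmax(scores))  # exploit: highest-scoring action


print(choose_action([[0.7, 0.3]]))  # 0 (stay)
```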


```python
# Apply the action; step returns five values, of which only the first three are used
next_state, reward, done, _1, _2 = env.step(action)
print(f"reward: -1 = loss, 0.0 = not finished/draw, 1 = win {reward}")
print(f" {reward}")
print(f"final state {next_state}")
```

```
reward: -1 = loss, 0.0 = not finished/draw, 1 = win 1.0
 1.0
final state [12, 26, False, 0, 1, 1, 1, 0, 0, 0, 0, 1, 2]
```


In this example the agent made what I would consider a suboptimal choice by staying on 12. However, the agent ended up winning the hand because the dealer busted on a hit. The model was trained to observe the seen cards from a single deck, so perhaps this was a good choice. Only a 2, a 3, and a 9 had been seen, which left 16 ten-valued cards among the 49 unseen cards. A hit therefore had a 16/49 (about 33%) chance of drawing a 10-value card and busting.
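The card-counting argument can be made concrete with a short calculation. The sketch below uses a hypothetical helper and assumes that every unseen card is equally likely to be dealt next, including the dealer's face-down card. It computes the chance of drawing a ten-valued card from the seen-card counts printed when the agent made its move. On a hard 12, only a ten-valued card busts the player, so this is also the bust probability of a hit.

```python
from fractions import Fraction

# Cards of each rank in one 52-card deck: index 0 = ace, 1-8 = two..nine, 9 = ten-valued (10, J, Q, K)
DECK_COUNTS = [4] * 9 + [16]


def prob_next_rank(seen, rank_index):
    """Probability that the next card has the given rank, given counts of cards already seen."""
    remaining = [total - s for total, s in zip(DECK_COUNTS, seen)]
    return Fraction(remaining[rank_index], sum(remaining))


seen_before_move = [0, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # the "cards played" output above
p_bust = prob_next_rank(seen_before_move, 9)
print(p_bust, round(float(p_bust), 3))  # 16/49 0.327
```

Using `Fraction` keeps the result exact, which makes it easy to compare with hand-computed odds such as 16/52 for a fresh deck.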

## Testing:


```python
import numpy as np

state_size = 13  # player's sum, dealer's sum, usable-ace flag, and 10 seen-card counts
number_of_games = 1000

# Play the games with the trained agent
win_count = 0
print(f"testing the trained model for {number_of_games} games")
for i in range(number_of_games):
    state, _ = env.reset()
    state = np.reshape(state, [1, state_size])  # add a batch dimension for the model
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done, _1, _2 = env.step(action)
        state = np.reshape(next_state, [1, state_size])
        if reward >= 1.0:  # only wins are counted; draws and losses are not
            win_count = win_count + 1
        # env.render()

print(f"Win count {win_count} number of games{number_of_games}")
print(f"Win rate {(win_count/number_of_games)*100}%")
```


```
testing the trained model for 1000 games
Win count 293 number of games1000
Win rate 29.299999999999997%
```
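A test of 1,000 games is a finite sample, so the measured win rate carries sampling noise. A normal-approximation (Wald) confidence interval gives a sense of how large an improvement from further training would need to be before it stands out. The sketch below uses a hypothetical `win_rate_interval` helper. Note that the loop above counts only wins, so draws and losses both fall outside `win_count`.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Normal-approximation (Wald) interval for a win proportion; z=1.96 gives about 95%."""
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


p, low, high = win_rate_interval(293, 1000)
print(f"win rate {p:.1%}, 95% CI about {low:.1%} to {high:.1%}")
```

For 293 wins in 1,000 games, this gives roughly 26.5% to 32.1%.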


The model does not currently perform as well as I expected. This is likely caused by a relative lack of available training time. Given more time and compute resources, I believe this model could get closer to the statistically expected win rate.

## Code and run Instructions
TODO

### I agree to sharing this assignment with other students in the course after grading has occurred. 