# Blackjack como un proceso de decisión markoviano (MDP)

En esta libreta se tratará de representar el juego del blackjack (o 21) como un proceso de decisión markoviano, es decir, encontrar una politica cada estado que se puede tener en un juego de blackjack.

Pero este juego de blackjack tiene unas restricciones, y las reglas son un poco diferentes:

* El mazo es infinito, esto quiere decir que la probabilidad para que salga cada carta es de 1/13
* Se gana cuando se llega a 4 cartas, esto hace que los estados sean de 4 cartas o menos
* El que sume 21 con sus cartas gana


## Librerias

Librerias que se usaran

In [1]:
using Combinatorics

Funciones auxiliares para generar el estado

In [2]:
function generateHandPlayer()
    cards = [1,2,3,4,5,6,7,8,9,10,11]
    hand = unique(x->sum(x), collect(combinations(cards,2)))
    # Hand with 3 cards
    temp1 = unique(x->sum(x), collect(combinations(cards,3)))
    # Hand with 4 cards
    temp2 = unique(x->sum(x), collect(combinations(cards,4)))

    append!(hand,temp1)
    hand = unique(x->sum(x), hand)
    append!(hand,temp2)
    hand = unique(x->sum(x), hand)
    return hand
end

function generateHandDealer()
    cards = [1,2,3,4,5,6,7,8,9,10,11]
    hand = unique(x->sum(x), collect(combinations(cards,2)))
    temp = unique(x->sum(x), collect(combinations(cards,3)))
    append!(hand,temp)
    hand = unique(x->sum(x), hand)
    return hand
end

generateHandDealer (generic function with 1 method)

## Funciones

Se definiran las funciones y variables que se usaran para definir un MDP

In [3]:
function MDP_legal_actions()
    return ["take", "pass"]
end

function MDP_reward(s,a,s_)
    reward = 0
    lenSCards = length(s[1])
    lenS_Cards = length(s[1])
    if a == nothing
        if length(s[1]) == 4 || s[2] == 21 || s[2] > s[4]
            reward = 1
        else
            reward = -1
        end
    elseif a == "take"
        if lenS_Cards == lenSCards+1
            # This means the cards that are in s are also in s' (in this case s_)
            if s[1] == s_[1][1:lenSCards]
                if s_[2] == 21 || lenS_Cards == 4
                    reward = 1
                end
            end
        end
    else
        if s_[4] == 21 || s_[2] > 21
            reward = -1
        end
    end
    return reward
end

function MDP_p(s,a,s_)
    if a == "pass"
        if s == s_
            probability = 1
        else
            probability = 0
        end
    # a == "take"
    else
        lenSCards = length(s[1])
        lenS_Cards = length(s_[1])
        
        # This means the first cards of each states aren't even the same length
        if lenS_Cards == lenSCards + 1
            if s[1] == s_[1][1:lenSCards]
                probability = 1/13
            else
                probability =  0
            end
        else
            probability = 0
        end
        
    end
    
    return probability
end

function MDP_is_final(s)
   if length(s[1]) == 4 s[2] >= 21 || length(s[3]) == 4 || s[4] >= 21
        return true
    else
        return false
    end
end

#=  
Los estados son una 4-tupla donde (x, xs, y, ys)
x: La mano que tiene el jugador
xs: La suma de la mano del jugador
y: La mano del dealer
ys: La suma de la mano del dealer
=#
function generate_state()
    player = generateHandPlayer()
    dealer = generateHandDealer()
    return [(x, sum(x), y, sum(y)) for x in player for y in dealer]
end

generate_state (generic function with 1 method)

## Value iteration
Función para calcular la política usando el método de iteración de valor (value iteration)

In [9]:
function iter_value(r, states)
    V = Dict(s => 0 for s in states)
    V_p = copy(V)
    
    salir = false
    while !salir
        for s in keys(V)
            if MDP_is_final(s)
                V_p[s] = MDP_reward(s, nothing, s)
            else
                V_p[s] = maximum([sum([MDP_p(s,a,s_)*(MDP_reward(s,a,s_)+r*V[s_]) for s_ in states]) for a in MDP_legal_actions()])
            end
            
            salir = true
            
            for s in keys(V)
                if V_p[s] > V[s]
                    salir = false
                    V[s] = V_p[s]
                else
                end
            end
            
            if salir
                break
            end
        end
    end
    
    #Regresamos la politicaaaaa
    policy = Dict()
    for s in keys(V)
        temp = Dict(a => sum([MDP_p(s,a,s_)*V[s_] for s_ in states]) for a in MDP_legal_actions())
        policy[s] = findmax(temp)[2]
        
    end
    return policy
end

iter_value (generic function with 1 method)

In [11]:
states = generate_state()
r= 0.9
policy = iter_value(r,  states)


Dict{Any,Any} with 1008 entries:
  ([9, 10, 11], 30, [1, 2], 3)          => "take"
  ([9, 10, 11], 30, [6, 10, 11], 27)    => "take"
  ([1, 10, 11], 22, [1, 10], 11)        => "take"
  ([4, 9, 10, 11], 34, [1, 4], 5)       => "take"
  ([6, 10, 11], 27, [9, 11], 20)        => "take"
  ([3, 9, 10, 11], 33, [8, 10, 11], 29) => "take"
  ([4, 10, 11], 25, [6, 10, 11], 27)    => "take"
  ([10, 11], 21, [1, 11], 12)           => "take"
  ([2, 11], 13, [7, 10, 11], 28)        => "take"
  ([3, 10, 11], 24, [1, 10], 11)        => "take"
  ([8, 10, 11], 29, [6, 11], 17)        => "take"
  ([7, 10, 11], 28, [1, 10], 11)        => "take"
  ([3, 9, 10, 11], 33, [1, 10], 11)     => "take"
  ([1, 8], 9, [1, 7], 8)                => "take"
  ([6, 9, 10, 11], 36, [4, 10, 11], 25) => "take"
  ([6, 9, 10, 11], 36, [1, 9], 10)      => "take"
  ([5, 11], 16, [5, 11], 16)            => "take"
  ([3, 11], 14, [7, 10, 11], 28)        => "take"
  ([9, 11], 20, [9, 10, 11], 30)        => "take"
  ([1, 4], 5, [1,