<a href="https://colab.research.google.com/github/DavoodSZ1993/RL/blob/main/05_MC_Control.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import gym

env = gym.make('Blackjack-v0')

In [2]:
state_space_size = (33, 12, 2)
policy = np.zeros(state_space_size, dtype=int)

In [3]:
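def observation_clean(observation):
  # Blackjack observations are (player sum, dealer card, usable ace).
  # Cast the boolean ace flag to int so the tuple can index NumPy arrays.
  return (observation[0], observation[1], int(observation[2]))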

### Side Note: Binomial Probability Distribution:

* $X \sim \text{Binomial}(n, p)$ (where $0 \le p \le 1$): the number of heads in $n$ independent flips of a coin with heads probability $p$.

$$
p(x) = \binom{n}{x} p^x (1-p)^{n-x}
$$
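
As a quick numerical check of the formula above, the following sketch evaluates the binomial PMF with Python's `math.comb`. The helper name `binomial_pmf` is illustrative and not part of the notebook.

```python
from math import comb

def binomial_pmf(x, n, p):
  """P(X = x) for X ~ Binomial(n, p); hypothetical helper for illustration."""
  return comb(n, x) * p**x * (1 - p)**(n - x)

# n=1, p=0.5 is a single fair coin flip: both outcomes are equally likely.
coin = [binomial_pmf(x, n=1, p=0.5) for x in (0, 1)]

# For n=4 fair flips, the probabilities over x = 0..4 should sum to 1.
four_flips = [binomial_pmf(x, n=4, p=0.5) for x in range(5)]

print(coin)
print(four_flips, sum(four_flips))
```

The `n=1, p=0.5` case is the one used later in the notebook to draw a random first action.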

* **Probability Mass Functions (PMFs)**:
When a random variable $X$ takes on a finite set of possible values (i.e., $X$ is a discrete random variable), a simpler way to represent the probability measure associated with it is to directly specify the probability of each value that the random variable can assume. In particular, a _probability mass function (PMF)_ is a function $p_X: \Omega \to \mathbb{R}$ such that:

$$
p_X(x)=P(X=x)
$$
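
To make the definition concrete, here is a small sketch of a PMF stored as a dictionary. It assumes the card model that Gym's Blackjack environment is understood to use (cards drawn with replacement, face cards worth 10, ace counted as 1 here). The names `ranks` and `card_pmf` are made up for this example.

```python
from fractions import Fraction

# Assumption: thirteen equally likely ranks, namely ace (1), 2..9,
# and four ranks worth 10 (10, J, Q, K).
ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

# card_pmf[x] = P(X = x), where X is the value of one drawn card.
card_pmf = {}
for value in ranks:
  card_pmf[value] = card_pmf.get(value, Fraction(0)) + Fraction(1, len(ranks))

print(card_pmf[10])            # probability of drawing a card worth 10
print(sum(card_pmf.values()))  # the probabilities of a PMF sum to 1
```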

In [4]:
def run_episode_exploring_start(policy, env=env):
  steps = []
  observation = observation_clean(env.reset())
  done = False
  steps.append(((None, None) + (observation, 0))) # State, Action, Next State, Reward
  start = True

  while not done:
    if start:
      # Exploring start: choose the first action at random.
      # A single Bernoulli trial (n=1, p=0.5) returns 0 or 1 with equal probability.
      action = np.random.binomial(n=1, p=0.5)
      start = False
    else:
      action = policy[observation]

    observation_action = (observation, action)
    observation, reward, done, info = env.step(action)
    observation = observation_clean(observation)
    steps.append(observation_action + (observation, int(reward)))
  return steps
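
Each entry of `steps` has the layout (state, action, next state, reward), and the first entry is a placeholder with `None` for state and action. A minimal sketch of turning such a list into per-step returns follows. The helper `returns_from_steps`, the discount `gamma`, and the hand-written `episode` are assumptions for illustration and are not taken from the notebook.

```python
def returns_from_steps(steps, gamma=1.0):
  """Return [(state, action, G)] for every real step; hypothetical helper."""
  G = 0.0
  out = []
  # Walk backwards so G accumulates the rewards that follow each step.
  for state, action, _next_state, reward in reversed(steps):
    if state is None:  # skip the initial placeholder entry
      continue
    G = reward + gamma * G
    out.append((state, action, G))
  out.reverse()
  return out

# Hand-written episode in the same layout (not produced by the environment).
episode = [
  (None, None, (13, 10, 0), 0),
  ((13, 10, 0), 1, (18, 10, 0), 0),  # hit
  ((18, 10, 0), 0, (18, 10, 0), 1),  # stick and win
]
episode_returns = returns_from_steps(episode)
print(episode_returns)
```

With `gamma=1.0` every step of a Blackjack episode receives the final reward as its return, since all intermediate rewards are zero.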

In [12]:
size = list(state_space_size) + [2]
Q = np.zeros(size)
Q.shape

(33, 12, 2, 2)

In [14]:
from collections import defaultdict
returns = defaultdict(list)
returns

defaultdict(list, {})
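
The `Q` table and the `returns` dictionary are the usual ingredients of a first-visit Monte Carlo update. The sketch below shows one common way they could be used together, with sample averaging and a greedy policy update. It is an assumption about the next step rather than the notebook's own code, and the `visited` triples are hand-written stand-ins for one episode.

```python
from collections import defaultdict
import numpy as np

Q = np.zeros((33, 12, 2, 2))
policy = np.zeros((33, 12, 2), dtype=int)
returns = defaultdict(list)

# Hypothetical (state, action, return) triples standing in for one episode.
visited = [((13, 10, 0), 1, 1.0), ((18, 10, 0), 0, 1.0)]

seen = set()
for state, action, G in visited:
  key = state + (action,)  # 4-tuple index into the Q table
  if key in seen:          # first-visit: use each pair once per episode
    continue
  seen.add(key)
  returns[key].append(G)
  Q[key] = np.mean(returns[key])            # average of the observed returns
  policy[state] = int(np.argmax(Q[state]))  # greedy improvement at this state

print(Q[13, 10, 0], policy[13, 10, 0])
print(Q[18, 10, 0], policy[18, 10, 0])
```

Keying `returns` by the same 4-tuple that indexes `Q` keeps the bookkeeping and the table aligned.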