# **Monte Carlo**
----

## Goals of this notebook:

1. Make sure that the most basic environment (HitStand) works properly.
2. Deploy the main plotting utilities.
3. Run a simple experiment to check that my hand-written Monte Carlo algorithms (for both prediction and control) are free of mistakes.
4. Define methods to accelerate multiple episode executions in later notebooks.
5. Showcase the agents' common efficiency-oriented capabilities (e.g., Q/V-table storage and agent storage).

---
## Library imports

#### 1. RL libraries:

In [3]:
import environments
from agents.agents import MonteCarloPredictor, MontecarloController, OffPolicyMontecarlo

#### 2. Data aggregation and matrix operation libraries:

In [4]:
import numpy as np
import pandas as pd

#### 3. Plotting libraries:

In [5]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import FormatStrFormatter
from mpl_toolkits.axes_grid1 import make_axes_locatable
# Commands to tweak notebook layout:
from sys import maxsize
np.set_printoptions(threshold=maxsize)
# Note: matplotlib >= 3.6 renamed this style to 'seaborn-v0_8-pastel'.
plt.style.use('seaborn-pastel')

#### 4. Statistical analysis tools:

In [7]:
from statsmodels.tsa.stattools import adfuller

----

## Common Plotting utilities:


----

## Basic environment presentation:

In [8]:
# Create an OpenAI Gym-like environment instance:
env = environments.make('hitstand')

In [10]:
print(env)
# Main rules:

1. 6 decks (with replacement after each episode)
2. Dealer stands on soft 17
3. No Double Down
4. No split
5. No insurance offered
6. No surrender
7. Natural Blackjack 3:2


**Code note**: although a natural Blackjack triggers a 3:2 payoff, it is never computed by the agent, since Blackjack states entail an automatic state transition (the agent receives its two cards and simply waits for the hand to be resolved).
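
To make rule 2 and the code note more concrete, here is a minimal sketch of how a hand total, a natural Blackjack and the dealer's stand-on-soft-17 decision could be computed. The function names (`hand_total`, `is_natural`, `dealer_should_hit`) and the card encoding (aces as 1, face cards as 10) are assumptions made for illustration; they are not taken from the `environments` package.

```python
def hand_total(cards):
    """Return (total, usable_ace) for a list of card values (ace = 1, faces = 10)."""
    total = sum(cards)
    # An ace is "usable" when counting it as 11 does not bust the hand.
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace


def is_natural(cards):
    """A natural Blackjack is exactly two cards totalling 21 (ace + ten-valued card)."""
    return len(cards) == 2 and hand_total(cards)[0] == 21


def dealer_should_hit(cards):
    """Hypothetical dealer policy: stand on any 17 or more, soft 17 included."""
    total, _ = hand_total(cards)
    return total < 17


print(hand_total([1, 6]))         # (17, True): a soft 17
print(dealer_should_hit([1, 6]))  # False: the dealer stands on soft 17
print(is_natural([1, 10]))        # True
```

Because a natural is detected from the first two cards alone, the player has no decision to make in that state, which is consistent with the automatic transition described in the code note.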

In [12]:
env.reward_space_description

{-1: 'The House beats Jack',
 0: 'Draw/not terminal',
 1: 'Jack beats the House',
 1.5: 'Blackjack for Jack'}
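
As a rough illustration of how a finished hand could map onto the four rewards listed above, consider the sketch below. The function `resolve_hand` and its parameters are hypothetical, and the handling of naturals (a natural beats a non-natural 21, two naturals draw) is an assumption based on common Blackjack rules rather than on the environment's source code.

```python
def resolve_hand(player_total, dealer_total,
                 player_natural=False, dealer_natural=False):
    """Map a finished hand to a reward in {-1, 0, 1, 1.5} (illustrative only)."""
    if player_total > 21:
        return -1      # player busts: the House beats Jack
    if player_natural and not dealer_natural:
        return 1.5     # natural Blackjack pays 3:2
    if dealer_natural and not player_natural:
        return -1      # assumed rule: a dealer natural beats any other hand
    if dealer_total > 21 or player_total > dealer_total:
        return 1       # Jack beats the House
    if player_total == dealer_total:
        return 0       # draw
    return -1          # dealer has the higher total


print(resolve_hand(21, 20, player_natural=True))  # 1.5
print(resolve_hand(18, 18))                       # 0
print(resolve_hand(17, 19))                       # -1
```

Note that in the environment itself a reward of 0 is also returned for non-terminal steps, which this sketch does not model.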

In [14]:
env.observation_space_description
# Every observation consists of a 3-element tuple, the description for each element and their positions are:

{0: "Player's total", 1: "Dealer's card value", 2: 'Player has got usable ace'}
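
Since every observation is a hashable 3-element tuple, it can be used directly as a key in a value table. The sketch below shows one possible way to do this; the `Observation` class, the `v_table` and `visits` dictionaries and the `update_value` helper are hypothetical names introduced here, not part of the agents' actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    player_total: int   # position 0: player's total
    dealer_card: int    # position 1: dealer's card value
    usable_ace: bool    # position 2: player has got a usable ace


v_table = defaultdict(float)  # state -> estimated value
visits = defaultdict(int)     # state -> number of returns averaged so far


def update_value(state, episode_return):
    """Incremental-mean update of the value estimate for one state."""
    visits[state] += 1
    v_table[state] += (episode_return - v_table[state]) / visits[state]


obs = Observation(player_total=15, dealer_card=10, usable_ace=False)
update_value(obs, -1)
update_value(obs, 1)
print(obs[0], obs.dealer_card, v_table[obs])  # 15 10 0.0
```

A `NamedTuple` compares and hashes like a plain tuple, so a table keyed this way can still be looked up with the raw `(15, 10, False)` tuples returned by an environment.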