___
<a href=''> <img src='https://miro.medium.com/max/1000/1*_9abDpNTM9Cbsd2HEXYm9Q.png' width=500 /></a>
___
# An example of applying a random agent to Blackjack in R

### Let's run our first Blackjack agent in R!

## > Install packages
To begin, we install `reticulate`, a package that embeds a Python session within the R session. In this tutorial, `reticulate` is used to create a virtual environment and to connect R to Python packages. We then install `imager` for plotting the training performance curve.

In [1]:
install.packages("reticulate")
library(reticulate)

Installing package into ‘/usr/local/lib/R/4.0/site-library’
(as ‘lib’ is unspecified)



In [2]:
install.packages("imager")
library(imager)

Installing package into ‘/usr/local/lib/R/4.0/site-library’
(as ‘lib’ is unspecified)

Loading required package: magrittr


Attaching package: ‘imager’


The following object is masked from ‘package:magrittr’:

    add


The following objects are masked from ‘package:stats’:

    convolve, spectrum


The following object is masked from ‘package:graphics’:

    frame


The following object is masked from ‘package:base’:

    save.image




## > Virtual Environment
Now we create a virtual environment called `r-rlcard`.

In [3]:
virtualenv_create('r-rlcard')

virtualenv: r-rlcard


Before using the virtual environment `r-rlcard`, let's double-check that it exists.

In [4]:
virtualenv_list()
use_virtualenv('r-rlcard', required=TRUE)

## > Import packages
First, we use `py_install()` to install `rlcard` and TensorFlow from R. We recommend passing `pip = TRUE`, because the default conda installation method may cause issues.

In [5]:
py_install('rlcard', pip=TRUE)
py_install('rlcard[tensorflow]', pip=TRUE)

Using virtual environment '/Users/miawan/.virtualenvs/r-rlcard' ...
Using virtual environment '/Users/miawan/.virtualenvs/r-rlcard' ...
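
If an import fails later, one way to check whether both packages are visible to the Python interpreter is a small helper like the one below. This is only a sketch: `missing_modules` is a hypothetical name, and the snippet is assumed to run in the same Python environment that `reticulate` uses (for example through `py_run_string()`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages were found.
print(missing_modules(["rlcard", "tensorflow"]))
```

Any name that appears in the printed list points to a package that still needs to be installed into `r-rlcard`.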


In [6]:
rlcard <- import('rlcard')
tf <- import('tensorflow')
os <- import('os')
tf$"__version__"

In [7]:
# Import the modules.
rlcard <- import('rlcard')
RandomAgent <- rlcard$agents$RandomAgent
set_global_seed <- rlcard$utils$set_global_seed

In [8]:
# Make environment
config <- list(seed = 0L)
env <- rlcard$make('blackjack', config)
episode_num <- 2L

In [9]:
# Set a global seed.
set_global_seed(0L)

In [10]:
# Set up agents
agent_0 <- RandomAgent(action_num=env$action_num)
env$set_agents(list(agent_0))
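
Conceptually, a random agent picks one of the currently legal actions uniformly at random. The Python sketch below illustrates the idea only; it is not RLCard's implementation. `random_action` is a hypothetical helper, and the state is assumed to be a dict with a `legal_actions` list, as in the output shown later in this tutorial.

```python
import random


def random_action(state, rng=random):
    """Pick one of the legal actions uniformly at random."""
    return rng.choice(state["legal_actions"])


# A hand-written state in the same shape as the tutorial's output.
state = {"obs": [12, 7], "legal_actions": [0, 1]}
rng = random.Random(0)  # seeded only to make the sketch repeatable
action = random_action(state, rng)
print(action in state["legal_actions"])
```

Because the choice ignores `obs` entirely, such an agent serves as a baseline rather than as a learned policy.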

## > Train the model

Training the model requires complicated interactions with TensorFlow, so we recommend importing a Python script. Specifically, we create a file named `train.py` in the same directory with the following content.



    def train(episode_num, env, agent_0):
        # agent_0 is already attached to env through env.set_agents()
        for episode in range(episode_num):

            # Generate data from the environment
            trajectories, _ = env.run(is_training=False)

            # Print out the trajectories
            print('\nEpisode {}'.format(episode))
            for ts in trajectories[0]:
                print('State: {}, Action: {}, Reward: {}, Next State: {}, Done: {}'.format(ts[0], ts[1], ts[2], ts[3], ts[4]))

In [11]:
reticulate::source_python("train.py")
train(episode_num, env, agent_0)

The output should look something like the following:

#### Episode 0
   State: {'obs': array([12,  7]), 'legal_actions': [0, 1]}, Action: 0, Reward: 0, Next State: {'obs': array([21,  7]), 'legal_actions': [0, 1]}, Done: False

   State: {'obs': array([21,  7]), 'legal_actions': [0, 1]}, Action: 0, Reward: -1, Next State: {'obs': array([22, 18]), 'legal_actions': [0, 1]}, Done: True

#### Episode 1
   State: {'obs': array([16, 10]), 'legal_actions': [0, 1]}, Action: 1, Reward: -1, Next State: {'obs': array([16, 21]), 'legal_actions': [0, 1]}, Done: True

Note that the states and actions are wrapped by `env` in Blackjack. In this example, `[12, 7]` means that the current player's score is 12 and the dealer's face-up card is worth 7. Action 0 means "hit" and action 1 means "stand". A reward of 1 means the player wins, -1 means the dealer wins, and 0 means a tie. The above data can be fed directly into an RL algorithm for training.
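
To make the encoding concrete, the Python sketch below turns each transition into a readable line. `ACTION_NAMES`, `OUTCOMES`, and `describe` are hypothetical names introduced for illustration. The sample data is Episode 0 from the output above, with the observations written as plain lists instead of NumPy arrays.

```python
# Action ids and reward meanings as described in the note above.
ACTION_NAMES = {0: "hit", 1: "stand"}
OUTCOMES = {1: "player wins", 0: "tie", -1: "dealer wins"}


def describe(transition):
    """Turn one (state, action, reward, next_state, done) tuple into text."""
    state, action, reward, next_state, done = transition
    player, dealer = state["obs"]
    line = "player {} vs dealer {}: {}".format(player, dealer, ACTION_NAMES[action])
    # The reward only reflects the game result once the episode has ended.
    if done:
        line += " -> " + OUTCOMES[reward]
    return line


# Episode 0 from the sample output.
episode_0 = [
    ({"obs": [12, 7], "legal_actions": [0, 1]}, 0, 0,
     {"obs": [21, 7], "legal_actions": [0, 1]}, False),
    ({"obs": [21, 7], "legal_actions": [0, 1]}, 0, -1,
     {"obs": [22, 18], "legal_actions": [0, 1]}, True),
]

for transition in episode_0:
    print(describe(transition))
```

The outcome is attached only when `done` is true, since a reward of 0 on an intermediate step does not indicate a tie.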