


# <a href='https://github.com/datamllab/rlcard'> <center> <img src='https://miro.medium.com/max/1000/1*_9abDpNTM9Cbsd2HEXYm9Q.png' width=500 class='center' /> </center></a> 

## **Playing with Random Agents**
This example shows how to create an RLCard environment and interact with it. We use the Leduc Hold'em environment to walk through installing RLCard and making an environment.

First, we install RLCard and PyTorch.

In [None]:
!pip install rlcard[torch]



Then we import RLCard and `RandomAgent`, an agent that picks its moves at random, to interact with the environment.

In [None]:
import rlcard
from rlcard.agents import RandomAgent

Now we can create a Leduc Hold'em environment by simply passing `leduc-holdem` to the `rlcard.make` function.

In [None]:
env = rlcard.make("leduc-holdem")

Everything about Leduc Hold'em is wrapped in `env`. Now let's take a look at what we have in the environment.

In [None]:
print("Number of actions:", env.num_actions)
print("Number of players:", env.num_players)
print("Shape of state:", env.state_shape)
print("Shape of action:", env.action_shape)

Number of actions: 4
Number of players: 2
Shape of state: [[36], [36]]
Shape of action: [None, None]


We can see that the Leduc Hold'em environment is a 2-player game with 4 possible actions. The state (all the information a player can observe at a specific step) is a vector of length 36 for each player. The action shape is `None` for both players, which means actions carry no feature vector. Different environments have different characteristics, so you can try other environments as well. For example, you can try Dou Dizhu.

In [None]:
env_doudizhu = rlcard.make("doudizhu")
print("Number of actions:", env_doudizhu.num_actions)
print("Number of players:", env_doudizhu.num_players)
print("Shape of state:", env_doudizhu.state_shape)
print("Shape of action:", env_doudizhu.action_shape)

Number of actions: 27472
Number of players: 3
Shape of state: [[790], [901], [901]]
Shape of action: [[54], [54], [54]]


This environment is much more complex: it has 3 players, 27472 possible actions, and much larger state and action features. Despite these challenges, RLCard implements [DMC](https://github.com/kwai/DouZero), the algorithm from [DouZero](https://arxiv.org/abs/2106.06135), which leads to strong agents. We will show how to train a strong Dou Dizhu AI with RLCard in later tutorials.

Now let's interact with the Leduc Hold'em environment using random agents. We first register the agents with the environment through `set_agents`, passing one agent per player.

In [None]:
agent = RandomAgent(num_actions=env.num_actions)
env.set_agents([agent for _ in range(env.num_players)])

Now we are ready to play a complete game with `run`.

In [None]:
trajectories, player_wins = env.run(is_training=False)

Here, `is_training=False` means the environment calls the `eval_step` method of the random agent to interact. If `is_training` is `True`, it calls `step` instead. This interface is designed for reinforcement learning algorithms that behave differently in training and evaluation. `trajectories` holds the collected data. Let's print it out!
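To make the `step`/`eval_step` split concrete, here is a minimal sketch of the dispatch described above. `UniformAgent` and `choose_action` are hypothetical names used only for illustration; they are not RLCard classes or functions. The sketch also assumes that `eval_step` returns the action together with an info dictionary, and that a state exposes its legal action IDs as the keys of `state["legal_actions"]`, as in the printed output of this tutorial.

```python
import random
from collections import OrderedDict


class UniformAgent:
    """Hypothetical stand-in for a random agent (not RLCard's own class)."""

    def step(self, state):
        # Training-time behavior: sample one legal action ID uniformly.
        return random.choice(list(state["legal_actions"].keys()))

    def eval_step(self, state):
        # Evaluation-time behavior: same choice, plus extra information.
        # Returning (action, info) is an assumption of this sketch.
        legal = list(state["legal_actions"].keys())
        info = {"probs": {a: 1.0 / len(legal) for a in legal}}
        return self.step(state), info


def choose_action(agent, state, is_training):
    """Hypothetical helper: pick the agent method based on is_training."""
    if is_training:
        return agent.step(state)
    action, _info = agent.eval_step(state)
    return action


# A trimmed-down state containing only the field this sketch needs.
state = {"legal_actions": OrderedDict([(0, None), (1, None), (2, None)])}
agent = UniformAgent()
train_action = choose_action(agent, state, is_training=True)
eval_action = choose_action(agent, state, is_training=False)
print(train_action, eval_action)
```

For a random agent the two paths behave the same; the distinction matters for learning agents that, for example, explore during training but act greedily during evaluation. With that in mind, the next cell prints the `trajectories` returned by `run`.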

In [None]:
print(trajectories)

[[{'legal_actions': OrderedDict([(0, None), (1, None), (2, None)]), 'obs': array([0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0.]), 'raw_obs': {'hand': 'HK', 'public_card': None, 'all_chips': [2, 4], 'my_chips': 2, 'legal_actions': ['call', 'raise', 'fold'], 'current_player': 0}, 'raw_legal_actions': ['call', 'raise', 'fold'], 'action_record': [(1, 'raise'), (0, 'fold')]}, 2, {'legal_actions': OrderedDict([(1, None), (2, None), (3, None)]), 'obs': array([0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0.]), 'raw_obs': {'hand': 'HK', 'public_card': None, 'all_chips': [2, 4], 'my_chips': 2, 'legal_actions': ['raise', 'fold', 'check'], 'current_player': 1}, 'raw_legal_actions': ['raise', 'fold', 'check'], 'action_record': [(1, 'raise'), (0, 'fold')]}], [{'legal_actions': Order

The trajectories contain the observations and actions of the two players, one list per player. They can be used for training reinforcement learning agents. Let's print out `player_wins` as well.
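As a rough illustration of how such data could be prepared for training, the sketch below pairs up one player's trajectory into transitions. It assumes, based on the printed output above, that a trajectory alternates `state, action, state, ...` and ends with a final state. The helper `to_transitions`, the trimmed-down state dictionaries, and the `ACTION_NAMES` list (inferred by matching the `legal_actions` IDs with `raw_legal_actions` in the output) are illustrative assumptions, not part of RLCard. RLCard's own `rlcard.utils.reorganize` serves a similar purpose in the project's training examples.

```python
# Action names inferred from the printed output (an assumption of this sketch).
ACTION_NAMES = ["call", "raise", "fold", "check"]


def to_transitions(trajectory):
    """Turn [state, action, state, ...] into (state, action, next_state, done)."""
    transitions = []
    last_index = len(trajectory) - 1
    # Even indices hold states, odd indices hold actions.
    for i in range(0, len(trajectory) - 2, 2):
        state, action, next_state = trajectory[i:i + 3]
        done = (i + 2 == last_index)  # True only for the final transition
        transitions.append((state, action, next_state, done))
    return transitions


# Hypothetical, heavily trimmed copy of player 0's trajectory shown above.
player0_trajectory = [
    {"raw_legal_actions": ["call", "raise", "fold"]},
    2,
    {"raw_legal_actions": ["raise", "fold", "check"]},
]

transitions = to_transitions(player0_trajectory)
for state, action, next_state, done in transitions:
    print(ACTION_NAMES[action], "| done:", done)
```

In the printed game, player 0 takes a single action (ID 2, which matches the `fold` entry in `action_record`), so the trajectory yields exactly one transition. The second return value of `run`, `player_wins`, is printed next.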

In [None]:
print(player_wins)

[-1.  1.]


This means the second player wins and gets 1 point, while the first player loses 1 point. These are all the interfaces we need to know to train most reinforcement learning agents! We will show you how to do this in later tutorials.
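A single game between random agents says little about an agent's strength, so payoffs are usually averaged over many games. The sketch below shows one way to do that. `average_payoffs` and `stub_game` are hypothetical names introduced here; `stub_game` is a made-up zero-sum stand-in so the example runs without an environment.

```python
import random


def average_payoffs(play_game, num_games):
    """Average per-player payoffs over num_games calls to play_game()."""
    if num_games < 1:
        raise ValueError("num_games must be at least 1")
    totals = None
    for _ in range(num_games):
        payoffs = list(play_game())
        if totals is None:
            totals = [0.0] * len(payoffs)
        for player, payoff in enumerate(payoffs):
            totals[player] += payoff
    return [total / num_games for total in totals]


def stub_game():
    # Hypothetical stand-in for one game: a coin flip decides the winner.
    return [1.0, -1.0] if random.random() < 0.5 else [-1.0, 1.0]


avg = average_payoffs(stub_game, 200)
print(avg)
```

In this notebook, the stub could be replaced with `lambda: env.run(is_training=False)[1]`, since the second value returned by `run` holds the payoffs.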