# Overcooked Tutorial
This notebook demonstrates a few common use cases of the Overcooked-AI library, including loading and evaluating agents and visualizing trajectories.


In [2]:
%reload_ext autoreload
%autoreload 2

from overcooked_ai_py.agents.agent import AgentPair, RandomAgent
from overcooked_ai_py.agents.benchmarking import AgentEvaluator
from overcooked_ai_py.visualization.state_visualizer import StateVisualizer  
# import importlib
# importlib.reload(overcooked_ai_py.mdp.overcooked_mdp.py)


# Here we create an evaluator for the cramped_room_cutting_board layout
layout = "cramped_room_cutting_board"
ae = AgentEvaluator.from_layout_name(mdp_params={"layout_name": layout, "old_dynamics": True}, 
                                     env_params={"horizon": 400})

ap = AgentPair(RandomAgent(), RandomAgent())

trajs = ae.evaluate_agent_pair(ap, 10)

trajs2 = ae.evaluate_human_model_pair(1)


StateVisualizer().display_rendered_trajectory(trajs2, ipython_display=True)

filepath /Users/varshinichinta/Desktop/IRL/overcooked_ai/src/overcooked_ai_py/data/layouts/cramped_room_cutting_board.layout
layout grid: [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', '2', 'O'], ['X', '1', ' ', ' ', 'X'], ['X', 'D', 'C', 'S', 'X']]
all elements ['X', 'X', 'P', 'X', 'X', 'O', ' ', ' ', '2', 'O', 'X', '1', ' ', ' ', 'X', 'X', 'D', 'C', 'S', 'X']


  0%|          | 0/10 [00:00<?, ?it/s]

Computing MotionPlanner to be saved in /Users/varshinichinta/Desktop/IRL/overcooked_ai/src/overcooked_ai_py/data/planners/cramped_room_cutting_board_mp.pkl
It took 0.11051225662231445 seconds to create mp


Avg rew: 0.00 (std: 0.00, se: 0.00); avg len: 400.00; : 100%|██████████| 10/10 [00:01<00:00,  5.14it/s]
Avg rew: 200.00 (std: 0.00, se: 0.00); avg len: 400.00; : 100%|██████████| 1/1 [00:00<00:00,  2.89it/s]


state:  Players: ((1, 2) facing (0, -1) holding None, (3, 1) facing (0, -1) holding None), Objects: [], Bonus orders: [] All orders: [('onion', 'onion', 'onion')] Timestep: 0

interactive(children=(IntSlider(value=0, description='timestep', max=399), Output()), _dom_classes=('widget-in…
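The layout grid and the initial state printed above can be related to each other. The following is a minimal sketch, not library code: `find_symbol` is a hypothetical helper, and it assumes positions are `(x, y)` pairs with `x` as the column index and `y` as the row index, which is consistent with the player positions `(1, 2)` and `(3, 1)` shown in the state at timestep 0.

```python
# Layout grid copied from the cell output above (rows listed top to bottom).
grid = [
    ["X", "X", "P", "X", "X"],
    ["O", " ", " ", "2", "O"],
    ["X", "1", " ", " ", "X"],
    ["X", "D", "C", "S", "X"],
]


def find_symbol(grid, symbol):
    """Return every (x, y) position of `symbol`, with x = column and y = row."""
    return [
        (x, y)
        for y, row in enumerate(grid)
        for x, cell in enumerate(row)
        if cell == symbol
    ]


# "1" and "2" mark the starting cells of the two players.
player_starts = (find_symbol(grid, "1")[0], find_symbol(grid, "2")[0])
print(player_starts)  # ((1, 2), (3, 1))
```

The same helper can locate any other symbol in the grid, for example `find_symbol(grid, "O")` for the two `O` cells.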

# Deprecated: sections that require BC and RL training (see README for details)

# Getting started: Training your agent

You can train BC agents using files under the `human_aware_rl/imitation` directory. 

In [None]:

# %pip install ray dm-tree  # the `tree` module is provided by the dm-tree package
# %pip install -q human_aware_rl
layout = "cramped_room"  # any compatible layout
from human_aware_rl.imitation.behavior_cloning_tf2 import get_bc_params, train_bc_model
from human_aware_rl.static import CLEAN_2019_HUMAN_DATA_TRAIN

params_to_override = {
    # The layouts to train on
    "layouts": [layout], 
    # The layout the agents will be evaluated on.
    # Usually the same as the training layout, but after refactoring some old
    # layouts have more than one name, so adjust this accordingly.
    "layout_name": layout, 
    "data_path": CLEAN_2019_HUMAN_DATA_TRAIN,
    "epochs": 10,
    "old_dynamics": True,
}

bc_params = get_bc_params(**params_to_override)
train_bc_model("tutorial_notebook_results/BC", bc_params, verbose = True)

ModuleNotFoundError: No module named 'tree'


# 1): Loading trained agents
This section shows how to load pretrained agents.

## 1.1) Loading BC agent
The BC (behavior cloning) agents are trained separately, without using Ray. The previous section showed how to train a BC agent; to load a trained agent, use the `load_bc_model` function.

In [9]:
from human_aware_rl.imitation.behavior_cloning_tf2 import load_bc_model
#this is the same path you used when training the BC agent
bc_model_path = "tutorial_notebook_results/BC"
bc_model, bc_params = load_bc_model(bc_model_path)
bc_model, bc_params

(<keras.engine.functional.Functional at 0x7f73ac2c2110>,
 {'eager': True,
  'use_lstm': False,
  'cell_size': 256,
  'data_params': {'layouts': ['cramped_room'],
   'check_trajectories': False,
   'featurize_states': True,
   'data_path': '/nas/ucb/micah/overcooked_ai/src/human_aware_rl/static/human_data/cleaned/2019_hh_trials_train.pickle'},
  'mdp_params': {'layout_name': 'cramped_room', 'old_dynamics': True},
  'env_params': {'horizon': 400,
   'mlam_params': {'start_orientations': False,
    'wait_allowed': False,
    'counter_goals': [],
    'counter_drop': [],
    'counter_pickup': [],
    'same_motion_goals': True}},
  'mdp_fn_params': {},
  'mlp_params': {'num_layers': 2, 'net_arch': [64, 64]},
  'training_params': {'epochs': 10,
   'validation_split': 0.15,
   'batch_size': 64,
   'learning_rate': 0.001,
   'use_class_weights': False},
  'evaluation_params': {'ep_length': 400, 'num_games': 1, 'display': False},
  'action_shape': (6,),
  'observation_shape': (96,)})
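Note that the flat keys passed to `get_bc_params` during training (`layouts`, `epochs`, `old_dynamics`, ...) show up nested under `data_params`, `training_params`, and `mdp_params` in the loaded `bc_params`. The sketch below illustrates one way such a flat-key override of nested defaults could work. It is not the library's implementation: `override_nested`, `build_params`, and the default values in `DEFAULTS` are hypothetical and only mirror the shape of the output above.

```python
import copy

# Hypothetical defaults, shaped like the nested bc_params shown above.
DEFAULTS = {
    "data_params": {"layouts": ["cramped_room"], "data_path": None},
    "mdp_params": {"layout_name": "cramped_room", "old_dynamics": False},
    "training_params": {"epochs": 100, "batch_size": 64},
}


def override_nested(params, key, value):
    """Set `key` to `value` wherever it occurs in a nested dict.

    Returns True if at least one occurrence was found.
    """
    found = False
    for k, v in params.items():
        if k == key:
            params[k] = value
            found = True
        elif isinstance(v, dict):
            found = override_nested(v, key, value) or found
    return found


def build_params(**overrides):
    """Copy the defaults, then apply each flat override at any depth."""
    params = copy.deepcopy(DEFAULTS)
    for key, value in overrides.items():
        if not override_nested(params, key, value):
            raise KeyError(f"unknown parameter: {key}")
    return params


params = build_params(epochs=10, old_dynamics=True)
print(params["training_params"]["epochs"])   # 10
print(params["mdp_params"]["old_dynamics"])  # True
```

Raising on unknown keys is a design choice of this sketch; it catches misspelled parameter names early instead of silently ignoring them.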

Now that we have loaded the model, we need to wrap it so that it is compatible with the other agents, since it was trained with TensorFlow. We do this by converting it to an RLlib-compatible policy class and wrapping that policy as an RlLibAgent.

In [10]:
from human_aware_rl.imitation.behavior_cloning_tf2 import _get_base_ae, BehaviorCloningPolicy
bc_policy = BehaviorCloningPolicy.from_model(bc_model, bc_params, stochastic=True)
# We need the featurization function that is specifically defined for BC agent
# The easiest way to do it is to create a base environment from the configuration and extract the featurization function
# The environment is also needed to do evaluation

base_ae = _get_base_ae(bc_params)
base_env = base_ae.env

from human_aware_rl.rllib.rllib import RlLibAgent
bc_agent0 = RlLibAgent(bc_policy, 0, base_env.featurize_state_mdp)
bc_agent0

bc_agent1 = RlLibAgent(bc_policy, 1, base_env.featurize_state_mdp)
bc_agent1

<human_aware_rl.rllib.rllib.RlLibAgent at 0x7f73ac5b4040>

Now we have a BC agent that is ready for evaluation.
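To make the wrapping step more concrete, here is a toy adapter with the same constructor shape as the call `RlLibAgent(bc_policy, 0, base_env.featurize_state_mdp)` above. It is a sketch under assumptions, not the library's code: `WrappedPolicyAgent`, `toy_featurize`, and `always_interact` are hypothetical, the featurization function is assumed to return one observation per player, and the ordering of the six actions in `ACTIONS` is assumed for illustration.

```python
import random

# Assumed ordering of the six actions (hypothetical for this sketch).
ACTIONS = [(0, -1), (0, 1), (1, 0), (-1, 0), (0, 0), "interact"]


class WrappedPolicyAgent:
    """Toy adapter: featurize the state, query the policy, sample an action."""

    def __init__(self, policy_fn, agent_index, featurize_fn, seed=None):
        self.policy_fn = policy_fn
        self.agent_index = agent_index
        self.featurize_fn = featurize_fn
        self.rng = random.Random(seed)

    def action(self, state):
        # Pick this agent's own observation out of the per-player features.
        obs = self.featurize_fn(state)[self.agent_index]
        probs = self.policy_fn(obs)
        # Stochastic policy: sample an action index from the probabilities.
        idx = self.rng.choices(range(len(ACTIONS)), weights=probs)[0]
        return ACTIONS[idx], {"action_probs": probs}


def toy_featurize(state):
    """Hypothetical featurizer returning one observation per player."""
    return (state["p0"], state["p1"])


def always_interact(obs):
    """Hypothetical policy that puts all probability on 'interact'."""
    return [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]


agent1 = WrappedPolicyAgent(always_interact, 1, toy_featurize, seed=0)
chosen, info = agent1.action({"p0": [0.0], "p1": [1.0]})
print(chosen)  # interact
```

The agent index matters because both wrapped agents share one policy but must read different observations, which is why the notebook builds `bc_agent0` and `bc_agent1` separately.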

## 1.2) Loading & Creating Agent Pair

To run an evaluation, we need a pair of agents, or an AgentPair. We can either load a pair directly with the load_agent_pair function, or create an AgentPair manually from two separate RlLibAgent instances.

To create an AgentPair manually, we can pair together any two RlLibAgent objects. Here we use the two BC agents created above, **bc_agent0** and **bc_agent1**, and construct an AgentPair with them as arguments.

In [5]:
from human_aware_rl.rllib.rllib import AgentPair
ap_bc = AgentPair(bc_agent0, bc_agent1)
ap_bc

<overcooked_ai_py.agents.agent.AgentPair at 0x7f743e8c9330>

# 2): Evaluating AgentPair

To evaluate an AgentPair, we first need to create an AgentEvaluator. You can create an AgentEvaluator in various ways, but the simplest is from the layout name.

You can modify the settings of the layout by changing the **mdp_params** argument, but most of the time you only need to include "layout_name", the layout you want to evaluate the agent pair on, and "old_dynamics", which determines whether the environment conforms to the design in the NeurIPS 2019 paper or whether cooking starts automatically once all ingredients are present.

With **env_params**, you can change how many steps there are in one evaluation. The default horizon is 400, which means the game runs for 400 timesteps.

In [6]:
from overcooked_ai_py.agents.benchmarking import AgentEvaluator
# Here we create an evaluator for the cramped_room layout
layout = "cramped_room"
ae = AgentEvaluator.from_layout_name(mdp_params={"layout_name": layout, "old_dynamics": True}, 
                                     env_params={"horizon": 400})
ae

<overcooked_ai_py.agents.benchmarking.AgentEvaluator at 0x7f743e62efe0>
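When evaluating under several settings, it can help to build the two parameter dictionaries in one place. The helper below is a hypothetical convenience function, not part of the library; it only assembles the `mdp_params` and `env_params` keyword arguments in the form used in the cell above.

```python
def evaluator_kwargs(layout_name, horizon=400, old_dynamics=True):
    """Build keyword arguments shaped like the from_layout_name call above."""
    return {
        "mdp_params": {"layout_name": layout_name, "old_dynamics": old_dynamics},
        "env_params": {"horizon": horizon},
    }


# A shorter, 100-timestep evaluation on the same layout.
short = evaluator_kwargs("cramped_room", horizon=100)
print(short["env_params"])  # {'horizon': 100}

# With overcooked_ai installed, the dictionaries could be unpacked directly:
# ae_short = AgentEvaluator.from_layout_name(**short)
```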

To run evaluations, we can use the evaluate_agent_pair method associated with the AgentEvaluator:

In [7]:
# ap_bc: The AgentPair we created earlier
# 10: how many times we should run the evaluation since the policy is stochastic
trajs = ae.evaluate_agent_pair(ap_bc, 10)
trajs

Avg rew: 58.00 (std: 24.41, se: 7.72); avg len: 400.00; : 100%|██████████| 10/10 [06:57<00:00, 41.80s/it]


{'ep_actions': array([[((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 1), (0, 0)), ...,
         ((1, 0), (0, 0)), ((0, -1), (0, 0)), ((0, 0), (0, 0))],
        [((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0)), ...,
         ((0, -1), (0, 0)), ('interact', (0, 0)), ((1, 0), (0, 0))],
        [((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, -1), (0, 0)), ...,
         ((-1, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0))],
        ...,
        [((-1, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0)), ...,
         ((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0))],
        [((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0)), ...,
         ((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0))],
        [((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0)), ...,
         ((0, 0), (0, 0)), ((0, 0), (0, 0)), ((0, 0), (0, 0))]],
       dtype=object),
 'metadatas': {},
 'ep_infos': array([[{'agent_infos': [{'action_probs': array([[0.05297125, 0.00369196, 0.00564479, 0.01859784, 0.9077534 ,
        

The result returned by the AgentEvaluator contains detailed information about the evaluation runs, including the actions taken by each agent at each timestep. Usually you don't need to interact with most of it directly; the most direct performance measure is `trajs["ep_returns"]`, which holds the total sparse reward of each evaluation run.

In [15]:
trajs["ep_returns"]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
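The per-run returns can be condensed into the kind of summary printed by the progress bar (`Avg rew`, `std`, `se`). The sketch below assumes the standard error is the standard deviation divided by the square root of the number of games, which is consistent with the earlier output (24.41 / √10 ≈ 7.72). Whether the library uses the population or the sample standard deviation is an assumption here; the sketch uses the population form. `summarize_returns` is a hypothetical helper and the input values are made up for illustration.

```python
import math
import statistics


def summarize_returns(ep_returns):
    """Return (mean, population std, standard error) of per-episode returns."""
    n = len(ep_returns)
    mean = statistics.fmean(ep_returns)
    std = statistics.pstdev(ep_returns)  # population standard deviation
    se = std / math.sqrt(n)
    return mean, std, se


returns = [0, 20, 40, 20]  # hypothetical per-episode returns
mean, std, se = summarize_returns(returns)
print(f"Avg rew: {mean:.2f} (std: {std:.2f}, se: {se:.2f})")
# Avg rew: 20.00 (std: 14.14, se: 7.07)
```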

In [None]:
# Evaluate the BC pair for 1 game with a game length of 400 timesteps
result = ae.evaluate_agent_pair(ap_bc, 1, 400)

# 3): Visualization

We can also visualize the trajectories of agents. One way is to run the web demo with the agents you choose; specific instructions can be found in the [overcooked_demo](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master/src/overcooked_demo) module, which requires some setup. A simpler way is to use the StateVisualizer, which uses the information returned by the AgentEvaluator to create a simple dynamic visualization. You can also check out [this Colab Notebook](https://colab.research.google.com/drive/1AAVP2P-QQhbx6WTOnIG54NXLXFbO7y6n#scrollTo=6Xlu54MkiXCR), which lets you play with fixed agents.

In [None]:
from overcooked_ai_py.visualization.state_visualizer import StateVisualizer
StateVisualizer().display_rendered_trajectory(trajs, ipython_display=True)

This should spawn a window where you can see what the agents are doing at each timestep. You can drag the slider to go forward and backward in time.