## Introduction to overcooked_ai

Overcooked-AI is a benchmark environment for fully cooperative multi-agent performance, based on the wildly popular video game [Overcooked](http://www.ghosttowngames.com/overcooked/). 

The goal of the game is to deliver soups as fast as possible. Each soup requires placing up to 3 ingredients in a pot, waiting for the soup to cook, and then having an agent pick up the soup and deliver it. The agents should split up tasks on the fly and coordinate effectively in order to achieve high reward.

You can **try out the game [here](https://humancompatibleai.github.io/overcooked-demo/)** (playing with some previously trained DRL agents). To play with your own trained agents using this interface, you can use [this repo](https://github.com/HumanCompatibleAI/overcooked-demo). To run human-AI experiments, check out [this repo](https://github.com/HumanCompatibleAI/overcooked-hAI-exp). You can find some human-human gameplay data already collected [here](https://github.com/HumanCompatibleAI/human_aware_rl/tree/master/human_aware_rl/data/human/anonymized).

The agent evaluator (`AgentEvaluator`) is the object used to evaluate different agents; it is introduced below.

Check out [this repo](https://github.com/HumanCompatibleAI/human_aware_rl) for the DRL implementations compatible with the environment and reproducible results to our paper: *[On the Utility of Learning about Humans for Human-AI Coordination](https://arxiv.org/abs/1910.05789)* (also see our [blog post](https://bair.berkeley.edu/blog/2019/10/21/coordination/)).

## Setup
Run the cell below only if you have not installed overcooked_ai yet (e.g. when using this notebook in Google Colab). It installs the newest version of overcooked_ai from the GitHub repository.

```
!pip install --progress-bar off git+https://github.com/HumanCompatibleAI/overcooked_ai.git
```

Collecting git+https://github.com/HumanCompatibleAI/overcooked_ai.git
  Cloning https://github.com/HumanCompatibleAI/overcooked_ai.git to /tmp/pip-req-build-eb_ycj9x
  Running command git clone -q https://github.com/HumanCompatibleAI/overcooked_ai.git /tmp/pip-req-build-eb_ycj9x
Building wheels for collected packages: overcooked-ai
  Building wheel for overcooked-ai (setup.py) ... done
  Created wheel for overcooked-ai: filename=overcooked_ai-1.0.4-cp36-none-any.whl size=2394778 sha256=bb8ff8ce0a6e0ac83780f4e281b4a6f108dd126ad3a158ca30a74d3a7f261711
  Stored in directory: /tmp/pip-ephem-wheel-cache-mfoer51s/wheels/2a/7f/1d/7cb1dc49cbbf5b9a8e462507c7c30499ba82a4436e5aa12ced
Successfully built overcooked-ai
Installing collected packages: overcooked-ai
Successfully installed overcooked-ai-1.0.4


```python
# All imports used in this tutorial; run this cell if you want to jump between sections and run only selected cells
import numpy as np
from overcooked_ai_py.mdp.actions import Action, Direction
from overcooked_ai_py.agents.agent import Agent, AgentPair, StayAgent, RandomAgent
from overcooked_ai_py.agents.benchmarking import AgentEvaluator, LayoutGenerator
```

## Agent evaluator introduction
The easiest way to start using overcooked_ai is the agent evaluator object, which lets you run agents on the chosen layouts.

```python
from overcooked_ai_py.agents.benchmarking import AgentEvaluator, LayoutGenerator
mdp_gen_params = {"layout_name": 'cramped_room'}
mdp_fn = LayoutGenerator.mdp_gen_fn_from_dict(mdp_gen_params)
env_params = {"horizon": 1000}
agent_eval = AgentEvaluator(env_params=env_params, mdp_fn=mdp_fn)
```

To create an agent evaluator you need to supply 2 parameters: `mdp_fn` and `env_params`.  
`mdp_fn` is a function that returns an `OvercookedGridworld` object, which resolves the interactions of agents with the environment. The quickest way to create a valid `mdp_fn` is to pass a dict with the layout name to `LayoutGenerator.mdp_gen_fn_from_dict`. More on generating layouts later.  
`env_params` is a dict with additional options. The most important key here is `horizon`, which sets how many timesteps each episode lasts.

The central method of the `AgentEvaluator` object is `evaluate_agent_pair`, which runs 2 agents on the chosen layout.
Other methods, such as `evaluate_random_pair`, call `evaluate_agent_pair` with preconstructed agents. Let's run random agents 5 times and see the results.

```python
# Both agents take random actions
trajectory_random_pair = agent_eval.evaluate_random_pair(num_games=5)
print("Random pair rewards", trajectory_random_pair["ep_returns"])
```


Recomputing motion planner due to: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/overcooked_ai_py/data/planners/cramped_room_mp.pkl'
Computing MotionPlanner to be saved in /usr/local/lib/python3.6/dist-packages/overcooked_ai_py/data/planners/cramped_room_mp.pkl
It took 0.04822421073913574 seconds to create mp


Avg rew: 0.00 (std: 0.00, se: 0.00); avg len: 1000.00; : 100%|██████████| 5/5 [00:01<00:00,  3.07it/s]

Skipping trajectory consistency checking because MDP was recognized as variable. Trajectory consistency checking is not yet supported for variable MDPs.
Random pair rewards [0 0 0 0 0]
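The progress-bar line reports an average reward together with `std` and `se`. The figures printed in this notebook (e.g. `Avg rew: 0.40 (std: 1.20, se: 0.38)` for returns `[0 0 0 0 4 0 0 0 0 0]`, and `std: 10.00, se: 5.00` for `[20 20 0 0]`) are consistent with the population standard deviation and `se = std / sqrt(num_games)`. The sketch below recomputes them from `ep_returns`; `summarize_returns` is a hypothetical helper, and the formula is inferred from the printed numbers rather than taken from the library source.

```python
import math
from statistics import mean, pstdev


def summarize_returns(ep_returns):
    """Hypothetical helper: mean, std and standard error of episode returns.

    Assumption: std is the population standard deviation and
    se = std / sqrt(number of games), which reproduces the progress-bar numbers.
    """
    n = len(ep_returns)
    avg = mean(ep_returns)
    std = pstdev(ep_returns)
    se = std / math.sqrt(n)
    return avg, std, se


avg, std, se = summarize_returns([0, 0, 0, 0, 4, 0, 0, 0, 0, 0])
print(f"Avg rew: {avg:.2f} (std: {std:.2f}, se: {se:.2f})")
# prints: Avg rew: 0.40 (std: 1.20, se: 0.38)
```

With the real evaluator you would pass `trajectory_random_pair["ep_returns"]` (converted with `list(...)`) instead of the hard-coded list.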





## Custom layouts
Besides the premade layouts found in the [layout directory](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master/src/overcooked_ai_py/data/layouts), you can create your own layouts to run agents on. Let's first look at an example layout:
```
{
    "grid":  """XXXPPXXX
                X  2   X
                D XXXX S
                X  1   X
                XXXOOXXX""",
    "start_order_list": None,
    "cook_time": 20,
    "num_items_for_soup": 3,
    "delivery_reward": 20,
    "rew_shaping_params": None
}
```

The layout's terrain is defined by `grid`, in which every character is one tile. The available tiles are:
- empty space - ' '
- counter - 'X'
- onion dispenser - 'O'
- tomato dispenser - 'T'
- pot (place where players cook soup from onions and tomatoes) - 'P' 
- dish dispenser - 'D'
- serving location - 'S'
- player starting location - player number ('1' or '2')  
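
To make the tile legend concrete, here is a minimal sketch that parses a grid string like the one in the example layout, checks that every character is a known tile, and records where each player starts. `parse_grid` and `TILE_NAMES` are hypothetical names, not overcooked_ai functions. The sketch assumes that a player's start tile is ordinary floor and that rows begin and end with a non-space tile (as in the example), because it strips each row's indentation.

```python
# Tile characters listed in the tutorial text; digits mark player start tiles.
TILE_NAMES = {
    " ": "empty space",
    "X": "counter",
    "O": "onion dispenser",
    "T": "tomato dispenser",
    "P": "pot",
    "D": "dish dispenser",
    "S": "serving location",
}


def parse_grid(grid):
    """Hypothetical helper (not part of overcooked_ai): parse a layout grid string.

    Returns (terrain, starts): terrain is a list of equal-width rows in which
    player digits are replaced by empty space, and starts maps each player
    number to its (x, y) start position.
    """
    # Strip the indentation that the multi-line string picks up in the file.
    rows = [row.strip() for row in grid.split("\n")]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all grid rows must have the same width")
    terrain, starts = [], {}
    for y, row in enumerate(rows):
        cleaned = []
        for x, ch in enumerate(row):
            if ch.isdigit():
                starts[int(ch)] = (x, y)
                ch = " "  # assumption: a start tile is ordinary floor
            elif ch not in TILE_NAMES:
                raise ValueError(f"unknown tile {ch!r} at {(x, y)}")
            cleaned.append(ch)
        terrain.append("".join(cleaned))
    return terrain, starts


example_grid = """XXXPPXXX
                X  2   X
                D XXXX S
                X  1   X
                XXXOOXXX"""
terrain, starts = parse_grid(example_grid)
print(starts)  # {2: (3, 1), 1: (3, 3)}
print(sum(row.count("P") for row in terrain), "pots")  # 2 pots
```

Positions use (x, y) with x as the column and y as the row, counted from the top-left corner.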
  
You can save a layout in the overcooked_ai/overcooked_ai_py/data/layouts directory and then load it by passing `{"layout_name": layout_name}` to `LayoutGenerator.mdp_gen_fn_from_dict` and building an `AgentEvaluator` from the result, as in the agent evaluator introduction, where `layout_name` is the file name without the `.layout` extension.  
You can also generate random but valid grids automatically. Let's create one and run agents on it.

```python
mdp_gen_params = {
    "inner_shape": (7, 7),
    "prop_empty": 0.2,  # proportion of empty space in generated layout
    "prop_feats": 0.8,  # proportion of counters with features on them
    "display": False,
    # list of recipes that can be delivered
    "start_all_orders": [
        {"ingredients": ["onion", "onion", "onion"]},
        {"ingredients": ["onion", "onion"]},
        {"ingredients": ["onion"]},
    ],
    # (optional param) reward for delivering recipes (for every recipe in start_all_orders)
    "recipe_values": [20, 9, 4],
    # (optional param) cooking time of recipes (for every recipe in start_all_orders)
    "recipe_times": [20, 15, 10],
}

env_params = {"horizon": 500}

mdp_fn = LayoutGenerator.mdp_gen_fn_from_dict(mdp_gen_params, outer_shape=(7, 7))
agent_eval = AgentEvaluator(env_params=env_params, mdp_fn=mdp_fn)

trajectory_random_pair = agent_eval.evaluate_random_pair(num_games=10)
print("Random pair rewards", trajectory_random_pair["ep_returns"])

def pretty_grid(grid):
    # Join each row of tiles into a string, one row per line
    return "\n".join("".join(line) for line in grid)

print("\nGenerated grid:\n" + pretty_grid(trajectory_random_pair["mdp_params"][0]["terrain"]))
```


Recomputing motion planner due to: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/overcooked_ai_py/data/planners/XXPPPSX|O   1 X|X     P|P   D2O|P    DX|D XO  S|XSXXOPX_mp.pkl'
Computing MotionPlanner to be saved in /usr/local/lib/python3.6/dist-packages/overcooked_ai_py/data/planners/XXPPPSX|O   1 X|X     P|P   D2O|P    DX|D XO  S|XSXXOPX_mp.pkl



It took 1.2998719215393066 seconds to create mp


Avg rew: 0.40 (std: 1.20, se: 0.38); avg len: 500.00; : 100%|██████████| 10/10 [00:03<00:00,  3.10it/s]

Skipping trajectory consistency checking because MDP was recognized as variable. Trajectory consistency checking is not yet supported for variable MDPs.
Random pair rewards [0 0 0 0 4 0 0 0 0 0]

Generated grid:
XXPPPSX
O     X
X     P
P   D O
P    DX
D XO  S
XSXXOPX
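The comments in the generation cell say that `recipe_values` and `recipe_times` are optional and hold one entry for every recipe in `start_all_orders`, and the introduction says a pot takes up to 3 ingredients. A small consistency check on such a dict can catch misaligned lists before generating layouts. `check_recipe_params` below is a hypothetical helper, not an overcooked_ai function, and it only inspects the dict; it does not reproduce how the library itself validates recipes.

```python
def check_recipe_params(params, max_ingredients=3):
    """Hypothetical helper (not part of overcooked_ai).

    Checks that the optional per-recipe lists have one entry per recipe in
    start_all_orders and that no recipe needs more ingredients than a pot holds,
    then returns {sorted ingredient tuple: {key: value}}.
    """
    recipes = params["start_all_orders"]
    per_recipe_keys = [k for k in ("recipe_values", "recipe_times") if k in params]
    for key in per_recipe_keys:
        if len(params[key]) != len(recipes):
            raise ValueError(f"{key} has {len(params[key])} entries for {len(recipes)} recipes")
    table = {}
    for i, recipe in enumerate(recipes):
        ingredients = tuple(sorted(recipe["ingredients"]))
        if len(ingredients) > max_ingredients:
            raise ValueError(f"recipe {ingredients} needs more than {max_ingredients} ingredients")
        # Pair the i-th recipe with the i-th value and cooking time.
        table[ingredients] = {key: params[key][i] for key in per_recipe_keys}
    return table


recipe_params = {
    "start_all_orders": [
        {"ingredients": ["onion", "onion", "onion"]},
        {"ingredients": ["onion", "onion"]},
        {"ingredients": ["onion"]},
    ],
    "recipe_values": [20, 9, 4],
    "recipe_times": [20, 15, 10],
}
table = check_recipe_params(recipe_params)
print(table[("onion", "onion")])  # {'recipe_values': 9, 'recipe_times': 15}
```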





## Custom agents
We can also run our own custom agents to see how they would work. Let's re-create an agent that takes random actions on our own.

```python
import numpy as np
from overcooked_ai_py.mdp.actions import Action, Direction
from overcooked_ai_py.agents.agent import Agent, AgentPair

class CustomRandomAgent(Agent):
    """
    An agent that picks uniformly at random among all actions
    (Action.ALL_ACTIONS), including interact actions.
    """
    def action(self, state):
        # Start with zero probability for every action
        action_probs = np.zeros(Action.NUM_ACTIONS)
        # Treat every action as legal and give each the same probability
        legal_actions = Action.ALL_ACTIONS
        legal_actions_indices = np.array([Action.ACTION_TO_INDEX[a] for a in legal_actions])
        action_probs[legal_actions_indices] = 1 / len(legal_actions_indices)
        # Return the sampled action and an info dict with the distribution used
        return Action.sample(action_probs), {"action_probs": action_probs}

    def actions(self, states, agent_indices):
        # Batched version: one action per state
        return [self.action(state) for state in states]


agent_pair = AgentPair(CustomRandomAgent(), CustomRandomAgent())
mdp_gen_params = {"layout_name": 'cramped_room'}
mdp_fn = LayoutGenerator.mdp_gen_fn_from_dict(mdp_gen_params)
env_params = {"horizon": 1000}
agent_eval = AgentEvaluator(env_params=env_params, mdp_fn=mdp_fn)
trajectory_custom_random_pair = agent_eval.evaluate_agent_pair(agent_pair, num_games=4)
print("Custom random pair rewards", trajectory_custom_random_pair["ep_returns"])
```

Avg rew: 10.00 (std: 10.00, se: 5.00); avg len: 1000.00; : 100%|██████████| 4/4 [00:01<00:00,  2.87it/s]

Skipping trajectory consistency checking because MDP was recognized as variable. Trajectory consistency checking is not yet supported for variable MDPs.
Custom random pair rewards [20 20  0  0]





`CustomRandomAgent` is a lightweight version of `RandomAgent` from the `overcooked_ai_py.agents.agent` module. Evaluating a pair of them with `agent_eval.evaluate_agent_pair(agent_pair)` has the same effect as calling `agent_eval.evaluate_random_pair()`.
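The `action_probs` vector built by `CustomRandomAgent` is a uniform distribution over the legal actions, with zeros everywhere else. The stdlib-only sketch below shows that construction with a stand-in action list; the action encoding and ordering here are assumptions for illustration and may differ from overcooked_ai's `Action.ALL_ACTIONS`. Restricting the legal set to motion actions gives interact a probability of 0, which is how an agent that "does not perform interact actions" would behave.

```python
import random

# Stand-in action set for illustration only; these are not overcooked_ai objects.
MOTION_ACTIONS = [(0, -1), (0, 1), (1, 0), (-1, 0), (0, 0)]  # north, south, east, west, stay
ALL_ACTIONS = MOTION_ACTIONS + ["interact"]


def uniform_probs(legal_actions, all_actions=ALL_ACTIONS):
    """Put equal probability on each legal action and zero on the rest."""
    probs = [0.0] * len(all_actions)
    for action in legal_actions:
        probs[all_actions.index(action)] = 1 / len(legal_actions)
    return probs


def sample(probs, all_actions=ALL_ACTIONS, rng=random):
    """Draw one action with the given probabilities."""
    return rng.choices(all_actions, weights=probs, k=1)[0]


print(uniform_probs(ALL_ACTIONS))  # six entries of 1/6
motion_only = uniform_probs(MOTION_ACTIONS)
print(motion_only[-1])  # 0.0: interact is never chosen
rng = random.Random(0)
print(sample(motion_only, rng=rng))  # one of the five motion actions
```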

## Single player variant
If you want to make a single-player variant, set one of the agents to stay in place and do nothing; `StayAgent` is such an agent. It is best to choose a layout where the idle player does not block a crucial path to a resource, e.g. the path to the only onion dispenser on the layout.

```python
from overcooked_ai_py.agents.agent import AgentPair, StayAgent, RandomAgent
from overcooked_ai_py.agents.benchmarking import AgentEvaluator, LayoutGenerator
mdp_gen_params = {"layout_name": 'five_by_five'}
mdp_fn = LayoutGenerator.mdp_gen_fn_from_dict(mdp_gen_params)
env_params = {"horizon": 500}
agent_eval = AgentEvaluator(env_params=env_params, mdp_fn=mdp_fn)
# The first agent acts randomly; the second never moves
single_agent_pair = AgentPair(RandomAgent(all_actions=True), StayAgent())

trajectory_single_agent = agent_eval.evaluate_agent_pair(single_agent_pair, num_games=10)
print("single agent rewards", trajectory_single_agent["ep_returns"])
```



Recomputing motion planner due to: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/overcooked_ai_py/data/planners/five_by_five_mp.pkl'
Computing MotionPlanner to be saved in /usr/local/lib/python3.6/dist-packages/overcooked_ai_py/data/planners/five_by_five_mp.pkl
It took 0.08415436744689941 seconds to create mp


Avg rew: 0.00 (std: 0.00, se: 0.00); avg len: 500.00; : 100%|██████████| 10/10 [00:01<00:00,  7.59it/s]

Skipping trajectory consistency checking because MDP was recognized as variable. Trajectory consistency checking is not yet supported for variable MDPs.
single agent rewards [0 0 0 0 0 0 0 0 0 0]
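Whether the idle player blocks a crucial path can be checked statically before running any games: treat the staying player's tile as a wall, run a breadth-first search from the active player's start, and see whether each feature (onion dispenser, pot, dish dispenser, serving location) still has a reachable neighbouring floor tile. The sketch below is a hypothetical helper, not an overcooked_ai function. It assumes players interact only with orthogonally adjacent tiles and that player start digits mark floor; it ignores all game dynamics beyond walkability. The `corridor` layout is made up for illustration.

```python
from collections import deque

WALKABLE = {" ", "1", "2"}  # empty floor and player start tiles
STEPS = [(0, -1), (0, 1), (1, 0), (-1, 0)]


def reachable_tiles(rows, start, blocked=()):
    """Breadth-first search over walkable tiles from start, avoiding blocked cells."""
    blocked = set(blocked)
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in STEPS:
            nx, ny = x + dx, y + dy
            if (0 <= ny < len(rows) and 0 <= nx < len(rows[ny])
                    and rows[ny][nx] in WALKABLE
                    and (nx, ny) not in blocked and (nx, ny) not in seen):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


def can_use(rows, tile, reachable):
    """True if some tile of this type has a reachable orthogonal neighbour."""
    for y, row in enumerate(rows):
        for x, ch in enumerate(row):
            if ch == tile and any((x + dx, y + dy) in reachable for dx, dy in STEPS):
                return True
    return False


corridor = ["XXPXX",
            "O1 2S",
            "XXDXX"]
# Player 2 stays at (3, 1) and cuts player 1 off from the serving location.
reach = reachable_tiles(corridor, start=(1, 1), blocked=[(3, 1)])
print({t: can_use(corridor, t, reach) for t in "OPDS"})
# {'O': True, 'P': True, 'D': True, 'S': False}
```

Running the same check on the example layout from the custom layouts section (player 1 active at (3, 3), player 2 idle at (3, 1)) reports every feature as usable, since player 1 can walk around the idle player.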



