# HDDLGym Tutorial
Welcome to a walkthrough of the HDDLGym system. This notebook guides you from creating and modifying HDDL domains to running and training RL policies for hierarchical planning problems in multi-agent settings.

Table of Contents
1. [Modify HDDL files to adapt to the HDDLGym system](#1-Modify-HDDL-files-to-adapt-to-the-HDDLGym-system)

2. [Test HDDL domain and problem files with random exploration](#2-Test-HDDL-domain-and-problem-files-with-random-exploration)

3. [Design a RL policy](#3-Design-a-RL-policy)

4. [Train RL policy](#4-Train-RL-policy)

5. [Deploy RL policy](#5-Deploy-RL-policy)

6. [Extra - Overcooked Policy Visualization Tutorial](#Overcooked-Policy-Visualization-Tutorial)

# Install dependencies
Most HDDLGym features need only general Python packages such as gym, torch, and tensorboard.

The packages required to run demos such as Overcooked are listed in _requirements.txt_.

Alternatively, you can run the code with Docker. In that case, adjust the command in _Dockerfile_ to match your purpose.

[back to top](#HDDLGym-Tutorial)

In [1]:
!pip install gym
!pip install gymnasium
!pip install torch
!pip install tensorboard


# 1 Modify HDDL files to adapt to the HDDLGym system
\(with the Transport domain as an example\)

As discussed in the paper, several modifications are required before HDDL domain and problem files can be used with the HDDLGym system.

- Identify agent:
    
    Identify and explicitly define "agent" as a type or a supertype.

    For example, in the Transport domain, 'vehicle' can be classified as an 'agent'.

    If the domain has no type that can be classified as an agent, simply include 'agent' in the types block.

- Add agent to parameters of agent-related actions:

    For _:action_ blocks that define agent-related actions, make sure 'agent' or the agent's type is included in the parameters.

- Add effects to tasks

    Each _:task_ block is required to include _effects_.

- Add none-action to the domain:

    Make sure the domain file contains a none-action block. If a similar action exists under a different name, replace it with the none-action. The none-action block should be:

        (:action none
            :parameters (?agent - agent)
            :precondition ()
            :effect ()
        )
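
As a quick illustration of the first requirement, the sketch below checks whether `agent` appears as a type or supertype in a domain's `(:types ...)` block. The helper name `agent_in_types` and the Transport-style type declarations are made up for this example; the notebook's own checker (`parse_types` and `is_agent_in_hierarchy`) may work differently.

```python
import re

def agent_in_types(domain_text):
    """Return True if 'agent' is declared as a type or supertype.

    Hypothetical helper: it only looks for the token 'agent' inside the
    (:types ...) block, which contains no nested parentheses.
    """
    match = re.search(r"\(:types\b(.*?)\)", domain_text, re.DOTALL)
    if match is None:
        return False
    return "agent" in match.group(1).split()

# Illustrative Transport-style declarations (not copied from a real domain file)
original = "(:types location package vehicle - object)"
adapted = "(:types location package - object vehicle - agent agent - object)"

print(agent_in_types(original))  # False
print(agent_in_types(adapted))   # True
```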

### Checking if HDDL domain files meet HDDLGym format requirements:
Use the following code to check your HDDL domain files.

[back to top](#HDDLGym-Tutorial)

In [43]:
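# Some helper functions
import re
import os
# remove_comments_from_pddl is used by the helper below
from HDDL_files.modify_hddl_domain_for_HDDLGym import remove_comments_from_pddl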
def check_none_action_defined_correctly(file_path, with_none_name=True):
    '''
    check if in the HDDL file (file_path), the none action is defined correctly
    '''
    temp_file = file_path.split('.hddl')[0] + '_temp.hddl'
    remove_comments_from_pddl(file_path,temp_file)
    if with_none_name:
        # Define a regex pattern to match the 'none' action with strict conditions
        pattern = re.compile(
            r"\(:action\s+none\s*"  # Match the action definition
            r":parameters\s*\(\?\w+\s*-\s*agent\)\s*"  # Match any parameter name before "- agent"
            r":precondition\s*\(\)\s*"  # Match empty precondition
            r":effect\s*\(\)\s*"  # Match empty effect
            r"\)",  # Closing parenthesis
            re.MULTILINE
        )
    else:
        pattern = re.compile(
            r"\(:action\s+([^\s]+)\s*"  # Capture action name (group 1)
            r":parameters\s*\(\?\w+\s*-\s*agent\)\s*"  # Match any parameter name before "- agent"
            r":precondition\s*\(\)\s*"  # Match empty precondition
            r":effect\s*\(\)\s*"  # Match empty effect
            r"\)",  # Closing parenthesis
            re.MULTILINE
        )

    # Read the PDDL file
    with open(temp_file, "r") as file:
        content = file.read()
    # remove temp_file:
    if os.path.exists(temp_file):
        os.remove(temp_file)
    # Check if the pattern exists in the HDDL file
    if with_none_name:
        if pattern.search(content):
            # print("The action `none` is correctly defined in the HDDL file.")
            return True
        else:
            # print("The action `none` is missing or incorrectly formatted.")
            return False
    else:
        matches = pattern.findall(content)
        if matches:
            ans_str = "Found similar action(s) to none-action but with different name:\n"
            for action in matches:
                ans_str += f" {action}\n"
                if len(matches) == 1:
                    ans_str += f"---> Suggest change its name to none: '{action}' ==> 'none'"
            if len(matches) > 1:
                ans_str += "---> Suggest combine these similar actions into one action and name it 'none'."
            return ans_str
        else:
            return "No similar action to none-action"


In [46]:
from HDDL_files.modify_hddl_domain_for_HDDLGym import check_hddl_format, parse_types, is_agent_in_hierarchy, find_actions_without_agent, remove_comments_from_pddl
import os
from pathlib import Path
script_dir = Path(os.getcwd())
# TO DO: Modify domain_file and problem_file to check whether your HDDL files meet the requirements
domain_file = str(script_dir / "HDDL_files/test_input_modify_overcooked.hddl")
problem_file = str(script_dir / "HDDL_files/Overcooked_specialization/overcooked_short_prob2.hddl")
# End To Do

# Assign directory for the valid domain file:

# Remove all comments before checking the format:
domain_file_no_comments = domain_file.split('.hddl')[0] + '_no_comments.hddl'
problem_file_no_comments = problem_file.split('.hddl')[0] + '_no_comments.hddl'
remove_comments_from_pddl(domain_file, domain_file_no_comments)
remove_comments_from_pddl(problem_file, problem_file_no_comments)

# Check if the domain file has 'agent' in the type:
print("*** Checking if the domain file has 'agent' in the type block ***")
type_hierarchy = parse_types(domain_file_no_comments)
if type_hierarchy:
    is_agent_in_hierarchy(type_hierarchy)

# Agent-related actions and environment actions:
print("\n***Checking if agent is in parameters of actions ***")
no_agent_actions = find_actions_without_agent(domain_file_no_comments)
print("No agent actions: ", no_agent_actions)
if len(no_agent_actions):
    print("---> Suggest adding '?agent - agent' in the parameter if any action above is not an environment action!")

# Check None-action:
print("\n*** Checking none-action ***")
if check_none_action_defined_correctly(domain_file_no_comments, with_none_name = True):
    print("The none action is correctly defined.")
else:
    print("No none-action is defined as required.")
    error = check_none_action_defined_correctly(domain_file_no_comments, with_none_name=False)
    print(error)

# Check if the domain file has correct number of parentheses, actions, tasks, methods with correct format:
print("\n*** Checking general format of action, task, and method blocks ***")
check_hddl_format(domain_file_no_comments)



*** Checking if the domain file has 'agent' in the type block ***
'agent' is a type or supertype.

***Checking if agent is in parameters of actions ***
No agent actions:  ['cooking', 'a-complete-cooking']
---> Suggest adding '?agent - agent' in the parameter if any action above is not an environment action

*** Checking none-action ***
No none-action is defined as required.
Found similar action(s) to none-action but with different name:
 noop
---> Suggest change its name to none: 'noop' ==> 'none'

*** Checking general format of action, task, and method blocks ***
HDDL file format is correct.
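
The cell above strips comments with `remove_comments_from_pddl` before running the checks, because in PDDL/HDDL a `;` starts a comment that runs to the end of the line. That function's implementation is not shown in this notebook; the sketch below is only a rough string-based stand-in (the name `strip_hddl_comments` is made up here) to illustrate why the regex checks are easier on comment-free text.

```python
def strip_hddl_comments(text):
    """Drop everything from ';' to the end of each line and skip blank lines."""
    kept = []
    for line in text.splitlines():
        code = line.split(";", 1)[0].rstrip()
        if code:
            kept.append(code)
    return "\n".join(kept)

sample = """(:action none ; do nothing
  :parameters (?agent - agent)
  ; no preconditions
  :precondition ()
  :effect ()
)"""

cleaned = strip_hddl_comments(sample)
print(cleaned)
```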


# 2 Test HDDL domain and problem files with random exploration

To test whether the HDDLGym Planner works with the HDDL domain and problem files, we run the environment with random exploration.

[back to top](#HDDLGym-Tutorial)

In [8]:
from learning_methods import evaluate_policy
from hddl_env import HDDLEnv
from HDDL_files.modify_hddl_domain_for_HDDLGym import remove_comments_from_pddl
from learning_methods import PPO_discrete
import torch

# TO DO: replace the domain_file and problem_file below with your files    
domain_file = "HDDL_files/Overcooked_specialization/overcooked_short_domain.hddl"
problem_file = "HDDL_files/Overcooked_specialization/overcooked_short_prob2.hddl"
# End TO DO

class Parameters():
    # Mimicking the arguments of parse_arguments for main_train.py
    def __init__(self, domain, problem):
        self.dvc = 'cpu' # running device: cuda or cpu
        self.domain = domain # directory to domain file
        self.problem = problem # directory to problem file
        self.state_dim = None # state dimension
        self.action_dim = None # action dimension
        self.use_central_planner = False
        self.planner_time_limit = 5
        self.write = False
        self.Loadmodel = False
        self.debug = False
        # related to training:
        self.net_width = 64
        self.lr = 1e-4
        self.T_horizon = 2048
        self.seed = 502
        self.activation = 'tanh'
        


domain_file_no_comments = domain_file.split('.hddl')[0] + '_no_comments.hddl'
problem_file_no_comments = problem_file.split('.hddl')[0] + '_no_comments.hddl'
remove_comments_from_pddl(domain_file, domain_file_no_comments)
remove_comments_from_pddl(problem_file, problem_file_no_comments)
opt = Parameters(domain=domain_file_no_comments, problem=problem_file_no_comments)


class RandomPolicy():
    def __init__(self, state_dim, action_dim):
        self.state_dim = state_dim
        self.action_dim = action_dim

    def select_action(self, s, deterministic=False, value=False):
        # regardless of the deterministic flag, this is still a random process
        probabilities = torch.rand(self.action_dim)  # Generate random values
        probabilities /= probabilities.sum()  # Normalize to sum to 1
        if value:
            return probabilities
        else:
            # Index of the action with the highest (random) probability
            return torch.argmax(probabilities).item()


###
env = HDDLEnv(opt.domain, opt.problem)
opt.state_dim = env.observation_space.n
opt.action_dim = env.action_space.n
# Two ways to do random exploration:
# 1. Use PPO_discrete right after random initialization, before its weights are trained:
# policy = PPO_discrete(**vars(opt)).to(opt.dvc)
# 2. Create a RandomPolicy as follows:
policy = RandomPolicy(opt.state_dim, opt.action_dim) # random policy defined above
score = evaluate_policy(env, policy, turns = 1, opt=opt)
print('SCORE: ', score)


start evaluating policy ...
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'a-interact chef2 onion-pile empty onion onion onion'}
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'a-interact chef2 pot1 onion empty empty onion-1'}
action_dict {'chef1': 'a-interact chef1 pot1 empty empty onion-1 cooking-soup-1-onion', 'chef2': 'none chef2'}
action_dict {'chef1': 'a-interact chef1 bowl-pile empty bowl bowl bowl', 'chef2': 'none chef2'}
action_dict {'chef1': 'wait chef1 pot1 cooking-soup-1-onion-stage3 cooking-soup-1-onion-stage4', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 'chef2': 'none chef2'}
action_dict {'chef1': 'none chef1', 

# 3 Design a RL policy

Designing custom RL policies is a feature that HDDLGym aims to support. The current version of HDDLGym offers the PPO_discrete policy. Its input (observation) is a one-hot encoded array of dynamic grounded predicates, lifted operators, and objects of all agents' previous action hierarchies.

In the file **learning_methods.py**, users can modify the contents of functions to alter the observation and action spaces, the design of RL models, the training and evaluation methods, etc., as long as the function names remain the same.

Following are some simple examples of modifying functions in learning_methods.py to change different components of the RL policy design.

**RL model**

To change the RL model, make sure the class of the new model implements the following methods:
- For just deploying the policy (no training): \_\_init__, select_action
- For both deploying and training: \_\_init__, select_action, to, train, put_data, save, and load
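
The method list above can be read as an interface. Below is a bare skeleton that assumes nothing beyond the method names listed. The class name `CustomPolicy`, every argument list other than that of `select_action`, and the uniform placeholder output are assumptions for illustration, so check `PPO_discrete` in learning_methods.py for the actual signatures.

```python
class CustomPolicy:
    """Skeleton of a policy exposing the methods named in the list above."""

    def __init__(self, state_dim, action_dim):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.buffer = []  # placeholder storage for collected transitions

    def select_action(self, s, deterministic=False, value=False):
        # Placeholder: a uniform distribution over the action entries
        probabilities = [1.0 / self.action_dim] * self.action_dim
        if value:
            return probabilities
        return probabilities.index(max(probabilities))

    def to(self, device):
        # Assumed to mirror torch's .to(device) and return the policy itself
        return self

    def put_data(self, *transition):
        # Assumed: store one transition for a later training update
        self.buffer.append(transition)

    def train(self):
        # Assumed: update the model from stored data, then clear the buffer
        self.buffer.clear()

    def save(self, episode):
        raise NotImplementedError("write model weights here")

    def load(self, episode):
        raise NotImplementedError("read model weights here")

policy = CustomPolicy(state_dim=4, action_dim=3).to("cpu")
print(policy.select_action(s=None, value=True))
```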

[back to top](#HDDLGym-Tutorial)

In [48]:
class RandomPolicy():
    '''Notes: 
    This RandomPolicy is only for deploying the HDDLGym Planner with random exploration.
    Therefore, it only implements the __init__ and select_action methods.
    RL policies that require training, like PPO_discrete, should also implement the other methods of PPO_discrete
    (such as train, put_data, save, and load).
    '''
    def __init__(self, state_dim, action_dim):
        self.state_dim = state_dim
        self.action_dim = action_dim

    def select_action(self, s, deterministic=False, value=False):
        '''select action
        input:
        - s: an array representing the observation, or input for the policy
        - deterministic: boolean, whether to choose the action with the max value or
                        to sample it randomly, weighted by the values of the policy
        - value: boolean, whether to return the value of policy (probability list)
        output:
        - either probabilities or index of action with max probability (or value), depending on the value flag.
        '''
        # regardless deterministic or not, it is still a random process
        probabilities = torch.rand(self.action_dim)  # Generate random values
        probabilities /= probabilities.sum()  # Normalize to sum to 1
        if value:
            return probabilities
        else:
            # Index of the action with the highest probability
            max_index = torch.argmax(probabilities).item()
            return max_index
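
`RandomPolicy` ignores the `deterministic` flag. For a policy whose probabilities are meaningful, the two behaviors described in the docstring could look like the sketch below. It uses plain Python lists and the standard `random` module rather than torch so that it runs anywhere; the function name `pick_action` is made up for this example.

```python
import random

def pick_action(probabilities, deterministic=False, rng=random):
    """Return an action index from a list of probabilities.

    deterministic=True  -> index of the largest probability
    deterministic=False -> index sampled with the probabilities as weights
    """
    indices = range(len(probabilities))
    if deterministic:
        return max(indices, key=lambda i: probabilities[i])
    return rng.choices(indices, weights=probabilities, k=1)[0]

probs = [0.1, 0.7, 0.2]
print(pick_action(probs, deterministic=True))    # 1
print(pick_action(probs, rng=random.Random(0)))  # a sampled index: 0, 1, or 2
```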
    

**Observation-Action Spaces**

To change the observation and action spaces, modify the contents of these functions:
- *get_observation_space*: specify what elements from the env_dictionary are in the observation. Here are some examples:
  - \[default\] grounded dynamic predicates + lifted operators + related objects
  - grounded dynamic predicates + lifted operators
  - grounded predicates
- *get_action_space*: specify what elements of the env_dictionary are in the action. Some examples are:
  - \[default\] lifted operators + related objects
  - lifted operators
  - lifted method + lifted task + grounded action
- *get_prob_from_prob_list*: get the probability of a grounded operator from the output of the policy (probability array)
  - Depending on how the action space is designed, the probability of a grounded operator (g_oper) is calculated from the probabilities of the elements relevant to it. For example:
    - \[default\] logprob of g_oper = mean(logprob of lifted operator of g_oper, logprob of each object in g_oper)
    - if the action space consists of grounded operators, the prob of g_oper can be read directly from the corresponding value in the probability array.

- *get_observation_one_hot_vector*: generate the observation array for the policy from the information of the current state (including the current grounded predicates in the env and the previous action or action hierarchy of the agents). Modify this function according to how the observation space is defined.
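
To make the default layout concrete, here is a toy encoding in plain Python. The predicate, operator, and object names are invented for the example, and the helper `multi_hot` is a stand-in rather than a function from learning_methods.py; the point is only that the observation length is the sum of the three list lengths.

```python
def multi_hot(active, vocabulary):
    """Return a 0/1 list marking which vocabulary entries appear in `active`."""
    active = set(active)
    return [1.0 if item in active else 0.0 for item in vocabulary]

# Invented vocabularies standing in for the env_dictionary lists
predicates = ["holding chef1 onion", "holding chef1 empty", "pot-state pot1 empty"]
operators = ["a-interact", "wait", "none"]
objects = ["chef1", "chef2", "pot1", "onion"]

current_state = ["holding chef1 empty", "pot-state pot1 empty"]
previous_operators = ["a-interact"]
previous_objects = ["chef1", "pot1"]

observation = (
    multi_hot(current_state, predicates)
    + multi_hot(previous_operators, operators)
    + multi_hot(previous_objects, objects)
)
print(len(observation))  # 10 = 3 predicates + 3 operators + 4 objects
print(observation)
```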

*Below are the default functions in learning_methods.py that are relevant to designing observation-action spaces.*

In [None]:
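# Change observation and action space:
# Helpers used below (extract_object_list, enumerate_list, convert_grounded_to_lifted_index,
# enumerate_index_to_binary_matrix) are expected to come from learning_methods.py.
import gym
import torch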
def get_observation_space(env_dictionary):
    '''
    This function returns the size of the observation space (or state space), which is also the input size of the RL policy
    input:
    - env_dictionary: environment dictionary, result of parsing the HDDL domain and problem file
    output:
    - gym.spaces.Discrete(n), with n is the len of observation
    Note: 
    - in this approach, the observation space involves grounded dynamic predicate list, lifted action, and object list
    '''
    observation_space = gym.spaces.Discrete(n=len(env_dictionary['grounded dynamic predicate list'])+\
          len(env_dictionary['lifted operators list'])+len(env_dictionary['objects list']))

    # To leave the objects list out of the observation, replace the line above with:
    # observation_space = gym.spaces.Discrete(n=len(env_dictionary['grounded dynamic predicate list'])+\
          # len(env_dictionary['lifted operators list']))
    return observation_space

def get_action_space(env_dictionary):
    '''
    This function returns the size of the action space, which is also the output size of the RL policy
    input:
    - env_dictionary: environment dictionary, result of parsing the HDDL domain and problem file
    output:
    - gym.spaces.Discrete(n), with n is the len of action
    Note: 
    - in this approach, the action space involves lifted action and object list
    '''
    action_space = gym.spaces.Discrete(n=len(env_dictionary['lifted operators list'])+len(env_dictionary['objects list']))
    # To leave objects out of the action space, use this line instead:
    # action_space = gym.spaces.Discrete(n=len(env_dictionary['lifted operators list']))
    return action_space

def get_prob_from_prob_list(operator_str, prob_list, env_dictionary,device='cpu'):
  '''This function provides the probability value (prob) for operator_str according to the probabilities in prob_list
  inputs:
  - operator_str: a string of operator_name + objects
  - prob_list: a tensor listing the probabilities of lifted operators and objects
  - env_dictionary: dictionary of the environment's elements
  outputs:
  - prob: the probability of operator_str (a scalar tensor, the geometric mean of the relevant entries)
  '''
  oper_name, oper_obj = extract_object_list(operator_str)
  oper_lifted_id = None
  for lifted_ind, lifted_oper in enumerate(env_dictionary['lifted operators list']):
    if lifted_oper.name == oper_name:
      oper_lifted_id = lifted_ind
  assert oper_lifted_id is not None, "Could not find lifted operators of oper {}".format(operator_str)
  indices_list = [oper_lifted_id]
  for obj in oper_obj:
    indices_list.append(env_dictionary['objects list'].index(obj) + len(env_dictionary['lifted operators list']))

  relevant_prob_tensor = prob_list[indices_list]
  prob = torch.exp(torch.div(torch.sum(torch.log(relevant_prob_tensor)),len(indices_list)+1e-20))
  return prob


def get_observation_one_hot_vector(current_state, all_operators, env_dictionary, device=False):
    '''Get observation from current state and operators from action hierarchies
    inputs:
    - current_state: list of predicate
    - all_operators: list of all operators
    - env_dictionary: environment dictionary
    - device: the device (CPU/GPU) to place the tensors on
    output:
    - observation: PyTorch tensor on the specified device
    '''
    if not device:
      device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    state_num = enumerate_list(current_state, env_dictionary['grounded dynamic predicate list'])
    state_num_tensor = torch.tensor(state_num, device=device, dtype=torch.float32)
    lifted_operators_id, objects_id = convert_grounded_to_lifted_index(
        all_operators, env_dictionary['lifted operators list'], env_dictionary['objects list']
    )

    lifted_operator_matrix = enumerate_index_to_binary_matrix(
        lifted_operators_id, array_len=len(env_dictionary['lifted operators list']), device=device
    )

    object_matrix = enumerate_index_to_binary_matrix(
        objects_id, array_len=len(env_dictionary['objects list']), device=device
    )

    observation = torch.cat([state_num_tensor, lifted_operator_matrix, object_matrix])

    return observation
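
The default `get_prob_from_prob_list` above takes the geometric mean of the selected entries, i.e. the exponential of the mean log-probability. A torch-free sketch of the same arithmetic follows; the function name, the toy probability values, and the index layout (2 operators followed by 3 objects) are made up for illustration.

```python
import math

def geometric_mean_prob(prob_list, indices):
    """exp(mean(log p_i)) over the selected indices."""
    log_sum = sum(math.log(prob_list[i]) for i in indices)
    return math.exp(log_sum / len(indices))

# Toy policy output: entries 0-1 are lifted operators, entries 2-4 are objects
prob_list = [0.5, 0.1, 0.2, 0.1, 0.1]

# A grounded operator built from operator 0 and the objects at positions 2 and 3
indices = [0, 2, 3]
prob = geometric_mean_prob(prob_list, indices)
print(round(prob, 4))  # 0.2154, the cube root of 0.5 * 0.2 * 0.1 = 0.01
```

One plausible reason for averaging in log space rather than multiplying is that a plain product would shrink with every extra object, so operators with more parameters would score lower regardless of how likely each part is.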


# 4 Train RL policy

The default policy is PPO_discrete, which is defined in **learning_methods.py**. To train it, simply call **main_train.py**:

    python main_train.py

Parameters can either be passed on the command line or modified in the file **main_train.py**. The most common ones are _domain_ and _problem_ (file paths) and _dvc_ (training device):

    python main_train.py --domain <path/to/domain/file> --problem <path/to/problem/file> --dvc cpu

### Here are optional arguments:
| Argument | Parameter (opt.[])     | Description |
|-----------------|-----------------|-------------|
| `-h, --help`       |              | Show this help message and exit |
| `--dvc` | `DVC`                       | Running device: `cuda` or `cpu` |
| `--domain `|`DOMAIN`                 | Which domain HDDL file to load? |
| `--problem `|`PROBLEM`               | Which problem HDDL file to load? |
| `--write `|`WRITE`                   | Use SummaryWriter to record the training |
| `--render `|`RENDER`                 | Render or Not |
| `--Loadmodel `|`LOADMODEL`           | Load pretrained model or Not |
| `--Model `|`MODEL`                   | Which model to load |
| `--seed `|`SEED`                     | Random seed |
| `--T_horizon `|`T_HORIZON`           | Length of long trajectory |
| `--Max_train_steps `|`MAX_TRAIN_STEPS` | Max training steps |
| `--save_interval `|`SAVE_INTERVAL`   | Model saving interval, in steps |
| `--eval_interval `|`EVAL_INTERVAL`   | Model evaluating interval, in steps |
| `--planner_time_limit `|`PLANNER_TIME_LIMIT` | The time limit (in seconds) for running the planner for each agent at each step |
| `--gamma `|`GAMMA`                   | Discounted Factor |
| `--lambd `|`LAMBD`                   | GAE Factor |
| `--clip_rate `|`CLIP_RATE`           | PPO Clip rate |
| `--K_epochs `|`K_EPOCHS`             | PPO update times |
| `--net_width `|`NET_WIDTH`           | Hidden net width |
| `--lr `|`LR`                         | Learning rate |
| `--l2_reg `|`L2_REG`                 | L2 regularization coefficient for Critic |
| `--batch_size `|`BATCH_SIZE`         | Length of sliced trajectory |
| `--entropy_coef `|`ENTROPY_COEF`     | Entropy coefficient of Actor |
| `--entropy_coef_decay `|`ENTROPY_COEF_DECAY` | Decay rate of `entropy_coef` |
| `--adv_normalization `|`ADV_NORMALIZATION` | Advantage normalization |
| `--max_episode `|`MAX_EPISODE`       | Max number of episodes in training |
| `--max_step `|`MAX_STEP`             | Max number of steps in each episode |
| `--epsilon `|`EPSILON`               | Epsilon |
| `--activation `|`ACTIVATION`         | Activation function for learning model |
| `--debug `|`DEBUG`                   | Debug mode or Not |
| `--use_central_planner `|`USE_CENTRAL_PLANNER` | Whether to run centralized planner for multi-agent planning (default: `False`) |
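
For orientation, a few of these options could be declared with `argparse` roughly as follows. This is a sketch, not the contents of main_train.py: the function name `build_parser`, the argument types, and the file names `my_domain.hddl` and `my_problem.hddl` are assumptions, and the defaults are copied from the `Namespace` printout in this notebook.

```python
import argparse

def build_parser():
    """Declare a small, illustrative subset of the options in the table."""
    parser = argparse.ArgumentParser(description="Illustrative subset of training options")
    parser.add_argument("--dvc", type=str, default="cpu", help="Running device: cuda or cpu")
    parser.add_argument("--domain", type=str, default=None, help="Domain HDDL file to load")
    parser.add_argument("--problem", type=str, default=None, help="Problem HDDL file to load")
    parser.add_argument("--seed", type=int, default=209, help="Random seed")
    parser.add_argument("--lr", type=float, default=1e-4, help="Learning rate")
    parser.add_argument("--net_width", type=int, default=64, help="Hidden net width")
    return parser

# Parse an explicit list instead of sys.argv so the example also runs inside a notebook
opt = build_parser().parse_args(
    ["--domain", "my_domain.hddl", "--problem", "my_problem.hddl", "--lr", "3e-4"]
)
print(opt.domain, opt.problem, opt.dvc, opt.lr)
```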


### Train custom RL policy:

The file **main_train.py** in the codebase is used to train the default policy, PPO_discrete. If you design a different RL method, for example with a different action space (lifted operators only) and/or observation space (grounded predicates instead of dynamic grounded predicates), make sure the observation and action spaces are modified in **learning_methods.py** and the steps that enumerate states and actions in **main_train.py** are adjusted accordingly.


[back to top](#HDDLGym-Tutorial)

In [11]:
# Call the default main_train.py:
!python3 main_train.py

Namespace(K_epochs=10, Loadmodel=False, Max_train_steps=50000000.0, Model=2, T_horizon=2048, activation='tanh', adv_normalization=False, batch_size=64, clip_rate=0.2, debug=False, domain='/home/nicole/MIT Dropbox/Ngoc La/PhD/Projects/HDDLGym_developing_rebuttal/HDDLGym_dev/HDDL_files/Overcooked_specialization/overcooked_short_domain.hddl', dvc=device(type='cpu'), entropy_coef=0.01, entropy_coef_decay=0.99, epsilon=1.0, eval_interval=1000.0, gamma=0.99, l2_reg=0, lambd=0.95, lr=0.0001, max_episode=50, max_step=25, net_width=64, planner_time_limit=5, problem='/home/nicole/MIT Dropbox/Ngoc La/PhD/Projects/HDDLGym_developing_rebuttal/HDDLGym_dev/HDDL_files/Overcooked_specialization/overcooked_short_prob2.hddl', render=False, save_interval=1000.0, seed=209, use_central_planner=False, write=False)
Namespace(K_epochs=10, Loadmodel=False, Max_train_steps=50000000.0, Model=2, T_horizon=2048, activation='tanh', adv_normalization=False, batch_size=64, clip_rate=0.2, debug=False, domain='/home/nic

# 5 Deploy RL policy
This section guides you through training a PPO model, running a rollout, and visualizing the results using a local server.
- Run the trained model and record the plan
- Visualize the hierarchical plan

[back to top](#HDDLGym-Tutorial)

In [18]:
# Step 1: Train the Model
# This will generate model files like ppo_actor1000.pth and ppo_critic1000.pth in the 'model/' directory
!python3 main_train.py

Namespace(K_epochs=10, Loadmodel=False, Max_train_steps=50000000.0, Model=2, T_horizon=2048, activation='tanh', adv_normalization=False, batch_size=64, clip_rate=0.2, debug=False, domain='/home/nicole/MIT Dropbox/Ngoc La/PhD/Projects/HDDLGym_developing_rebuttal/HDDLGym_dev/HDDL_files/Overcooked_specialization/overcooked_short_domain.hddl', dvc=device(type='cpu'), entropy_coef=0.01, entropy_coef_decay=0.99, epsilon=1.0, eval_interval=1000.0, gamma=0.99, l2_reg=0, lambd=0.95, lr=0.0001, max_episode=50, max_step=25, net_width=64, planner_time_limit=5, problem='/home/nicole/MIT Dropbox/Ngoc La/PhD/Projects/HDDLGym_developing_rebuttal/HDDLGym_dev/HDDL_files/Overcooked_specialization/overcooked_short_prob2.hddl', render=False, save_interval=1000.0, seed=209, use_central_planner=False, write=False)
Namespace(K_epochs=10, Loadmodel=False, Max_train_steps=50000000.0, Model=2, T_horizon=2048, activation='tanh', adv_normalization=False, batch_size=64, clip_rate=0.2, debug=False, domain='/home/nic

## 5.1: Run the HDDL Model
Specify the exact model you want to use for the rollout with the `--Model` argument (stored as `opt.Model`).
For example, to use `ppo_actor2.pth` and `ppo_critic2.pth`:

[back to top](#HDDLGym-Tutorial)

In [12]:
# Run the rollout with the specified model
# Replace '2' with the desired model number
!python3 run_hddl_policy.py --Model 2


Namespace(K_epochs=10, Loadmodel=False, Max_train_steps=50000000.0, Model=2, T_horizon=2048, activation='tanh', adv_normalization=False, batch_size=64, clip_rate=0.2, debug=False, domain='/home/rumon/dev/HDDLGym_dev/HDDL_files/Overcooked_specialization/overcooked_short_domain.hddl', dvc=device(type='cpu'), entropy_coef=0.01, entropy_coef_decay=0.99, epsilon=1.0, eval_interval=1000.0, gamma=0.99, l2_reg=0, lambd=0.95, lr=0.0001, max_episode=50, max_step=25, net_width=64, planner_time_limit=5, problem='/home/rumon/dev/HDDLGym_dev/HDDL_files/Overcooked_specialization/overcooked_short_prob2.hddl', render=False, save_interval=1000.0, seed=209, use_central_planner=False, write=False)
  self.critic.load_state_dict(torch.load(str(script_dir / "model/ppo_critic{}.pth").format(episode)))
  self.actor.load_state_dict(torch.load(str(script_dir / "model/ppo_actor{}.pth").format(episode)))
start evaluating policy ...
action_dict {'chef1': 'none chef1', 'chef2': 'a-interact chef2 onion-pile empty oni

## 5.2: Launch the Local Server
The generated data is written to `website/data.json`. To visualize it, serve the `website` folder with a local server and open http://localhost:8000 in your browser:

In [11]:
# Launch the local server on port 8000
!python3 -m http.server 8000 --directory website



Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [04/Feb/2025 14:00:44] "GET / HTTP/1.1" 304 -
127.0.0.1 - - [04/Feb/2025 14:00:44] "GET /styles.css HTTP/1.1" 304 -
127.0.0.1 - - [04/Feb/2025 14:00:44] "GET /script.js HTTP/1.1" 304 -
127.0.0.1 - - [04/Feb/2025 14:00:44] "GET /data.json HTTP/1.1" 200 -
^C

Keyboard interrupt received, exiting.
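
If the page loads but shows no plan, it can help to confirm that the data file is valid JSON before debugging the front end. The layout of `website/data.json` is not described in this notebook, so the sketch below only reports the top-level shape; the helper name `describe_json` is made up, and the demo uses a throwaway file so it runs anywhere.

```python
import json
import os
import tempfile

def describe_json(path):
    """Load a JSON file and return a short description of its top-level value."""
    with open(path, "r") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return "array with {} item(s)".format(len(data))
    return "scalar of type " + type(data).__name__

# Demo on a temporary file; point `path` at website/data.json to inspect a real rollout
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w") as f:
        json.dump([{"step": 0}, {"step": 1}], f)
    summary = describe_json(path)

print(summary)  # array with 2 item(s)
```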


# 6 [Extra] 
# Overcooked Policy Visualization Tutorial

This section will guide you through the process of visualizing the Overcooked policy using Docker and Docker Compose. This has been tested on **Ubuntu 20.04**.

**Prerequisites:**  
- Docker and Docker Compose must be installed on your system.


#### Setting Up the Environment

**Step 1: Navigate to the Project Directory**

Open your terminal and navigate to the `overcooked_demo` directory:

```bash
cd ../overcooked/src/overcooked_demo
```

**Step 2: Create the `.env` File**

Create a `.env` file and add the following environment variables:

```bash
GITHUB_TOKEN=<YOUR-GITHUB-TOKEN>
REPO_URL=<GITHUB-URL-OF-THIS-REPO>
FLASK_SECRET_KEY=<FLASK-SECRET-KEY>
```

Example:

```bash
GITHUB_TOKEN=ghp_example1234567890
REPO_URL=github.com/HDDLGym/HDDLGym.git
FLASK_SECRET_KEY=myflasksecretkey123
```
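
Before building, a quick check that all three variables are set may save a rebuild. This is a sketch assuming the `.env` file uses plain `KEY=VALUE` lines; the helper name `missing_env_keys` is made up for this example.

```python
REQUIRED_KEYS = ("GITHUB_TOKEN", "REPO_URL", "FLASK_SECRET_KEY")

def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in KEY=VALUE text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]

example = """GITHUB_TOKEN=ghp_example1234567890
REPO_URL=github.com/HDDLGym/HDDLGym.git
"""
print(missing_env_keys(example))  # ['FLASK_SECRET_KEY']
```

To check a real file, pass its contents, for example `missing_env_keys(open(".env").read())`.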


#### Building and Running the Docker Containers

**Step 3: Navigate to the Server Directory**

```bash
cd ../overcooked_demo/server
```

**Step 4: Build the Docker Image**

Use the following command to build the Docker image:

```bash
sudo docker compose build
```

> ⚠️ **Note:** The build process may take over **1300 seconds** (~20 minutes), depending on your system.

**Step 5: Start the Docker Containers**

After the build is complete, start the containers:

```bash
sudo docker compose up
```


#### Visualizing the Overcooked Policy

Once the container is up and running:

1. Open your web browser.
2. Navigate to: [http://localhost/experiment](http://localhost/experiment)

You should now be able to view the **rollout/2D visualization** of the Overcooked policy.


#### Troubleshooting

- **Docker Build Fails:** Ensure Docker and Docker Compose are properly installed and updated.
- **Cannot Access the Visualization:** Confirm that the Docker container is running without errors and the `.env` file is correctly configured.

[back to top](#HDDLGym-Tutorial)