# Elevator Environment

In this Jupyter Notebook, we are going to learn how the elevator environment works.

## Setup

### Using Google Colab

If you are using Google Colab (and we encourage you to do so), please run the following code cell. If you are not using Google Colab, you can skip this code cell.

**Note**: The mount point `'/content/drive/'` cannot be changed; only the part of the path after it depends on where your assignment folder is. For example, if your assignment folder in Google Drive is located at `My Drive -> CSXX46A2`, you should specify the path as `'/content/drive/MyDrive/CSXX46A2'`.

In [1]:
from google.colab import drive
drive.mount('/content/drive/')

import sys
sys.path.append('/content/drive/MyDrive/CSXX46A2')

%cd /content/drive/MyDrive/CSXX46A2

Mounted at /content/drive/
/content/drive/MyDrive/CSXX46A2
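
If the same notebook has to run both on Colab and locally, the mount step can be guarded instead of skipped by hand. The following is a minimal sketch and not part of the original assignment code: `in_colab` and `setup_paths` are hypothetical helper names, the `sys.modules` check is a common heuristic rather than an official API, and the folder path is the example path used above.

```python
import sys


def in_colab() -> bool:
    """Return True when the notebook appears to run inside Google Colab."""
    # Heuristic: the Colab runtime has the google.colab package loaded,
    # while a local Jupyter kernel normally does not.
    return "google.colab" in sys.modules


def setup_paths(project_dir: str = "/content/drive/MyDrive/CSXX46A2") -> bool:
    """Mount Google Drive and extend sys.path, but only on Colab.

    Returns True if the Colab setup was performed, False otherwise.
    """
    if not in_colab():
        return False
    from google.colab import drive  # only importable on Colab

    drive.mount("/content/drive/")
    if project_dir not in sys.path:
        sys.path.append(project_dir)
    return True


print(f"Running on Colab: {in_colab()}")
```

Calling `setup_paths()` on a local machine simply returns `False`, so the cell is safe to run in either setting.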


### Installing Dependencies

The elevator task is implemented using the `pyRDDLGym` library. Before we begin, please install the following packages.

**Note**: If you are using Google Colab, you may need to restart the session. Please follow the prompt to do so.

In [2]:
!pip install pyRDDLGym
!pip install rddlrepository

Collecting pyRDDLGym
  Downloading pyrddlgym-2.5-py3-none-any.whl.metadata (1.3 kB)
Downloading pyrddlgym-2.5-py3-none-any.whl (111 kB)
Installing collected packages: pyRDDLGym
Successfully installed pyRDDLGym-2.5
Collecting rddlrepository
  Downloading rddlrepository-2.1-py3-none-any.whl.metadata (959 bytes)
Downloading rddlrepository-2.1-py3-none-any.whl (1.0 MB)
Installing collected packages: rddlrepository
Successfully installed rddlrepository-2.1
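
After a session restart it can be useful to confirm that both packages are still visible to the kernel. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the status of the two packages installed above.
for name in ("pyRDDLGym", "rddlrepository"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```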


## Environment

The Elevator environment models evening rush hours when people from different floors in a building want to go down to the bottom floor using elevators.

The building has 5 floors and a single elevator (`e0`), as the state description below shows. Each floor may have several people waiting. The objective is to pick up passengers from the various floors and deliver them to the bottom floor. New passengers may arrive at each floor while the elevator is in operation. The elevator can move up and down and can pick up and drop off passengers. However, it can only pick up or drop off passengers while its door is open, and it can only move while its door is closed.

In [3]:
import pyRDDLGym
from pyRDDLGym.core.env import RDDLEnv

import numpy as np

import gymnasium as gym
from gymnasium.spaces import Box, Discrete
from gymnasium.wrappers import RecordEpisodeStatistics

from utils import DictToListWrapper

### Initialization

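To initialize the environment, call `create_elevator_env()`, which builds an `RDDLEnv` from the domain and instance files. It then applies `DictToListWrapper` to convert the environment's state from a dictionary to a list, with the detail given in the "Environment Description" section below, and `RecordEpisodeStatistics` to track per-episode statistics.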

In [4]:
def create_elevator_env():
    env = RDDLEnv(
        domain="selfDefinedEnvs/domain.rddl",
        instance="selfDefinedEnvs/instance5.rddl",  # instance-5 file
    )
    # If your observation is a Dict of booleans, flatten it:
    env = DictToListWrapper(env)
    env = RecordEpisodeStatistics(env)
    return env

env = create_elevator_env().env

Generating LALR tables
>> ( sum_{?f: floor} [ elevator-at-floor(?e, ?f) ] ) == 1
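
The actual `DictToListWrapper` lives in the course's `utils` module and is not reproduced here. To illustrate the general idea of turning a dictionary state into a flat list, the sketch below uses a hypothetical `flatten_state` function. The key names follow the `name___object` pattern used by this environment, but ordering the entries by an explicit key list is an assumption, not a description of what the real wrapper does.

```python
def flatten_state(state: dict, keys: list) -> list:
    """Flatten a dict of scalar (bool/int) values into a list of floats.

    `keys` fixes the order of the entries so that the same index always
    refers to the same quantity.
    """
    return [float(state[k]) for k in keys]


# Hypothetical example with three entries.
keys = [
    "num-person-waiting___f0",
    "elevator-closed___e0",
    "elevator-dir-up___e0",
]
state = {
    "elevator-dir-up___e0": True,
    "num-person-waiting___f0": 2,
    "elevator-closed___e0": False,
}
vec = flatten_state(state, keys)
print(vec)
```

Fixing the key order up front matters because a learning agent treats each index of the vector as a separate input feature.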


### Observation and Action Space

The following code cell prints the observation and action spaces of the environment, along with a description of each state dimension and each action.

In [5]:
print(f"Observation space: {env.observation_space}")
env.get_state_description()

print(f"Action space: {env.action_space}")
env.get_action_description()

Observation space: Box(-inf, inf, (13,), float32)
State description:
state dim 0: num-person-waiting___f0
state dim 1: num-person-waiting___f1
state dim 2: num-person-waiting___f2
state dim 3: num-person-waiting___f3
state dim 4: num-person-waiting___f4
state dim 5: num-person-in-elevator___e0
state dim 6: elevator-dir-up___e0
state dim 7: elevator-closed___e0
state dim 8: elevator-at-floor___e0__f0
state dim 9: elevator-at-floor___e0__f1
state dim 10: elevator-at-floor___e0__f2
state dim 11: elevator-at-floor___e0__f3
state dim 12: elevator-at-floor___e0__f4
Action space: Discrete(6)
Action description:
Action 0: {'move-current-dir___e0': np.int64(0)}
Action 1: {'move-current-dir___e0': np.int64(1)}
Action 2: {'open-door___e0': np.int64(0)}
Action 3: {'open-door___e0': np.int64(1)}
Action 4: {'close-door___e0': np.int64(0)}
Action 5: {'close-door___e0': np.int64(1)}
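
The printed descriptions can be turned into small lookup helpers, which makes raw state vectors and action indices easier to read while debugging. This is a sketch only: the names and index layout are copied from the output above, while `describe_state`, `elevator_floor`, and `describe_action` are hypothetical helpers, not part of the environment or of `utils`.

```python
# Names copied from the printed state description (indices 0 to 12).
STATE_NAMES = (
    [f"num-person-waiting___f{i}" for i in range(5)]
    + ["num-person-in-elevator___e0", "elevator-dir-up___e0", "elevator-closed___e0"]
    + [f"elevator-at-floor___e0__f{i}" for i in range(5)]
)

# (name, value) pairs copied from the printed action description.
ACTIONS = [
    ("move-current-dir___e0", 0),
    ("move-current-dir___e0", 1),
    ("open-door___e0", 0),
    ("open-door___e0", 1),
    ("close-door___e0", 0),
    ("close-door___e0", 1),
]


def describe_state(state) -> dict:
    """Map a 13-dimensional state vector to a {name: value} dictionary."""
    if len(state) != len(STATE_NAMES):
        raise ValueError(f"expected {len(STATE_NAMES)} values, got {len(state)}")
    return {name: float(v) for name, v in zip(STATE_NAMES, state)}


def elevator_floor(state) -> int:
    """Return the floor index whose elevator-at-floor flag is set."""
    return [float(v) for v in state[8:13]].index(1.0)


def describe_action(action: int) -> tuple:
    """Return the (name, value) pair for an action index."""
    return ACTIONS[action]


example = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(describe_state(example))
print(f"Elevator floor index: {elevator_floor(example)}")
print(f"People waiting in total: {sum(example[:5])}")
print(f"Action 3: {describe_action(3)}")
```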


### Interaction

The agent interacts with the environment through the [Gymnasium API](https://gymnasium.farama.org/), the maintained successor to OpenAI Gym. The environment provides the following methods:

- `reset()`: Resets the environment and returns the initial state along with any additional information (usually empty).
- `step(action)`: Takes an action in the environment and returns:
    - *next state*: The resulting state after the action.
    - *reward*: The reward received for the action.
    - *terminated*: A boolean indicating whether the episode has ended by reaching a terminal state.
    - *truncated*: A boolean indicating whether the episode was cut off for a reason outside the task itself (for example, a time limit), though this is not applicable to our task.
    - *info*: Additional information returned as a dictionary.
- `close()`: Closes the environment and releases any resources.
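
To show how the five values returned by `step(action)` are typically consumed, the sketch below accumulates the total reward of one episode under a random policy. So that it runs without the elevator files, it uses `StubEnv`, a hypothetical stand-in that only mimics the method signatures listed above; its dynamics (reward of -1 per step, cut-off after `horizon` steps) are invented for illustration and say nothing about the real environment.

```python
import random


class StubEnv:
    """Tiny stand-in with the reset/step/close signatures described above.

    It is NOT the elevator environment: every step yields reward -1 and
    the episode is truncated after `horizon` steps.
    """

    class _ActionSpace:
        def __init__(self, n, rng):
            self.n = n
            self._rng = rng

        def sample(self):
            return self._rng.randrange(self.n)

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.action_space = self._ActionSpace(6, random.Random(seed))
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0], {}

    def step(self, action):
        self._t += 1
        terminated = False
        truncated = self._t >= self.horizon
        return [float(self._t)], -1.0, terminated, truncated, {}

    def close(self):
        pass


def run_random_episode(env, max_steps=100):
    """Play one episode with random actions; return (total_reward, steps)."""
    state, info = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = env.action_space.sample()
        state, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        # The episode is over if either flag is set.
        if terminated or truncated:
            break
    return total_reward, steps


stub = StubEnv(horizon=5)
print(run_random_episode(stub))
stub.close()
```

Because `run_random_episode` relies only on `reset()`, `step()`, and `action_space.sample()`, the same function should also accept the wrapped elevator environment in place of `StubEnv`.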

A template for interacting with the environment is shown below:

In [6]:
state, info = env.reset()

for i in range(20):
    # randomly sample an action from the action space
    action = env.action_space.sample()

    print(f"Action: {action}")

    next_state, reward, terminated, truncated, info = env.step(action)

    print(f"Next state: {next_state}")
    print(f"Reward: {reward}")
    print(f"Terminated: {terminated}")
    print(f"Truncated: {truncated}")

    # carry the new state forward so `state` always holds the current state
    state = next_state

    done = terminated or truncated
    if done:
        state, info = env.reset()


Action: 3
Next state: [0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0.]
Reward: 0.0
Terminated: False
Truncated: False
Action: 1
Next state: [0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0.]
Reward: 0.0
Terminated: False
Truncated: False
Action: 4
Next state: [0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0.]
Reward: 0.0
Terminated: False
Truncated: False
Action: 5
Next state: [0. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0.]
Reward: 0.0
Terminated: False
Truncated: False
Action: 2
Next state: [0. 1. 0. 0. 1. 0. 1. 1. 0. 1. 0. 0. 0.]
Reward: -3.0
Terminated: False
Truncated: False
Action: 1
Next state: [0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 1. 0. 0.]
Reward: -6.0
Terminated: False
Truncated: False
Action: 1
Next state: [0. 1. 1. 0. 1. 0. 1. 1. 0. 0. 0. 1. 0.]
Reward: -6.0
Terminated: False
Truncated: False
Action: 5
Next state: [0. 1. 1. 0. 2. 0. 1. 1. 0. 0. 0. 1. 0.]
Reward: -9.0
Terminated: False
Truncated: False
Action: 3
Next state: [0. 1. 2. 1. 2. 0. 0. 0. 0. 0. 0. 1. 0.]
Reward: -12.0
Terminated: False
Truncated: False
Acti