# Elevator Environment

In this Jupyter Notebook, we are going to learn how the elevator environment works.

## Setup

### Installing Dependencies

The elevator task is implemented using the `PyRDDLGym` library. Before we begin, please install the following packages.

**Note**: If you are using Google Colab, you may need to restart the session. Please follow the prompt to do so.

In [1]:
!pip install -q git+https://github.com/tasbolat1/pyRDDLGym.git --force-reinstall


Next, reinstall `numpy` at a specific version, because the `PyRDDLGym` library requires NumPy 1.24.2.

**Note**: If you are using Google Colab, you may be prompted to restart the session. Please follow the prompt to do so. After the restart, **DO NOT run the following cell again!**

In [None]:
!pip install numpy==1.24.2 --force-reinstall

In [None]:
import numpy as np
assert np.__version__ == '1.24.2', f"The numpy version ({np.__version__}) is NOT 1.24.2"

### Using Google Colab

If you are using Google Colab (and we encourage you to do so), please run the following code cell. If you are not using Google Colab, you can skip this code cell.

**Note**: The mount point `'/content/drive/'` cannot be changed; only the folder path after it depends on your Drive layout. For example, if your assignment folder in Google Drive is located at `My Drive -> CSXX46A2`, you should specify the path as `'/content/drive/MyDrive/CSXX46A2'`.

In [1]:
from google.colab import drive
drive.mount('/content/drive/')

import sys
sys.path.append('/content/drive/MyDrive/CSXX46A2')

ModuleNotFoundError: No module named 'google.colab'
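
Outside Colab, the cell above fails with the `ModuleNotFoundError` shown. A minimal sketch of a guard that lets the same notebook run in both settings is given below. The `IN_COLAB` flag and the `ASSIGNMENT_DIR` name are illustrative choices, not part of the assignment code.

```python
import sys

# Hypothetical name; replace the value with your own assignment folder.
ASSIGNMENT_DIR = '/content/drive/MyDrive/CSXX46A2'

try:
    # google.colab can only be imported inside a Colab runtime.
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Mount Google Drive and make the assignment folder importable.
    drive.mount('/content/drive/')
    if ASSIGNMENT_DIR not in sys.path:
        sys.path.append(ASSIGNMENT_DIR)
else:
    print("Not running in Colab; skipping the Google Drive mount.")
```

When running locally, `utils.py` then needs to be importable from the notebook's working directory instead.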

## Environment

The Elevator environment models evening rush hours when people from different floors in a building want to go down to the bottom floor using elevators.

The building has 5 floors and 1 elevator. Each floor can accommodate a maximum of 3 people waiting, while the elevator can carry up to 10 passengers. The objective is to pick up passengers from various floors and deliver them to the bottom floor (floor 0). New passengers may arrive at each floor while the elevator is in operation. The elevator can move up and down, and pick up and drop off passengers. However, it can pick up or drop off passengers only when the door is open, and it can move only when the door is closed.

The state space of the environment is represented as a vector of size 13, which contains the following information:
- Values 0-4: The number of passengers waiting on floors 0-4.
- Value 5: The number of passengers currently in the elevator.
- Value 6: A value of 0 or 1 indicating the direction of the elevator (0 for down, 1 for up).
- Value 7: A value of 0 or 1 indicating whether the elevator door is open (1) or closed (0).
- Values 8-12: One-hot encoding of the current floor of the elevator. For example, if the elevator is at floor 0, then value 8 is 1, and the rest are 0.
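
To make the layout above concrete, here is a small sketch that unpacks such a vector into named fields. It follows the index layout exactly as described in the list; `ElevatorState` and `decode_state` are hypothetical helper names, not part of `PyRDDLGym` or `utils`, and the layout should be checked against the wrapper you actually use.

```python
from dataclasses import dataclass
from typing import List, Sequence

NUM_FLOORS = 5  # floors 0-4, as described above


@dataclass
class ElevatorState:
    waiting: List[int]   # values 0-4: passengers waiting on each floor
    in_elevator: int     # value 5: passengers currently in the elevator
    going_up: bool       # value 6: direction (1 for up, 0 for down)
    door_open: bool      # value 7: door flag (1 for open, 0 for closed)
    floor: int           # values 8-12: decoded from the one-hot encoding


def decode_state(state: Sequence) -> ElevatorState:
    """Unpack a 13-element state vector, assuming the layout listed above."""
    if len(state) != 2 * NUM_FLOORS + 3:
        raise ValueError(f"expected 13 values, got {len(state)}")
    one_hot = [bool(v) for v in state[8:13]]
    if sum(one_hot) != 1:
        raise ValueError("floor encoding is not one-hot")
    return ElevatorState(
        waiting=[int(v) for v in state[0:5]],
        in_elevator=int(state[5]),
        going_up=bool(state[6]),
        door_open=bool(state[7]),
        floor=one_hot.index(True),
    )


# Made-up example vector: one passenger waiting on floors 2 and 3, elevator at floor 0.
example = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
decoded = decode_state(example)
print(decoded)
```

Working with named fields such as `decoded.floor` tends to be less error-prone than indexing the raw vector when writing a policy.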

The action space consists of 6 actions:
- Move up
- Move down
- Not close door
- Close door
- Not open door
- Open door

The actions "not close door" and "not open door" are effectively no-operations in the real environment. However, they are included in the action space to maintain consistency. 

### Initialization

To initialize the environment, call the `Elevator` class. Here, we will use the `DictToListWrapper` to convert the environment's state from a dictionary to a list, with the detail given in the "Environment Description" section below.

In [12]:
from pyRDDLGym.Elevator import Elevator
from utils import DictToListWrapper

In [13]:
env = Elevator(instance=5)
env = DictToListWrapper(env)

/usr/local/lib/python3.10/dist-packages/pyRDDLGym/Examples /usr/local/lib/python3.10/dist-packages/pyRDDLGym/Examples/manifest.csv
Available example environment(s):
Reservoir_discrete -> Discrete version of management of the water level in interconnected reservoirs.
Reservoir_continuous -> Continuous action version of management of the water level in interconnected reservoirs.
PowerGen_discrete -> A simple power generation problem loosely modeled on the problem of unit commitment.
PowerGen_continuous -> A continuous simple power generation problem loosely modeled on the problem of unit commitment.
RecSim -> A problem of recommendation systems, with consumers and providers.
Elevators -> The Elevator domain models evening rush hours when people from different floors in a building want to go down to the bottom floor using elevators.
HVAC -> Multi-zone and multi-heater HVAC control problem
CartPole_discrete -> A simple continuous state MDP for the classical cart-pole system by Rich Sutton,

<op> is one of {<=, <, >=, >}
<rhs> is a deterministic function of non-fluents or constants only.
>> ( sum_{?f: floor} [ elevator-at-floor(?e, ?f) ] ) == 1


The state space and action space can be shown as follows:

In [14]:
print(env.observation_space)
print(env.action_space)

Box(-inf, inf, (13,), float64)
Discrete(6)


### Interaction

The agent interacts with the environment through the [Gymnasium API](https://gymnasium.farama.org/), the maintained successor to OpenAI Gym. The environment provides the following methods:

- `reset()`: Resets the environment and returns the initial state along with any additional information (usually empty).
- `step(action)`: Takes an action in the environment and returns:
    - *next state*: The resulting state after the action.
    - *reward*: The reward received for the action.
    - *done*: A boolean indicating whether the episode has ended (called `terminated` in Gymnasium).
    - *truncated*: A boolean indicating whether the episode was cut short by a condition outside the task itself, typically a time limit. This is not applicable to our task.
    - *info*: Additional information returned as a dictionary.
- `close()`: Closes the environment and releases any resources.
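
As a rough sketch of how the five values returned by `step` are typically consumed, the helper below runs one episode and accumulates the reward. `run_episode`, `max_steps`, and the stub `CountdownEnv` are illustrative names introduced here; the stub only mimics the `reset`/`step` signatures described above so that the block runs without `PyRDDLGym`.

```python
def run_episode(env, policy, max_steps=200):
    """Run one episode and return (total reward, number of steps taken)."""
    state, info = env.reset()
    total_reward = 0.0
    steps = 0
    while steps < max_steps:
        action = policy(state)
        state, reward, done, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        # Stop when the episode ends for either reason.
        if done or truncated:
            break
    return total_reward, steps


class CountdownEnv:
    """Hypothetical stand-in environment: reward -1 per step, ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0], {}

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return [self.t], -1.0, done, False, {}

    def close(self):
        pass


stub = CountdownEnv(horizon=3)
total, steps = run_episode(stub, policy=lambda s: 0)
stub.close()
print(total, steps)
```

With the real environment, the same helper could be called with the wrapped `env` and a random policy such as `lambda s: env.action_space.sample()`.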

A template for interacting with the environment is shown below:

In [15]:
state, info = env.reset()

for i in range(20):
    # randomly sample an action from the action space
    action = env.action_space.sample()

    print(f"Action: {action}:")

    next_state, reward, done, _, info = env.step(action)

    print(f"Next state: {next_state}")
    print(f"Reward: {reward}")
    print(f"Done: {done}")

    # carry the new state forward, or start a new episode once this one ends
    state = next_state
    if done:
        state, info = env.reset()

Action: 0:
Next state: [0, 0, 1, 1, 0, 0, True, True, True, False, False, False, False]
Reward: 0.0
Done: False
Action: 4:
Next state: [0, 0, 1, 1, 0, 0, True, True, True, False, False, False, False]
Reward: -6.0
Done: False
Action: 0:
Next state: [0, 0, 1, 1, 0, 0, True, True, True, False, False, False, False]
Reward: -6.0
Done: False
Action: 1:
Next state: [0, 0, 1, 1, 0, 0, True, True, False, True, False, False, False]
Reward: -6.0
Done: False
Action: 5:
Next state: [0, 0, 1, 1, 0, 0, False, False, False, True, False, False, False]
Reward: -6.0
Done: False
Action: 0:
Next state: [0, 0, 3, 1, 0, 0, False, False, False, True, False, False, False]
Reward: -6.0
Done: False
Action: 3:
Next state: [0, 0, 3, 1, 0, 0, False, True, False, True, False, False, False]
Reward: -12.0
Done: False
Action: 2:
Next state: [0, 0, 3, 2, 0, 0, False, True, False, True, False, False, False]
Reward: -12.0
Done: False
Action: 3:
Next state: [0, 0, 3, 2, 0, 0, False, True, False, True, False, False, False]
