# Task 1: Implementation of a Tabular Reinforcement Learning Environment
### By Nattaphat Thanaussawanun and Prapas Rakchartkiattikul

This notebook demonstrates the behavior of the warehouse environment through 5 scenarios.

## Content
- <a href='#scenario1'>Scenario 1: Basic Move Actions</a>
- <a href='#scenario2'>Scenario 2: Agent Hit the Box</a>
- <a href='#scenario3'>Scenario 3: Agent Hit the Human</a>
- <a href='#scenario4'>Scenario 4: Pickup Action</a>
- <a href='#scenario5'>Scenario 5: Dropoff Action</a>

The notation used to represent locations in the environment:
- Empty Space --> .
- Stationary Obstacles --> X
- Box --> B
- Human --> H
- Parcel --> P
- Destination --> D
- Agent --> A
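
As a rough illustration of this notation, the sketch below maps cell names to display symbols and prints a small grid in the same layout as the `env.display()` outputs in this notebook. The names `SYMBOLS` and `render` are assumptions made for this example and are not taken from the `warehouse` module; the destination is drawn as `D`, as in the grids that follow.

```python
# Hypothetical legend; these names are illustrative and not taken from warehouse.py.
SYMBOLS = {
    "empty": ".",
    "obstacle": "X",
    "box": "B",
    "human": "H",
    "parcel": "P",
    "destination": "D",
    "agent": "A",
}


def render(grid):
    """Return the grid as text, one row per line, cells separated by spaces."""
    return "\n".join(" ".join(SYMBOLS[cell] for cell in row) for row in grid)


# A tiny 4x4 example: walls around a parcel, a destination, and the agent.
tiny = [
    ["obstacle", "obstacle", "obstacle", "obstacle"],
    ["obstacle", "parcel", "destination", "obstacle"],
    ["obstacle", "agent", "empty", "obstacle"],
    ["obstacle", "obstacle", "obstacle", "obstacle"],
]
print(render(tiny))
```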

In [1]:
# Import necessary functions
from warehouse import *
import pickle

In [2]:
# Example of an environment of size 10
env = Warehouse(10)
env.reset()
env.display()

X X X X X X X X X X 
X P . . . . D . . X 
X . . . . . X . . X 
X . . X . . X B . X 
X X . . . . . . . X 
X . . . . . . . . X 
X . . . . . . . . X 
X . . H . . . . . X 
X . . . . . . . X X 
X X X X X X A X X X 



## Scenario 1: Basic Move Actions <a id='scenario1'></a> 

In [3]:
# Import a predefined environment
with open('env_scenario_1.p', 'rb') as f:
    env_scenario_1_test = pickle.load(f)
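
Every scenario in this notebook loads a pickled environment in the same way, so the loading step could be wrapped in a small helper. The sketch below is one possible version; `load_env` is a hypothetical name and not part of the `warehouse` module. Note that `pickle.load` needs the pickled object's class to be importable (here, from `warehouse`) and should only be used on files from a trusted source.

```python
import pickle


def load_env(path):
    """Load a pickled object from `path`; the file is closed even if loading fails."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Example usage with a file from this notebook (requires the file to exist):
# env_scenario_1_test = load_env('env_scenario_1.p')
```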

In [4]:
# Perform set of actions
print('Scenario 1: Starting Point')
env_scenario_1_test.display()

_, reward, _ = env_scenario_1_test.step('right')
print(f'''Perform "right" action --> Received {reward} point''')
env_scenario_1_test.display()

_, reward, _ = env_scenario_1_test.step('down')
print(f'''Perform "down" action --> Received {reward} point''')
env_scenario_1_test.display()

_, reward, _ = env_scenario_1_test.step('left')
print(f'''Perform "left" action --> Received {reward} point''')
env_scenario_1_test.display()

_, reward, _ = env_scenario_1_test.step('up')
print(f'''Perform "up" action --> Received {reward} point''')
env_scenario_1_test.display()

Scenario 1: Starting Point
X X X X X X X X X X 
X . . . . . . . . X 
X D . . . . . X . X 
X X . . . . . . . X 
X . . . . B . . . X 
X . . . . . . X H X 
X . . A . . . . P X 
X . . . . . . . . X 
X . . . . . X . . X 
X X X X X X X X X X 

Perform "right" action --> Received -1 point
X X X X X X X X X X 
X . . . . . . . . X 
X D . . . . . X . X 
X X . . . . . . . X 
X . . . . B . . . X 
X . . . . . . X H X 
X . . . A . . . P X 
X . . . . . . . . X 
X . . . . . X . . X 
X X X X X X X X X X 

Perform "down" action --> Received -1 point
X X X X X X X X X X 
X . . . . . . . . X 
X D . . . . . X . X 
X X . . . . . . . X 
X . . . . B . . . X 
X . . . . . . X H X 
X . . . . . . . P X 
X . . . A . . . . X 
X . . . . . X . . X 
X X X X X X X X X X 

Perform "left" action --> Received -1 point
X X X X X X X X X X 
X . . . . . . . . X 
X D . . . . . X . X 
X X . . . . . . . X 
X . . . . B . . . X 
X . . . . . . X H X 
X . . . . . . . P X 
X . . A . . . . . X 
X . . . . . X . . X 
X X X X X X X X X X
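
The step, print, and display pattern used in this scenario repeats for every action, so it can be folded into a loop. The sketch below is one way to do that. It assumes, based on the cells in this notebook, that `env.step(action)` returns a 3-tuple whose second and third items are the reward and the done flag, and that `env.display()` prints the grid. `run_actions` and `_StubEnv` are hypothetical names; the stub only stands in for the real `Warehouse` so the example runs on its own.

```python
def run_actions(env, actions):
    """Apply each action in order and return the list of rewards received.

    Assumes env.step(action) returns (state, reward, done) and that
    env.display() prints the grid.
    """
    rewards = []
    for action in actions:
        _, reward, done = env.step(action)
        print(f'Perform "{action}" action --> Received {reward} point(s)')
        env.display()
        rewards.append(reward)
        if done:  # stop once the episode has terminated
            break
    return rewards


class _StubEnv:
    """Stand-in used only to exercise run_actions; not the real Warehouse."""

    def step(self, action):
        # Every move costs -1 and never ends the episode in this stub.
        return None, -1, False

    def display(self):
        print("(grid omitted)")


rewards = run_actions(_StubEnv(), ["right", "down", "left", "up"])
print(sum(rewards))  # -4 with the stub
```

With the real environment, the call would look like `run_actions(env_scenario_1_test, ['right', 'down', 'left', 'up'])`.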

## Scenario 2: Agent Hit the Box   <a id='scenario2'></a> 

In [5]:
# Import a predefined environment
with open('env_scenario_2.p', 'rb') as f:
    env_scenario_2 = pickle.load(f)

In [6]:
# Perform set of actions
print('Scenario 2: Starting Point')
env_scenario_2.display()

print('>>> The "right" action pushes the box one cell to the right, which incurs a large penalty.')
_, reward, _ = env_scenario_2.step('right')
print(f'''Perform "right" action --> Received {reward} points''')
env_scenario_2.display()


print('>>> The "right" action can be performed again, but the box does not move because there is no free space behind it.')
print('>>> The agent still receives a large penalty.')
_, reward, _ = env_scenario_2.step('right')
print(f'''Perform "right" action --> Received {reward} points''')
env_scenario_2.display()

Scenario 2: Starting Point
X X X X X X X X X X
X . . . . . . . . X
X D . . . . . X . X
X X . . . . . . . X
X . . . . . . . H X
X . . . A B . X . X
X . . . . . . . P X
X . . . . . . . . X
X . . . . . X . . X
X X X X X X X X X X

>>> The "right" action pushes the box one cell to the right, which incurs a large penalty.
Perform "right" action --> Received -21 points
X X X X X X X X X X
X . . . . . . . . X
X D . . . . . X . X
X X . . . . . . . X
X . . . . . . . . X
X . . . . A B X H X
X . . . . . . . P X
X . . . . . . . . X
X . . . . . X . . X
X X X X X X X X X X

>>> The "right" action can be performed again, but the box does not move because there is no free space behind it.
>>> The agent still receives a large penalty.
Perform "right" action --> Received -21 points
X X X X X X X X X X 
X . . . . . . . . X 
X D . . . . . X . X 
X X . . . . . . . X 
X . . . . . . . . X 
X . . . . A B X H X 
X . . . . . . . P X 
X . . . . . . . . X 
X . . . . . X . . X 
X X X X X X X X X X
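
The box behavior shown in this scenario can be sketched as follows: the box slides one cell in the direction of the push when the cell behind it is free, otherwise neither the box nor the agent moves, and the penalty applies in both cases. This is a minimal sketch inferred from the output above, not the code in `warehouse.py`; `MOVES`, `BOX_PENALTY`, and `push` are hypothetical names, and the human's random movement is ignored.

```python
# Direction offsets as (row, column) deltas.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
BOX_PENALTY = -21  # value observed in the Scenario 2 output


def push(grid, agent, action):
    """Try to push the box next to the agent; return (new_agent_pos, reward).

    grid is a list of lists of single-character symbols, modified in place.
    Only the box case is handled in this sketch.
    """
    dr, dc = MOVES[action]
    r, c = agent
    br, bc = r + dr, c + dc  # cell the agent tries to enter
    if grid[br][bc] != "B":
        raise ValueError("this sketch only covers pushing a box")
    fr, fc = br + dr, bc + dc  # cell behind the box
    if grid[fr][fc] == ".":
        grid[fr][fc] = "B"  # box slides one cell
        grid[br][bc] = "A"  # agent takes the box's old cell
        grid[r][c] = "."
        agent = (br, bc)
    # If the cell behind the box is blocked, nothing moves.
    return agent, BOX_PENALTY


# The agent's row from the Scenario 2 starting grid, used as a one-row grid.
grid = [list("X...AB.X.X")]
pos, reward = push(grid, (0, 4), "right")
print(" ".join(grid[0]), reward)
pos, reward = push(grid, pos, "right")  # box is now against an obstacle
print(" ".join(grid[0]), reward)
```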

## Scenario 3: Agent Hit the Human   <a id='scenario3'></a> 

In [7]:
# Import a predefined environment
with open('env_scenario_3.p', 'rb') as f:
    env_scenario_3 = pickle.load(f)

In [8]:
# Perform set of actions
print('Scenario 3: Starting Point')
env_scenario_3.display()

print('>>> The "left" action makes the agent hit the human, which incurs a large penalty and terminates the episode.')
_, reward, done = env_scenario_3.step('left')
print(f'''Perform "left" action --> Received {reward} points --> Done state {done}''')
env_scenario_3.display()

Scenario 3: Starting Point
X X X X X X X X X X
X . . . . . . . . X
X . X . . X . . . X
X D . . H A . . P X
X . . . . . X . . X
X . . . . . . . . X
X . X X . . . . . X
X . . B . . . . . X
X . . . . . . . . X
X X X X X X X X X X

>>> The "left" action makes the agent hit the human, which incurs a large penalty and terminates the episode.
Perform "left" action --> Received -51.0 points --> Done state True
X X X X X X X X X X 
X . . . . . . . . X 
X . X . . X . . . X 
X D . . A . . . P X 
X . . . . . X . . X 
X . . . . . . . . X 
X . X X . . . . . X 
X . . B . . . . . X 
X . . . . . . . . X 
X X X X X X X X X X 



If the done state is not True, uncomment the code in the cell below and run it.
This can happen because the human moves randomly and may step away before the agent reaches it.

In [9]:
# print('>>> If the human moved away before the collision, step toward it again (change the action to match its new position).')
# _, reward, done = env_scenario_3.step('down')
# print(f'''Perform "down" action --> Received {reward} points --> Done state {done}''')
# env_scenario_3.display()

## Scenario 4: Pickup Action <a id='scenario4'></a> 

In [10]:
# Import a predefined environment
with open('env_scenario_4.p', 'rb') as f:
    env_scenario_4 = pickle.load(f)

In [11]:
print('Scenario 4: Starting Point')
env_scenario_4.display()

print('>>> The "right" action moves the agent onto the parcel location.')
_, reward, done = env_scenario_4.step('right')
print(f'''Perform "right" action --> Received {reward} point''')
env_scenario_4.display()

print('>>> The agent must perform the "pickup" action to pick up the parcel, which earns a moderate reward.')
_, reward, done = env_scenario_4.step('pickup')
print(f'''Perform "pickup" action --> Received {reward} points''')
env_scenario_4.display()

Scenario 4: Starting Point
X X X X X X X X X X
X . . . . . . . . X
X . X . . X . . . X
X D . . . . . A P X
X . . . H . X . . X
X . . . . . . . . X
X . X X . . . . . X
X . . B . . . . . X
X . . . . . . . . X
X X X X X X X X X X

>>> The "right" action moves the agent onto the parcel location.
Perform "right" action --> Received -1 point
X X X X X X X X X X
X . . . . . . . . X
X . X . . X . . . X
X D . . . . . . A X
X . . . . H X . . X
X . . . . . . . . X
X . X X . . . . . X
X . . B . . . . . X
X . . . . . . . . X
X X X X X X X X X X

>>> The agent must perform the "pickup" action to pick up the parcel, which earns a moderate reward.
Perform "pickup" action --> Received 49.0 points
X X X X X X X X X X 
X . . . . . . . . X 
X . X . . X . . . X 
X D . . . H . . A X 
X . . . . . X . . X 
X . . . . . . . . X 
X . X X . . . . . X 
X . . B . . . . . X 
X . . . . . . . . X 
X X X X X X X X X X 
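
The pickup step above, together with the dropoff step in Scenario 5, suggests a simple two-stage task: the agent must stand on the parcel to pick it up, and must be carrying it at the destination to drop it off and end the episode. The sketch below models only that bookkeeping. `ParcelTask` is a hypothetical class, and the handling of failed attempts (returning `False`) is an assumption, since the notebook does not show a pickup or dropoff at the wrong location.

```python
class ParcelTask:
    """Tracks whether the agent carries the parcel (illustrative only)."""

    def __init__(self, parcel, destination):
        self.parcel = parcel            # (row, col) of the parcel
        self.destination = destination  # (row, col) of the destination
        self.carrying = False
        self.done = False

    def pickup(self, agent):
        """Succeed only when the agent stands on the parcel."""
        if not self.carrying and agent == self.parcel:
            self.carrying = True
            return True
        return False

    def dropoff(self, agent):
        """Succeed only when carrying the parcel at the destination; ends the episode."""
        if self.carrying and agent == self.destination:
            self.carrying = False
            self.done = True
            return True
        return False


# Positions read off the Scenario 4 grid: parcel at row 3, column 8;
# destination at row 3, column 1 (zero-based).
task = ParcelTask(parcel=(3, 8), destination=(3, 1))
print(task.pickup((3, 7)))   # False: the agent is next to the parcel, not on it
print(task.pickup((3, 8)))   # True
print(task.dropoff((3, 1)))  # True, and task.done becomes True
```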



## Scenario 5: Dropoff Action  <a id='scenario5'></a> 

In [12]:
# Import a predefined environment
with open('env_scenario_5.p', 'rb') as f:
    env_scenario_5 = pickle.load(f)

In [13]:
print('Scenario 5: Starting Point')
env_scenario_5.display()

print('>>> At this point, the agent has already picked up the parcel and needs to move left to reach the destination.')
_, reward, done = env_scenario_5.step('left')
print(f'''Perform "left" action --> Received {reward} point''')
env_scenario_5.display()

print('>>> The agent must perform the "dropoff" action to drop the parcel at the destination, which terminates the episode.')
_, reward, done = env_scenario_5.step('dropoff')
print(f'''Perform "dropoff" action --> Received {reward} points --> Done state {done}''')
env_scenario_5.display()

Scenario 5: Starting Point
X X X X X X X X X X
X . . . . . . . . X
X . X . . X . . . X
X D A . . . . . . X
X . . . . H X . . X
X . . . . . . . . X
X . X X . . . . . X
X . . B . . . . . X
X . . . . . . . . X
X X X X X X X X X X

>>> At this point, the agent has already picked up the parcel and needs to move left to reach the destination.
Perform "left" action --> Received -1 point
X X X X X X X X X X
X . . . . . . . . X
X . X . . X . . . X
X A . . . . . . . X
X . . . . H X . . X
X . . . . . . . . X
X . X X . . . . . X
X . . B . . . . . X
X . . . . . . . . X
X X X X X X X X X X

>>> The agent must perform the "dropoff" action to drop the parcel at the destination, which terminates the episode.
Perform "dropoff" action --> Received 49.0 points --> Done state True
X X X X X X X X X X 
X . . . . . . . . X 
X . X . . X . . . X 
X A . . . . . . . X 
X . . . . H X . . X 
X . . . . . . . . X 
X . X X . . . . . X 
X . . B . . . . . X 
X . . . . . . . . X
X X X X X X X X X X
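
The rewards reported across the five scenarios are consistent with a fixed step cost of -1 added to an event-specific value. This decomposition is an assumption, since the notebook does not show the reward function itself. The sketch below records the observed rewards and derives the implied event values; `STEP_COST`, `OBSERVED`, and `event_value` are hypothetical names.

```python
STEP_COST = -1  # reward observed for a plain move

# Rewards reported in the scenario outputs of this notebook.
OBSERVED = {
    "move": -1,
    "push_box": -21,
    "hit_human": -51.0,
    "pickup": 49.0,
    "dropoff": 49.0,
}

# Assumed decomposition: event value = observed reward - step cost.
event_value = {name: r - STEP_COST for name, r in OBSERVED.items()}
print(event_value)

# Undiscounted return of the two steps shown in Scenario 4 (move, then pickup).
scenario_4_return = OBSERVED["move"] + OBSERVED["pickup"]
print(scenario_4_return)  # 48.0
```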

## END