# AI Exam

An advanced aquatic drone is deployed to collect critical data on marine biodiversity in a coastal region. The drone starts at point $S$, located near the shore, and must navigate to point $G$, a designated marine research site rich in coral reefs and sea life. Along the way, the drone must carefully maneuver through dynamic underwater environments, avoiding hazards and optimizing its energy usage.

The environment includes:  
1. **(O) Open Water:** Normal movement; no additional challenges.  
2. **(C) Currents:** Areas where the drone's movement is influenced by ocean currents, potentially pushing it off course.  
3. **(F) Seaweed Forests:** Dense vegetation that slows down the drone, incurring extra energy costs per move.    
4. **(E) Energy Stations:** Specific points where the drone can recharge its battery, reducing the total cost of navigation.  


<img src="images/env_ex2.png" style="zoom: 20%;"/>



### Environment Details:

- **Grid Representation:** The environment is a 10x10 grid, as shown in the image above.  
- $S$ - Start state: The drone's starting point at (0, 0).  
- $G$ - Goal state: The marine research site at (9, 7). Reaching it gives a large positive reward of +20.0 and ends the episode.  
- **Movement Costs:** Each move yields a default reward of -0.04 (the energy cost of moving).  
- **Hazards:**  
  - **Strong Currents:** Entering a zone with current results in a stochastic movement:  
    - 80% chance to move as intended.  
    - 10% chance to be pushed one cell to the left of the intended direction (perpendicular to the desired movement).  
    - 10% chance to be pushed one cell to the right of the intended direction (perpendicular to the desired movement).
  - **Seaweed Forests:** Entering these zones incurs an additional -0.2 reward penalty with respect to the standard movement cost (i.e., a total -0.24 penalty).

- **Energy Stations:** Provide a +1.0 reward when visited. However, reaching these cells may require the agent to move far from the goal.  
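The 80/10/10 rule for current cells can be sketched as a small function. This is only an illustration of the rule described above, not the environment's implementation: the action names `U`, `R`, `D`, `L` and the helper `current_outcomes` are hypothetical, and the real action encoding is whatever `env.actions` reports.

```python
# Hypothetical action names, listed clockwise so that the previous entry is
# "left" of a heading and the next entry is "right" of it.
ORDER = ["U", "R", "D", "L"]

def current_outcomes(action):
    """Return {actual_direction: probability} for a move made in a current cell."""
    i = ORDER.index(action)
    left = ORDER[(i - 1) % 4]   # perpendicular, to the left of the intended heading
    right = ORDER[(i + 1) % 4]  # perpendicular, to the right of the intended heading
    return {action: 0.8, left: 0.1, right: 0.1}

print(current_outcomes("U"))
```

For an intended move `U`, the sketch assigns 0.8 to `U` and 0.1 each to `L` and `R`. The actual probabilities used by the environment can be read from `env.T`.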

You can use the following code to explore the environment in more detail:

In [None]:
import os
import sys

# Make the local 'tools' folder importable before importing the course modules
module_path = os.path.abspath(os.path.join('tools'))
if module_path not in sys.path:
    sys.path.append(module_path)

import gym, envs
from utils.ai_lab_functions import *
import numpy as np
from timeit import default_timer as timer
from tqdm import tqdm

env_name = 'AquaticEnv-v0'
env = gym.make(env_name)

env.render()

print("\nActions encoding: ", env.actions)

# Remember that you can know the type of a cell whenever you need by accessing the grid element of the environment:
print("Cell type of start state: ",env.grid[env.startstate])
print("Cell type of goal state: ",env.grid[env.goalstate])
state = 3 # forest
print(f"Cell type of cell {env.state_to_pos(state)}: ",env.grid[state])
state = 12 # current
print(f"Cell type of cell {env.state_to_pos(state)}: ",env.grid[state])
state = 17 # energy station
print(f"Cell type of cell {env.state_to_pos(state)}: ",env.grid[state])

In [None]:
# Action encoding
print("\nActions encoding: ", env.actions)

# Remember that you can know the type of a cell whenever you need by accessing the grid element of the environment:
print("Cell type of start state: ",env.grid[env.startstate])
print("Cell type of goal state: {} Reward: {}".format(env.grid[env.goalstate],env.RS[env.goalstate]))

state = 1 # normal state
print(f"\nCell type of cell {env.state_to_pos(state)}: ",env.grid[state])
# Print the transition probability toward the next state (state+1) for every action
for action, name in env.actions.items():
    print(f"Probability of effectively performing action {name} from cell {env.state_to_pos(state)} to cell {env.state_to_pos(state+1)}: {env.T[state, action, state+1]}")

state = 13 # state with stochastic transitions
print(f"\nCell type of cell {env.state_to_pos(state)}: ",env.grid[state])
# Same check for a cell with stochastic transitions
for action, name in env.actions.items():
    print(f"Probability of effectively performing action {name} from cell {env.state_to_pos(state)} to cell {env.state_to_pos(state+1)}: {env.T[state, action, state+1]}")


#### Q1. Find an optimal solution to this problem using the approach that you think is most appropriate. Motivate your choice.
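Because transitions in current cells are stochastic, one candidate family of approaches is dynamic programming on the MDP. The sketch below shows value iteration on a toy 3-state chain, not on the exam environment. It assumes a transition tensor indexed as `T[s, a, s']` and a per-state reward vector received on entering a state, which is how `env.T` and `env.RS` appear to be used above. The discount `gamma=0.9`, the tolerance, and the function name are assumptions for illustration.

```python
import numpy as np

def value_iteration(T, R, terminal, gamma=0.9, tol=1e-6, max_iters=10_000):
    """Value iteration for a tabular MDP with T[s, a, s'] and reward R[s'] on entry."""
    V = np.zeros(T.shape[0])
    for _ in range(max_iters):
        Q = T @ (R + gamma * V)   # shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        V_new[terminal] = 0.0     # no future return after the episode ends
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < tol:
            break
    policy = (T @ (R + gamma * V)).argmax(axis=1)
    return V, policy

# Toy chain 0 -> 1 -> 2 (goal). Action 0 stays in place, action 1 advances.
T = np.zeros((3, 2, 3))
T[0, 0, 0] = T[0, 1, 1] = 1.0
T[1, 0, 1] = T[1, 1, 2] = 1.0
T[2, :, 2] = 1.0
R = np.array([-0.04, -0.04, 20.0])

V, policy = value_iteration(T, R, terminal=[2])
print(V, policy)
```

The returned `policy` has one action index per state, which is the same shape the `solution` array is expected to have.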

In [None]:
env_name = 'AquaticEnv-v0'
env = gym.make(env_name)

# your code here
solution = np.array([])

You can visualize and check your solution using the following code:

In [None]:
print("Method: ...")
# create a random policy just for testing
solution = np.random.choice(list(env.actions.keys()), env.observation_space.n)
visual_solution = np.vectorize(env.actions.get)(solution.reshape(env.rows, env.cols)) 

print(visual_solution)
plot_policy(policy=visual_solution, name_env="ex2_render")

#### Analyze the solution returned by your approach and comment on whether, in every possible execution, it passes through at least two energy (charging) stations before reaching the goal $G$.
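"Every possible execution" can be checked on the graph induced by the policy: there is an edge from `s` to `s'` whenever `T[s, policy[s], s'] > 0`. A minimal sketch, assuming the same `T[s, a, s']` layout as `env.T`, finds the execution that visits the fewest stations. If that minimum is at least 2, then every execution that reaches the goal passes through at least two stations. The function name and the toy data are hypothetical.

```python
import heapq
import numpy as np

def min_stations_to_goal(T, policy, start, goal, station_states):
    """Fewest station cells visited over all executions the policy allows.

    Returns None if the goal cannot be reached from start under the policy.
    """
    stations = set(station_states)
    best = {start: int(start in stations)}
    heap = [(best[start], start)]
    while heap:
        count, s = heapq.heappop(heap)
        if count > best.get(s, float("inf")):
            continue  # stale heap entry
        if s == goal:
            return count
        # Follow every successor that has non-zero probability under the policy
        for s2 in np.flatnonzero(T[s, policy[s]] > 0):
            s2 = int(s2)
            new_count = count + int(s2 in stations)
            if new_count < best.get(s2, float("inf")):
                best[s2] = new_count
                heapq.heappush(heap, (new_count, s2))
    return None

# Toy example: 0 -> 1 -> 2 -> 3, with a 20% chance of slipping from 0 directly to 2.
T = np.zeros((4, 1, 4))
T[0, 0, 1], T[0, 0, 2] = 0.8, 0.2
T[1, 0, 2] = T[2, 0, 3] = T[3, 0, 3] = 1.0
policy = np.zeros(4, dtype=int)

print(min_stations_to_goal(T, policy, start=0, goal=3, station_states=[1]))
```

In the toy example the slip from state 0 to state 2 skips the station at state 1, so the minimum is 0 even though the most likely execution visits one station.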