# Unmanned Aircraft System (UAS) Obstacle Avoidance Integrating Safety Bound with Reinforcement Learning
Author: Jueming Hu, Arizona State University

Email: jueming.hu@asu.edu

## Module Description

This module demonstrates path planning for UAS obstacle avoidance using a **probabilistic dynamic anisotropic (DA) safety bound coupled with reinforcement learning (RL) methods**. The DA safety bound depends on UAS performance, weather conditions, and uncertainties in UAS operations such as positioning error.
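
To give a rough feel for what "anisotropic" means here, the sketch below models the bound as an ellipse centred on the vehicle. Its semi-axis along the heading grows with speed, and both semi-axes grow with positioning error. The function name, its parameters, and the formula are assumptions made for illustration only; they are not the model in ```SafetyBound.py``` or in the paper, and wind is left out entirely.

```python
import math


def da_bound_radius(bearing_deg, heading_deg, speed, position_sigma,
                    base_radius=5.0, reaction_time=1.0, k_sigma=3.0):
    """Hypothetical anisotropic bound: distance from the vehicle to the edge
    of the bound in direction `bearing_deg`, when flying along `heading_deg`."""
    # Semi-axis along the heading: grows with the distance flown while reacting.
    a = base_radius + speed * reaction_time + k_sigma * position_sigma
    # Semi-axis across the heading: only inflated by positioning uncertainty.
    b = base_radius + k_sigma * position_sigma
    theta = math.radians(bearing_deg - heading_deg)
    # Polar form of an ellipse centred on the vehicle.
    return (a * b) / math.hypot(b * math.cos(theta), a * math.sin(theta))


ahead = da_bound_radius(0, 0, speed=10, position_sigma=1)
abeam = da_bound_radius(90, 0, speed=10, position_sigma=1)
print(ahead, abeam)
```

With these placeholder numbers the bound reaches further ahead of the vehicle than to its side, which is the qualitative behaviour a speed-dependent bound is meant to capture.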

A **new reward function** based on the operational DA safety bound concept enables trajectory optimization under a risk-based dynamic separation criterion. A **Q-learning RL algorithm** learns the optimal path plan with the DA safety bound reward function. Optimized trajectories around static obstacles, with and without the bound, are compared to show the deconfliction capability of the DA safety bound combined with RL.
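
For readers new to Q-learning, the following is a minimal sketch of the standard tabular update and an epsilon-greedy action choice. It is generic textbook Q-learning, not the code in ```Q_learning.py```; the learning rate, discount factor, state tuples, and the penalty value are placeholders chosen for the example.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, n_actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = Q[state]
    return max(range(n_actions), key=lambda a: values[a])


def q_update(Q, state, action, reward, next_state,
             alpha=0.5, gamma=0.9, done=False):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    best_next = 0.0 if done else max(Q[next_state])
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


n_actions = 4
Q = defaultdict(lambda: [0.0] * n_actions)
# Placeholder penalty for a step that would violate the safety bound.
q_update(Q, state=(0, 0), action=1, reward=-100.0, next_state=(0, 1))
print(Q[(0, 0)])
```

After a penalised step the value of that action drops, so the greedy policy steers away from it on later visits to the same state.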

Further details can be found [here](https://arc.aiaa.org/doi/abs/10.2514/6.2020-1372).

## Installing the required Python packages

This code has been tested with Python 3.7. The required Python packages for this module are:
```gym```, ```itertools```, ```matplotlib```, ```numpy```, ```pandas```, ```collections```. All but ```gym``` are either part of the Python standard library or ship with common distributions such as Anaconda. More information on the ```gym``` package is below.
- **[```gym```](https://anaconda.org/conda-forge/gym)**
    - The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.
    
In the Ubuntu or Anaconda terminal, execute ```conda install -c conda-forge gym``` to install the ```gym``` package.
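
Before running the notebook, it can help to confirm that every package listed above is importable. The helper below is a small sketch using only the standard library; the function name ```missing_packages``` is made up for this example.

```python
import importlib.util

REQUIRED = ["gym", "itertools", "matplotlib", "numpy", "pandas", "collections"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages() or "none")
```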

## Supplementary Python Modules
Several modular functions and Python classes were created to facilitate this work. They are contained in this repository and used in this Jupyter notebook. They are:
 - **```SafetyBound.py```**
     - Obtains the size of the safety bound as a function of UAV state
 - **```ObstacleAvoidanceENV.py```**
     - Defines the RL environment, including the transition model and reward function
 - **```Q_learning.py```**
     - Contains the Q-learning algorithm
 - **```geometryCheck.py```**
     - Checks for potential collision within the DA safety bound
 - **```plotting.py```**
     - Provides utilities to visualize RL training process statistics and convergence
 - **```draw.py```**
     - Provides utilities to visualize optimized trajectories on the grid
     
These modules will be called and used in subsequent steps.
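
To illustrate the kind of test a collision check might perform, the sketch below treats the safety bound as a circle and an obstacle as a polygon, and reports a conflict when the circle touches or overlaps the polygon. This is an assumed simplification for illustration; the function names are hypothetical and the logic in ```geometryCheck.py``` may differ.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, polygon):
    """Ray-casting test: count edge crossings of a ray going in +x."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bound_hits_obstacle(center, radius, polygon):
    """True if a circular bound overlaps or touches the polygon."""
    if point_in_polygon(center, polygon):
        return True
    n = len(polygon)
    return any(
        point_segment_distance(center, polygon[i], polygon[(i + 1) % n]) <= radius
        for i in range(n)
    )


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(bound_hits_obstacle((15, 5), 4, square))
print(bound_hits_obstacle((15, 5), 5, square))
```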

## Section 1: UAV Environment Setup
For the UAV environment, we use the ```UAVEnv``` class from the ```ObstacleAvoidanceENV``` module. To create an instance of ```UAVEnv```, we specify the origin, the destination, and whether the safety bound is enabled.
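
As a rough picture of what such an environment provides, here is a minimal grid environment with the usual ```reset```/```step``` interface. It is a hypothetical stand-in, not the actual ```UAVEnv```: the class name, the four-move action set, and the per-step cost are assumptions, and the ```safetybound``` flag is stored but not used. The (50, 50) grid shape mirrors the value passed to the plotting calls later in this notebook.

```python
class GridEnvSketch:
    """Hypothetical stand-in for a grid environment; not the actual UAVEnv."""

    # Action index -> (row change, column change): up, right, down, left.
    MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

    def __init__(self, origin, destination, shape=(50, 50), safetybound=False):
        self.origin = origin
        self.destination = destination
        self.shape = shape
        self.safetybound = safetybound  # kept for illustration, unused here
        self.state = origin

    def reset(self):
        """Put the vehicle back at the origin and return that state."""
        self.state = self.origin
        return self.state

    def step(self, action):
        """Apply one move, clipping at the grid edges."""
        d_row, d_col = self.MOVES[action]
        row = min(max(self.state[0] + d_row, 0), self.shape[0] - 1)
        col = min(max(self.state[1] + d_col, 0), self.shape[1] - 1)
        self.state = (row, col)
        done = self.state == self.destination
        reward = 0.0 if done else -1.0  # placeholder: constant cost per step
        return self.state, reward, done, {}


env = GridEnvSketch(origin=(45, 0), destination=(0, 40))
print(env.reset(), env.step(0))
```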

### Step 1.1. Define the starting point and destination

In [1]:
origin = (45, 0)
destination = (0, 40)

### Step 1.2. Generate a ```UAVEnv``` without safety bound during training

In [2]:
from ObstacleAvoidanceENV import UAVEnv

uav_no_bound = UAVEnv(origin, destination, safetybound = False, safety_bound_size = 10)


### Step 1.3. Generate a ```UAVEnv``` with safety bound during training

In [3]:
uav_with_bound = UAVEnv(origin, destination, safetybound = True, safety_bound_size = 10)


## Section 2: Case Studies
## Case Study 1: Path Planning without Safety Bound

### Step 1. Optimize Path with Q-learning

In [5]:
from Q_learning import q_learning

Q, stats, trajectories = q_learning(uav_no_bound, num_episodes = 3000, wind_direction = "N", wind_strength = 2)

Recorded output of this cell: ```TypeError: 'float' object is not subscriptable```

### Step 2. Save the learned trajectory (output of Q-learning)

In [None]:
import numpy as np
with open('multiPoly_withoutbound.txt', 'w') as outfile:
    outfile.write('# Array shape: {0}\n'.format(trajectories.shape))
    for data_slice in trajectories:
        np.savetxt(outfile, data_slice, fmt='%-7.2f')
        outfile.write('# New slice\n')
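
The same slice-by-slice text format can be read back without hard-coding the array dimensions, because the header line records the shape and ```np.loadtxt``` skips lines that start with ```#``` by default. The sketch below shows a round trip on made-up data; the helper names ```save_slices``` and ```load_slices``` are hypothetical.

```python
import ast
import os
import tempfile

import numpy as np


def save_slices(path, array3d):
    """Write a 3D array one 2D slice at a time, with '#' comment markers."""
    with open(path, "w") as outfile:
        outfile.write("# Array shape: {0}\n".format(array3d.shape))
        for data_slice in array3d:
            np.savetxt(outfile, data_slice, fmt="%-7.2f")
            outfile.write("# New slice\n")


def load_slices(path):
    """Recover the 3D array using the shape stored in the header line."""
    with open(path) as infile:
        header = infile.readline()
    shape = ast.literal_eval(header.split(":", 1)[1].strip())
    # Comment lines are ignored, so loadtxt returns all rows stacked in 2D.
    return np.loadtxt(path).reshape(shape)


demo = np.arange(24, dtype=float).reshape(2, 4, 3)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_trajectories.txt")
    save_slices(demo_path, demo)
    restored = load_slices(demo_path)
print(restored.shape)
```

Note that ```fmt='%-7.2f'``` keeps only two decimal places, so values with more precision are rounded when saved.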

### Step 3. Visualization of the learned trajectory

In [None]:
import matplotlib.pyplot as plt
from ObstacleAvoidanceENV import polysize1, polysize2, polysize3, polysize4, polysize5
from draw import trajectory

# np.loadtxt skips the '#' comment lines written in Step 2
trajectories1 = np.loadtxt('multiPoly_withoutbound.txt')
trajectories1 = trajectories1.astype(int)
# -1 lets NumPy infer the number of steps (originally hard-coded as 16)
trajectories1 = trajectories1.reshape((1, -1, 3))
plt.rcParams['figure.dpi'] = 150
# Plot column 0 of the saved trajectory on the (50, 50) grid
trajectory(trajectories1[0,:,0], (50,50), polysize1, polysize2, polysize3, polysize4, polysize5)

### Step 4. Visualization of the RL process

In [None]:
import plotting

plt.rcParams['figure.dpi'] = 100
plotting.plot_episode_stats(stats)
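
Per-episode rewards are usually noisy, so convergence is easier to judge on a smoothed curve. The sketch below computes a trailing moving average in plain Python. It assumes the per-episode rewards are available as a simple sequence; how ```stats``` stores them is not shown in this notebook, and the sample values are made up.

```python
def moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


sample_rewards = [-120.0, -90.0, -95.0, -60.0, -62.0, -58.0]  # made-up data
print(moving_average(sample_rewards, window=3))
```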

## Case Study 2: Path Planning with DA Safety Bound

### Step 1. Optimize Path with Q-learning

In [None]:
Q, stats, trajectories = q_learning(uav_with_bound, num_episodes = 3000)

### Step 2. Save the learned trajectory (output of Q-learning)

In [None]:
with open('multiPoly_withbound.txt', 'w') as outfile:
    outfile.write('# Array shape: {0}\n'.format(trajectories.shape))
    for data_slice in trajectories:
        np.savetxt(outfile, data_slice, fmt='%-7.2f')
        outfile.write('# New slice\n')

### Step 3. Visualization of the learned trajectory

In [None]:
trajectories2 = np.loadtxt('multiPoly_withbound.txt')
trajectories2 = trajectories2.astype(int)
# -1 lets NumPy infer the number of steps (originally hard-coded as 44)
trajectories2 = trajectories2.reshape((1, -1, 3))
plt.rcParams['figure.dpi'] = 150
trajectory(trajectories2[0,:,0], (50,50), polysize1, polysize2, polysize3, polysize4, polysize5)

### Step 4. Visualization of the RL process

In [None]:
plt.rcParams['figure.dpi'] = 100
plotting.plot_episode_stats(stats)

## Section 3: Comparison of Trajectories with and without Safety Bound

In [None]:
from draw import trajectory2

plt.rcParams['figure.dpi'] = 150
trajectory2(trajectories1[0,:,0], trajectories2[0,:,0], (50,50), polysize1, polysize2, polysize3, polysize4, polysize5)
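
Alongside the visual comparison, a simple number such as path length can summarize how much extra distance the safety bound costs. The sketch below assumes that each trajectory entry is a flattened, row-major index into the (50, 50) grid. That convention is a guess based on the arguments passed to the plotting functions, not something confirmed by the source, and the sample indices are made up.

```python
import math


def to_cells(flat_states, shape):
    """Convert row-major flat indices to (row, col) grid cells."""
    _, n_cols = shape
    return [divmod(int(state), n_cols) for state in flat_states]


def path_length(cells):
    """Sum of straight-line distances between consecutive cells."""
    return sum(
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(cells, cells[1:])
    )


sample_path = to_cells([0, 1, 52], (50, 50))  # made-up flat indices
print(sample_path, path_length(sample_path))
```

Applying the same two functions to both learned trajectories would give a length for each, under the stated indexing assumption.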