In [2]:
%load_ext tensorboard

# Participants
Multiple UAVs and Sensors (defined by variables `num_drones` and `num_sensors`) 

# Objectives
UAVs must collect data from every sensor at least once. UAVs continuously attempt to collect
data from the space around them, so a sensor's data is collected automatically once a UAV gets
close enough to it. An episode ends when all sensors have been visited.
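The proximity rule above can be sketched as follows. This is only an illustration: the 2D Euclidean distance, the `collection_range` parameter and both function names are assumptions, not taken from the simulator's code.

```python
import math


def collect_nearby(uav_positions, sensor_positions, visited, collection_range):
    """Return an updated visited list, marking sensors within range of any UAV.

    `collection_range` is a hypothetical parameter; the actual range used by
    the simulation is not stated in this notebook.
    """
    updated = list(visited)
    for i, (sx, sy) in enumerate(sensor_positions):
        if updated[i]:
            continue  # already collected, nothing to do
        for ux, uy in uav_positions:
            if math.hypot(ux - sx, uy - sy) <= collection_range:
                updated[i] = True
                break
    return updated


def episode_done(visited):
    """The episode ends once every sensor has been visited."""
    return all(visited)
```

With one UAV at the origin, sensors at (3, 4) and (30, 0) and a range of 5, only the first sensor is marked as visited and the episode continues.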

# Constraints
- Communication has limited range
- UAV speed is limited (and fixed at 10 m/s)
- All participants must stay within a square area whose side length is a parameter
- Episode length is limited

# Algorithm
The algorithm used to solve this problem was DDPG, adapted 
from https://docs.cleanrl.dev/rl-algorithms/ddpg/#ddpg_continuous_actionpy. A single RL agent is trained 
using information from all UAVs (Centralized training with parameter sharing).
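One common way to realize parameter sharing with an off-policy algorithm such as DDPG is to push every UAV's transition into a single replay buffer that feeds one actor-critic pair. The sketch below shows that idea only; `SharedReplayBuffer` and `store_step` are hypothetical names, and the notebook does not state how the actual code stores transitions.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """A single buffer holding transitions from all UAVs (illustrative only)."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def add(self, obs, action, reward, next_obs, done):
        self._storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, as in standard DDPG replay.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def store_step(buffer, observations, actions, rewards, next_observations, done):
    """Store one transition per UAV so a single policy learns from all of them."""
    for obs, act, rew, nxt in zip(observations, actions, rewards, next_observations):
        buffer.add(obs, act, rew, nxt, done)
```

Because every UAV contributes to the same buffer, each simulation step yields `num_drones` training samples for the shared network.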

The agents were integrated into GrADyS-SIM-Nextgen. Iterations happen every 0.5 seconds of simulation time.

## State
Agents (UAVs) know the positions of all agents (including their own), their own ID, the
positions of all sensors, and whether each sensor has been visited.
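A flat observation vector carrying this information could look like the sketch below. The ordering, the division by `scenario_size` and the ID scaling are assumptions made for illustration; the actual encoding used in the experiments is not described here.

```python
def build_observation(agent_id, uav_positions, sensor_positions, visited, scenario_size):
    """Build one agent's observation as a flat list of floats (assumed layout).

    Layout: [uav x, y ...] + [sensor x, y, visited ...] + [agent id].
    """
    obs = []
    # Positions of all UAVs, in ID order, scaled by the scenario side length.
    for x, y in uav_positions:
        obs.extend([x / scenario_size, y / scenario_size])
    # Position and visited flag of every sensor.
    for (x, y), was_visited in zip(sensor_positions, visited):
        obs.extend([x / scenario_size, y / scenario_size, 1.0 if was_visited else 0.0])
    # The agent's own ID, scaled to [0, 1] when there is more than one UAV.
    obs.append(agent_id / max(len(uav_positions) - 1, 1))
    return obs
```

Under this layout the observation has `2 * num_drones + 3 * num_sensors + 1` entries, so its size grows linearly with the number of sensors.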

## Action
Agents choose a direction to travel in.
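Since speed is fixed at 10 m/s and iterations happen every 0.5 seconds, each action moves a UAV 5 m in the chosen direction. The sketch below assumes the action is a single scalar in [-1, 1] mapped to a heading angle; that encoding and the name `apply_action` are assumptions, not the documented action space.

```python
import math

UAV_SPEED = 10.0      # m/s, fixed (see Constraints)
STEP_DURATION = 0.5   # s of simulation time per iteration


def apply_action(position, action):
    """Move a UAV one iteration along the heading encoded by `action`.

    `action` is assumed to lie in [-1, 1] and is mapped to [-pi, pi] radians.
    """
    angle = action * math.pi
    step = UAV_SPEED * STEP_DURATION  # 5 m per iteration
    x, y = position
    return (x + step * math.cos(angle), y + step * math.sin(angle))
```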

## Reward
- +1 to all agents if all sensor data has been collected
- -1 to a specific agent if it leaves the scenario area
- 0 otherwise
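The three rules above translate directly into a per-agent reward function. In this sketch the penalty takes precedence if an agent leaves the area on the same step the last sensor is collected; that tie-break is an assumption, since the rules do not say which case wins.

```python
def compute_rewards(all_collected, left_area):
    """Return one reward per agent.

    all_collected: True if every sensor has been visited.
    left_area: list of booleans, one per agent, True if that agent left the area.
    """
    rewards = []
    for agent_left in left_area:
        if agent_left:
            rewards.append(-1.0)  # individual penalty (assumed to take precedence)
        elif all_collected:
            rewards.append(1.0)   # shared success reward
        else:
            rewards.append(0.0)
    return rewards
```

Every step that neither ends in success nor in a boundary violation yields exactly zero reward for all agents, which is what makes this signal sparse.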

# Training
Episodes start with UAVs placed randomly in a small area in the center of the scenario. Sensors are
placed randomly outside of that small area. The simulation runs until the time limit, and is
terminated early if an agent leaves the scenario's area or once all sensors have been visited.
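The initial placement can be sketched with rejection sampling for the sensors. The origin-centered coordinate system, the square spawn area and the `spawn_size` parameter are assumptions for illustration; the notebook does not give the size or shape of the central area.

```python
import random


def sample_episode(num_drones, num_sensors, scenario_size, spawn_size, rng=None):
    """Sample initial UAV and sensor positions for one episode.

    UAVs start inside a central square of side `spawn_size` (hypothetical
    parameter); sensors are placed anywhere in the scenario except that square.
    """
    if spawn_size >= scenario_size:
        raise ValueError("spawn_size must be smaller than scenario_size")
    if rng is None:
        rng = random.Random()
    half = scenario_size / 2
    spawn_half = spawn_size / 2

    drones = [
        (rng.uniform(-spawn_half, spawn_half), rng.uniform(-spawn_half, spawn_half))
        for _ in range(num_drones)
    ]

    sensors = []
    while len(sensors) < num_sensors:
        x, y = rng.uniform(-half, half), rng.uniform(-half, half)
        # Reject candidates that fall inside the central spawn area.
        if abs(x) > spawn_half or abs(y) > spawn_half:
            sensors.append((x, y))
    return drones, sensors
```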

# Parameters
These are the parameters being studied:
- Number of drones [1, 2]
- Number of sensors [1, 2]
- Scenario size [50, 100] (size of the side of the scenario square)
- Training time [1_000_000, 10_000_000]
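Taken as a full grid, the four parameters above give 2 × 2 × 2 × 2 = 16 combinations. A small sketch of enumerating them follows; the dictionary keys are illustrative labels and not necessarily the flag names `main.py` accepts.

```python
from itertools import product

# Values listed in the Parameters section; key names are illustrative.
PARAMETER_GRID = {
    "num_drones": [1, 2],
    "num_sensors": [1, 2],
    "scenario_size": [50, 100],
    "total_timesteps": [1_000_000, 10_000_000],
}


def expand_grid(grid):
    """Return one dict per combination of parameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configurations = expand_grid(PARAMETER_GRID)
print(len(configurations))
```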


# Performance benchmark
A benchmark was executed to figure out how best to run simulation campaigns.

Benchmarking code:

```python
import multiprocessing
import subprocess

# Number of identical training runs launched concurrently.
parallelism = 1

# One command per concurrent run; every run uses the same configuration.
experiments = [
    ["python", "main.py", "--scenario_size=50", "--num-drones=1", "--num-sensors=1", f"--exp_name=bench_parallel_{parallelism}cpu", "--max_episode_length=60", "--no-checkpoint-visual-evaluation", "--total-timesteps=100000"]
    for _ in range(parallelism)
]

def run_experiment(experiment):
    print("Running experiment: ", experiment)
    subprocess.run(experiment)


if __name__ == "__main__":
    # The context manager cleans up the worker processes once all runs finish.
    with multiprocessing.Pool(processes=parallelism) as pool:
        pool.map(run_experiment, experiments)
```
## Results
| Parallelism  | Total SPS | CUDA? |
| ------------ | --------- | ----- |
| 10           | 940       | Yes   |
| 8            | 904       | Yes   |
| 1            | 280       | Yes   |
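The table can be read in terms of speedup and per-process throughput. The sketch below assumes "Total SPS" is the aggregate steps per second across all concurrent processes; if it was measured differently, the derived numbers do not apply.

```python
# Parallelism -> total SPS, copied from the table above.
RESULTS = {10: 940, 8: 904, 1: 280}


def scaling_summary(results, baseline=1):
    """Derive speedup, per-process SPS and parallel efficiency from total SPS."""
    base = results[baseline]
    summary = {}
    for workers, total_sps in sorted(results.items()):
        summary[workers] = {
            "speedup": total_sps / base,
            "per_process_sps": total_sps / workers,
            "efficiency": total_sps / (base * workers),
        }
    return summary


for workers, stats in scaling_summary(RESULTS).items():
    print(workers, {name: round(value, 2) for name, value in stats.items()})
```

Under that assumption, 8 processes give roughly a 3.2x speedup and 10 processes roughly 3.4x over a single process, while per-process throughput drops from 280 to 113 and 94 SPS respectively. Going from 8 to 10 processes adds only about 4% total throughput.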

# Preliminary Evaluation
Several simulation scenarios were run to evaluate the algorithm's performance on the scenario.

## Parameters
These are the parameters being studied:
- Number of drones [1, 2]
- Number of sensors [1, 2]
- Scenario size [50, 100] (size of the side of the scenario square)
- Training time [1_000_000, 10_000_000]

## Benchmarking code
```python
import multiprocessing
import subprocess
from dataclasses import dataclass

import tyro


@dataclass
class Args:
    concurrency: int = 3

experiments = [
    ["python", "main.py", "--scenario_size=50", "--num-drones=1", "--num-sensors=1", "--exp_name=1-sensor-1-drone-50-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=50", "--num-drones=1", "--num-sensors=2", "--exp_name=2-sensor-1-drone-50-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=50", "--num-drones=2", "--num-sensors=1", "--exp_name=1-sensor-2-drone-50-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=50", "--num-drones=2", "--num-sensors=2", "--exp_name=2-sensor-2-drone-50-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=1", "--num-sensors=1", "--exp_name=1-sensor-1-drone-100-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=1", "--num-sensors=2", "--exp_name=2-sensor-1-drone-100-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=1", "--exp_name=1-sensor-2-drone-100-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=2", "--exp_name=2-sensor-2-drone-100-size", "--max_episode_length=60", "--no-checkpoint-visual-evaluation"],
]

def run_experiment(experiment):
    print("Running experiment: ", experiment)
    subprocess.run(experiment)


if __name__ == "__main__":
    args = tyro.cli(Args)
    # The context manager cleans up the worker processes once all runs finish.
    with multiprocessing.Pool(processes=args.concurrency) as pool:
        pool.map(run_experiment, experiments)
```
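The eight hand-written commands follow a regular pattern, so the same list can be generated instead of typed out. This is an optional sketch; `build_experiments` is a hypothetical helper that reuses only the flags already present in the script above.

```python
from itertools import product


def build_experiments(sizes=(50, 100), drones=(1, 2), sensors=(1, 2)):
    """Generate one main.py command per (size, drones, sensors) combination."""
    experiments = []
    for size, n_drones, n_sensors in product(sizes, drones, sensors):
        experiments.append([
            "python", "main.py",
            f"--scenario_size={size}",
            f"--num-drones={n_drones}",
            f"--num-sensors={n_sensors}",
            f"--exp_name={n_sensors}-sensor-{n_drones}-drone-{size}-size",
            "--max_episode_length=60",
            "--no-checkpoint-visual-evaluation",
        ])
    return experiments


experiments = build_experiments()
print(len(experiments))
```

The generated commands come out in the same order as the hand-written list, so the result can replace the `experiments` literal without changing which runs are launched.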

## Results

In [2]:
%tensorboard --logdir runs/preliminary

## Conclusions
- With 1 million training steps, the policy fails to optimize in some of the hardest scenarios
- Increasing the scenario area size hurts training results. A larger area makes the environment harder to explore, increasing the number of samples needed for an optimal policy to emerge
- Training for too long hurts performance. An error in the early stopping strategy left some scenarios running longer than they should have; the negative effects on these scenarios are easy to spot
- The scenario becomes harder as more drones are added. Further evaluation is required on this front

# Sensor number evaluation
Does the system performance scale well with higher sensor counts? 

## Parameters
- Number of drones = 2
- Number of sensors = [3, 4, 5, 6, 7, 8, 9, 10]
- Scenario square size = 100
- Training time = 10_000_000

## Benchmarking code

```python
import multiprocessing
import subprocess
from dataclasses import dataclass

import tyro


@dataclass
class Args:
    concurrency: int = 3

experiments = [
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=3", "--exp_name=10mil_3-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=4", "--exp_name=10mil_4-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=5", "--exp_name=10mil_5-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=6", "--exp_name=10mil_6-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=7", "--exp_name=10mil_7-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=8", "--exp_name=10mil_8-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=9", "--exp_name=10mil_9-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
    ["python", "main.py", "--scenario_size=100", "--num-drones=2", "--num-sensors=10", "--exp_name=10mil_10-sensor-2-drone-100-size", "--max_episode_length=200", "--no-checkpoint-visual-evaluation", "--run-name=SensorIncrease", "--checkpoint-freq=1000000", "--total-timesteps=10000000"],
]

def run_experiment(experiment):
    print("Running experiment: ", experiment)
    subprocess.run(experiment)


if __name__ == "__main__":
    args = tyro.cli(Args)
    # The context manager cleans up the worker processes once all runs finish.
    with multiprocessing.Pool(processes=args.concurrency) as pool:
        pool.map(run_experiment, experiments)
```
## Results

In [3]:
%tensorboard --logdir runs/sensor-increase

## Conclusions
- With 10 million training steps, none of the cases could be optimized. The three-sensor case reaches a 1.5% success rate, which is still poor
- Hypothesis: because the reward function is very sparse, it becomes increasingly unlikely that a successful episode occurs during exploration as the number of sensors grows
- A denser reward function would probably help
- More work is needed to make sure the algorithm converges for bigger scenarios
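As one possible shape for a denser reward, the sketch below pays a shared fraction of the reward each time a new sensor is visited, on top of a completion bonus. It is a hypothetical design that was not part of these experiments; the function name, the per-sensor scaling and the `completion_bonus` parameter are all assumptions.

```python
def dense_rewards(previously_visited, now_visited, left_area, completion_bonus=1.0):
    """Per-agent rewards with partial credit for each newly visited sensor.

    previously_visited / now_visited: visited flags before and after the step.
    left_area: one boolean per agent, True if that agent left the scenario area.
    """
    num_sensors = len(now_visited)
    newly_visited = sum(
        1 for before, after in zip(previously_visited, now_visited) if after and not before
    )
    # Each newly visited sensor contributes 1 / num_sensors to the shared reward.
    shared = newly_visited / num_sensors
    if all(now_visited):
        shared += completion_bonus
    # Leaving the area still yields the individual -1 penalty.
    return [-1.0 if agent_left else shared for agent_left in left_area]
```

Unlike the original sparse signal, this variant gives feedback as soon as any single sensor is reached, so random exploration does not need to complete a whole episode before seeing a nonzero reward. Whether it actually improves convergence here would need to be tested.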