Setup
---

Make sure to select `GPU` under Runtime > Change runtime type > Hardware accelerator!

In [0]:
import sys

# Setup for use in Colab
if 'google.colab' in sys.modules:
    # Clone GitHub repository
    !git clone https://github.com/MasterScrat/droneRL-workshop.git --single-branch
        
    # Install packages via pip
    !pip install -r "droneRL-workshop/colab-requirements.txt"
    
    # Restart the runtime so the installed packages take effect.
    # Your runtime will appear to crash after this - this is normal!
    import os
    os.kill(os.getpid(), 9)

In [0]:
%cd /content/droneRL-workshop

In [0]:
%matplotlib inline
import os
from PIL import Image
from IPython.lib.pretty import pretty

The challenge environment
---

The environment for this challenge is called **`DeliveryDrones`**.

After creating the environment, call `reset()` to initialize it and get the first observation.

In [0]:
from env.env import DeliveryDrones

# Create environment
env = DeliveryDrones()

# Reset it and get the initial observation
observation = env.reset()

# Render in text
print(env.render(mode='ansi'))

In [0]:
# Render as an RGB image to see things more clearly
Image.fromarray(env.render(mode='rgb_array'))

Observation spaces
---

By default, the environment returns `ground` and `air` grids as observations.

In [0]:
# Observations are returned after env.reset() or env.step() calls
print(observation)

In [0]:
# We can inspect what's on the ground
observation['ground'].grid

We use **observation wrappers** to produce states that can be used with RL agents.

In [0]:
from env.wrappers import CompassQTable, CompassChargeQTable, LidarCompassQTable, LidarCompassChargeQTable

# Create the environment
env = DeliveryDrones()

# Use an observation wrapper
env = CompassQTable(env)

# Reset the environment and print the initial observation
observation = env.reset()
print(pretty(observation))

# Render as an RGB image
Image.fromarray(env.render(mode='rgb_array'))

In [0]:
# Print the state in a nicer way using `env.format_state`
{drone: env.format_state(observation) for drone, observation in observation.items()}
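
To make the idea of an observation wrapper more concrete, here is a minimal, self-contained sketch. `ToyEnv` and `ToyCompassWrapper` are hypothetical classes written for this illustration only; they are not part of the workshop code, and the notebook does not show what `CompassQTable` actually encodes. The sketch only shows the general pattern: the wrapper holds the inner environment and rewrites each drone's raw observation into a compact state.

```python
class ToyEnv:
    """Hypothetical one-drone environment on a grid, for illustration only."""

    def __init__(self):
        self.drone = (0, 0)    # (row, col)
        self.packet = (2, 3)   # (row, col)

    def reset(self):
        self.drone = (0, 0)
        # One raw observation per drone index, like the dicts used in this notebook
        return {0: {"drone": self.drone, "packet": self.packet}}


class ToyCompassWrapper:
    """Wraps an env and replaces each raw observation with a compass direction."""

    def __init__(self, env):
        self.env = env

    def observation(self, raw):
        dy = raw["packet"][0] - raw["drone"][0]
        dx = raw["packet"][1] - raw["drone"][1]
        # Rows grow downward, so a positive row offset means "south"
        vertical = "S" if dy > 0 else "N" if dy < 0 else ""
        horizontal = "E" if dx > 0 else "W" if dx < 0 else ""
        return (vertical + horizontal) or "HERE"

    def reset(self):
        # Transform the observation of every drone, keeping the dict layout
        return {drone: self.observation(raw) for drone, raw in self.env.reset().items()}


wrapped = ToyCompassWrapper(ToyEnv())
print(wrapped.reset())  # {0: 'SE'}
```

A small discrete state like this is what makes tabular methods such as Q-tables practical, since every distinct state needs its own table row.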

In [0]:
from env.env import Action

Action??

In [0]:
observation, reward, done, info = env.step({0: Action.STAY})

print('Rewards: {}'.format(reward))
Image.fromarray(env.render(mode='rgb_array'))

In [0]:
{drone: env.format_state(observation) for drone, observation in observation.items()}

The `WindowedGridView` observation wrapper
---

This is the "official" wrapper for the competition!

```
Observation wrapper: (N, N, 6) numerical arrays with location of
(0) drones         marked with                   1 / 0 otherwise
(1) packets        marked with                   1 / 0 otherwise
(2) dropzones      marked with                   1 / 0 otherwise
(3) stations       marked with                   1 / 0 otherwise
(4) drones charge  marked with   charge level 0..1 / 0 otherwise
(5) obstacles      marked with                   1 / 0 otherwise
Where N is the size of the window, i the number of drones
```
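
As a rough illustration of the layout described above, the following NumPy sketch cuts a square window around one drone out of a layered grid. It assumes that the window size is `N = 2 * radius + 1` and that cells beyond the border are filled with zeros; both are assumptions made for this sketch, and `windowed_view` is a hypothetical helper, not the wrapper's actual implementation.

```python
import numpy as np


def windowed_view(grid, center, radius, fill=0.0):
    """Cut a (2*radius+1, 2*radius+1, C) window out of an (H, W, C) grid.

    Cells that fall outside the grid are filled with `fill` (an assumption).
    """
    size = 2 * radius + 1
    # Pad both spatial axes by `radius` so the slice below never leaves the array
    padded = np.pad(
        grid,
        ((radius, radius), (radius, radius), (0, 0)),
        mode="constant",
        constant_values=fill,
    )
    y, x = center
    # After padding, original cell (y, x) sits at (y + radius, x + radius),
    # so the window centred on it starts at (y, x) in padded coordinates
    return padded[y:y + size, x:x + size, :]


# Toy 5x5 world with 6 layers, in the same order as the docstring above
grid = np.zeros((5, 5, 6), dtype=np.float32)
grid[0, 0, 0] = 1.0   # layer 0: a drone in the top-left corner
grid[0, 0, 4] = 0.8   # layer 4: that drone's charge level
grid[1, 2, 1] = 1.0   # layer 1: a packet

window = windowed_view(grid, center=(0, 0), radius=2)
print(window.shape)     # (5, 5, 6)
print(window[:, :, 0])  # the drone sits at the centre of its own window
```

Because the window is always centred on the drone, its shape stays the same wherever the drone is on the map, which suits a fixed-size neural network input.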

In [0]:
from env.wrappers import WindowedGridView

env = WindowedGridView(DeliveryDrones(), radius=2)
states = env.reset()
Image.fromarray(env.render(mode='rgb_array'))

In [0]:
{drone: env.format_state(state) for drone, state in states.items()}

In [0]:
states[0][:, :, 5] # Obstacles from the perspective of drone 0

Create and run agents
---

After creating your agents, you can run them with the `test_agents()` helper function.

In [0]:
from agents.random import RandomAgent

# Create and setup the environment
env = WindowedGridView(DeliveryDrones(), radius=3)
states = env.reset()

# Create random agents
agents = {drone.index: RandomAgent(env) for drone in env.drones}
agents

In [0]:
# The random agents just pick an action randomly
RandomAgent??
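
If the source inspection above is not available, the idea of a random agent can be sketched in a few lines. `SketchRandomAgent` and its `act(state)` method are hypothetical names chosen for this illustration, and the five actions are an assumption; the workshop's `RandomAgent` may use a different interface.

```python
import random


class SketchRandomAgent:
    """Hypothetical stand-in for a random agent: it ignores the state entirely."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    def act(self, state):
        # Uniform choice over the action indices 0..n_actions-1
        return self.rng.randrange(self.n_actions)


# Assumption: 5 discrete actions (for example four moves plus "stay")
agent = SketchRandomAgent(n_actions=5, seed=0)
actions = [agent.act(state=None) for _ in range(10)]
print(actions)
```

A random agent is a useful baseline: any trained agent should collect clearly more reward than this.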

In [0]:
from helpers.rl_helpers import test_agents

# Run agents for 1000 steps
rewards_log = test_agents(env, agents, n_steps=1000, seed=0)

# Print rewards
for drone_index, rewards in rewards_log.items():
    print('Drone {} rewards: {} ..'.format(drone_index, rewards[:10]))

You can visualize the rewards with the helper functions.

In [0]:
from helpers.rl_helpers import plot_cumulative_rewards

plot_cumulative_rewards(
    rewards_log,
    events={'pickup': [1], 'crash': [-1]}, # Optional, default: pickup/crash ±1
    drones_labels={0: 'My drone'}, # Optional, default: drone index 
)
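
For readers curious about what a cumulative-rewards plot is based on, here is a minimal sketch using made-up rewards. The `toy_rewards_log` values are invented for illustration, and this is not the code of `plot_cumulative_rewards`; it only shows a running total per drone and a count of events using the same `events` mapping as above.

```python
import numpy as np

# Made-up per-step rewards for two drones (not real workshop output)
toy_rewards_log = {0: [0, 1, 0, -1, 1], 1: [0, 0, 0, 1, 0]}

# Same mapping as the `events` argument above: event name -> reward values
events = {'pickup': [1], 'crash': [-1]}

# Running total of rewards per drone
cumulative = {drone: np.cumsum(rewards) for drone, rewards in toy_rewards_log.items()}

# Number of steps on which each event occurred, per drone
event_counts = {
    drone: {name: sum(r in values for r in rewards) for name, values in events.items()}
    for drone, rewards in toy_rewards_log.items()
}

for drone in toy_rewards_log:
    print('Drone {}: cumulative {} events {}'.format(
        drone, cumulative[drone].tolist(), event_counts[drone]))
```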

Train a first agent
---

To train your agents, you will use the `MultiAgentTrainer` class.

In [0]:
from agents.dqn import DQNAgent, DenseQNetworkFactory
from helpers.rl_helpers import MultiAgentTrainer, plot_rolling_rewards

# Create and setup the environment
env = WindowedGridView(DeliveryDrones(), radius=3)
env.env_params.update({'n_drones': 3, 'skyscrapers_factor': 0, 'charge_reward': 0, 'discharge': 0})
states = env.reset()

# Create random agents
agents = {drone.index: RandomAgent(env) for drone in env.drones}

# Use a DQNAgent for agent 0 - we will see how this works next
agents[0] = DQNAgent(
    env, DenseQNetworkFactory(env, hidden_layers=[32, 32]),
    gamma=0.95, epsilon_start=1.0, epsilon_decay=0.999, epsilon_end=0.01,
    memory_size=10000, batch_size=64, target_update_interval=5
)

agents
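
The three epsilon parameters above control exploration. The sketch below shows one common way such a schedule works, assuming epsilon is multiplied by `epsilon_decay` once per step and clipped at `epsilon_end`. Whether `DQNAgent` decays per step or per episode is not shown in this notebook, so treat `epsilon_schedule` as a hypothetical helper.

```python
def epsilon_schedule(n_steps, start=1.0, decay=0.999, end=0.01):
    """Return the epsilon used at each step under multiplicative decay.

    Assumption: one decay per step, never dropping below `end`.
    """
    eps = start
    values = []
    for _ in range(n_steps):
        values.append(eps)
        eps = max(end, eps * decay)
    return values


values = epsilon_schedule(6000)
print('step    0: {:.3f}'.format(values[0]))
print('step 1000: {:.3f}'.format(values[1000]))
print('step 5999: {:.3f}'.format(values[-1]))
```

Under this assumption, epsilon reaches its floor of 0.01 after roughly 4,600 steps, so a 1,000-step run would still be taking random actions more than a third of the time at its end.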

In [0]:
# Create trainer
trainer = MultiAgentTrainer(env, agents, reset_agents=True, seed=0)

# Train with different grids
trainer.train(1000)

# Print rewards
for drone_index, rewards in trainer.rewards_log.items():
    print('Drone {} rewards: {} ..'.format(drone_index, rewards[:10]))

You can visualize training with the helper functions.

In [0]:
plot_rolling_rewards(
    trainer.rewards_log,
    drones_labels={0: 'My drone'}, # Optional: specify drone names
)
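
Per-step rewards are sparse and noisy, which is why a rolling view is easier to read than the raw log. As a rough sketch of the idea (not the implementation of `plot_rolling_rewards`), a rolling mean can be computed with NumPy as follows; `rolling_mean` and the sample rewards are made up for this illustration.

```python
import numpy as np


def rolling_mean(values, window):
    """Average of each run of `window` consecutive values."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely
    return np.convolve(values, kernel, mode="valid")


rewards = [0, 0, 1, 0, -1, 1, 1, 0]   # made-up rewards
smoothed = rolling_mean(rewards, window=4)
print(smoothed)
```

A larger window gives a smoother curve but reacts more slowly to changes in the agent's behavior.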

Test agents
---

In [0]:
rewards_log = test_agents(env, agents, n_steps=1000, seed=0)
plot_cumulative_rewards(rewards_log, drones_labels={0: 'My drone'})

Visualize a "run"
---

Share videos of your best agents! `#AMLD2024` `#droneRL`

In [0]:
from helpers.rl_helpers import render_video, ColabVideo

path = os.path.join('output', 'videos', 'intro-run.mp4')
render_video(env, agents, video_path=path, n_steps=120, fps=1, seed=None)

In [0]:
ColabVideo(path)

Submit to AIcrowd! 🚀
---

In [0]:
path = os.path.join('output', 'agents', 'first-agent.pt')
agents[0].save(path)

Download the file `output/agents/first-agent.pt` and submit it:

https://www.aicrowd.com/challenges/dronerl