# Adversarial Policies: Attacking Deep Reinforcement Learning

## Introduction


## Theory behind Adversarial Policies

To implement adversarial policies and evaluate their effectiveness, we used this Berkeley [GitHub repository](https://github.com/HumanCompatibleAI/adversarial-policies). Below you will find a step-by-step guide on how to get it running and how to reproduce the results on Windows.

## Reproducing results on Windows


### Setting up WSL, Docker & the Berkeley repository
- First, install WSL by following this guide by [Microsoft](https://docs.microsoft.com/en-us/windows/wsl/install).

- Afterwards, download and install Docker Desktop according to [these instructions](https://docs.docker.com/desktop/windows/install/).

- Now clone the Berkeley [GitHub repository](https://github.com/HumanCompatibleAI/adversarial-policies) and download the [MuJoCo activation key](https://www.roboti.us/license.html).

- Move the key into the cloned repository.

- Open a terminal in the repository and build a Docker image with ```docker build -t rl_adversarial .``` (the trailing dot sets the build context to the current directory).

- After successfully building the image, start a Docker container with the MuJoCo key by calling ```docker run -it --name rl_adv --env MUJOCO_PY_MJKEY_PATH=/adversarial-policies/mjkey.txt rl_adversarial /bin/bash```. If building the image fails with an error that looks something like ```ERROR [python-req 6/6] RUN touch /root/.mujoco/mjkey.txt   && ci/build_venv.sh /venv && rm -rf HOME/.cache/ ```, run ```git config --global core.autocrlf false```, then repeat steps 3 and 4 (clone the repository again and move the key into it) and rebuild the image. This setting stops Git from converting line endings on checkout, which was most likely what broke the build scripts in our case.

- If everything went smoothly, a Linux command line appears and you are ready to train the adversarial policies.
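
If the build error from the previous step shows up, it can help to check whether the shell scripts in the cloned repository were checked out with Windows (CRLF) line endings. The following is a minimal sketch, not part of the Berkeley repository: the helper name `find_crlf_files` is hypothetical, and the assumption that line endings are the cause comes only from the fact that changing `core.autocrlf` fixed the build for us.

```python
from pathlib import Path


def find_crlf_files(root, suffixes=(".sh",)):
    """Return files under `root` with one of `suffixes` that contain CRLF line endings."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        # Only inspect regular files with a matching extension (shell scripts by default).
        if path.is_file() and path.suffix in suffixes:
            if b"\r\n" in path.read_bytes():
                offenders.append(path)
    return offenders


# Example (run from the root of the cloned repository):
# for path in find_crlf_files("."):
#     print(f"CRLF line endings: {path}")
```

If the list is non-empty, cloning again after setting `core.autocrlf` to `false` should produce scripts with Unix line endings.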


### Getting started with the repository

You now have several options. We suggest that you first run ```python -m aprl.train```, so that a trained model already exists when you later search for the save location. If an error like ```multi_train is not in list``` occurs, restarting Docker fixed it for us.

### aprl.train
```python -m aprl.train``` lets you train a policy. To get a better understanding of the available settings, head to ```aprl/train``` and take a look at the parameters under ```train_config()```. The environment supports a total of six games; a summary is provided under ```Games```. To recreate the results in the game SumoHumans, use ```python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper```. This trains a policy for a total of 20 million time steps. After the training has finished, you can test the policy with ```aprl.score_agent```.

### aprl.score_agent
```python -m aprl.score_agent``` allows us to test the quality of our trained policy. Just like ```aprl.train```, it has a lot of different parameters. You can find these at ```aprl/score_agent  default_score_config()```. To evaluate the quality of the policy trained before, run the following command: ```python -m aprl.score_agent with env_name=multicomp/SumoHumans-v0 agent_b_type=ppo2 agent_b_path=/adversarial-policies/data/baselines/20220322_162856-default/final_model/ episodes=100 ```. You need to change ```20220322_162856-default``` to the actual name of the folder the policy is stored in. Follow the ```Save location``` part below to find the folders.

```aprl.score_agent``` can also create videos of the policies. To do so, add ```videos=True```. In our case we had to set the ```annotated``` parameter to False under ```video_params```, otherwise we would receive an error. Other errors can occasionally occur while creating videos. Most of the time, restarting ```WSL``` fixed these for us.

Unless the path is changed, the videos are stored in the same folder as the logs of the score session. A small guide to locating these folders is provided below.
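
Because the score command grows quickly, it can be convenient to assemble it from a dictionary of settings. The sketch below is our own helper, not part of the repository: the name `sacred_command` is hypothetical, `RUN_DIR` is the placeholder folder name from above, and we assume that the nested `annotated` setting can be overridden with Sacred's dotted syntax (`video_params.annotated=False`).

```python
import shlex


def sacred_command(module, overrides, named_configs=()):
    """Build a `python -m <module> with key=value ...` command line."""
    parts = ["python", "-m", module, "with"]
    # Each override becomes one key=value token after `with`.
    parts += [f"{key}={value}" for key, value in overrides.items()]
    # Named configs (such as `paper`) are passed as bare words.
    parts += list(named_configs)
    return " ".join(shlex.quote(part) for part in parts)


RUN_DIR = "20220322_162856-default"  # placeholder: replace with your own run folder

command = sacred_command(
    "aprl.score_agent",
    {
        "env_name": "multicomp/SumoHumans-v0",
        "agent_b_type": "ppo2",
        "agent_b_path": f"/adversarial-policies/data/baselines/{RUN_DIR}/final_model/",
        "episodes": 100,
        "videos": True,
        "video_params.annotated": False,  # assumption: dotted override of a nested setting
    },
)
print(command)
```

The printed line can be pasted into the container's shell. The same helper also reproduces the training command, for example `sacred_command("aprl.train", {"env_name": "multicomp/SumoHumans-v0"}, ["paper"])`.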

### Save location
To find the directory in which the trained policies and the scores are saved, head to ```\\wsl$``` -> ```docker-desktop-data``` -> ```version-pack-data``` -> ```community``` -> ```docker``` -> ```overlay2```. At this point there should be several folders with long, cryptic names. Sort them by last modified and open the most recently modified folder (for this to work, at least one policy must already have been trained). From there continue to ```diff``` -> ```adversarial-policies``` -> ```data```. The trained policies are stored in ```baselines```. The logs of the training sessions and the scoring sessions are stored in ```sacred```.
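
If you only need the name of the newest run folder, you can also look it up from inside the running container instead of browsing from Windows. This is a minimal sketch under the assumption that the policies live in `/adversarial-policies/data/baselines` inside the container (the path used in the score command above); `latest_run` is a hypothetical helper name.

```python
from pathlib import Path


def latest_run(baselines_dir):
    """Return the most recently modified run folder in `baselines_dir`, or None if there is none."""
    runs = [p for p in Path(baselines_dir).iterdir() if p.is_dir()]
    # Pick the folder with the newest modification time.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


# Example (inside the container):
# run = latest_run("/adversarial-policies/data/baselines")
# print(run / "final_model")
```

The resulting path can be used as ```agent_b_path``` when scoring the policy.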

### Games

A total of six games are provided by ```gym_compete```. These are ```KickAndDefend-v0, RunToGoalHumans-v0, SumoHumans-v0, YouShallNotPassHumans-v0, SumoAnts-v0 and RunToGoalAnts-v0```. To see the games in action, run ```aprl.score_agent``` with the specific game as ```env_name``` and create a few videos.
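
As a rough example, the following sketch prints one video command per game. It rests on several assumptions: that every game uses the same ```multicomp/``` prefix we used for SumoHumans and YouShallNotPassHumans, that the default agents of ```aprl.score_agent``` are acceptable for a quick look, and that five episodes are enough (an arbitrary choice). The name `video_commands` is our own.

```python
GAMES = [
    "KickAndDefend-v0",
    "RunToGoalHumans-v0",
    "SumoHumans-v0",
    "YouShallNotPassHumans-v0",
    "SumoAnts-v0",
    "RunToGoalAnts-v0",
]


def video_commands(games=GAMES, episodes=5):
    """Return one score_agent command per game with video recording enabled."""
    return [
        f"python -m aprl.score_agent with env_name=multicomp/{game} "
        f"episodes={episodes} videos=True"
        for game in games
    ]


for command in video_commands():
    print(command)
```

If video creation fails, the ```annotated``` setting described under ```aprl.score_agent``` is the first thing to check.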

### Mistakes to avoid
The most time-consuming mistake we encountered was training the wrong agent in ```YouShallNotPassHumans-v0```. While it doesn't really matter which agent you select to train in the Sumo games, it's important to select the correct agent here. If you simply select the game and run ```aprl.train```, you will train the attacking agent. To make sure you select the defending agent, use ```python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 embed_index=1 paper```.


## Results

To test the effectiveness of adversarial policies we trained several agents in different games and let them compete against each other.

### SumoHumans

In this game, two agents fight each other in a small arena. The goal is to push the opposing agent over or out of the arena (Fig. 1). A total of three different baseline agents are provided by gym_compete, and we trained one adversary for each.  
<center>
<figure>
<img src="..\workspace\adv_policy_training\Sumo_Humans_1v1_tourney\gifs\1v1_norm.gif" style="width: 300px;">
<img src="..\workspace\adv_policy_training\Sumo_Humans_1v1_tourney\gifs\1v1_adv(v1).gif" style="width: 300px;">
<figcaption>(Fig. 1) A 1 vs 1 between two zoo agents</figcaption>
</figure>
</center>

<center>
<figure>
<img src="..\workspace\adv_policy_training\Sumo_humans_1v1_tourney/Übersicht.png" style="width: 500px;">
<figcaption>(Fig. 2) From left to right: adversary wins, victim wins and ties</figcaption>
</figure>
</center>