# Run Soft Actor-Critic in Google Colab

## 1. Preparation
Clone the code from the GitHub repository and install the dependencies.

In [None]:
!git clone https://github.com/chris-hoffmann/post2_soft_actor_critic.git

In [None]:
%cd post2_soft_actor_critic/
# Clear the Experiments directory and remove environment.yml
!rm -rf Experiments/* environment.yml

In [None]:
!xargs sudo apt install -y < dependencies/colab_pkgs.txt

In [None]:
!pip install -r dependencies/colab_requirements.txt

In [None]:
!dos2unix scripts/run_inv_pendulum.sh

In [None]:
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1400, 900))
display.start()

## 2. Train the agent
We perform 3 runs in each Gym environment using distinct random seeds.
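
The resulting experiment grid (3 environments × 3 seeds = 9 runs) can be sketched in a few lines of Python. The naming scheme `<env>__seed_<n>` and the seed values 1 to 3 are inferred from the listing of the `Experiments` directory in the "Check training output" step; the helper `run_names` is hypothetical and not part of the repository's scripts.

```python
from itertools import product

# Environments trained in this notebook (one training cell each).
ENVS = ["HalfCheetah-v4", "InvertedPendulum-v4", "Hopper-v4"]
# Seeds inferred from the run directory names in Experiments/ (assumption).
SEEDS = [1, 2, 3]


def run_names(envs, seeds):
    """Return one run directory name per (environment, seed) pair."""
    return [f"{env}__seed_{seed}" for env, seed in product(envs, seeds)]


if __name__ == "__main__":
    for name in run_names(ENVS, SEEDS):
        print(name)
```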

### Train in the 1st environment: ***HalfCheetah-v4***

In [None]:
!bash scripts/run_half_cheetah.sh

### Train in the 2nd environment: ***InvertedPendulum-v4***

In [None]:
!bash scripts/run_inv_pendulum.sh

### Train in the 3rd environment: ***Hopper-v4***

In [None]:
!bash scripts/run_hopper.sh

### Check training output

If things went according to plan, the directory `Experiments` should have the following structure:
```
Experiments/
├── HalfCheetah-v4__seed_1
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── HalfCheetah-v4__seed_2
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── HalfCheetah-v4__seed_3
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── Hopper-v4__seed_1
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── Hopper-v4__seed_2
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── Hopper-v4__seed_3
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── InvertedPendulum-v4__seed_1
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── InvertedPendulum-v4__seed_2
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
├── InvertedPendulum-v4__seed_3
│   ├── events.out.tfevents.*
│   └── policy_ckpt.pth
```
Each run has its own directory containing a TensorBoard event file (`events.out.tfevents.*`) and the parameters of the trained Actor (`policy_ckpt.pth`).
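
Instead of inspecting the tree by eye, the structure can be checked programmatically. The following is a minimal sketch using only the standard library; the function `check_experiments` is hypothetical, and it assumes that every subdirectory of `Experiments` is a run directory named `<env>__seed_<n>`, as in the listing above.

```python
from collections import Counter
from pathlib import Path


def check_experiments(root="Experiments", runs_per_env=3):
    """Check that every run directory holds an event file and a checkpoint.

    Returns a dict mapping environment id to its number of complete runs.
    Raises ValueError if a run directory is incomplete or an environment
    does not have the expected number of runs.
    """
    counts = Counter()
    for run_dir in sorted(Path(root).iterdir()):
        if not run_dir.is_dir():
            continue  # ignore stray files next to the run directories
        has_events = any(run_dir.glob("events.out.tfevents.*"))
        has_ckpt = (run_dir / "policy_ckpt.pth").is_file()
        if not (has_events and has_ckpt):
            raise ValueError(f"incomplete run directory: {run_dir}")
        # Directory names are assumed to follow the pattern <env>__seed_<n>.
        env_id, _, _seed = run_dir.name.partition("__seed_")
        counts[env_id] += 1
    for env_id, n in counts.items():
        if n != runs_per_env:
            raise ValueError(f"{env_id}: expected {runs_per_env} runs, found {n}")
    return dict(counts)


if __name__ == "__main__":
    print(check_experiments())
```

Raising an exception rather than printing a warning makes an incomplete training run fail loudly before the analysis step.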

## 3. Analyze the training

We generate learning curves that plot the return against the number of time steps, averaged over the 3 independent runs performed in each environment, as well as videos (GIF files) illustrating the quality of the learned policies.
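
The averaging step can be sketched in plain Python. This is a hypothetical helper, not the logic of `analyze_runs.py`, which may smooth or interpolate the curves differently. It assumes that each run's returns have already been extracted from its TensorBoard event file as `(time_step, return)` pairs (extraction is not shown) and keeps only the time steps logged by every run.

```python
from statistics import mean, pstdev


def average_curves(runs):
    """Average several learning curves at the time steps they share.

    runs: non-empty list of curves, one per seed; each curve is a list of
          (time_step, episode_return) pairs.
    Returns a list of (time_step, mean_return, std_return) tuples, using
    only the steps that appear in every run.
    """
    per_run = [dict(curve) for curve in runs]
    common_steps = sorted(set.intersection(*(set(r) for r in per_run)))
    averaged = []
    for step in common_steps:
        values = [r[step] for r in per_run]
        # Population standard deviation across seeds, e.g. for a shaded band.
        averaged.append((step, mean(values), pstdev(values)))
    return averaged


if __name__ == "__main__":
    # Toy numbers for illustration only, not training results.
    toy_runs = [
        [(1000, 10.0), (2000, 20.0)],
        [(1000, 30.0), (2000, 40.0)],
        [(1000, 20.0), (2000, 30.0), (3000, 50.0)],
    ]
    for step, mu, sigma in average_curves(toy_runs):
        print(step, mu, sigma)
```

Restricting the average to shared time steps avoids mixing different numbers of seeds at a single point; interpolating onto a common grid would be an alternative when runs log at different steps.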

In [None]:
# Replace <dir_path> with the output directory for the plots and GIF files
!python analyze_runs.py --out-dir <dir_path>

The resulting artifacts (plots and GIF files) are also available in the [GitHub repository](https://github.com/chris-hoffmann/post2_soft_actor_critic/tree/main/assets) and are displayed in the [README](https://github.com/chris-hoffmann/post2_soft_actor_critic/blob/main/README.md).