## Setup

You will need to make a copy of this notebook in your Google Drive before you can edit the homework files. You can do so with **File → Save a copy in Drive**.

In [1]:
#@title mount your Google Drive
#@markdown Your work will be stored in a folder called `cs285_f2023` by default to prevent Colab instance timeouts from deleting your edits.

import os
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
#@title set up mount symlink

DRIVE_PATH = '/content/gdrive/My\ Drive/cs285_f2023'
DRIVE_PYTHON_PATH = DRIVE_PATH.replace('\\', '')
if not os.path.exists(DRIVE_PYTHON_PATH):
  %mkdir $DRIVE_PATH

## the space in `My Drive` causes some issues,
## make a symlink to avoid this
SYM_PATH = '/content/cs285_f2023'
if not os.path.exists(SYM_PATH):
  !ln -s $DRIVE_PATH $SYM_PATH
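
The same directory-plus-symlink step can also be done in pure Python, which sidesteps the shell escaping of the space in `My Drive`. The following is a minimal sketch, not part of the original notebook; the helper name `ensure_symlink` is hypothetical.

```python
import os


def ensure_symlink(target, link):
    """Create `link` pointing at `target` unless `link` already exists.

    Returns True if a new symlink was created, False otherwise.
    `ensure_symlink` is a hypothetical helper, not part of the homework code.
    """
    if os.path.lexists(link):
        # Something (a symlink, file, or directory) is already at `link`.
        return False
    # Make sure the target directory exists before linking to it.
    os.makedirs(target, exist_ok=True)
    os.symlink(target, link)
    return True


# Example call with the paths used in this notebook (no backslash needed):
# ensure_symlink('/content/gdrive/My Drive/cs285_f2023', '/content/cs285_f2023')
```

Because Python passes the path as a single string, no backslash before the space is required.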

In [3]:
#@title apt install requirements

#@markdown Run each section with Shift+Enter

#@markdown Double-click on section headers to show code.

!apt update
!apt install -y --no-install-recommends \
        build-essential \
        curl \
        git \
        gnupg2 \
        make \
        cmake \
        ffmpeg \
        swig \
        libz-dev \
        unzip \
        zlib1g-dev \
        libglfw3 \
        libglfw3-dev \
        libxrandr2 \
        libxinerama-dev \
        libxi6 \
        libxcursor-dev \
        libgl1-mesa-dev \
        libgl1-mesa-glx \
        libglew-dev \
        libosmesa6-dev \
        lsb-release \
        ack-grep \
        patchelf \
        wget \
        xpra \
        xserver-xorg-dev
!apt-get install python3-opengl -y
!apt install xvfb -y

[33m0% [Working][0m            Hit:1 http://security.ubuntu.com/ubuntu jammy-security InRelease
[33m0% [Connecting to archive.ubuntu.com (185.125.190.81)] [Connecting to cloud.r-project.org] [Connecti[0m                                                                                                    Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
[33m0% [Waiting for headers] [Connecting to cloud.r-project.org] [Connected to r2u.stat.illinois.edu (19[0m[33m0% [Waiting for headers] [Connecting to cloud.r-project.org] [Waiting for headers] [Connected to ppa[0m                                                                                                    Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
[33m0% [Waiting for headers] [Connecting to cloud.r-project.org] [Waiting for headers] [Connected to ppa[0m                                                                                                    Ign:4 

In [4]:
#@title clone homework repo

%cd $SYM_PATH
!git clone https://github.com/berkeleydeeprlcourse/homework_fall2023.git
%cd homework_fall2023/hw2
%pip install -r requirements.txt
%pip install -e .
%pip install tensorboard

/content/gdrive/My Drive/cs285_f2023
fatal: destination path 'homework_fall2023' already exists and is not an empty directory.
/content/gdrive/My Drive/cs285_f2023/homework_fall2023/hw2
Obtaining file:///content/gdrive/MyDrive/cs285_f2023/homework_fall2023/hw2
  Preparing metadata (setup.py) ... done
Installing collected packages: cs285
  Attempting uninstall: cs285
    Found existing installation: cs285 0.1.0
    Uninstalling cs285-0.1.0:
      Successfully uninstalled cs285-0.1.0
  Running setup.py develop for cs285
Successfully installed cs285-0.1.0


## Editing Code

To edit code, click the folder icon in the left menu and navigate to the corresponding file (`cs285_f2023/...`). Double-click a file to open it in an editor. An active Colab instance times out after about 12 hours (sooner if you close your browser window). Your edits are synced to Google Drive, so you won't lose work when an instance times out, but you will need to re-mount your Google Drive and re-install packages on every new instance.

## Run Policy Gradients

In [5]:
#@title imports

import os
import time

%load_ext autoreload
%autoreload 2

# Experiment 1: CartPole-v0

In [None]:
!python cs285/scripts/run_hw2.py --env_name CartPole-v0 -n 100 -b 1000 \
-na --exp_name q1_sb_no_rtg_na

!python cs285/scripts/run_hw2.py --env_name CartPole-v0 -n 100 -b 1000 \
-rtg -na --exp_name q1_sb_rtg_na

!python cs285/scripts/run_hw2.py --env_name CartPole-v0 -n 100 -b 1000 \
-rtg --exp_name q1_sb_rtg_no_na

In [None]:
!python cs285/scripts/run_hw2.py --env_name CartPole-v0 -n 100 -b 5000 -na --exp_name q1_lb_no_rtg_na
!python cs285/scripts/run_hw2.py --env_name CartPole-v0 -n 100 -b 5000 -rtg -na --exp_name q1_lb_rtg_na
!python cs285/scripts/run_hw2.py --env_name CartPole-v0 -n 100 -b 5000 -rtg --exp_name q1_lb_rtg_no_na

# Experiment 2: InvertedPendulum-v4

Your task is to find the smallest batch size b* and the largest learning rate r* that reach the optimum
(the maximum score of 1000) in fewer than 100 iterations.
Then substitute b* and r* into the command below.

In [None]:
!python cs285/scripts/run_hw2.py --env_name InvertedPendulum-v4 \
--ep_len 1000 --discount 0.9 -n 100 -l 2 -s 64 -b <b*> -lr <r*> -rtg \
--exp_name q2_b<b*>_r<r*>
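
When trying several candidate values, it can help to build the command string from Python rather than editing the placeholders by hand. The sketch below assumes a hypothetical helper `q2_command`; the values passed in the example are arbitrary placeholders, not suggested answers.

```python
def q2_command(batch_size, learning_rate):
    """Fill the Experiment 2 command template with concrete values.

    `q2_command` is a hypothetical helper; all flags are copied from the
    command above, only the two placeholders are substituted.
    """
    return (
        "python cs285/scripts/run_hw2.py --env_name InvertedPendulum-v4 "
        "--ep_len 1000 --discount 0.9 -n 100 -l 2 -s 64 "
        f"-b {batch_size} -lr {learning_rate} -rtg "
        f"--exp_name q2_b{batch_size}_r{learning_rate}"
    )


# Arbitrary placeholder values, chosen only to show the substitution.
cmd = q2_command(1000, 0.01)
print(cmd)
```

In a notebook cell, the resulting string can then be executed with `!{cmd}`.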

# Experiment 3: LunarLanderContinuous-v2

In [None]:
!python cs285/scripts/run_hw2.py \
--env_name LunarLanderContinuous-v2 --ep_len 1000 \
--discount 0.99 -n 100 -l 2 -s 64 -b 40000 -lr 0.005 \
-rtg --use_baseline --exp_name q3_b40000_r0.005

# Experiment 4: HalfCheetah-v4

You will use your policy gradient implementation to learn a controller
for the HalfCheetah-v4 benchmark environment with an episode length of 150. This is shorter than the default
episode length (1000), which speeds up training significantly. Search over batch sizes b ∈ [10000, 30000, 50000]
and learning rates r ∈ [0.005, 0.01, 0.02], substituting each pair for `<b>` and `<r>` below.

Hint: You need to run the following command 9 times to find the best curve!

In [None]:
!python cs285/scripts/run_hw2.py --env_name HalfCheetah-v4 --ep_len 150 \
--discount 0.95 -n 100 -l 2 -s 32 -b <b> -lr <r> -rtg --use_baseline \
--exp_name q4_search_b<b>_lr<r>_rtg_nnbaseline
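
The nine runs in the search correspond to the Cartesian product of the three batch sizes and three learning rates listed above. As a rough sketch (the function name `q4_search_commands` is hypothetical), the commands could be generated like this:

```python
from itertools import product

# Search grid taken from the text above.
BATCH_SIZES = [10000, 30000, 50000]
LEARNING_RATES = [0.005, 0.01, 0.02]


def q4_search_commands():
    """Return one command string per (batch size, learning rate) pair."""
    commands = []
    for b, lr in product(BATCH_SIZES, LEARNING_RATES):
        commands.append(
            "python cs285/scripts/run_hw2.py --env_name HalfCheetah-v4 --ep_len 150 "
            f"--discount 0.95 -n 100 -l 2 -s 32 -b {b} -lr {lr} -rtg --use_baseline "
            f"--exp_name q4_search_b{b}_lr{lr}_rtg_nnbaseline"
        )
    return commands


for cmd in q4_search_commands():
    print(cmd)
```

Each printed line can be pasted into a cell with a leading `!`, or run from a loop in a notebook cell with `!{cmd}`.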

Once you’ve found optimal values b* and r*, use them to run the following commands (replace the terms in
angle brackets):

In [None]:
!python cs285/scripts/run_hw2.py --env_name HalfCheetah-v4 --ep_len 150 \
--discount 0.95 -n 100 -l 2 -s 32 -b <b*> -lr <r*> \
--exp_name q4_b<b*>_r<r*>

!python cs285/scripts/run_hw2.py --env_name HalfCheetah-v4 --ep_len 150 \
--discount 0.95 -n 100 -l 2 -s 32 -b <b*> -lr <r*> -rtg \
--exp_name q4_b<b*>_r<r*>_rtg

!python cs285/scripts/run_hw2.py --env_name HalfCheetah-v4 --ep_len 150 \
--discount 0.95 -n 100 -l 2 -s 32 -b <b*> -lr <r*> --use_baseline \
--exp_name q4_b<b*>_r<r*>_nnbaseline

!python cs285/scripts/run_hw2.py --env_name HalfCheetah-v4 --ep_len 150 \
--discount 0.95 -n 100 -l 2 -s 32 -b <b*> -lr <r*> -rtg --use_baseline \
--exp_name q4_b<b*>_r<r*>_rtg_nnbaseline
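
The four commands above differ only in whether `-rtg` and `--use_baseline` are passed and in the matching suffix of the experiment name. A minimal sketch that generates them from one template follows; `q4_final_commands` is a hypothetical helper and the values in the example call are arbitrary placeholders, not the answer to the search.

```python
def q4_final_commands(b_star, r_star):
    """Build the four ablation commands for the chosen b* and r*."""
    # (extra flags, experiment-name suffix) for each variant shown above.
    variants = [
        ([], ""),
        (["-rtg"], "_rtg"),
        (["--use_baseline"], "_nnbaseline"),
        (["-rtg", "--use_baseline"], "_rtg_nnbaseline"),
    ]
    commands = []
    for flags, suffix in variants:
        parts = [
            "python cs285/scripts/run_hw2.py",
            "--env_name HalfCheetah-v4 --ep_len 150",
            "--discount 0.95 -n 100 -l 2 -s 32",
            f"-b {b_star} -lr {r_star}",
            *flags,
            f"--exp_name q4_b{b_star}_r{r_star}{suffix}",
        ]
        commands.append(" ".join(parts))
    return commands


# Placeholder values only; substitute the b* and r* found in your own search.
for cmd in q4_final_commands(30000, 0.01):
    print(cmd)
```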

# Visualize Results

In [None]:
#@markdown You can visualize your runs with TensorBoard from within the notebook.

%load_ext tensorboard
%tensorboard --logdir /content/cs285_f2023/homework_fall2023/hw2/data
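
If TensorBoard shows no runs, a quick check is to list what is in the log directory. The sketch below assumes each run writes into its own sub-directory of `data`; that layout is an assumption about the training script, and `list_runs` is a hypothetical helper.

```python
import os


def list_runs(logdir):
    """Return the sorted sub-directory names of `logdir` ([] if it is missing)."""
    if not os.path.isdir(logdir):
        return []
    return sorted(
        name
        for name in os.listdir(logdir)
        if os.path.isdir(os.path.join(logdir, name))
    )


# Same path as the --logdir argument above.
print(list_runs('/content/cs285_f2023/homework_fall2023/hw2/data'))
```

An empty list suggests either that no experiment has finished writing logs yet or that the symlink from the setup section is missing in the current instance.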