Last time, we looked at an example reinforcement learning problem that balanced an object in space, following [this tutorial](https://youtu.be/cO5g5qLrLSo).

In this lab, you will choose a reinforcement learning problem to explore. Here are some suggestions for problems that you can investigate.

1. Autonomous driving: Building a racing car with reinforcement learning. [Tutorial](https://youtu.be/Mut_u40Sqz4?t=6020), [code on github](https://github.com/nicknochnack/ReinforcementLearningCourse/blob/main/Project%202%20-%20Self%20Driving.ipynb). 
2. Custom environments: Shower environment to get the temperature right every time. [Tutorial](https://youtu.be/Mut_u40Sqz4?t=6020), [code on github](https://github.com/nicknochnack/ReinforcementLearningCourse/blob/main/Project%203%20-%20Custom%20Environment.ipynb).
3. Some other options to check out later if interested:
    1. Solving the Lunar Landing problem with a Stable Baselines algorithm: [tutorial](https://youtu.be/nRHjymV2PX8), [code on github](https://github.com/nicknochnack/StableBaselinesRL). Note that ACER is only available in an older version of `stable_baselines` that is compatible with TensorFlow 1.5, and that TensorFlow version is not available on edStem.
    2. Datasets for Deep Data-Driven Reinforcement Learning (D4RL): [environments description](https://sites.google.com/view/d4rl/home), [code on github](https://github.com/rail-berkeley/d4rl).

**Note:** This is a rough guide to the main steps of a reinforcement learning program. Please add more sections as your implementation requires, with comments describing each section.

# Problem description
Describe, in the text cell below, the problem that you chose to solve with reinforcement learning.

The problem I decided to solve with reinforcement learning is autonomous driving, since my project group wants to focus on that topic.

# **Build an RL environment**

**1. Install packages**

Note: Please inform the TA of any additional packages that you need to install for the problem that you selected.

In [14]:
import os
# Install the RL libraries only when running outside Ed (ED_COURSE_ID is unset)
if not os.getenv("ED_COURSE_ID"):
    !pip install tensorflow stable_baselines3 torch gym box2d-py --user
# The racing car environment is built on top of Box2D, and building box2d-py
# requires SWIG. Install SWIG first (Windows download):
# https://sourceforge.net/projects/swig/files/swigwin/swigwin-4.0.2/swigwin-4.0.2.zip/download?use_mirror=ixpeering
!pip install gym[box2d] pyglet==1.3.2

Defaulting to user installation because normal site-packages is not writeable
Collecting pyglet==1.3.2
  Using cached pyglet-1.3.2-py2.py3-none-any.whl (1.0 MB)
Collecting box2d-py==2.3.5
  Using cached box2d-py-2.3.5.tar.gz (374 kB)
Collecting gym[box2d]
  Using cached gym-0.21.0.tar.gz (1.5 MB)
  Using cached gym-0.20.0.tar.gz (1.6 MB)
  Using cached gym-0.19.0.tar.gz (1.6 MB)
Collecting cloudpickle<1.7.0,>=1.2.0
  Using cached cloudpickle-1.6.0-py3-none-any.whl (23 kB)
Collecting box2d-py~=2.3.5
  Using cached box2d-py-2.3.8.tar.gz (374 kB)
Collecting gym[box2d]
  Using cached gym-0.18.3.tar.gz (1.6 MB)
  Using cached gym-0.18.0.tar.gz (1.6 MB)
  Using cached gym-0.17.3.tar.gz (1.6 MB)
  Using cached gym-0.17.2.tar.gz (1.6 MB)
  Using cached gym-0.17.1.tar.gz (1.6 MB)
  Using cached gym-0.17.0.tar.gz (1.6 MB)
  Using cached gym-0.16.0.tar.gz (1.6 MB)
  Using cached gym-0.15.7.tar.gz (1.6 MB)
  Using cached gym-0.15.6.tar.gz (1.6 MB)
  Using cached gym-0.15.4.t
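Before moving on, it can help to confirm that the packages installed above are actually importable. The sketch below uses only the standard library's `importlib.util.find_spec`. The module names listed (for example `Box2D` as the import name of the `box2d-py` package) are assumptions about each package's import name, so adjust them to your setup.

```python
import importlib.util

# Import names assumed for the packages installed above (pip name and module name can differ).
REQUIRED_MODULES = ["gym", "stable_baselines3", "torch", "Box2D", "pyglet"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules (tell the TA if you cannot install them):", missing)
    else:
        print("All required modules are importable.")
```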

**1.b Import packages**

In [15]:
# Add your code here to import all needed packages.
# Contact the instructor if you get an error.
import os

import gym  # OpenAI Gym, which provides the CarRacing environment
from stable_baselines3 import PPO  # Proximal Policy Optimization agent
from stable_baselines3.common.vec_env import VecFrameStack  # stacks consecutive frames
from stable_baselines3.common.evaluation import evaluate_policy  # scores a trained agent

**2. Create the environment**

In [16]:
# Create the CarRacing environment
environment_name = "CarRacing-v0"
env = gym.make(environment_name)

**3. Test the environment with random policy**

In [17]:
# Trigger Ed's X display
!xdpyinfo

# Add your code here to display the environment with random choice
episodes = 5
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0 
    
    while not done:
        env.render()
        action = env.action_space.sample()
        n_state, reward, done, info = env.step(action)
        score+=reward
    print('Episode:{} Score:{}'.format(episode, score))
env.close()
# this is our testing code!

name of display:    :1.0
version number:    11.0
vendor string:    The X.Org Foundation
vendor release number:    12009000
X.Org version: 1.20.9
maximum request size:  16777212 bytes
motion buffer size:  256
bitmap unit, bit order, padding:    32, LSBFirst, 32
image byte order:    LSBFirst
number of supported pixmap formats:    6
supported pixmap formats:
    depth 1, bits_per_pixel 1, scanline_pad 32
    depth 4, bits_per_pixel 8, scanline_pad 32
    depth 8, bits_per_pixel 8, scanline_pad 32
    depth 16, bits_per_pixel 16, scanline_pad 32
    depth 24, bits_per_pixel 32, scanline_pad 32
    depth 32, bits_per_pixel 32, scanline_pad 32
keycode range:    minimum 8, maximum 255
focus:  window 0x40020b, revert to PointerRoot
number of extensions:    23
    BIG-REQUESTS
    Composite
    DAMAGE
    DOUBLE-BUFFER
    GLX
    Generic Event Extension
    MIT-SCREEN-SAVER
    MIT-SHM
    Present
    RANDR
    RECORD
    RENDER
    SHAPE
    SYNC
    VNC-EXTE

In [18]:
env.close()
# closes the environment
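The random-policy test follows the standard Gym loop: `reset()` starts an episode, and each `step(action)` returns `(observation, reward, done, info)` until `done` is true. The sketch below reproduces that loop against a hypothetical toy environment (`ToyEnv` and its made-up reward are invented for illustration and are not part of Gym), so the control flow can be checked without a display.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment with the 4-tuple step() API."""

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation (just the step counter here)

    def sample_action(self):
        # Plays the role of env.action_space.sample(): steering, gas, brake.
        return [self.rng.uniform(-1, 1), self.rng.uniform(0, 1), self.rng.uniform(0, 1)]

    def step(self, action):
        self.t += 1
        _steer, gas, brake = action
        reward = gas - brake - 0.1  # made-up reward for illustration only
        done = self.t >= self.max_steps
        return float(self.t), reward, done, {}


def run_random_episodes(env, episodes=5):
    """Play episodes with random actions and return each episode's total reward."""
    scores = []
    for episode in range(1, episodes + 1):
        env.reset()
        done = False
        score = 0.0
        while not done:
            action = env.sample_action()
            _obs, reward, done, _info = env.step(action)
            score += reward
        print(f"Episode:{episode} Score:{score:.2f}")
        scores.append(score)
    return scores


if __name__ == "__main__":
    run_random_episodes(ToyEnv(seed=0))
```

Swapping `ToyEnv` for the real `env` (and `env.sample_action()` for `env.action_space.sample()`) gives back the structure of the cell above.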

In [19]:
env.action_space.sample()

array([0.7407547 , 0.9964477 , 0.78228635], dtype=float32)
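The sampled action is a 3-element vector. In Gym's `CarRacing` source, the action space is a `Box` with bounds `[-1, 0, 0]` to `[1, 1, 1]`, ordered as steering (negative is left, positive is right), gas, and brake. The helper below (a hypothetical name, not part of Gym) labels and clips such a vector, which is a quick sanity check when inspecting what an agent outputs.

```python
# Bounds assumed from Gym's CarRacing action space: (steering, gas, brake).
ACTION_NAMES = ("steering", "gas", "brake")
ACTION_LOW = (-1.0, 0.0, 0.0)
ACTION_HIGH = (1.0, 1.0, 1.0)


def describe_action(action):
    """Clip each component to its bound and return a name -> value dict."""
    if len(action) != len(ACTION_NAMES):
        raise ValueError(f"expected {len(ACTION_NAMES)} components, got {len(action)}")
    clipped = [
        min(max(float(value), low), high)
        for value, low, high in zip(action, ACTION_LOW, ACTION_HIGH)
    ]
    return dict(zip(ACTION_NAMES, clipped))


if __name__ == "__main__":
    # The sample printed above: steering right, near-full gas, and braking at the same time.
    print(describe_action([0.7407547, 0.9964477, 0.78228635]))
```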

In [20]:
env.observation_space.sample()

array([[[194,  10,   5],
        [110,  97,  95],
        [162,  14, 175],
        ...,
        [ 30,  68, 182],
        [ 41, 243, 148],
        [116, 249, 246]],

       [[203, 204, 109],
        [104,  52,  38],
        [254,  71,  79],
        ...,
        [104, 124,  45],
        [ 67, 200,  70],
        [140, 135, 178]],

       [[105, 178, 200],
        [ 18, 250,  76],
        [184, 170,  25],
        ...,
        [224, 157, 148],
        [137,  34,  55],
        [135, 149,  69]],

       ...,

       [[ 33,  65,  39],
        [131,  41,  60],
        [ 21, 222, 157],
        ...,
        [249,   4,  34],
        [104, 220, 164],
        [235, 171,  45]],

       [[ 74,  63, 240],
        [ 75, 168, 157],
        [ 71, 211,  12],
        ...,
        [141,  84, 173],
        [ 34,  58,   4],
        [123, 137,  81]],

       [[244,   3, 134],
        [ 49,  16, 172],
        [145,  65, 207],
        ...,
        [179,  70,  19],
        [220, 206,  75],
        [  7, 124, 118]]
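Each observation is an RGB image stored as height × width × channel (HWC) `uint8` values in 0..255. For `CarRacing` the frame is 96 × 96 × 3. PyTorch convolution layers expect channels first, which is why Stable Baselines3 reports wrapping the env in a `VecTransposeImage` when the model is created. The pure-Python sketch below (the function name is hypothetical) shows the HWC to CHW reordering on a tiny nested-list image.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from [row][col][channel] to [channel][row][col]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[r][c][ch] for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


if __name__ == "__main__":
    # A 2 x 2 RGB image (values are arbitrary).
    tiny = [
        [[194, 10, 5], [110, 97, 95]],
        [[203, 204, 109], [104, 52, 38]],
    ]
    chw = hwc_to_chw(tiny)
    print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2 (channels, height, width)
    print(chw[0])  # [[194, 110], [203, 104]], the red channel

    # Values per CarRacing frame (96 x 96 x 3), the same before and after transposing.
    print(96 * 96 * 3)  # 27648
```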

# **Build and Train the Model**

**4. Build the training model**

In [21]:
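# Optional: stack consecutive frames so the agent can perceive motion.
# VecFrameStack wraps a vectorized env, so build a DummyVecEnv first.
# from stable_baselines3.common.vec_env import DummyVecEnv
# env = DummyVecEnv([lambda: gym.make(environment_name)])
# env = VecFrameStack(env, n_stack=4)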
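The commented-out cell refers to `VecFrameStack`, which feeds the agent the last few frames at once so that speed and direction can be inferred from a single observation. The sketch below is a simplified, hypothetical `FrameStacker` built on `collections.deque`, not Stable Baselines3's implementation. It concatenates HWC frames along the channel axis, and on reset it fills the history with copies of the first frame, a padding choice made here only to keep the sketch short.

```python
from collections import deque


class FrameStacker:
    """Hypothetical sketch: keep the last n HWC frames and concatenate them on the channel axis."""

    def __init__(self, n_stack=4):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)  # the oldest frame is dropped automatically

    def reset(self, first_frame):
        # Pad the history with copies of the first frame (a simplification for this sketch).
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(first_frame)
        return self._stacked()

    def step(self, new_frame):
        self.frames.append(new_frame)
        return self._stacked()

    def _stacked(self):
        height, width = len(self.frames[0]), len(self.frames[0][0])
        # For each pixel, concatenate the channels of every frame, oldest first.
        return [
            [
                [value for frame in self.frames for value in frame[r][c]]
                for c in range(width)
            ]
            for r in range(height)
        ]


if __name__ == "__main__":
    # 1 x 1 "images" with 3 channels each, so the stacked pixel is easy to read.
    stacker = FrameStacker(n_stack=2)
    print(stacker.reset([[[1, 1, 1]]]))  # [[[1, 1, 1, 1, 1, 1]]]
    print(stacker.step([[[2, 2, 2]]]))   # [[[1, 1, 1, 2, 2, 2]]]
```

With `n_stack=4`, a 96 × 96 × 3 CarRacing frame becomes a 96 × 96 × 12 observation.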

In [22]:
# Set up the agent and model.
# Directory where TensorBoard training logs are written
log_path = os.path.join('Training', 'Logs')
# PPO agent with a convolutional policy, since observations are images
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log=log_path)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
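`PPO` (Proximal Policy Optimization) limits how far each update can move the policy by clipping the probability ratio r = π_new(a|s) / π_old(a|s) inside its surrogate objective, min(r·A, clip(r, 1 − ε, 1 + ε)·A), where A is the advantage estimate. The per-sample function below is a minimal sketch of that formula only, not the full training step. The value ε = 0.2 is assumed because it is Stable Baselines3's default `clip_range`.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio     -- pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage -- advantage estimate A for that action
    epsilon   -- clip range (0.2 assumed, matching SB3's default clip_range)
    """
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Good action (A > 0): the gain stops growing once the ratio exceeds 1 + epsilon.
    print(ppo_clipped_objective(1.5, 1.0))   # 1.2
    # Bad action (A < 0): the pessimistic (smaller) term is kept.
    print(ppo_clipped_objective(0.5, -1.0))  # -0.8
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards pushing the ratio far outside [1 − ε, 1 + ε], which keeps updates conservative.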


In [0]:
model.learn(total_timesteps=70000)
# here we're training our model for 70000 steps

Track generation: 1040..1304 -> 264-tiles track
Logging to Training/Logs/PPO_1
Track generation: 1282..1607 -> 325-tiles track
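`total_timesteps` counts environment steps, not gradient updates. Assuming Stable Baselines3's PPO defaults (`n_steps=2048` steps per rollout per environment, `batch_size=64`, `n_epochs=10`) and the single environment created above, the sketch below estimates how much work `model.learn(total_timesteps=70000)` does. Because `learn()` collects whole rollouts, training runs slightly past the requested number of steps.

```python
import math


def ppo_training_budget(total_timesteps, n_steps=2048, n_envs=1, batch_size=64, n_epochs=10):
    """Estimate rollouts, environment steps and gradient steps for one PPO learn() call.

    Defaults assume Stable Baselines3's PPO defaults and a single environment.
    """
    steps_per_rollout = n_steps * n_envs
    rollouts = math.ceil(total_timesteps / steps_per_rollout)  # whole rollouts only
    minibatches_per_epoch = math.ceil(steps_per_rollout / batch_size)
    return {
        "rollouts": rollouts,
        "env_steps": rollouts * steps_per_rollout,
        "gradient_steps": rollouts * n_epochs * minibatches_per_epoch,
    }


if __name__ == "__main__":
    print(ppo_training_budget(70000))
    # {'rollouts': 35, 'env_steps': 71680, 'gradient_steps': 11200}
```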


# **Save, Reload and Evaluate the model**

**5. Save the model**

In [0]:
ppo_path = os.path.join('Training', 'Saved Models', 'PPO_Driving_model')

In [0]:
model.save(ppo_path)
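Stable Baselines3 stores a saved model as a zip archive and, as far as we understand its behaviour, adds a `.zip` suffix when the given path has none. The file on disk should therefore be `PPO_Driving_model.zip` inside `Training/Saved Models`. The hypothetical helper below uses `pathlib` to compute that expected filename and check whether it exists before reloading.

```python
import os
from pathlib import Path


def expected_model_file(save_path, suffix=".zip"):
    """Return the path a zip-based save is expected to produce (suffix added only if missing)."""
    path = Path(save_path)
    return path if path.suffix else path.with_suffix(suffix)


if __name__ == "__main__":
    ppo_path = os.path.join("Training", "Saved Models", "PPO_Driving_model")
    model_file = expected_model_file(ppo_path)
    print(model_file)
    print("exists:", model_file.exists())
```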

**6. Reload the model**

In [0]:
# Delete the in-memory model so the next cell shows that the
# saved copy on disk can be reloaded, evaluated and tested on its own.
del model

In [0]:
model = PPO.load(ppo_path, env=env)

**7. Display the environment**

In [0]:
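# Trigger Ed's X display
!xdpyinfo
# Watch the trained agent for 10 episodes and report the mean and
# standard deviation of the episode reward
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, render=True)
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")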
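`evaluate_policy` runs `n_eval_episodes` full episodes and, by default, returns the mean and standard deviation of the episode rewards. The standard-library sketch below (a hypothetical function name) shows that summary step on a list of per-episode returns. It uses the population standard deviation because SB3 computes the spread with `numpy.std`, whose default is the population form.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return statistics.mean(episode_returns), statistics.pstdev(episode_returns)


if __name__ == "__main__":
    # Illustrative numbers only, not results from this notebook.
    mean_reward, std_reward = summarize_returns([10.0, 20.0, 30.0, 40.0])
    print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```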

In [0]:
env.close()

In [0]:
# Watch one episode with the trained agent; stop when the episode ends
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    env.render()

In [0]:
env.close()