# Setting up the `BipedalWalker-v3` environment

The `gym` environment `BipedalWalker-v3` simulates a bipedal robot standing on an uneven surface. The goal is to teach the robot how to walk.

![BipedalWalker-v3](bw.png)

In the previous lesson, I showed you how to set up the `CartPole-v1` environment. Now, you are going to follow the same steps to set up the `BipedalWalker-v3` environment.

Ready? Let's go!

In [1]:
import gym
import numpy as np
from pathlib import Path

# !pip install Box2D
from stable_baselines3 import PPO  # DQN is not imported: it only supports discrete action spaces
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.monitor import Monitor 

In [2]:
# Create a folder to save trained models in
directory_path = 'models'
Path(directory_path).mkdir(parents=True, exist_ok=True)

# Create the BipedalWalker-v3 environment and store it in a variable called env
env_name = 'BipedalWalker-v3'
env = gym.make(env_name)

# Number of training timesteps. 50 is only a quick smoke test; the digits that
# were commented out here suggest 5_000_000 as the value for a full training run.
num_steps = 50
model_file_name = Path(directory_path, env_name + '_' + str(num_steps))
print(env.action_space)
print(env.observation_space)

Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
Box([-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf], [inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf], (24,), float32)
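
The first `Box` is the action space: four floats, each between -1 and 1 (in this environment they are understood to drive the hip and knee joints of the two legs). The second is the observation space: 24 unbounded floats. As a rough illustration of what sampling a random action amounts to, here is a plain-Python sketch. `sample_action` and `is_valid_action` are hypothetical helpers written for this example, not part of `gym`; in the notebook itself you would simply call `env.action_space.sample()`.

```python
import random

ACTION_DIM = 4          # matches the (4,) shape printed above
LOW, HIGH = -1.0, 1.0   # matches the printed bounds

def sample_action(rng):
    """Draw one random action: four values uniformly from [LOW, HIGH]."""
    return [rng.uniform(LOW, HIGH) for _ in range(ACTION_DIM)]

def is_valid_action(action):
    """Check that an action has the right length and stays within bounds."""
    return len(action) == ACTION_DIM and all(LOW <= a <= HIGH for a in action)

rng = random.Random(0)  # seeded so the example is repeatable
action = sample_action(rng)
print(action)
print(is_valid_action(action))                # a sampled action is always valid
print(is_valid_action([0.0, 2.0, 0.0, 0.0]))  # 2.0 is outside the bounds
```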


In [3]:
# Run a few episodes with random actions to get a baseline score
def simulate_random_actions(render=False):
    episodes = 9
    all_rewards = []
    for episode in range(1, episodes + 1):
        state = env.reset()  # Restart the agent at the beginning
        done = False  # Becomes True once the episode has ended
        score = 0  # Named score rather than return, because return is a Python keyword
        while not done:
            if render:
                env.render()
            random_action = env.action_space.sample()  # Pick a random action
            _, reward, done, _ = env.step(random_action)
            score += reward
        all_rewards.append(score)
        print(f'Episode n= {episode}, score= {score}')
    env.close()
    print(f"Mean reward:{np.mean(all_rewards)} Num episodes:{episodes}")
simulate_random_actions()

Episode n= 1, score= -83.79130894064167
Episode n= 2, score= -101.9939853315934
Episode n= 3, score= -108.91581433144042
Episode n= 4, score= -113.7671551550962
Episode n= 5, score= -106.72043106108593
Episode n= 6, score= -108.14981128230225
Episode n= 7, score= -79.92826934795367
Episode n= 8, score= -79.0114705365611
Episode n= 9, score= -116.3754665541115
Mean reward:-99.85041250453179 Num episodes:9
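
These random-action scores make a useful baseline for judging the trained agent later. The sketch below summarises them in plain Python. The numbers are copied from the output above, and the variable names are my own choice for this example.

```python
import statistics

# Episode scores copied from the random-action run above
random_scores = [
    -83.79130894064167,
    -101.9939853315934,
    -108.91581433144042,
    -113.7671551550962,
    -106.72043106108593,
    -108.14981128230225,
    -79.92826934795367,
    -79.0114705365611,
    -116.3754665541115,
]

baseline_mean = statistics.mean(random_scores)
baseline_std = statistics.pstdev(random_scores)  # population std, like np.std

print(f"episodes: {len(random_scores)}")
print(f"mean: {baseline_mean:.2f}, std: {baseline_std:.2f}")
print(f"best: {max(random_scores):.2f}, worst: {min(random_scores):.2f}")
```

A trained agent should eventually score well above this mean; anything close to it behaves no better than random.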


In [4]:
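# Wrap a fresh environment for Stable-Baselines3; Monitor records episode rewards for evaluation
env = DummyVecEnv([lambda: Monitor(gym.make(env_name))])
model = PPO(policy='MlpPolicy', env=env)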

In [5]:
# Train the agent
model.learn(total_timesteps = num_steps)

<stable_baselines3.ppo.ppo.PPO at 0x7ff90b1155e0>
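
A note on `total_timesteps`: as far as I understand it, PPO in Stable-Baselines3 collects experience in whole rollouts of `n_steps` steps per environment (2048 by default) and checks the step budget only between rollouts. Under that assumption, even a very small `total_timesteps` still runs one full rollout. The helper below, `actual_timesteps`, is a hypothetical function written for this sketch, not a library call.

```python
import math

def actual_timesteps(total_timesteps, n_steps=2048, n_envs=1):
    """Estimate how many environment steps PPO really runs.

    Assumes training proceeds in whole rollouts of n_steps * n_envs steps
    and stops after the first rollout that reaches total_timesteps.
    """
    rollout_size = n_steps * n_envs
    return math.ceil(total_timesteps / rollout_size) * rollout_size

print(actual_timesteps(50))         # 2048
print(actual_timesteps(5_000_000))  # 5001216
```

Either way, a few thousand steps is far too little for the walker to learn anything useful, so expect the agent to need a much larger budget.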

In [6]:
model.save(model_file_name)

In [7]:
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")



mean_reward:-92.04 +/- 0.07
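
Roughly speaking, `evaluate_policy` runs the policy for `n_eval_episodes` episodes, adds up the rewards within each episode, and reports the mean and standard deviation of those totals. The sketch below mimics that idea in plain Python. `ToyEnv` and `evaluate` are hypothetical stand-ins made up for this example; the real function works on vectorized environments and handles details that are left out here.

```python
import statistics

class ToyEnv:
    """Hypothetical stand-in for a gym environment (old 4-tuple step API)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = -abs(action)  # larger actions are penalised
        done = self.t >= self.episode_length
        return float(self.t), reward, done, {}

def evaluate(policy, env, n_eval_episodes=10):
    """Return the mean and population std of the per-episode reward totals."""
    episode_totals = []
    for _ in range(n_eval_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        episode_totals.append(total)
    return statistics.mean(episode_totals), statistics.pstdev(episode_totals)

# A constant policy: every episode is identical, so the std is zero
mean_reward, std_reward = evaluate(lambda obs: 0.5, ToyEnv(), n_eval_episodes=10)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")
```

In the notebook, the mean of -92.04 is only slightly above the random-action mean of about -99.85, which is not surprising after such a short training run. The very small standard deviation suggests the agent does nearly the same thing in every episode.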


In [8]:
# Watch the trained agent act in the environment for 500 steps
env = gym.make(env_name)
obs = env.reset()
for _ in range(500):
    env.render()
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:  # Start a new episode once the current one has ended
        obs = env.reset()
env.close()



Do you see a pop-up window showing a bipedal robot standing on a grassy surface? If yes, then congratulations! You have successfully set up and started the `BipedalWalker-v3` environment.

You can close the window by calling `env.close()`.

If you are feeling brave, you can try setting up and visualizing other environments in the [Box2D](https://gym.openai.com/envs/#box2d) and [Classic Control](https://gym.openai.com/envs/#classic_control) sections.