# Environments in Stable-Baselines3

In this tutorial, we will discuss environments in Stable-Baselines3, a popular reinforcement learning library. We will explore how to create, use, and customize environments for training and evaluating reinforcement learning models.

## Setup

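First, install Stable-Baselines3 and Gymnasium if you haven't already. Since version 2.0, Stable-Baselines3 uses Gymnasium, the maintained fork of OpenAI Gym, instead of the original `gym` package.

In [None]:
!pip install stable-baselines3 gymnasium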

## Creating an Environment

Stable-Baselines3 works with environments that follow the Gymnasium API. The `make()` function creates an environment instance from its registered ID. Importing `gymnasium` under the alias `gym` keeps the familiar name. Let's create an instance of the CartPole environment.

In [None]:
import gymnasium as gym

env = gym.make('CartPole-v1')
print(env)

## Interacting with the Environment

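We interact with the environment through two methods. `reset()` starts a new episode and returns the initial observation together with an `info` dictionary. `step(action)` applies an action and returns five values: the next observation, the reward, `terminated` (the episode reached a terminal state, e.g. the pole fell), `truncated` (the episode was cut off, e.g. by a time limit), and `info`.

In [None]:
observation, info = env.reset(seed=42)
print('Initial observation:', observation)

# Sample a random action from the action space
action = env.action_space.sample()
observation, reward, terminated, truncated, info = env.step(action)
print('Next observation:', observation)
print('Reward:', reward)
print('Terminated:', terminated)
print('Truncated:', truncated)
print('Info:', info)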

## Customizing Environments

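To create a custom environment, subclass `gym.Env` and implement `__init__`, `reset`, and `step`; `render` and `close` are optional. In `__init__`, define `self.action_space` and `self.observation_space` using Gymnasium's `Space` classes, such as `Discrete`, `Box`, `MultiDiscrete`, or `Dict` (Stable-Baselines3 does not support `Tuple` observation spaces). `reset(seed=None, options=None)` must return `(observation, info)`, and `step(action)` must return `(observation, reward, terminated, truncated, info)`.

In [None]:
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class CustomEnvironment(gym.Env):
    metadata = {"render_modes": ["human"]}

    def __init__(self, render_mode=None):
        super().__init__()
        self.render_mode = render_mode
        # Two features, each in [0, 100], stored as float32
        self.observation_space = spaces.Box(low=0, high=100, shape=(2,), dtype=np.float32)
        # Three discrete actions: 0, 1 and 2
        self.action_space = spaces.Discrete(3)

    def step(self, action):
        # Implement your environment's step logic here
        observation = np.zeros(2, dtype=np.float32)
        reward = 0.0
        terminated = False  # True when the episode reaches a terminal state
        truncated = False   # True when the episode is cut off, e.g. by a time limit
        info = {}
        return observation, reward, terminated, truncated, info

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        # Implement your environment's reset logic here
        observation = np.zeros(2, dtype=np.float32)
        info = {}
        return observation, info

    def render(self):
        # Optional: implement rendering for self.render_mode here
        pass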

## Training with Stable-Baselines3

Once you have an environment, you can pass it to a Stable-Baselines3 algorithm to train an agent. Here's an example that trains PPO (Proximal Policy Optimization) on the CartPole environment.

In [None]:
from stable_baselines3 import PPO

# 'MlpPolicy' uses a multilayer perceptron, suitable for vector observations like CartPole's
model = PPO('MlpPolicy', env, verbose=1)
# Train for 10,000 environment steps
model.learn(total_timesteps=10000)

## Evaluating the Trained Model

After training, we can evaluate the agent by running its policy in the environment and summing the rewards collected in each episode.

In [None]:
num_episodes = 5

for episode in range(1, num_episodes + 1):
    obs, info = env.reset()
    done = False
    episode_reward = 0.0

    while not done:
        # deterministic=True picks the most likely action instead of sampling one
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
        # The episode ends when it terminates or is truncated by the time limit
        done = terminated or truncated

    print(f'Episode {episode}: Reward = {episode_reward}')

## Next Steps

Now that you know how to create, use, and customize environments in Stable-Baselines3, you can explore other algorithms, tune hyperparameters, and build more complex custom environments. The Stable-Baselines3 documentation covers more advanced topics and best practices.