# Model Training and Evaluation using stable_baselines3

In this tutorial, we will learn how to train and evaluate reinforcement learning models using the `stable_baselines3` library, which is built on top of PyTorch. We will cover the following steps:

1. Installing stable_baselines3
2. Importing necessary libraries
3. Creating a custom gym environment
4. Training the model
5. Evaluating the model
6. Saving and loading the model
7. Practical application
8. Next steps

## Step 1: Installing stable_baselines3

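To install stable_baselines3, run the following command in a notebook cell. The package is published on PyPI as `stable-baselines3`. Note that the code in this tutorial uses the classic `gym` API, in which `reset()` returns only the observation and `step()` returns four values. stable_baselines3 2.0 and later are built on `gymnasium`, where those signatures differ, so either pin an older release or adapt the calls accordingly.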

In [None]:
!pip install stable-baselines3

## Step 2: Importing necessary libraries

First, we need to import the necessary libraries for our tutorial.

In [None]:
import gym
import numpy as np
from stable_baselines3 import PPO

## Step 3: Creating a custom gym environment

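Rather than writing an environment from scratch, this tutorial uses the built-in `CartPole-v1` environment from OpenAI Gym. It is a classic control problem: the agent pushes a cart left or right to keep a pole balanced upright, earns a reward of 1 for every timestep the pole stays up, and each episode is capped at 500 steps.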

In [None]:
env = gym.make('CartPole-v1')
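To make the environment interface concrete, the sketch below implements the same `reset()`/`step()` contract that the training and evaluation code relies on. It is a toy written in plain Python under stated assumptions: `LineWalkEnv`, `run_episode`, and their parameters are hypothetical names, the class follows the classic gym convention (`step()` returns `(obs, reward, done, info)`), and it deliberately does not subclass `gym.Env` or define observation and action spaces. It illustrates the interaction loop, not something stable_baselines3 could train on directly.

```python
class LineWalkEnv:
    """Toy environment: walk along a line from position 0 to `goal`.

    Hypothetical example, not part of gym. Actions: 0 = left, 1 = right.
    """

    def __init__(self, goal=5, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps_taken = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.position = 0
        self.steps_taken = 0
        return self.position

    def step(self, action):
        # Move one cell; the wall at position 0 blocks further movement left.
        if action == 1:
            self.position += 1
        else:
            self.position = max(0, self.position - 1)
        self.steps_taken += 1

        reached_goal = self.position == self.goal
        reward = 1.0 if reached_goal else 0.0
        # The episode ends at the goal or when the step budget runs out.
        done = reached_goal or self.steps_taken >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode with `policy(obs) -> action`; return the total reward."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total_reward += reward
    return total_reward


toy_env = LineWalkEnv()
always_right = run_episode(toy_env, lambda obs: 1)  # reaches the goal
always_left = run_episode(toy_env, lambda obs: 0)   # runs out of steps
print(always_right, always_left)
```

The `while not done` loop in `run_episode` has the same shape as the evaluation loop used later in this tutorial, with a plain function standing in for `model.predict`.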

## Step 4: Training the model

Now, we will create a PPO agent and train it on the `CartPole-v1` environment. We'll use the default hyperparameters provided by stable_baselines3.

In [None]:
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=50000)
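As a rough guide to what `total_timesteps=50000` means in practice, the arithmetic below estimates how many rollouts and gradient steps this run performs. It assumes a single environment and the PPO defaults documented for stable_baselines3 (`n_steps=2048`, `batch_size=64`, `n_epochs=10`); check these against your installed version. Because PPO collects whole rollouts, training stops at the first multiple of the rollout size at or above the requested budget, which is why the training log can end slightly past 50,000 timesteps. The helper name `ppo_training_budget` is made up for this illustration.

```python
import math


def ppo_training_budget(total_timesteps, n_steps=2048, n_envs=1,
                        batch_size=64, n_epochs=10):
    """Estimate rollouts and gradient steps for a PPO run (illustrative helper)."""
    rollout_size = n_steps * n_envs
    # Whole rollouts are collected until the timestep budget is met.
    num_rollouts = math.ceil(total_timesteps / rollout_size)
    actual_timesteps = num_rollouts * rollout_size
    # Each rollout is split into minibatches and reused for n_epochs passes.
    minibatches_per_epoch = rollout_size // batch_size
    gradient_steps = num_rollouts * n_epochs * minibatches_per_epoch
    return num_rollouts, actual_timesteps, gradient_steps


rollouts, timesteps, grad_steps = ppo_training_budget(50_000)
print(rollouts, timesteps, grad_steps)  # 25 51200 8000
```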

## Step 5: Evaluating the model

Once the model is trained, we can evaluate its performance by running it in the environment and observing the total reward it receives over multiple episodes.

In [None]:
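def evaluate(model, env, num_episodes=100):
    """Run the model for num_episodes and return the mean and std of episode rewards."""
    episode_rewards = []
    for _ in range(num_episodes):
        obs = env.reset()  # classic gym API: reset() returns only the observation
        done = False
        episode_reward = 0.0
        while not done:
            # deterministic=True picks the most likely action instead of sampling
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            episode_reward += reward
        episode_rewards.append(episode_reward)
    return np.mean(episode_rewards), np.std(episode_rewards)

mean_reward, std_reward = evaluate(model, env)
print(f'Mean reward: {mean_reward:.2f}, Standard deviation: {std_reward:.2f}')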
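When reading these two numbers, it helps to know what they measure. `np.std` computes the population standard deviation by default (`ddof=0`), which describes the spread between episodes, not the uncertainty in the mean. The sketch below reproduces the same statistics with the standard library and adds a standard error of the mean as one possible way to gauge that uncertainty. The reward values are made up for illustration and are not results from this tutorial, and `summarize_returns` is a hypothetical helper.

```python
import math
import statistics


def summarize_returns(episode_rewards):
    """Return (mean, population std, standard error of the mean)."""
    mean = statistics.fmean(episode_rewards)
    # pstdev divides by N, matching np.std's default (ddof=0).
    spread = statistics.pstdev(episode_rewards)
    # The standard error uses the sample std, which divides by N - 1.
    std_error = statistics.stdev(episode_rewards) / math.sqrt(len(episode_rewards))
    return mean, spread, std_error


# Made-up episode rewards, for illustration only.
example_rewards = [500.0, 480.0, 500.0, 350.0, 500.0]
mean, spread, std_error = summarize_returns(example_rewards)
print(f"mean={mean:.1f}, std={spread:.1f}, standard error={std_error:.1f}")
```

A single poor episode (350 here) widens the standard deviation noticeably, so evaluating over more episodes gives a steadier estimate of the mean.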

## Step 6: Saving and loading the model

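We can save the trained model to a file and load it later for reuse or further training. `save` writes a zip archive (`ppo_cartpole.zip` here), and `PPO.load` restores the model from it. A model loaded without an environment can still call `predict`; to continue training, pass an environment when loading, for example `PPO.load('ppo_cartpole', env=env)`.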

In [None]:
model.save('ppo_cartpole')
loaded_model = PPO.load('ppo_cartpole')

## Step 7: Practical application

The workflow above carries over to more practical tasks, such as controlling a robotic arm, playing a game, or optimizing a system, although each requires its own environment and a model trained on it. In this example, we will visualize the trained model controlling the CartPole environment by rendering each frame in the notebook.

In [None]:
from IPython import display
import matplotlib.pyplot as plt

def visualize_agent(model, num_episodes=1):
    """Render the agent frame by frame inside the notebook."""
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            # Draw the current frame; it is replaced once the next one is ready
            plt.imshow(env.render(mode='rgb_array'))
            display.display(plt.gcf())
            display.clear_output(wait=True)
            action, _ = model.predict(obs)
            obs, _, done, _ = env.step(action)
    # Close the environment once, after all episodes have finished
    env.close()

visualize_agent(loaded_model)

## Step 8: Next steps

Congratulations! You have successfully trained and evaluated a reinforcement learning model using stable_baselines3. To further your knowledge, consider exploring the following topics:

- Experiment with other environments and algorithms in stable_baselines3
- Learn how to create custom gym environments for your specific problems
- Fine-tune the hyperparameters of the model to improve performance
- Learn about other reinforcement learning libraries and frameworks