1. Stable Baselines3:

Stable Baselines3 is a set of high-quality implementations of reinforcement learning algorithms in PyTorch. The key aspects are:
   - Provides implementations for popular RL algorithms like PPO, A2C, DDPG, and SAC.
   - Offers a simple interface to train and evaluate agents.
   - Allows customizing policies and feature extractors.

In [None]:
!pip install stable-baselines3

2. BaseFeaturesExtractor:

The BaseFeaturesExtractor is an abstract class in Stable Baselines3 used for defining custom feature extractors for your environment observations. Key points include:
   - Inherits from `torch.nn.Module`.
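   - Requires subclasses to pass `features_dim` (the size of the output feature vector) to the constructor and to implement the `forward()` method, which maps a batch of observations to a tensor of shape `(batch_size, features_dim)`.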
   - Provides a basis for creating custom feature extractors tailored to specific environments.

In [None]:
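# Example code block for BaseFeaturesExtractor

import gymnasium as gym  # SB3 >= 2.0 uses Gymnasium; older releases use `import gym`
import torch
from torch import nn
from stable_baselines3.common.preprocessing import get_flattened_obs_dim
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class MyFeaturesExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space: gym.Space):
        # features_dim must equal the size of the tensor returned by forward()
        super().__init__(observation_space, features_dim=get_flattened_obs_dim(observation_space))
        self.flatten = nn.Flatten()

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        # Flatten each observation in the batch into a 1D feature vector
        return self.flatten(observations)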

3. CustomFeatureExtractor:

A CustomFeatureExtractor is a user-defined class derived from BaseFeaturesExtractor that processes environment-specific observations into suitable inputs for RL models. Important aspects are:
   - Inherits from `BaseFeaturesExtractor`.
   - Implements the `forward()` method to transform raw observations into features usable by RL models.
   - Can be integrated with existing Stable Baselines3 algorithms by passing it through `policy_kwargs` (`features_extractor_class`, plus optional `features_extractor_kwargs`) when creating the model.

In [None]:
# Example code block for CustomFeatureExtractor

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

# Assume MyFeaturesExtractor is defined as before
class CustomPolicy(ActorCriticPolicy):
    def __init__(self, *args, **kwargs):
        # MyFeaturesExtractor only takes observation_space, so no
        # features_extractor_kwargs are passed here
        kwargs.setdefault("features_extractor_class", MyFeaturesExtractor)
        super().__init__(*args, **kwargs)

# CartPole-v1 is only an example environment
env = gym.make("CartPole-v1")
model = PPO(CustomPolicy, env)

4. ActorCriticPolicy:

ActorCriticPolicy is an implementation of policy networks in Stable Baselines3 used by actor-critic based algorithms like PPO or A2C. Key points include:
   - Contains an actor for action selection and a critic for estimating state values. By default the two share one features extractor and use separate networks after it.
   - Inherits from `BasePolicy`, which provides basic functionality such as saving/loading models.
   - Supports continuous and discrete action spaces.

In [None]:
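# Example code block for ActorCriticPolicy

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

# CartPole-v1 is only an example environment
env = gym.make("CartPole-v1")
# For PPO, passing the class directly is equivalent to the "MlpPolicy" alias
model = PPO(ActorCriticPolicy, env)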

5. CustomActorCriticPolicy:

A CustomActorCriticPolicy is a user-defined policy class derived from ActorCriticPolicy that allows customization of the actor-critic architecture for specific environments or use cases. Important aspects are:
   - Inherits from `ActorCriticPolicy`.
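   - Can override the `_build_mlp_extractor()` method to customize the actor and critic networks. Feature extraction itself (for example a CNN for image observations) is customized through `features_extractor_class` rather than through a policy method.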
   - Can be integrated with existing Stable Baselines3 algorithms by passing it as an argument during model creation.

In [None]:
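# Example code block for CustomActorCriticPolicy

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

# MyCustomMLPExtractor is a placeholder for a user-defined nn.Module that exposes
# latent_dim_pi, latent_dim_vf, forward(), forward_actor() and forward_critic()
class CustomActorCriticPolicy(ActorCriticPolicy):
    def _build_mlp_extractor(self) -> None:
        # self.features_dim is the output size of the features extractor
        self.mlp_extractor = MyCustomMLPExtractor(self.features_dim)

# env is assumed to be an existing Gymnasium environment
model = PPO(CustomActorCriticPolicy, env)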