# Tutorial: Basic Reinforcement Learning Experiment

This notebook provides a step-by-step guide to running the basic reinforcement learning (RL) experiment in this repository. The experiment uses the `CartPole-v1` environment from `gymnasium` and the PPO algorithm from `stable-baselines3`.

## 1. Understanding the Configuration

The configuration for this experiment is defined in `cfg/config.yaml`. Let's take a look at its contents:

```
!cat cfg/config.yaml
```

The configuration file is split into several sections:
- **General settings**: `seed`, `device`, `total_timesteps`.
- **Environment settings**: `env.id` specifies the `gymnasium` environment to use.
- **Learner settings**: `learner.type` specifies the RL algorithm (`ppo`), `learner.policy` is the policy network type, and `learner.model_kwargs` contains the hyperparameters for the `stable-baselines3` model.
- **Net settings**: Defines custom policy network architectures. This section is unused in this basic experiment but is needed for the `tracking_in_RL` experiment.
- **Logging settings**: Options for experiment tracking with `wandb` (Weights & Biases).

## 2. Running the Experiment

The `train.py` script is the entry point for the experiment. It loads the configuration, creates the environment and the learner, and starts the training.

You can run the experiment from the command line (`python train.py`) or directly from this notebook:

```
!python train.py
```

## 3. Understanding the Output

The script first prints the configuration and then starts training. `stable-baselines3` adds its own log output, reporting metrics such as the mean episode reward and length, the training losses, and other algorithm-specific values.

After training, the script saves the trained model as `model.zip` in the run subdirectory that `hydra` creates under `outputs/`.