# Tutorial: Tracking in Reinforcement Learning

This notebook walks through the `tracking_in_RL` experiment, which shows how to use a custom model architecture as the backbone of an RL agent and how to track metrics, such as the rank of the feature maps, during training.

## 1. The Configuration

The configuration in `cfg/config.yaml` targets a harder task (`BipedalWalker-v3`) than the basic experiment and enables the custom backbone and tracking features.

In [None]:
!cat cfg/config.yaml

Key differences from the basic experiment:
- **`env.id`**: Set to `BipedalWalker-v3`, a more challenging environment.
- **`learner.policy`**: Set to `MlpPolicy`. We will provide a custom feature extractor to this policy.
- **`net` section**: This section is now used to define the custom backbone. `net.type` is set to `rl_mlp_backbone`, which is a custom feature extractor defined in `src/models/rl_backbones.py`. `net.netparams` contains the hyperparameters for this backbone, like `features_dim`.
- **`tracking` section**: This section is enabled. `track_rank_freq` controls how often the rank of the features is computed and logged.
- **`logging.use_wandb`**: Set to `True` to log the results to Weights & Biases.
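
As a rough picture of how these keys nest, the sketch below mirrors them as a plain Python dict. The key names come from the list above; the numeric values for `features_dim` and `track_rank_freq` are placeholders rather than the values in `cfg/config.yaml`, and the `lookup` helper is hypothetical.

```python
from functools import reduce

# Illustrative mirror of the keys described above.
# Numeric values are placeholders, NOT the real contents of cfg/config.yaml.
config = {
    "env": {"id": "BipedalWalker-v3"},
    "learner": {"policy": "MlpPolicy"},
    "net": {
        "type": "rl_mlp_backbone",
        "netparams": {"features_dim": 64},  # placeholder value
    },
    "tracking": {"track_rank_freq": 1000},  # placeholder value
    "logging": {"use_wandb": True},
}


def lookup(cfg, dotted_key):
    """Hypothetical helper: resolve a dotted key such as 'env.id' in a nested dict."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), cfg)


print(lookup(config, "env.id"))
print(lookup(config, "net.type"))
```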

## 2. Custom Backbone and Tracking

The `train.py` script implements two key features:

### Custom Backbone
The script checks if a `net` configuration is present. If so, it uses the `model_factory` to get the specified feature extractor class (`rl_mlp_backbone` in this case). It then constructs a `policy_kwargs` dictionary that tells `stable-baselines3` to use this class for feature extraction. This allows you to experiment with different backbone architectures by simply changing the configuration.
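
The lookup-and-wire step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `train.py`: the `BACKBONES` registry stands in for the project's `model_factory`, and `RLMlpBackbone` is a dependency-free placeholder for the real class in `src/models/rl_backbones.py`, which would subclass `BaseFeaturesExtractor` from `stable_baselines3.common.torch_layers`. The `features_extractor_class` and `features_extractor_kwargs` keys are the ones `stable-baselines3` reads from `policy_kwargs`.

```python
class RLMlpBackbone:
    """Placeholder backbone (assumption). The real class would subclass
    stable_baselines3.common.torch_layers.BaseFeaturesExtractor."""

    def __init__(self, observation_space, features_dim=64):
        self.observation_space = observation_space
        self.features_dim = features_dim


# Hypothetical registry standing in for the project's `model_factory`.
BACKBONES = {"rl_mlp_backbone": RLMlpBackbone}


def build_policy_kwargs(cfg):
    """Return policy_kwargs for stable-baselines3, or {} if no `net` section exists."""
    net_cfg = cfg.get("net")
    if not net_cfg:
        return {}  # fall back to the policy's default feature extractor
    extractor_cls = BACKBONES[net_cfg["type"]]
    return {
        "features_extractor_class": extractor_cls,
        "features_extractor_kwargs": dict(net_cfg.get("netparams", {})),
    }


cfg = {"net": {"type": "rl_mlp_backbone", "netparams": {"features_dim": 64}}}
policy_kwargs = build_policy_kwargs(cfg)
print(policy_kwargs)
```

The resulting dictionary would then be passed to the algorithm constructor through its `policy_kwargs` argument, for example `PPO("MlpPolicy", env, policy_kwargs=policy_kwargs)` if the learner is PPO.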

### Rank Tracking
A custom callback, `RankTrackingCallback`, is defined in the script. This callback is triggered periodically during training. It extracts the feature tensor from the policy's backbone, computes various rank metrics using functions from `src/utils/zeroth_order_features.py`, and logs these metrics to `wandb`. This allows you to monitor the internal dynamics of the network during training.
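
To make the idea of rank metrics concrete, here is a minimal NumPy sketch. It does not reproduce the functions in `src/utils/zeroth_order_features.py`; the two metrics (a thresholded numerical rank and an entropy-based effective rank), the tolerance, and the helper names `rank_metrics` and `should_track` are all assumptions chosen for illustration.

```python
import numpy as np


def rank_metrics(features, tol=1e-6):
    """Compute illustrative rank metrics for a (batch, features_dim) matrix.

    numerical_rank: number of singular values above tol * largest singular value.
    effective_rank: exp of the Shannon entropy of the normalized singular values.
    """
    s = np.linalg.svd(np.asarray(features, dtype=np.float64), compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return {"numerical_rank": 0, "effective_rank": 0.0}
    numerical_rank = int(np.sum(s > tol * s[0]))
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so log(p) stays finite
    effective_rank = float(np.exp(-np.sum(p * np.log(p))))
    return {"numerical_rank": numerical_rank, "effective_rank": effective_rank}


def should_track(n_calls, track_rank_freq):
    """Hypothetical gate: True on every `track_rank_freq`-th callback step."""
    return track_rank_freq > 0 and n_calls % track_rank_freq == 0


# A rank-one feature matrix: every row is a multiple of the same vector.
collapsed = np.outer(np.arange(1, 9), np.arange(1, 5))
print(rank_metrics(collapsed))
print(rank_metrics(np.eye(4)))
```

Inside a callback, a gate like `should_track` would decide when to compute the metrics, and the returned dictionary could be handed to `wandb.log`. A falling effective rank over training suggests the backbone's features are collapsing onto fewer directions.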

## 3. Running the Experiment

To run this experiment, you need a `wandb` account and must be logged in. You can run the `train.py` script from the command line or from this notebook.

**Note**: This experiment runs for more timesteps than the basic experiment, so expect training to take longer.

In [None]:
# Make sure to log in to wandb first if you haven't already
# import wandb
# wandb.login()

!python train.py

## 4. Viewing the Results

After the run finishes, open your `wandb` project to see the results. You will find the standard RL metrics (reward, loss, etc.) alongside the custom rank metrics logged by the `RankTrackingCallback`.