<a href="https://colab.research.google.com/github/Parviz-S/deep-reinforcement-learning/blob/main/train_dog_to_catch_stick.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training a dog to fetch a stick

### The environment

- Huggy the Dog, an environment created by [Thomas Simonini](https://twitter.com/ThomasSimonini) based on [Puppo The Corgi](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit)

### The library used

- [ML-Agents](https://github.com/Unity-Technologies/ml-agents)

## Objectives of this notebook

At the end of the notebook, we will:

- Understand **the state space, action space and reward function used to train Huggy**.
- **Train our own Huggy** to fetch the stick.
- Be able to play **with our trained Huggy directly in our browser**.




### (Google Colab) Hardware Accelerator
- T4 GPU (for faster training)

## Clone the repository and install the dependencies

- We need to clone the repository that contains **ML-Agents**.

In [2]:
%%capture
# Clone the repository (can take 3min)
!git clone --depth 1 https://github.com/Unity-Technologies/ml-agents

In [3]:
%%capture
# Go inside the repository and install the package (can take 3min)
%cd ml-agents
!pip3 install -e ./ml-agents-envs
!pip3 install -e ./ml-agents

## Download the environment zip file and move it to `./trained-envs-executables/linux/`

- Our environment executable is in a zip file.
- We need to download it and place it in `./trained-envs-executables/linux/`

In [4]:
!mkdir -p ./trained-envs-executables/linux

We download the file Huggy.zip from https://github.com/huggingface/Huggy using `wget`

In [5]:
!wget "https://github.com/huggingface/Huggy/raw/main/Huggy.zip" -O ./trained-envs-executables/linux/Huggy.zip

--2024-11-08 01:59:32--  https://github.com/huggingface/Huggy/raw/main/Huggy.zip
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/huggingface/Huggy/main/Huggy.zip [following]
--2024-11-08 01:59:32--  https://media.githubusercontent.com/media/huggingface/Huggy/main/Huggy.zip
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39214997 (37M) [application/zip]
Saving to: ‘./trained-envs-executables/linux/Huggy.zip’


2024-11-08 01:59:35 (136 MB/s) - ‘./trained-envs-executables/linux/Huggy.zip’ saved [39214997/39214997]



In [6]:
%%capture
!unzip -d ./trained-envs-executables/linux/ ./trained-envs-executables/linux/Huggy.zip

We need to make sure the extracted files are readable and executable:

In [7]:
!chmod -R 755 ./trained-envs-executables/linux/Huggy

## Let's recap how this environment works

### The State Space: what Huggy "perceives."

Huggy doesn't "see" his environment. Instead, we provide him information about the environment:

- The target (stick) position
- The relative position between himself and the target
- The orientation of his legs.

Given all this information, Huggy **can decide which action to take next to fulfill his goal**.


### The Action Space: what moves Huggy can do

**Joint motors drive Huggy's legs**. This means that to reach the target, Huggy needs to **learn to rotate the joint motors of each of his legs correctly so he can move**.

### The Reward Function

The reward function is designed so that **Huggy will fulfill his goal**: fetch the stick.

Let's not forget that one of the foundations of Reinforcement Learning is the *reward hypothesis*: a goal can be described as the **maximization of the expected cumulative reward**.
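
To make "cumulative reward" concrete, here is a minimal sketch of the discounted return for a single episode. The reward values are made up for illustration, and the default `gamma` of 0.995 simply mirrors the value used in the config file later in this notebook. The *expected* cumulative reward would be the average of this quantity over many episodes.

```python
def discounted_return(rewards, gamma=0.995):
    """Return the sum of rewards, each discounted by gamma per elapsed step."""
    total = 0.0
    # Work backwards through the episode: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: small penalties at each step, then a reward at the end.
episode_rewards = [-0.01, -0.01, -0.01, 1.0]
print(discounted_return(episode_rewards))
```

With `gamma` close to 1, rewards that arrive late in the episode still count almost fully, which suits a task where the main reward comes only when the stick is reached.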

Here, our goal is that Huggy **goes towards the stick but without spinning too much**. Hence, our reward function must reflect this goal.

Our reward function:
- *Orientation bonus*: we **reward him for getting close to the target**.
- *Time penalty*: a fixed-time penalty given at every action to **force him to get to the stick as fast as possible**.
- *Rotation penalty*: we penalize Huggy if **he spins too much and turns too quickly**.
- *Getting to the target reward*: we reward Huggy for **reaching the target**.
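
The four terms above can be sketched as a single per-step reward. This is only an illustration in Python: the real environment is a Unity executable, and every name, input and weight below is a hypothetical placeholder rather than Huggy's actual coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical placeholder values, NOT the ones used by the real environment.
    orientation_bonus: float = 0.01
    time_penalty: float = 0.001
    rotation_penalty: float = 0.005
    target_reward: float = 1.0


def step_reward(closeness, spin_speed, reached_target, weights=None):
    """Combine the four reward terms for one action.

    closeness: assumed score in [0, 1], higher when Huggy is nearer the target.
    spin_speed: assumed signed turning speed; only its magnitude is penalized.
    reached_target: True on the step where Huggy touches the stick.
    """
    w = weights or RewardWeights()
    reward = w.orientation_bonus * closeness       # orientation bonus
    reward -= w.time_penalty                       # fixed time penalty
    reward -= w.rotation_penalty * abs(spin_speed)  # rotation penalty
    if reached_target:
        reward += w.target_reward                  # getting-to-the-target reward
    return reward


print(step_reward(closeness=0.8, spin_speed=0.2, reached_target=False))
print(step_reward(closeness=1.0, spin_speed=0.0, reached_target=True))
```

The point of the sketch is the structure: two terms reward progress, and two terms make standing still or spinning costly.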

## Create the Huggy config file

- In ML-Agents, we define the **training hyperparameters in config.yaml files.**

- For the scope of this notebook, we're not going to modify the hyperparameters. If we want to experiment with them later, Unity provides very [good documentation explaining each of them here](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md).

- We still need to create a config file for Huggy. In Colab:

  - We click on the folder icon on the left of our screen.

  - We go to `/content/ml-agents/config/ppo`
  - We right-click and create a new file called `Huggy.yaml`

- We copy and paste the content below and save the file:
```
behaviors:
  Huggy:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    checkpoint_interval: 200000
    keep_checkpoints: 15
    max_steps: 2e6
    time_horizon: 1000
    summary_freq: 50000
```
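
As an alternative to creating the file by hand in the Colab file browser, the same content can be written from a code cell. This is a minimal sketch: the helper name `write_huggy_config` is made up for this example, and the default directory assumes the cell runs from inside the cloned `ml-agents` folder.

```python
from pathlib import Path

# Same content as the Huggy.yaml shown above.
HUGGY_CONFIG = """\
behaviors:
  Huggy:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    checkpoint_interval: 200000
    keep_checkpoints: 15
    max_steps: 2e6
    time_horizon: 1000
    summary_freq: 50000
"""


def write_huggy_config(config_dir="./config/ppo"):
    """Write Huggy.yaml into config_dir (created if missing) and return its path."""
    path = Path(config_dir) / "Huggy.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(HUGGY_CONFIG)
    return path


if __name__ == "__main__":
    print(write_huggy_config())
```

Writing the file from code keeps the notebook reproducible, since rerunning all cells recreates the config without any manual step.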

- **If we want to modify the hyperparameters**, in Google Colab we can open the config file at `/content/ml-agents/config/ppo/Huggy.yaml`

- For instance, **if we want to save more models during the training** (for now, we save every 200,000 training timesteps), we need to modify:
  - `checkpoint_interval`: The number of training timesteps collected between each checkpoint.
  - `keep_checkpoints`: The maximum number of model checkpoints to keep.

=> We need to keep in mind that **decreasing the `checkpoint_interval` means more models to upload to the Hub, and so a longer upload time.**
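
To see how these two settings interact, here is a rough back-of-the-envelope sketch. The helper `checkpoint_plan` is hypothetical, and the count is approximate: actual checkpoints land near, not exactly on, multiples of the interval, and the final model is saved in addition to them.

```python
def checkpoint_plan(max_steps, checkpoint_interval, keep_checkpoints):
    """Estimate how many checkpoints are written and how many survive."""
    written = int(max_steps) // int(checkpoint_interval)
    # Only the most recent `keep_checkpoints` checkpoints are kept.
    kept = min(written, keep_checkpoints)
    return written, kept


# Values from the config above: 10 checkpoints written, all 10 kept.
print(checkpoint_plan(2e6, 200_000, 15))  # (10, 10)

# Halving the interval: 20 written, but only the last 15 kept.
print(checkpoint_plan(2e6, 100_000, 15))  # (20, 15)
```

So lowering `checkpoint_interval` alone only helps up to the `keep_checkpoints` limit; to keep every checkpoint, both values have to be adjusted together.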

## Train our agent

To train our agent, we just need to **launch mlagents-learn and select the executable containing the environment.**

With ML-Agents, we run a training script. We define four parameters:

1. `mlagents-learn <config>`: the path where the hyperparameter config file is.
2. `--env`: where the environment executable is.
3. `--run-id`: the name we want to give to our training run.
4. `--no-graphics`: to not launch the visualization during the training.

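We train the model with the command below. If training is interrupted, we can add the `--resume` flag to continue the run with the same `--run-id`.

> `--resume` expects data from a previous run with the same run ID, so it fails on a first launch when there is nothing to resume. For a fresh run, we leave the flag out.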
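
One way to avoid guessing whether `--resume` applies is to check if a results folder for the run already exists. The sketch below only builds the argument list and does not launch training. The helper name `build_train_command` is made up for this example, and it assumes results are stored under `./results/<run-id>`.

```python
from pathlib import Path


def build_train_command(config, env, run_id, results_dir="./results"):
    """Return the mlagents-learn argument list, adding --resume only when
    a previous run with the same run id is found on disk."""
    cmd = [
        "mlagents-learn",
        config,
        f"--env={env}",
        f"--run-id={run_id}",
        "--no-graphics",
    ]
    # Assumption: an existing results/<run-id> folder means there is a run to resume.
    if (Path(results_dir) / run_id).exists():
        cmd.append("--resume")
    return cmd


command = build_train_command(
    "./config/ppo/Huggy.yaml",
    "./trained-envs-executables/linux/Huggy/Huggy",
    "Huggy2",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`, or the printed line can be pasted after a `!` in a notebook cell.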



In [9]:
!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id="Huggy2" --no-graphics

2024-11-08 02:01:34.457236: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-08 02:01:34.489410: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-08 02:01:34.499160: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-08 02:01:34.520952: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Push the agent to the Hub and play with Huggy the Dog trained with our model.

- Now that we have trained our agent, we **need to push it to the Hub to be able to play with Huggy in our browser.**

- We need to create a new token (https://huggingface.co/settings/tokens) with the write role and copy it.

- We need to paste the token after we run the following code cell:

In [10]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

If we don't want to use a Google Colab or a Jupyter Notebook, we need to use this command instead: `huggingface-cli login`

Then, we simply need to run `mlagents-push-to-hf`.

We define four parameters:

1. `--run-id`: the name of the training run id.
2. `--local-dir`: where the agent was saved. It's `results/<run_id name>`, so in this notebook `./results/Huggy2`.
3. `--repo-id`: the name of the Hugging Face repo we want to create or update. It's always `<your huggingface username>/<the repo name>`. If the repo does not exist, **it will be created automatically**.
4. `--commit-message`: since Hugging Face repos are Git repositories, we need to provide a commit message.
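
The four parameters can be assembled and sanity-checked before the upload starts. This sketch builds the argument list only. The helper `build_push_command` is hypothetical, and the `<username>/<repo name>` check is just a convenience to catch typos early, not something the tool requires us to do ourselves.

```python
def build_push_command(run_id, local_dir, repo_id, commit_message):
    """Return the mlagents-push-to-hf argument list after checking repo_id."""
    username, slash, repo_name = repo_id.partition("/")
    # repo_id must look like "<username>/<repo name>" with exactly one slash.
    if not (slash and username and repo_name) or "/" in repo_name:
        raise ValueError(f"repo_id should be '<username>/<repo name>', got {repo_id!r}")
    return [
        "mlagents-push-to-hf",
        f"--run-id={run_id}",
        f"--local-dir={local_dir}",
        f"--repo-id={repo_id}",
        f"--commit-message={commit_message}",
    ]


print(build_push_command("HuggyTraining", "./results/Huggy2", "parviz-s/ppo-Huggy", "Huggy"))
```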

In [11]:
!mlagents-push-to-hf --run-id="HuggyTraining" --local-dir="./results/Huggy2" --repo-id="parviz-s/ppo-Huggy" --commit-message="Huggy"

[INFO] This function will create a model card and upload your HuggyTraining into HuggingFace Hub. This is a work in progress: If you encounter a bug, please send open an issue
[INFO] Pushing repo HuggyTraining to the Hugging Face Hub
Huggy-1199948.pt:   0% 0.00/13.5M [00:00<?, ?B/s]
Huggy-1199948.onnx:   0% 0.00/2.27M [00:00<?, ?B/s]
Huggy-1399506.pt:   0% 0.00/13.5M [00:00<?, ?B/s]
Huggy-1399506.onnx:   0% 0.00/2.27M [00:00<?, ?B/s]
Upload 25 LFS files:   0% 0/25 [00:00<?, ?it/s]
Huggy-1599810.onnx:   0% 0.00/2.27M [00:00<?, ?B/s]
Huggy-1199948.pt:  12% 1.62M/13.5M [00:00<00:00, 15.2MB/s]
Huggy-1199948.onnx:  24% 541k/2.27M [00:00<00:00, 3.11MB/s]
Huggy-1399506.onnx:  68% 1.56M/2.27M [00:00<00:00, 9.06MB/s]
Huggy-1599810.onnx: 100% 2.27M/2.27M [00:00<00:00, 4.58MB/s]
Huggy-1399506.pt: 100% 13.5M/13.5M [00:00<00:00, 21.7MB/s]
Huggy-1199948.pt:  23% 3.15M/13.5M [00:00<00:02, 4.37MB/s]
Huggy-1199948.onnx:  48% 1.10

Now that the model is on the Hub, we can:
- open the Huggy game in our browser: https://huggingface.co/spaces/ThomasSimonini/Huggy
- click on "Play with my Huggy model"
- type our username and click on the search button
- select our model repository
- choose which checkpoint to use: since we saved a model every `checkpoint_interval` steps, we can see how the model improved by comparing earlier checkpoints with the latest ones
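
To compare earlier and later checkpoints in order, it helps to sort the checkpoint files by training step. The sketch below assumes the `<behavior name>-<step>.onnx` naming seen in the upload log above. The helper name is made up for this example.

```python
import re

# Matches names such as "Huggy-1199948.onnx" and captures the step count.
CHECKPOINT_PATTERN = re.compile(r"^(?P<name>.+)-(?P<step>\d+)\.onnx$")


def checkpoints_by_step(filenames):
    """Return the .onnx checkpoint filenames sorted from earliest to latest step."""
    found = []
    for filename in filenames:
        match = CHECKPOINT_PATTERN.match(filename)
        if match:
            found.append((int(match.group("step")), filename))
    return [filename for _, filename in sorted(found)]


files = [
    "Huggy-1599810.onnx",
    "Huggy-1199948.onnx",
    "Huggy-1399506.pt",
    "Huggy-1399506.onnx",
]
print(checkpoints_by_step(files))
# ['Huggy-1199948.onnx', 'Huggy-1399506.onnx', 'Huggy-1599810.onnx']
```

Sorting on the parsed integer rather than on the raw filename keeps the order right even when step counts have different numbers of digits.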
