In this notebook you can train a student policy with DAgger on the [FrankaKitchen](https://robotics.farama.org/envs/franka_kitchen/franka_kitchen/) environment. You should set:
- `seed`: For reproducibility of training runs
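The setup cell below seeds NumPy and PyTorch inline. As a minimal sketch (not part of the repository), the same steps can be wrapped in a hypothetical `set_seed` helper that also seeds Python's built-in `random` module. Gymnasium environments keep their own RNG, which is seeded through `env.reset(seed=...)`; whether the repository's `DAgger` and `Simulator` classes do that is not shown in this notebook.

```python
import random

import numpy as np

try:
    import torch
except ImportError:  # the helper still seeds Python and NumPy without PyTorch
    torch = None


def set_seed(seed: int) -> None:
    """Seed every global RNG the training loop may touch (hypothetical helper)."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's global RNG
    if torch is not None:
        torch.manual_seed(seed)           # PyTorch generators
        torch.cuda.manual_seed_all(seed)  # all GPUs; silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


set_seed(42)
first = np.random.rand(3)
set_seed(42)
second = np.random.rand(3)
print(np.allclose(first, second))  # True
```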

In [5]:
#@title Initialization and imports
import shutil
shutil.rmtree('DAgger4Robotics', ignore_errors=True)
!git clone "https://github.com/cybernetic-m/DAgger4Robotics.git"
!pip install gymnasium-robotics -q
!apt-get install -y xvfb ffmpeg -q
!pip install pyvirtualdisplay imageio -q
!pip install "minari[all]" -q

import gymnasium as gym
import gymnasium_robotics
gym.register_envs(gymnasium_robotics)
from gymnasium.wrappers import RecordVideo
import torch
import torch.nn as nn
import json
import os
import numpy as np
import minari # needed for dataset
import matplotlib.pyplot as plt

from DAgger4Robotics.model.NetworkInterface import NetworkInterface
from DAgger4Robotics.dagger.DAgger import DAgger
from DAgger4Robotics.utils.preprocess_dataset import preprocess_dataset
from DAgger4Robotics.simulator.Simulator import Simulator



device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Reproducibility instructions
seed=42 #@param {type:"integer"}
np.random.seed(seed)                   # NumPy
torch.manual_seed(seed)                # PyTorch CPU
torch.cuda.manual_seed(seed)           # PyTorch GPU
torch.cuda.manual_seed_all(seed)       # All GPUs

torch.backends.cudnn.deterministic = True   # Deterministic behaviour
torch.backends.cudnn.benchmark = False      # Avoid non-deterministic optimizations

# Dimensions of the problem
state_dim = 20
action_dim = 9

Cloning into 'DAgger4Robotics'...
remote: Enumerating objects: 861, done.
remote: Counting objects: 100% (93/93), done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 861 (delta 52), reused 18 (delta 4), pack-reused 768 (from 1)
Receiving objects: 100% (861/861), 163.21 MiB | 21.48 MiB/s, done.
Resolving deltas: 100% (330/330), done.
Reading package lists...
Building dependency tree...
Reading state information...
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
xvfb is already the newest version (2:21.1.4-2ubuntu1.7~22.04.15).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.
cuda


**THIS DATASET IS USED ONLY FOR EVALUATION**

In [2]:
#@title Load and preprocess the dataset
# Download the dataset
kitchen_dataset_type = 'complete'
validation_dataset = minari.load_dataset(f'D4RL/kitchen/{kitchen_dataset_type}-v2',download=True)
validation_dataset.set_seed(seed)  # Set a seed for reproducibility
print("Dataset loaded successfully!")
print(f'Total Episodes: {validation_dataset.total_episodes}')

# We preprocess the dataset to keep only the subset of observations needed for the microwave task (opening it).
# preprocess_dataset then cuts each episode of the complete kitchen dataset at the step where the robot finishes
# opening the microwave (the first subtask it performs in the environment). In the truncated dataset, each
# observation and action is composed as follows:
# Observation (20-dim):
#              - 0-6 joint angles
#              - 7-8 gripper joint translation values (LEFT-RIGHT pieces)
#              - 9-15 joint angular velocities
#              - 16-17 gripper joint linear velocities (LEFT-RIGHT pieces)
#              - 31 rotation (angle) of the microwave door joint
#              - 52 angular velocity of the microwave door joint
#              (31 and 52 are indices in the original kitchen observation vector, not in the 20-dim one)
# Actions (9-dim):
#              - 0-6 joint angular velocities
#              - 7-8 gripper joint linear velocities (LEFT-RIGHT)
# You can access a single episode and step with validation_microwave_dataset['observations'][epi][step]
# (or validation_microwave_dataset['actions'][epi][step]).

# Use all the episodes to validate the new student
validation_microwave_dataset = preprocess_dataset(validation_dataset)

Dataset loaded successfully!
Total Episodes: 19
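To make the index layout in the preprocessing comment concrete, here is a sketch assuming each kitchen observation is available as the flat robot-and-object vector that indices 31 and 52 refer to (59 entries, by our reading of the FrankaKitchen-v1 docs) and that `preprocess_dataset` keeps the entries in the listed order. `select_microwave_obs` and `truncate_at_microwave_open` are hypothetical helpers, not the repository's code. The defaults `goal_angle=-0.75` and `tol=0.3` are our assumption of how FrankaKitchen decides the microwave is open, so check them against the environment source before relying on them.

```python
import numpy as np

# Indices into the flat FrankaKitchen observation vector (assumed 59 entries),
# kept in the order listed in the preprocessing comment.
ROBOT_IDX = list(range(18))      # 0-8 joint/gripper positions, 9-17 their velocities
MICROWAVE_ANGLE_IDX = 31         # microwave door joint angle
MICROWAVE_VEL_IDX = 52           # microwave door joint angular velocity
MICROWAVE_OBS_IDX = ROBOT_IDX + [MICROWAVE_ANGLE_IDX, MICROWAVE_VEL_IDX]


def select_microwave_obs(obs: np.ndarray) -> np.ndarray:
    """Keep the 20 microwave-task entries along the last axis (1-D or batched)."""
    return obs[..., MICROWAVE_OBS_IDX]


def truncate_at_microwave_open(obs, actions, goal_angle=-0.75, tol=0.3):
    """Cut an episode right after the first step whose door angle is within `tol`
    of `goal_angle`. Both defaults are assumptions about FrankaKitchen's success test."""
    angle = obs[: len(actions), MICROWAVE_ANGLE_IDX]
    hits = np.flatnonzero(np.abs(angle - goal_angle) < tol)
    end = hits[0] + 1 if hits.size else len(actions)  # keep everything if never opened
    return obs[:end], actions[:end]


episode_obs = np.zeros((10, 59))
episode_obs[:, MICROWAVE_ANGLE_IDX] = np.linspace(0.0, -0.9, 10)  # door swinging open
episode_act = np.zeros((10, 9))
obs_cut, act_cut = truncate_at_microwave_open(episode_obs, episode_act)
print(select_microwave_obs(obs_cut).shape, act_cut.shape)  # (6, 20) (6, 9)
```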


### Training Student Hyperparameters
This part of the code runs and tests the DAgger algorithm. You must set:
- `batch_size`: Number of state-action samples per batch
- `lr`: Learning rate for the optimizer
- `num_epochs`: Number of training epochs per DAgger iteration
- `rollouts_per_iteration`: Number of episodes generated in each DAgger iteration
- `n_iterations`: Number of DAgger iterations
- `betaMode`: The function that controls the decay of beta (the probability that the expert, rather than the student, acts during rollouts). It can be `'inverse'`, `'linear'` or `'exponential'`
- `exponential_beta_k`: Exponent used in the beta function when `betaMode` is `'exponential'`
- `student_type`: Either `'simple'` or `'deep'`; students are usually `'simple'`
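The sketch below illustrates how these hyperparameters interact. `beta_schedule` shows one plausible form of each `betaMode` (the exact formulas in `DAgger4Robotics` are an assumption here), `choose_actor` shows the per-step mixture in which the expert acts with probability beta, and `batches_per_epoch` reproduces the mini-batch counts in the training log (2251 samples with `batch_size = 512` give 5 batches, 7187 give 15). All three names are hypothetical. Note that the betas printed in the log (1.000, 0.607, 0.368, 0.223, ...) equal `exp(-0.5 * i)` for a 0-indexed iteration `i`, not `exp(-0.3 * i)`, so the repository's exponential formula, or the value it was run with, differs from the naive `exp(-k * i)` used here.

```python
import math
import random


def beta_schedule(i: int, n_iterations: int, mode: str, k: float = 0.3) -> float:
    """Probability that the expert acts at DAgger iteration i (0-indexed).

    The three formulas are illustrative assumptions, not the repository's code.
    """
    if mode == "inverse":
        return 1.0 / (i + 1)                      # 1, 1/2, 1/3, ...
    if mode == "linear":
        return max(0.0, 1.0 - i / n_iterations)   # straight line from 1 towards 0
    if mode == "exponential":
        return math.exp(-k * i)                   # exp(-k*i), k = exponential_beta_k
    raise ValueError(f"unknown betaMode: {mode!r}")


def choose_actor(beta: float, rng: random.Random) -> str:
    """Per-step mixture policy: the expert acts with probability beta.

    In DAgger the stored label is the expert's action whichever policy acted.
    """
    return "expert" if rng.random() < beta else "student"


def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    """Mini-batches per epoch when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


# k=0.5 reproduces the betas printed in the training log (1.000, 0.607, 0.368, 0.223)
for i in range(4):
    print(f"iteration {i + 1}: beta = {beta_schedule(i, 20, 'exponential', k=0.5):.3f}")
print(batches_per_epoch(2251, 512), batches_per_epoch(7187, 512))  # 5 15
```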

### Loading a different expert
If you want to use a different expert, you can load it manually by changing:
- `path_to_expert_model`: The full path to the model file inside the `DAgger4Robotics` folder
- `expert_type`: The architecture type of the loaded model (`"simple"` or `"deep"`)
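`expert_type` has to match the architecture stored in the checkpoint: `load_state_dict` raises an error on missing keys or mismatched tensor shapes. As a rough size comparison, the sketch below counts the parameters of both architectures from the layer widths printed by `net_wrapper.summary()` when the training cell runs. `mlp_param_count` is a hypothetical helper; with PyTorch you could cross-check a loaded model with `sum(p.numel() for p in model.parameters())`. The deep expert ends up with roughly 17 times as many parameters as the simple student.

```python
# Layer widths copied from the model summaries printed in the training cell
# (input 20 = state_dim, output 9 = action_dim).
SIMPLE_WIDTHS = [20, 256, 128, 9]           # SimplePolicyNet (student)
DEEP_WIDTHS = [20, 512, 512, 512, 256, 9]   # DeepPolicyNet (expert)


def mlp_param_count(widths):
    """Weights plus biases of consecutive nn.Linear layers; ReLU adds no parameters."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


print(mlp_param_count(SIMPLE_WIDTHS))  # 39433
print(mlp_param_count(DEEP_WIDTHS))    # 669705
```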

In [3]:
#@title Run the DAgger Algorithm
loss_fn = nn.MSELoss()

# Hyperparameters for training the student
lr = 1e-3  #@param {type:"number"}
batch_size = 512 #@param {type:"integer"}
num_epochs = 5 #@param {type:"integer"}

# Number of generated episodes in each iteration of DAgger
rollouts_per_iteration=20 #@param {type:"integer"}

# Number of DAgger iterations
n_iterations=20 #@param {type:"integer"}


# The type of function that manages the value of Beta. It can be 'inverse', 'linear', 'exponential'
betaMode = 'exponential' #@param ["inverse","linear","exponential"]

# Exponent used in the beta function when betaMode is set to 'exponential'
exponential_beta_k = 0.3 #@param {type:"number"}

# It can be 'simple' or 'deep', but generally 'simple' for students
student_type='simple' #@param ["simple","deep"]

# Initializing the student
net_wrapper = NetworkInterface(net_type=student_type,input_dim=state_dim,output_dim=action_dim)
net_wrapper.summary()
pi = net_wrapper.get_model().to(device)

# Loading pre-trained expert/teacher
expert_type='deep' #@param ["simple","deep"]
path_to_expert_model='./DAgger4Robotics/experts_kitchen/deep/kitchen_complete/batch_size_64_lr_1e-3/expert_policy.pt' #@param {type:"string"}
net_wrapper = NetworkInterface(net_type=expert_type,input_dim=state_dim,output_dim=action_dim)
net_wrapper.summary()
pi_star = net_wrapper.get_model().to(device)
pi_star.load_state_dict(torch.load(path_to_expert_model, map_location=device))
pi_star.eval()

env = gym.make('FrankaKitchen-v1', tasks_to_complete=['microwave'])
optimizer = torch.optim.Adam(pi.parameters(), lr=lr)

# Initializing the DAgger algorithm
dagger = DAgger(env = env,
                env_type = 'kitchen',
                validationDataset= validation_microwave_dataset,
                studentPolicy = pi,
                expertPolicy = pi_star,
                optimizer=optimizer,
                loss_fn = loss_fn,
                batch_size = batch_size,
                num_epochs = num_epochs,
                betaMode = betaMode,
                device = device,
                rollouts_per_iteration=rollouts_per_iteration,
                exponential_beta_k = exponential_beta_k
                )

# Running the training
dagger.run(n_iterations=n_iterations)

SimplePolicyNet(
  (net): Sequential(
    (0): Linear(in_features=20, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=9, bias=True)
  )
)
DeepPolicyNet(
  (net): Sequential(
    (0): Linear(in_features=20, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): ReLU()
    (6): Linear(in_features=512, out_features=256, bias=True)
    (7): ReLU()
    (8): Linear(in_features=256, out_features=9, bias=True)
  )
)
Validation Dataset correctly loaded.
Obs Space Dim: 1058, Action Space Dim: 1058
Run DAgger algorithm...

--- ITERATION 1/20 | beta = 1.000 ---
Rollout... 
The Expert was selected 2251 times, while the student was selected 0 times
Collected data - Observations: 2251 Actions: 2251
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 5/5 [00:00<00:00, 43.24it/s]


TRAIN	 Loss: 0.06956767, RMSE: 0.21982442, MAE: 0.18712765, R2: -2.55954194
EPOCH 1:


Elements...: 100%|██████████| 5/5 [00:00<00:00, 414.80it/s]


TRAIN	 Loss: 0.05893051, RMSE: 0.20058390, MAE: 0.16608353, R2: -1.56348598
EPOCH 2:


Elements...: 100%|██████████| 5/5 [00:00<00:00, 435.03it/s]


TRAIN	 Loss: 0.04995199, RMSE: 0.18478410, MAE: 0.15138566, R2: -1.05634713
EPOCH 3:


Elements...: 100%|██████████| 5/5 [00:00<00:00, 444.17it/s]


TRAIN	 Loss: 0.04239215, RMSE: 0.17146227, MAE: 0.13936812, R2: -0.76474333
EPOCH 4:


Elements...: 100%|██████████| 5/5 [00:00<00:00, 437.51it/s]

TRAIN	 Loss: 0.03638558, RMSE: 0.15993601, MAE: 0.12771399, R2: -0.52789789
Model saved as student_policy_0.pt

--- ITERATION 2/20 | beta = 0.607 ---
Rollout... 





The Expert was selected 2984 times, while the student was selected 1952 times
Collected data - Observations: 7187 Actions: 7187
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 15/15 [00:00<00:00, 432.17it/s]


TRAIN	 Loss: 0.00626807, RMSE: 0.07262066, MAE: 0.05600550, R2: 0.41065803
EPOCH 1:


Elements...: 100%|██████████| 15/15 [00:00<00:00, 424.50it/s]


TRAIN	 Loss: 0.00495579, RMSE: 0.06416727, MAE: 0.04830245, R2: 0.56367826
EPOCH 2:


Elements...: 100%|██████████| 15/15 [00:00<00:00, 438.98it/s]


TRAIN	 Loss: 0.00417331, RMSE: 0.05881785, MAE: 0.04376966, R2: 0.62780881
EPOCH 3:


Elements...: 100%|██████████| 15/15 [00:00<00:00, 438.25it/s]


TRAIN	 Loss: 0.00360071, RMSE: 0.05472971, MAE: 0.04024391, R2: 0.67013192
EPOCH 4:


Elements...: 100%|██████████| 15/15 [00:00<00:00, 429.37it/s]


TRAIN	 Loss: 0.00315059, RMSE: 0.05134058, MAE: 0.03738237, R2: 0.70011914
Model saved as student_policy_1.pt

--- ITERATION 3/20 | beta = 0.368 ---
Rollout... 
The Expert was selected 816 times, while the student was selected 1382 times
Collected data - Observations: 9385 Actions: 9385
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 19/19 [00:00<00:00, 420.81it/s]


TRAIN	 Loss: 0.00107835, RMSE: 0.03063249, MAE: 0.02332627, R2: 0.88514179
EPOCH 1:


Elements...: 100%|██████████| 19/19 [00:00<00:00, 411.61it/s]


TRAIN	 Loss: 0.00094375, RMSE: 0.02867146, MAE: 0.02163554, R2: 0.90275300
EPOCH 2:


Elements...: 100%|██████████| 19/19 [00:00<00:00, 408.35it/s]


TRAIN	 Loss: 0.00085095, RMSE: 0.02725034, MAE: 0.02044651, R2: 0.91300911
EPOCH 3:


Elements...: 100%|██████████| 19/19 [00:00<00:00, 396.11it/s]


TRAIN	 Loss: 0.00077528, RMSE: 0.02604402, MAE: 0.01944443, R2: 0.92018181
EPOCH 4:


Elements...: 100%|██████████| 19/19 [00:00<00:00, 397.16it/s]


TRAIN	 Loss: 0.00071387, RMSE: 0.02500309, MAE: 0.01859363, R2: 0.92579246
Model saved as student_policy_2.pt

--- ITERATION 4/20 | beta = 0.223 ---
Rollout... 
The Expert was selected 513 times, while the student was selected 1739 times
Collected data - Observations: 11637 Actions: 11637
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 23/23 [00:00<00:00, 393.97it/s]


TRAIN	 Loss: 0.00038072, RMSE: 0.01832396, MAE: 0.01388509, R2: 0.95841271
EPOCH 1:


Elements...: 100%|██████████| 23/23 [00:00<00:00, 406.63it/s]


TRAIN	 Loss: 0.00035331, RMSE: 0.01766007, MAE: 0.01334807, R2: 0.96108544
EPOCH 2:


Elements...: 100%|██████████| 23/23 [00:00<00:00, 431.12it/s]


TRAIN	 Loss: 0.00032891, RMSE: 0.01705475, MAE: 0.01285930, R2: 0.96325660
EPOCH 3:


Elements...: 100%|██████████| 23/23 [00:00<00:00, 423.45it/s]


TRAIN	 Loss: 0.00030779, RMSE: 0.01651840, MAE: 0.01241964, R2: 0.96506542
EPOCH 4:


Elements...: 100%|██████████| 23/23 [00:00<00:00, 414.70it/s]


TRAIN	 Loss: 0.00029036, RMSE: 0.01606067, MAE: 0.01204328, R2: 0.96659237
Model saved as student_policy_3.pt

--- ITERATION 5/20 | beta = 0.135 ---
Rollout... 
The Expert was selected 323 times, while the student was selected 1931 times
Collected data - Observations: 13891 Actions: 13891
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 28/28 [00:00<00:00, 304.98it/s]


TRAIN	 Loss: 0.00019085, RMSE: 0.01309653, MAE: 0.00979602, R2: 0.97676986
EPOCH 1:


Elements...: 100%|██████████| 28/28 [00:00<00:00, 323.16it/s]


TRAIN	 Loss: 0.00018103, RMSE: 0.01275033, MAE: 0.00951403, R2: 0.97785163
EPOCH 2:


Elements...: 100%|██████████| 28/28 [00:00<00:00, 297.32it/s]


TRAIN	 Loss: 0.00017167, RMSE: 0.01244723, MAE: 0.00926998, R2: 0.97878224
EPOCH 3:


Elements...: 100%|██████████| 28/28 [00:00<00:00, 333.35it/s]


TRAIN	 Loss: 0.00016385, RMSE: 0.01217131, MAE: 0.00904943, R2: 0.97965622
EPOCH 4:


Elements...: 100%|██████████| 28/28 [00:00<00:00, 346.56it/s]


TRAIN	 Loss: 0.00015675, RMSE: 0.01190954, MAE: 0.00883827, R2: 0.98049831
Model saved as student_policy_4.pt

--- ITERATION 6/20 | beta = 0.082 ---
Rollout... 
The Expert was selected 195 times, while the student was selected 2048 times
Collected data - Observations: 16134 Actions: 16134
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 32/32 [00:00<00:00, 394.27it/s]


TRAIN	 Loss: 0.00011238, RMSE: 0.01009608, MAE: 0.00746115, R2: 0.98636365
EPOCH 1:


Elements...: 100%|██████████| 32/32 [00:00<00:00, 397.89it/s]


TRAIN	 Loss: 0.00010825, RMSE: 0.00991186, MAE: 0.00732270, R2: 0.98684776
EPOCH 2:


Elements...: 100%|██████████| 32/32 [00:00<00:00, 415.51it/s]


TRAIN	 Loss: 0.00010507, RMSE: 0.00975955, MAE: 0.00720765, R2: 0.98730916
EPOCH 3:


Elements...: 100%|██████████| 32/32 [00:00<00:00, 368.31it/s]


TRAIN	 Loss: 0.00010187, RMSE: 0.00960834, MAE: 0.00709106, R2: 0.98768467
EPOCH 4:


Elements...: 100%|██████████| 32/32 [00:00<00:00, 416.37it/s]


TRAIN	 Loss: 0.00009905, RMSE: 0.00946967, MAE: 0.00698185, R2: 0.98804748
Model saved as student_policy_5.pt

--- ITERATION 7/20 | beta = 0.050 ---
Rollout... 
The Expert was selected 104 times, while the student was selected 2190 times
Collected data - Observations: 18428 Actions: 18428
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 36/36 [00:00<00:00, 404.18it/s]


TRAIN	 Loss: 0.00008250, RMSE: 0.00863004, MAE: 0.00638968, R2: 0.99064225
EPOCH 1:


Elements...: 100%|██████████| 36/36 [00:00<00:00, 395.94it/s]


TRAIN	 Loss: 0.00007957, RMSE: 0.00847160, MAE: 0.00625255, R2: 0.99100101
EPOCH 2:


Elements...: 100%|██████████| 36/36 [00:00<00:00, 366.84it/s]


TRAIN	 Loss: 0.00007704, RMSE: 0.00833722, MAE: 0.00613964, R2: 0.99128425
EPOCH 3:


Elements...: 100%|██████████| 36/36 [00:00<00:00, 395.83it/s]


TRAIN	 Loss: 0.00007447, RMSE: 0.00819800, MAE: 0.00602169, R2: 0.99154216
EPOCH 4:


Elements...: 100%|██████████| 36/36 [00:00<00:00, 380.07it/s]


TRAIN	 Loss: 0.00007234, RMSE: 0.00808195, MAE: 0.00592789, R2: 0.99175060
Model saved as student_policy_6.pt

--- ITERATION 8/20 | beta = 0.030 ---
Rollout... 
The Expert was selected 65 times, while the student was selected 2185 times
Collected data - Observations: 20678 Actions: 20678
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 41/41 [00:00<00:00, 389.21it/s]


TRAIN	 Loss: 0.00006049, RMSE: 0.00737155, MAE: 0.00541875, R2: 0.99338681
EPOCH 1:


Elements...: 100%|██████████| 41/41 [00:00<00:00, 390.23it/s]


TRAIN	 Loss: 0.00005822, RMSE: 0.00723063, MAE: 0.00529432, R2: 0.99361736
EPOCH 2:


Elements...: 100%|██████████| 41/41 [00:00<00:00, 371.28it/s]


TRAIN	 Loss: 0.00005630, RMSE: 0.00711039, MAE: 0.00518798, R2: 0.99379361
EPOCH 3:


Elements...: 100%|██████████| 41/41 [00:00<00:00, 331.73it/s]


TRAIN	 Loss: 0.00005609, RMSE: 0.00708309, MAE: 0.00518013, R2: 0.99391258
EPOCH 4:


Elements...: 100%|██████████| 41/41 [00:00<00:00, 382.68it/s]


TRAIN	 Loss: 0.00005504, RMSE: 0.00701565, MAE: 0.00513025, R2: 0.99402863
Model saved as student_policy_7.pt

--- ITERATION 9/20 | beta = 0.018 ---
Rollout... 
The Expert was selected 45 times, while the student was selected 2205 times
Collected data - Observations: 22928 Actions: 22928
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 45/45 [00:00<00:00, 398.16it/s]


TRAIN	 Loss: 0.00004581, RMSE: 0.00639035, MAE: 0.00463728, R2: 0.99510831
EPOCH 1:


Elements...: 100%|██████████| 45/45 [00:00<00:00, 352.69it/s]


TRAIN	 Loss: 0.00004484, RMSE: 0.00632134, MAE: 0.00458661, R2: 0.99521703
EPOCH 2:


Elements...: 100%|██████████| 45/45 [00:00<00:00, 364.01it/s]


TRAIN	 Loss: 0.00004432, RMSE: 0.00628021, MAE: 0.00455776, R2: 0.99528778
EPOCH 3:


Elements...: 100%|██████████| 45/45 [00:00<00:00, 389.94it/s]


TRAIN	 Loss: 0.00004333, RMSE: 0.00621217, MAE: 0.00450211, R2: 0.99538285
EPOCH 4:


Elements...: 100%|██████████| 45/45 [00:00<00:00, 398.38it/s]


TRAIN	 Loss: 0.00004298, RMSE: 0.00618529, MAE: 0.00448984, R2: 0.99542862
Model saved as student_policy_8.pt

--- ITERATION 10/20 | beta = 0.011 ---
Rollout... 
The Expert was selected 24 times, while the student was selected 2226 times
Collected data - Observations: 25178 Actions: 25178
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 50/50 [00:00<00:00, 399.09it/s]


TRAIN	 Loss: 0.00003724, RMSE: 0.00574875, MAE: 0.00417393, R2: 0.99605983
EPOCH 1:


Elements...: 100%|██████████| 50/50 [00:00<00:00, 396.53it/s]


TRAIN	 Loss: 0.00003850, RMSE: 0.00583894, MAE: 0.00426912, R2: 0.99603391
EPOCH 2:


Elements...: 100%|██████████| 50/50 [00:00<00:00, 401.91it/s]


TRAIN	 Loss: 0.00003939, RMSE: 0.00588381, MAE: 0.00432533, R2: 0.99604100
EPOCH 3:


Elements...: 100%|██████████| 50/50 [00:00<00:00, 396.90it/s]


TRAIN	 Loss: 0.00003923, RMSE: 0.00586752, MAE: 0.00432196, R2: 0.99608922
EPOCH 4:


Elements...: 100%|██████████| 50/50 [00:00<00:00, 383.54it/s]


TRAIN	 Loss: 0.00003780, RMSE: 0.00576978, MAE: 0.00423171, R2: 0.99618244
Model saved as student_policy_9.pt

--- ITERATION 11/20 | beta = 0.007 ---
Rollout... 
The Expert was selected 12 times, while the student was selected 2238 times
Collected data - Observations: 27428 Actions: 27428
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 54/54 [00:00<00:00, 399.73it/s]


TRAIN	 Loss: 0.00003697, RMSE: 0.00560723, MAE: 0.00409392, R2: 0.99661201
EPOCH 1:


Elements...: 100%|██████████| 54/54 [00:00<00:00, 371.18it/s]


TRAIN	 Loss: 0.00003624, RMSE: 0.00557681, MAE: 0.00409968, R2: 0.99663883
EPOCH 2:


Elements...: 100%|██████████| 54/54 [00:00<00:00, 398.68it/s]


TRAIN	 Loss: 0.00003523, RMSE: 0.00550460, MAE: 0.00405160, R2: 0.99669445
EPOCH 3:


Elements...: 100%|██████████| 54/54 [00:00<00:00, 364.04it/s]


TRAIN	 Loss: 0.00003357, RMSE: 0.00539503, MAE: 0.00395088, R2: 0.99676883
EPOCH 4:


Elements...: 100%|██████████| 54/54 [00:00<00:00, 387.89it/s]


TRAIN	 Loss: 0.00003325, RMSE: 0.00536193, MAE: 0.00393234, R2: 0.99681306
Model saved as student_policy_10.pt

--- ITERATION 12/20 | beta = 0.004 ---
Rollout... 
The Expert was selected 9 times, while the student was selected 2242 times
Collected data - Observations: 29679 Actions: 29679
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 58/58 [00:00<00:00, 327.36it/s]


TRAIN	 Loss: 0.00003006, RMSE: 0.00512756, MAE: 0.00377713, R2: 0.99699551
EPOCH 1:


Elements...: 100%|██████████| 58/58 [00:00<00:00, 288.32it/s]


TRAIN	 Loss: 0.00003019, RMSE: 0.00514309, MAE: 0.00381050, R2: 0.99702287
EPOCH 2:


Elements...: 100%|██████████| 58/58 [00:00<00:00, 313.09it/s]


TRAIN	 Loss: 0.00002998, RMSE: 0.00512189, MAE: 0.00379632, R2: 0.99706376
EPOCH 3:


Elements...: 100%|██████████| 58/58 [00:00<00:00, 291.04it/s]


TRAIN	 Loss: 0.00003060, RMSE: 0.00515871, MAE: 0.00383956, R2: 0.99707669
EPOCH 4:


Elements...: 100%|██████████| 58/58 [00:00<00:00, 285.91it/s]


TRAIN	 Loss: 0.00003203, RMSE: 0.00526619, MAE: 0.00393565, R2: 0.99701375
Model saved as student_policy_11.pt

--- ITERATION 13/20 | beta = 0.002 ---
Rollout... 
The Expert was selected 6 times, while the student was selected 2294 times
Collected data - Observations: 31979 Actions: 31979
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 63/63 [00:00<00:00, 385.90it/s]


TRAIN	 Loss: 0.00002888, RMSE: 0.00503364, MAE: 0.00379787, R2: 0.99722362
EPOCH 1:


Elements...: 100%|██████████| 63/63 [00:00<00:00, 388.31it/s]


TRAIN	 Loss: 0.00002825, RMSE: 0.00495917, MAE: 0.00371802, R2: 0.99733067
EPOCH 2:


Elements...: 100%|██████████| 63/63 [00:00<00:00, 401.03it/s]


TRAIN	 Loss: 0.00002637, RMSE: 0.00480667, MAE: 0.00356966, R2: 0.99741715
EPOCH 3:


Elements...: 100%|██████████| 63/63 [00:00<00:00, 408.72it/s]


TRAIN	 Loss: 0.00002588, RMSE: 0.00476078, MAE: 0.00353285, R2: 0.99747252
EPOCH 4:


Elements...: 100%|██████████| 63/63 [00:00<00:00, 395.50it/s]


TRAIN	 Loss: 0.00002564, RMSE: 0.00473570, MAE: 0.00351748, R2: 0.99749863
Model saved as student_policy_12.pt

--- ITERATION 14/20 | beta = 0.002 ---
Rollout... 
The Expert was selected 0 times, while the student was selected 2249 times
Collected data - Observations: 34228 Actions: 34228
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 67/67 [00:00<00:00, 384.35it/s]


TRAIN	 Loss: 0.00002703, RMSE: 0.00479340, MAE: 0.00359174, R2: 0.99759388
EPOCH 1:


Elements...: 100%|██████████| 67/67 [00:00<00:00, 398.10it/s]


TRAIN	 Loss: 0.00002639, RMSE: 0.00472245, MAE: 0.00353989, R2: 0.99767643
EPOCH 2:


Elements...: 100%|██████████| 67/67 [00:00<00:00, 364.89it/s]


TRAIN	 Loss: 0.00002814, RMSE: 0.00485558, MAE: 0.00365623, R2: 0.99762577
EPOCH 3:


Elements...: 100%|██████████| 67/67 [00:00<00:00, 386.38it/s]


TRAIN	 Loss: 0.00002667, RMSE: 0.00476754, MAE: 0.00357527, R2: 0.99764514
EPOCH 4:


Elements...: 100%|██████████| 67/67 [00:00<00:00, 389.68it/s]


TRAIN	 Loss: 0.00002592, RMSE: 0.00470946, MAE: 0.00352964, R2: 0.99767375
Model saved as student_policy_13.pt

--- ITERATION 15/20 | beta = 0.001 ---
Rollout... 
The Expert was selected 1 times, while the student was selected 2246 times
Collected data - Observations: 36475 Actions: 36475
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 72/72 [00:00<00:00, 405.92it/s]


TRAIN	 Loss: 0.00001896, RMSE: 0.00410317, MAE: 0.00303227, R2: 0.99801952
EPOCH 1:


Elements...: 100%|██████████| 72/72 [00:00<00:00, 408.97it/s]


TRAIN	 Loss: 0.00002030, RMSE: 0.00422083, MAE: 0.00315455, R2: 0.99799782
EPOCH 2:


Elements...: 100%|██████████| 72/72 [00:00<00:00, 400.71it/s]


TRAIN	 Loss: 0.00002068, RMSE: 0.00425244, MAE: 0.00318567, R2: 0.99801230
EPOCH 3:


Elements...: 100%|██████████| 72/72 [00:00<00:00, 401.07it/s]


TRAIN	 Loss: 0.00002153, RMSE: 0.00431816, MAE: 0.00323326, R2: 0.99799985
EPOCH 4:


Elements...: 100%|██████████| 72/72 [00:00<00:00, 401.03it/s]


TRAIN	 Loss: 0.00002155, RMSE: 0.00431857, MAE: 0.00324147, R2: 0.99799579
Model saved as student_policy_14.pt

--- ITERATION 16/20 | beta = 0.001 ---
Rollout... 
The Expert was selected 1 times, while the student was selected 2249 times
Collected data - Observations: 38725 Actions: 38725
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 76/76 [00:00<00:00, 397.91it/s]


TRAIN	 Loss: 0.00002034, RMSE: 0.00422092, MAE: 0.00319812, R2: 0.99807793
EPOCH 1:


Elements...: 100%|██████████| 76/76 [00:00<00:00, 407.71it/s]


TRAIN	 Loss: 0.00001901, RMSE: 0.00408865, MAE: 0.00306950, R2: 0.99815536
EPOCH 2:


Elements...: 100%|██████████| 76/76 [00:00<00:00, 406.51it/s]


TRAIN	 Loss: 0.00001921, RMSE: 0.00411446, MAE: 0.00309895, R2: 0.99812102
EPOCH 3:


Elements...: 100%|██████████| 76/76 [00:00<00:00, 408.39it/s]


TRAIN	 Loss: 0.00001929, RMSE: 0.00411402, MAE: 0.00310291, R2: 0.99813533
EPOCH 4:


Elements...: 100%|██████████| 76/76 [00:00<00:00, 382.22it/s]


TRAIN	 Loss: 0.00001958, RMSE: 0.00413237, MAE: 0.00311437, R2: 0.99813557
Model saved as student_policy_15.pt

--- ITERATION 17/20 | beta = 0.000 ---
Rollout... 
The Expert was selected 0 times, while the student was selected 2251 times
Collected data - Observations: 40976 Actions: 40976
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 81/81 [00:00<00:00, 403.70it/s]


TRAIN	 Loss: 0.00001651, RMSE: 0.00379974, MAE: 0.00283791, R2: 0.99837017
EPOCH 1:


Elements...: 100%|██████████| 81/81 [00:00<00:00, 375.89it/s]


TRAIN	 Loss: 0.00002432, RMSE: 0.00451836, MAE: 0.00326966, R2: 0.99806458
EPOCH 2:


Elements...: 100%|██████████| 81/81 [00:00<00:00, 407.04it/s]


TRAIN	 Loss: 0.00002070, RMSE: 0.00421474, MAE: 0.00304040, R2: 0.99818921
EPOCH 3:


Elements...: 100%|██████████| 81/81 [00:00<00:00, 419.25it/s]


TRAIN	 Loss: 0.00001934, RMSE: 0.00408960, MAE: 0.00297209, R2: 0.99824470
EPOCH 4:


Elements...: 100%|██████████| 81/81 [00:00<00:00, 406.48it/s]


TRAIN	 Loss: 0.00001857, RMSE: 0.00401421, MAE: 0.00293184, R2: 0.99827969
Model saved as student_policy_16.pt

--- ITERATION 18/20 | beta = 0.000 ---
Rollout... 
The Expert was selected 0 times, while the student was selected 2215 times
Collected data - Observations: 43191 Actions: 43191
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 85/85 [00:00<00:00, 304.03it/s]


TRAIN	 Loss: 0.00002100, RMSE: 0.00424380, MAE: 0.00309957, R2: 0.99815178
EPOCH 1:


Elements...: 100%|██████████| 85/85 [00:00<00:00, 317.58it/s]


TRAIN	 Loss: 0.00001666, RMSE: 0.00382689, MAE: 0.00278677, R2: 0.99836802
EPOCH 2:


Elements...: 100%|██████████| 85/85 [00:00<00:00, 296.59it/s]


TRAIN	 Loss: 0.00001632, RMSE: 0.00377852, MAE: 0.00277746, R2: 0.99842316
EPOCH 3:


Elements...: 100%|██████████| 85/85 [00:00<00:00, 273.35it/s]


TRAIN	 Loss: 0.00001643, RMSE: 0.00378474, MAE: 0.00280767, R2: 0.99843323
EPOCH 4:


Elements...: 100%|██████████| 85/85 [00:00<00:00, 286.53it/s]


TRAIN	 Loss: 0.00001660, RMSE: 0.00379282, MAE: 0.00282999, R2: 0.99844617
Model saved as student_policy_17.pt

--- ITERATION 19/20 | beta = 0.000 ---
Rollout... 
The Expert was selected 0 times, while the student was selected 2250 times
Collected data - Observations: 45441 Actions: 45441
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 89/89 [00:00<00:00, 399.14it/s]


TRAIN	 Loss: 0.00001439, RMSE: 0.00356864, MAE: 0.00266625, R2: 0.99858981
EPOCH 1:


Elements...: 100%|██████████| 89/89 [00:00<00:00, 383.35it/s]


TRAIN	 Loss: 0.00001777, RMSE: 0.00388415, MAE: 0.00288229, R2: 0.99851310
EPOCH 2:


Elements...: 100%|██████████| 89/89 [00:00<00:00, 394.74it/s]


TRAIN	 Loss: 0.00001596, RMSE: 0.00370980, MAE: 0.00274949, R2: 0.99857384
EPOCH 3:


Elements...: 100%|██████████| 89/89 [00:00<00:00, 404.91it/s]


TRAIN	 Loss: 0.00001554, RMSE: 0.00366549, MAE: 0.00272630, R2: 0.99858880
EPOCH 4:


Elements...: 100%|██████████| 89/89 [00:00<00:00, 395.06it/s]


TRAIN	 Loss: 0.00001521, RMSE: 0.00364251, MAE: 0.00271957, R2: 0.99858826
Model saved as student_policy_18.pt

--- ITERATION 20/20 | beta = 0.000 ---
Rollout... 
The Expert was selected 0 times, while the student was selected 2250 times
Collected data - Observations: 47691 Actions: 47691
Training Student Policy...
EPOCH 0:


Elements...: 100%|██████████| 94/94 [00:00<00:00, 395.13it/s]


TRAIN	 Loss: 0.00001282, RMSE: 0.00338371, MAE: 0.00255868, R2: 0.99849319
EPOCH 1:


Elements...: 100%|██████████| 94/94 [00:00<00:00, 409.41it/s]


TRAIN	 Loss: 0.00001510, RMSE: 0.00358191, MAE: 0.00271512, R2: 0.99858451
EPOCH 2:


Elements...: 100%|██████████| 94/94 [00:00<00:00, 405.59it/s]


TRAIN	 Loss: 0.00001435, RMSE: 0.00350988, MAE: 0.00264882, R2: 0.99864262
EPOCH 3:


Elements...: 100%|██████████| 94/94 [00:00<00:00, 400.55it/s]


TRAIN	 Loss: 0.00001345, RMSE: 0.00341902, MAE: 0.00257041, R2: 0.99869180
EPOCH 4:


Elements...: 100%|██████████| 94/94 [00:00<00:00, 387.88it/s]


TRAIN	 Loss: 0.00001328, RMSE: 0.00340410, MAE: 0.00256130, R2: 0.99869609
Model saved as student_policy_19.pt
Evaluating the best student...
student_policy_0.pt	 Loss: 0.02963963, RMSE: 0.15996771, MAE: 0.12446148, R2: 0.29197115
student_policy_1.pt	 Loss: 0.01622160, RMSE: 0.11347617, MAE: 0.08077682, R2: 0.54710692
student_policy_2.pt	 Loss: 0.01212965, RMSE: 0.09899024, MAE: 0.06923320, R2: 0.65899110
student_policy_3.pt	 Loss: 0.01005203, RMSE: 0.09263600, MAE: 0.06479467, R2: 0.69711530
student_policy_4.pt	 Loss: 0.00944234, RMSE: 0.09066804, MAE: 0.06328025, R2: 0.71609199
student_policy_5.pt	 Loss: 0.00921655, RMSE: 0.08883385, MAE: 0.06216289, R2: 0.72994620
student_policy_6.pt	 Loss: 0.00841510, RMSE: 0.08795118, MAE: 0.06162636, R2: 0.73700911
student_policy_7.pt	 Loss: 0.00949608, RMSE: 0.08686584, MAE: 0.06086705, R2: 0.74464816
student_policy_8.pt	 Loss: 0.00784993, RMSE: 0.08632495, MAE: 0.06044428, R2: 0.74970537
student_policy_9.pt	 Loss: 0.00850166, RMSE: 0.08601230, 
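The output above is cut off after `student_policy_9.pt`, so the best index cannot be read from it directly. When you run the cell, a small parser like the hypothetical `best_student` below can pick `idx` from the evaluation lines. The repository's own selection criterion is not stated here; this sketch uses the highest validation R2 by default and can switch to the lowest Loss, RMSE or MAE. Incomplete lines are skipped. The sample text is an excerpt of the log above.

```python
import re

# One evaluation line, e.g. "student_policy_3.pt\t Loss: 0.01, RMSE: 0.09, MAE: 0.06, R2: 0.69"
NUM = r"(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
LINE_RE = re.compile(
    r"student_policy_(\d+)\.pt\s+"
    r"Loss:\s*" + NUM + r",\s*RMSE:\s*" + NUM + r",\s*MAE:\s*" + NUM + r",\s*R2:\s*" + NUM
)
METRICS = ("Loss", "RMSE", "MAE", "R2")


def best_student(log_text: str, metric: str = "R2") -> int:
    """Return the idx of the best student in the evaluation log.

    R2 is maximised; Loss, RMSE and MAE are minimised. Incomplete lines are skipped.
    """
    group = 2 + METRICS.index(metric)  # regex group holding the chosen metric
    scores = {int(m.group(1)): float(m.group(group)) for m in LINE_RE.finditer(log_text)}
    if not scores:
        raise ValueError("no complete evaluation lines found")
    pick = max if metric == "R2" else min
    return pick(scores, key=scores.get)


log = (
    "student_policy_0.pt\t Loss: 0.02963963, RMSE: 0.15996771, MAE: 0.12446148, R2: 0.29197115\n"
    "student_policy_8.pt\t Loss: 0.00784993, RMSE: 0.08632495, MAE: 0.06044428, R2: 0.74970537\n"
    "student_policy_9.pt\t Loss: 0.00850166, RMSE: 0.08601230, \n"
)
print(best_student(log))          # 8
print(best_student(log, "Loss"))  # 8
```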

### Simulation
Use this section to run rollouts in the environment using the newly trained student network. You can customize the simulation with the following parameters:
- `idx`: <mark style="background-color:#ffcccc"><font color="red"><b>IMPORTANT</b></font></mark> 🚨 **Set this to the index of the best-performing model found in the training above** (e.g. if the best student policy is `student_policy_12.pt`, set `idx = 12`)
- `render`: If `True`, displays the rollouts on screen. **Requires a GPU**
- `framerate_per_episode` : Controls how frequently frames are rendered. Only frames where `(frame_idx % framerate_per_episode == 0)` are shown
- `video_saving`: If `True`, saves videos of the episodes in the `./new_videos` folder. In the **Kitchen** environment, this significantly increases computation time
- `n_episodes`: Number of episodes to simulate
- `robot_noise`: Magnitude of noise added to the robot’s proprioceptive variables (only for the **Kitchen** environment)
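A sketch of what two of these parameters describe, under stated assumptions: `frames_to_show` applies the `frame_idx % framerate_per_episode == 0` rule, and `add_robot_noise` assumes the noise is zero-mean Gaussian with standard deviation `robot_noise`, added only to the first 18 observation entries (the robot's joint and gripper positions and velocities). Both helpers are hypothetical; the `Simulator` class may implement the noise differently.

```python
import numpy as np


def frames_to_show(n_steps: int, framerate_per_episode: int) -> list:
    """Indices of the frames kept by the rule frame_idx % framerate_per_episode == 0."""
    return [i for i in range(n_steps) if i % framerate_per_episode == 0]


def add_robot_noise(obs: np.ndarray, robot_noise: float, rng: np.random.Generator,
                    n_robot_dims: int = 18) -> np.ndarray:
    """Return a copy of obs with zero-mean Gaussian noise (std = robot_noise) added
    to the first n_robot_dims entries, assumed to be the robot's proprioception."""
    noisy = np.array(obs, dtype=float)   # copy, so obs is left untouched
    robot = noisy[..., :n_robot_dims]    # view into the copy
    robot += rng.normal(0.0, robot_noise, size=robot.shape)
    return noisy


print(frames_to_show(12, 5))  # [0, 5, 10]
rng = np.random.default_rng(0)
noisy = add_robot_noise(np.zeros(20), robot_noise=0.01, rng=rng)
print(noisy[18:])  # microwave entries stay 0.0
```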

After all rollouts finish, a `mean_rewards.json` file is saved. It contains:
- The mean reward of each episode
- `mean_of_means`: The overall mean reward across all episodes
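The exact keys the `Simulator` writes are not shown in this notebook, so the layout below is hypothetical. It only illustrates the two quantities listed above: the mean reward of each episode and `mean_of_means`, their average. The example writes to `example_mean_rewards.json` so it cannot overwrite a real results file.

```python
import json
from pathlib import Path


def save_mean_rewards(episode_rewards, path="example_mean_rewards.json"):
    """Write the mean reward of each episode and their overall mean to a JSON file.

    The keys used here are a hypothetical layout, not necessarily the Simulator's.
    """
    per_episode = [sum(r) / len(r) for r in episode_rewards]   # one mean per episode
    summary = {
        "mean_rewards": per_episode,
        "mean_of_means": sum(per_episode) / len(per_episode),  # each episode weighs the same
    }
    Path(path).write_text(json.dumps(summary, indent=2))
    return summary


print(save_mean_rewards([[0, 1, 1, 0], [0, 0, 0, 1]]))
# {'mean_rewards': [0.5, 0.25], 'mean_of_means': 0.375}
```

Because `mean_of_means` averages per-episode means, every episode counts equally regardless of its length; averaging over all steps instead would weight longer episodes more.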

### Using a Saved model

If you want to simulate one of our pretrained models, you can load it manually by changing:
- `path_to_model`: The full path to the model file inside the `DAgger4Robotics` folder
- `net_type`: The architecture type of the loaded model (`"simple"` or `"deep"`)

In [None]:
#@title Parameters of the experiments { run: "auto" }
# Change this value to the best model found
idx=1 #@param {type:"integer"}
render = True #@param {type:"boolean"}
framerate_per_episode=5  #@param {type:"integer"}
video_saving = False #@param {type:"boolean"}
n_episodes = 1 #@param {type:"integer"}
robot_noise=0.01 #@param {type:"number"}

path_to_model=f"student_policy_{idx}.pt"
net_type=student_type

In [None]:
sim = Simulator(
        env_mode='kitchen',
        net_type=net_type,
        path_to_model=path_to_model,
        n_episodes=n_episodes,
        render=render,
        framerate_per_episode=framerate_per_episode,
        video_saving=video_saving,
        robot_noise=robot_noise,
        device=device
    )
sim.run()