# ROBOTICS

Using stable_baselines3 for a robotics use case.


## 1. Importing dependencies

* **gym**: environment library for reinforcement learning
* **panda-gym**: open-source library of robotic environments built on PyBullet
* **stable_baselines3**: reinforcement learning library
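
The calls used in this notebook (`gym.make(..., render=True)`, a four-value `env.step`, the `PandaPush-v2` environment id) suggest that it targets the `gym`-based releases of these libraries rather than later `gymnasium`-based ones. That is an inference from the code, not something the notebook states. A minimal sketch for checking what is installed before running the cells, using only the standard library; the distribution names in `PACKAGES` are the usual PyPI names and `installed_versions` is a helper introduced here for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names (assumed) for the libraries imported in this notebook
PACKAGES = ("gym", "panda-gym", "stable-baselines3", "sb3-contrib")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Recording these versions next to the notebook makes it easier to reproduce the run later.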

In [None]:
import gym
import panda_gym
import stable_baselines3
from stable_baselines3.common.logger import configure
from stable_baselines3.common.callbacks import CallbackList, CheckpointCallback, EvalCallback
from stable_baselines3 import HerReplayBuffer, DDPG

## 2. Testing the environment with random actions

In [None]:
env = gym.make('PandaPush-v2', render=True) # Create the environment with rendering enabled

obs = env.reset() # Reset the environment
done = False

while not done: 
    action = env.action_space.sample() # random action
    obs, reward, done, info = env.step(action)

env.close()
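
The random rollout above discards the reward and the `info` dictionary. The sketch below shows the same loop with the episode return and step count accumulated. It runs against `StubPushEnv`, a hypothetical stand-in written for this example (not part of panda-gym) that copies only the shape of the `reset`/`step` interface; the 50-step limit, the constant reward of -1 and the `is_success` key are assumptions made for the illustration.

```python
import random


class StubPushEnv:
    """Hypothetical stand-in that mimics the reset/step interface used above."""

    def __init__(self, max_steps=50):  # assumed episode length
        self.max_steps = max_steps
        self.t = 0

    def _obs(self):
        return {"observation": [float(self.t)],
                "achieved_goal": [0.0],
                "desired_goal": [1.0]}

    def reset(self):
        self.t = 0
        return self._obs()

    def sample_action(self):
        # Random 3D displacement in [-1, 1], standing in for action_space.sample()
        return [random.uniform(-1.0, 1.0) for _ in range(3)]

    def step(self, action):
        self.t += 1
        reward = -1.0  # sparse-style penalty: the stub never reaches the goal
        done = self.t >= self.max_steps
        return self._obs(), reward, done, {"is_success": False}


def run_episode(env):
    """Run one episode with random actions and return (return, steps, success)."""
    obs = env.reset()
    done, total_reward, steps = False, 0.0, 0
    while not done:
        obs, reward, done, info = env.step(env.sample_action())
        total_reward += reward
        steps += 1
    return total_reward, steps, info["is_success"]


total_reward, steps, success = run_episode(StubPushEnv())
print(total_reward, steps, success)  # -50.0 50 False
```

With the real environment, the same bookkeeping gives a random-policy baseline to compare the trained model against.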

## 3. Setting up the model with [HER](https://stable-baselines3.readthedocs.io/en/master/modules/her.html) and [DDPG](https://stable-baselines3.readthedocs.io/en/master/modules/ddpg.html)
The model hyperparameters are taken from the community-tuned values in rl-baselines3-zoo: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/her.yml
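
To make the `goal_selection_strategy='future'` and `n_sampled_goal=4` settings more concrete, here is a simplified, library-free sketch of the idea behind Hindsight Experience Replay: each transition is replayed several times with its goal replaced by a goal that was actually achieved later in the same episode, and the reward is recomputed for that new goal. This is not the code of `HerReplayBuffer`; the transition layout, `sparse_reward`, `relabel_future` and the 0.05 threshold are assumptions made for the illustration.

```python
import math
import random


def sparse_reward(achieved, desired, threshold=0.05):
    # 0.0 when the goal is reached, -1.0 otherwise (threshold is an assumed value)
    return 0.0 if math.dist(achieved, desired) < threshold else -1.0


def relabel_future(episode, n_sampled_goal=4, seed=0):
    """Create virtual transitions whose goal is an achieved goal from step t or later."""
    rng = random.Random(seed)
    last = len(episode) - 1
    virtual = []
    for t, transition in enumerate(episode):
        for _ in range(n_sampled_goal):
            future_t = rng.randint(t, last)  # index in [t, last], same episode
            new_goal = episode[future_t]["achieved_goal"]
            virtual.append({
                "achieved_goal": transition["achieved_goal"],
                "desired_goal": new_goal,
                "reward": sparse_reward(transition["achieved_goal"], new_goal),
            })
    return virtual


# Toy episode: the object slides along x and never reaches the real goal (1.0, 1.0)
episode = [{"achieved_goal": (0.1 * t, 0.0), "desired_goal": (1.0, 1.0)}
           for t in range(1, 6)]
virtual = relabel_future(episode)
print(len(virtual))  # 20 virtual transitions (5 steps x 4 sampled goals)
print(sum(v["reward"] == 0.0 for v in virtual), "of them count as successes")
```

Even though the toy episode never reaches its real goal, some relabeled transitions carry a success reward, which is what gives the agent a learning signal under sparse rewards.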

In [None]:
env = gym.make("PandaPush-v2") # Create the environment without rendering
model = DDPG('MultiInputPolicy', 
             env, 
             replay_buffer_class=HerReplayBuffer, 
             replay_buffer_kwargs=dict(
                 n_sampled_goal=4,
                 goal_selection_strategy='future',
                 online_sampling=True,
             ), 
             buffer_size = 1000000, 
             tau = 0.05, 
             learning_rate = 1e-3, 
             verbose=1, 
             batch_size = 2048, 
             gamma = 0.95, 
             policy_kwargs = dict(
                 n_critics=2, 
                 net_arch=[512, 512, 512]
             ), 
             tensorboard_log="logs/tensorboard/") # Create the model with the specified hyperparameters
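
The `tau` argument above is the soft-update (Polyak) coefficient: after each gradient step the target networks move a fraction `tau` of the way toward the online networks. A minimal sketch of that update on plain Python lists, where `polyak_update` is a helper written for this example and not the SB3 function:

```python
def polyak_update(params, target_params, tau):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [(1.0 - tau) * t + tau * p for p, t in zip(params, target_params)]


online = [1.0, 2.0]   # stand-in for the online network weights
target = [0.0, 0.0]   # stand-in for the target network weights

for step in range(3):
    target = polyak_update(online, target, tau=0.05)
    print(step + 1, [round(x, 4) for x in target])
```

With `tau = 0.05` the target lags well behind the online weights, which is meant to stabilise the critic's bootstrapped targets.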

### 3.1. Setting up callbacks

**Saving a version of the model every 1000 steps**

In [None]:
checkpoint_callback = CheckpointCallback(save_freq=1000, 
                                         save_path='.', 
                                         name_prefix='PandaPush-v2')


**Evaluating the model every 1000 steps and saving the best one as "best_model"**

In [None]:
eval_callback = EvalCallback(env, 
                             best_model_save_path='logs/DDPG', 
                             eval_freq=1000)

**Putting the callbacks in a list**

In [None]:
callback_list = CallbackList([checkpoint_callback, eval_callback])

## 4. Training the model
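* Training for 10000 steps
* Logging training statistics with `log_interval=1000` (for off-policy algorithms such as DDPG, SB3 counts this interval in episodes rather than steps)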
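
As a rough sketch of what this training run should leave on disk, the snippet below lists the steps at which a callback with a frequency of 1000 would fire during 10000 steps, and the checkpoint file names that follow. It assumes a single environment (so callback calls and timesteps coincide) and the `<name_prefix>_<steps>_steps.zip` naming used by `CheckpointCallback`; `callback_steps` is a helper introduced here for illustration.

```python
def callback_steps(total_timesteps, freq):
    """Steps at which a callback with the given frequency would trigger."""
    return list(range(freq, total_timesteps + 1, freq))


total_timesteps = 10_000
checkpoint_steps = callback_steps(total_timesteps, freq=1000)
eval_steps = callback_steps(total_timesteps, freq=1000)

# Assumed naming scheme: <name_prefix>_<num_timesteps>_steps.zip
checkpoint_files = [f"PandaPush-v2_{step}_steps.zip" for step in checkpoint_steps]

print(len(checkpoint_files), "checkpoints")  # 10 checkpoints
print(checkpoint_files[0])                   # PandaPush-v2_1000_steps.zip
print(checkpoint_files[-1])                  # PandaPush-v2_10000_steps.zip
print(len(eval_steps), "evaluations")        # 10 evaluations
```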


In [None]:
model.learn(total_timesteps=10000, 
            callback=callback_list, 
            log_interval=1000, 
            tb_log_name='logs_robotics_PandaPush')

## 5. Saving the model and cleaning up

In [None]:
model.save("PandaPush-v2-model") # Save the final model

del model # Clean up
del env

## 6. Testing the trained model

In [None]:
env = gym.make("PandaPush-v2", render=True) # Create the environment with rendering
model = DDPG.load("PandaPush-v2-model", env=env) # Load the model saved in section 5
obs = env.reset()
dones = False

while not dones:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
env.close()

## 7. Testing a community-trained model

### 7.1. Importing the sb3_contrib dependencies

In [None]:
import panda_gym
from sb3_contrib import TQC
from stable_baselines3.common.env_util import make_vec_env
from sb3_contrib.common.wrappers import TimeFeatureWrapper
from stable_baselines3.common.logger import configure

### 7.2. Loading and running a community-trained [TQC](https://sb3-contrib.readthedocs.io/en/master/modules/tqc.html) model

Training a robotics model demands a lot of computing power, so we were not able to train a satisfactory model ourselves. Let's try a model trained by the community and load it with sb3_contrib.

In [None]:
env = make_vec_env("PandaPush-v2", wrapper_class=TimeFeatureWrapper, env_kwargs={'render':True})
model = TQC.load("logs/TQC/PandaPush-v1", custom_objects={'learning_rate':0.001}, env=env)
obs = env.reset()
dones = False
while not dones:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
env.close()
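
The environment above is wrapped in `TimeFeatureWrapper`, presumably because the community model was trained with it and expects the same observation shape. The wrapper appends the normalised remaining time of the episode to the observation. The sketch below illustrates that idea on plain dictionaries; `add_time_feature` is a helper written for this example, not the wrapper's own code, and the `max_steps` values are assumptions.

```python
def add_time_feature(obs, current_step, max_steps=1000, test_mode=False):
    """Return a copy of a dict observation with the remaining time appended.

    The feature goes from 1.0 at the start of the episode to 0.0 at max_steps;
    in test mode it stays constant at 1.0.
    """
    time_feature = 1.0 if test_mode else 1.0 - current_step / max_steps
    new_obs = dict(obs)
    new_obs["observation"] = list(obs["observation"]) + [time_feature]
    return new_obs


obs = {"observation": [0.2, -0.1], "achieved_goal": [0.0], "desired_goal": [1.0]}
start = add_time_feature(obs, current_step=0, max_steps=50)
middle = add_time_feature(obs, current_step=25, max_steps=50)
print(start["observation"])   # [0.2, -0.1, 1.0]
print(middle["observation"])  # [0.2, -0.1, 0.5]
```

Because the extra value changes the size of the `observation` vector, a model trained with the wrapper cannot be loaded into an unwrapped environment without a shape mismatch.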

## 8. Creating a GIF

Exporting a GIF of the environment running the community-trained model.

In [None]:
import imageio
import numpy

gif_env = make_vec_env("PandaPush-v2", wrapper_class=TimeFeatureWrapper, env_kwargs={'render':True})
gif_model = TQC.load("logs/TQC/PandaPush-v1", custom_objects={'learning_rate':0.001}, env=gif_env)
images = []
obs = gif_env.reset()
img = gif_env.render(mode='rgb_array')

for i in range(350):
    images.append(img)
    action, _ = gif_model.predict(obs)
    obs, _, _, _ = gif_env.step(action)
    img = gif_env.render(mode='rgb_array')

imageio.mimsave('test_panda_push.gif',
                [numpy.array(img) for i, img in enumerate(images) if i % 2 == 0],
                fps=29)
gif_env.close()
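
The export keeps only every second rendered frame and writes them at 29 frames per second. A small sketch of the resulting frame count and nominal playback time, using integers as placeholder frames; `subsample` is a helper written for this example:

```python
def subsample(frames, keep_every=2):
    """Keep frames whose index is a multiple of keep_every."""
    return [frame for i, frame in enumerate(frames) if i % keep_every == 0]


frames = list(range(350))  # placeholders for the 350 rendered images
kept = subsample(frames)
fps = 29

print(len(kept))                  # 175
print(round(len(kept) / fps, 2))  # 6.03 (seconds of nominal playback)
```

Raising `keep_every` shrinks the file further at the cost of choppier motion.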