# Auto Trainer & Stable Baselines Demonstration w/ Solo8
A quick example of how to use the auto trainer framework alongside stable-baselines to train and monitor a model. Note that we will be using Weights & Biases (W&B) to make the monitoring a bit easier.

## Define the experiment tags
This isn't necessary if you aren't using W&B, but it's a great way to organize your run data!

```python
TAGS = ['demo', 'solov2vanilla', 'cpu']
```

## Set up the Solo Environment
Note that this is using the Solov2Vanilla environment.

Import all of the required packages.

```python
from gym_solo.envs import solo8v2vanilla
from gym_solo.core import obs
from gym_solo.core import rewards
from gym_solo import testing

import gym
import gym_solo
```



Create the environment config and instantiate the registered environment.

```python
env_config = solo8v2vanilla.Solo8VanillaConfig()
env = gym.make('solo8vanilla-v0', config=env_config)
```

Register all of the observations and rewards. In this case, the only observation is the torso IMU, and the only reward measures how upright the robot is. Modifying these will probably be the biggest factor in whether training converges.

```python
env.obs_factory.register_observation(obs.TorsoIMU(env.robot))
env.reward_factory.register_reward(1, rewards.UprightReward(env.robot))
```
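`register_reward` takes a number (here `1`) alongside the reward object, which reads like a weight in a weighted sum of reward terms. The sketch below assumes exactly that: a hypothetical `ToyRewardFactory` sums `weight * reward.compute()`, and a hypothetical `ToyUprightReward` scores uprightness as the world-frame z-component of the torso's up axis, computed from an `(x, y, z, w)` orientation quaternion. Neither class is `gym_solo`'s actual code, and `UprightReward` may use a different formula.

```python
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (x, y, z, w) ordering, as PyBullet uses


class ToyUprightReward:
    """Hypothetical upright reward: world-frame z-component of the torso's up axis.

    1.0 when standing upright, 0.0 when lying on its side, -1.0 when upside down.
    """

    def __init__(self, orientation: Quaternion):
        self.orientation = orientation

    def compute(self) -> float:
        x, y, _, _ = self.orientation
        # Entry R[2][2] of the rotation matrix of a unit quaternion.
        return 1.0 - 2.0 * (x * x + y * y)


class ToyRewardFactory:
    """Hypothetical factory returning the weighted sum of all registered rewards."""

    def __init__(self):
        self._terms: List[Tuple[float, ToyUprightReward]] = []

    def register_reward(self, weight: float, reward: ToyUprightReward) -> None:
        self._terms.append((weight, reward))

    def get_reward(self) -> float:
        return sum(weight * reward.compute() for weight, reward in self._terms)


toy_factory = ToyRewardFactory()
upright = ToyUprightReward((0.0, 0.0, 0.0, 1.0))  # identity quaternion: standing upright
toy_factory.register_reward(1, upright)
print(toy_factory.get_reward())  # 1.0

upright.orientation = (1.0, 0.0, 0.0, 0.0)  # rolled 180 degrees about x: upside down
print(toy_factory.get_reward())  # -1.0
```

With a weighted sum, adding a second reward term (for example, a penalty on energy use) becomes a matter of registering it with its own weight, and tuning those weights is one of the main levers on convergence.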

And now the environment is all prepped to train on!

## Learning!
As soon as the env is ready, learning *should* be trivial.

Import the parameters and the `auto_trainer` itself. `BaseParameters` is an extendable class to which you can add custom parameters for specific experiments. In this case we aren't doing anything special, so we'll just use the barebones version.

```python
from auto_trainer.params import BaseParameters
import auto_trainer
```

Parse the base parameters. Note that if you are running a Weights & Biases sweep, these values will be overridden by the sweep controller.

```python
config = BaseParameters().parse()
config.episodes = 5000

config
```

```
Namespace(algorithm='PPO2', episodes=5000, policy='MlpPolicy')
```
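`BaseParameters().parse()` returns an `argparse`-style `Namespace` with `algorithm`, `episodes` and `policy` fields. Since its internals aren't shown in this notebook, the sketch below is only a guess at the pattern: a hypothetical `SketchBaseParameters` built on `argparse` with the same three fields (its default of 1000 episodes is made up), a subclass that adds a custom `--reward-weight` parameter, and a hypothetical `apply_overrides` helper standing in for a sweep overwriting values.

```python
import argparse
from typing import Any, Dict, Optional, Sequence


class SketchBaseParameters:
    """Hypothetical argparse-based parameter class with the fields shown above."""

    def __init__(self):
        self.parser = argparse.ArgumentParser()
        self.parser.add_argument("--algorithm", default="PPO2")
        self.parser.add_argument("--policy", default="MlpPolicy")
        self.parser.add_argument("--episodes", type=int, default=1000)  # made-up default

    def parse(self, args: Optional[Sequence[str]] = None) -> argparse.Namespace:
        # Parse an empty list by default so Jupyter's own command-line flags are ignored.
        return self.parser.parse_args([] if args is None else list(args))


class SketchCustomParameters(SketchBaseParameters):
    """Example extension adding an experiment-specific parameter."""

    def __init__(self):
        super().__init__()
        self.parser.add_argument("--reward-weight", type=float, default=1.0)


def apply_overrides(ns: argparse.Namespace, overrides: Dict[str, Any]) -> argparse.Namespace:
    """Hypothetical stand-in for a sweep replacing parameter values."""
    for key, value in overrides.items():
        setattr(ns, key, value)
    return ns


sketch_params = SketchCustomParameters().parse()
sketch_params.episodes = 5000
apply_overrides(sketch_params, {"reward_weight": 0.5})
print(sketch_params.algorithm, sketch_params.episodes, sketch_params.reward_weight)  # PPO2 5000 0.5
```

Note that `argparse` turns `--reward-weight` into the attribute `reward_weight`, so overrides must use the underscore form.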

And that's it, we should be ready to train!

```python
model, config, run = auto_trainer.train(env, config, TAGS, log_freq=500)
```

```
wandb: Currently logged in as: agupta231 (use `wandb login --relogin` to force relogin)
Wrapping the env in a DummyVecEnv.
```
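The "Wrapping the env in a DummyVecEnv" message comes from stable-baselines, which works with vectorized environments and wraps a plain gym env automatically. `DummyVecEnv` takes a list of functions that build envs (for a single env, `DummyVecEnv([lambda: env])`) and steps them one after another in the same process. The sketch below mimics that behavior with a hypothetical `SketchDummyVecEnv` and a toy env; it is a simplification (the real class, for instance, returns NumPy arrays), not the library's code.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyCountdownEnv:
    """Hypothetical toy env: each episode lasts `length` steps and the reward equals the action."""

    def __init__(self, length: int = 3):
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: float) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        done = self.t >= self.length
        return self.t, float(action), done, {}


class SketchDummyVecEnv:
    """Simplified DummyVecEnv-style wrapper: steps each env in turn, in one process."""

    def __init__(self, env_fns: List[Callable[[], Any]]):
        # Take functions that build envs rather than env instances.
        self.envs = [fn() for fn in env_fns]
        self.num_envs = len(self.envs)

    def reset(self) -> List[Any]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[Any]):
        batch_obs, batch_rewards, batch_dones, batch_infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            ob, reward, done, info = env.step(action)
            if done:
                # Keep the final observation, then auto-reset so the batch never stalls.
                info = dict(info, terminal_observation=ob)
                ob = env.reset()
            batch_obs.append(ob)
            batch_rewards.append(reward)
            batch_dones.append(done)
            batch_infos.append(info)
        return batch_obs, batch_rewards, batch_dones, batch_infos


vec = SketchDummyVecEnv([lambda: ToyCountdownEnv(length=2), lambda: ToyCountdownEnv(length=3)])
print(vec.reset())  # [0, 0]
vec.step([1.0, 0.5])
step_obs, step_rewards, step_dones, step_infos = vec.step([1.0, 0.5])
print(step_obs, step_dones)  # [0, 2] [True, False]
```

The first env finishes on the second step and is reset immediately, which is why its observation is back to `0` while `done` is `True`.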





Run summary:

| Metric | Final value |
|---|---|
| `global_step` | 5112.0 |
| `_timestamp` | 1605717041.49933 |
| `loss/entropy_loss` | 17.07687 |
| `loss/policy_gradient_loss` | -0.02098 |
| `loss/value_function_loss` | 0.00018 |
| `loss/approximate_kullback-leibler` | 0.00154 |
| `loss/clip_factor` | 0.03125 |
| `loss/loss` | -0.19166 |
| `input_info/discounted_rewards` | 0.00792 |
| `input_info/learning_rate` | 0.00025 |
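The logged `loss/loss` is not an independent quantity. Assuming the run used stable-baselines' PPO2 defaults (`ent_coef=0.01`, `vf_coef=0.5`; the logged learning rate of 0.00025 also matches PPO2's default), the total loss should be the policy-gradient loss minus the entropy bonus plus the weighted value loss. Despite its name, `loss/entropy_loss` appears to be the mean policy entropy, which enters the total with a negative sign. The quick check below plugs in the final summary values; the two coefficients are an assumption, since the notebook never prints them.

```python
# Final values copied from the run summary above.
policy_gradient_loss = -0.02098
mean_entropy = 17.07687        # logged as loss/entropy_loss
value_function_loss = 0.00018
logged_total_loss = -0.19166   # logged as loss/loss

# Assumption: stable-baselines PPO2 default coefficients (not printed by the notebook).
ENT_COEF = 0.01
VF_COEF = 0.5

# Total loss = pg_loss - ent_coef * entropy + vf_coef * vf_loss
reconstructed_loss = (
    policy_gradient_loss - ENT_COEF * mean_entropy + VF_COEF * value_function_loss
)
print(f"{reconstructed_loss:.5f}")  # -0.19166
```

The reconstruction matches to five decimal places, and the arithmetic shows the entropy term (about -0.171) accounts for most of the final total loss in this run.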


Run history:

| Metric | History |
|---|---|
| `global_step` | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ |
| `_timestamp` | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| `loss/entropy_loss` | ▁▁▁▁▂▂▂▃▄▅▅▅▅▅▅▄▄▄▅▅▅▆▆▇███▇██▇▆▅▅▅▅▅▅▅▅ |
| `loss/policy_gradient_loss` | ▆▆█▇▇▇██▇▆▆▇▆▃▇▆██▇█▆▇█▅█▇▄▅▇▆▅▆▇▅▁▄▂▄▅▆ |
| `loss/value_function_loss` | ▄▃▁▁▁▂▂▁▁█▂▁▁▁▁▁▂▁▁▁▁▁▁▁▂▁▁▇▇▅▃▂▁▃▃▁▂▂▁▁ |
| `loss/approximate_kullback-leibler` | ▁▁▁▁▁▁▁▁▁▃▁▁▂▂▂▁▂▁▁▁▁▂▁▁▁▁▄▂▃▃▃▁▂▄█▃▃▃▁▁ |
| `loss/clip_factor` | ▁▂▂▁▁▁▁▁▁▃▁▁▂▃▂▁▁▁▁▁▁▂▁▁▁▁▇▃█▅▆▁▂██▆█▆▁▂ |
| `loss/loss` | ▆▆█▇▇▇██▇▆▆▇▆▃▇▆██▇█▅▇█▅█▇▃▅▇▆▅▆▇▅▁▄▂▄▅▆ |
| `input_info/discounted_rewards` | ▅▅▅▄▅▄▄▅▅█▇▇▆▇▅▅▆▆▆▆▆▅▅▅▅▅▆▄▂▄▁▂▂▅▄▄▅▆▆▅ |
| `input_info/learning_rate` | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
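The `loss/clip_factor` and `loss/approximate_kullback-leibler` curves both pick up in the later part of the run, which signals larger policy updates. The sketch below shows how these two diagnostics, plus the clipped policy-gradient loss, are computed for one minibatch in the style of stable-baselines' PPO2. It is rewritten from memory in plain Python over lists of negative log-probabilities; `ppo_minibatch_stats` is a hypothetical helper name, and `cliprange=0.2` is assumed to be PPO2's default.

```python
import math
from typing import Sequence, Tuple


def ppo_minibatch_stats(
    new_neglogp: Sequence[float],
    old_neglogp: Sequence[float],
    advantages: Sequence[float],
    cliprange: float = 0.2,
) -> Tuple[float, float, float]:
    """Return (pg_loss, approxkl, clipfrac) for one minibatch, PPO2-style."""
    n = len(advantages)
    pg_terms = []
    kl_terms = []
    n_clipped = 0
    for new, old, adv in zip(new_neglogp, old_neglogp, advantages):
        ratio = math.exp(old - new)  # pi_new(a|s) / pi_old(a|s)
        unclipped = -adv * ratio
        clipped = -adv * min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
        pg_terms.append(max(unclipped, clipped))  # pessimistic (clipped) surrogate
        kl_terms.append((new - old) ** 2)
        if abs(ratio - 1.0) > cliprange:
            n_clipped += 1
    pg_loss = sum(pg_terms) / n
    approxkl = 0.5 * sum(kl_terms) / n
    clipfrac = n_clipped / n
    return pg_loss, approxkl, clipfrac


pg_loss, approxkl, clipfrac = ppo_minibatch_stats(
    new_neglogp=[1.0, 1.0 - math.log(1.5)],  # second action became 1.5x more likely
    old_neglogp=[1.0, 1.0],
    advantages=[1.0, 1.0],
)
print(round(pg_loss, 4), round(approxkl, 4), clipfrac)  # -1.1 0.0411 0.5
```

In the example, the second sample's probability ratio (1.5) falls outside `[0.8, 1.2]`, so its objective is capped at the clipped value and it counts toward the clip fraction; a rising `clip_factor` therefore means more samples are hitting that cap.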
