# PPO2 on Solo8 v2 Vanilla w/ Fixed Timestamp
Only the time-based stopping criterion is used. This is more of a rudimentary test than anything else.

## Define Experiment Tags

In [1]:
TAGS = ['solov2vanilla', 'gpu']

## Get Solo Environment Configuration

Import the relevant libraries, along with the observations, rewards, and termination criteria.

In [2]:
from gym_solo.envs import solo8v2vanilla
from gym_solo.core import obs
from gym_solo.core import rewards
from gym_solo.core import termination as terms

import gym
import gym_solo

Create the config for the environment.

In [3]:
env_config = solo8v2vanilla.Solo8VanillaConfig()

## Parse CLI arguments and register w/ wandb

This experiment uses the auto trainer to handle all of the hyperparameter management.

In [4]:
from auto_trainer import params
import auto_trainer

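Create a basic config. Give the robot 60 seconds of simulation time per episode to learn how to stand. `episode_length` is measured in simulation steps, so the 60 seconds are divided by the simulation timestep `env_config.dt`.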

In [5]:
config = params.BaseParameters().parse()

config.episodes = 50000
config.episode_length = 60 / env_config.dt

config, run = auto_trainer.get_synced_config(config, TAGS)
config

[34m[1mwandb[0m: Currently logged in as: [33magupta231[0m (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: wandb version 0.10.12 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


{'episodes': 50000, 'episode_length': 60000.0, 'policy': 'MlpPolicy', 'algorithm': 'PPO2'}
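The episode length above is a duration in seconds converted into a step count. Judging from the printed config (`episode_length` of 60000.0), `env_config.dt` is 0.001 s here, and plain division leaves a float. A minimal sketch, using a hypothetical helper `seconds_to_steps` (not part of `auto_trainer` or `gym_solo`), that rounds to a whole number of steps so floating-point error cannot shave a step off:

```python
def seconds_to_steps(seconds: float, dt: float) -> int:
    """Convert a simulated duration (in seconds) into a whole number of physics steps."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # round() (rather than int()) absorbs floating-point error in the quotient,
    # e.g. 0.3 / 0.1 evaluates to 2.9999999999999996.
    return round(seconds / dt)


# Hypothetical usage: dt = 0.001 s is inferred from the printed config above.
episode_length = seconds_to_steps(60, 0.001)
print(episode_length)  # 60000
```

Rounding rather than truncating matters because a quotient that should be exact can land just below the integer, and `int()` would then drop a full step.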

## Setup Environment
Register the following components with the environment:

**Observations**
- TorsoIMU
- Current motor encoder readings

**Reward**
- How upright the torso is, as measured by the TorsoIMU. Valued in $[-1, 1]$

**Termination Criteria**
- Terminate after $n$ timesteps, where $n$ is `config.episode_length`
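One common way to score uprightness in $[-1, 1]$ is the cosine of the angle between the torso's local $z$-axis and the world's up axis. For a unit quaternion $(x, y, z, w)$ that cosine is $1 - 2(x^2 + y^2)$. The sketch below assumes the orientation arrives as a quaternion in PyBullet's $(x, y, z, w)$ order; it illustrates the idea and is not necessarily how `rewards.UprightReward` is implemented. `upright_reward` is a hypothetical name.

```python
import math
from typing import Sequence


def upright_reward(quaternion: Sequence[float]) -> float:
    """Score how upright a body is, in [-1, 1].

    `quaternion` is the body orientation in (x, y, z, w) order (PyBullet's
    convention; an assumption about where the value comes from). Returns 1.0
    when upright, 0.0 when lying on its side and -1.0 when upside down.
    """
    x, y, z, w = quaternion
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    if norm == 0:
        raise ValueError("quaternion must be non-zero")
    # Normalise so the result is guaranteed to stay within [-1, 1].
    x, y = x / norm, y / norm
    # Entry R[2][2] of the rotation matrix: the world-z component of the
    # body's local z-axis, i.e. cos(angle between body z and world up).
    return 1.0 - 2.0 * (x * x + y * y)


print(upright_reward((0.0, 0.0, 0.0, 1.0)))  # 1.0, standing upright
```

Yaw (rotation about the vertical axis) does not change this score, which suits a standing task where heading is irrelevant.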

In [6]:
env = gym.make('solo8vanilla-v0', config=env_config)

env.obs_factory.register_observation(obs.TorsoIMU(env.robot))
env.obs_factory.register_observation(obs.MotorEncoder(env.robot))

env.reward_factory.register_reward(1, rewards.UprightReward(env.robot))

env.termination_factory.register_termination(terms.TimeBasedTermination(config.episode_length))
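The idea behind a time-based termination can be sketched as a step counter that is reset with the episode and reports `done` once the limit is reached. The class and method names here (`StepLimitTermination`, `reset`, `is_terminated`) are hypothetical; `gym_solo`'s `TimeBasedTermination` may expose a different interface.

```python
class StepLimitTermination:
    """Hypothetical time-based termination: done after a fixed number of steps."""

    def __init__(self, max_steps: float):
        if max_steps <= 0:
            raise ValueError("max_steps must be positive")
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> None:
        # Start counting again at the beginning of each episode.
        self.steps = 0

    def is_terminated(self) -> bool:
        # Expected to be called exactly once per env.step(): count the step,
        # then report whether the step limit has been reached.
        self.steps += 1
        return self.steps >= self.max_steps


term = StepLimitTermination(3)
print([term.is_terminated() for _ in range(3)])  # [False, False, True]
```

With a limit of `config.episode_length` (60000.0), this counter would end the episode on the 60000th step, which is 60 simulated seconds at the inferred `dt` of 0.001 s.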



## Learning

In [7]:
model, config, run = auto_trainer.train(env, config, TAGS, log_freq=500, 
                                        full_logging=False, run=run)

Wrapping the env in a DummyVecEnv.

------------------------------------
| approxkl           | 0.008098352 |
| clipfrac           | 0.11328125  |
| explained_variance | -0.482      |
| fps                | 58          |
| n_updates          | 1           |
| policy_entropy     | 17.03219    |
| policy_loss        | -0.02446253 |
| serial_timesteps   | 128         |
| time_elapsed       | 2.07e-05    |
| total_timesteps    | 128         |
| value_loss         | 0.002995892 |
------------------------------------
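The `approxkl`, `clipfrac`, and `explained_variance` columns are PPO health diagnostics. The sketch below shows how they are commonly computed from a minibatch's negative log-probabilities under the old and new policies, modelled on the formulas in Stable Baselines' PPO2 as we understand them (default clip range 0.2). Treat it as an approximation rather than the library's code; `ppo_diagnostics` and this `explained_variance` are hypothetical stand-ins.

```python
import numpy as np


def ppo_diagnostics(old_neglogp, new_neglogp, cliprange=0.2):
    """Return (approxkl, clipfrac) for one minibatch of actions."""
    old_neglogp = np.asarray(old_neglogp, dtype=float)
    new_neglogp = np.asarray(new_neglogp, dtype=float)
    # Probability ratio pi_new(a|s) / pi_old(a|s) from negative log-probabilities.
    ratio = np.exp(old_neglogp - new_neglogp)
    # Cheap quadratic estimate of the KL divergence between old and new policy.
    approxkl = 0.5 * np.mean((new_neglogp - old_neglogp) ** 2)
    # Fraction of samples whose ratio fell outside [1 - clip, 1 + clip].
    clipfrac = np.mean(np.abs(ratio - 1.0) > cliprange)
    return float(approxkl), float(clipfrac)


def explained_variance(y_pred, y_true):
    """1 - Var(y_true - y_pred) / Var(y_true); nan when y_true is constant."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    var_y = np.var(y_true)
    if var_y == 0:
        return float("nan")
    return float(1.0 - np.var(y_true - y_pred) / var_y)
```

An `explained_variance` below zero, such as the -0.482 in the first update above, means the value function's predictions were worse than simply predicting the mean return; this is common right after initialization.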



Run summary:

| Metric | Value |
|---|---|
| global_step | 50040.0 |
| _timestamp | 1607875722.53292 |
| loss/entropy_loss | 17.57503 |
| loss/policy_gradient_loss | -0.0218 |
| loss/value_function_loss | 0.00105 |
| loss/approximate_kullback-leibler | 0.02471 |
| loss/clip_factor | 0.40625 |
| loss/loss | -0.19703 |
| input_info/discounted_rewards | 0.71642 |
| input_info/learning_rate | 0.00025 |


Run history:

| Metric | History |
|---|---|
| global_step | ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ |
| _timestamp | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| loss/entropy_loss | ▁▁▁▁▁▂▂▂▂▂▃▃▃▃▄▄▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇██████▇██ |
| loss/policy_gradient_loss | ▅▅▂▃▁▅▂▆▄▃▃▅▆▆▂▂▆▆▇▃▆▃▂▄▃▅▂▆▁▅▅▅▆▆█▃▇▅▂▆ |
| loss/value_function_loss | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▇▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| loss/approximate_kullback-leibler | ▄▁▃▄▅▂▅▂▃▃█▃▂▃▆▄▂▃▁▄▃▃▃▅▇▂▆█▃▃▂▄▃▁▄▄▄█▇▃ |
| loss/clip_factor | ▃▁▃▄▆▃▃▂▄▄█▂▂▅▄▅▃▄▁▄▄▄▅▆▇▁▅▇▅▃▁▆▄▁▅▅▄█▇▃ |
| loss/loss | ▂▂▁▁▁▂▁▂▂▁▁▂▂▂▁▁█▇▅▃▂▁▁▂▁▂▁▂▁▂▂▂▂▂▂▁▂▂▁▂ |
| input_info/discounted_rewards | ▁▂▁▁▁▂▂▁▁▂▂▃▂▂▂▃▆██▆▃▂▄▂▂▂▂▂▂▂▂▂▂▂▂▃▂▂▂▂ |
| input_info/learning_rate | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
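The `input_info/discounted_rewards` metric is derived from discounted returns. As a reference point, the plain discounted return is $G_t = r_t + \gamma G_{t+1}$, restarted at every episode boundary. PPO2's own return estimates also involve the value function (GAE), so this sketch illustrates the definition rather than reproducing the logged number. `discounted_returns` is a hypothetical helper and $\gamma = 0.99$ is an assumed default.

```python
import numpy as np


def discounted_returns(rewards, dones, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, restarting whenever an episode ends.

    dones[t] is True if the episode terminated on step t, so nothing after
    step t contributes to G_t. A trailing unfinished episode is treated as
    if it ended after the last step (no value-function bootstrapping).
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each G_{t+1} is available when computing G_t.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Because the upright reward is at most 1 per step, $G_t$ is bounded above by $1 / (1 - \gamma)$, which is 100 for $\gamma = 0.99$.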


In [8]:
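from pympler import muppy, summary

# Collect the objects currently tracked by the interpreter
all_objects = muppy.get_objects()
# Group the objects by type and total up their sizes
sum1 = summary.summarize(all_objects)
# Print the object types using the most memory, largest first
summary.print_(sum1)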

                                                           types |   # objects |   total size
                                                             str |      177663 |     34.14 MB
                                                            dict |       75459 |     27.19 MB
                                                            code |       62739 |      8.67 MB
                                                            type |        7759 |      7.25 MB
                                                           tuple |       63847 |      4.95 MB
                                                            list |       16510 |      1.84 MB
                                                             set |        3224 |      1.40 MB
                                                         weakref |       11942 |      1.00 MB
                                                            cell |       18070 |    988.20 KB
                                                     abc.ABC