# Non-Mocap Simulation Efforts

This notebook contains the work we've done towards simulating realistic gaits since pivoting at the start of Week 9.

To give some context, the goal of these efforts was to determine how visually realistic the gaits learned through non-mocap methods would look for different models. We also experimented with training increasingly optimal agents, which led us to shift training from PPO to SAC, as discussed in the project report. Below you will find both the scripts we wrote to train our model and the data and graphs collected during training.
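For context on that switch: unlike PPO, SAC optimizes a maximum-entropy objective in which every reward is augmented by the policy's entropy, weighted by a temperature α (reported as `ent_coef` in the training logs). The snippet below is our own toy illustration of that objective, not project code; `soft_return` is a hypothetical helper, and the rewards and entropies are made-up numbers.

```python
from typing import Sequence


def soft_return(rewards: Sequence[float], entropies: Sequence[float],
                alpha: float, gamma: float = 0.99) -> float:
    """Discounted entropy-regularized return: sum over t of gamma**t * (r_t + alpha * H_t)."""
    if len(rewards) != len(entropies):
        raise ValueError("rewards and entropies must have the same length")
    total = 0.0
    for t, (r, h) in enumerate(zip(rewards, entropies)):
        total += (gamma ** t) * (r + alpha * h)  # entropy bonus added to each step's reward
    return total


# Toy trajectory with illustrative numbers only.
rewards = [1.0, 1.0, 1.0]
entropies = [2.0, 2.0, 2.0]
print(soft_return(rewards, entropies, alpha=0.0))  # ordinary discounted return
print(soft_return(rewards, entropies, alpha=0.5))  # return including the entropy bonus
```

Setting `alpha=0` recovers the ordinary discounted return, so α is the knob that trades off exploration (high entropy) against collected reward.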

## Imports

In [1]:
import gym
import numpy as np

from stable_baselines.sac.policies import MlpPolicy
from stable_baselines import SAC

# Needed for the graphs displayed below
%load_ext tensorboard



## Task: Walk as far forward as possible without falling
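The task is defined by the reward of the `Humanoid-v2` environment. As we understand gym's MuJoCo `humanoid.py` (worth re-checking against the installed gym version), each step rewards forward motion of the center of mass plus a constant alive bonus, minus a quadratic control cost and a capped contact cost, and the episode ends once the torso height leaves the range [1.0, 2.0], i.e. the humanoid has fallen. The function below is our plain-Python paraphrase with scalar inputs; `humanoid_step_reward` and its arguments are hypothetical names, not part of gym.

```python
from typing import Sequence, Tuple


def humanoid_step_reward(x_before: float, x_after: float, dt: float,
                         ctrl: Sequence[float], cfrc_ext: Sequence[float],
                         torso_z: float) -> Tuple[float, bool]:
    """Our paraphrase of the Humanoid-v2 per-step reward and termination check.

    x_before / x_after: x-coordinate of the center of mass before/after the step.
    dt: simulated time covered by one env step.
    ctrl: actuator commands; cfrc_ext: external contact forces, flattened.
    """
    alive_bonus = 5.0                                        # paid every step the agent survives
    forward_reward = 1.25 * (x_after - x_before) / dt        # rewards forward velocity
    ctrl_cost = 0.1 * sum(u * u for u in ctrl)               # penalizes large actions
    impact_cost = min(0.5e-6 * sum(f * f for f in cfrc_ext), 10.0)  # capped contact penalty
    reward = forward_reward - ctrl_cost - impact_cost + alive_bonus
    done = not (1.0 <= torso_z <= 2.0)                       # torso too low/high counts as a fall
    return reward, done


# Illustrative step: moving forward at 1 unit/s with zero actions and no contact forces.
print(humanoid_step_reward(0.0, 0.015, 0.015, [0.0] * 17, [0.0] * 84, torso_z=1.3))
```

Under this reading, the alive bonus alone contributes 5 per surviving step, so the episode return mixes "not falling" with "moving forward".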

In [9]:
# DEFINE THE ENVIRONMENT/MODEL WITH TUNED PARAMETERS
env = gym.make('Humanoid-v2')
policy_kwargs = dict(layers=[256, 256])
model = SAC(MlpPolicy, env, verbose=1, tensorboard_log="./sac_humanoid_walk_tensorboard/", 
            policy_kwargs=policy_kwargs, buffer_size=1000000)

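The model above leaves `ent_coef` at what we believe is stable-baselines' SAC default, `'auto'`, in which case the entropy temperature α is learned rather than fixed; this is what the `ent_coef` and `ent_coef_loss` columns in the training log track. The sketch below illustrates the idea behind that update in plain Python: α is parameterized as `log_alpha` and adjusted by gradient descent on -log_alpha * mean(log π(a|s) + target_entropy), so α shrinks when the policy's entropy is above the target and grows when it is below. It is a simplification (plain gradient descent on a fixed batch of log-probabilities, whereas the library uses Adam on freshly sampled actions), `update_log_alpha` is a hypothetical name, and the target entropy of -(action dimension) is the common heuristic, assumed here rather than read from this notebook.

```python
import math
from typing import Sequence


def update_log_alpha(log_alpha: float, log_probs: Sequence[float],
                     target_entropy: float, lr: float = 3e-4) -> float:
    """One plain gradient-descent step on the temperature loss
    L(log_alpha) = -log_alpha * mean(log_prob + target_entropy).
    """
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_term  # dL/d(log_alpha); the log-probs are treated as constants
    return log_alpha - lr * grad


# Assumed values: target entropy = -(action dimension), with 17 actuators assumed for Humanoid.
target_entropy = -17.0
log_alpha = 0.0  # alpha = exp(0) = 1.0 at the start

# Made-up log-probs whose negative mean (about 20) exceeds the target entropy,
# i.e. the policy is more random than the target, so alpha should shrink.
for _ in range(10):
    log_alpha = update_log_alpha(log_alpha, [-20.0, -19.0, -21.0], target_entropy, lr=1e-3)
print(math.exp(log_alpha))
```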

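Two of the logged columns are bookkeeping rather than learning signals. `mean 100 episode reward` is, as far as we can tell, a rolling mean of the returns of the most recent 100 completed episodes, so early rows average fewer episodes and the column lags behind the current policy. `n_updates` trails `total timesteps` by exactly 99 in every complete table of the log (e.g. 80 vs 179, 5479 vs 5578), which is consistent with SAC's default `learning_starts=100` followed by one gradient step per environment step (our reading of the library defaults, not something this notebook sets). The hypothetical helpers below make both relations explicit; the episode returns passed in are made up.

```python
from collections import deque
from typing import Deque, Iterable, List


def rolling_mean_rewards(episode_returns: Iterable[float], window: int = 100) -> List[float]:
    """Mean of the last `window` completed-episode returns, recorded after each episode."""
    recent: Deque[float] = deque(maxlen=window)  # the oldest return falls out automatically
    means: List[float] = []
    for ret in episode_returns:
        recent.append(ret)
        means.append(sum(recent) / len(recent))
    return means


def expected_n_updates(total_timesteps: int, offset: int = 99) -> int:
    """n_updates implied by the constant offset observed in the log (empirical, not a library formula)."""
    return max(0, total_timesteps - offset)


# (total timesteps, n_updates) pairs copied from the training log.
logged = [(179, 80), (5578, 5479), (823713, 823614)]
for steps, n_updates in logged:
    assert expected_n_updates(steps) == n_updates

print(rolling_mean_rewards([100.0, 200.0, 300.0], window=2))
```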
In [10]:
# TRAIN THE MODEL
model.learn(total_timesteps=int(2e6), log_interval=10)
model.save("humanoid_2M")

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 1.0125033  |
| ent_coef_loss           | 0.06119744 |
| entropy                 | 20.704313  |
| episodes                | 10         |
| fps                     | 97         |
| mean 100 episode reward | 98.4       |
| n_updates               | 80         |
| policy_loss             | 22.465284  |
| qf1_loss                | 11.124978  |
| qf2_loss                | 8.203977   |
| time_elapsed            | 1          |
| total timesteps         | 179        |
| value_loss              | 174.56766  |
----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.9544133  |
| ent_coef_loss           | -1.1572626 |
| entropy                 | 20.403168  |
| episodes                | 20         |
| fps                     | 94         |
| mean 100 episode reward | 108        |
----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.21295615 |
| ent_coef_loss           | -29.209871 |
| entropy                 | 18.89505   |
| episodes                | 140        |
| fps                     | 87         |
| mean 100 episode reward | 238        |
| n_updates               | 5479       |
| policy_loss             | -156.94156 |
| qf1_loss                | 16.031916  |
| qf2_loss                | 12.033306  |
| time_elapsed            | 63         |
| total timesteps         | 5578       |
| value_loss              | 33.884926  |
----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.18249518 |
| ent_coef_loss           | -32.414444 |
| entropy                 | 18.825909  |
| episodes                | 150        |
| fps                     | 87         |
| mean 100 episode reward | 252        |
----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.034614637 |
| ent_coef_loss           | -8.215527   |
| entropy                 | 15.240572   |
| episodes                | 270         |
| fps                     | 84          |
| mean 100 episode reward | 335         |
| n_updates               | 13886       |
| policy_loss             | -173.10144  |
| qf1_loss                | 37.088688   |
| qf2_loss                | 41.026405   |
| time_elapsed            | 164         |
| total timesteps         | 13985       |
| value_loss              | 50.987045   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.033174377 |
| ent_coef_loss           | -4.037989   |
| entropy                 | 15.20296    |
| episodes                | 280         |
| fps                     | 84          |
| mean 100 episode reward | 342         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.026362928 |
| ent_coef_loss           | -9.37888    |
| entropy                 | 14.606434   |
| episodes                | 400         |
| fps                     | 84          |
| mean 100 episode reward | 357         |
| n_updates               | 22589       |
| policy_loss             | -135.58908  |
| qf1_loss                | 12.199048   |
| qf2_loss                | 15.35365    |
| time_elapsed            | 268         |
| total timesteps         | 22688       |
| value_loss              | 30.521551   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.026057212 |
| ent_coef_loss           | -1.4226665  |
| entropy                 | 14.728348   |
| episodes                | 410         |
| fps                     | 84          |
| mean 100 episode reward | 362         |
-----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.02345964 |
| ent_coef_loss           | 8.760165   |
| entropy                 | 14.863079  |
| episodes                | 530        |
| fps                     | 84         |
| mean 100 episode reward | 367        |
| n_updates               | 31817      |
| policy_loss             | -160.20456 |
| qf1_loss                | 10.708256  |
| qf2_loss                | 12.858675  |
| time_elapsed            | 377        |
| total timesteps         | 31916      |
| value_loss              | 9.410799   |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.022857789 |
| ent_coef_loss           | -0.46138573 |
| entropy                 | 14.51716    |
| episodes                | 540         |
| fps                     | 84          |
| mean 100 episode reward | 365         |
-----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.02111207 |
| ent_coef_loss           | -3.9263606 |
| entropy                 | 14.916639  |
| episodes                | 660        |
| fps                     | 84         |
| mean 100 episode reward | 406        |
| n_updates               | 41678      |
| policy_loss             | -150.51184 |
| qf1_loss                | 6.4043946  |
| qf2_loss                | 8.534687   |
| time_elapsed            | 493        |
| total timesteps         | 41777      |
| value_loss              | 5.54049    |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.020920517 |
| ent_coef_loss           | 6.5396233   |
| entropy                 | 14.575903   |
| episodes                | 670         |
| fps                     | 84          |
| mean 100 episode reward | 402         |
-----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.0195836  |
| ent_coef_loss           | 5.817676   |
| entropy                 | 14.099024  |
| episodes                | 790        |
| fps                     | 83         |
| mean 100 episode reward | 428        |
| n_updates               | 52117      |
| policy_loss             | -156.18253 |
| qf1_loss                | 9.894957   |
| qf2_loss                | 11.857658  |
| time_elapsed            | 624        |
| total timesteps         | 52216      |
| value_loss              | 6.3636813  |
----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.01969059 |
| ent_coef_loss           | -0.7623328 |
| entropy                 | 14.413048  |
| episodes                | 800        |
| fps                     | 83         |
| mean 100 episode reward | 418        |
----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.020037962 |
| ent_coef_loss           | 6.1514173   |
| entropy                 | 14.525033   |
| episodes                | 920         |
| fps                     | 83          |
| mean 100 episode reward | 459         |
| n_updates               | 63531       |
| policy_loss             | -131.87897  |
| qf1_loss                | 9.435118    |
| qf2_loss                | 9.309918    |
| time_elapsed            | 765         |
| total timesteps         | 63630       |
| value_loss              | 8.996816    |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.019466816 |
| ent_coef_loss           | 1.3782105   |
| entropy                 | 14.206264   |
| episodes                | 930         |
| fps                     | 82          |
| mean 100 episode reward | 456         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.020101517 |
| ent_coef_loss           | 3.1416376   |
| entropy                 | 13.722512   |
| episodes                | 1050        |
| fps                     | 81          |
| mean 100 episode reward | 418         |
| n_updates               | 74837       |
| policy_loss             | -160.04816  |
| qf1_loss                | 9.237646    |
| qf2_loss                | 11.386145   |
| time_elapsed            | 918         |
| total timesteps         | 74936       |
| value_loss              | 15.129594   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.020263908 |
| ent_coef_loss           | 2.2702003   |
| entropy                 | 13.851796   |
| episodes                | 1060        |
| fps                     | 81          |
| mean 100 episode reward | 419         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.018651247 |
| ent_coef_loss           | -4.556013   |
| entropy                 | 14.131436   |
| episodes                | 1180        |
| fps                     | 81          |
| mean 100 episode reward | 475         |
| n_updates               | 87279       |
| policy_loss             | -157.82835  |
| qf1_loss                | 12.037783   |
| qf2_loss                | 9.379263    |
| time_elapsed            | 1070        |
| total timesteps         | 87378       |
| value_loss              | 13.330234   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.01911511 |
| ent_coef_loss           | -3.373487  |
| entropy                 | 14.054893  |
| episodes                | 1190       |
| fps                     | 81         |
| mean 100 episode reward | 469        |
----------------------------------------


----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.01961873 |
| ent_coef_loss           | -1.9158077 |
| entropy                 | 14.016521  |
| episodes                | 1310       |
| fps                     | 81         |
| mean 100 episode reward | 469        |
| n_updates               | 99210      |
| policy_loss             | -165.45087 |
| qf1_loss                | 8.1846285  |
| qf2_loss                | 10.68044   |
| time_elapsed            | 1216       |
| total timesteps         | 99309      |
| value_loss              | 6.167562   |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.019552207 |
| ent_coef_loss           | 7.219985    |
| entropy                 | 14.475283   |
| episodes                | 1320        |
| fps                     | 81          |
| mean 100 episode reward | 480         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.020172188 |
| ent_coef_loss           | -4.1439753  |
| entropy                 | 14.109295   |
| episodes                | 1440        |
| fps                     | 81          |
| mean 100 episode reward | 511         |
| n_updates               | 112782      |
| policy_loss             | -167.24106  |
| qf1_loss                | 9.847499    |
| qf2_loss                | 8.819744    |
| time_elapsed            | 1378        |
| total timesteps         | 112881      |
| value_loss              | 8.573309    |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.020059645 |
| ent_coef_loss           | 3.5800712   |
| entropy                 | 14.011251   |
| episodes                | 1450        |
| fps                     | 81          |
| mean 100 episode reward | 515         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.021854691 |
| ent_coef_loss           | 10.252174   |
| entropy                 | 14.087208   |
| episodes                | 1570        |
| fps                     | 82          |
| mean 100 episode reward | 584         |
| n_updates               | 127696      |
| policy_loss             | -174.08969  |
| qf1_loss                | 11.697836   |
| qf2_loss                | 11.466039   |
| time_elapsed            | 1546        |
| total timesteps         | 127795      |
| value_loss              | 5.6389265   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.021884589 |
| ent_coef_loss           | -4.7579403  |
| entropy                 | 13.993997   |
| episodes                | 1580        |
| fps                     | 82          |
| mean 100 episode reward | 588         |
-----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.02267449 |
| ent_coef_loss           | 2.6617682  |
| entropy                 | 14.564955  |
| episodes                | 1700       |
| fps                     | 82         |
| mean 100 episode reward | 603        |
| n_updates               | 143355     |
| policy_loss             | -172.85701 |
| qf1_loss                | 12.463733  |
| qf2_loss                | 20.294323  |
| time_elapsed            | 1728       |
| total timesteps         | 143454     |
| value_loss              | 16.317646  |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.023216456 |
| ent_coef_loss           | -3.8325965  |
| entropy                 | 14.862577   |
| episodes                | 1710        |
| fps                     | 83          |
| mean 100 episode reward | 628         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.025403362 |
| ent_coef_loss           | 1.0383785   |
| entropy                 | 14.67302    |
| episodes                | 1830        |
| fps                     | 83          |
| mean 100 episode reward | 635         |
| n_updates               | 160096      |
| policy_loss             | -188.39058  |
| qf1_loss                | 10.775896   |
| qf2_loss                | 10.491465   |
| time_elapsed            | 1929        |
| total timesteps         | 160195      |
| value_loss              | 5.5978837   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.024729798 |
| ent_coef_loss           | 7.954749    |
| entropy                 | 14.163939   |
| episodes                | 1840        |
| fps                     | 83          |
| mean 100 episode reward | 635         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.024816705 |
| ent_coef_loss           | 2.7036355   |
| entropy                 | 14.360016   |
| episodes                | 1960        |
| fps                     | 83          |
| mean 100 episode reward | 688         |
| n_updates               | 177917      |
| policy_loss             | -198.94154  |
| qf1_loss                | 10.875999   |
| qf2_loss                | 10.9692955  |
| time_elapsed            | 2136        |
| total timesteps         | 178016      |
| value_loss              | 8.991541    |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.02555347 |
| ent_coef_loss           | -5.0055532 |
| entropy                 | 14.194661  |
| episodes                | 1970       |
| fps                     | 83         |
| mean 100 episode reward | 687        |
----------------------------------------


-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.027005613 |
| ent_coef_loss           | 1.356554    |
| entropy                 | 14.211111   |
| episodes                | 2090        |
| fps                     | 83          |
| mean 100 episode reward | 676         |
| n_updates               | 195395      |
| policy_loss             | -196.23822  |
| qf1_loss                | 13.026674   |
| qf2_loss                | 16.747437   |
| time_elapsed            | 2336        |
| total timesteps         | 195494      |
| value_loss              | 8.841707    |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.027444653 |
| ent_coef_loss           | 1.7154655   |
| entropy                 | 14.744289   |
| episodes                | 2100        |
| fps                     | 83          |
| mean 100 episode reward | 702         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.028630137 |
| ent_coef_loss           | 1.367549    |
| entropy                 | 14.484201   |
| episodes                | 2220        |
| fps                     | 83          |
| mean 100 episode reward | 706         |
| n_updates               | 213626      |
| policy_loss             | -193.60297  |
| qf1_loss                | 22.198212   |
| qf2_loss                | 20.824247   |
| time_elapsed            | 2547        |
| total timesteps         | 213725      |
| value_loss              | 8.71564     |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.029931005 |
| ent_coef_loss           | -2.2275922  |
| entropy                 | 14.862116   |
| episodes                | 2230        |
| fps                     | 83          |
| mean 100 episode reward | 728         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.032047678 |
| ent_coef_loss           | -0.2857238  |
| entropy                 | 14.773907   |
| episodes                | 2350        |
| fps                     | 84          |
| mean 100 episode reward | 787         |
| n_updates               | 233971      |
| policy_loss             | -202.5552   |
| qf1_loss                | 21.652813   |
| qf2_loss                | 23.207275   |
| time_elapsed            | 2771        |
| total timesteps         | 234070      |
| value_loss              | 19.748571   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.031414926 |
| ent_coef_loss           | 6.5624857   |
| entropy                 | 13.967148   |
| episodes                | 2360        |
| fps                     | 84          |
| mean 100 episode reward | 827         |
-----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03169588 |
| ent_coef_loss           | 8.472141   |
| entropy                 | 14.012999  |
| episodes                | 2480       |
| fps                     | 85         |
| mean 100 episode reward | 845        |
| n_updates               | 256431     |
| policy_loss             | -206.37323 |
| qf1_loss                | 14.906103  |
| qf2_loss                | 16.658484  |
| time_elapsed            | 3011       |
| total timesteps         | 256530     |
| value_loss              | 25.834759  |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.034261048 |
| ent_coef_loss           | 1.6934528   |
| entropy                 | 14.779165   |
| episodes                | 2490        |
| fps                     | 85          |
| mean 100 episode reward | 829         |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.033204462 |
| ent_coef_loss           | -4.409906   |
| entropy                 | 14.352977   |
| episodes                | 2610        |
| fps                     | 85          |
| mean 100 episode reward | 1.14e+03    |
| n_updates               | 284285      |
| policy_loss             | -228.30515  |
| qf1_loss                | 15.729067   |
| qf2_loss                | 14.471083   |
| time_elapsed            | 3310        |
| total timesteps         | 284384      |
| value_loss              | 15.70075    |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03464734 |
| ent_coef_loss           | 2.9685683  |
| entropy                 | 13.7981    |
| episodes                | 2620       |
| fps                     | 85         |
| mean 100 episode reward | 1.14e+03   |
----------------------------------------


-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.035443347 |
| ent_coef_loss           | 1.296416    |
| entropy                 | 14.118126   |
| episodes                | 2740        |
| fps                     | 85          |
| mean 100 episode reward | 1.31e+03    |
| n_updates               | 316832      |
| policy_loss             | -250.76245  |
| qf1_loss                | 23.818066   |
| qf2_loss                | 21.024256   |
| time_elapsed            | 3695        |
| total timesteps         | 316931      |
| value_loss              | 10.731155   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03451925 |
| ent_coef_loss           | -6.885414  |
| entropy                 | 14.260357  |
| episodes                | 2750       |
| fps                     | 85         |
| mean 100 episode reward | 1.3e+03    |
----------------------------------------


----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03544226 |
| ent_coef_loss           | 2.3151364  |
| entropy                 | 14.032547  |
| episodes                | 2870       |
| fps                     | 85         |
| mean 100 episode reward | 1.32e+03   |
| n_updates               | 351556     |
| policy_loss             | -261.39484 |
| qf1_loss                | 19.124615  |
| qf2_loss                | 27.353382  |
| time_elapsed            | 4097       |
| total timesteps         | 351655     |
| value_loss              | 11.358231  |
----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.0356659  |
| ent_coef_loss           | -1.3157396 |
| entropy                 | 14.046041  |
| episodes                | 2880       |
| fps                     | 85         |
| mean 100 episode reward | 1.34e+03   |
----------------------------------------

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03557991 |
| ent_coef_loss           | 11.05124   |
| entropy                 | 13.709602  |
| episodes                | 3000       |
| fps                     | 86         |
| mean 100 episode reward | 1.42e+03   |
| n_updates               | 388695     |
| policy_loss             | -252.17758 |
| qf1_loss                | 43.397682  |
| qf2_loss                | 41.258636  |
| time_elapsed            | 4480       |
| total timesteps         | 388794     |
| value_loss              | 39.501205  |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.035019908 |
| ent_coef_loss           | -5.1553345  |
| entropy                 | 13.97574    |
| episodes                | 3010        |
| fps                     | 86          |
| mean 100 episode reward | 1.46e+03    |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.035768785 |
| ent_coef_loss           | 4.143008    |
| entropy                 | 13.937021   |
| episodes                | 3130        |
| fps                     | 87          |
| mean 100 episode reward | 1.69e+03    |
| n_updates               | 430240      |
| policy_loss             | -262.41327  |
| qf1_loss                | 63.21901    |
| qf2_loss                | 50.19799    |
| time_elapsed            | 4907        |
| total timesteps         | 430339      |
| value_loss              | 43.217396   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03629561 |
| ent_coef_loss           | 4.1929073  |
| entropy                 | 14.20621   |
| episodes                | 3140       |
| fps                     | 87         |
| mean 100 episode reward | 1.7e+03    |
----------------------------------------


-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.037483383 |
| ent_coef_loss           | 1.0392596   |
| entropy                 | 14.298502   |
| episodes                | 3260        |
| fps                     | 88          |
| mean 100 episode reward | 1.62e+03    |
| n_updates               | 471535      |
| policy_loss             | -251.45934  |
| qf1_loss                | 18.404839   |
| qf2_loss                | 20.09607    |
| time_elapsed            | 5332        |
| total timesteps         | 471634      |
| value_loss              | 21.696196   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.035678267 |
| ent_coef_loss           | 9.805456    |
| entropy                 | 13.497698   |
| episodes                | 3270        |
| fps                     | 88          |
| mean 100 episode reward | 1.62e+03    |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.036740165 |
| ent_coef_loss           | 10.811926   |
| entropy                 | 13.900652   |
| episodes                | 3390        |
| fps                     | 89          |
| mean 100 episode reward | 1.72e+03    |
| n_updates               | 513698      |
| policy_loss             | -272.53268  |
| qf1_loss                | 46.048904   |
| qf2_loss                | 25.930641   |
| time_elapsed            | 5767        |
| total timesteps         | 513797      |
| value_loss              | 20.413235   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03798117 |
| ent_coef_loss           | -1.7176542 |
| entropy                 | 14.184507  |
| episodes                | 3400       |
| fps                     | 89         |
| mean 100 episode reward | 1.64e+03   |
----------------------------------------


-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.036207758 |
| ent_coef_loss           | -4.8840065  |
| entropy                 | 14.272851   |
| episodes                | 3520        |
| fps                     | 89          |
| mean 100 episode reward | 1.76e+03    |
| n_updates               | 556902      |
| policy_loss             | -278.7376   |
| qf1_loss                | 23.210829   |
| qf2_loss                | 24.893444   |
| time_elapsed            | 6234        |
| total timesteps         | 557001      |
| value_loss              | 20.583055   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03762919 |
| ent_coef_loss           | 9.801144   |
| entropy                 | 13.823221  |
| episodes                | 3530       |
| fps                     | 89         |
| mean 100 episode reward | 1.81e+03   |
----------------------------------------


-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.037362065 |
| ent_coef_loss           | 8.284481    |
| entropy                 | 14.298344   |
| episodes                | 3650        |
| fps                     | 87          |
| mean 100 episode reward | 2.31e+03    |
| n_updates               | 611025      |
| policy_loss             | -271.28485  |
| qf1_loss                | 48.680904   |
| qf2_loss                | 33.29479    |
| time_elapsed            | 6952        |
| total timesteps         | 611124      |
| value_loss              | 25.339237   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.036210943 |
| ent_coef_loss           | 1.5827245   |
| entropy                 | 13.685816   |
| episodes                | 3660        |
| fps                     | 87          |
| mean 100 episode reward | 2.28e+03    |
-----------------------------------------

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.038632784 |
| ent_coef_loss           | -1.827709   |
| entropy                 | 14.673859   |
| episodes                | 3780        |
| fps                     | 86          |
| mean 100 episode reward | 2.59e+03    |
| n_updates               | 673564      |
| policy_loss             | -286.11237  |
| qf1_loss                | 29.005793   |
| qf2_loss                | 16.245358   |
| time_elapsed            | 7745        |
| total timesteps         | 673663      |
| value_loss              | 17.398264   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.0383379  |
| ent_coef_loss           | -3.291133  |
| entropy                 | 14.480913  |
| episodes                | 3790       |
| fps                     | 86         |
| mean 100 episode reward | 2.67e+03   |
----------------------------------------


----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03791237 |
| ent_coef_loss           | -2.402586  |
| entropy                 | 13.7119465 |
| episodes                | 3910       |
| fps                     | 85         |
| mean 100 episode reward | 2.89e+03   |
| n_updates               | 743331     |
| policy_loss             | -331.50604 |
| qf1_loss                | 39.811462  |
| qf2_loss                | 53.046463  |
| time_elapsed            | 8741       |
| total timesteps         | 743430     |
| value_loss              | 14.136328  |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.035781983 |
| ent_coef_loss           | -0.16826773 |
| entropy                 | 14.148108   |
| episodes                | 3920        |
| fps                     | 84          |
| mean 100 episode reward | 2.84e+03    |
| n_upda

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.036069762 |
| ent_coef_loss           | 1.3314829   |
| entropy                 | 14.136479   |
| episodes                | 4040        |
| fps                     | 84          |
| mean 100 episode reward | 3.38e+03    |
| n_updates               | 823614      |
| policy_loss             | -353.62585  |
| qf1_loss                | 23.589272   |
| qf2_loss                | 15.734314   |
| time_elapsed            | 9790        |
| total timesteps         | 823713      |
| value_loss              | 10.930536   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.036388386 |
| ent_coef_loss           | 7.2973404   |
| entropy                 | 14.12973    |
| episodes                | 4050        |
| fps                     | 84          |
| mean 100 episode reward | 3.45e+

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.0377008  |
| ent_coef_loss           | 6.1839786  |
| entropy                 | 14.493202  |
| episodes                | 4170       |
| fps                     | 83         |
| mean 100 episode reward | 3.59e+03   |
| n_updates               | 912010     |
| policy_loss             | -366.35202 |
| qf1_loss                | 44.418144  |
| qf2_loss                | 43.077236  |
| time_elapsed            | 10905      |
| total timesteps         | 912109     |
| value_loss              | 10.091852  |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.035857476 |
| ent_coef_loss           | -11.331043  |
| entropy                 | 14.412275   |
| episodes                | 4180        |
| fps                     | 83          |
| mean 100 episode reward | 3.66e+03    |
| n_upda

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.032931622 |
| ent_coef_loss           | 1.09899     |
| entropy                 | 14.206955   |
| episodes                | 4300        |
| fps                     | 83          |
| mean 100 episode reward | 3.95e+03    |
| n_updates               | 1006556     |
| policy_loss             | -382.5041   |
| qf1_loss                | 7.176057    |
| qf2_loss                | 16.449078   |
| time_elapsed            | 12024       |
| total timesteps         | 1006655     |
| value_loss              | 21.143557   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.034302782 |
| ent_coef_loss           | -3.8168266  |
| entropy                 | 15.025036   |
| episodes                | 4310        |
| fps                     | 83          |
| mean 100 episode reward | 3.83e+

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.0333493  |
| ent_coef_loss           | 4.334362   |
| entropy                 | 13.478243  |
| episodes                | 4430       |
| fps                     | 83         |
| mean 100 episode reward | 3.01e+03   |
| n_updates               | 1085252    |
| policy_loss             | -361.03802 |
| qf1_loss                | 23.185667  |
| qf2_loss                | 18.590303  |
| time_elapsed            | 13005      |
| total timesteps         | 1085351    |
| value_loss              | 28.853842  |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.031090148 |
| ent_coef_loss           | 1.4938004   |
| entropy                 | 15.161728   |
| episodes                | 4440        |
| fps                     | 83          |
| mean 100 episode reward | 2.92e+03    |
| n_upda

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.031548403 |
| ent_coef_loss           | -8.923805   |
| entropy                 | 15.036744   |
| episodes                | 4560        |
| fps                     | 83          |
| mean 100 episode reward | 3.22e+03    |
| n_updates               | 1164006     |
| policy_loss             | -389.75464  |
| qf1_loss                | 13.231338   |
| qf2_loss                | 9.818841    |
| time_elapsed            | 13976       |
| total timesteps         | 1164105     |
| value_loss              | 8.287279    |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.030821295 |
| ent_coef_loss           | -10.210688  |
| entropy                 | 14.983618   |
| episodes                | 4570        |
| fps                     | 83          |
| mean 100 episode reward | 3.31e+

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.027718937 |
| ent_coef_loss           | -6.5352225  |
| entropy                 | 13.675625   |
| episodes                | 4690        |
| fps                     | 83          |
| mean 100 episode reward | 3.5e+03     |
| n_updates               | 1247948     |
| policy_loss             | -413.52417  |
| qf1_loss                | 18.187092   |
| qf2_loss                | 26.71135    |
| time_elapsed            | 14991       |
| total timesteps         | 1248047     |
| value_loss              | 17.072006   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.028489698 |
| ent_coef_loss           | -0.1266368  |
| entropy                 | 13.98067    |
| episodes                | 4700        |
| fps                     | 83          |
| mean 100 episode reward | 3.56e+

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.030893821 |
| ent_coef_loss           | 2.0187287   |
| entropy                 | 14.762737   |
| episodes                | 4820        |
| fps                     | 82          |
| mean 100 episode reward | 4.2e+03     |
| n_updates               | 1348624     |
| policy_loss             | -423.39713  |
| qf1_loss                | 26.353664   |
| qf2_loss                | 23.610834   |
| time_elapsed            | 16347       |
| total timesteps         | 1348723     |
| value_loss              | 16.155067   |
-----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03088525 |
| ent_coef_loss           | -12.652441 |
| entropy                 | 15.114635  |
| episodes                | 4830       |
| fps                     | 82         |
| mean 100 episode reward | 4.09e+03   |


-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.027849646 |
| ent_coef_loss           | 16.995588   |
| entropy                 | 13.534981   |
| episodes                | 4950        |
| fps                     | 81          |
| mean 100 episode reward | 3.84e+03    |
| n_updates               | 1437063     |
| policy_loss             | -400.14398  |
| qf1_loss                | 25.95176    |
| qf2_loss                | 22.867792   |
| time_elapsed            | 17536       |
| total timesteps         | 1437162     |
| value_loss              | 12.761326   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.027199231 |
| ent_coef_loss           | 14.929903   |
| entropy                 | 14.32896    |
| episodes                | 4960        |
| fps                     | 81          |
| mean 100 episode reward | 3.83e+

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.027761528 |
| ent_coef_loss           | -11.949507  |
| entropy                 | 15.105493   |
| episodes                | 5080        |
| fps                     | 81          |
| mean 100 episode reward | 3.84e+03    |
| n_updates               | 1529659     |
| policy_loss             | -444.05518  |
| qf1_loss                | 7.627598    |
| qf2_loss                | 6.6715155   |
| time_elapsed            | 18767       |
| total timesteps         | 1529758     |
| value_loss              | 5.973584    |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.026431212 |
| ent_coef_loss           | -10.720377  |
| entropy                 | 14.92044    |
| episodes                | 5090        |
| fps                     | 81          |
| mean 100 episode reward | 3.99e+

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.03022234 |
| ent_coef_loss           | 7.232462   |
| entropy                 | 13.993225  |
| episodes                | 5210       |
| fps                     | 82         |
| mean 100 episode reward | 4.43e+03   |
| n_updates               | 1633095    |
| policy_loss             | -440.28934 |
| qf1_loss                | 15.385001  |
| qf2_loss                | 16.51794   |
| time_elapsed            | 19869      |
| total timesteps         | 1633194    |
| value_loss              | 6.684456   |
----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.02793943 |
| ent_coef_loss           | -3.0425763 |
| entropy                 | 14.281975  |
| episodes                | 5220       |
| fps                     | 82         |
| mean 100 episode reward | 4.25e+03   |
| n_updates     

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.031042188 |
| ent_coef_loss           | -1.7417643  |
| entropy                 | 13.8299885  |
| episodes                | 5340        |
| fps                     | 82          |
| mean 100 episode reward | 3.76e+03    |
| n_updates               | 1719784     |
| policy_loss             | -420.516    |
| qf1_loss                | 41.28412    |
| qf2_loss                | 12.842125   |
| time_elapsed            | 20789       |
| total timesteps         | 1719883     |
| value_loss              | 7.440008    |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.030456392 |
| ent_coef_loss           | 7.4296513   |
| entropy                 | 13.376277   |
| episodes                | 5350        |
| fps                     | 82          |
| mean 100 episode reward | 3.84e+

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.029370407 |
| ent_coef_loss           | -0.60967827 |
| entropy                 | 14.126799   |
| episodes                | 5470        |
| fps                     | 83          |
| mean 100 episode reward | 3.94e+03    |
| n_updates               | 1812632     |
| policy_loss             | -425.5194   |
| qf1_loss                | 16.695164   |
| qf2_loss                | 14.571318   |
| time_elapsed            | 21784       |
| total timesteps         | 1812731     |
| value_loss              | 23.012783   |
-----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.030695682 |
| ent_coef_loss           | -0.45108318 |
| entropy                 | 14.259212   |
| episodes                | 5480        |
| fps                     | 83          |
| mean 100 episode reward | 3.99e+

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.02894218 |
| ent_coef_loss           | -2.8412938 |
| entropy                 | 15.157005  |
| episodes                | 5600       |
| fps                     | 83         |
| mean 100 episode reward | 4.37e+03   |
| n_updates               | 1916039    |
| policy_loss             | -436.90063 |
| qf1_loss                | 15.057048  |
| qf2_loss                | 10.85241   |
| time_elapsed            | 22869      |
| total timesteps         | 1916138    |
| value_loss              | 7.080185   |
----------------------------------------
-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.031102566 |
| ent_coef_loss           | 0.67548156  |
| entropy                 | 14.139357   |
| episodes                | 5610        |
| fps                     | 83          |
| mean 100 episode reward | 4.44e+03    |
| n_upda
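The tabular log printed above can be turned into a learning curve without TensorBoard. Below is a minimal sketch, assuming the cell output has been copied into a string. `parse_sb_log` and `learning_curve` are hypothetical helpers, not part of Stable Baselines. Rows cut off by the notebook's output truncation (such as `| n_upda`, or a value like `2.28e+`) are skipped, so incomplete blocks simply lack those keys.

```python
def parse_sb_log(text):
    """Parse Stable Baselines' tabular console log into a list of dicts.

    Blocks are delimited by lines made only of dashes; each row looks like
    '| key | value |'. Truncated rows without a closing '|' are skipped.
    """
    blocks, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and set(line) == {"-"}:
            # A dashed separator closes the current block, if it has rows.
            if current:
                blocks.append(current)
                current = {}
            continue
        if not (line.startswith("|") and line.endswith("|")):
            continue  # truncated or unrelated line
        parts = [p.strip() for p in line.strip("|").split("|")]
        if len(parts) != 2:
            continue
        key, value = parts
        try:
            current[key] = float(value)
        except ValueError:
            continue  # value that does not parse as a number
    if current:
        blocks.append(current)
    return blocks


def learning_curve(blocks):
    """Return (total timesteps, mean 100 episode reward) for complete blocks."""
    return [
        (int(b["total timesteps"]), b["mean 100 episode reward"])
        for b in blocks
        if "total timesteps" in b and "mean 100 episode reward" in b
    ]
```

Calling `learning_curve(parse_sb_log(log_text))` on the copied output gives one point per complete block. In every complete block of the log above, `total timesteps - n_updates` is 99, which is consistent with (as we understand them) SAC's defaults of `learning_starts=100` followed by one gradient step per environment step.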

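In [3]:
# RUN THE SAVED MODEL
env = gym.make('Humanoid-v2')
# Passing env avoids the "Loading a model without an environment" warning
model = SAC.load("humanoid_2M", env=env)

for _ in range(10):  # watch 10 episodes
    obs = env.reset()
    for _ in range(1000):
        # deterministic=True uses the policy's mean action instead of sampling
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        env.render()
        if done:  # episode ended (fall or time limit); start a new one
            break
env.close()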


In [2]:
# DISPLAY THE RESULTS
%tensorboard --logdir sac_humanoid_walk_tensorboard


## Task: Stand up from a lying-down starting position

In [3]:
# DEFINE THE ENVIRONMENT/MODEL WITH TUNED PARAMETERS
env = gym.make('HumanoidStandup-v2')
policy_kwargs = dict(layers=[256, 256])
model = SAC(MlpPolicy, env, verbose=1, tensorboard_log="./sac_humanoid_standup_tensorboard/", 
            policy_kwargs=policy_kwargs, buffer_size=1000000)





In [None]:
# TRAIN THE MODEL
model.learn(total_timesteps=int(2e6), log_interval=10)
model.save("humanoid_standup_2M")


----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.3715919  |
| ent_coef_loss           | 1.0368845  |
| entropy                 | 15.556219  |
| episodes                | 10         |
| fps                     | 72         |
| mean 100 episode reward | 7.06e+04   |
| n_updates               | 8901       |
| policy_loss             | -2343.0728 |
| qf1_loss                | 442.96002  |
| qf2_loss                | 353.17847  |
| time_elapsed            | 123        |
| total timesteps         | 9000       |
| value_loss              | 398.77277  |
----------------------------------------
----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.6711767  |
| ent_coef_loss           | 0.33287117 |
| entropy                 | 15.5787325 |
| episodes                | 20         |
| fps                     | 81         |
| mean 100 episode reward | 7.52e+04   |
| n_updates    
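Standup episode rewards (around 7e4 here) are on a very different scale from the walking rewards (a few thousand), so the two curves should not be compared directly. A short worked check using the first standup log block, under the assumption (from our reading of Stable Baselines' SAC training loop) that the `episodes` counter already includes the episode that has just begun, so only 9 episodes had finished at 9000 timesteps:

```python
# Values copied from the first HumanoidStandup-v2 log block above.
episodes_counter = 10          # "episodes" column
total_timesteps = 9000         # "total timesteps" column
mean_episode_reward = 7.06e4   # "mean 100 episode reward" column

# Assumption: the counter includes the episode that has just started,
# so one fewer episode has actually finished.
completed_episodes = episodes_counter - 1

steps_per_episode = total_timesteps / completed_episodes
reward_per_step = mean_episode_reward / steps_per_episode

print(f"{steps_per_episode:.0f} steps per episode")  # 1000 steps per episode
print(f"{reward_per_step:.1f} reward per step")      # 70.6 reward per step
```

A result of exactly 1000 steps per episode matches the 1000-step time limit of `HumanoidStandup-v2`, which supports the assumption about the counter.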

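In [None]:
# RUN THE SAVED MODEL
env = gym.make('HumanoidStandup-v2')
# Passing env avoids the "Loading a model without an environment" warning
model = SAC.load("humanoid_standup_2M", env=env)

for _ in range(10):  # watch 10 episodes
    obs = env.reset()
    for _ in range(1000):
        # deterministic=True uses the policy's mean action instead of sampling
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        env.render()
        if done:  # episode ended; start a new one
            break
env.close()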
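The rollout cells in this notebook step each episode up to 1000 times. The sketch below shows the bookkeeping needed to report a per-episode return and length while stopping as soon as the environment signals `done`, using the same 4-tuple `step()` API as the gym version used here. `run_episodes` and the toy `CountdownEnv` are hypothetical stand-ins so the loop can be checked without MuJoCo; with the real setup one would pass `env` and `lambda obs: model.predict(obs, deterministic=True)[0]`.

```python
class CountdownEnv:
    """Toy stand-in for a gym env: reward 1 per step, done after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


def run_episodes(env, policy, n_episodes=10, max_steps=1000):
    """Roll out `policy` and return a list of (episode_return, episode_length)."""
    results = []
    for _ in range(n_episodes):
        obs = env.reset()
        episode_return, steps = 0.0, 0
        for _ in range(max_steps):
            obs, reward, done, info = env.step(policy(obs))
            episode_return += reward
            steps += 1
            if done:  # stop instead of stepping a finished episode
                break
        results.append((episode_return, steps))
    return results


print(run_episodes(CountdownEnv(length=5), policy=lambda obs: 0, n_episodes=2))
# [(5.0, 5), (5.0, 5)]
```

Returning lengths alongside returns makes it easy to tell a walker that fell early from one that survived the full episode.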

In [None]:
# DISPLAY THE RESULTS
%tensorboard --logdir sac_humanoid_standup_tensorboard
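TensorBoard's scalar view offers a smoothing slider for noisy curves such as episode reward. If a curve is extracted some other way (for example from the console log), a plain exponential moving average gives a comparable view. This is a minimal sketch: it may not match TensorBoard's slider exactly (details such as how the first points are handled are implementation-specific), and the `weight` values are arbitrary choices.

```python
def ema_smooth(values, weight=0.6):
    """Exponential moving average: s_t = weight * s_{t-1} + (1 - weight) * x_t.

    The first output equals the first input (no bias correction).
    """
    smoothed = []
    last = None
    for x in values:
        last = x if last is None else weight * last + (1 - weight) * x
        smoothed.append(last)
    return smoothed


# Mean-100-episode rewards read off the Humanoid-v2 walking log above.
rewards = [2310.0, 2590.0, 2890.0, 3380.0]
print(ema_smooth(rewards, weight=0.5))  # [2310.0, 2450.0, 2670.0, 3025.0]
```

A larger `weight` gives a smoother but more lagged curve.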

## Task: Teach a 'half-cheetah' a gait to run forward

In [None]:
# DEFINE THE ENVIRONMENT/MODEL WITH TUNED PARAMETERS
env = gym.make('HalfCheetah-v2')
policy_kwargs = dict(layers=[256, 256])
model = SAC(MlpPolicy, env, verbose=1, tensorboard_log="./sac_half_cheetah_tensorboard/", 
            policy_kwargs=policy_kwargs, buffer_size=1000000)
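All four tasks share one SAC configuration and differ only in the environment ID, TensorBoard directory and save name, which is where copy-paste slips between cells tend to creep in. A minimal sketch of keeping those settings in one place; `TASKS` and `sac_settings` are hypothetical names, and the values are the ones used in this notebook's cells. The function only builds plain dictionaries, so it can be checked without Stable Baselines installed.

```python
# Per-task settings used in this notebook: env ID -> (TensorBoard dir, save name).
TASKS = {
    "Humanoid-v2": ("./sac_humanoid_walk_tensorboard/", "humanoid_2M"),
    "HumanoidStandup-v2": ("./sac_humanoid_standup_tensorboard/", "humanoid_standup_2M"),
    "HalfCheetah-v2": ("./sac_half_cheetah_tensorboard/", "half_cheetah_2M"),
    "Ant-v2": ("./sac_ant_tensorboard/", "ant_2M"),
}


def sac_settings(env_id):
    """Return (SAC constructor kwargs, save name) for the given environment ID."""
    tensorboard_log, save_name = TASKS[env_id]
    kwargs = dict(
        verbose=1,
        tensorboard_log=tensorboard_log,
        policy_kwargs=dict(layers=[256, 256]),
        buffer_size=1000000,
    )
    return kwargs, save_name


# Intended use with the notebook's imports:
#   kwargs, save_name = sac_settings("HalfCheetah-v2")
#   env = gym.make("HalfCheetah-v2")
#   model = SAC(MlpPolicy, env, **kwargs)
#   model.learn(total_timesteps=int(2e6), log_interval=10)
#   model.save(save_name)
#   ... later: model = SAC.load(save_name, env=env)
```

Deriving both the training and the playback cells from the same entry keeps the saved model and the environment it is replayed in from drifting apart.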

In [None]:
# TRAIN THE MODEL
model.learn(total_timesteps=int(2e6), log_interval=10)
model.save("half_cheetah_2M")

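In [None]:
# RUN THE SAVED MODEL
env = gym.make('HalfCheetah-v2')
# Passing env avoids the "Loading a model without an environment" warning
model = SAC.load("half_cheetah_2M", env=env)

for _ in range(10):  # watch 10 episodes
    obs = env.reset()
    for _ in range(1000):
        # deterministic=True uses the policy's mean action instead of sampling
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        env.render()
        if done:  # episode ended; start a new one
            break
env.close()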

In [None]:
# DISPLAY THE RESULTS
%tensorboard --logdir sac_half_cheetah_tensorboard

## Task: Teach an 'ant-like' quadruped a gait to move forward 

In [None]:
# DEFINE THE ENVIRONMENT/MODEL WITH TUNED PARAMETERS
env = gym.make('Ant-v2')
policy_kwargs = dict(layers=[256, 256])
model = SAC(MlpPolicy, env, verbose=1, tensorboard_log="./sac_ant_tensorboard/", 
            policy_kwargs=policy_kwargs, buffer_size=1000000)

In [None]:
# TRAIN THE MODEL
model.learn(total_timesteps=int(2e6), log_interval=10)
model.save("ant_2M")
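Before launching the remaining 2M-step runs it can help to estimate wall-clock time. The `fps` column in the walking log settled around 83 environment steps per second, and `time_elapsed` (22869 s at 1,916,138 timesteps) is consistent with that. A rough estimate, assuming similar throughput for the other tasks on the same machine; their simulation and network costs will differ, so treat the figure as a ballpark only.

```python
def training_hours(total_timesteps, fps):
    """Wall-clock hours for `total_timesteps` at a steady `fps` steps per second."""
    return total_timesteps / fps / 3600


# fps ~83 taken from the Humanoid-v2 log above; other environments may differ.
print(f"{training_hours(int(2e6), 83):.1f} h")  # 6.7 h
```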

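In [None]:
# RUN THE SAVED MODEL
env = gym.make('Ant-v2')
# Passing env avoids the "Loading a model without an environment" warning
model = SAC.load("ant_2M", env=env)

for _ in range(10):  # watch 10 episodes
    obs = env.reset()
    for _ in range(1000):
        # deterministic=True uses the policy's mean action instead of sampling
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        env.render()
        if done:  # episode ended (ant flipped or time limit); start a new one
            break
env.close()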

In [None]:
# DISPLAY THE RESULTS
%tensorboard --logdir sac_ant_tensorboard