# 01 Deep Q agent hyper-parameter tuning with Optuna + Neptune

<img src="https://optuna.readthedocs.io/en/stable/_static/optuna-logo.png" width="500" height="400" />

<div align="center">
<h1>+</h1>
</div>

<img src="https://camo.githubusercontent.com/51f9ec0b7f6d5a4f51d78a59c860d02c80f58f6d84ae85e6e7122c92f7776346/68747470733a2f2f6e657074756e652e61692f77702d636f6e74656e742f75706c6f6164732f6e657074756e652d6c6f676f2d6c6573732d6d617267696e2d65313631313933393734323638332e706e67" width="500" height="400" />

#### 👉 Let's train a Deep Q agent to solve the `Cart Pole` environment.

![nn](https://github.com/Paulescu/hands-on-rl/blob/main/03_cart_pole/images/neural_net.jpg?raw=true)

In [1]:
%load_ext autoreload
%autoreload 2
%pylab inline
%config InlineBackend.figure_format = 'svg'

Populating the interactive namespace from numpy and matplotlib


## Environment 🌎

In [2]:
import gym
env = gym.make('CartPole-v1')
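
As a rough sanity check on model size: a CartPole observation has 4 components and the agent chooses between 2 discrete actions. Assuming the Q-network is a plain fully connected net with the two 256-unit hidden layers reported in the trial logs further down (`'nn_hidden_layers': '[256, 256]'`), the parameter count can be worked out by hand. The helper `count_mlp_parameters` is a hypothetical name used only for this illustration; it is not part of the repository.

```python
def count_mlp_parameters(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each layer contributes fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Assumed architecture: 4 inputs -> 256 -> 256 -> 2 outputs.
n_params = count_mlp_parameters(n_inputs=4, hidden_layers=[256, 256], n_outputs=2)
print(f"{n_params:,} parameters")  # 67,586 parameters
```

Under these assumptions the result agrees with the `67,586 parameters` line printed at the start of each trial.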

## Create a Neptune run

In [None]:
# %env NEPTUNE_PROJECT=<>
# %env NEPTUNE_API_TOKEN=<>

In [5]:
import os
import neptune.new as neptune

try:
    NEPTUNE_PROJECT = os.environ['NEPTUNE_PROJECT']
    NEPTUNE_API_TOKEN = os.environ['NEPTUNE_API_TOKEN']
except KeyError:
    print('Set environment variables NEPTUNE_PROJECT and NEPTUNE_API_TOKEN')
    raise

run = neptune.init(
    project=NEPTUNE_PROJECT,
    api_token=NEPTUNE_API_TOKEN,
    tags=['hparam_search_with_optuna']
)

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-51


Info (NVML): NVML Shared Library Not Found. GPU usage metrics may not be reported. For more information, see https://docs.neptune.ai/you-should-know/what-can-you-log-and-display#hardware-consumption


Remember to stop your run once you’ve finished logging your metadata (https://docs.neptune.ai/api-reference/run#.stop). It will be stopped automatically only when the notebook kernel/interactive console is terminated.
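
The environment check in the cell above stops at the first missing variable. A small variant that reports every missing name at once could look like the sketch below. `read_required_env` is a hypothetical helper (not part of Neptune or this repository), and the demo mapping holds placeholder values rather than real credentials.

```python
import os


def read_required_env(names, environ=os.environ):
    """Return the requested variables, or raise naming every missing one."""
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Demo with an explicit mapping instead of the real process environment.
demo_env = {
    "NEPTUNE_PROJECT": "workspace/project",  # placeholder value
    "NEPTUNE_API_TOKEN": "dummy-token",      # placeholder value
}
settings = read_required_env(["NEPTUNE_PROJECT", "NEPTUNE_API_TOKEN"], environ=demo_env)
```

Calling it without the `environ` argument reads from `os.environ`, as the notebook cell does.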


## NeptuneCallback that will integrate with Optuna

In [6]:
import neptune.new.integrations.optuna as optuna_utils

neptune_callback = optuna_utils.NeptuneCallback(run)
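
Optuna invokes every entry of `callbacks` after each finished trial, passing the study and the finished trial. To make that call shape concrete, here is a minimal hand-written callback. `format_progress` and `print_progress` are hypothetical names, and the `SimpleNamespace` objects are stand-ins that only mimic the attributes used here (`best_value`, `number`, `value`); they are not real Optuna objects.

```python
from types import SimpleNamespace


def format_progress(study, trial):
    """Build a one-line summary from the objects a callback receives."""
    return (f"trial {trial.number}: value={trial.value:.3f}, "
            f"best so far={study.best_value:.3f}")


def print_progress(study, trial):
    # Same (study, trial) signature Optuna uses for entries in callbacks=[...].
    print(format_progress(study, trial))


# Stand-ins for illustration only, filled with numbers from the log below.
fake_study = SimpleNamespace(best_value=175.159)
fake_trial = SimpleNamespace(number=2, value=9.376)
print_progress(fake_study, fake_trial)
```

With Optuna installed, a function like this could presumably be passed next to the Neptune callback, for example `callbacks=[neptune_callback, print_progress]`.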

## Create an Optuna study

In [7]:
import optuna

study = optuna.create_study(
    study_name='hyperparameters_deep_q_agent',
    direction='maximize',
)

[I 2022-02-15 20:57:11,823] A new study created in memory with name: hyperparameters_deep_q_agent


## Objective function we want Optuna to maximize

In [8]:
from src.optimize_hyperparameters import objective
func = lambda trial: objective(trial, force_linear_model=False, n_episodes_to_train=500)
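
The search space itself lives in `src.optimize_hyperparameters` and is not shown in this notebook. To give a feel for what a single trial draws, the sketch below samples one configuration with the standard library only. The candidate values and bounds are guesses inferred from the trial logs further down, so the real ranges may differ, and only a subset of the logged hyper-parameters is included. It also swaps in plain random sampling; Optuna's default sampler is not purely random. `sample_hyperparameters` is a hypothetical name.

```python
import math
import random


def sample_hyperparameters(rng):
    """Draw one configuration; choices are guessed from the logged trials."""
    return {
        # Log-uniform draw. Logged values span roughly 2e-5 to 2e-1;
        # the bounds 1e-5 and 1.0 are assumptions.
        "learning_rate": math.exp(rng.uniform(math.log(1e-5), math.log(1.0))),
        "discount_factor": rng.choice([0.9, 0.95, 0.99]),
        "batch_size": rng.choice([16, 32, 64, 128]),
        "memory_size": rng.choice([10_000, 50_000, 100_000]),
        "freq_steps_update_target": rng.choice([10, 100, 1000]),
        "n_gradient_steps": rng.choice([1, 4, 16]),
        "normalize_state": rng.choice([True, False]),
        # Uniform draw. Logged values fall between about 0.01 and 0.2
        # (assumed bounds).
        "epsilon_end": rng.uniform(0.01, 0.2),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
config = sample_hyperparameters(rng)
print(config)
```

In the real objective these draws would come from the `trial` object that Optuna passes in, which is what lets the study steer later trials toward promising regions.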

## Let's start the search!

In [9]:
study.optimize(func, n_trials=100, callbacks=[neptune_callback])

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-52
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 205.84it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 549.93it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 47 operations to synchronize with Neptune. Do not kill this process.


All 47 operations synced, thanks for waiting!


[I 2022-02-15 20:57:27,277] Trial 0 finished with value: 9.361 and parameters: {'learning_rate': 3.125530492111703e-05, 'discount_factor': 0.99, 'batch_size': 128, 'memory_size': 10000, 'freq_steps_train': 16, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.03325946766499619, 'steps_epsilon_decay': 10000, 'seed': 697068871}. Best is trial 0 with value: 9.361.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-53
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:04<00:00, 106.23it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:45<00:00, 21.98it/s]

[I 2022-02-15 20:58:35,880] Trial 1 finished with value: 175.159 and parameters: {'learning_rate': 0.00792430400805493, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 128, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.12528322878186376, 'steps_epsilon_decay': 100000, 'seed': 61323514}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-54
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:04<00:00, 109.34it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:02<00:00, 438.59it/s]

[I 2022-02-15 20:59:05,101] Trial 2 finished with value: 9.376 and parameters: {'learning_rate': 0.1219008154844694, 'discount_factor': 0.99, 'batch_size': 64, 'memory_size': 100000, 'freq_steps_train': 128, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.01254054889919023, 'steps_epsilon_decay': 10000, 'seed': 135762335}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-55
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 224.71it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:02<00:00, 336.60it/s]

[I 2022-02-15 20:59:31,399] Trial 3 finished with value: 9.401 and parameters: {'learning_rate': 0.14112716889048574, 'discount_factor': 0.99, 'batch_size': 64, 'memory_size': 10000, 'freq_steps_train': 256, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.1995740952439778, 'steps_epsilon_decay': 1000, 'seed': 47644304}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-56
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 178.56it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 314.07it/s]

[I 2022-02-15 21:00:00,562] Trial 4 finished with value: 9.402 and parameters: {'learning_rate': 0.21103934616164668, 'discount_factor': 0.9, 'batch_size': 128, 'memory_size': 10000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.030970262167121357, 'steps_epsilon_decay': 1000, 'seed': 217902796}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-57
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 198.05it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 303.87it/s]

[I 2022-02-15 21:00:28,379] Trial 5 finished with value: 11.73 and parameters: {'learning_rate': 1.733276458824201e-05, 'discount_factor': 0.9, 'batch_size': 64, 'memory_size': 10000, 'freq_steps_train': 256, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 4, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.18086250236513282, 'steps_epsilon_decay': 100000, 'seed': 327752222}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-58
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:04<00:00, 108.41it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:14<00:00, 69.54it/s]

[I 2022-02-15 21:01:13,961] Trial 6 finished with value: 56.991 and parameters: {'learning_rate': 0.0004656735987056431, 'discount_factor': 0.9, 'batch_size': 32, 'memory_size': 50000, 'freq_steps_train': 128, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 4, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.015289348982229911, 'steps_epsilon_decay': 100000, 'seed': 212224373}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-59
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:57<00:00,  8.68it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:07<00:00, 140.36it/s]

[I 2022-02-15 21:02:38,038] Trial 7 finished with value: 38.895 and parameters: {'learning_rate': 0.031160106489165222, 'discount_factor': 0.95, 'batch_size': 64, 'memory_size': 10000, 'freq_steps_train': 16, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.07101065061827137, 'steps_epsilon_decay': 100000, 'seed': 771270857}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-60
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 258.60it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 584.69it/s]

[I 2022-02-15 21:02:59,668] Trial 8 finished with value: 9.364 and parameters: {'learning_rate': 0.19417054828896854, 'discount_factor': 0.95, 'batch_size': 128, 'memory_size': 10000, 'freq_steps_train': 128, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.02711708933747228, 'steps_epsilon_decay': 1000, 'seed': 502616681}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-61
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 369.79it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 562.44it/s]

[I 2022-02-15 21:03:21,329] Trial 9 finished with value: 9.315 and parameters: {'learning_rate': 0.00010831908767331051, 'discount_factor': 0.95, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 256, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 4, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.013545099644999682, 'steps_epsilon_decay': 100000, 'seed': 164783419}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-62
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:49<00:00, 10.11it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:28<00:00, 34.52it/s]

[I 2022-02-15 21:05:02,115] Trial 10 finished with value: 87.822 and parameters: {'learning_rate': 0.004145269817721472, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.12443805664092739, 'steps_epsilon_decay': 100000, 'seed': 1063477976}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-63
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:01<00:00,  8.11it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:14<00:00, 69.82it/s]

[I 2022-02-15 21:06:38,838] Trial 11 finished with value: 71.831 and parameters: {'learning_rate': 0.004753971779997355, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.13334233675540044, 'steps_epsilon_decay': 100000, 'seed': 1071997345}. Best is trial 1 with value: 175.159.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-64
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:27<00:00,  5.69it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [01:35<00:00, 10.47it/s]

[I 2022-02-15 21:10:03,838] Trial 12 finished with value: 313.91 and parameters: {'learning_rate': 0.004996227982462779, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.12262191488796657, 'steps_epsilon_decay': 100000, 'seed': 881993024}. Best is trial 12 with value: 313.91.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-65
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:07<00:00,  7.38it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:48<00:00, 20.57it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 7 operations to synchronize with Neptune. Do not kill this process.


All 7 operations synced, thanks for waiting!


[I 2022-02-15 21:12:22,730] Trial 13 finished with value: 211.612 and parameters: {'learning_rate': 0.0010108577278620198, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.08885340762055942, 'steps_epsilon_decay': 100000, 'seed': 845027190}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:12:39,609] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:12:40,202] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:12:40,580] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:12:41,035] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-66
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [02:33<00:00,  3.26it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:35<00:00, 27.81it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 36 operations to synchronize with Neptune. Do not kill this process.


All 36 operations synced, thanks for waiting!


[I 2022-02-15 21:16:04,776] Trial 14 finished with value: 109.303 and parameters: {'learning_rate': 0.0005717178557321615, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.08199052689598185, 'steps_epsilon_decay': 100000, 'seed': 853752449}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:16:15,553] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:16:15,940] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:16:16,223] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:16:16,510] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-67
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [05:15<00:00,  1.59it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:41<00:00, 23.84it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 39 operations to synchronize with Neptune. Do not kill this process.


All 39 operations synced, thanks for waiting!


[I 2022-02-15 21:22:26,905] Trial 15 finished with value: 99.505 and parameters: {'learning_rate': 0.0011658190898048617, 'discount_factor': 0.9, 'batch_size': 32, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.1562015809686442, 'steps_epsilon_decay': 10000, 'seed': 917945676}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:22:53,074] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:22:53,807] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:22:54,318] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:22:54,764] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-68
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:20<00:00,  6.19it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 266.65it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 9 operations to synchronize with Neptune. Do not kill this process.


All 9 operations synced, thanks for waiting!


[I 2022-02-15 21:24:36,267] Trial 16 finished with value: 20.462 and parameters: {'learning_rate': 0.014449891232633148, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.08797969651510802, 'steps_epsilon_decay': 100000, 'seed': 632667626}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:24:45,412] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:24:45,706] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:24:45,953] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:24:46,172] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-69
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:04<00:00,  7.72it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:41<00:00, 24.30it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 14 operations to synchronize with Neptune. Do not kill this process.


All 14 operations synced, thanks for waiting!


[I 2022-02-15 21:26:40,599] Trial 17 finished with value: 198.859 and parameters: {'learning_rate': 0.00013896366736865129, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.058464083807100775, 'steps_epsilon_decay': 100000, 'seed': 491240780}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:26:49,035] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:26:49,295] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:26:49,517] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:26:49,713] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-70
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [02:32<00:00,  3.27it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:02<00:00, 383.08it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 38 operations to synchronize with Neptune. Do not kill this process.


All 38 operations synced, thanks for waiting!


[I 2022-02-15 21:29:34,480] Trial 18 finished with value: 9.345 and parameters: {'learning_rate': 0.7830068803039612, 'discount_factor': 0.95, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.10234002782337318, 'steps_epsilon_decay': 10000, 'seed': 950485593}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:29:45,791] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:29:46,163] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:29:46,465] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:29:46,730] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-71
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:03<00:00, 146.61it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 191.78it/s]

Shutting down background jobs, please wait a moment...
Done!


Waiting for the remaining 86 operations to synchronize with Neptune. Do not kill this process.


All 86 operations synced, thanks for waiting!


[I 2022-02-15 21:30:08,828] Trial 19 finished with value: 10.886 and parameters: {'learning_rate': 0.0015101527728955224, 'discount_factor': 0.99, 'batch_size': 32, 'memory_size': 100000, 'freq_steps_train': 16, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 4, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.10565219537878576, 'steps_epsilon_decay': 1000, 'seed': 614433385}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:30:25,584] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:30:26,022] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:30:26,392] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:30:26,709] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-72
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:56<00:00,  4.28it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:32<00:00, 30.64it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 13 operations to synchronize with Neptune. Do not kill this process.


All 13 operations synced, thanks for waiting!


[I 2022-02-15 21:33:08,764] Trial 20 finished with value: 159.474 and parameters: {'learning_rate': 0.024084230055697407, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 10, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.1494298475072165, 'steps_epsilon_decay': 100000, 'seed': 790887014}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:33:18,847] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:33:19,120] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:33:19,347] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:33:19,557] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-73
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:35<00:00,  5.22it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [01:34<00:00, 10.55it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 14 operations to synchronize with Neptune. Do not kill this process.


All 14 operations synced, thanks for waiting!


[I 2022-02-15 21:36:41,520] Trial 21 finished with value: 233.736 and parameters: {'learning_rate': 0.00014510654705785056, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.0584730262604114, 'steps_epsilon_decay': 100000, 'seed': 474634611}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:36:57,124] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:36:57,487] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:36:57,794] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:36:58,063] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-74
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:20<00:00,  6.22it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:50<00:00, 19.98it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 12 operations to synchronize with Neptune. Do not kill this process.


All 12 operations synced, thanks for waiting!


[I 2022-02-15 21:39:22,333] Trial 22 finished with value: 198.692 and parameters: {'learning_rate': 0.0001473235375248217, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.0637614550959938, 'steps_epsilon_decay': 100000, 'seed': 382798184}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:39:31,347] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:39:31,631] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:39:31,868] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:39:32,085] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-75
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:27<00:00,  5.69it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [01:01<00:00, 16.29it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 29 operations to synchronize with Neptune. Do not kill this process.


All 29 operations synced, thanks for waiting!


[I 2022-02-15 21:42:10,857] Trial 23 finished with value: 220.236 and parameters: {'learning_rate': 0.0004831353757207155, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.045427999061061534, 'steps_epsilon_decay': 100000, 'seed': 394301809}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:42:18,890] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:42:19,228] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:42:19,473] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:42:19,700] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-76
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:51<00:00,  9.70it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:25<00:00, 39.94it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 41 operations to synchronize with Neptune. Do not kill this process.


All 41 operations synced, thanks for waiting!


[I 2022-02-15 21:43:45,802] Trial 24 finished with value: 161.179 and parameters: {'learning_rate': 0.0003010202199559407, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.04624021259078559, 'steps_epsilon_decay': 100000, 'seed': 432898265}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:43:53,140] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:43:53,380] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:43:53,570] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:43:53,747] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-77
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:49<00:00, 10.19it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:42<00:00, 23.71it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 24 operations to synchronize with Neptune. Do not kill this process.


All 24 operations synced, thanks for waiting!


[I 2022-02-15 21:45:32,464] Trial 25 finished with value: 252.139 and parameters: {'learning_rate': 4.637720898373944e-05, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.05207153677919339, 'steps_epsilon_decay': 100000, 'seed': 280543675}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:45:40,450] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:45:40,721] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:45:40,945] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:45:41,147] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-78
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:42<00:00,  4.88it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:37<00:00, 26.75it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 30 operations to synchronize with Neptune. Do not kill this process.


All 30 operations synced, thanks for waiting!


[I 2022-02-15 21:48:09,269] Trial 26 finished with value: 219.93 and parameters: {'learning_rate': 4.840894244695482e-05, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.11049880004906254, 'steps_epsilon_decay': 100000, 'seed': 290545103}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:48:16,979] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:48:17,236] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:48:17,440] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:48:17,632] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-79
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:08<00:00,  7.33it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:17<00:00, 57.26it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 29 operations to synchronize with Neptune. Do not kill this process.


All 29 operations synced, thanks for waiting!


[I 2022-02-15 21:49:50,767] Trial 27 finished with value: 82.186 and parameters: {'learning_rate': 1.003569848960024e-05, 'discount_factor': 0.9, 'batch_size': 128, 'memory_size': 50000, 'freq_steps_train': 16, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.07426087284332114, 'steps_epsilon_decay': 100000, 'seed': 585078892}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:50:03,986] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:50:04,347] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:50:04,674] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:50:04,991] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-80
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:03<00:00, 159.96it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:04<00:00, 243.15it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 74 operations to synchronize with Neptune. Do not kill this process.


All 74 operations synced, thanks for waiting!


[I 2022-02-15 21:50:27,123] Trial 28 finished with value: 9.358 and parameters: {'learning_rate': 5.6910093178772284e-05, 'discount_factor': 0.95, 'batch_size': 32, 'memory_size': 50000, 'freq_steps_train': 256, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.05394433577650991, 'steps_epsilon_decay': 10000, 'seed': 683544491}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:50:37,739] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:50:38,034] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:50:38,277] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:50:38,506] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-81
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 198.91it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 286.41it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 55 operations to synchronize with Neptune. Do not kill this process.


All 55 operations synced, thanks for waiting!


[I 2022-02-15 21:50:55,658] Trial 29 finished with value: 9.383 and parameters: {'learning_rate': 2.9036547081505377e-05, 'discount_factor': 0.99, 'batch_size': 128, 'memory_size': 50000, 'freq_steps_train': 16, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 4, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.002485595143758852, 'steps_epsilon_decay': 1000, 'seed': 295460832}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:51:07,877] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:51:08,187] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:51:08,436] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:51:08,661] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-82
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:07<00:00, 67.56it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [01:13<00:00, 13.53it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 43 operations to synchronize with Neptune. Do not kill this process.


All 43 operations synced, thanks for waiting!


[I 2022-02-15 21:52:40,210] Trial 30 finished with value: 250.141 and parameters: {'learning_rate': 0.002383449442370523, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.0439323913918396, 'steps_epsilon_decay': 10000, 'seed': 727801766}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:52:49,776] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:52:50,091] Param nn_hidden_layers unique value length is less than 2.
[W 2022-02-15 21:52:50,410] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:52:50,664] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-84
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:05<00:00, 89.84it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:02<00:00, 383.83it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 19 operations to synchronize with Neptune. Do not kill this process.


All 19 operations synced, thanks for waiting!


[I 2022-02-15 21:53:14,621] Trial 31 finished with value: 9.359 and parameters: {'learning_rate': 0.0002431859556890333, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.04086537348054936, 'steps_epsilon_decay': 10000, 'seed': 725151846}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:53:26,958] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:53:27,338] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-85
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:12<00:00, 38.65it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:38<00:00, 25.92it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 10 operations to synchronize with Neptune. Do not kill this process.


All 10 operations synced, thanks for waiting!


[I 2022-02-15 21:54:30,610] Trial 32 finished with value: 192.7 and parameters: {'learning_rate': 0.0027649466547607832, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.03325255013299335, 'steps_epsilon_decay': 10000, 'seed': 536751552}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:54:39,794] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:54:40,114] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-86
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:10<00:00, 46.69it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:33<00:00, 29.52it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 12 operations to synchronize with Neptune. Do not kill this process.


All 12 operations synced, thanks for waiting!


[I 2022-02-15 21:55:35,902] Trial 33 finished with value: 164.743 and parameters: {'learning_rate': 0.009576718636345977, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.06102531712945983, 'steps_epsilon_decay': 10000, 'seed': 933399286}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:55:44,961] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:55:45,282] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-87
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 175.08it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 280.82it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 72 operations to synchronize with Neptune. Do not kill this process.


All 72 operations synced, thanks for waiting!


[I 2022-02-15 21:56:06,169] Trial 34 finished with value: 9.39 and parameters: {'learning_rate': 0.03906562318165239, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 128, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.11805516308287226, 'steps_epsilon_decay': 10000, 'seed': 703742235}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:56:19,886] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:56:20,306] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-88
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:45<00:00, 10.97it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [01:01<00:00, 16.15it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 27 operations to synchronize with Neptune. Do not kill this process.


All 27 operations synced, thanks for waiting!


[I 2022-02-15 21:58:23,790] Trial 35 finished with value: 140.105 and parameters: {'learning_rate': 0.0022393221133350924, 'discount_factor': 0.99, 'batch_size': 64, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.09415513569180767, 'steps_epsilon_decay': 10000, 'seed': 469519591}. Best is trial 12 with value: 313.91.
[W 2022-02-15 21:58:48,351] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 21:58:48,995] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-89
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:30<00:00,  5.54it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:38<00:00, 26.29it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 19 operations to synchronize with Neptune. Do not kill this process.


All 19 operations synced, thanks for waiting!


[I 2022-02-15 22:01:12,772] Trial 36 finished with value: 207.017 and parameters: {'learning_rate': 8.38678054826598e-05, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.1408639168924345, 'steps_epsilon_decay': 100000, 'seed': 560250966}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:01:20,630] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:01:20,893] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-90
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 324.18it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 591.87it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 21 operations to synchronize with Neptune. Do not kill this process.


All 21 operations synced, thanks for waiting!


[I 2022-02-15 22:01:33,459] Trial 37 finished with value: 9.341 and parameters: {'learning_rate': 2.3867489698067538e-05, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 10000, 'freq_steps_train': 256, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.024082361465325473, 'steps_epsilon_decay': 1000, 'seed': 9238404}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:01:41,410] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:01:41,687] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-91
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:07<00:00, 67.59it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:07<00:00, 127.07it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 27 operations to synchronize with Neptune. Do not kill this process.


All 27 operations synced, thanks for waiting!


[I 2022-02-15 22:02:06,659] Trial 38 finished with value: 43.921 and parameters: {'learning_rate': 0.05159014832677878, 'discount_factor': 0.9, 'batch_size': 64, 'memory_size': 100000, 'freq_steps_train': 128, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.07372785993130004, 'steps_epsilon_decay': 100000, 'seed': 98803646}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:02:14,684] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:02:14,966] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-92
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:16<00:00,  6.54it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:18<00:00, 54.69it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 47 operations to synchronize with Neptune. Do not kill this process.


All 47 operations synced, thanks for waiting!


[I 2022-02-15 22:03:59,600] Trial 39 finished with value: 91.619 and parameters: {'learning_rate': 0.007445996457903688, 'discount_factor': 0.9, 'batch_size': 128, 'memory_size': 10000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 4, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.17063247500820308, 'steps_epsilon_decay': 10000, 'seed': 654021306}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:04:08,824] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:04:09,163] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-93
67,586 parameters


100%|█████████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 310.88it/s]
100%|███████████████████████████████████████████████████████████████████████| 1000/1000 [00:04<00:00, 209.92it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 38 operations to synchronize with Neptune. Do not kill this process.


All 38 operations synced, thanks for waiting!


[I 2022-02-15 22:04:26,426] Trial 40 finished with value: 20.522 and parameters: {'learning_rate': 0.00020868352884135572, 'discount_factor': 0.99, 'batch_size': 32, 'memory_size': 50000, 'freq_steps_train': 256, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 1000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.0011129286649947737, 'steps_epsilon_decay': 100000, 'seed': 254969340}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:04:40,032] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:04:40,610] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-94
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:59<00:00,  4.19it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:34<00:00, 28.93it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 28 operations to synchronize with Neptune. Do not kill this process.


All 28 operations synced, thanks for waiting!


[I 2022-02-15 22:07:51,096] Trial 41 finished with value: 141.091 and parameters: {'learning_rate': 0.0006101638234434191, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.044739745628764434, 'steps_epsilon_decay': 100000, 'seed': 361034218}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:08:00,281] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:08:00,610] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-95
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:13<00:00,  6.77it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:29<00:00, 33.62it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 46 operations to synchronize with Neptune. Do not kill this process.


All 46 operations synced, thanks for waiting!


[I 2022-02-15 22:09:55,969] Trial 42 finished with value: 146.043 and parameters: {'learning_rate': 0.00039715905844779284, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.022285897561266088, 'steps_epsilon_decay': 100000, 'seed': 397043711}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:10:05,588] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:10:05,937] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-96
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:57<00:00,  8.64it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:35<00:00, 28.27it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 43 operations to synchronize with Neptune. Do not kill this process.


All 43 operations synced, thanks for waiting!


[I 2022-02-15 22:11:50,457] Trial 43 finished with value: 96.21 and parameters: {'learning_rate': 0.0007397509335778098, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.05186038707043972, 'steps_epsilon_decay': 100000, 'seed': 995792998}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:12:11,140] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:12:11,660] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-97
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [01:14<00:00,  6.67it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:49<00:00, 20.17it/s]

Shutting down background jobs, please wait a moment...
Done!



Waiting for the remaining 30 operations to synchronize with Neptune. Do not kill this process.


All 30 operations synced, thanks for waiting!


[I 2022-02-15 22:14:31,947] Trial 44 finished with value: 206.698 and parameters: {'learning_rate': 6.57083386465854e-05, 'discount_factor': 0.9, 'batch_size': 16, 'memory_size': 100000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 100, 'normalize_state': False, 'epsilon_start': 0.9, 'epsilon_end': 0.06722847326406202, 'steps_epsilon_decay': 100000, 'seed': 223834774}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:14:45,989] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:14:46,659] Param nn_hidden_layers unique value length is less than 2.

https://app.neptune.ai/plabartabajo/parametric-q-learning-cart-pole/e/PAR-98
67,586 parameters


100%|██████████████████████████████████████████████████████████████████████████| 500/500 [00:09<00:00, 54.80it/s]
100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:10<00:00, 96.73it/s]

Shutting down background jobs, please wait a moment...





Done!


Waiting for the remaining 46 operations to synchronize with Neptune. Do not kill this process.


All 46 operations synced, thanks for waiting!


[I 2022-02-15 22:15:31,067] Trial 45 finished with value: 13.083 and parameters: {'learning_rate': 0.00587318958300871, 'discount_factor': 0.95, 'batch_size': 64, 'memory_size': 100000, 'freq_steps_train': 128, 'freq_steps_update_target': 1000, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 16, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 10, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.03672878605311394, 'steps_epsilon_decay': 100000, 'seed': 445341479}. Best is trial 12 with value: 313.91.
[W 2022-02-15 22:15:47,497] Param epsilon_start unique value length is less than 2.
[W 2022-02-15 22:15:48,112] Param nn_hidden_layers unique value length is less than 2.

KeyboardInterrupt: 

In [10]:
run.stop()

Shutting down background jobs, please wait a moment...
Done!


Waiting for the remaining 14 operations to synchronize with Neptune. Do not kill this process.


All 14 operations synced, thanks for waiting!
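
The trial summaries printed during the search are hard to compare by eye. As a rough sketch (not part of the original notebook), the snippet below parses Optuna's `Trial N finished with value: ...` lines using only the standard library and ranks the trials by value. The names `TRIAL_RE`, `parse_trial_line` and `LOG_LINES` are hypothetical helpers introduced here, and the regular expression assumes the log format shown in the output above. The two sample lines are copied from trials 30 and 31.

```python
import ast
import re

# Assumed shape of Optuna's "trial finished" log line, taken from the output above.
TRIAL_RE = re.compile(
    r"Trial (?P<number>\d+) finished with value: (?P<value>[-+0-9.eE]+) "
    r"and parameters: (?P<params>\{.*?\})\. Best is trial"
)


def parse_trial_line(line):
    """Return {'number', 'value', 'params'} for a trial line, or None if it does not match."""
    match = TRIAL_RE.search(line)
    if match is None:
        return None
    return {
        "number": int(match.group("number")),
        "value": float(match.group("value")),
        # The parameters are printed as a Python dict literal, so literal_eval can read them.
        "params": ast.literal_eval(match.group("params")),
    }


# Two lines copied from the search output (trials 30 and 31).
LOG_LINES = [
    "[I 2022-02-15 21:52:40,210] Trial 30 finished with value: 250.141 and parameters: {'learning_rate': 0.002383449442370523, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.0439323913918396, 'steps_epsilon_decay': 10000, 'seed': 727801766}. Best is trial 12 with value: 313.91.",
    "[I 2022-02-15 21:53:14,621] Trial 31 finished with value: 9.359 and parameters: {'learning_rate': 0.0002431859556890333, 'discount_factor': 0.99, 'batch_size': 16, 'memory_size': 50000, 'freq_steps_train': 8, 'freq_steps_update_target': 100, 'n_steps_warm_up_memory': 5000, 'n_gradient_steps': 1, 'nn_hidden_layers': '[256, 256]', 'max_grad_norm': 1, 'normalize_state': True, 'epsilon_start': 0.9, 'epsilon_end': 0.04086537348054936, 'steps_epsilon_decay': 10000, 'seed': 725151846}. Best is trial 12 with value: 313.91.",
]

# Keep only lines that parsed, then sort from highest to lowest value.
trials = [t for t in map(parse_trial_line, LOG_LINES) if t is not None]
ranked = sorted(trials, key=lambda t: t["value"], reverse=True)

for trial in ranked:
    print(trial["number"], trial["value"], trial["params"]["learning_rate"])
```

In these two trials only `learning_rate`, `epsilon_end` and `seed` differ, yet the values are 250.141 and 9.359. With a different seed per trial, a single run per configuration cannot separate the effect of the learning rate from run-to-run variance.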
