# Oil Environment Code Demo

This problem, adapted from [here](https://www.pnas.org/content/109/3/764), is a continuous variant of the “Grid World” environment. It consists of an agent surveying a d-dimensional map in search of hidden “oil deposits”. The world is endowed with an unknown survey function which encodes the probability of observing oil at each location. To move to a new location, the agent pays a cost proportional to the distance moved, and surveying the land produces noisy estimates of the true value of that location. In addition, due to varying terrain, the true location the agent moves to is perturbed as a function of the state and action.


The environment is a $d$-dimensional reinforcement learning problem with state space $X = [0, 1]^d$. The action space is $A = [0,1]^d$, corresponding to the ability to attempt to move to any desired location within the state space. There is a reward function $f_h(x,a)$ giving the reward for moving the agent to that location. Moving also incurs an additional cost $\alpha d(x,a)$ that scales with the distance moved.
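As a rough illustration of how the reward and the movement cost combine, the sketch below computes a net reward of the form $f(a) - \alpha d(x,a)$. It is not the `or_suite` implementation: the quadratic `quadratic_survey` function, its `center` parameter, the choice of the $\ell_\infty$ distance for $d$, and the default `alpha` are all assumptions made for illustration, and the dependence of $f_h$ on the step $h$ and the state $x$ is dropped for brevity.

```python
import numpy as np


def quadratic_survey(a, center=0.5):
    """Hypothetical quadratic survey function on [0, 1]^d, peaking at `center`."""
    a = np.asarray(a, dtype=float)
    # Average squared deviation per coordinate, so the value stays in [0, 1].
    return float(1.0 - np.sum((a - center) ** 2) / a.size)


def net_reward(x, a, alpha=0.1, survey=quadratic_survey):
    """Survey value at the target location minus the movement cost."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    distance = float(np.max(np.abs(x - a)))  # l_inf distance (assumed metric)
    return survey(a) - alpha * distance


# Moving from 0 to the assumed peak at 0.5 pays a cost of alpha * 0.5.
print(net_reward([0.0], [0.5], alpha=0.1))
```

With a larger `alpha`, long moves become less attractive even when they reach a high-value location, which is the trade-off the agents have to learn.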

(Figure: an example of the oil discovery problem.)

In this notebook we run a sample experiment for the setting when $d = 1$ and the reward function is taken to be a quadratic.  We compare several heuristics to existing reinforcement learning algorithms.

### Package Installation

In [19]:
import or_suite
import numpy as np

import copy

import os
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
import pandas as pd


import gym

### Experiment Parameters

Here we use the oil environment as outlined in `or_suite/envs/oil_discovery/oil_environment.py`. The package has default specifications for all of the environments in the file `or_suite/envs/env_configs.py`, and so we use one of the defaults.

In addition, we need to specify the number of episodes for learning, and the number of iterations (in order to plot average results with confidence intervals).
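To make the role of the iterations concrete, here is a minimal sketch of how per-episode rewards from several independent iterations could be averaged with a confidence band. The function name `mean_and_ci`, the normal-approximation factor `z=1.96`, and the array layout are assumptions for illustration; the plotting utilities in `or_suite` may compute their intervals differently.

```python
import numpy as np


def mean_and_ci(rewards, z=1.96):
    """Per-episode mean and an approximate 95% confidence band.

    `rewards` has shape (num_iters, num_episodes): one row per iteration.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=0)
    # Standard error of the mean across iterations (sample std, ddof=1).
    sem = rewards.std(axis=0, ddof=1) / np.sqrt(rewards.shape[0])
    return mean, mean - z * sem, mean + z * sem


# Two iterations, two episodes each (made-up numbers).
mean, lower, upper = mean_and_ci([[1.0, 2.0], [3.0, 4.0]])
print(mean, lower, upper)
```

With only a handful of iterations the band is wide, so a larger number of iterations gives a more trustworthy comparison between agents.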

In [20]:
CONFIG =  or_suite.envs.env_configs.oil_environment_default_config

epLen = CONFIG['epLen']
nEps = 50
numIters = 2

epsilon = (nEps * epLen)**(-1 / 4)
action_net = np.arange(start=0, stop=1, step=epsilon)
state_net = np.arange(start=0, stop=1, step=epsilon)

scaling_list = [0.1, 0.3, 1, 5]

DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/oil/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen': epLen,
                    'render': False,
                    'pickle': False
                    }
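The `epsilon` expression above sets the resolution of the uniform grids used by the `Unif QL` and `Unif MB` agents. The sketch below reproduces that calculation for `nEps = 50` and an episode length of 5, and shows how a continuous value could be snapped to its nearest grid point. The helper `nearest_net_point` is hypothetical and is included only to illustrate what a uniform discretization does; it is not part of `or_suite`.

```python
import numpy as np

n_eps, ep_len = 50, 5  # values used in this notebook
eps = (n_eps * ep_len) ** (-1 / 4)  # grid resolution shrinks as the horizon grows
net = np.arange(start=0, stop=1, step=eps)


def nearest_net_point(value, net):
    """Hypothetical helper: snap a value in [0, 1] to the closest grid point."""
    return net[np.argmin(np.abs(net - value))]


print(eps, len(net))
print(nearest_net_point(0.3, net))
```

For these settings the step is roughly 0.25, so each grid has only four points. Increasing `nEps` refines the grid, at the price of more cells to learn about.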



### Specifying Agent

We specify 6 different agents to compare the effectiveness of each.

* `SB PPO` is Proximal Policy Optimization. Each policy update is “clipped” by a parameter so that the new policy does not move too far from the previous one.
* `Random` selects an action uniformly at random from the action space. In particular, the algorithm stores an internal copy of the environment’s action space and samples uniformly at random from it.
* `AdaQL` is an Adaptive Discretization Model-Free Agent, implemented for environments with continuous states and actions using the metric induced by the $\ell_\infty$ norm.
* `AdaMB` is an Adaptive Discretization Model-Based Agent, implemented for environments with continuous states and actions using the metric induced by the $\ell_\infty$ norm.
* `Unif QL` is an eNet Model-Free Agent, implemented for environments with continuous states and actions using the metric induced by the $\ell_\infty$ norm.
* `Unif MB` is an eNet Model-Based Agent, implemented for environments with continuous states and actions using the metric induced by the $\ell_\infty$ norm.
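The clipping used by PPO can be illustrated with the standard clipped surrogate objective. The sketch below is a plain NumPy version written for illustration only; it is not the Stable-Baselines3 code, and the function name `clipped_surrogate` and the default `clip_range=0.2` are assumptions.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Mean clipped surrogate objective.

    `ratio` is pi_new(a|s) / pi_old(a|s) and `advantage` is the estimated
    advantage for the same state-action pairs.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantage
    # Taking the minimum removes the incentive to push the ratio outside
    # the interval [1 - clip_range, 1 + clip_range].
    return float(np.minimum(unclipped, clipped).mean())


# A ratio of 1.5 with a positive advantage is capped at 1.2 * advantage.
print(clipped_surrogate([1.5], [1.0]))
```

Because the objective stops improving once the ratio leaves the clipping interval, a single update cannot move the policy arbitrarily far from the one that collected the data.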

In [23]:
oil_env = gym.make('Oil-v0', config=CONFIG)
mon_env = Monitor(oil_env)
dim = CONFIG['dim']
cost_param = CONFIG['cost_param']
prob = CONFIG['oil_prob']

agents = {
    'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
    'Random': or_suite.agents.rl.random.randomAgent(),
    'AdaQL': or_suite.agents.rl.ada_ql.AdaptiveDiscretizationQL(epLen, scaling_list[0], True, dim*2),
    'AdaMB': or_suite.agents.rl.ada_mb.AdaptiveDiscretizationMB(epLen, scaling_list[0], 0, 2, True, True, dim, dim),
    'Unif QL': or_suite.agents.rl.enet_ql.eNetQL(action_net, state_net, epLen, scaling_list[0], (dim,dim)),
    'Unif MB': or_suite.agents.rl.enet_mb.eNetMB(action_net, state_net, epLen, scaling_list[0], (dim,dim), 0, False),
}

We recommend using a `batch_size` that is a multiple of `n_steps * n_envs`.
Info: (n_steps=5 and n_envs=1)


### Running Algorithm

In [24]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar = []

for agent in agents:
    print(agent)
    # Results directory for this agent, keyed by the environment parameters.
    path = '../data/oil_metric_'+str(agent)+'_'+str(dim)+'_'+str(cost_param)+'_'+str(prob.__name__)
    DEFAULT_SETTINGS['dirPath'] = path + '/'
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    elif agent in ('AdaQL', 'Unif QL', 'AdaMB', 'Unif MB'):
        # These agents are run once per value in scaling_list.
        or_suite.utils.run_single_algo_tune(oil_env, agents[agent], scaling_list, DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(oil_env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append(path)
    algo_list_line.append(str(agent))
    # SB PPO is left out of the radar plot.
    if agent != 'SB PPO':
        path_list_radar.append(path)
        algo_list_radar.append(str(agent))

fig_path = '../figures/'
fig_name = 'oil_metric'+'_'+str(dim)+'_'+str(cost_param)+'_'+str(prob.__name__)+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

additional_metric = {}
fig_name = 'oil_metric'+'_'+str(dim)+'_'+str(cost_param)+'_'+str(prob.__name__)+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
                                fig_path, fig_name,
                                additional_metric)

SB PPO
New Experiment Run
Iteration: 0
Iteration: 1
[3.724073352353859, 3.277192355871049, 3.0170077085104774, 3.616607795859035, 4.321952176285109, 3.862295333499586, 3.71105293282902, 4.096212791997065, 3.0865772805106735, 5.0, 3.1743407714222234, 2.5152341901044637, 3.648680062153219, 2.9370053265445293, 4.333806080489662, 5.0, 3.601817322045906, 3.7034196232366794, 4.246525126945727, 2.9235048752929322, 3.7357588823428847, 3.1123353322604568, 3.0275922007479616, 3.7357588823428847, 3.8537478455985426, 3.065924264880546, 3.2070980711453783, 3.9892367417912857, 4.60716227535491, 3.7077338111062668, 4.279270616763453, 3.193327939287183, 4.3071012335381305, 5.0, 4.797579265703985, 5.0, 4.9138835452560325, 3.803335756227189, 4.994924321114546, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.752929232341586, 5.0, 5.0, 4.220734244555915, 5.0, 4.908306063821534, 4.367879441171443, 5.0, 3.9380751601602184, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5

{'iter': 0, 'episode': 36, 'step': 0, 'oldState': array([0]), 'action': array([0.15045229], dtype=float32), 'reward': 0.8603187782739846, 'newState': array([0.15045229]), 'info': {}}
{'iter': 0, 'episode': 36, 'step': 1, 'oldState': array([0.15045229]), 'action': array([0.8603503], dtype=float32), 'reward': 0.49169433553396547, 'newState': array([0.86035031]), 'info': {}}
{'iter': 0, 'episode': 36, 'step': 2, 'oldState': array([0.86035031]), 'action': array([0.7813542], dtype=float32), 'reward': 0.9240435079866126, 'newState': array([0.78135419]), 'info': {}}
{'iter': 0, 'episode': 36, 'step': 3, 'oldState': array([0.78135419]), 'action': array([0.20657972], dtype=float32), 'reward': 0.562831787749567, 'newState': array([0.20657972]), 'info': {}}
{'iter': 0, 'episode': 36, 'step': 4, 'oldState': array([0.20657972]), 'action': array([0.6092777], dtype=float32), 'reward': 0.6685139532339903, 'newState': array([0.60927773]), 'info': {}}
{'iter': 0, 'episode': 37, 'step': 0, 'oldState': ar

{'iter': 1, 'episode': 21, 'step': 4, 'oldState': array([0.27880442]), 'action': array([0.30338508], dtype=float32), 'reward': 0.9757189867077378, 'newState': array([0.30338508]), 'info': {}}
{'iter': 1, 'episode': 22, 'step': 0, 'oldState': array([0]), 'action': array([0.08264601], dtype=float32), 'reward': 0.9206769978703605, 'newState': array([0.08264601]), 'info': {}}
{'iter': 1, 'episode': 22, 'step': 1, 'oldState': array([0.08264601]), 'action': array([0.8128407], dtype=float32), 'reward': 0.48181517740063534, 'newState': array([0.8128407]), 'info': {}}
{'iter': 1, 'episode': 22, 'step': 2, 'oldState': array([0.8128407]), 'action': array([0.73127705], dtype=float32), 'reward': 0.9216740419945711, 'newState': array([0.73127705]), 'info': {}}
{'iter': 1, 'episode': 22, 'step': 3, 'oldState': array([0.73127705]), 'action': array([0.3610246], dtype=float32), 'reward': 0.6905599696920848, 'newState': array([0.36102459]), 'info': {}}
{'iter': 1, 'episode': 22, 'step': 4, 'oldState': ar

{'iter': 0, 'episode': 6, 'step': 4, 'oldState': array([0.86299902]), 'action': array([0.94165305]), 'reward': 0.9243596505940529, 'newState': array([0.94165307]), 'info': {}}
{'iter': 0, 'episode': 7, 'step': 0, 'oldState': array([0]), 'action': array([0.08722459]), 'reward': 0.916471244259377, 'newState': array([0.08722459]), 'info': {}}
{'iter': 0, 'episode': 7, 'step': 1, 'oldState': array([0.08722459]), 'action': array([0.06748197]), 'reward': 0.9804509937019296, 'newState': array([0.06748197]), 'info': {}}
{'iter': 0, 'episode': 7, 'step': 2, 'oldState': array([0.06748197]), 'action': array([0.48242002]), 'reward': 0.6603811867645892, 'newState': array([0.48242003]), 'info': {}}
{'iter': 0, 'episode': 7, 'step': 3, 'oldState': array([0.48242003]), 'action': array([0.83172075]), 'reward': 0.7051810207205578, 'newState': array([0.83172077]), 'info': {}}
{'iter': 0, 'episode': 7, 'step': 4, 'oldState': array([0.83172077]), 'action': array([0.97474463]), 'reward': 0.86673339832841, '

{'iter': 0, 'episode': 38, 'step': 0, 'oldState': array([0]), 'action': array([0.36310121]), 'reward': 0.6955160338580287, 'newState': array([0.36310121]), 'info': {}}
{'iter': 0, 'episode': 38, 'step': 1, 'oldState': array([0.36310121]), 'action': array([0.86850098]), 'reward': 0.6032643461134486, 'newState': array([0.86850101]), 'info': {}}
{'iter': 0, 'episode': 38, 'step': 2, 'oldState': array([0.86850101]), 'action': array([0.45079084]), 'reward': 0.6585530592919884, 'newState': array([0.45079082]), 'info': {}}
{'iter': 0, 'episode': 38, 'step': 3, 'oldState': array([0.45079082]), 'action': array([0.37963263]), 'reward': 0.9313145460437147, 'newState': array([0.37963262]), 'info': {}}
{'iter': 0, 'episode': 38, 'step': 4, 'oldState': array([0.37963262]), 'action': array([0.34673429]), 'reward': 0.967636927801415, 'newState': array([0.34673429]), 'info': {}}
{'iter': 0, 'episode': 39, 'step': 0, 'oldState': array([0]), 'action': array([0.00129553]), 'reward': 0.9987053061581984, 'n

{'iter': 1, 'episode': 19, 'step': 2, 'oldState': array([0.58256924]), 'action': array([0.7020695]), 'reward': 0.8873637598659999, 'newState': array([0.70206952]), 'info': {}}
{'iter': 1, 'episode': 19, 'step': 3, 'oldState': array([0.70206952]), 'action': array([0.5738222]), 'reward': 0.879635800140242, 'newState': array([0.5738222]), 'info': {}}
{'iter': 1, 'episode': 19, 'step': 4, 'oldState': array([0.5738222]), 'action': array([0.9638391]), 'reward': 0.6770454232517839, 'newState': array([0.96383911]), 'info': {}}
{'iter': 1, 'episode': 20, 'step': 0, 'oldState': array([0]), 'action': array([0.1110483]), 'reward': 0.8948955212593614, 'newState': array([0.1110483]), 'info': {}}
{'iter': 1, 'episode': 20, 'step': 1, 'oldState': array([0.1110483]), 'action': array([0.47455126]), 'reward': 0.6952366714491588, 'newState': array([0.47455126]), 'info': {}}
{'iter': 1, 'episode': 20, 'step': 2, 'oldState': array([0.47455126]), 'action': array([0.1115293]), 'reward': 0.695571162140275, 'ne

{'iter': 0, 'episode': 0, 'step': 0, 'oldState': array([0]), 'action': array([0.417022]), 'reward': 0.6590064283504757, 'newState': array([0.41702199]), 'info': {}}
{'iter': 0, 'episode': 0, 'step': 1, 'oldState': array([0.41702199]), 'action': array([0.09233859]), 'reward': 0.7227561434570038, 'newState': array([0.09233859]), 'info': {}}
{'iter': 0, 'episode': 0, 'step': 2, 'oldState': array([0.09233859]), 'action': array([0.18626021]), 'reward': 0.9103541160281918, 'newState': array([0.18626021]), 'info': {}}
{'iter': 0, 'episode': 0, 'step': 3, 'oldState': array([0.18626021]), 'action': array([0.53881673]), 'reward': 0.7028888262549421, 'newState': array([0.53881675]), 'info': {}}
{'iter': 0, 'episode': 0, 'step': 4, 'oldState': array([0.53881675]), 'action': array([0.41919451]), 'reward': 0.8872555514747358, 'newState': array([0.41919452]), 'info': {}}
{'iter': 0, 'episode': 1, 'step': 0, 'oldState': array([0]), 'action': array([0.93905872]), 'reward': 0.3909956982147831, 'newState

{'iter': 0, 'episode': 30, 'step': 0, 'oldState': array([0]), 'action': array([0.44632401]), 'reward': 0.6399763729188606, 'newState': array([0.44632402]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 1, 'oldState': array([0.44632402]), 'action': array([0.2374515]), 'reward': 0.8114986733149634, 'newState': array([0.23745149]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 2, 'oldState': array([0.23745149]), 'action': array([0.45660175]), 'reward': 0.8032010319962034, 'newState': array([0.45660174]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 3, 'oldState': array([0.45660174]), 'action': array([0.4025602]), 'reward': 0.9473927560464627, 'newState': array([0.4025602]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 4, 'oldState': array([0.4025602]), 'action': array([0.44990382]), 'reward': 0.9537596182989309, 'newState': array([0.44990382]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.31426732]), 'reward': 0.7303237592610902, 'newS

{'iter': 1, 'episode': 10, 'step': 1, 'oldState': array([0.00077821]), 'action': array([0.61965785]), 'reward': 0.5385474561974614, 'newState': array([0.61965787]), 'info': {}}
{'iter': 1, 'episode': 10, 'step': 2, 'oldState': array([0.61965787]), 'action': array([0.56038359]), 'reward': 0.9424482614034615, 'newState': array([0.56038362]), 'info': {}}
{'iter': 1, 'episode': 10, 'step': 3, 'oldState': array([0.56038362]), 'action': array([0.44151425]), 'reward': 0.8879237877653947, 'newState': array([0.44151425]), 'info': {}}
{'iter': 1, 'episode': 10, 'step': 4, 'oldState': array([0.44151425]), 'action': array([0.27158322]), 'reward': 0.8437230109637041, 'newState': array([0.27158323]), 'info': {}}
{'iter': 1, 'episode': 11, 'step': 0, 'oldState': array([0]), 'action': array([0.07253612]), 'reward': 0.9300321548466914, 'newState': array([0.07253612]), 'info': {}}
{'iter': 1, 'episode': 11, 'step': 1, 'oldState': array([0.07253612]), 'action': array([0.15346777]), 'reward': 0.9222567199

{'iter': 1, 'episode': 40, 'step': 0, 'oldState': array([0]), 'action': array([0.23413352]), 'reward': 0.7912561668224352, 'newState': array([0.23413351]), 'info': {}}
{'iter': 1, 'episode': 40, 'step': 1, 'oldState': array([0.23413351]), 'action': array([0.23794392]), 'reward': 0.9961968446528253, 'newState': array([0.23794392]), 'info': {}}
{'iter': 1, 'episode': 40, 'step': 2, 'oldState': array([0.23794392]), 'action': array([0.2627281]), 'reward': 0.9755204285273961, 'newState': array([0.2627281]), 'info': {}}
{'iter': 1, 'episode': 40, 'step': 3, 'oldState': array([0.2627281]), 'action': array([0.270406]), 'reward': 0.9923514871683535, 'newState': array([0.27040601]), 'info': {}}
{'iter': 1, 'episode': 40, 'step': 4, 'oldState': array([0.27040601]), 'action': array([0.11827799]), 'reward': 0.8588783214410208, 'newState': array([0.11827799]), 'info': {}}
{'iter': 1, 'episode': 41, 'step': 0, 'oldState': array([0]), 'action': array([0.10517777]), 'reward': 0.9001644883042386, 'newSt

{'iter': 0, 'episode': 17, 'step': 0, 'oldState': array([0]), 'action': array([0.23725408]), 'reward': 0.7887908440864233, 'newState': array([0.23725408]), 'info': {}}
{'iter': 0, 'episode': 17, 'step': 1, 'oldState': array([0.23725408]), 'action': array([0.23754403]), 'reward': 0.9997100952359996, 'newState': array([0.23754403]), 'info': {}}
{'iter': 0, 'episode': 17, 'step': 2, 'oldState': array([0.23754403]), 'action': array([0.66039155]), 'reward': 0.6551785145469405, 'newState': array([0.66039157]), 'info': {}}
{'iter': 0, 'episode': 17, 'step': 3, 'oldState': array([0.66039157]), 'action': array([0.59750193]), 'reward': 0.9390471053755999, 'newState': array([0.59750193]), 'info': {}}
{'iter': 0, 'episode': 17, 'step': 4, 'oldState': array([0.59750193]), 'action': array([0.63738698]), 'reward': 0.9608998939176207, 'newState': array([0.63738698]), 'info': {}}
{'iter': 0, 'episode': 18, 'step': 0, 'oldState': array([0]), 'action': array([0.96309071]), 'reward': 0.38171129903628637, 

{'iter': 0, 'episode': 47, 'step': 1, 'oldState': array([0.38097402]), 'action': array([0.18872286]), 'reward': 0.8250996098358697, 'newState': array([0.18872286]), 'info': {}}
{'iter': 0, 'episode': 47, 'step': 2, 'oldState': array([0.18872286]), 'action': array([0.07812534]), 'reward': 0.8952990154170721, 'newState': array([0.07812534]), 'info': {}}
{'iter': 0, 'episode': 47, 'step': 3, 'oldState': array([0.07812534]), 'action': array([0.12612979]), 'reward': 0.9531295471533969, 'newState': array([0.12612979]), 'info': {}}
{'iter': 0, 'episode': 47, 'step': 4, 'oldState': array([0.12612979]), 'action': array([0.03258395]), 'reward': 0.9106962641043433, 'newState': array([0.03258394]), 'info': {}}
{'iter': 0, 'episode': 48, 'step': 0, 'oldState': array([0]), 'action': array([0.12786439]), 'reward': 0.8799727036138049, 'newState': array([0.12786439]), 'info': {}}
{'iter': 0, 'episode': 48, 'step': 1, 'oldState': array([0.12786439]), 'action': array([0.12657547]), 'reward': 0.9987119096

{'iter': 1, 'episode': 31, 'step': 4, 'oldState': array([0.4714995]), 'action': array([0.63444601]), 'reward': 0.8496366238666533, 'newState': array([0.63444602]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 0, 'oldState': array([0]), 'action': array([0.0090015]), 'reward': 0.9910388876250686, 'newState': array([0.0090015]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 1, 'oldState': array([0.0090015]), 'action': array([0.07278849]), 'reward': 0.9382048348601709, 'newState': array([0.07278848]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 2, 'oldState': array([0.07278848]), 'action': array([0.62587738]), 'reward': 0.5751704220201115, 'newState': array([0.62587738]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 3, 'oldState': array([0.62587738]), 'action': array([0.69427456]), 'reward': 0.933889492303508, 'newState': array([0.69427454]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 4, 'oldState': array([0.69427454]), 'action': array([0.53590175]), 'reward': 0.853531509517625

{'iter': 0, 'episode': 12, 'step': 4, 'oldState': array([0.10164662]), 'action': array([0.62616287]), 'reward': 0.591841586922466, 'newState': array([0.62616289]), 'info': {}}
{'iter': 0, 'episode': 13, 'step': 0, 'oldState': array([0]), 'action': array([0.14533973]), 'reward': 0.8647284712471563, 'newState': array([0.14533973]), 'info': {}}
{'iter': 0, 'episode': 13, 'step': 1, 'oldState': array([0.14533973]), 'action': array([0.492505]), 'reward': 0.7066885335651529, 'newState': array([0.49250498]), 'info': {}}
{'iter': 0, 'episode': 13, 'step': 2, 'oldState': array([0.49250498]), 'action': array([0.37344243]), 'reward': 0.8877522764101706, 'newState': array([0.37344244]), 'info': {}}
{'iter': 0, 'episode': 13, 'step': 3, 'oldState': array([0.37344244]), 'action': array([0.40498893]), 'reward': 0.9689459250958993, 'newState': array([0.40498891]), 'info': {}}
{'iter': 0, 'episode': 13, 'step': 4, 'oldState': array([0.40498891]), 'action': array([0.25464405]), 'reward': 0.8604111879859

{'iter': 0, 'episode': 43, 'step': 0, 'oldState': array([0]), 'action': array([0.02841372]), 'reward': 0.971986156697719, 'newState': array([0.02841372]), 'info': {}}
{'iter': 0, 'episode': 43, 'step': 1, 'oldState': array([0.02841372]), 'action': array([0.07859415]), 'reward': 0.9510578011882725, 'newState': array([0.07859416]), 'info': {}}
{'iter': 0, 'episode': 43, 'step': 2, 'oldState': array([0.07859416]), 'action': array([0.1033966]), 'reward': 0.975502607079535, 'newState': array([0.1033966]), 'info': {}}
{'iter': 0, 'episode': 43, 'step': 3, 'oldState': array([0.1033966]), 'action': array([0.95408788]), 'reward': 0.4271195828463949, 'newState': array([0.95408785]), 'info': {}}
{'iter': 0, 'episode': 43, 'step': 4, 'oldState': array([0.95408785]), 'action': array([0.31008891]), 'reward': 0.5251880181365182, 'newState': array([0.3100889]), 'info': {}}
{'iter': 0, 'episode': 44, 'step': 0, 'oldState': array([0]), 'action': array([0.64355477]), 'reward': 0.5254213603747242, 'newSta

{'iter': 1, 'episode': 25, 'step': 2, 'oldState': array([0.61764699]), 'action': array([0.65978171]), 'reward': 0.9587406273969068, 'newState': array([0.65978169]), 'info': {}}
{'iter': 1, 'episode': 25, 'step': 3, 'oldState': array([0.65978169]), 'action': array([0.39141553]), 'reward': 0.7646277552629721, 'newState': array([0.39141554]), 'info': {}}
{'iter': 1, 'episode': 25, 'step': 4, 'oldState': array([0.39141554]), 'action': array([0.89981744]), 'reward': 0.6014559797718533, 'newState': array([0.89981747]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 0, 'oldState': array([0]), 'action': array([0.46249247]), 'reward': 0.6297121517621547, 'newState': array([0.46249247]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 1, 'oldState': array([0.46249247]), 'action': array([0.73117605]), 'reward': 0.7643851049504207, 'newState': array([0.73117602]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 2, 'oldState': array([0.73117602]), 'action': array([0.266185]), 'reward': 0.628140737653

{'iter': 0, 'episode': 3, 'step': 0, 'oldState': array([0]), 'action': array([0.41731284]), 'reward': 0.6588147902208036, 'newState': array([0.41731283]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 1, 'oldState': array([0.41731283]), 'action': array([0.00914414]), 'reward': 0.6648667115722334, 'newState': array([0.00914414]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 2, 'oldState': array([0.00914414]), 'action': array([0.89463966]), 'reward': 0.4125097107369324, 'newState': array([0.89463967]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 3, 'oldState': array([0.89463967]), 'action': array([0.051613]), 'reward': 0.43040585447026825, 'newState': array([0.051613]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 4, 'oldState': array([0.051613]), 'action': array([0.64680707]), 'reward': 0.5514555336623543, 'newState': array([0.64680707]), 'info': {}}
{'iter': 0, 'episode': 4, 'step': 0, 'oldState': array([0]), 'action': array([0.14388767]), 'reward': 0.8659850196906295, 'newState': 

{'iter': 0, 'episode': 28, 'step': 3, 'oldState': array([0.06200839]), 'action': array([0.40417127]), 'reward': 0.7102325198099988, 'newState': array([0.40417126]), 'info': {}}
{'iter': 0, 'episode': 28, 'step': 4, 'oldState': array([0.40417126]), 'action': array([0.40420827]), 'reward': 0.9999629862006221, 'newState': array([0.40420827]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 0, 'oldState': array([0]), 'action': array([0.18464262]), 'reward': 0.8314013504676673, 'newState': array([0.18464263]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 1, 'oldState': array([0.18464263]), 'action': array([0.03816791]), 'reward': 0.8637475716376622, 'newState': array([0.03816791]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 2, 'oldState': array([0.03816791]), 'action': array([0.16353434]), 'reward': 0.8821735878405337, 'newState': array([0.16353434]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 3, 'oldState': array([0.16353434]), 'action': array([0.28613639]), 'reward': 0.8846156292

{'iter': 1, 'episode': 4, 'step': 0, 'oldState': array([0]), 'action': array([0.37715298]), 'reward': 0.6858111509101966, 'newState': array([0.37715298]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 1, 'oldState': array([0.37715298]), 'action': array([0.37446419]), 'reward': 0.9973148162443665, 'newState': array([0.37446418]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 2, 'oldState': array([0.37446418]), 'action': array([0.31910936]), 'reward': 0.946149362795791, 'newState': array([0.31910935]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 3, 'oldState': array([0.31910935]), 'action': array([0.86580324]), 'reward': 0.578860425786448, 'newState': array([0.86580324]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 4, 'oldState': array([0.86580324]), 'action': array([0.97265422]), 'reward': 0.8986595797245402, 'newState': array([0.97265422]), 'info': {}}
{'iter': 1, 'episode': 5, 'step': 0, 'oldState': array([0]), 'action': array([0.26403526]), 'reward': 0.7679464654442492, 'newState

{'iter': 1, 'episode': 27, 'step': 0, 'oldState': array([0]), 'action': array([0.45855702]), 'reward': 0.6321952392381388, 'newState': array([0.45855701]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 1, 'oldState': array([0.45855701]), 'action': array([0.42232257]), 'reward': 0.964414171244391, 'newState': array([0.42232257]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 2, 'oldState': array([0.42232257]), 'action': array([0.41754741]), 'reward': 0.9952362164703186, 'newState': array([0.4175474]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 3, 'oldState': array([0.4175474]), 'action': array([0.47441265]), 'reward': 0.9447213666190293, 'newState': array([0.47441265]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 4, 'oldState': array([0.47441265]), 'action': array([0.37856136]), 'reward': 0.9085991127615488, 'newState': array([0.37856135]), 'info': {}}
{'iter': 1, 'episode': 28, 'step': 0, 'oldState': array([0]), 'action': array([0.455529]), 'reward': 0.6341124309806758, 'newSt

{'iter': 0, 'episode': 1, 'step': 2, 'oldState': array([0.51369381]), 'action': array([0.77934491]), 'reward': 0.766706579527823, 'newState': array([0.77934492]), 'info': {}}
{'iter': 0, 'episode': 1, 'step': 3, 'oldState': array([0.77934492]), 'action': array([0.57019347]), 'reward': 0.8112723617204409, 'newState': array([0.57019347]), 'info': {}}
{'iter': 0, 'episode': 1, 'step': 4, 'oldState': array([0.57019347]), 'action': array([0.98413079]), 'reward': 0.6610423773310079, 'newState': array([0.9841308]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 0, 'oldState': array([0]), 'action': array([0.65671209]), 'reward': 0.5185534790338584, 'newState': array([0.65671211]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 1, 'oldState': array([0.65671211]), 'action': array([0.44730333]), 'reward': 0.8110636132745311, 'newState': array([0.44730332]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 2, 'oldState': array([0.44730332]), 'action': array([0.54252211]), 'reward': 0.9091739786545702, 

{'iter': 0, 'episode': 28, 'step': 0, 'oldState': array([0]), 'action': array([0.02412034]), 'reward': 0.9761682286653816, 'newState': array([0.02412034]), 'info': {}}
{'iter': 0, 'episode': 28, 'step': 1, 'oldState': array([0.02412034]), 'action': array([0.1510243]), 'reward': 0.8808182536938218, 'newState': array([0.15102431]), 'info': {}}
{'iter': 0, 'episode': 28, 'step': 2, 'oldState': array([0.15102431]), 'action': array([0.15600419]), 'reward': 0.9950325003745095, 'newState': array([0.15600419]), 'info': {}}
{'iter': 0, 'episode': 28, 'step': 3, 'oldState': array([0.15600419]), 'action': array([0.15417127]), 'reward': 0.9981687614344879, 'newState': array([0.15417127]), 'info': {}}
{'iter': 0, 'episode': 28, 'step': 4, 'oldState': array([0.15417127]), 'action': array([0.05841653]), 'reward': 0.9086868542809013, 'newState': array([0.05841653]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 0, 'oldState': array([0]), 'action': array([0.05964262]), 'reward': 0.9421011576697077, 'n

{'iter': 1, 'episode': 3, 'step': 1, 'oldState': array([0.23901565]), 'action': array([0.06310505]), 'reward': 0.8386929595305537, 'newState': array([0.06310505]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 2, 'oldState': array([0.06310505]), 'action': array([0.70210095]), 'reward': 0.5278221556693561, 'newState': array([0.70210093]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 3, 'oldState': array([0.70210093]), 'action': array([0.57337574]), 'reward': 0.8792155662010818, 'newState': array([0.57337576]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 4, 'oldState': array([0.57337576]), 'action': array([0.82194782]), 'reward': 0.7799136648416428, 'newState': array([0.82194781]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 0, 'oldState': array([0]), 'action': array([0.37715298]), 'reward': 0.6858111509101966, 'newState': array([0.37715298]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 1, 'oldState': array([0.37715298]), 'action': array([0.37446419]), 'reward': 0.9973148162443665

{'iter': 1, 'episode': 30, 'step': 2, 'oldState': array([0.37261546]), 'action': array([0.35936282]), 'reward': 0.9868347839275351, 'newState': array([0.35936281]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 3, 'oldState': array([0.35936281]), 'action': array([0.34718359]), 'reward': 0.9878946409025255, 'newState': array([0.34718359]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 4, 'oldState': array([0.34718359]), 'action': array([0.31185539]), 'reward': 0.9652885474855881, 'newState': array([0.31185538]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.29617499]), 'reward': 0.7436572743393948, 'newState': array([0.296175]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 1, 'oldState': array([0.296175]), 'action': array([0.26777766]), 'reward': 0.9720020635760032, 'newState': array([0.26777765]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 2, 'oldState': array([0.26777765]), 'action': array([0.30610985]), 'reward': 0.96239318645518

{'iter': 0, 'episode': 4, 'step': 4, 'oldState': array([0.07336429]), 'action': array([0.29465277]), 'reward': 0.8014854404988285, 'newState': array([0.29465276]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 0, 'oldState': array([0]), 'action': array([0.853514]), 'reward': 0.4259156250151621, 'newState': array([0.85351402]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 1, 'oldState': array([0.85351402]), 'action': array([0.92360004]), 'reward': 0.9323136352945294, 'newState': array([0.92360002]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 2, 'oldState': array([0.92360002]), 'action': array([0.2679482]), 'reward': 0.5191035965261697, 'newState': array([0.26794821]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 3, 'oldState': array([0.26794821]), 'action': array([0.33189732]), 'reward': 0.9380527375797891, 'newState': array([0.33189732]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 4, 'oldState': array([0.33189732]), 'action': array([0.29327752]), 'reward': 0.9621164492739844, '

{'iter': 0, 'episode': 29, 'step': 0, 'oldState': array([0]), 'action': array([0.93464262]), 'reward': 0.39272619571671247, 'newState': array([0.93464261]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 1, 'oldState': array([0.93464261]), 'action': array([0.28816791]), 'reward': 0.5238893981120568, 'newState': array([0.28816792]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 2, 'oldState': array([0.28816792]), 'action': array([0.16353434]), 'reward': 0.8828203259565373, 'newState': array([0.16353434]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 3, 'oldState': array([0.16353434]), 'action': array([0.03613639]), 'reward': 0.8803832471515621, 'newState': array([0.03613639]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 4, 'oldState': array([0.03613639]), 'action': array([0.12983796]), 'reward': 0.9105544539561093, 'newState': array([0.12983796]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 0, 'oldState': array([0]), 'action': array([0.973162]), 'reward': 0.37788627235758326, '

{'iter': 1, 'episode': 1, 'step': 1, 'oldState': array([0.86662251]), 'action': array([0.87223658]), 'reward': 0.9944016870045425, 'newState': array([0.87223655]), 'info': {}}
{'iter': 1, 'episode': 1, 'step': 2, 'oldState': array([0.87223655]), 'action': array([0.59947396]), 'reward': 0.7612734950883799, 'newState': array([0.59947395]), 'info': {}}
{'iter': 1, 'episode': 1, 'step': 3, 'oldState': array([0.59947395]), 'action': array([0.57125917]), 'reward': 0.9721795082738379, 'newState': array([0.57125914]), 'info': {}}
{'iter': 1, 'episode': 1, 'step': 4, 'oldState': array([0.57125914]), 'action': array([0.55546018]), 'reward': 0.9843251633496082, 'newState': array([0.55546016]), 'info': {}}
{'iter': 1, 'episode': 2, 'step': 0, 'oldState': array([0]), 'action': array([0.83728201]), 'reward': 0.4328855078596726, 'newState': array([0.837282]), 'info': {}}
{'iter': 1, 'episode': 2, 'step': 1, 'oldState': array([0.837282]), 'action': array([0.95866678]), 'reward': 0.8856930797749683, 'n

{'iter': 1, 'episode': 25, 'step': 3, 'oldState': array([0.65978169]), 'action': array([0.89141553]), 'reward': 0.793236520042402, 'newState': array([0.89141554]), 'info': {}}
{'iter': 1, 'episode': 25, 'step': 4, 'oldState': array([0.89141554]), 'action': array([0.64981744]), 'reward': 0.785371778862677, 'newState': array([0.64981747]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 0, 'oldState': array([0]), 'action': array([0.98124624]), 'reward': 0.37484366510600375, 'newState': array([0.98124623]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 1, 'oldState': array([0.98124623]), 'action': array([0.86558802]), 'reward': 0.890779618702374, 'newState': array([0.86558801]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 2, 'oldState': array([0.86558801]), 'action': array([0.266185]), 'reward': 0.5491393614354738, 'newState': array([0.26618499]), 'info': {}}
{'iter': 1, 'episode': 26, 'step': 3, 'oldState': array([0.26618499]), 'action': array([0.02687722]), 'reward': 0.78717257951791

{'iter': 0, 'episode': 2, 'step': 0, 'oldState': array([0]), 'action': array([0.65671209]), 'reward': 0.5185534790338584, 'newState': array([0.65671211]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 1, 'oldState': array([0.65671211]), 'action': array([0.94730333]), 'reward': 0.7478212950373243, 'newState': array([0.94730335]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 2, 'oldState': array([0.94730335]), 'action': array([0.54252211]), 'reward': 0.6671227464452715, 'newState': array([0.54252213]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 3, 'oldState': array([0.54252213]), 'action': array([0.76658264]), 'reward': 0.7992667489478056, 'newState': array([0.76658267]), 'info': {}}
{'iter': 0, 'episode': 2, 'step': 4, 'oldState': array([0.76658267]), 'action': array([0.34593856]), 'reward': 0.656623749279796, 'newState': array([0.34593856]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 0, 'oldState': array([0]), 'action': array([0.91731284]), 'reward': 0.39959135743231355, 'newSta

{'iter': 0, 'episode': 28, 'step': 3, 'oldState': array([0.90600419]), 'action': array([0.65417127]), 'reward': 0.777374624626378, 'newState': array([0.65417129]), 'info': {}}
{'iter': 0, 'episode': 28, 'step': 4, 'oldState': array([0.65417129]), 'action': array([0.80841653]), 'reward': 0.857061794504528, 'newState': array([0.80841655]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 0, 'oldState': array([0]), 'action': array([0.93464262]), 'reward': 0.39272619571671247, 'newState': array([0.93464261]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 1, 'oldState': array([0.93464261]), 'action': array([0.89408396]), 'reward': 0.9602528572930541, 'newState': array([0.89408398]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 2, 'oldState': array([0.89408398]), 'action': array([0.95676717]), 'reward': 0.9392410109083085, 'newState': array([0.95676714]), 'info': {}}
{'iter': 0, 'episode': 29, 'step': 3, 'oldState': array([0.95676714]), 'action': array([0.07227277]), 'reward': 0.41292290600

{'iter': 1, 'episode': 2, 'step': 4, 'oldState': array([0.917427]), 'action': array([0.21765298]), 'reward': 0.49669753140619305, 'newState': array([0.21765298]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 0, 'oldState': array([0]), 'action': array([0.73901565]), 'reward': 0.4775837991636264, 'newState': array([0.73901564]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 1, 'oldState': array([0.73901564]), 'action': array([0.56310505]), 'reward': 0.8386929657793032, 'newState': array([0.56310505]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 2, 'oldState': array([0.56310505]), 'action': array([0.70210095]), 'reward': 0.8702316087151308, 'newState': array([0.70210093]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 3, 'oldState': array([0.70210093]), 'action': array([0.57337574]), 'reward': 0.8792155662010818, 'newState': array([0.57337576]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 4, 'oldState': array([0.57337576]), 'action': array([0.82194782]), 'reward': 0.7799136648416428,

{'iter': 1, 'episode': 28, 'step': 0, 'oldState': array([0]), 'action': array([0.955529]), 'reward': 0.3846086425569216, 'newState': array([0.95552897]), 'info': {}}
{'iter': 1, 'episode': 28, 'step': 1, 'oldState': array([0.95552897]), 'action': array([0.92923542]), 'reward': 0.974049090423984, 'newState': array([0.9292354]), 'info': {}}
{'iter': 1, 'episode': 28, 'step': 2, 'oldState': array([0.9292354]), 'action': array([0.92280843]), 'reward': 0.9935936188934845, 'newState': array([0.92280841]), 'info': {}}
{'iter': 1, 'episode': 28, 'step': 3, 'oldState': array([0.92280841]), 'action': array([0.59109452]), 'reward': 0.7176926146716709, 'newState': array([0.59109449]), 'info': {}}
{'iter': 1, 'episode': 28, 'step': 4, 'oldState': array([0.59109449]), 'action': array([0.94360253]), 'reward': 0.7029228985463453, 'newState': array([0.94360256]), 'info': {}}
{'iter': 1, 'episode': 29, 'step': 0, 'oldState': array([0]), 'action': array([0.91936911]), 'reward': 0.3987705454685747, 'newSt

{'iter': 0, 'episode': 3, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.25148669]), 'reward': 0.6047299115000284, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.]), 'reward': 0.7776438088724638, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 3, 'step': 4, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 4, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'epis

{'iter': 0, 'episode': 35, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 35, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 35, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 35, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.]), 'reward': 0.6047298934776729, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 35, 'step': 4, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 36, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0,

{'iter': 1, 'episode': 14, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 14, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 14, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 15, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 15, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 15, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info'

{'iter': 1, 'episode': 45, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 46, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 46, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 46, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 46, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 46, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}

{'iter': 0, 'episode': 30, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.]), 'reward': 0.47026447171799, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 3, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.]), 'reward': 0.47026447171799, 'newState': array([0.]), 'info': {}}

{'iter': 1, 'episode': 14, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 14, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 14, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 14, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 15, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 15, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newSta

{'iter': 1, 'episode': 47, 'step': 4, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 48, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 48, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 48, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 48, 'step': 3, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 48, 'step': 4, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 49, 'ste

{'iter': 0, 'episode': 30, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 3, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 4, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}

{'iter': 1, 'episode': 12, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 12, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 12, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 12, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 12, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 13, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0

{'iter': 1, 'episode': 43, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 43, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 43, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.]), 'reward': 0.7776438088724638, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 44, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 44, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 44, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.2514

{'iter': 0, 'episode': 23, 'step': 4, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 24, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 24, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 24, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.25148669]), 'reward': 0.6047299115000284, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 24, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.]), 'reward': 0.7776438088724638, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 24, 'step': 4, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0

{'iter': 1, 'episode': 2, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 2, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.75446006]), 'reward': 0.7776438320480555, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 2, 'step': 4, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 3, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555

{'iter': 1, 'episode': 30, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newSta

{'iter': 0, 'episode': 8, 'step': 3, 'oldState': array([0.75446004]), 'action': array([0.]), 'reward': 0.47026447171799, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 8, 'step': 4, 'oldState': array([0.]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 9, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 9, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 9, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 9, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.]), 'reward': 0.7776438088724638, 'newState': array([0.]), 'info': {

{'iter': 0, 'episode': 34, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 35, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'epi

{'iter': 1, 'episode': 10, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 10, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 11, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 11, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 11, 'step': 2, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 11, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newSta

{'iter': 1, 'episode': 35, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 35, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.]), 'reward': 0.6047298934776729, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 35, 'step': 2, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 35, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 35, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 36, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1

{'iter': 0, 'episode': 10, 'step': 0, 'oldState': array([0]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 10, 'step': 1, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 10, 'step': 2, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 10, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.]), 'reward': 0.6047298934776729, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 10, 'step': 4, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 11, 'step': 0, 'oldState': array([0]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 11, 'step': 1, 'oldState': array([0.]), 'action': array([0.25148669]), 'reward': 0.77764380887246

{'iter': 0, 'episode': 33, 'step': 3, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 33, 'step': 4, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 34, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode':

{'iter': 1, 'episode': 7, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 7, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 8, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 8, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.]), 'reward': 0.7776438088724638, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 8, 'step': 2, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 8, 'step': 3, 'oldState': array([0.]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'epi

{'iter': 1, 'episode': 30, 'step': 2, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 30, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 31, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'inf

{'iter': 0, 'episode': 5, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 5, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 6, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 6, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': a

{'iter': 0, 'episode': 31, 'step': 2, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 3, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 4, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 32, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 32, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 32, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 32, 'step': 3, 'oldState': array([0.5029

{'iter': 1, 'episode': 6, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.25148669]), 'reward': 1.0, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 6, 'step': 4, 'oldState': array([0.25148669]), 'action': array([0.75446006]), 'reward': 0.6047299115000284, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 7, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 7, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.]), 'reward': 0.6047298934776729, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 7, 'step': 2, 'oldState': array([0.]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 7, 'step': 3, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info'

{'iter': 1, 'episode': 32, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.]), 'reward': 0.6047298934776729, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 2, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 3, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 32, 'step': 4, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 33, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 33, 'step': 1, 'oldState': array([0.7

{'iter': 0, 'episode': 6, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 6, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.75446006]), 'reward': 1.0, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 6, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.25148669]), 'reward': 0.6047299115000284, 'newState': array([0.25148669]), 'info': {}}
{'iter': 0, 'episode': 6, 'step': 3, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 6, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 7, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info'

{'iter': 0, 'episode': 30, 'step': 2, 'oldState': array([0.25148669]), 'action': array([0.50297337]), 'reward': 0.7776438088724638, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.50297337]), 'reward': 1.0, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 30, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.75446006]), 'reward': 0.7776438320480555, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 0, 'oldState': array([0]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 1, 'oldState': array([0.50297338]), 'action': array([0.75446006]), 'reward': 0.7776438320480555, 'newState': array([0.75446004]), 'info': {}}
{'iter': 0, 'episode': 31, 'step': 2, 'oldState': array([0.75446004]), 'action': array([0.25148669]), 'reward': 0.6047299115000284, 'newSta

{'iter': 1, 'episode': 4, 'step': 0, 'oldState': array([0]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 1, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 2, 'oldState': array([0.]), 'action': array([0.50297337]), 'reward': 0.6047298934776729, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 3, 'oldState': array([0.50297338]), 'action': array([0.]), 'reward': 0.6047298934776729, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 4, 'step': 4, 'oldState': array([0.]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), 'info': {}}
{'iter': 1, 'episode': 5, 'step': 0, 'oldState': array([0]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 5, 'step': 1, 'oldState': array([0.25148669]), 'action': array([0.754460

{'iter': 1, 'episode': 27, 'step': 0, 'oldState': array([0]), 'action': array([0.75446006]), 'reward': 0.47026447171799, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 1, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 2, 'oldState': array([0.50297338]), 'action': array([0.75446006]), 'reward': 0.7776438320480555, 'newState': array([0.75446004]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 3, 'oldState': array([0.75446004]), 'action': array([0.50297337]), 'reward': 0.7776438320480555, 'newState': array([0.50297338]), 'info': {}}
{'iter': 1, 'episode': 27, 'step': 4, 'oldState': array([0.50297338]), 'action': array([0.25148669]), 'reward': 0.7776438088724638, 'newState': array([0.25148669]), 'info': {}}
{'iter': 1, 'episode': 28, 'step': 0, 'oldState': array([0]), 'action': array([0.]), 'reward': 1.0, 'newState': array([0.]), '

0.1
  Algorithm    Reward      Time   Space
0    Random  3.665350  5.092520 -8459.5
1     AdaQL  4.528145  4.938095 -9191.5
2     AdaMB  4.725755  5.018150 -7295.0
3   Unif QL  4.290420  5.119655 -7701.5
4   Unif MB  5.000000  4.824885 -7705.0


Here we see that the uniform discretization model-based algorithm (`Unif MB`) performs best: it attains the highest reward (5.0) and also has the lowest value in the `Time` column (about 4.82) among the five algorithms compared.
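The ranking can also be read off programmatically rather than by eye. The following is a minimal sketch, assuming the numbers are copied by hand from the table above into a plain Python list; the names `results`, `best_reward`, and `fastest` are illustrative and not part of `or_suite`.

```python
# Rows copied from the results table above: (Algorithm, Reward, Time, Space).
results = [
    ("Random",  3.665350, 5.092520, -8459.5),
    ("AdaQL",   4.528145, 4.938095, -9191.5),
    ("AdaMB",   4.725755, 5.018150, -7295.0),
    ("Unif QL", 4.290420, 5.119655, -7701.5),
    ("Unif MB", 5.000000, 4.824885, -7705.0),
]

# Algorithm with the highest value in the Reward column.
best_reward = max(results, key=lambda row: row[1])

# Algorithm with the lowest value in the Time column.
fastest = min(results, key=lambda row: row[2])

# Rank all algorithms by reward, best first.
ranking = [row[0] for row in sorted(results, key=lambda row: row[1], reverse=True)]

print("Highest reward:", best_reward[0], best_reward[1])
print("Lowest time:", fastest[0], fastest[2])
print("Ranking by reward:", ranking)
```

Both the highest reward and the lowest time belong to the same row, which is what the summary sentence above describes.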