## Introduction

For this notebook we use a dedicated environment called "l2rpn_neurips_2020_track1", which has 36 substations. Grid2op comes with many different environments, each posing different problems. This notebook covers and explains only this specific environment.

The approach in this notebook is loss minimization. We combine the SARSA algorithm with a deep Q-network; the resulting approach is called the Deep SARSA algorithm.

The algorithm selects the current action $a$ in the current state $S$ with an $\epsilon$-greedy policy. The reward $r$ and the next state $S'$ are then observed.
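As a rough illustration of the $\epsilon$-greedy selection described above, here is a minimal NumPy sketch. The function name `epsilon_greedy` and the example Q-values are hypothetical; the notebook itself delegates this step to the network's `predict_movement` method.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action id with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # explore: pick any action id uniformly at random
        return int(rng.integers(len(q_values)))
    # exploit: pick the action with the largest estimated Q-value
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2])  # hypothetical Q(S, a) for 3 actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the choice is purely greedy, which is also how `my_act` queries the network at evaluation time.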

The current state $S$, current action $a$, reward $r$ and next state $S'$ are then stored in the replay memory (replay buffer). The main idea behind the replay memory is to train the Q-network on these stored experiences: for each of them, the Q-network computes the Q-value $Q(S,a)$.
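The replay memory can be pictured as a bounded queue of transitions. The sketch below is only an illustration with a hypothetical class `SimpleReplayMemory`; it is not the `ReplayBuffer` class from `l2rpn_baselines` that the notebook imports, although it returns its fields in the same (state, action, reward, done, next state) order.

```python
import random
from collections import deque

class SimpleReplayMemory:
    """Fixed-capacity store of (S, a, r, done, S') transitions (illustrative only)."""

    def __init__(self, capacity):
        # a deque with maxlen silently drops the oldest transition when full
        self._buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, done, next_state):
        self._buffer.append((state, action, reward, done, next_state))

    def sample(self, batch_size):
        # uniform sampling without replacement, then regroup field by field
        batch = random.sample(list(self._buffer), batch_size)
        states, actions, rewards, dones, next_states = zip(*batch)
        return states, actions, rewards, dones, next_states

    def __len__(self):
        return len(self._buffer)

memory = SimpleReplayMemory(capacity=3)
for step in range(5):
    memory.add(state=step, action=0, reward=1.0, done=False, next_state=step + 1)
```

Sampling uniformly from past transitions, rather than training on consecutive steps, reduces the correlation between the samples in a training batch.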

A copy of the Q-network serves as the target network. It computes the Q-value of the next action $a'$ in the next state $S'$, where $a'$ is again selected with the $\epsilon$-greedy policy. In other words, the target network predicts the next Q-values $\hat{Q}(S',a')$ from the next states $S'$ and the $\epsilon$-greedy next actions $a'$. These Q-values are then used to compute the target state-action values $r+\gamma \cdot \hat{Q}(S',a')$, where $\gamma$ is the discount factor, and these targets enter the loss function.
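The target computation can be sketched in a few lines of NumPy. The helper name `sarsa_targets` and all numbers are hypothetical, and masking the bootstrap term on terminal transitions with `dones` is an assumption of this sketch that the text above does not spell out.

```python
import numpy as np

def sarsa_targets(rewards, dones, next_q_values, next_actions, gamma):
    """Compute r + gamma * Q_hat(S', a') for a batch of transitions."""
    batch_idx = np.arange(len(rewards))
    # Q_hat(S', a') for the next action that was actually selected
    q_next = next_q_values[batch_idx, next_actions]
    # assumption: no bootstrap term when the episode ended (done == 1)
    return rewards + gamma * (1.0 - dones) * q_next

rewards = np.array([1.0, 0.5])
dones = np.array([0.0, 1.0])
next_q_values = np.array([[0.2, 0.8],   # target-network outputs for S'
                          [0.4, 0.6]])
next_actions = np.array([0, 1])          # epsilon-greedy choices a'
targets = sarsa_targets(rewards, dones, next_q_values, next_actions, gamma=0.9)
```

Note that the first transition bootstraps from $\hat{Q}(S',a')=0.2$ even though 0.8 is the larger value: SARSA uses the action that was actually chosen, whereas Q-learning would take the maximum over actions.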

The loss function is the mean squared error (MSE), averaged over the $|K|$ experiences of a batch $K$ sampled from the replay memory: $L = \frac{1}{|K|}\sum_{i=1}^{|K|}[(r+\gamma \cdot \hat{Q}(S',a'))-Q(S,a)]^2$. The Deep SARSA algorithm tries to minimise this loss by adjusting the weights of the Q-network accordingly.
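To make the loss concrete, the following sketch evaluates it on a hypothetical batch of two experiences. The function name `mse_loss` and the numbers are made up for illustration; the notebook itself relies on the loss of the Keras model.

```python
import numpy as np

def mse_loss(targets, q_values):
    """Mean of the squared differences between targets and Q(S, a)."""
    return float(np.mean((targets - q_values) ** 2))

targets = np.array([1.18, 0.5])   # hypothetical r + gamma * Q_hat(S', a')
q_values = np.array([1.0, 1.0])   # hypothetical Q(S, a) from the Q-network
loss = mse_loss(targets, q_values)
```

Here the squared errors are $0.18^2 = 0.0324$ and $0.5^2 = 0.25$, so the loss is their mean, about $0.1412$.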

The structure of this notebook is as follows:
1. Importing necessary Libraries
2. Preprocess environment
3. Process replay memory
4. Create Deep SARSA Agent
5. Evaluate Agent

## Importing Necessary Libraries

In [2]:
import os
import warnings
import numpy as np
import copy
import argparse
from tqdm import tqdm
import json
from abc import ABC, abstractmethod
from collections.abc import Iterable

import grid2op
from grid2op.Exceptions import Grid2OpException
from grid2op.Agent import AgentWithConverter
from grid2op.Converter import IdToAct
from grid2op.MakeEnv import make
from grid2op.Runner import Runner

from l2rpn_baselines.utils.replayBuffer import ReplayBuffer
from l2rpn_baselines.utils.trainingParam import TrainingParam
from l2rpn_baselines.utils.save_log_gif import save_log_gif
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.utils.waring_msgs import _WARN_GPU_MEMORY

In [3]:
try:
    from grid2op.Chronics import MultifolderWithCache
    _CACHE_AVAILABLE_DEEPQAGENT = True
except ImportError:
    _CACHE_AVAILABLE_DEEPQAGENT = False

try:
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=FutureWarning)
        import tensorflow as tf
        import tensorflow.keras.optimizers as tfko
        from tensorflow.keras.models import Sequential, Model
        from tensorflow.keras.layers import Activation, Dense
        from tensorflow.keras.layers import Input
    _CAN_USE_TENSORFLOW = True
except ImportError:
    _CAN_USE_TENSORFLOW = False

## Setting up defaults

In [4]:
DEFAULT_LOGS_DIR = "./logs-eval/dsarsa_baseline"
DEFAULT_NB_EPISODE = 1
DEFAULT_NB_PROCESS = 1
DEFAULT_MAX_STEPS = -1
DEFAULT_NAME = "Deep_SARSA"

In [5]:
class DeepQAgent(AgentWithConverter):
    """
    This class allows to train and log the training of different Q-learning algorithms.

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q-Learning algorithm and a possible
        (non-optimized, slow, etc.) implementation.
        
        For a much better implementation, you can reuse the code of "PPO_RLLIB" 
        or the "PPO_SB3" baseline.
        
        Prefer to use the :class:`GymAgent` and :class:`GymEnvWithHeuristics`
        classes to train agents interacting with grid2op and fully compatible
        with the gym framework.
        
    It is not meant to be a state-of-the-art implementation of some baseline. It is rather meant to be a set of
    useful functions that make it easy to develop an environment if we want to get started in RL using grid2op.

    It derives from :class:`grid2op.Agent.AgentWithConverter` and as such implements the :func:`DeepQAgent.convert_obs`
    and :func:`DeepQAgent.my_act`

    It is supposed to be a Baseline, so it also implements:

    - :func:`DeepQAgent.load`: to load the agent
    - :func:`DeepQAgent.save`: to save the agent
    - :func:`DeepQAgent.train`: to train the agent

    TODO description of the training scheme!

    Attributes
    ----------
    filter_action_fun: ``callable``
        The function used to filter the action of the action space. See the documentation of grid2op:
        :class:`grid2op.Converter.IdToAct`
        `here <https://grid2op.readthedocs.io/en/v0.9.3/converter.html#grid2op.Converter.IdToAct>`_ for more
        information.

    replay_buffer:
        The experience replay buffer

    deep_q: :class:`BaseDeepQ`
        The neural network, represented as a :class:`BaseDeepQ` object.

    name: ``str``
        The name of the Agent

    store_action: ``bool``
        Whether you want to register which action your agent took or not. Saving the action can slow down a bit
        the computation (less than 1%) but can help understand what your agent is doing during its learning process.

    dict_action: ``dict``
        The actions taken by the agent, represented as a dictionary. This can be useful to know which types of actions
        are taken by your agent. Only filled if :attr:`DeepQAgent.store_action` is ``True``

    istraining: ``bool``
        Whether you are training this agent or not. No longer really used. Mainly kept for backward compatibility.

    epsilon: ``float``
        The epsilon greedy exploration parameter.

    nb_injection: ``int``
        Number of actions tagged as "injection". See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.

    nb_voltage: ``int``
        Number of actions tagged as "voltage". See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.

    nb_topology: ``int``
        Number of actions tagged as "topology". See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.

    nb_redispatching: ``int``
        Number of actions tagged as "redispatching". See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.

    nb_storage: ``int``
        Number of actions tagged as "storage". See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.
        
    nb_curtail: ``int``
        Number of actions tagged as "curtailment". See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.

    nb_do_nothing: ``int``
        Number of actions tagged as "do_nothing", *ie* when an action is not modifying the state of the grid. See the
        `official grid2op documentation <https://grid2op.readthedocs.io/en/v0.9.3/action.html?highlight=get_types#grid2op.Action.BaseAction.get_types>`_
        for more information.

    verbose: ``bool``
        An effort will be made on the logging (outside of tensorboard) of the training. For now: verbose=True will
        allow some printing on the command prompt, and verbose=False will drastically reduce the amount of information
        printed during training.

    """
    def __init__(self,
                 action_space,
                 nn_archi,
                 name="DeepQAgent",
                 store_action=True,
                 istraining=False,
                 filter_action_fun=None,
                 verbose=False,
                 observation_space=None,
                 **kwargs_converters):
        if not _CAN_USE_TENSORFLOW:
            raise RuntimeError("Cannot import tensorflow, this function cannot be used.")
        
        AgentWithConverter.__init__(self, action_space, action_space_converter=IdToAct, **kwargs_converters)
        self.filter_action_fun = filter_action_fun
        if self.filter_action_fun is not None:
            self.action_space.filter_action(self.filter_action_fun)

        # and now back to the original implementation
        self.replay_buffer = None
        self.__nb_env = None

        self.deep_q = None
        self._training_param = None
        self._tf_writer = None
        self.name = name
        self._losses = None
        self.__graph_saved = False
        self.store_action = store_action
        self.dict_action = {}
        self.istraining = istraining
        self.epsilon = 1.0

        # for tensorboard
        self._train_lr = None

        self._reset_num = None

        self._max_iter_env_ = 1000000
        self._curr_iter_env = 0
        self._max_reward = 0.

        # action type
        self.nb_injection = 0
        self.nb_voltage = 0
        self.nb_topology = 0
        self.nb_line = 0
        self.nb_redispatching = 0
        self.nb_curtail = 0
        self.nb_storage = 0
        self.nb_do_nothing = 0

        # for over sampling the hard scenarios
        self._prev_obs_num = 0
        self._time_step_lived = None
        self._nb_chosen = None
        self._proba = None
        self._prev_id = 0
        # this is for the "limit the episode length" depending on your previous success
        self._total_sucesses = 0

        # neural network architecture
        self._nn_archi = nn_archi

        # observation transformers
        self._obs_as_vect = None
        self._tmp_obs = None
        self._indx_obs = None
        self.verbose = verbose
        if observation_space is not None:
            self.init_obs_extraction(observation_space)

        # for the frequency of action type
        self.current_ = 0
        self.nb_ = 10
        self._nb_this_time = np.zeros((self.nb_, 8), dtype=int)

        #
        self._vector_size = None
        self._actions_per_ksteps = None
        self._illegal_actions_per_ksteps = None
        self._ambiguous_actions_per_ksteps = None

    def _fill_vectors(self, training_param):
        # the builtin `int` replaces the `np.int` alias, which was removed in NumPy 1.24
        self._vector_size = self.nb_ * training_param.update_tensorboard_freq
        self._actions_per_ksteps = np.zeros((self._vector_size, self.action_space.size()), dtype=int)
        self._illegal_actions_per_ksteps = np.zeros(self._vector_size, dtype=int)
        self._ambiguous_actions_per_ksteps = np.zeros(self._vector_size, dtype=int)

    # grid2op.Agent interface
    def convert_obs(self, observation):
        """
        Generic way to convert an observation. This transforms it to a vector and then selects the attributes that
        were selected in :attr:`l2rpn_baselines.utils.NNParams.list_attr_obs` (that have been extracted once and for
        all in the :attr:`DeepQAgent._indx_obs` vector).

        Parameters
        ----------
        observation: :class:`grid2op.Observation.BaseObservation`
            The current observation sent by the environment

        Returns
        -------
        _tmp_obs: ``numpy.ndarray``
            The observation as a vector with only the proper attributes selected (TODO scaling will be available
            in a future version)

        """
        obs_as_vect = observation.to_vect()
        self._tmp_obs[:] = obs_as_vect[self._indx_obs]
        return self._tmp_obs

    def my_act(self, transformed_observation, reward, done=False):
        """
        This function will return the action (its id) selected by the underlying :attr:`DeepQAgent.deep_q` network.

        Before being used, this method requires that the :attr:`DeepQAgent.deep_q` is created. To that end a call
        to :func:`DeepQAgent.init_deep_q` needs to have been performed (this is automatically done if you use
        the baselines we provide and their `evaluate` and `train` scripts).

        Parameters
        ----------
        transformed_observation: ``numpy.ndarray``
            The observation, as transformed after :func:`DeepQAgent.convert_obs`

        reward: ``float``
            The reward of the last time step. Ignored by this method. Here for retro compatibility with the OpenAI
            gym interface.

        done: ``bool``
            Whether the episode is over or not. This is not used, and is only present to be compliant with
            the OpenAI gym interface.

        Returns
        -------
        res: ``int``
            The id of the action taken.

        """
        predict_movement_int, *_ = self.deep_q.predict_movement(transformed_observation,
                                                                epsilon=0.0,
                                                                training=False)
        res = int(predict_movement_int)
        self._store_action_played(res)
        return res

    @staticmethod
    def get_action_size(action_space, filter_fun, kwargs_converters):
        """
        This function allows to get the size of the action space if we were to build a :class:`DeepQAgent`
        with these parameters.

        Parameters
        ----------
        action_space: :class:`grid2op.ActionSpace`
            The grid2op action space used.

        filter_fun: ``callable``
            see :attr:`DeepQAgent.filter_action_fun` for more information

        kwargs_converters: ``dict``
            see the documentation of grid2op for more information:
            `here <https://grid2op.readthedocs.io/en/v0.9.3/converter.html?highlight=idToAct#grid2op.Converter.IdToAct.init_converter>`_


        See Also
        --------
            The official documentation of grid2Op, and especially its class "IdToAct" at this address
            `IdToAct <https://grid2op.readthedocs.io/en/v0.9.3/converter.html?highlight=idToAct#grid2op.Converter.IdToAct>`_

        """
        converter = IdToAct(action_space)
        converter.init_converter(**kwargs_converters)
        if filter_fun is not None:
            converter.filter_action(filter_fun)
        return converter.n

    def init_obs_extraction(self, observation_space):
        """
        This method should be called to initialize the observation (fed as a vector to the neural network)
        from its description as a list of its attribute names.
        """
        tmp = np.zeros(0, dtype=np.uint)  # TODO platform independent
        for obs_attr_name in self._nn_archi.get_obs_attr():
            beg_, end_, dtype_ = observation_space.get_indx_extract(obs_attr_name)
            tmp = np.concatenate((tmp, np.arange(beg_, end_, dtype=np.uint)))
        self._indx_obs = tmp
        self._tmp_obs = np.zeros((1, tmp.shape[0]), dtype=np.float32)

    # baseline interface
    def load(self, path):
        """
        Part of the l2rpn_baselines interface, this function allows to read back a trained model, to continue the
        training or to evaluate its performance for example.

        **NB** To reload an agent, it must have exactly the same name and have been saved at the right location.

        Parameters
        ----------
        path: ``str``
            The path where the agent has previously been saved.

        """
        # not modified compared to the original implementation
        tmp_me = os.path.join(path, self.name)
        if not os.path.exists(tmp_me):
            raise RuntimeError("The model should be stored in \"{}\". But this appears to be empty".format(tmp_me))
        self._load_action_space(tmp_me)

        # TODO handle case where training param class has been overridden
        self._training_param = TrainingParam.from_json(os.path.join(tmp_me, "training_params.json"))
        self.deep_q = self._nn_archi.make_nn(self._training_param)
        try:
            self.deep_q.load_network(tmp_me, name=self.name)
        except Exception as e:
            raise RuntimeError("Impossible to load the model located at \"{}\" with error \n{}".format(path, e))

        for nm_attr in ["_time_step_lived", "_nb_chosen", "_proba"]:
            conv_path = os.path.join(tmp_me, "{}.npy".format(nm_attr))
            if os.path.exists(conv_path):
                setattr(self, nm_attr, np.load(file=conv_path))

    def save(self, path):
        """
        Part of the l2rpn_baselines interface, this allows to save a model. Its name is used at saving time. The
        same name must be reused when loading it back.

        Parameters
        ----------
        path: ``str``
            The path where to save the agent.

        """
        if path is not None:
            tmp_me = os.path.join(path, self.name)
            if not os.path.exists(tmp_me):
                os.mkdir(tmp_me)
            nm_conv = "action_space.npy"
            conv_path = os.path.join(tmp_me, nm_conv)
            if not os.path.exists(conv_path):
                self.action_space.save(path=tmp_me, name=nm_conv)

            self._training_param.save_as_json(tmp_me, name="training_params.json")
            self._nn_archi.save_as_json(tmp_me, "nn_architecture.json")
            self.deep_q.save_network(tmp_me, name=self.name)

            # TODO save the "oversampling" part, and all the other info
            for nm_attr in ["_time_step_lived", "_nb_chosen", "_proba"]:
                conv_path = os.path.join(tmp_me, "{}.npy".format(nm_attr))
                attr_ = getattr(self, nm_attr)
                if attr_ is not None:
                    np.save(arr=attr_, file=conv_path)

    def train(self,
              env,
              iterations,
              save_path,
              logdir,
              training_param=None):
        """
        Part of the public l2rpn-baselines interface, this function allows to train the baseline.

        If `save_path` is not None, then the model is saved regularly, and also at the end of training.

        TODO explain a bit more how you can train it.

        Parameters
        ----------
        env: :class:`grid2op.Environment.Environment` or :class:`grid2op.Environment.MultiEnvironment`
            The environment used to train your model.

        iterations: ``int``
            The number of training iterations. NB when reloading a model, this is **NOT** the number of training steps
            that will be used when retraining. Indeed, if `iterations` is 1000 and the model was already trained for
            750 time steps, then when reloaded, the training will occur on 250 (=1000 - 750) time steps only.

        save_path: ``str``
            Location at which to save the model

        logdir: ``str``
            Location at which tensorboard related information will be kept.

        training_param: :class:`l2rpn_baselines.utils.TrainingParam`
            The meta parameters for the training procedure. This is currently ignored if the model is reloaded (in that
            case the parameters used when first created will be used)

        """

        if training_param is None:
            training_param = TrainingParam()

        self._train_lr = training_param.lr

        if self._training_param is None:
            self._training_param = training_param
        else:
            training_param = self._training_param
        self._init_deep_q(self._training_param, env)
        self._fill_vectors(self._training_param)

        self._init_replay_buffer()

        # efficient reading of the data (read them by chunks of roughly 1 day)
        nb_ts_one_day = 24 * 60 / 5  # number of time steps per day
        self._set_chunk(env, nb_ts_one_day)

        # Create file system related vars
        if save_path is not None:
            save_path = os.path.abspath(save_path)
            os.makedirs(save_path, exist_ok=True)

        if logdir is not None:
            logpath = os.path.join(logdir, self.name)
            self._tf_writer = tf.summary.create_file_writer(logpath, name=self.name)
        else:
            logpath = None
            self._tf_writer = None
        UPDATE_FREQ = training_param.update_tensorboard_freq  # update tensorboard every "UPDATE_FREQ" steps
        SAVING_NUM = training_param.save_model_each

        if hasattr(env, "nb_env"):
            nb_env = env.nb_env
            warnings.warn("Training using {} environments".format(nb_env))
            self.__nb_env = nb_env
        else:
            self.__nb_env = 1
        # if isinstance(env, grid2op.Environment.Environment):
        #     self.__nb_env = 1
        # else:
        #     import warnings
        #     nb_env = env.nb_env
        #     warnings.warn("Training using {} environments".format(nb_env))
        #     self.__nb_env = nb_env

        self.init_obs_extraction(env.observation_space)

        training_step = self._training_param.last_step

        # some parameters have been moved to a class named "training_param" for convenience
        self.epsilon = self._training_param.initial_epsilon

        # now the number of alive frames and the total reward depend on the "underlying environment". They are vectors
        # instead of scalars
        alive_frame, total_reward = self._init_global_train_loop()
        reward, done = self._init_local_train_loop()
        epoch_num = 0
        self._losses = np.zeros(iterations)
        alive_frames = np.zeros(iterations)
        total_rewards = np.zeros(iterations)
        new_state = None
        self._reset_num = 0
        self._curr_iter_env = 0
        self._max_reward = env.reward_range[1]

        # action types
        # injection, voltage, topology, line, redispatching = action.get_types()
        self.nb_injection = 0
        self.nb_voltage = 0
        self.nb_topology = 0
        self.nb_line = 0
        self.nb_redispatching = 0
        self.nb_curtail = 0
        self.nb_storage = 0
        self.nb_do_nothing = 0

        # for non uniform random sampling of the scenarios
        th_size = None
        self._prev_obs_num = 0
        if self.__nb_env == 1:
            # TODO make this available for multi env too
            if _CACHE_AVAILABLE_DEEPQAGENT:
                if isinstance(env.chronics_handler.real_data, MultifolderWithCache):
                    th_size = env.chronics_handler.real_data.cache_size
            if th_size is None:
                th_size = len(env.chronics_handler.real_data.subpaths)

            # number of time steps lived per possible scenario
            if self._time_step_lived is None or self._time_step_lived.shape[0] != th_size:
                self._time_step_lived = np.zeros(th_size, dtype=np.uint64)
            # number of times a given scenario has been played
            if self._nb_chosen is None or self._nb_chosen.shape[0] != th_size:
                self._nb_chosen = np.zeros(th_size, dtype=np.uint)
            # sampling weight of each scenario (initialised to 1 for all of them)
            if self._proba is None or self._proba.shape[0] != th_size:
                self._proba = np.ones(th_size, dtype=np.float64)

        self._prev_id = 0
        # this is for the "limit the episode length" depending on your previous success
        self._total_sucesses = 0

        with tqdm(total=iterations - training_step, disable=not self.verbose) as pbar:
            while training_step < iterations:
                # reset or build the environment
                initial_state = self._need_reset(env, training_step, epoch_num, done, new_state)

                # Slowly decay the exploration parameter epsilon
                # if self.epsilon > training_param.FINAL_EPSILON:
                self.epsilon = self._training_param.get_next_epsilon(current_step=training_step)

                # then we need to predict the next moves. Agents have been adapted to predict a batch of data
                pm_i, pq_v, act = self._next_move(initial_state, self.epsilon, training_step)
                EPS = self.epsilon
#                 print(f'Epsilon in DQA train is {self.epsilon}')
#                 print(f'Epsilon stored in DQA train is {EPS}')

                # todo store the illegal / ambiguous / ... actions
                reward, done = self._init_local_train_loop()
                if self.__nb_env == 1:
                    # still the "hack" to have same interface between multi env and env...
                    # yeah it's a pain
                    act = act[0]

                temp_observation_obj, temp_reward, temp_done, info = env.step(act)
                if self.__nb_env == 1:
                    # dirty hack to wrap them into list
                    temp_observation_obj = [temp_observation_obj]
                    temp_reward = np.array([temp_reward], dtype=np.float32)
                    # the builtin `bool` replaces the `np.bool` alias, which was removed in NumPy 1.24
                    temp_done = np.array([temp_done], dtype=bool)
                    info = [info]

                new_state = self._convert_obs_train(temp_observation_obj)
                self._updage_illegal_ambiguous(training_step, info)
                done, reward, total_reward, alive_frame, epoch_num \
                    = self._update_loop(done, temp_reward, temp_done, alive_frame, total_reward, reward, epoch_num)

                # update the replay buffer
                self._store_new_state(initial_state, pm_i, reward, done, new_state)

#                 print(f'<<<<<<<< Should we not train the model? {self._train_model(training_step)}>>>>>>>>>>')
                # now train the model
                if not self._train_model(training_step):
                    # infinite loss in this case
                    raise RuntimeError("ERROR INFINITE LOSS")

                # Save the network every `SAVING_NUM` iterations, and at the last iteration
                if training_step % SAVING_NUM == 0 or training_step == iterations - 1:
                    self.save(save_path)

                # save some information to tensorboard
                alive_frames[epoch_num] = np.mean(alive_frame)
                total_rewards[epoch_num] = np.mean(total_reward)
                self._store_action_played_train(training_step, pm_i)
                self._save_tensorboard(training_step, epoch_num, UPDATE_FREQ, total_rewards, alive_frames)
                training_step += 1
                pbar.update(1)

        self.save(save_path)

    # auxiliary functions
    # the two functions below: to train with multiple environments
    def _convert_obs_train(self, observations):
        """ create the observations that are used for training."""
        if self._obs_as_vect is None:
            size_obs = self.convert_obs(observations[0]).shape[1]
            self._obs_as_vect = np.zeros((self.__nb_env, size_obs), dtype=np.float32)

        for i, obs in enumerate(observations):
            self._obs_as_vect[i, :] = self.convert_obs(obs).reshape(-1)
        return self._obs_as_vect

    def _create_action_if_not_registered(self, action_int):
        """make sure that `action_int` is present in dict_action"""
        if action_int not in self.dict_action:
            act = self.action_space.all_actions[action_int]
            is_inj, is_volt, is_topo, is_line_status, is_redisp, is_storage, is_dn, is_curtail = \
                False, False, False, False, False, False, False, False
            try:
                # feature unavailable in grid2op <= 0.9.2
                try:
                    # storage introduced in grid2op 1.5.0 so if below it is not supported
                    is_inj, is_volt, is_topo, is_line_status, is_redisp = act.get_types()
                except ValueError as exc_:
                    try:
                        is_inj, is_volt, is_topo, is_line_status, is_redisp, is_storage = act.get_types()
                    except ValueError as exc_:
                        is_inj, is_volt, is_topo, is_line_status, is_redisp, is_storage, is_curtail = act.get_types()

                is_dn = (not is_inj) and (not is_volt) and (not is_topo) and (not is_line_status) and (not is_redisp)
                is_dn = is_dn and (not is_storage)
                is_dn = is_dn and (not is_curtail)
            except Exception as exc_:
                pass

            self.dict_action[action_int] = [0, act,
                                            (is_inj, is_volt, is_topo, is_line_status, is_redisp, is_storage, is_curtail, is_dn)]

    def _store_action_played(self, action_int):
        """if activated, this function will store the action taken by the agent."""
        if self.store_action:
            self._create_action_if_not_registered(action_int)

            self.dict_action[action_int][0] += 1
            (is_inj, is_volt, is_topo, is_line_status, is_redisp, is_storage, is_curtail, is_dn) = self.dict_action[action_int][2]
            if is_inj:
                self.nb_injection += 1
            if is_volt:
                self.nb_voltage += 1
            if is_topo:
                self.nb_topology += 1
            if is_line_status:
                self.nb_line += 1
            if is_redisp:
                self.nb_redispatching += 1
            if is_storage:
                self.nb_storage += 1
            if is_curtail:
                self.nb_curtail += 1
            if is_dn:
                self.nb_do_nothing += 1

    def _convert_all_act(self, act_as_integer):
        """this function converts the actions given as a list of integers. It outputs a list of valid grid2op Action"""
        res = []
        for act_id in act_as_integer:
            res.append(self.convert_act(act_id))
            self._store_action_played(act_id)
        return res

    def _load_action_space(self, path):
        """ load the action space in case the model is reloaded"""
        if not os.path.exists(path):
            raise RuntimeError("The model should be stored in \"{}\". But this appears to be empty".format(path))
        try:
            self.action_space.init_converter(
                all_actions=os.path.join(path, "action_space.npy"))
        except Exception as e:
            raise RuntimeError("Impossible to reload converter action space with error \n{}".format(e))

    # utilities for data reading
    def _set_chunk(self, env, nb):
        """
        to optimize the data reading process. See the official grid2op documentation for the effect of setting
        the chunk size for the environment.
        """
        env.set_chunk_size(int(max(100, nb)))

    def _train_model(self, training_step):
        """train the deep q networks."""
#         print('<<<<<<<<<< train model from  deepQagent called >>>>>>>>>>')
#         print(f'<<<<<<<<<< min observation is {self._training_param.min_observation} and min batch size is {self._training_param.minibatch_size}')
#         print(f'<<<<<<<<<< training step is {training_step}')
        self._training_param.tell_step(training_step)
        if training_step > max(self._training_param.min_observation, self._training_param.minibatch_size) and \
            self._training_param.do_train():

            # train the model
            s_batch, a_batch, r_batch, d_batch, s2_batch = self.replay_buffer.sample(self._training_param.minibatch_size)
            tf_writer = None
            if self.__graph_saved is False:
                tf_writer = self._tf_writer
            
#             print(f'the epsilon in _train_model is {self.epsilon}')
            loss = self.deep_q.train(s_batch, a_batch, r_batch, d_batch, s2_batch, self.epsilon, 
                                     tf_writer)
            # save learning rate for later
            self._train_lr = self.deep_q._optimizer_model._decayed_lr('float32').numpy()
            self.__graph_saved = True
            if not np.all(np.isfinite(loss)):
                # if the loss is not finite i stop the learning
                return False
            self.deep_q.target_train()
            self._losses[training_step:] = np.sum(loss)
        return True

    def _updage_illegal_ambiguous(self, curr_step, info):
        """update the count of illegal and ambiguous actions"""
        tmp_ = curr_step % self._vector_size
        self._illegal_actions_per_ksteps[tmp_] = np.sum([el["is_illegal"] for el in info])
        self._ambiguous_actions_per_ksteps[tmp_] = np.sum([el["is_ambiguous"] for el in info])

    def _store_action_played_train(self, training_step, action_id):
        """store which actions were played, for tensorboard only."""
        which_row = training_step % self._vector_size
        self._actions_per_ksteps[which_row, :] = 0
        self._actions_per_ksteps[which_row, action_id] += 1

    def _fast_forward_env(self, env, time=7*24*60/5):
        """use this function to skip some time steps when the environment is reset."""
        my_int = np.random.randint(0, min(time, env.chronics_handler.max_timestep()))
        env.fast_forward_chronics(my_int)

    def _reset_env_clean_state(self, env):
        """
        reset this environment to a proper state. This should rather be integrated in grid2op. And will probably
        be integrated partially starting from grid2op 1.0.0
        """
        # /!\ DO NOT ATTEMPT TO MODIFY OTHERWISE IT WILL PROBABLY CRASH /!\
        # /!\ THIS WILL BE PART OF THE ENVIRONMENT IN FUTURE GRID2OP RELEASE (>= 1.0.0) /!\
        # AND OF COURSE USING THIS METHOD DURING THE EVALUATION IS COMPLETELY FORBIDDEN
        if self.__nb_env > 1:
            return
        env.current_obs = None
        env.env_modification = None
        env._reset_maintenance()
        env._reset_redispatching()
        env._reset_vectors_and_timings()
        _backend_action = env._backend_action_class()
        _backend_action.all_changed()
        env._backend_action =_backend_action
        env.backend.apply_action(_backend_action)
        _backend_action.reset()
        *_, fail_to_start, info = env.step(env.action_space())
        if fail_to_start:
            # this is happening because not enough care has been taken to handle these problems
            # more care will be taken when this feature will be available in grid2op directly.
            raise Grid2OpException("Impossible to initialize the powergrid, the powerflow diverged at iteration 0. "
                                   "Available information: {}".format(info))
        env._reset_vectors_and_timings()

    def _need_reset(self, env, observation_num, epoch_num, done, new_state):
        """perform the proper reset of the environment"""
        if self._training_param.step_increase_nb_iter is not None and \
           self._training_param.step_increase_nb_iter > 0:
            self._max_iter_env(min(max(self._training_param.min_iter,
                                       self._training_param.max_iter_fun(self._total_sucesses)),
                                   self._training_param.max_iter))  # TODO
        self._curr_iter_env += 1
        if new_state is None:
            # it's the first ever loop
            obs = env.reset()
            if self.__nb_env == 1:
                # still hack to have same program interface between multi env and not multi env
                obs = [obs]
            new_state = self._convert_obs_train(obs)
        elif self.__nb_env > 1:
            # in multi env this is automatically handled
            pass
        elif done[0]:
            nb_ts_one_day = 24*60/5
            if False:
                # the 3-4 lines below allow to reuse the loaded dataset and continue further up in the
                try:
                    self._reset_env_clean_state(env)
                    # random fast forward between now and next day
                    self._fast_forward_env(env, time=nb_ts_one_day)
                except (StopIteration, Grid2OpException):
                    env.reset()
                    # random fast forward between now and next week
                    self._fast_forward_env(env, time=7*nb_ts_one_day)

            # update the number of time steps it has lived
            ts_lived = observation_num - self._prev_obs_num
            if self._time_step_lived is not None:
                self._time_step_lived[self._prev_id] += ts_lived
            self._prev_obs_num = observation_num
            if self._training_param.oversampling_rate is not None:
                # proba = np.sqrt(1. / (self._time_step_lived +1))
                # # over sampling some kind of "UCB like" stuff
                # # https://banditalgs.com/2016/09/18/the-upper-confidence-bound-algorithm/

                # proba = 1. / (self._time_step_lived + 1)
                self._proba[:] = 1. / (self._time_step_lived ** self._training_param.oversampling_rate + 1)
                self._proba /= np.sum(self._proba)

            _prev_id = self._prev_id
            self._prev_id = None
            if _CACHE_AVAILABLE_DEEPQAGENT:
                if isinstance(env.chronics_handler.real_data, MultifolderWithCache):
                    self._prev_id = env.chronics_handler.real_data.sample_next_chronics(self._proba)
            if self._prev_id is None:
                self._prev_id = _prev_id + 1
                self._prev_id %= self._time_step_lived.shape[0]

            obs = self._reset_env(env, epoch_num)
            if self._training_param.sample_one_random_action_begin is not None and \
                    observation_num < self._training_param.sample_one_random_action_begin:
                done = True
                while done:
                    act = env.action_space(env.action_space._sample_set_bus())
                    obs, reward, done, info = env.step(act)
                    if info["is_illegal"] or info["is_ambiguous"]:
                        # there are no guarantee that sampled action are legal nor perfectly
                        # correct.
                        # if that is the case, i "simply" restart the process, as if the action
                        # broke everything
                        done = True

                    if done:
                        obs = self._reset_env(env, epoch_num)
                    else:
                        if self.verbose:
                            print("step {}: {}".format(observation_num, act))

                obs = [obs]  # for compatibility with multi env...
            new_state = self._convert_obs_train(obs)
        return new_state

    def _reset_env(self, env, epoch_num):
        env.reset()
        if self._nb_chosen is not None:
            self._nb_chosen[self._prev_id] += 1

        # random fast forward between now and next week
        if self._training_param.random_sample_datetime_start is not None:
            self._fast_forward_env(env, time=self._training_param.random_sample_datetime_start)

        self._curr_iter_env = 0
        obs = [env.current_obs]
        if epoch_num % len(env.chronics_handler.real_data.subpaths) == 0:
            # re shuffle the data
            env.chronics_handler.shuffle(lambda x: x[np.random.choice(len(x), size=len(x), replace=False)])
        return obs

    def _init_replay_buffer(self):
        """create and initialize the replay buffer"""
        self.replay_buffer = ReplayBuffer(self._training_param.buffer_size)

    def _store_new_state(self, initial_state, predict_movement_int, reward, done, new_state):
        """store the new state in the replay buffer"""
        # vectorized version of the previous code
        for i_s, pm_i, reward, done, ns in zip(initial_state, predict_movement_int, reward, done, new_state):
            self.replay_buffer.add(i_s,
                                   pm_i,
                                   reward,
                                   done,
                                   ns)

    def _max_iter_env(self, new_max_iter):
        """update the maximum number of iterations allowed."""
        self._max_iter_env_ = new_max_iter

    def _next_move(self, curr_state, epsilon, training_step):
        # supposes that 0 encodes for do nothing, otherwise it will NOT work (for the observer)
        pm_i, pq_v, q_actions = self.deep_q.predict_movement(curr_state, epsilon, training=True)
        # TODO implement the "max XXX random action per scenarios"
        pm_i, pq_v = self._short_circuit_actions(training_step, pm_i, pq_v, q_actions)
        act = self._convert_all_act(pm_i)
        return pm_i, pq_v, act

    def _short_circuit_actions(self, training_step, pm_i, pq_v, q_actions):
        if self._training_param.min_observe is not None and \
                training_step < self._training_param.min_observe:
            # action is replaced by do nothing due to the "observe only" specification
            pm_i[:] = 0
            pq_v[:] = q_actions[:, 0]
        return pm_i, pq_v

    def _init_global_train_loop(self):
        alive_frame = np.zeros(self.__nb_env, dtype=int)  # np.int was removed in NumPy 1.24
        total_reward = np.zeros(self.__nb_env, dtype=np.float32)
        return alive_frame, total_reward

    def _update_loop(self, done, temp_reward, temp_done, alive_frame, total_reward, reward, epoch_num):
        if self.__nb_env == 1:
            # force end of episode at early stage of learning
            if self._curr_iter_env >= self._max_iter_env_:
                temp_done[0] = True
                temp_reward[0] = self._max_reward
                self._total_sucesses += 1

        done = temp_done
        alive_frame[done] = 0
        total_reward[done] = 0.
        self._reset_num += np.sum(done)
        if self._reset_num >= self.__nb_env:
            # increase the "global epoch num" represented by "epoch_num" only when on average
            # all environments are "done"
            epoch_num += 1
            self._reset_num = 0

        total_reward[~done] += temp_reward[~done]
        alive_frame[~done] += 1
        return done, temp_reward, total_reward, alive_frame, epoch_num

    def _init_local_train_loop(self):
        # reward, done = np.zeros(self.nb_process), np.full(self.nb_process, fill_value=False, dtype=np.bool)
        reward = np.zeros(self.__nb_env, dtype=np.float32)
        done = np.full(self.__nb_env, fill_value=False, dtype=bool)  # np.bool was removed in NumPy 1.24
        return reward, done

    def _init_deep_q(self, training_param, env):
        """
        This function initializes the neural network.
        """
        if self.deep_q is None:
            self.deep_q = self._nn_archi.make_nn(training_param)
        self.init_obs_extraction(env.observation_space)

    def _save_tensorboard(self, step, epoch_num, UPDATE_FREQ, epoch_rewards, epoch_alive):
        """save all the information needed in tensorboard."""
        if self._tf_writer is None:
            return

        # Log some useful metrics every UPDATE_FREQ steps
        if step % UPDATE_FREQ == 0 and epoch_num > 0:
            if step % (10 * UPDATE_FREQ) == 0:
                # print the top k "hardest" scenarios (i.e. those chosen the most often)
                if self.verbose:
                    top_k = 10
                    if self._nb_chosen is not None:
                        array_ = np.argsort(self._nb_chosen)[-top_k:][::-1]
                        print("hardest scenarios\n{}".format(array_))
                        print("They have been chosen respectively\n{}".format(self._nb_chosen[array_]))
                        # print("Associated proba are\n{}".format(self._proba[array_]))
                        print("The number of timesteps played is\n{}".format(self._time_step_lived[array_]))
                        print("avg (across all scenarios) number of timesteps played {}"
                              "".format(np.mean(self._time_step_lived)))
                        print("Time alive: {}".format(self._time_step_lived[array_] / (self._nb_chosen[array_] + 1)))
                        print("Avg time alive: {}".format(np.mean(self._time_step_lived / (self._nb_chosen + 1 ))))

            with self._tf_writer.as_default():
                last_alive = epoch_alive[(epoch_num-1)]
                last_reward = epoch_rewards[(epoch_num-1)]

                mean_reward = np.nanmean(epoch_rewards[:epoch_num])
                mean_alive = np.nanmean(epoch_alive[:epoch_num])

                mean_reward_30 = mean_reward
                mean_alive_30 = mean_alive
                mean_reward_100 = mean_reward
                mean_alive_100 = mean_alive

                tmp = self._actions_per_ksteps > 0
                tmp = tmp.sum(axis=0)
                nb_action_taken_last_kstep = np.sum(tmp > 0)

                nb_illegal_act = np.sum(self._illegal_actions_per_ksteps)
                nb_ambiguous_act = np.sum(self._ambiguous_actions_per_ksteps)

                if epoch_num >= 100:
                    mean_reward_100 = np.nanmean(epoch_rewards[(epoch_num-100):epoch_num])
                    mean_alive_100 = np.nanmean(epoch_alive[(epoch_num-100):epoch_num])

                if epoch_num >= 30:
                    mean_reward_30 = np.nanmean(epoch_rewards[(epoch_num-30):epoch_num])
                    mean_alive_30 = np.nanmean(epoch_alive[(epoch_num-30):epoch_num])

                # to ensure "fair" comparison between single env and multi env
                step_tb = step  # * self.__nb_env
                # if multiply by the number of env we have "trouble" with random exploration at the beginning
                # because it lasts the same number of "real" steps

                # show first the mean reward and mean time alive (hence the upper case)
                tf.summary.scalar("Mean_alive_30", mean_alive_30, step_tb,
                                  description="Average number of steps (per episode) made over the last 30 "
                                              "completed episodes")
                tf.summary.scalar("Mean_reward_30", mean_reward_30, step_tb,
                                  description="Average (final) reward obtained over the last 30 completed episodes")

                # then it's alpha numerical order, hence the "z_" in front of some information
                tf.summary.scalar("loss", self._losses[step], step_tb,
                                  description="Training loss (for the last training batch)")

                tf.summary.scalar("last_alive", last_alive, step_tb,
                                  description="Final number of steps for the last complete episode")
                tf.summary.scalar("last_reward", last_reward, step_tb,
                                  description="Final reward over the last complete episode")

                tf.summary.scalar("mean_reward", mean_reward, step_tb,
                                  description="Average reward over the whole episodes played")
                tf.summary.scalar("mean_alive", mean_alive, step_tb,
                                  description="Average time alive over the whole episodes played")

                tf.summary.scalar("mean_reward_100", mean_reward_100, step_tb,
                                  description="Average (final) reward obtained over the last 100 completed episodes")
                tf.summary.scalar("mean_alive_100", mean_alive_100, step_tb,
                                  description="Average number of steps (per episode) made over the last 100 "
                                              "completed episodes")

                tf.summary.scalar("nb_different_action_taken", nb_action_taken_last_kstep, step_tb,
                                  description="Number of different actions played the last "
                                              "{} steps".format(self.nb_ * UPDATE_FREQ))
                tf.summary.scalar("nb_illegal_act", nb_illegal_act, step_tb,
                                  description="Number of illegal actions played the last "
                                              "{} steps".format(self.nb_ * UPDATE_FREQ))
                tf.summary.scalar("nb_ambiguous_act", nb_ambiguous_act, step_tb,
                                  description="Number of ambiguous actions played the last "
                                              "{} steps".format(self.nb_ * UPDATE_FREQ))
                tf.summary.scalar("nb_total_success", self._total_sucesses, step_tb,
                                  description="Number of times the episode was completed entirely "
                                              "(no game over)")

                tf.summary.scalar("z_lr", self._train_lr, step_tb,
                                  description="Current learning rate")
                tf.summary.scalar("z_epsilon", self.epsilon, step_tb,
                                  description="Current epsilon (from the epsilon greedy)")
                tf.summary.scalar("z_max_iter", self._max_iter_env_, step_tb,
                                  description="Maximum number of time steps before deciding a scenario "
                                              "is over (=win)")
                tf.summary.scalar("z_total_episode", epoch_num, step_tb,
                                  description="Total number of episode played (number of \"reset\")")

                self.deep_q.save_tensorboard(step_tb)

                if self.store_action:
                    self._store_frequency_action_type(UPDATE_FREQ, step_tb)

                # if self._time_step_lived is not None:
                #     tf.summary.histogram(
                #         "timestep_lived", self._time_step_lived, step=step_tb, buckets=None,
                #         description="Number of time steps lived for all scenarios"
                #     )
                # if self._nb_chosen is not None:
                #     tf.summary.histogram(
                #         "nb_chosen", self._nb_chosen, step=step_tb, buckets=None,
                #         description="Number of times this scenarios has been played"
                #     )

    def _store_frequency_action_type(self, UPDATE_FREQ, step_tb):
        self.current_ += 1
        self.current_ %= self.nb_
        nb_inj, nb_volt, nb_topo, nb_line, nb_redisp, nb_storage, nb_curtail, nb_dn = self._nb_this_time[self.current_, :]
        self._nb_this_time[self.current_, :] = [self.nb_injection,
                                                self.nb_voltage,
                                                self.nb_topology,
                                                self.nb_line,
                                                self.nb_redispatching,
                                                self.nb_storage,
                                                self.nb_curtail,
                                                self.nb_do_nothing]

        curr_inj = self.nb_injection - nb_inj
        curr_volt = self.nb_voltage - nb_volt
        curr_topo = self.nb_topology - nb_topo
        curr_line = self.nb_line - nb_line
        curr_redisp = self.nb_redispatching - nb_redisp
        curr_storage = self.nb_storage - nb_storage
        curr_curtail = self.nb_curtail - nb_curtail
        curr_dn = self.nb_do_nothing - nb_dn

        # curtailment actions are counted too, so that the logged frequencies sum to 1
        total_act_num = (curr_inj + curr_volt + curr_topo + curr_line + curr_redisp + curr_dn +
                         curr_storage + curr_curtail)
        tf.summary.scalar("zz_freq_inj",
                          curr_inj / total_act_num,
                          step_tb,
                          description="Frequency of \"injection\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("zz_freq_voltage",
                          curr_volt / total_act_num,
                          step_tb,
                          description="Frequency of \"voltage\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("z_freq_topo",
                          curr_topo / total_act_num,
                          step_tb,
                          description="Frequency of \"topo\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("z_freq_line_status",
                          curr_line / total_act_num,
                          step_tb,
                          description="Frequency of \"line status\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("z_freq_redisp",
                          curr_redisp / total_act_num,
                          step_tb,
                          description="Frequency of \"redispatching\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("z_freq_do_nothing",
                          curr_dn / total_act_num,
                          step_tb,
                          description="Frequency of \"do nothing\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("z_freq_storage",
                          curr_storage / total_act_num,
                          step_tb,
                          description="Frequency of \"storage\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))
        tf.summary.scalar("z_freq_curtail",
                          curr_curtail / total_act_num,
                          step_tb,
                          description="Frequency of \"curtailment\" actions "
                                      "type played over the last {} actions"
                                      "".format(self.nb_ * UPDATE_FREQ))

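The oversampling rule applied in `_need_reset` (scenarios on which the agent survived fewer time steps are sampled more often) can be looked at in isolation. The sketch below reproduces only that formula with NumPy; the function name `scenario_sampling_proba` and the survival counts are hypothetical and are not part of the agent.

```python
import numpy as np


def scenario_sampling_proba(time_step_lived, oversampling_rate):
    """Return normalized sampling probabilities for each scenario.

    Mirrors the rule proba = 1 / (time_step_lived ** oversampling_rate + 1),
    followed by a normalization so that the probabilities sum to 1.
    """
    time_step_lived = np.asarray(time_step_lived, dtype=np.float64)
    proba = 1.0 / (time_step_lived ** oversampling_rate + 1.0)
    return proba / proba.sum()


# hypothetical number of time steps survived on 4 scenarios
lived = [0, 10, 100, 1000]
proba = scenario_sampling_proba(lived, oversampling_rate=0.5)
print(proba)  # the scenario with the fewest steps survived gets the largest probability
```

With an `oversampling_rate` of 0 every scenario receives the same probability; larger values concentrate the sampling on the scenarios where the agent fails early.
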
In [6]:
class BaseDeepQ(ABC):
    """
    This class aims at representing the Q value (or more in case of SAC) parametrization by
    a neural network.

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and a possible (non
        optimized, slow, etc.) implementation.

        For a much better implementation, you can reuse the code of "PPO_RLLIB"
        or the "PPO_SB3" baseline.

        Prefer to use the :class:`GymAgent` class and the :class:`GymEnvWithHeuristics`
        classes to train agents interacting with grid2op and fully compatible
        with the gym framework.
        
    It is composed of 2 different networks:

    - model: which is the main model
    - target_model: which has the same architecture and same initial weights as "model" but is updated less frequently
      to stabilize training

    It has basic methods to make predictions, to train the model, and train the target model.

    This class is abstract and needs to be overridden in order to create objects from it. The only pure virtual
    function is :func:`BaseDeepQ.construct_q_network` that creates the neural network from the nn_params
    (:class:`NNParam`) provided as input

    Attributes
    ----------
    _action_size: ``int``
        Total number of actions

    _observation_size: ``int``
        Size of the observation space considered

    _nn_archi: :class:`NNParam`
        The parameters of the neural networks that will be created

    _training_param: :class:`TrainingParam`
        The meta parameters for the training scheme (used especially for learning rate or gradient clipping for example)

    _lr: ``float``
        The initial learning rate

    _lr_decay_steps: ``float``
        The decay step of the learning rate

    _lr_decay_rate: ``float``
        The rate at which the learning rate will decay

    _model:
        Main neural network model, here a keras Model object.

    _target_model:
        a copy of the main neural network that will be updated less frequently (also known as "target model" in RL
        community)


    """

    def __init__(self,
                 nn_params,
                 training_param=None,
                 verbose=False):
        if not _CAN_USE_TENSORFLOW:
            raise RuntimeError("Cannot import tensorflow, this function cannot be used.")
#         print('<<<<<<<<< Atleast class is getting initialised>>>>>>>')
        self._action_size = nn_params.action_size
        self._observation_size = nn_params.observation_size
        self._nn_archi = nn_params
        self.verbose = verbose

        if training_param is None:
            self._training_param = TrainingParam()
        else:
            self._training_param = training_param

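        # read from self._training_param so that the default TrainingParam() is used when training_param is None
        self._lr = self._training_param.lr
        self._lr_decay_steps = self._training_param.lr_decay_steps
        self._lr_decay_rate = self._training_param.lr_decay_rate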

        self._model = None
        self._target_model = None
        self._schedule_model = None
        self._optimizer_model = None
        self._custom_objects = None  # to be able to load other keras layers type

    def make_optimiser(self):
        """
        helper function to create the proper optimizer (Adam) with the learning rates and its decay
        parameters.
        """
#         print('<<<<<<<<< Make optimiser is getting called >>>>>>>')
        schedule = tfko.schedules.InverseTimeDecay(self._lr, self._lr_decay_steps, self._lr_decay_rate)
        return schedule, tfko.Adam(learning_rate=schedule)

    @abstractmethod
    def construct_q_network(self):
        """
         Abstract method that needs to be overridden.

         It should create :attr:`BaseDeepQ._model` and :attr:`BaseDeepQ._target_model`
        """
        raise NotImplementedError("Not implemented")

    def predict_movement(self, data, epsilon, batch_size=None, training=False):
        """
        Predict the action to play with an epsilon-greedy policy: with probability epsilon a random action
        is played instead of the greedy one.
        """
#         print('<<<<<<<<< Predict Movement is getting called >>>>>>>')
        if batch_size is None:
            batch_size = data.shape[0]

        # q_actions = self._model.predict(data, batch_size=batch_size)  # q value of each action
        q_actions = self._model(data, training=training).numpy()
        opt_policy = np.argmax(q_actions, axis=-1)
        if epsilon > 0.:
#             print('<<<<<<< Randomisation is being used >>>>>>>>')
            rand_val = np.random.random(batch_size)
            print(f'<<<<<<<<<<<<<< opt_policy_rand is {opt_policy[rand_val < epsilon]} and comparison is {rand_val < epsilon}')
            opt_policy[rand_val < epsilon] = np.random.randint(0, self._action_size, size=(np.sum(rand_val < epsilon)))
        return opt_policy, q_actions[np.arange(batch_size), opt_policy], q_actions

    def train(self, s_batch, a_batch, r_batch, d_batch, s2_batch, epsilon_tr, tf_writer=None, batch_size=None):
        """
        Trains network to fit given parameters:
        
        .. seealso::
            https://towardsdatascience.com/dueling-double-deep-q-learning-using-tensorflow-2-x-7bbbcec06a2a
            for the update rules
        
        Parameters
        ----------
        s_batch:
            the state vector (before the action is taken)
        a_batch:
            the action taken
        s2_batch:
            the state vector (after the action is taken)
        d_batch:
            says whether or not the episode was over
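        r_batch:
            the reward obtained this step
        epsilon_tr:
            the epsilon used by the epsilon-greedy policy that selects the next action (SARSA update)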
        """
#         print('<<<<<<<<< Train is getting called >>>>>>>')
        if batch_size is None:
            batch_size = s_batch.shape[0]

        # Save the graph just the first time
        if tf_writer is not None:
            tf.summary.trace_on()
        target = self._model(s_batch, training=True).numpy()
        # this fut_action should come from epsilon policy
        next_a, fut_actions_3, fut_action_2 = self.predict_movement(s2_batch,epsilon=epsilon_tr,training=True)
        fut_action = self._model(s2_batch, training=True).numpy()
        
#         print(f'<<<<<<<<<<<<<<<<<<<<<<Fut_action is {fut_action} with shape as {fut_action.shape}>>>>>>>>>>>>>>>>>>>>>>>>')
#         print(f'<<<<<<<<<<<<<<<<<<<<<<Fut_action_2 is {fut_action_2} with shape as {fut_action_2.shape}>>>>>>>>>>>>>>>>>>>>>>>')
#         print(f'<<<<<<<<<<<<<<<<<<<<<<Fut_action_3 is {fut_actions_3} with shape as {fut_actions_3.shape}>>>>>>>>>>>>>>>>>>>>>>>')
        if tf_writer is not None:
            with tf_writer.as_default():
                tf.summary.trace_export("model-graph", 0)
            tf.summary.trace_off()
        target_next = self._target_model(s2_batch, training=True).numpy()

        idx = np.arange(batch_size)
        target[idx, a_batch] = r_batch
        # update the value for not done episode
        nd_batch = ~d_batch  # update with this rule only batch that did not game over
        next_action = np.argmax(fut_action, axis=-1)  # greedy next action (Q-learning choice), kept for comparison only
        fut_Q = target_next[idx, next_a]  # Q value of the epsilon-greedy next action (used in the SARSA target)
        fut_Q_new = target_next[idx, next_action]  # Q value of the greedy next action (only printed below)
        print(f'<<<<< Epsilon in training is {epsilon_tr}>>>>>>>')
        print(f'<<<<< next_action is {next_action}>>>>>>>')
        print(f'<<<<< next_a is {next_a}>>>>>>>')
        print(f'<<<<< fut_q is {fut_Q}>>>>>>>')
        print(f'<<<<< fut_Q_new is {fut_Q_new}>>>>>>>')
        
        target[nd_batch, a_batch[nd_batch]] += self._training_param.discount_factor * fut_Q[nd_batch]
        loss = self.train_on_batch(self._model, self._optimizer_model, s_batch, target)
        return loss

    def train_on_batch(self, model, optimizer_model, x, y_true):
        """train the model on a batch of examples. This can be overridden."""
#         print('<<<<<<<<< Train On Batch is getting called >>>>>>>')
        loss = model.train_on_batch(x, y_true)
        return loss

    @staticmethod
    def get_path_model(path, name=None):
        """
        Get the location at which the neural networks will be saved.

        Returns
        -------
        path_model: ``str``
            The path at which the model will be saved (path include both path and name, it is the full path at which
            the neural networks are saved)

        path_target_model: ``str``
            The path at which the target model will be saved
        """
#         print('<<<<<<<<< Get Path Model is getting called >>>>>>>')
        if name is None:
            path_model = path
        else:
            path_model = os.path.join(path, name)
        path_target_model = "{}_target".format(path_model)
        return path_model, path_target_model

    def save_network(self, path, name=None, ext="h5"):
        """
        save the neural networks.

        Parameters
        ----------
        path: ``str``
            The path at which the models need to be saved
        name: ``str``
            The name given to this model

        ext: ``str``
            The file extension (by default h5)
        """
        # Saves model at specified path as h5 file
        # nothing has changed
#         print('<<<<<<<<< Save Network is getting called >>>>>>>')
        path_model, path_target_model = self.get_path_model(path, name)
        self._model.save('{}.{}'.format(path_model, ext))
        self._target_model.save('{}.{}'.format(path_target_model, ext))

    def load_network(self, path, name=None, ext="h5"):
        """
        Load the neural networks.

        Parameters
        ----------
        path: ``str``
            The path from which the models are loaded
        name: ``str``
            The name given to this model

        ext: ``str``
            The file extension (by default h5)
        """
#         print('<<<<<<<<< Load Network is getting called >>>>>>>')
        path_model, path_target_model = self.get_path_model(path, name)
        # fix for issue https://github.com/keras-team/keras/issues/7440
        self.construct_q_network()

        self._model.load_weights('{}.{}'.format(path_model, ext))

        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            self._target_model.load_weights('{}.{}'.format(path_target_model, ext))
        if self.verbose:
            print("Successfully loaded network.")

    def target_train(self, tau=None):
        """
        update the target model with the parameters given in the :attr:`BaseDeepQ._training_param`.
        """
        print('<<<<<<<<< Target Train is getting called >>>>>>>')
        if tau is None:
            tau = self._training_param.tau
        tau_inv = 1.0 - tau

        target_params = self._target_model.trainable_variables
        source_params = self._model.trainable_variables
        for src, dest in zip(source_params, target_params):
            # Polyak averaging
            var_update = src.value() * tau
            var_persist = dest.value() * tau_inv
            dest.assign(var_update + var_persist)

    def save_tensorboard(self, current_step):
        """function used to save other information to tensorboard"""
        pass
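
The soft update performed by `target_train` can be illustrated without TensorFlow. The sketch below is not part of the baseline: `polyak_update` is a hypothetical helper working on plain Python lists, and the weight values are made up. It only reproduces the rule `dest = tau * src + (1 - tau) * dest` used above.

```python
def polyak_update(source, target, tau):
    """Return the soft-updated target weights: tau * source + (1 - tau) * target."""
    tau_inv = 1.0 - tau
    return [tau * src + tau_inv * dst for src, dst in zip(source, target)]


model_weights = [1.0, 2.0, 3.0]    # hypothetical weights of the trained network
target_weights = [0.0, 0.0, 0.0]   # hypothetical weights of the target network

# with tau = 0.01 the target network moves 1% of the way towards the model at each call
target_weights = polyak_update(model_weights, target_weights, tau=0.01)
print(target_weights)
```

With `tau=1.0` the target becomes an exact copy of the model, and with `tau=0.0` it never changes. Small values in between make the target follow the model slowly.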

In [7]:
class NNParam(object):
    """
    This class provides an easy way to save and restore, as json, the shape of your neural networks
    (number of layers, non linearities, size of each layers etc.)

    It is recommended to overload this class for each specific model.

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and at a possible
        (non-optimized, slow, etc.) implementation.

        For a much better implementation, you can reuse the code of the "PPO_RLLIB"
        or "PPO_SB3" baselines.

        Prefer to use the :class:`GymAgent` and :class:`GymEnvWithHeuristics`
        classes to train agents that interact with grid2op and are fully compatible
        with the gym framework.
        
    Attributes
    ----------

    nn_class: :class:`l2rpn_baselines.BaseDeepQ`
        The neural network class that will be created with each call of :func:`l2rpn_baselines.make_nn`

    observation_size: ``int``
        The size of the observation space.

    action_size: ``int``
        The size of the action space.

    sizes: ``list``
        A list of integers, each one giving the number of units of a hidden layer. The number of hidden layers
        is given by the length of this list.

    activs: ``list``
        List of activation functions (given as strings). It should have the same length as :attr:`NNParam.sizes`.
        Each entry should be the name of a keras activation function.

    list_attr_obs: ``list``
        List of the attributes that will be used from the observation and concatenated to be fed to the neural network.

    """

    _int_attr = ["action_size", "observation_size"]
    _float_attr = []
    _str_attr = []
    _list_float = []
    _list_str = ["activs", "list_attr_obs"]
    _list_int = ["sizes"]
    nn_class = BaseDeepQ

    def __init__(self,
                 action_size,
                 observation_size,
                 sizes,
                 activs,
                 list_attr_obs,
                 ):
        self.observation_size = observation_size
        self.action_size = action_size
        self.sizes = [int(el) for el in sizes]
        self.activs = [str(el) for el in activs]
        if len(self.sizes) != len(self.activs):
            raise RuntimeError("\"sizes\" and \"activs\" lists do not have the same size. It's not clear how many "
                               "layers you want your neural network to have.")
        self.list_attr_obs = [str(el) for el in list_attr_obs]

    @classmethod
    def get_path_model(cls, path, name=None):
        """get the path at which the model will be saved"""
        return cls.nn_class.get_path_model(path, name=name)

    def make_nn(self, training_param):
        """build the appropriate BaseDeepQ"""
        res = self.nn_class(self, training_param)
        return res

    @staticmethod
    def get_obs_size(env, list_attr_name):
        """get the size of the flattened observation"""
        res = 0
        for obs_attr_name in list_attr_name:
            beg_, end_, dtype_ = env.observation_space.get_indx_extract(obs_attr_name)
            res += end_ - beg_  # no "+1" needed because "end_" is excluded, following the python convention
        return res

    def get_obs_attr(self):
        """get the names of the observation attributes that will be extracted """
        return self.list_attr_obs

    # utilities, do not change
    def to_dict(self):
        """convert this instance to a dictionary"""
        # TODO copy and paste from TrainingParam
        res = {}
        for attr_nm in self._int_attr:
            tmp = getattr(self, attr_nm)
            if tmp is not None:
                res[attr_nm] = int(tmp)
            else:
                res[attr_nm] = None
        for attr_nm in self._float_attr:
            tmp = getattr(self, attr_nm)
            if tmp is not None:
                res[attr_nm] = float(tmp)
            else:
                res[attr_nm] = None
        for attr_nm in self._str_attr:
            tmp = getattr(self, attr_nm)
            if tmp is not None:
                res[attr_nm] = str(tmp)
            else:
                res[attr_nm] = None

        for attr_nm in self._list_float:
            tmp = getattr(self, attr_nm)
            res[attr_nm] = self._convert_list_to_json(tmp, float)
        for attr_nm in self._list_int:
            tmp = getattr(self, attr_nm)
            res[attr_nm] = self._convert_list_to_json(tmp, int)
        for attr_nm in self._list_str:
            tmp = getattr(self, attr_nm)
            res[attr_nm] = self._convert_list_to_json(tmp, str)
        return res

    @classmethod
    def _convert_list_to_json(cls, obj, type_):
        if isinstance(obj, type_):
            res = obj
        elif isinstance(obj, np.ndarray):
            if len(obj.shape) == 1:
                res = [type_(el) for el in obj]
            else:
                res = [cls._convert_list_to_json(el, type_) for el in obj]
        elif isinstance(obj, Iterable):
            res = [cls._convert_list_to_json(el, type_) for el in obj]
        else:
            res = type_(obj)
        return res

    @classmethod
    def _attr_from_json(cls, json, type_):
        if isinstance(json, type_):
            res = json
        elif isinstance(json, list):
            res = [cls._convert_list_to_json(obj=el, type_=type_) for el in json]
        else:
            res = type_(json)
        return res

    @classmethod
    def from_dict(cls, tmp):
        """load from a dictionary"""
        # TODO copy and paste from TrainingParam (more or less)
        cls_as_dict = {}
        for attr_nm in cls._int_attr:
            if attr_nm in tmp:
                tmp_ = tmp[attr_nm]
                if tmp_ is not None:
                    cls_as_dict[attr_nm] = int(tmp_)
                else:
                    cls_as_dict[attr_nm] = None

        for attr_nm in cls._float_attr:
            if attr_nm in tmp:
                tmp_ = tmp[attr_nm]
                if tmp_ is not None:
                    cls_as_dict[attr_nm] = float(tmp_)
                else:
                    cls_as_dict[attr_nm] = None

        for attr_nm in cls._str_attr:
            if attr_nm in tmp:
                tmp_ = tmp[attr_nm]
                if tmp_ is not None:
                    cls_as_dict[attr_nm] = str(tmp_)
                else:
                    cls_as_dict[attr_nm] = None

        for attr_nm in cls._list_float:
            if attr_nm in tmp:
                cls_as_dict[attr_nm] = cls._attr_from_json(tmp[attr_nm], float)
        for attr_nm in cls._list_int:
            if attr_nm in tmp:
                cls_as_dict[attr_nm] = cls._attr_from_json(tmp[attr_nm], int)
        for attr_nm in cls._list_str:
            if attr_nm in tmp:
                cls_as_dict[attr_nm] = cls._attr_from_json(tmp[attr_nm], str)

        res = cls(**cls_as_dict)
        return res

    @classmethod
    def from_json(cls, json_path):
        """load from a json file"""
        # TODO copy and paste from TrainingParam
        if not os.path.exists(json_path):
            raise FileNotFoundError("No file is located at \"{}\"".format(json_path))
        with open(json_path, "r") as f:
            dict_ = json.load(f)
        return cls.from_dict(dict_)

    def save_as_json(self, path, name=None):
        """save as a json file"""
        # TODO copy and paste from TrainingParam
        res = self.to_dict()
        if name is None:
            name = "neural_net_parameters.json"
        if not os.path.exists(path):
            raise RuntimeError("Directory \"{}\" not found to save the NN parameters".format(path))
        if not os.path.isdir(path):
            raise NotADirectoryError("\"{}\" should be a directory".format(path))
        path_out = os.path.join(path, name)
        with open(path_out, "w", encoding="utf-8") as f:
            json.dump(res, fp=f, indent=4, sort_keys=True)

    def center_reduce(self, env):
        """currently not implemented for this class, "coming soon" as we might say"""
        # TODO see TestLeapNet for this feature
        self._center_reduce_vect(env.get_obs(), "x")

    def _get_adds_mults_from_name(self, obs, attr_nm):
        if attr_nm in ["prod_p"]:
            add_tmp = np.array([-0.5 * (pmax + pmin) for pmin, pmax in zip(obs.gen_pmin, obs.gen_pmax)])
            # NOTE: a floor of 0. does not protect against a division by zero when pmax == pmin
            mult_tmp = np.array([1. / max((pmax - pmin), 0.) for pmin, pmax in zip(obs.gen_pmin, obs.gen_pmax)])
        elif attr_nm in ["prod_q"]:
            add_tmp = 0.
            mult_tmp = np.array([1. / max(abs(val), 1.0) for val in obs.prod_q])
        elif attr_nm in ["load_p", "load_q"]:
            add_tmp = np.array([-val for val in getattr(obs, attr_nm)])
            mult_tmp = 0.5
        elif attr_nm in ["load_v", "prod_v", "v_or", "v_ex"]:
            add_tmp = 0.
            mult_tmp = np.array([1. / val for val in getattr(obs, attr_nm)])
        elif attr_nm == "hour_of_day":
            add_tmp = -12.
            mult_tmp = 1.0 / 12
        elif attr_nm == "minute_of_hour":
            add_tmp = -30.
            mult_tmp = 1.0 / 30
        elif attr_nm == "day_of_week":
            add_tmp = -4.
            mult_tmp = 1.0 / 4
        elif attr_nm == "day":
            add_tmp = -15.
            mult_tmp = 1.0 / 15.
        elif attr_nm in ["target_dispatch", "actual_dispatch"]:
            add_tmp = 0.
            mult_tmp = np.array([1. / (pmax - pmin) for pmin, pmax in zip(obs.gen_pmin, obs.gen_pmax)])
        elif attr_nm in ["a_or", "a_ex", "p_or", "p_ex", "q_or", "q_ex"]:
            add_tmp = 0.
            mult_tmp = np.array([1.0 / max(val, 1.0) for val in getattr(obs, attr_nm)])
        else:
            add_tmp = 0.
            mult_tmp = 1.0
        return add_tmp, mult_tmp

    def _center_reduce_vect(self, obs, nn_part):
        """
        compute the xxxx_adds and xxxx_mults for one part of the neural network called nn_part,
        depending on what attribute of the observation is extracted
        """
        if not isinstance(obs, grid2op.Observation.BaseObservation):
            # in multi processing i receive a set of observation there so i might need
            # to extract only the first one
            obs = obs[0]

        li_attr_obs = getattr(self, "list_attr_obs_{}".format(nn_part))
        adds = []
        mults = []
        for attr_nm in li_attr_obs:
            add_tmp, mult_tmp = self._get_adds_mults_from_name(obs, attr_nm)
            mults.append(mult_tmp)
            adds.append(add_tmp)
        setattr(self, "{}_adds".format(nn_part), adds)
        setattr(self, "{}_mults".format(nn_part), mults)
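
To make the constants returned by `_get_adds_mults_from_name` more concrete, here is a small sketch. It assumes that the offset is added first and the result is then multiplied by the scale, i.e. `(value + add) * mult`. The code that applies these vectors is not shown in this cell, so that order is an assumption, and `center_reduce_value` is a hypothetical helper.

```python
def center_reduce_value(value, add, mult):
    """Apply an additive offset, then a multiplicative scale (assumed order)."""
    return (value + add) * mult


# offsets and scales copied from _get_adds_mults_from_name for "hour_of_day"
HOUR_ADD, HOUR_MULT = -12.0, 1.0 / 12

midnight = center_reduce_value(0, HOUR_ADD, HOUR_MULT)    # lower end of the range
noon = center_reduce_value(12, HOUR_ADD, HOUR_MULT)       # centre of the range
last_hour = center_reduce_value(23, HOUR_ADD, HOUR_MULT)  # upper end of the range
print(midnight, noon, last_hour)
```

Under this assumption, the hour of the day is mapped to roughly the interval [-1, 1), which keeps the inputs of the neural network on a comparable scale.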

In [8]:
class TrainingParam(object):
    """
    A class to store the training parameters of the models. It was hard coded in the getting_started/notebook 3
    of grid2op and put in this repository instead.

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and at a possible
        (non-optimized, slow, etc.) implementation.

        For a much better implementation, you can reuse the code of the "PPO_RLLIB"
        or "PPO_SB3" baselines.

        Prefer to use the :class:`GymAgent` and :class:`GymEnvWithHeuristics`
        classes to train agents that interact with grid2op and are fully compatible
        with the gym framework.
        
    Attributes
    ----------
    buffer_size: ``int``
        Size of the replay buffer

    minibatch_size: ``int``
        Size of the training minibatch
    update_freq: ``int``
        Frequency at which the model is trained. The model is trained once every `update_freq` steps on a
        minibatch of `minibatch_size` transitions sampled from the experience replay buffer.

    final_epsilon: ``float``
        Value of the final epsilon (for the epsilon-greedy exploration)
    initial_epsilon: ``float``
        Value of the initial epsilon (for the epsilon-greedy exploration)
    step_for_final_epsilon: ``int``
        Number of steps after which the final epsilon (for the epsilon-greedy exploration) is reached

    min_observation: ``int``
        number of observations before starting to train the neural nets. Before this number of iterations, the agent
        will simply interact with the environment.

    lr: ``float``
        The initial learning rate

    lr_decay_steps: ``int``
        The learning rate decay step

    lr_decay_rate: ``float``
        The learning rate decay rate

    num_frames: ``int``
        Currently not used

    discount_factor: ``float``
        The discount factor (a high discount factor favors long-term rewards, hence longer episodes; a small one
        favors immediate rewards). It is often called "gamma" in RL papers: the agent tries to maximize the sum
        of the discounted rewards, sum_{t >= t_0} gamma^{t - t_0} r_t

    tau: ``float``
        Coefficient of the soft update of the target model. The target model is updated according to
        `target_model_weights[i] = tau * model_weights[i] + (1 - tau) * target_model_weights[i]`

    min_iter: ``int``
        It is possible in the training schedule to limit the number of time steps an episode can last. This is
        mainly useful at the beginning of training, to avoid reaching a state where the grid has been modified so
        much that the agent will never encounter a similar state again. Stopping the episode before this happens
        can help the learning.

    max_iter: ``int``
        Just like "min_iter", but instead of being the minimum number of iterations, it is the maximum.

    update_nb_iter: ``int``
        If max_iter_fun is the default one, this number gives the number of times a scenario needs to be
        completed successfully before the maximum number of time steps allowed is increased

    step_increase_nb_iter: ``int`` or  ``None``
        By how many time steps the maximum number of time steps allowed per episode is increased. Set it to 0
        (or ``None``) to deactivate this.

    max_iter_fun: ``function``
        A function that returns the maximum number of steps an episode can last for the current epoch. For example
        it can be `max_iter_fun = lambda epoch_num : np.sqrt(50 * epoch_num)`
        [default: `step_increase_nb_iter * int(nb_success / update_nb_iter)`, see `default_max_iter_fun`]

    oversampling_rate: ``float`` or ``None``
        Set it to None to deactivate the oversampling of hard scenarios. Otherwise, this oversampling is done
        with something like `proba = 1. / (time_step_lived**oversampling_rate + 1)` where `proba` is the probability
        to be selected at the next call to "reset" and `time_step_lived` is the number of time steps the agent
        survived on this scenario

    random_sample_datetime_start: ``int`` or ``None``
        If ``None`` during training the chronics will always start at the datetime the chronics start.
        Otherwise, the training scheme will skip a number of time steps between 0 and  `random_sample_datetime_start`
        when loading the next chronics. This is particularly useful when you want your agent to learn to operate
        the grid regardless of the hour of day or day of the week.

    update_tensorboard_freq: ``int``
        Frequency at which tensorboard is refreshed (tensorboard summaries are saved every
        `update_tensorboard_freq` steps)

    save_model_each: ``int``
        Frequency at which the model is saved (it is saved every "save_model_each" steps)

    max_global_norm_grad: ``float``
        Maximum gradient norm allowed (can make the training more stable) default to None if deactivated.
        Not all baselines are compatible.

    max_value_grad: ``float``
        Maximum value the gradient can take. Assign it to ``None`` to deactivate it. This can make the training
        more stable in some cases, but can slow down the training process too. Not all baselines are compatible.

    max_loss: ``float``
        Clip the value of the loss function. Set it to ``None`` to deactivate it. Again, this can make the training
        more stable but possibly slower. Not all baselines are compatible.
    """
    _tol_float_equal = float(1e-8)

    _int_attr = ["buffer_size", "minibatch_size", "step_for_final_epsilon",
                 "min_observation", "last_step", "num_frames", "update_freq",
                 "min_iter", "max_iter", "update_tensorboard_freq", "save_model_each", "_update_nb_iter",
                 "step_increase_nb_iter", "min_observe", "sample_one_random_action_begin"]
    _float_attr = ["_final_epsilon", "_initial_epsilon", "lr", "lr_decay_steps", "lr_decay_rate",
                   "discount_factor", "tau", "oversampling_rate",
                   "max_global_norm_grad", "max_value_grad", "max_loss"]

    def __init__(self,
                 buffer_size=40000,
                 minibatch_size=64,
                 step_for_final_epsilon=100000,  # step at which final_epsilon is reached
                 min_observation=5000,  # 5000
                 final_epsilon=1./(7*288.),  # have on average 1 random action per week of approx 7*288 time steps
                 initial_epsilon=0.4,
                 lr=1e-4,
                 lr_decay_steps=10000,
                 lr_decay_rate=0.999,
                 num_frames=1,
                 discount_factor=0.99,
                 tau=0.01,
                 update_freq=256,
                 min_iter=50,
                 max_iter=8064,  # 1 month
                 update_nb_iter=10,
                 step_increase_nb_iter=0,  # by default the maximum episode length is not increased during training
                 update_tensorboard_freq=1000,  # update tensorboard every "update_tensorboard_freq" steps
                 save_model_each=10000,  # save the model every "save_model_each" steps
                 random_sample_datetime_start=None,
                 oversampling_rate=None,
                 max_global_norm_grad=None,
                 max_value_grad=None,
                 max_loss=None,

                 # observer: let the neural network "observe" for a given amount of time
                 # all actions are replaced by a do nothing
                 min_observe=None,

                 # i do a random action at the beginning of an episode until a certain number of step
                 # is made
                 # it's recommended to have "min_observe" to be larger that this (this is an int)
                 sample_one_random_action_begin=None,
                 ):

        self.random_sample_datetime_start = random_sample_datetime_start

        self.buffer_size = int(buffer_size)
        self.minibatch_size = int(minibatch_size)
        self.min_observation = int(min_observation)
        self._final_epsilon = float(final_epsilon)  # kept strictly positive by default so that exploration is never completely stopped
        self._initial_epsilon = float(initial_epsilon)
        self.step_for_final_epsilon = float(step_for_final_epsilon)
        self.lr = float(lr)
        self.lr_decay_steps = float(lr_decay_steps)
        self.lr_decay_rate = float(lr_decay_rate)

        # gradient clipping (if supported)
        self.max_global_norm_grad = max_global_norm_grad
        self.max_value_grad = max_value_grad
        self.max_loss = max_loss

        # observer
        self.min_observe = min_observe
        self.sample_one_random_action_begin = sample_one_random_action_begin

        self.last_step = int(0)
        self.num_frames = int(num_frames)
        self.discount_factor = float(discount_factor)
        self.tau = float(tau)
        self.update_freq = int(update_freq)
        self.min_iter = int(min_iter)
        self.max_iter = int(max_iter)
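        self._1_update_nb_iter = None
        self._update_nb_iter = None
        # go through the property setter so that `_1_update_nb_iter` is computed too
        # (otherwise it stays None and `default_max_iter_fun` fails)
        self.update_nb_iter = int(update_nb_iter)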
        if step_increase_nb_iter is None:
            # 0 and None have the same effect: it disable the feature
            step_increase_nb_iter = 0
        self.step_increase_nb_iter = step_increase_nb_iter

        if oversampling_rate is not None:
            self.oversampling_rate = float(oversampling_rate)
        else:
            self.oversampling_rate = None

        self.update_tensorboard_freq = update_tensorboard_freq
        self.save_model_each = save_model_each
        self.max_iter_fun = self.default_max_iter_fun
        self._compute_exp_facto()

    @property
    def final_epsilon(self):
        return self._final_epsilon

    @final_epsilon.setter
    def final_epsilon(self, final_epsilon):
        self._final_epsilon = final_epsilon
        self._compute_exp_facto()

    @property
    def initial_epsilon(self):
        return self._initial_epsilon

    @initial_epsilon.setter
    def initial_epsilon(self, initial_epsilon):
        self._initial_epsilon = initial_epsilon
        self._compute_exp_facto()

    @property
    def update_nb_iter(self):
        return self._update_nb_iter

    @update_nb_iter.setter
    def update_nb_iter(self, update_nb_iter):
        self._update_nb_iter = update_nb_iter
        if self._update_nb_iter is not None and self._update_nb_iter > 0:
            self._1_update_nb_iter = 1.0 / self._update_nb_iter
        else:
            self._1_update_nb_iter = 1.0

    def _compute_exp_facto(self):
        if self.final_epsilon is not None and self.initial_epsilon is not None and self.final_epsilon > 0:
            self._exp_facto = np.log(self.initial_epsilon/self.final_epsilon)
        else:
            # TODO
            self._exp_facto = 1

    def default_max_iter_fun(self, nb_success):
        """the default max iteration function used"""
        return self.step_increase_nb_iter * int(nb_success * self._1_update_nb_iter)

    def tell_step(self, current_step):
        """tell this instance the number of training steps that have been made"""
        self.last_step = current_step

    def get_next_epsilon(self, current_step):
        """get the next epsilon for the e greedy exploration"""
        self.tell_step(current_step)
        if self.step_for_final_epsilon is None or self.initial_epsilon is None \
                or self._exp_facto is None or self.final_epsilon is None:
            res = 0.
        else:
            if current_step > self.step_for_final_epsilon:
                res = self.final_epsilon
            else:
                # exponential decrease
                res = self.initial_epsilon * np.exp(- (current_step / self.step_for_final_epsilon) * self._exp_facto )
        return res

    def to_dict(self):
        """serialize this instance to a dictionary."""
        res = {}
        for attr_nm in self._int_attr:
            tmp = getattr(self, attr_nm)
            if tmp is not None:
                res[attr_nm] = int(tmp)
            else:
                res[attr_nm] = None
        for attr_nm in self._float_attr:
            tmp = getattr(self, attr_nm)
            if tmp is not None:
                res[attr_nm] = float(tmp)
            else:
                res[attr_nm] = None
        return res

    @staticmethod
    def from_dict(tmp):
        """initialize this instance from a dictionary"""
        if not isinstance(tmp, dict):
            raise RuntimeError("TrainingParam from dict must be called with a dictionary, and not {}".format(tmp))
        res = TrainingParam()
        for attr_nm in TrainingParam._int_attr:
            if attr_nm in tmp:
                tmp_ = tmp[attr_nm]
                if tmp_ is not None:
                    setattr(res, attr_nm, int(tmp_))
                else:
                    setattr(res, attr_nm, None)

        for attr_nm in TrainingParam._float_attr:
            if attr_nm in tmp:
                tmp_ = tmp[attr_nm]
                if tmp_ is not None:
                    setattr(res, attr_nm, float(tmp_))
                else:
                    setattr(res, attr_nm, None)
        res.update_nb_iter = res._update_nb_iter
        res.initial_epsilon = res._initial_epsilon
        res._compute_exp_facto()
        return res

    @staticmethod
    def from_json(json_path):
        """initialize this instance from a json"""
        if not os.path.exists(json_path):
            raise FileNotFoundError("No file is located at \"{}\"".format(json_path))
        with open(json_path, "r") as f:
            dict_ = json.load(f)
        return TrainingParam.from_dict(dict_)

    def save_as_json(self, path, name=None):
        """save this instance as a json"""
        res = self.to_dict()
        if name is None:
            name = "training_parameters.json"
        if not os.path.exists(path):
            raise RuntimeError("Directory \"{}\" not found to save the training parameters".format(path))
        if not os.path.isdir(path):
            raise NotADirectoryError("\"{}\" should be a directory".format(path))
        path_out = os.path.join(path, name)
        with open(path_out, "w", encoding="utf-8") as f:
            json.dump(res, fp=f, indent=4, sort_keys=True)

    def do_train(self):
        """return whether or not i should train the model at this time step"""
        return self.last_step % self.update_freq == 0

    def __eq__(self, other):
        res = True
        for el in self._int_attr:
            me_ = getattr(self, el)
            oth_ = getattr(other, el)
            if me_ is None and oth_ is not None:
                res = False
                break
            if oth_ is None and me_ is not None:
                res = False
                break
            if me_ is None and oth_ is None:
                continue
            if int(me_) != int(oth_):
                res = False
                break
        if res:
            for el in self._float_attr:
                me_ = getattr(self, el)
                oth_ = getattr(other, el)
                if me_ is None and oth_ is not None:
                    res = False
                    break
                if oth_ is None and me_ is not None:
                    res = False
                    break
                if me_ is None and oth_ is None:
                    continue
                if abs(float(me_) - float(oth_)) > self._tol_float_equal:
                    res = False
                    break
        return res
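
The exploration schedule implemented by `get_next_epsilon` can be checked in isolation. The sketch below re-implements the same exponential decay as a standalone function. `next_epsilon` is a hypothetical name, and the default values are the ones of `TrainingParam` above.

```python
import math


def next_epsilon(step, initial_epsilon=0.4, final_epsilon=1.0 / (7 * 288.0),
                 step_for_final_epsilon=100000):
    """Exponential decay from initial_epsilon to final_epsilon, then constant."""
    if step > step_for_final_epsilon:
        return final_epsilon
    # same quantity as `_exp_facto` in TrainingParam
    exp_facto = math.log(initial_epsilon / final_epsilon)
    return initial_epsilon * math.exp(-(step / step_for_final_epsilon) * exp_facto)


schedule = {step: next_epsilon(step) for step in (0, 50000, 100000, 200000)}
for step, eps in schedule.items():
    print(step, eps)
```

Epsilon starts at `initial_epsilon`, reaches `final_epsilon` at `step_for_final_epsilon` and stays there afterwards. Halfway through, it equals the geometric mean of the two values.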

In [9]:
class DeepQ_NN(BaseDeepQ):
    """
    Constructs the desired deep q learning network
    
    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and at a possible
        (non-optimized, slow, etc.) implementation.

        For a much better implementation, you can reuse the code of the "PPO_RLLIB"
        or "PPO_SB3" baselines.
        
    Attributes
    ----------
    schedule_lr_model:
        The schedule for the learning rate.
    """

    def __init__(self,
                 nn_params,
                 training_param=None):
        if not _CAN_USE_TENSORFLOW:
            raise RuntimeError("Cannot import tensorflow, this function cannot be used.")
        
        if training_param is None:
            training_param = TrainingParam()
        BaseDeepQ.__init__(self,
                           nn_params,
                           training_param)
        self.schedule_lr_model = None
        self.construct_q_network()

    def construct_q_network(self):
        """
        This function will make 2 identical models: one will serve as the target model, the other one will be
        trained regularly.
        """
        self._model = Sequential()
        input_layer = Input(shape=(self._nn_archi.observation_size,),
                            name="state")
        lay = input_layer
        for lay_num, (size, act) in enumerate(zip(self._nn_archi.sizes, self._nn_archi.activs)):
            lay = Dense(size, name="layer_{}".format(lay_num))(lay)  # put at self.action_size
            lay = Activation(act)(lay)

        output = Dense(self._action_size, name="output")(lay)

        self._model = Model(inputs=[input_layer], outputs=[output])
        self._schedule_lr_model, self._optimizer_model = self.make_optimiser()
        self._model.compile(loss='mse', optimizer=self._optimizer_model)

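        # NOTE: this second Model is built from the same layers as `self._model`, so both share their weights
        self._target_model = Model(inputs=[input_layer], outputs=[output])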
        self._target_model.set_weights(self._model.get_weights())

In [10]:
class DeepQ_NNParam(NNParam):
    """
    This defines the specific parameters for the DeepQ network.

    Nothing really different compared to the base class
    except that :attr:`l2rpn_baselines.utils.NNParam.nn_class` (nn_class) is :class:`deepQ_NN.DeepQ_NN`

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and at a possible
        (non-optimized, slow, etc.) implementation.

        For a much better implementation, you can reuse the code of the "PPO_RLLIB"
        or "PPO_SB3" baselines.
    
    """
    _int_attr = copy.deepcopy(NNParam._int_attr)
    _float_attr = copy.deepcopy(NNParam._float_attr)
    _str_attr = copy.deepcopy(NNParam._str_attr)
    _list_float = copy.deepcopy(NNParam._list_float)
    _list_str = copy.deepcopy(NNParam._list_str)
    _list_int = copy.deepcopy(NNParam._list_int)

    nn_class = DeepQ_NN

    def __init__(self,
                 action_size,
                 observation_size,  # TODO this might not be useful
                 sizes,
                 activs,
                 list_attr_obs
                 ):
        NNParam.__init__(self,
                         action_size,
                         observation_size,  # TODO this might not be useful
                         sizes,
                         activs,
                         list_attr_obs
                         )
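
The `copy.deepcopy` calls on the class-level lists matter as soon as a subclass wants to register extra attributes. The sketch below uses hypothetical classes (`BaseParam`, `SharedListParam`, `CopiedListParam`) and a made-up attribute name (`n_heads`) to show that, without the copy, appending to the list of a subclass also modifies the list of the parent class.

```python
import copy


class BaseParam:
    _int_attr = ["action_size", "observation_size"]


class SharedListParam(BaseParam):
    # no copy: this name refers to the very same list object as the parent
    _int_attr = BaseParam._int_attr


class CopiedListParam(BaseParam):
    # independent copy, as done in DeepQ_NNParam
    _int_attr = copy.deepcopy(BaseParam._int_attr)


CopiedListParam._int_attr.append("n_heads")   # hypothetical extra attribute
parent_after_copy = list(BaseParam._int_attr)

SharedListParam._int_attr.append("n_heads")
parent_after_shared = list(BaseParam._int_attr)

print(parent_after_copy)    # the parent list is unchanged
print(parent_after_shared)  # the parent list now contains "n_heads"
```

Copying the lists keeps `to_dict` and `from_dict` of the parent class unaffected by whatever a subclass adds.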

In [11]:
class DeepQSimple(DeepQAgent):
    """
    A simple deep q learning algorithm. It does nothing different than its base class.

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and at a possible
        (non-optimized, slow, etc.) implementation.

        For a much better implementation, you can reuse the code of the "PPO_RLLIB"
        or "PPO_SB3" baselines.
        
    """
    pass


In [12]:
def train(env,
          name=DEFAULT_NAME,
          iterations=1,
          save_path=None,
          load_path=None,
          logs_dir=None,
          training_param=None,
          filter_action_fun=None,
          kwargs_converters={},
          kwargs_archi={},
          verbose=True):
    """
    This function implements the "training" part of the baseline "DeepQSimple".

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and a possible
        (non-optimized, slow, etc.) implementation.
        
        For a much better implementation, you can reuse the code of "PPO_RLLIB" 
        or the "PPO_SB3" baseline.
        
    Parameters
    ----------
    env: :class:`grid2op.Environment`
        The environment on which you need to train your agent.

    name: ``str``
        The name of your agent.

    iterations: ``int``
        For how many iterations (steps) do you want to train your agent. **NB** these are steps, not episodes.

    save_path: ``str``
        Where do you want to save your baseline.

    load_path: ``str``
        If you want to reload your baseline, specify the path where it is located. **NB** if a baseline is reloaded,
        some of the arguments provided to this function will not be used.

    logs_dir: ``str``
        Where to store the tensorboard generated logs during the training. ``None`` if you don't want to log them.

    training_param: :class:`l2rpn_baselines.utils.TrainingParam`
        The parameters describing the way you will train your model.

    filter_action_fun: ``function``
        A function to filter the action space. See
        `IdToAct.filter_action <https://grid2op.readthedocs.io/en/latest/converter.html#grid2op.Converter.IdToAct.filter_action>`_
        documentation.

    verbose: ``bool``
        If you want something to be printed on the terminal (a better logging strategy will be put in place at some point)

    kwargs_converters: ``dict``
        A dictionary containing the keyword arguments passed at the initialization of the
        :class:`grid2op.Converter.IdToAct` that serves as "Base" for the Agent.

    kwargs_archi: ``dict``
        Keyword arguments used for making the :class:`DeepQ_NNParam` object that will be used to build the baseline.

    Returns
    -------

    baseline: :class:`DeepQSimple`
        The trained baseline.


    .. _Example-deepqsimple:

    Examples
    --------

    Here is an example of how to train a DeepQSimple baseline.

    First define a Python script, for example:

    .. code-block:: python

        import grid2op
        from grid2op.Reward import L2RPNReward
        from l2rpn_baselines.utils import TrainingParam, NNParam
        from l2rpn_baselines.DeepQSimple import train

        # define the environment
        env = grid2op.make("l2rpn_case14_sandbox",
                           reward_class=L2RPNReward)

        # use the default training parameters
        tp = TrainingParam()

        # this will be the list of what part of the observation I want to keep
        # more information on https://grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
        li_attr_obs_X = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                         "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                         "time_before_cooldown_sub", "rho", "timestep_overflow", "line_status"]

        # neural network architecture
        observation_size = NNParam.get_obs_size(env, li_attr_obs_X)
        sizes = [800, 800, 800, 494, 494, 494]  # size of each hidden layer
        kwargs_archi = {'observation_size': observation_size,
                        'sizes': sizes,
                        'activs': ["relu" for _ in sizes],  # relu activation for every layer
                        "list_attr_obs": li_attr_obs_X}

        # select some part of the action
        # more information at https://grid2op.readthedocs.io/en/latest/converter.html#grid2op.Converter.IdToAct.init_converter
        kwargs_converters = {"all_actions": None,
                             "set_line_status": False,
                             "change_bus_vect": True,
                             "set_topo_vect": False
                             }
        # define the name of the model
        nm_ = "AnneOnymous"
        try:
            train(env,
                  name=nm_,
                  iterations=10000,
                  save_path="/WHERE/I/SAVED/THE/MODEL",
                  load_path=None,
                  logs_dir="/WHERE/I/SAVED/THE/LOGS",
                  training_param=tp,
                  kwargs_converters=kwargs_converters,
                  kwargs_archi=kwargs_archi)
        finally:
            env.close()

    """
    import tensorflow as tf  # lazy import to save import time
    # Limit gpu usage
    try:
        physical_devices = tf.config.list_physical_devices('GPU')
        if len(physical_devices) > 0:
            tf.config.experimental.set_memory_growth(physical_devices[0], True)
    except AttributeError:
        # issue of https://stackoverflow.com/questions/59266150/attributeerror-module-tensorflow-core-api-v2-config-has-no-attribute-list-p
        try:
            physical_devices = tf.config.experimental.list_physical_devices('GPU')
            if len(physical_devices) > 0:
                tf.config.experimental.set_memory_growth(physical_devices[0], True)
        except Exception:
            warnings.warn(_WARN_GPU_MEMORY)
    except Exception:
        warnings.warn(_WARN_GPU_MEMORY)

    if training_param is None:
        training_param = TrainingParam()

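    # compute the proper size for the converter
    # (work on a copy so that neither the caller's dict nor the shared default ``{}`` is modified)
    kwargs_archi = dict(kwargs_archi)
    kwargs_archi["action_size"] = DeepQSimple.get_action_size(env.action_space, filter_action_fun, kwargs_converters)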

    if load_path is not None:
        path_model, path_target_model = DeepQ_NN.get_path_model(load_path, name)
        if verbose:
            print("INFO: Reloading a model, the architecture parameters will be ignored")
        nn_archi = DeepQ_NNParam.from_json(os.path.join(path_model, "nn_architecture.json"))
    else:
        nn_archi = DeepQ_NNParam(**kwargs_archi)

    baseline = DeepQSimple(action_space=env.action_space,
                           nn_archi=nn_archi,
                           name=name,
                           istraining=True,
                           verbose=verbose,
                           filter_action_fun=filter_action_fun,
                           **kwargs_converters
                           )

    if load_path is not None:
        if verbose:
            print("INFO: Reloading a model, training parameters will be ignored")
        baseline.load(load_path)
        training_param = baseline._training_param

    baseline.train(env,
                   iterations,
                   save_path=save_path,
                   logdir=logs_dir,
                   training_param=training_param)
    # as in our example (and in our explanation) we recommend saving the model regularly in the "train" function,
    # so it is not necessary to save it again here. But if you choose not to follow this advice, it is more than
    # recommended to save the "baseline" at the end of this function with:
    # baseline.save(save_path)
    return baseline
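
A function such as `train`, which accepts dictionaries with `{}` as default value and then stores a key like `action_size` in one of them, needs some care: Python creates a default dictionary only once and shares it between calls. The sketch below shows the pitfall and a copy-first pattern that avoids it. The names `build_config_shared` and `build_config_copied` are hypothetical and only serve this illustration.

```python
def build_config_shared(overrides={}):
    # pitfall: the default dict is created once, so this write persists across calls
    overrides["action_size"] = overrides.get("action_size", 0) + 1
    return overrides


def build_config_copied(overrides=None):
    # copy first: never write into the caller's dict or into a shared default
    config = dict(overrides) if overrides is not None else {}
    config["action_size"] = config.get("action_size", 0) + 1
    return config


# the shared default remembers the previous call
first_shared = build_config_shared()["action_size"]
second_shared = build_config_shared()["action_size"]

# the copy-first version starts from a clean state every time
first_copied = build_config_copied()["action_size"]
second_copied = build_config_copied()["action_size"]

# the caller's dictionary is left untouched
user_config = {"sizes": [800, 800]}
built = build_config_copied(user_config)
```

With the copy-first version, calling the function twice in the same notebook session cannot carry state over from the first call.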

In [13]:
def evaluate(env,
             name=DEFAULT_NAME,
             load_path=None,
             logs_path=DEFAULT_LOGS_DIR,
             nb_episode=DEFAULT_NB_EPISODE,
             nb_process=DEFAULT_NB_PROCESS,
             max_steps=DEFAULT_MAX_STEPS,
             verbose=False,
             save_gif=False,
             filter_action_fun=None):
    """
    How to evaluate the performances of the trained :class:`DeepQSimple` agent.

    .. warning::
        This baseline recodes the entire RL training procedure. You can use it if you
        want to have a deeper look at the Deep Q Learning algorithm and a possible
        (non-optimized, slow, etc.) implementation.
        
        For a much better implementation, you can reuse the code of "PPO_RLLIB" 
        or the "PPO_SB3" baseline.

    Parameters
    ----------
    env: :class:`grid2op.Environment`
        The environment on which you evaluate your agent.

    name: ``str``
        The name of the trained baseline

    load_path: ``str``
        Path where the agent has been stored

    logs_path: ``str``
        Where to write the results of the assessment

    nb_episode: ``int``
        How many episodes to run during the assessment of the performances

    nb_process: ``int``
        On how many processes the assessment will be made. (setting this > 1 can lead to some speed ups but can be
        unstable on some platforms)

    max_steps: ``int``
        How many steps at maximum your agent will be assessed

    verbose: ``bool``
        Whether to print the model summary, a progress bar and the evaluation summary

    save_gif: ``bool``
        Whether or not you want to save, as a gif, the performance of your agent. It might cause memory issues (might
        take a lot of ram) and drastically increase computation time.

    Returns
    -------
    agent: :class:`l2rpn_baselines.utils.DeepQAgent`
        The loaded agent that has been evaluated thanks to the runner.

    res: ``list``
        The results of the Runner on which the agent was tested.

    Examples
    --------
    You can evaluate a DeepQSimple this way:

    .. code-block:: python

        import grid2op
        from grid2op.Reward import L2RPNSandBoxScore, L2RPNReward
        from l2rpn_baselines.DeepQSimple import evaluate

        # Create dataset env
        env = grid2op.make("l2rpn_case14_sandbox",
                           reward_class=L2RPNSandBoxScore,
                           other_rewards={
                               "reward": L2RPNReward
                           })

        # Call evaluation interface
        evaluate(env,
                 name="MyAwesomeAgent",
                 load_path="/WHERE/I/SAVED/THE/MODEL",
                 logs_path=None,
                 nb_episode=10,
                 nb_process=1,
                 max_steps=-1,
                 verbose=False,
                 save_gif=False)


    """

    import tensorflow as tf  # lazy import to save import time
    # Limit gpu usage
    physical_devices = tf.config.list_physical_devices('GPU')
    if len(physical_devices):
        tf.config.experimental.set_memory_growth(physical_devices[0], True)

    runner_params = env.get_params_for_runner()
    runner_params["verbose"] = verbose

    if load_path is None:
        raise RuntimeError("Cannot evaluate a model if there is nothing to be loaded.")
    path_model, path_target_model = DeepQ_NN.get_path_model(load_path, name)
    nn_archi = DeepQ_NNParam.from_json(os.path.join(path_model, "nn_architecture.json"))

    # Create agent
    agent = DeepQSimple(action_space=env.action_space,
                        name=name,
                        store_action=nb_process == 1,
                        nn_archi=nn_archi,
                        observation_space=env.observation_space,
                        filter_action_fun=filter_action_fun)

    # Load weights from file
    agent.load(load_path)

    # Build runner
    runner = Runner(**runner_params,
                    agentClass=None,
                    agentInstance=agent)

    # Print model summary
    stringlist = []
    agent.deep_q._model.summary(print_fn=lambda x: stringlist.append(x))
    short_model_summary = "\n".join(stringlist)
    if verbose:
        print(short_model_summary)

    # Run
    if logs_path is not None:
        # logs_path=None means the runner does not save anything on disk
        os.makedirs(logs_path, exist_ok=True)
    res = runner.run(path_save=logs_path,
                     nb_episode=nb_episode,
                     nb_process=nb_process,
                     max_iter=max_steps,
                     pbar=verbose)

    # Print summary
    if verbose:
        print("Evaluation summary:")
        for _, chron_name, cum_reward, nb_time_step, max_ts in res:
            msg_tmp = "chronics at: {}".format(chron_name)
            msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
            msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
            print(msg_tmp)

        if len(agent.dict_action):
            # I output some of the actions played
            print("The agent played {} different actions".format(len(agent.dict_action)))
            for id_, (nb, act, types) in agent.dict_action.items():
                print("Action with ID {} was played {} times".format(id_, nb))
                print("{}".format(act))
                print("-----------")

    if save_gif:
        if verbose:
            print("Saving the gif of the episodes")
        save_log_gif(logs_path, res)

    return agent, res
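
The evaluation summary printed by `evaluate` reads each entry of `res` as a tuple `(_, chron_name, cum_reward, nb_time_step, max_ts)`. As a small sketch under that assumption, the same tuples can be turned into text lines plus an average survival ratio. The helper `summarize_results` and the values in `example_res` are made up for this illustration and are not results of this notebook.

```python
def summarize_results(res):
    """Format runner results and compute the mean fraction of time steps survived."""
    lines = []
    ratios = []
    for _, chron_name, cum_reward, nb_time_step, max_ts in res:
        msg = "chronics at: {}".format(chron_name)
        msg += "\ttotal score: {:.6f}".format(cum_reward)
        msg += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
        lines.append(msg)
        ratios.append(nb_time_step / max_ts)
    # an empty result list gives a survival ratio of 0.0
    mean_survival = sum(ratios) / len(ratios) if ratios else 0.0
    return lines, mean_survival


# illustrative tuples only, in the layout unpacked by the loop in evaluate
example_res = [
    ("/some/path", "chronics_000", 1234.5, 288, 288),
    ("/some/path", "chronics_001", 600.25, 144, 288),
]
summary_lines, mean_survival = summarize_results(example_res)
```

A survival ratio below 1 means that at least one episode ended before its last time step.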

In [16]:
warnings.filterwarnings('ignore')
env = grid2op.make("l2rpn_neurips_2020_track1_small", reward_class=L2RPNReward)
tp = TrainingParam()

li_attr_obs_X = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                 "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                 "time_before_cooldown_sub", "rho", "timestep_overflow", "line_status"]

observation_size = NNParam.get_obs_size(env, li_attr_obs_X)
sizes = [800, 800, 800, 494, 494, 494]  # size of each hidden layer
kwargs_archi = {'observation_size': observation_size,
                'sizes': sizes,
                'activs': ["relu" for _ in sizes],  # relu activation for every layer
                "list_attr_obs": li_attr_obs_X}

kwargs_converters = {"all_actions": None,
                     "set_line_status": False,
                     "change_bus_vect": True,
                     "set_topo_vect": False
                     }
# define the name of the model
nm_ = "DeepSarsa_Agent"
try:
    train(env,
          name=nm_,
          iterations=15000,
          save_path="./DSARSA_Agent/model",
          load_path=None,
          logs_dir="./DSARSA_Agent/logs",
          training_param=tp,
          kwargs_converters=kwargs_converters,
          kwargs_archi=kwargs_archi)
finally:
    env.close()

  0%|                                                                                        | 0/15000 [00:00<?, ?it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  0%|                                                                                | 4/15000 [00:00<36:05,  6.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  0%|                                                                               | 10/15000 [00:00<17:15, 14.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▏                                                                            | 232/15000 [00:10<10:51, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▏                                                                            | 238/15000 [00:11<10:57, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▎                                                                            | 241/15000 [00:11<15:47, 15.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▎                                                                            | 247/15000 [00:11<13:03, 18.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▎                                                                            | 253/15000 [00:12<11:43, 20.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▎                                                                            | 259/15000 [00:12<11:05, 22.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▎                                                                            | 262/15000 [00:12<11:04, 22.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▍                                                                            | 268/15000 [00:12<14:16, 17.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▍                                                                            | 271/15000 [00:13<13:11, 18.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▍                                                                            | 277/15000 [00:13<11:48, 20.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▍                                                                            | 283/15000 [00:13<11:06, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▌                                                                            | 289/15000 [00:13<10:43, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▌                                                                            | 292/15000 [00:13<10:37, 23.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▌                                                                            | 298/15000 [00:14<10:29, 23.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▌                                                                            | 304/15000 [00:14<10:26, 23.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▌                                                                            | 310/15000 [00:14<10:28, 23.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▋                                                                            | 316/15000 [00:14<10:37, 23.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▋                                                                            | 322/15000 [00:15<10:39, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▋                                                                            | 328/15000 [00:15<10:38, 22.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▋                                                                            | 334/15000 [00:15<10:46, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▊                                                                            | 337/15000 [00:15<10:41, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▊                                                                            | 343/15000 [00:16<10:52, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▊                                                                            | 346/15000 [00:16<11:09, 21.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  2%|█▊                                                                            | 352/15000 [00:16<11:16, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▊                                                                            | 358/15000 [00:16<10:51, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▉                                                                            | 361/15000 [00:17<11:32, 21.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▉                                                                            | 367/15000 [00:17<11:18, 21.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  2%|█▉                                                                            | 370/15000 [00:17<11:02, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|█▉                                                                            | 376/15000 [00:17<10:42, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|█▉                                                                            | 382/15000 [00:17<11:04, 22.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██                                                                            | 385/15000 [00:18<11:03, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██                                                                            | 391/15000 [00:18<11:11, 21.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██                                                                            | 394/15000 [00:18<16:42, 14.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██                                                                            | 399/15000 [00:18<14:27, 16.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██                                                                            | 405/15000 [00:19<12:47, 19.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██                                                                            | 408/15000 [00:19<12:04, 20.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▏                                                                           | 414/15000 [00:19<11:30, 21.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▏                                                                           | 420/15000 [00:19<11:11, 21.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▏                                                                           | 423/15000 [00:20<11:11, 21.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▏                                                                           | 429/15000 [00:20<11:00, 22.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▎                                                                           | 435/15000 [00:20<11:16, 21.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▎                                                                           | 438/15000 [00:20<11:44, 20.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▎                                                                           | 441/15000 [00:20<11:59, 20.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▎                                                                           | 447/15000 [00:21<12:13, 19.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▎                                                                           | 451/15000 [00:21<12:21, 19.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▎                                                                           | 455/15000 [00:21<12:24, 19.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▍                                                                           | 459/15000 [00:21<12:33, 19.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▍                                                                           | 463/15000 [00:22<12:56, 18.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▍                                                                           | 467/15000 [00:22<13:11, 18.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▍                                                                           | 471/15000 [00:22<13:06, 18.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▍                                                                           | 477/15000 [00:22<11:39, 20.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▍                                                                           | 480/15000 [00:22<11:51, 20.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▌                                                                           | 486/15000 [00:23<12:17, 19.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▌                                                                           | 491/15000 [00:23<12:02, 20.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▌                                                                           | 494/15000 [00:23<11:33, 20.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▌                                                                           | 500/15000 [00:23<11:10, 21.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▌                                                                           | 503/15000 [00:24<11:00, 21.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▋                                                                           | 506/15000 [00:24<11:50, 20.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▋                                                                           | 513/15000 [00:24<13:01, 18.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  3%|██▋                                                                           | 515/15000 [00:24<12:46, 18.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  3%|██▋                                                                           | 521/15000 [00:25<11:44, 20.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|██▋                                                                           | 527/15000 [00:25<11:05, 21.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|██▊                                                                           | 533/15000 [00:25<10:42, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|██▊                                                                           | 536/15000 [00:25<10:34, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|██▊                                                                           | 542/15000 [00:25<10:26, 23.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|██▊                                                                           | 545/15000 [00:26<10:23, 23.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|██▊                                                                           | 551/15000 [00:26<10:17, 23.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|██▉                                                                           | 557/15000 [00:26<10:38, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|██▉                                                                           | 563/15000 [00:26<10:45, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|██▉                                                                           | 566/15000 [00:26<10:53, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|██▉                                                                           | 572/15000 [00:27<10:41, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███                                                                           | 578/15000 [00:27<10:47, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███                                                                           | 581/15000 [00:27<10:46, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███                                                                           | 587/15000 [00:27<10:49, 22.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███                                                                           | 593/15000 [00:28<10:49, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███                                                                           | 596/15000 [00:28<10:48, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▏                                                                          | 602/15000 [00:28<11:22, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▏                                                                          | 605/15000 [00:29<17:05, 14.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▏                                                                          | 610/15000 [00:29<15:06, 15.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▏                                                                          | 613/15000 [00:29<13:47, 17.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▏                                                                          | 618/15000 [00:29<16:46, 14.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▏                                                                          | 623/15000 [00:30<14:12, 16.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▎                                                                          | 629/15000 [00:30<12:10, 19.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▎                                                                          | 632/15000 [00:30<11:33, 20.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▎                                                                          | 638/15000 [00:30<11:14, 21.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▎                                                                          | 641/15000 [00:30<11:33, 20.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▎                                                                          | 647/15000 [00:31<11:12, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▍                                                                          | 650/15000 [00:31<11:21, 21.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▍                                                                          | 653/15000 [00:31<17:36, 13.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▍                                                                          | 657/15000 [00:31<16:05, 14.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▍                                                                          | 662/15000 [00:32<14:13, 16.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▍                                                                          | 666/15000 [00:32<15:33, 15.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  4%|███▍                                                                          | 668/15000 [00:32<15:26, 15.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  4%|███▍                                                                          | 672/15000 [00:32<14:57, 15.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▌                                                                          | 678/15000 [00:33<12:38, 18.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▌                                                                          | 681/15000 [00:33<12:38, 18.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▌                                                                          | 685/15000 [00:33<13:00, 18.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▌                                                                          | 689/15000 [00:33<13:04, 18.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▌                                                                          | 693/15000 [00:34<13:07, 18.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▌                                                                          | 697/15000 [00:34<13:33, 17.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▋                                                                          | 702/15000 [00:34<20:22, 11.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▋                                                                          | 704/15000 [00:35<27:19,  8.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▋                                                                          | 708/15000 [00:35<21:02, 11.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▋                                                                          | 714/15000 [00:35<15:13, 15.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▋                                                                          | 720/15000 [00:36<12:44, 18.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▊                                                                          | 726/15000 [00:36<11:21, 20.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▊                                                                          | 729/15000 [00:36<11:01, 21.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▊                                                                          | 735/15000 [00:36<10:33, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▊                                                                          | 741/15000 [00:36<10:44, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▉                                                                          | 747/15000 [00:37<10:45, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▉                                                                          | 750/15000 [00:37<10:35, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▉                                                                          | 756/15000 [00:37<10:34, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|███▉                                                                          | 762/15000 [00:37<10:49, 21.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▉                                                                          | 765/15000 [00:38<11:32, 20.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|███▉                                                                          | 768/15000 [00:38<11:50, 20.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████                                                                          | 774/15000 [00:38<11:24, 20.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  5%|████                                                                          | 777/15000 [00:38<11:13, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████                                                                          | 783/15000 [00:38<10:44, 22.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████                                                                          | 789/15000 [00:39<10:32, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▏                                                                         | 795/15000 [00:39<10:24, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▏                                                                         | 798/15000 [00:39<10:18, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▏                                                                         | 801/15000 [00:39<15:27, 15.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▏                                                                         | 807/15000 [00:40<13:13, 17.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▏                                                                         | 813/15000 [00:40<11:43, 20.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▎                                                                         | 819/15000 [00:40<11:03, 21.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  5%|████▎                                                                         | 822/15000 [00:40<11:01, 21.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▎                                                                         | 828/15000 [00:41<11:03, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▎                                                                         | 834/15000 [00:41<10:37, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▎                                                                         | 837/15000 [00:41<10:28, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▍                                                                         | 843/15000 [00:41<10:36, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▍                                                                         | 849/15000 [00:42<10:23, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▍                                                                         | 852/15000 [00:42<10:21, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▍                                                                         | 858/15000 [00:42<10:28, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▍                                                                         | 864/15000 [00:42<10:38, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▌                                                                         | 867/15000 [00:42<12:21, 19.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▌                                                                         | 873/15000 [00:43<11:38, 20.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▌                                                                         | 876/15000 [00:43<11:43, 20.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▌                                                                         | 881/15000 [00:43<16:35, 14.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▌                                                                         | 885/15000 [00:44<15:17, 15.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▋                                                                         | 891/15000 [00:44<12:29, 18.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▋                                                                         | 895/15000 [00:44<12:42, 18.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▋                                                                         | 900/15000 [00:44<11:58, 19.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▋                                                                         | 902/15000 [00:44<12:09, 19.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▋                                                                         | 906/15000 [00:45<13:37, 17.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▋                                                                         | 911/15000 [00:45<12:29, 18.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▊                                                                         | 917/15000 [00:45<12:12, 19.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▊                                                                         | 920/15000 [00:45<12:12, 19.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▊                                                                         | 925/15000 [00:46<11:59, 19.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▊                                                                         | 930/15000 [00:46<11:35, 20.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▊                                                                         | 933/15000 [00:46<11:11, 20.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▉                                                                         | 939/15000 [00:46<11:00, 21.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▉                                                                         | 942/15000 [00:46<10:52, 21.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▉                                                                         | 948/15000 [00:47<10:33, 22.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|████▉                                                                         | 954/15000 [00:47<10:27, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|████▉                                                                         | 960/15000 [00:47<10:56, 21.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|█████                                                                         | 966/15000 [00:48<11:01, 21.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  6%|█████                                                                         | 972/15000 [00:48<10:33, 22.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  6%|█████                                                                         | 975/15000 [00:48<10:24, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████                                                                         | 981/15000 [00:48<11:00, 21.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████                                                                         | 984/15000 [00:48<11:43, 19.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▏                                                                        | 990/15000 [00:49<11:11, 20.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▏                                                                        | 996/15000 [00:49<11:13, 20.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▏                                                                        | 999/15000 [00:49<11:01, 21.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▏                                                                       | 1004/15000 [00:51<40:08,  5.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▏                                                                       | 1008/15000 [00:51<28:06,  8.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▏                                                                       | 1012/15000 [00:51<21:10, 11.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▏                                                                       | 1016/15000 [00:52<18:31, 12.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▏                                                                       | 1018/15000 [00:52<17:14, 13.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▎                                                                       | 1024/15000 [00:52<13:34, 17.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▎                                                                       | 1027/15000 [00:52<12:54, 18.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▎                                                                       | 1032/15000 [00:53<12:54, 18.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▎                                                                       | 1034/15000 [00:53<12:35, 18.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▎                                                                       | 1039/15000 [00:53<16:42, 13.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▎                                                                       | 1042/15000 [00:53<15:00, 15.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▍                                                                       | 1048/15000 [00:54<12:39, 18.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▍                                                                       | 1051/15000 [00:54<12:12, 19.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▍                                                                       | 1057/15000 [00:54<11:42, 19.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▍                                                                       | 1063/15000 [00:54<11:08, 20.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▍                                                                       | 1066/15000 [00:54<10:54, 21.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▌                                                                       | 1072/15000 [00:55<11:13, 20.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▌                                                                       | 1078/15000 [00:55<10:44, 21.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▌                                                                       | 1081/15000 [00:55<10:36, 21.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▌                                                                       | 1087/15000 [00:55<11:03, 20.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▌                                                                       | 1093/15000 [00:56<11:33, 20.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▋                                                                       | 1096/15000 [00:56<11:17, 20.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▋                                                                       | 1101/15000 [00:56<13:03, 17.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▋                                                                       | 1104/15000 [00:56<12:24, 18.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▋                                                                       | 1110/15000 [00:57<12:00, 19.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▋                                                                       | 1113/15000 [00:57<11:47, 19.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  7%|█████▋                                                                       | 1117/15000 [00:57<11:44, 19.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  7%|█████▊                                                                       | 1122/15000 [00:57<12:43, 18.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|█████▊                                                                       | 1126/15000 [00:57<12:47, 18.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|█████▊                                                                       | 1130/15000 [00:58<15:01, 15.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  8%|█████▊                                                                       | 1135/15000 [00:58<13:15, 17.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|█████▊                                                                       | 1137/15000 [00:58<13:00, 17.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  8%|█████▊                                                                       | 1142/15000 [00:58<11:47, 19.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|█████▉                                                                       | 1147/15000 [00:59<12:50, 17.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  8%|█████▉                                                                       | 1150/15000 [00:59<12:12, 18.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|█████▉                                                                       | 1156/15000 [00:59<11:16, 20.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|█████▉                                                                       | 1162/15000 [00:59<10:58, 21.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  8%|█████▉                                                                       | 1165/15000 [01:00<11:43, 19.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████                                                                       | 1170/15000 [01:00<11:26, 20.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████                                                                       | 1176/15000 [01:00<10:51, 21.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████                                                                       | 1182/15000 [01:00<10:32, 21.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████                                                                       | 1185/15000 [01:00<10:24, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████                                                                       | 1191/15000 [01:01<11:01, 20.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▏                                                                      | 1194/15000 [01:01<10:47, 21.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▏                                                                      | 1200/15000 [01:01<11:06, 20.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▏                                                                      | 1206/15000 [01:01<10:27, 22.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▏                                                                      | 1212/15000 [01:02<10:11, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▏                                                                      | 1215/15000 [01:02<10:22, 22.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


  8%|██████▎                                                                      | 1221/15000 [01:02<10:06, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▎                                                                      | 1227/15000 [01:02<10:38, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▎                                                                      | 1230/15000 [01:03<10:26, 21.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▎                                                                      | 1236/15000 [01:03<10:19, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▎                                                                      | 1239/15000 [01:03<10:41, 21.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


  8%|██████▍                                                                      | 1244/15000 [01:03<12:46, 17.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▍                                                                    | 1638/15000 [01:23<10:25, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▍                                                                    | 1644/15000 [01:24<10:12, 21.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▍                                                                    | 1650/15000 [01:24<09:51, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▍                                                                    | 1653/15000 [01:24<09:54, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▌                                                                    | 1659/15000 [01:24<09:53, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▌                                                                    | 1662/15000 [01:24<09:52, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▌                                                                    | 1668/15000 [01:25<10:11, 21.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▌                                                                    | 1674/15000 [01:25<09:50, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▌                                                                    | 1680/15000 [01:25<10:08, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▋                                                                    | 1686/15000 [01:26<09:44, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▋                                                                    | 1689/15000 [01:26<09:48, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▋                                                                    | 1695/15000 [01:26<10:26, 21.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▋                                                                    | 1698/15000 [01:26<10:12, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▋                                                                    | 1704/15000 [01:26<10:01, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 11%|████████▊                                                                    | 1710/15000 [01:27<09:56, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▊                                                                    | 1716/15000 [01:27<09:52, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 11%|████████▊                                                                    | 1719/15000 [01:27<09:52, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|████████▊                                                                    | 1725/15000 [01:27<10:03, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|████████▊                                                                    | 1728/15000 [01:28<14:32, 15.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|████████▉                                                                    | 1733/15000 [01:28<12:35, 17.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|████████▉                                                                    | 1739/15000 [01:28<11:13, 19.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|████████▉                                                                    | 1745/15000 [01:28<10:31, 20.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|████████▉                                                                    | 1748/15000 [01:29<10:18, 21.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████                                                                    | 1754/15000 [01:29<10:21, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████                                                                    | 1757/15000 [01:29<10:44, 20.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████                                                                    | 1760/15000 [01:29<15:00, 14.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████                                                                    | 1765/15000 [01:30<13:19, 16.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████                                                                    | 1771/15000 [01:30<11:20, 19.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████                                                                    | 1774/15000 [01:30<10:57, 20.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▏                                                                   | 1780/15000 [01:30<10:54, 20.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████▏                                                                   | 1786/15000 [01:31<15:13, 14.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▏                                                                   | 1792/15000 [01:31<12:22, 17.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▏                                                                   | 1795/15000 [01:31<11:49, 18.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▏                                                                   | 1801/15000 [01:32<11:18, 19.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▎                                                                   | 1804/15000 [01:32<11:08, 19.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▎                                                                   | 1809/15000 [01:32<11:51, 18.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████▎                                                                   | 1814/15000 [01:32<11:16, 19.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████▎                                                                   | 1819/15000 [01:33<11:05, 19.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████▎                                                                   | 1822/15000 [01:33<10:48, 20.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████▍                                                                   | 1828/15000 [01:33<10:04, 21.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▍                                                                   | 1834/15000 [01:33<09:51, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▍                                                                   | 1837/15000 [01:33<09:50, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▍                                                                   | 1843/15000 [01:34<09:44, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▍                                                                   | 1849/15000 [01:34<10:04, 21.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▌                                                                   | 1855/15000 [01:34<09:50, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▌                                                                   | 1858/15000 [01:34<10:36, 20.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▌                                                                   | 1861/15000 [01:34<10:45, 20.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 12%|█████████▌                                                                   | 1867/15000 [01:35<10:31, 20.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 12%|█████████▌                                                                   | 1873/15000 [01:35<09:58, 21.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▋                                                                   | 1876/15000 [01:35<09:53, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|█████████▋                                                                   | 1882/15000 [01:35<09:44, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▋                                                                   | 1888/15000 [01:36<09:40, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|█████████▋                                                                   | 1894/15000 [01:36<09:31, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▋                                                                   | 1897/15000 [01:36<09:26, 23.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▊                                                                   | 1903/15000 [01:36<09:22, 23.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▊                                                                   | 1909/15000 [01:37<09:15, 23.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|█████████▊                                                                   | 1912/15000 [01:37<09:17, 23.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▊                                                                   | 1918/15000 [01:37<09:17, 23.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|█████████▊                                                                   | 1921/15000 [01:37<09:47, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|█████████▉                                                                   | 1927/15000 [01:37<09:51, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▉                                                                   | 1930/15000 [01:38<10:28, 20.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▉                                                                   | 1933/15000 [01:38<14:28, 15.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▉                                                                   | 1938/15000 [01:38<12:46, 17.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|█████████▉                                                                   | 1943/15000 [01:39<15:31, 14.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|█████████▉                                                                   | 1946/15000 [01:39<13:58, 15.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████                                                                   | 1949/15000 [01:39<13:02, 16.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████                                                                   | 1953/15000 [01:39<16:27, 13.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████                                                                   | 1956/15000 [01:39<14:08, 15.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████                                                                   | 1962/15000 [01:40<11:42, 18.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████                                                                   | 1968/15000 [01:40<10:30, 20.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▏                                                                  | 1974/15000 [01:40<10:02, 21.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▏                                                                  | 1977/15000 [01:40<10:09, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▏                                                                  | 1983/15000 [01:41<10:18, 21.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▏                                                                  | 1986/15000 [01:41<10:29, 20.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████▏                                                                  | 1992/15000 [01:41<10:39, 20.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████▏                                                                  | 1995/15000 [01:41<10:28, 20.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▎                                                                  | 1998/15000 [01:41<10:42, 20.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▎                                                                  | 2001/15000 [01:43<38:02,  5.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████▎                                                                  | 2006/15000 [01:43<28:03,  7.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████▎                                                                  | 2011/15000 [01:44<19:33, 11.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 13%|██████████▎                                                                  | 2014/15000 [01:44<16:11, 13.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 13%|██████████▎                                                                  | 2020/15000 [01:44<12:51, 16.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|██████████▍                                                                  | 2026/15000 [01:44<11:58, 18.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▍                                                                  | 2029/15000 [01:44<11:12, 19.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▍                                                                  | 2035/15000 [01:45<10:42, 20.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▍                                                                  | 2041/15000 [01:45<10:04, 21.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▍                                                                  | 2044/15000 [01:45<09:55, 21.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|██████████▌                                                                  | 2050/15000 [01:45<14:03, 15.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▌                                                                  | 2056/15000 [01:46<11:38, 18.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▌                                                                  | 2059/15000 [01:46<11:02, 19.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▌                                                                  | 2065/15000 [01:46<10:13, 21.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▋                                                                  | 2071/15000 [01:46<09:38, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▋                                                                  | 2077/15000 [01:47<09:23, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▋                                                                  | 2083/15000 [01:47<09:15, 23.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▋                                                                  | 2086/15000 [01:47<09:15, 23.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▋                                                                  | 2092/15000 [01:47<09:11, 23.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▊                                                                  | 2098/15000 [01:48<09:13, 23.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▊                                                                  | 2104/15000 [01:48<09:09, 23.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▊                                                                  | 2110/15000 [01:48<09:32, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|██████████▊                                                                  | 2113/15000 [01:48<09:43, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▉                                                                  | 2119/15000 [01:48<09:27, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▉                                                                  | 2125/15000 [01:49<09:17, 23.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▉                                                                  | 2128/15000 [01:49<09:17, 23.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▉                                                                  | 2134/15000 [01:49<09:23, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|██████████▉                                                                  | 2140/15000 [01:49<09:35, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|███████████                                                                  | 2146/15000 [01:50<09:36, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|███████████                                                                  | 2152/15000 [01:50<09:52, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 14%|███████████                                                                  | 2155/15000 [01:50<09:56, 21.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|███████████                                                                  | 2160/15000 [01:50<11:29, 18.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|███████████                                                                  | 2166/15000 [01:51<10:18, 20.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|███████████▏                                                                 | 2169/15000 [01:51<09:59, 21.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 14%|███████████▏                                                                 | 2175/15000 [01:51<09:32, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▏                                                                 | 2181/15000 [01:51<09:34, 22.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▏                                                                 | 2184/15000 [01:51<09:28, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▏                                                                 | 2190/15000 [01:52<09:16, 23.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▎                                                                 | 2196/15000 [01:52<09:12, 23.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▎                                                                 | 2199/15000 [01:52<09:22, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▎                                                                 | 2205/15000 [01:52<09:23, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 15%|███████████▎                                                                 | 2211/15000 [01:53<09:19, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 15%|███████████▍                                                                 | 2217/15000 [01:53<09:27, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▍                                                                 | 2220/15000 [01:53<09:20, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▍                                                                 | 2226/15000 [01:53<09:31, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▍                                                                 | 2229/15000 [01:53<09:29, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▍                                                                 | 2235/15000 [01:54<09:15, 22.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▍                                                                 | 2238/15000 [01:54<09:15, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 15%|███████████▌                                                                 | 2244/15000 [01:54<09:15, 22.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▌                                                                 | 2250/15000 [01:54<09:21, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▌                                                                 | 2256/15000 [01:55<09:15, 22.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▌                                                                 | 2259/15000 [01:55<09:11, 23.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▋                                                                 | 2265/15000 [01:55<09:30, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 15%|███████████▋                                                                 | 2268/15000 [01:55<09:20, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▋                                                                 | 2271/15000 [01:55<09:37, 22.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▋                                                                 | 2277/15000 [01:56<12:35, 16.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 15%|███████████▋                                                                 | 2283/15000 [01:56<10:52, 19.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 15%|███████████▋                                                                 | 2286/15000 [01:56<10:20, 20.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▊                                                                 | 2292/15000 [01:56<09:41, 21.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▊                                                                 | 2298/15000 [01:57<09:21, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▊                                                                 | 2304/15000 [01:57<09:27, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▊                                                                 | 2310/15000 [01:57<09:12, 22.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▉                                                                 | 2316/15000 [01:57<09:06, 23.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 15%|███████████▉                                                                 | 2319/15000 [01:58<09:26, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|███████████▉                                                                 | 2325/15000 [01:58<09:27, 22.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|███████████▉                                                                 | 2331/15000 [01:58<09:12, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|███████████▉                                                                 | 2337/15000 [01:58<09:52, 21.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████                                                                 | 2343/15000 [01:59<09:28, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████                                                                 | 2349/15000 [01:59<09:11, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████                                                                 | 2352/15000 [01:59<09:08, 23.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████                                                                 | 2358/15000 [01:59<09:08, 23.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▏                                                                | 2364/15000 [02:00<09:02, 23.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▏                                                                | 2367/15000 [02:00<09:20, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▏                                                                | 2373/15000 [02:00<09:13, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████▏                                                                | 2379/15000 [02:00<09:13, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████▏                                                                | 2382/15000 [02:00<09:11, 22.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▎                                                                | 2388/15000 [02:01<09:06, 23.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▎                                                                | 2394/15000 [02:01<09:01, 23.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▎                                                                | 2400/15000 [02:01<08:58, 23.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▎                                                                | 2406/15000 [02:01<08:57, 23.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▍                                                                | 2412/15000 [02:02<08:59, 23.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████▍                                                                | 2418/15000 [02:02<08:55, 23.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████▍                                                                | 2421/15000 [02:02<08:56, 23.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▍                                                                | 2427/15000 [02:02<09:25, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▍                                                                | 2430/15000 [02:02<09:28, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▌                                                                | 2436/15000 [02:03<14:11, 14.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▌                                                                | 2442/15000 [02:03<11:30, 18.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▌                                                                | 2448/15000 [02:03<10:09, 20.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▌                                                                | 2454/15000 [02:04<09:44, 21.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▋                                                                | 2460/15000 [02:04<09:18, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 16%|████████████▋                                                                | 2463/15000 [02:04<09:11, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████▋                                                                | 2469/15000 [02:04<09:01, 23.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 16%|████████████▋                                                                | 2475/15000 [02:05<09:16, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|████████████▋                                                                | 2481/15000 [02:05<09:52, 21.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▊                                                                | 2484/15000 [02:05<13:50, 15.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|████████████▊                                                                | 2490/15000 [02:06<11:25, 18.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▊                                                                | 2496/15000 [02:06<10:24, 20.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▊                                                                | 2502/15000 [02:06<09:36, 21.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▊                                                                | 2505/15000 [02:06<09:23, 22.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▉                                                                | 2511/15000 [02:06<09:25, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▉                                                                | 2517/15000 [02:07<09:09, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|████████████▉                                                                | 2523/15000 [02:07<09:20, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▉                                                                | 2526/15000 [02:07<09:17, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|████████████▉                                                                | 2532/15000 [02:07<09:24, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████                                                                | 2538/15000 [02:08<09:45, 21.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████                                                                | 2544/15000 [02:08<09:16, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████                                                                | 2547/15000 [02:08<09:17, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████                                                                | 2553/15000 [02:08<09:05, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|█████████████▏                                                               | 2559/15000 [02:09<09:01, 22.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▏                                                               | 2562/15000 [02:09<08:58, 23.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▏                                                               | 2568/15000 [02:09<08:52, 23.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▏                                                               | 2574/15000 [02:09<08:52, 23.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|█████████████▏                                                               | 2577/15000 [02:09<08:55, 23.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▎                                                               | 2583/15000 [02:10<08:55, 23.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|█████████████▎                                                               | 2589/15000 [02:10<08:52, 23.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 17%|█████████████▎                                                               | 2592/15000 [02:10<09:09, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▎                                                               | 2598/15000 [02:10<09:40, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▎                                                               | 2601/15000 [02:10<09:24, 21.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▍                                                               | 2607/15000 [02:11<09:23, 22.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▍                                                               | 2610/15000 [02:11<09:12, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▍                                                               | 2616/15000 [02:11<09:31, 21.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 17%|█████████████▍                                                               | 2619/15000 [02:12<14:38, 14.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▍                                                               | 2625/15000 [02:12<11:49, 17.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▌                                                               | 2631/15000 [02:12<13:35, 15.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▌                                                               | 2637/15000 [02:13<11:23, 18.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▌                                                               | 2640/15000 [02:13<10:44, 19.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▌                                                               | 2646/15000 [02:13<10:00, 20.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 18%|█████████████▌                                                               | 2652/15000 [02:13<09:24, 21.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 18%|█████████████▋                                                               | 2655/15000 [02:13<09:32, 21.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▋                                                               | 2661/15000 [02:14<09:27, 21.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▋                                                               | 2667/15000 [02:14<09:06, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 18%|█████████████▋                                                               | 2673/15000 [02:14<08:51, 23.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▋                                                               | 2676/15000 [02:14<08:48, 23.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▊                                                               | 2679/15000 [02:14<09:09, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▊                                                               | 2685/15000 [02:15<12:20, 16.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▊                                                               | 2691/15000 [02:15<10:37, 19.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▊                                                               | 2697/15000 [02:15<09:44, 21.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▊                                                               | 2700/15000 [02:16<09:26, 21.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▉                                                               | 2706/15000 [02:16<09:06, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▉                                                               | 2712/15000 [02:16<09:15, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▉                                                               | 2715/15000 [02:16<09:08, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 18%|█████████████▉                                                               | 2721/15000 [02:17<08:56, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|█████████████▉                                                               | 2727/15000 [02:17<08:52, 23.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 18%|██████████████                                                               | 2730/15000 [02:17<08:48, 23.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████                                                               | 2736/15000 [02:17<08:47, 23.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████                                                               | 2739/15000 [02:17<08:45, 23.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████                                                               | 2745/15000 [02:18<08:43, 23.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████                                                               | 2751/15000 [02:18<08:44, 23.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████▏                                                              | 2757/15000 [02:18<08:43, 23.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████▏                                                              | 2763/15000 [02:18<08:42, 23.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████▏                                                              | 2769/15000 [02:19<08:40, 23.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 18%|██████████████▏                                                              | 2775/15000 [02:19<08:38, 23.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▎                                                              | 2778/15000 [02:19<08:58, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▎                                                              | 2784/15000 [02:19<09:01, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▎                                                              | 2787/15000 [02:19<08:55, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▎                                                              | 2793/15000 [02:20<09:02, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▎                                                              | 2799/15000 [02:20<08:58, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▍                                                              | 2802/15000 [02:20<08:56, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▍                                                              | 2808/15000 [02:20<09:28, 21.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▍                                                              | 2811/15000 [02:20<09:44, 20.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▍                                                              | 2814/15000 [02:21<13:38, 14.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▍                                                              | 2820/15000 [02:21<11:11, 18.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▌                                                              | 2826/15000 [02:21<09:56, 20.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▌                                                              | 2829/15000 [02:21<09:51, 20.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▌                                                              | 2835/15000 [02:22<09:16, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▌                                                              | 2841/15000 [02:22<08:56, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▌                                                              | 2847/15000 [02:22<08:50, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▋                                                              | 2853/15000 [02:22<08:44, 23.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▋                                                              | 2859/15000 [02:23<08:41, 23.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▋                                                              | 2862/15000 [02:23<08:40, 23.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▋                                                              | 2868/15000 [02:23<08:55, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▊                                                              | 2874/15000 [02:23<09:09, 22.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▊                                                              | 2880/15000 [02:24<11:58, 16.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▊                                                              | 2883/15000 [02:24<10:56, 18.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▊                                                              | 2889/15000 [02:24<09:42, 20.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▊                                                              | 2895/15000 [02:25<09:32, 21.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▉                                                              | 2898/15000 [02:25<09:20, 21.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▉                                                              | 2904/15000 [02:25<13:40, 14.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 19%|██████████████▉                                                              | 2910/15000 [02:25<11:05, 18.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▉                                                              | 2916/15000 [02:26<10:15, 19.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 19%|██████████████▉                                                              | 2919/15000 [02:26<09:47, 20.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████                                                              | 2925/15000 [02:26<09:09, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████                                                              | 2928/15000 [02:26<08:58, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████                                                              | 2934/15000 [02:27<09:19, 21.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████                                                              | 2940/15000 [02:27<09:22, 21.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████                                                              | 2943/15000 [02:27<09:07, 22.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▏                                                             | 2949/15000 [02:27<08:50, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▏                                                             | 2955/15000 [02:27<08:41, 23.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▏                                                             | 2958/15000 [02:28<08:41, 23.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▏                                                             | 2964/15000 [02:28<08:35, 23.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▏                                                             | 2970/15000 [02:28<08:33, 23.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▎                                                             | 2973/15000 [02:28<08:34, 23.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▎                                                             | 2979/15000 [02:29<08:57, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▎                                                             | 2985/15000 [02:29<08:50, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▎                                                             | 2988/15000 [02:29<09:05, 22.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▎                                                             | 2994/15000 [02:29<09:22, 21.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▍                                                             | 2997/15000 [02:29<09:26, 21.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▍                                                             | 3000/15000 [02:30<13:27, 14.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▍                                                             | 3005/15000 [02:31<30:51,  6.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▍                                                             | 3011/15000 [02:31<19:21, 10.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▍                                                             | 3014/15000 [02:32<16:05, 12.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▌                                                             | 3020/15000 [02:32<12:25, 16.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▌                                                             | 3026/15000 [02:32<10:30, 19.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▌                                                             | 3029/15000 [02:32<10:37, 18.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▌                                                             | 3035/15000 [02:33<12:54, 15.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▌                                                             | 3040/15000 [02:33<11:15, 17.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▋                                                             | 3046/15000 [02:33<09:45, 20.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▋                                                             | 3052/15000 [02:34<09:21, 21.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▋                                                             | 3055/15000 [02:34<09:05, 21.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▋                                                             | 3061/15000 [02:34<09:10, 21.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 20%|███████████████▋                                                             | 3064/15000 [02:34<08:56, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 20%|███████████████▊                                                             | 3070/15000 [02:34<09:08, 21.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|███████████████▊                                                             | 3076/15000 [02:35<08:47, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|███████████████▊                                                             | 3082/15000 [02:35<08:38, 23.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|███████████████▊                                                             | 3088/15000 [02:35<08:31, 23.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|███████████████▊                                                             | 3091/15000 [02:35<08:32, 23.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|███████████████▉                                                             | 3094/15000 [02:35<08:33, 23.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|███████████████▉                                                             | 3100/15000 [02:36<11:56, 16.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|███████████████▉                                                             | 3106/15000 [02:36<10:12, 19.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|███████████████▉                                                             | 3109/15000 [02:36<09:46, 20.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|███████████████▉                                                             | 3112/15000 [02:36<09:51, 20.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|███████████████▉                                                             | 3115/15000 [02:37<14:26, 13.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████                                                             | 3121/15000 [02:37<11:38, 17.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████                                                             | 3127/15000 [02:37<10:04, 19.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████                                                             | 3130/15000 [02:38<09:33, 20.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████                                                             | 3136/15000 [02:38<12:32, 15.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████                                                             | 3140/15000 [02:38<11:48, 16.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████▏                                                            | 3146/15000 [02:39<10:11, 19.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▏                                                            | 3149/15000 [02:39<09:55, 19.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▏                                                            | 3155/15000 [02:39<09:19, 21.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▏                                                            | 3158/15000 [02:39<09:38, 20.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▏                                                            | 3164/15000 [02:39<10:13, 19.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▎                                                            | 3167/15000 [02:40<09:44, 20.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████▎                                                            | 3173/15000 [02:40<09:36, 20.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▎                                                            | 3179/15000 [02:40<09:07, 21.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████▎                                                            | 3182/15000 [02:40<09:55, 19.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▎                                                            | 3188/15000 [02:41<09:27, 20.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████▍                                                            | 3191/15000 [02:41<09:12, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▍                                                            | 3197/15000 [02:41<10:19, 19.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▍                                                            | 3200/15000 [02:41<09:54, 19.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▍                                                            | 3206/15000 [02:41<09:19, 21.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████▍                                                            | 3212/15000 [02:42<09:13, 21.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 21%|████████████████▌                                                            | 3218/15000 [02:42<09:06, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 21%|████████████████▌                                                            | 3221/15000 [02:42<09:09, 21.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▌                                                            | 3227/15000 [02:42<09:28, 20.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▌                                                            | 3233/15000 [02:43<09:08, 21.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▌                                                            | 3236/15000 [02:43<10:15, 19.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▋                                                            | 3239/15000 [02:43<10:02, 19.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▋                                                            | 3241/15000 [02:43<14:25, 13.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▋                                                            | 3247/15000 [02:44<11:43, 16.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▋                                                            | 3253/15000 [02:44<10:15, 19.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▋                                                            | 3256/15000 [02:44<09:56, 19.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▋                                                            | 3262/15000 [02:44<09:45, 20.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▊                                                            | 3268/15000 [02:45<09:12, 21.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▊                                                            | 3271/15000 [02:45<09:01, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▊                                                            | 3277/15000 [02:45<08:50, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▊                                                            | 3283/15000 [02:45<08:43, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▉                                                            | 3289/15000 [02:46<08:47, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▉                                                            | 3292/15000 [02:46<09:21, 20.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▉                                                            | 3295/15000 [02:46<13:16, 14.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|████████████████▉                                                            | 3301/15000 [02:46<10:53, 17.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▉                                                            | 3304/15000 [02:46<10:17, 18.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|████████████████▉                                                            | 3310/15000 [02:47<09:43, 20.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|█████████████████                                                            | 3316/15000 [02:47<09:26, 20.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|█████████████████                                                            | 3319/15000 [02:47<09:23, 20.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████                                                            | 3325/15000 [02:47<09:26, 20.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|█████████████████                                                            | 3331/15000 [02:48<09:18, 20.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|█████████████████                                                            | 3334/15000 [02:48<09:05, 21.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████▏                                                           | 3340/15000 [02:48<09:12, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████▏                                                           | 3343/15000 [02:48<09:23, 20.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|█████████████████▏                                                           | 3346/15000 [02:48<09:44, 19.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 22%|█████████████████▏                                                           | 3352/15000 [02:49<09:12, 21.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████▏                                                           | 3358/15000 [02:49<08:59, 21.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████▎                                                           | 3361/15000 [02:49<09:15, 20.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████▎                                                           | 3367/15000 [02:49<09:16, 20.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 22%|█████████████████▎                                                           | 3373/15000 [02:50<09:10, 21.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 23%|█████████████████▎                                                           | 3376/15000 [02:50<09:10, 21.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 23%|█████████████████▎                                                           | 3382/15000 [02:50<09:03, 21.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 23%|█████████████████▍                                                           | 3385/15000 [02:50<09:01, 21.45it/s]

 25%|███████████████████▍                                                         | 3793/15000 [03:09<08:18, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 25%|███████████████████▌                                                         | 3799/15000 [03:09<08:11, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 25%|███████████████████▌                                                         | 3802/15000 [03:09<08:09, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 25%|███████████████████▌                                                         | 3808/15000 [03:09<08:16, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 25%|███████████████████▌                                                         | 3814/15000 [03:10<08:06, 23.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 25%|███████████████████▌                                                         | 3820/15000 [03:10<08:13, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 25%|███████████████████▌                                                         | 3823/15000 [03:10<08:10, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▋                                                         | 3829/15000 [03:10<08:16, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|███████████████████▋                                                         | 3835/15000 [03:11<08:12, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▋                                                         | 3841/15000 [03:11<08:10, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▋                                                         | 3847/15000 [03:11<08:07, 22.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▊                                                         | 3853/15000 [03:11<08:01, 23.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▊                                                         | 3856/15000 [03:12<08:02, 23.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|███████████████████▊                                                         | 3862/15000 [03:12<08:14, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▊                                                         | 3868/15000 [03:12<08:21, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|███████████████████▉                                                         | 3874/15000 [03:12<08:13, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▉                                                         | 3877/15000 [03:12<08:10, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|███████████████████▉                                                         | 3883/15000 [03:13<08:06, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▉                                                         | 3886/15000 [03:13<08:07, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|███████████████████▉                                                         | 3892/15000 [03:13<08:01, 23.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████                                                         | 3898/15000 [03:13<08:04, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████                                                         | 3901/15000 [03:14<08:09, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|████████████████████                                                         | 3907/15000 [03:14<10:59, 16.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████                                                         | 3910/15000 [03:14<10:38, 17.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|████████████████████                                                         | 3916/15000 [03:15<11:59, 15.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▏                                                        | 3922/15000 [03:15<09:55, 18.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▏                                                        | 3928/15000 [03:15<08:55, 20.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▏                                                        | 3931/15000 [03:15<08:41, 21.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▏                                                        | 3937/15000 [03:16<08:17, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▏                                                        | 3940/15000 [03:16<08:13, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▎                                                        | 3946/15000 [03:16<08:08, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▎                                                        | 3952/15000 [03:16<08:05, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▎                                                        | 3955/15000 [03:16<08:07, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|████████████████████▎                                                        | 3961/15000 [03:17<08:12, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 26%|████████████████████▎                                                        | 3967/15000 [03:17<08:06, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 26%|████████████████████▍                                                        | 3973/15000 [03:17<08:02, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▍                                                        | 3976/15000 [03:17<08:03, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|████████████████████▍                                                        | 3982/15000 [03:17<08:00, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▍                                                        | 3988/15000 [03:18<07:56, 23.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▍                                                        | 3991/15000 [03:18<07:58, 23.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▌                                                        | 3997/15000 [03:18<08:47, 20.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|████████████████████▌                                                        | 4000/15000 [03:18<08:37, 21.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▌                                                        | 4006/15000 [03:20<24:34,  7.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▌                                                        | 4009/15000 [03:20<19:39,  9.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▌                                                        | 4015/15000 [03:20<13:41, 13.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▋                                                        | 4018/15000 [03:20<12:02, 15.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▋                                                        | 4024/15000 [03:21<09:58, 18.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▋                                                        | 4030/15000 [03:21<08:56, 20.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▋                                                        | 4033/15000 [03:21<08:42, 20.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|████████████████████▋                                                        | 4039/15000 [03:21<08:20, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▊                                                        | 4045/15000 [03:22<08:08, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▊                                                        | 4051/15000 [03:22<08:02, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|████████████████████▊                                                        | 4054/15000 [03:22<12:02, 15.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▊                                                        | 4060/15000 [03:22<09:55, 18.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|████████████████████▊                                                        | 4066/15000 [03:23<08:53, 20.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▉                                                        | 4069/15000 [03:23<08:37, 21.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|████████████████████▉                                                        | 4075/15000 [03:23<08:14, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▉                                                        | 4081/15000 [03:23<08:02, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▉                                                        | 4087/15000 [03:24<08:09, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|████████████████████▉                                                        | 4090/15000 [03:24<08:07, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|█████████████████████                                                        | 4096/15000 [03:24<08:01, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|█████████████████████                                                        | 4102/15000 [03:24<08:01, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|█████████████████████                                                        | 4108/15000 [03:25<07:55, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|█████████████████████                                                        | 4114/15000 [03:25<08:02, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 27%|█████████████████████▏                                                       | 4117/15000 [03:25<08:20, 21.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 27%|█████████████████████▏                                                       | 4123/15000 [03:25<08:18, 21.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▏                                                       | 4129/15000 [03:26<08:10, 22.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▏                                                       | 4135/15000 [03:26<08:03, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▎                                                       | 4141/15000 [03:26<07:56, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▎                                                       | 4144/15000 [03:26<07:56, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▎                                                       | 4150/15000 [03:26<07:55, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▎                                                       | 4153/15000 [03:27<07:54, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▎                                                       | 4159/15000 [03:27<07:56, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▍                                                       | 4165/15000 [03:27<07:56, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▍                                                       | 4171/15000 [03:27<07:54, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▍                                                       | 4174/15000 [03:27<07:53, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▍                                                       | 4180/15000 [03:28<07:51, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▍                                                       | 4183/15000 [03:28<07:54, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▌                                                       | 4189/15000 [03:28<07:49, 23.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▌                                                       | 4195/15000 [03:28<07:50, 22.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▌                                                       | 4198/15000 [03:29<07:53, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▌                                                       | 4204/15000 [03:29<07:54, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▌                                                       | 4210/15000 [03:29<07:49, 22.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▋                                                       | 4216/15000 [03:29<07:50, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▋                                                       | 4222/15000 [03:30<07:50, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▋                                                       | 4225/15000 [03:30<07:51, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▋                                                       | 4231/15000 [03:30<07:44, 23.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▋                                                       | 4234/15000 [03:30<07:50, 22.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▊                                                       | 4240/15000 [03:30<08:03, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▊                                                       | 4246/15000 [03:31<08:16, 21.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▊                                                       | 4249/15000 [03:31<08:26, 21.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▊                                                       | 4252/15000 [03:31<08:30, 21.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▊                                                       | 4258/15000 [03:31<11:16, 15.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▉                                                       | 4263/15000 [03:32<10:01, 17.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 28%|█████████████████████▉                                                       | 4269/15000 [03:32<08:47, 20.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 28%|█████████████████████▉                                                       | 4272/15000 [03:32<08:27, 21.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|█████████████████████▉                                                       | 4278/15000 [03:32<08:18, 21.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|█████████████████████▉                                                       | 4281/15000 [03:33<08:22, 21.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████                                                       | 4287/15000 [03:33<08:28, 21.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 29%|██████████████████████                                                       | 4290/15000 [03:33<08:34, 20.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████                                                       | 4296/15000 [03:33<08:10, 21.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████                                                       | 4302/15000 [03:33<07:55, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████                                                       | 4308/15000 [03:34<07:50, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 29%|██████████████████████▏                                                      | 4311/15000 [03:34<07:49, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▏                                                      | 4317/15000 [03:34<07:46, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▏                                                      | 4323/15000 [03:34<07:41, 23.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▏                                                      | 4326/15000 [03:35<07:41, 23.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▏                                                      | 4332/15000 [03:35<07:40, 23.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▎                                                      | 4338/15000 [03:35<07:41, 23.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 29%|██████████████████████▎                                                      | 4341/15000 [03:35<11:25, 15.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▎                                                      | 4346/15000 [03:36<10:07, 17.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 29%|██████████████████████▎                                                      | 4351/15000 [03:36<11:53, 14.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▎                                                      | 4354/15000 [03:36<10:36, 16.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▍                                                      | 4360/15000 [03:36<09:04, 19.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▍                                                      | 4366/15000 [03:37<08:21, 21.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▍                                                      | 4372/15000 [03:37<08:00, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▍                                                      | 4378/15000 [03:37<10:30, 16.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 29%|██████████████████████▍                                                      | 4381/15000 [03:38<09:39, 18.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▌                                                      | 4387/15000 [03:38<08:46, 20.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▌                                                      | 4393/15000 [03:38<08:09, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▌                                                      | 4399/15000 [03:38<07:57, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 29%|██████████████████████▌                                                      | 4405/15000 [03:39<07:47, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▋                                                      | 4408/15000 [03:39<07:47, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▋                                                      | 4414/15000 [03:39<07:42, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▋                                                      | 4417/15000 [03:39<07:46, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 29%|██████████████████████▋                                                      | 4423/15000 [03:39<07:44, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▋                                                      | 4429/15000 [03:40<07:46, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▊                                                      | 4432/15000 [03:40<07:43, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 30%|██████████████████████▊                                                      | 4438/15000 [03:40<10:19, 17.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▊                                                      | 4441/15000 [03:40<09:34, 18.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▊                                                      | 4447/15000 [03:41<08:33, 20.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▊                                                      | 4453/15000 [03:41<08:31, 20.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▊                                                      | 4456/15000 [03:41<08:17, 21.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▉                                                      | 4462/15000 [03:41<07:56, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▉                                                      | 4468/15000 [03:42<07:46, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 30%|██████████████████████▉                                                      | 4471/15000 [03:42<07:45, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
 33%|█████████████████████████                                                    | 4880/15000 [04:01<07:26, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████                                                    | 4883/15000 [04:01<07:26, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████                                                    | 4889/15000 [04:01<07:25, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████                                                    | 4892/15000 [04:01<07:28, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▏                                                   | 4898/15000 [04:01<07:24, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▏                                                   | 4904/15000 [04:02<07:23, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▏                                                   | 4910/15000 [04:02<07:25, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▏                                                   | 4913/15000 [04:02<07:25, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▎                                                   | 4919/15000 [04:02<07:25, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▎                                                   | 4925/15000 [04:03<07:24, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▎                                                   | 4928/15000 [04:03<07:28, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▎                                                   | 4934/15000 [04:03<07:27, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▎                                                   | 4940/15000 [04:03<07:23, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▎                                                   | 4943/15000 [04:03<07:24, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▍                                                   | 4949/15000 [04:04<07:27, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▍                                                   | 4955/15000 [04:04<07:27, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▍                                                   | 4961/15000 [04:04<07:23, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▍                                                   | 4964/15000 [04:04<07:27, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▌                                                   | 4970/15000 [04:05<07:26, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▌                                                   | 4976/15000 [04:05<07:26, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▌                                                   | 4979/15000 [04:05<07:26, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▌                                                   | 4985/15000 [04:05<07:22, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▌                                                   | 4991/15000 [04:06<07:21, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▋                                                   | 4997/15000 [04:06<07:23, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▋                                                   | 5000/15000 [04:06<11:24, 14.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▋                                                   | 5005/15000 [04:08<24:34,  6.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 33%|█████████████████████████▋                                                   | 5011/15000 [04:08<15:24, 10.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▊                                                   | 5017/15000 [04:08<11:10, 14.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 33%|█████████████████████████▊                                                   | 5020/15000 [04:08<10:01, 16.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|█████████████████████████▊                                                   | 5026/15000 [04:09<08:37, 19.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|█████████████████████████▊                                                   | 5032/15000 [04:09<07:56, 20.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|█████████████████████████▊                                                   | 5038/15000 [04:09<07:39, 21.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|█████████████████████████▉                                                   | 5041/15000 [04:09<07:33, 21.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|█████████████████████████▉                                                   | 5047/15000 [04:09<07:24, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|█████████████████████████▉                                                   | 5053/15000 [04:10<07:17, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|█████████████████████████▉                                                   | 5056/15000 [04:10<11:05, 14.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|█████████████████████████▉                                                   | 5062/15000 [04:10<09:21, 17.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████                                                   | 5065/15000 [04:11<12:06, 13.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████                                                   | 5070/15000 [04:11<10:19, 16.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████                                                   | 5074/15000 [04:11<12:30, 13.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████                                                   | 5079/15000 [04:12<10:15, 16.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████                                                   | 5082/15000 [04:12<10:02, 16.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████                                                   | 5086/15000 [04:12<12:18, 13.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████▏                                                  | 5091/15000 [04:12<10:13, 16.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████▏                                                  | 5094/15000 [04:13<09:25, 17.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▏                                                  | 5100/15000 [04:13<08:14, 20.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▏                                                  | 5106/15000 [04:13<10:28, 15.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▏                                                  | 5112/15000 [04:14<08:46, 18.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████▎                                                  | 5115/15000 [04:14<08:22, 19.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▎                                                  | 5118/15000 [04:14<07:58, 20.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [ True False  True False False False False  True  True False False  True
 False False False  True  True False  True  True False  True False False
 False False  True  True  True  True False False False  True  True False
 False False  True False  True False False False  True False False  True
 False False False False False False False False False False False  True
  True False  True  True]
<<<<< Epsilon in training is 0.283951786063016>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 46

 34%|██████████████████████████▎                                                  | 5124/15000 [04:15<15:11, 10.84it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████▎                                                  | 5127/15000 [04:15<12:46, 12.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▎                                                  | 5133/15000 [04:15<09:54, 16.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▎                                                  | 5136/15000 [04:15<09:05, 18.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▍                                                  | 5142/15000 [04:16<08:19, 19.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▍                                                  | 5145/15000 [04:16<08:21, 19.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████▍                                                  | 5151/15000 [04:16<07:59, 20.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▍                                                  | 5154/15000 [04:16<07:47, 21.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 34%|██████████████████████████▍                                                  | 5160/15000 [04:16<07:29, 21.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▌                                                  | 5166/15000 [04:17<07:19, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▌                                                  | 5169/15000 [04:17<07:14, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 34%|██████████████████████████▌                                                  | 5175/15000 [04:17<07:11, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▌                                                  | 5181/15000 [04:17<07:09, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▋                                                  | 5187/15000 [04:18<07:11, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▋                                                  | 5193/15000 [04:18<07:12, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▋                                                  | 5199/15000 [04:18<07:07, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▋                                                  | 5202/15000 [04:18<07:17, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▋                                                  | 5208/15000 [04:19<07:12, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|██████████████████████████▋                                                  | 5211/15000 [04:19<07:09, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▊                                                  | 5217/15000 [04:19<07:06, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|██████████████████████████▊                                                  | 5223/15000 [04:19<07:06, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▊                                                  | 5229/15000 [04:19<07:17, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▊                                                  | 5235/15000 [04:20<07:22, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▉                                                  | 5238/15000 [04:20<07:31, 21.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▉                                                  | 5244/15000 [04:20<09:44, 16.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|██████████████████████████▉                                                  | 5250/15000 [04:21<08:22, 19.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▉                                                  | 5253/15000 [04:21<08:00, 20.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|██████████████████████████▉                                                  | 5259/15000 [04:21<07:31, 21.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████                                                  | 5265/15000 [04:21<07:17, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|███████████████████████████                                                  | 5268/15000 [04:21<07:15, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████                                                  | 5274/15000 [04:22<07:11, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|███████████████████████████                                                  | 5280/15000 [04:22<07:05, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████▏                                                 | 5286/15000 [04:22<07:02, 22.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|███████████████████████████▏                                                 | 5292/15000 [04:22<07:00, 23.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████▏                                                 | 5295/15000 [04:23<07:03, 22.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████▏                                                 | 5301/15000 [04:23<07:00, 23.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|███████████████████████████▏                                                 | 5307/15000 [04:23<06:58, 23.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████▎                                                 | 5310/15000 [04:23<07:09, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 35%|███████████████████████████▎                                                 | 5316/15000 [04:24<07:14, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 35%|███████████████████████████▎                                                 | 5322/15000 [04:24<07:12, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▎                                                 | 5325/15000 [04:24<07:08, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▎                                                 | 5331/15000 [04:24<10:41, 15.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▍                                                 | 5337/15000 [04:25<08:51, 18.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▍                                                 | 5340/15000 [04:25<08:24, 19.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▍                                                 | 5346/15000 [04:25<07:45, 20.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▍                                                 | 5352/15000 [04:25<07:25, 21.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▍                                                 | 5355/15000 [04:25<07:17, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▌                                                 | 5361/15000 [04:26<07:11, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▌                                                 | 5367/15000 [04:26<07:07, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▌                                                 | 5370/15000 [04:26<07:05, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▌                                                 | 5376/15000 [04:26<07:22, 21.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650] and comparison is [False False  True False False  True False  True  True False False  True
 False False False False False False False False  True False False False
 False False  True False False False False  True  True False  True  True
 False False False False False False False False False False False False
 False  True False False False  True False False False False  True  True
 False False False False]
<<<<< Epsilon in training is 0.27912827741229157>>>>>>>
<<<<< next_action is [ 4650  4650  4650  4650  4650  4650  4650  4650  4650  4650  4650  4650
  4650  4650  4650  4650  4650  4650  4650  4650  4650  4650  4650  4650
  46

 36%|███████████████████████████▌                                                 | 5379/15000 [04:27<12:54, 12.42it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▋                                                 | 5384/15000 [04:27<10:19, 15.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▋                                                 | 5387/15000 [04:27<09:22, 17.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▋                                                 | 5393/15000 [04:28<08:11, 19.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▋                                                 | 5399/15000 [04:28<07:30, 21.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▋                                                 | 5405/15000 [04:28<07:12, 22.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▊                                                 | 5408/15000 [04:28<07:09, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▊                                                 | 5414/15000 [04:28<07:01, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▊                                                 | 5417/15000 [04:29<07:03, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|███████████████████████████▊                                                 | 5423/15000 [04:29<06:59, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▊                                                 | 5429/15000 [04:29<06:58, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▉                                                 | 5435/15000 [04:29<06:55, 23.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▉                                                 | 5441/15000 [04:30<06:53, 23.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▉                                                 | 5444/15000 [04:30<06:57, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|███████████████████████████▉                                                 | 5450/15000 [04:30<06:55, 23.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|████████████████████████████                                                 | 5456/15000 [04:30<06:59, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|████████████████████████████                                                 | 5462/15000 [04:31<06:58, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 36%|████████████████████████████                                                 | 5465/15000 [04:31<06:59, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 36%|████████████████████████████                                                 | 5471/15000 [04:31<06:59, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████                                                 | 5477/15000 [04:31<07:02, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▏                                                | 5483/15000 [04:31<06:59, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▏                                                | 5489/15000 [04:32<06:57, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▏                                                | 5492/15000 [04:32<06:57, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▏                                                | 5498/15000 [04:32<07:13, 21.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▏                                                | 5501/15000 [04:32<07:21, 21.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▎                                                | 5507/15000 [04:33<09:31, 16.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▎                                                | 5510/15000 [04:33<08:45, 18.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▎                                                | 5516/15000 [04:33<07:47, 20.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▎                                                | 5522/15000 [04:33<07:19, 21.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▎                                                | 5525/15000 [04:34<07:16, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▍                                                | 5531/15000 [04:34<07:07, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▍                                                | 5537/15000 [04:34<06:59, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▍                                                | 5543/15000 [04:34<07:03, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▍                                                | 5546/15000 [04:35<07:01, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▍                                                | 5549/15000 [04:35<07:11, 21.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▌                                                | 5555/15000 [04:35<09:24, 16.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▌                                                | 5558/15000 [04:35<08:39, 18.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▌                                                | 5563/15000 [04:36<11:37, 13.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▌                                                | 5566/15000 [04:36<10:09, 15.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▌                                                | 5571/15000 [04:36<08:43, 18.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▋                                                | 5577/15000 [04:36<07:49, 20.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▋                                                | 5580/15000 [04:37<07:30, 20.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▋                                                | 5586/15000 [04:37<07:08, 21.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▋                                                | 5589/15000 [04:37<07:19, 21.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▋                                                | 5595/15000 [04:37<07:09, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▊                                                | 5601/15000 [04:37<07:00, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▊                                                | 5604/15000 [04:38<06:58, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▊                                                | 5610/15000 [04:38<06:54, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 37%|████████████████████████████▊                                                | 5616/15000 [04:38<06:54, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 37%|████████████████████████████▊                                                | 5619/15000 [04:38<06:57, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|████████████████████████████▉                                                | 5625/15000 [04:39<06:56, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|████████████████████████████▉                                                | 5631/15000 [04:39<06:51, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650] and comparison is [False  True False  True False  True False False False False False  True
 False False False  True False False False  True False False False False
 False  True False False False False False False False False  True False
 False False False  True False  True  True  True  True False False False
 False  True False False False False  True False False False False  True
 False False False False]
<<<<< Epsilon in training is 0.2743867060369976>>>>>>>
<<<<< next_action is [465

 38%|████████████████████████████▉                                                | 5634/15000 [04:39<12:21, 12.64it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|████████████████████████████▉                                                | 5640/15000 [04:40<09:34, 16.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|████████████████████████████▉                                                | 5646/15000 [04:40<08:12, 19.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████                                                | 5652/15000 [04:40<07:27, 20.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████                                                | 5655/15000 [04:40<07:14, 21.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████                                                | 5661/15000 [04:41<07:22, 21.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████                                                | 5666/15000 [04:41<10:17, 15.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████                                                | 5671/15000 [04:41<08:56, 17.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▏                                               | 5674/15000 [04:41<08:13, 18.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▏                                               | 5680/15000 [04:42<07:49, 19.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▏                                               | 5686/15000 [04:42<07:17, 21.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▏                                               | 5689/15000 [04:42<07:08, 21.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▏                                               | 5695/15000 [04:42<06:56, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▏                                               | 5698/15000 [04:42<07:04, 21.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▎                                               | 5701/15000 [04:43<10:08, 15.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▎                                               | 5707/15000 [04:43<08:26, 18.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▎                                               | 5713/15000 [04:43<07:32, 20.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▎                                               | 5716/15000 [04:43<07:22, 20.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▎                                               | 5722/15000 [04:44<07:03, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▍                                               | 5728/15000 [04:44<06:56, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▍                                               | 5734/15000 [04:44<06:49, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▍                                               | 5737/15000 [04:44<06:47, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▍                                               | 5743/15000 [04:45<06:46, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▍                                               | 5746/15000 [04:45<06:47, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▌                                               | 5752/15000 [04:45<06:45, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▌                                               | 5758/15000 [04:45<06:54, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▌                                               | 5764/15000 [04:46<06:49, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 38%|█████████████████████████████▌                                               | 5767/15000 [04:46<07:04, 21.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 38%|█████████████████████████████▋                                               | 5773/15000 [04:46<07:06, 21.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▋                                               | 5776/15000 [04:46<07:03, 21.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▋                                               | 5782/15000 [04:46<06:58, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▋                                               | 5785/15000 [04:47<10:05, 15.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|█████████████████████████████▋                                               | 5791/15000 [04:47<08:22, 18.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▊                                               | 5797/15000 [04:47<07:32, 20.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▊                                               | 5803/15000 [04:48<07:07, 21.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|█████████████████████████████▊                                               | 5806/15000 [04:48<06:59, 21.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▊                                               | 5812/15000 [04:48<06:53, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|█████████████████████████████▊                                               | 5818/15000 [04:48<06:44, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|█████████████████████████████▉                                               | 5821/15000 [04:48<06:43, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|█████████████████████████████▉                                               | 5827/15000 [04:49<06:50, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▉                                               | 5833/15000 [04:49<06:44, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|█████████████████████████████▉                                               | 5839/15000 [04:49<06:40, 22.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|██████████████████████████████                                               | 5845/15000 [04:49<06:42, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|██████████████████████████████                                               | 5848/15000 [04:50<06:42, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████                                               | 5854/15000 [04:50<06:41, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████                                               | 5860/15000 [04:50<06:39, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████                                               | 5866/15000 [04:50<06:38, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|██████████████████████████████▏                                              | 5869/15000 [04:50<06:40, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|██████████████████████████████▏                                              | 5872/15000 [04:51<06:53, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 39%|██████████████████████████████▏                                              | 5878/15000 [04:51<09:14, 16.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▏                                              | 5884/15000 [04:51<07:57, 19.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▏                                              | 5887/15000 [04:51<07:33, 20.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650] and comparison is [False  True False  True False False  True False False False False  True
  True False False False False  True False False  True False False False
 False False False  True False False False False  True False  True  True
 False False  True False False False False False False False False False
  True  True  True False False False  True False False False  True  True
 False False False False]
<<<<< Epsilon in training is 0.269725680062963>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4

 39%|██████████████████████████████▏                                              | 5890/15000 [04:52<12:47, 11.87it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▎                                              | 5896/15000 [04:52<09:41, 15.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▎                                              | 5902/15000 [04:52<08:09, 18.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▎                                              | 5908/15000 [04:53<07:22, 20.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▎                                              | 5914/15000 [04:53<06:58, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▎                                              | 5917/15000 [04:53<06:51, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 39%|██████████████████████████████▍                                              | 5923/15000 [04:53<06:42, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▍                                              | 5929/15000 [04:54<06:36, 22.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▍                                              | 5932/15000 [04:54<06:38, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▍                                              | 5938/15000 [04:54<06:39, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▍                                              | 5941/15000 [04:54<06:39, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▌                                              | 5947/15000 [04:54<06:38, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▌                                              | 5953/15000 [04:55<06:34, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▌                                              | 5959/15000 [04:55<06:40, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▌                                              | 5965/15000 [04:55<06:36, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▋                                              | 5968/15000 [04:55<06:40, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▋                                              | 5974/15000 [04:56<06:43, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▋                                              | 5980/15000 [04:56<06:39, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|██████████████████████████████▋                                              | 5983/15000 [04:56<06:42, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▋                                              | 5989/15000 [04:56<06:43, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▊                                              | 5992/15000 [04:56<06:42, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|██████████████████████████████▊                                              | 5998/15000 [04:57<06:54, 21.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▊                                              | 6003/15000 [04:58<22:57,  6.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▊                                              | 6007/15000 [04:59<16:16,  9.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|██████████████████████████████▊                                              | 6013/15000 [04:59<10:42, 13.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▉                                              | 6019/15000 [04:59<08:39, 17.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|██████████████████████████████▉                                              | 6022/15000 [04:59<08:37, 17.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|██████████████████████████████▉                                              | 6025/15000 [05:00<11:26, 13.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|██████████████████████████████▉                                              | 6031/15000 [05:00<08:53, 16.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|██████████████████████████████▉                                              | 6034/15000 [05:00<08:13, 18.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|███████████████████████████████                                              | 6040/15000 [05:00<07:26, 20.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|███████████████████████████████                                              | 6046/15000 [05:01<06:59, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|███████████████████████████████                                              | 6052/15000 [05:01<06:46, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 40%|███████████████████████████████                                              | 6055/15000 [05:01<06:40, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|███████████████████████████████                                              | 6061/15000 [05:01<06:40, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|███████████████████████████████▏                                             | 6067/15000 [05:01<06:35, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 40%|███████████████████████████████▏                                             | 6073/15000 [05:02<06:32, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▏                                             | 6076/15000 [05:02<06:33, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▏                                             | 6082/15000 [05:02<06:30, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▏                                             | 6085/15000 [05:02<06:44, 22.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▎                                             | 6091/15000 [05:03<06:55, 21.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▎                                             | 6094/15000 [05:03<06:46, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▎                                             | 6097/15000 [05:03<07:00, 21.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▎                                             | 6103/15000 [05:03<09:19, 15.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▎                                             | 6106/15000 [05:03<08:30, 17.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▎                                             | 6111/15000 [05:04<07:44, 19.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▍                                             | 6117/15000 [05:04<07:06, 20.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▍                                             | 6123/15000 [05:04<06:44, 21.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▍                                             | 6126/15000 [05:04<06:40, 22.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▍                                             | 6132/15000 [05:05<06:34, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▌                                             | 6138/15000 [05:05<06:30, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▌                                             | 6141/15000 [05:05<09:30, 15.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▌                                             | 6144/15000 [05:05<08:33, 17.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False  True False  True False False False  True  True False False
 False False False False  True False False  True False False False  True
 False False False  True False False False  True False  True False False
 False False False False  True False False False  True  True False False
 False False False False False False False False False False False  True
 False False False False]
<<<<< Epsilon in training is 0.26514383125987956>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650]>>>>>>>
<<<<< next_a

 41%|███████████████████████████████▌                                             | 6147/15000 [05:06<13:12, 11.17it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▌                                             | 6150/15000 [05:06<11:09, 13.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▌                                             | 6156/15000 [05:06<08:43, 16.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▋                                             | 6162/15000 [05:06<07:31, 19.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▋                                             | 6168/15000 [05:07<07:02, 20.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▋                                             | 6174/15000 [05:07<06:45, 21.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▋                                             | 6180/15000 [05:07<06:36, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▋                                             | 6183/15000 [05:07<06:33, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▊                                             | 6189/15000 [05:08<06:27, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▊                                             | 6195/15000 [05:08<06:37, 22.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 41%|███████████████████████████████▊                                             | 6201/15000 [05:08<06:33, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▊                                             | 6204/15000 [05:08<06:30, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▉                                             | 6210/15000 [05:09<06:26, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▉                                             | 6216/15000 [05:09<06:25, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 41%|███████████████████████████████▉                                             | 6219/15000 [05:09<06:27, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|███████████████████████████████▉                                             | 6225/15000 [05:09<06:27, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|███████████████████████████████▉                                             | 6231/15000 [05:10<06:24, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████                                             | 6234/15000 [05:10<06:25, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████                                             | 6240/15000 [05:10<09:05, 16.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████                                             | 6245/15000 [05:10<08:07, 17.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 42%|████████████████████████████████                                             | 6251/15000 [05:11<07:09, 20.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████                                             | 6254/15000 [05:11<07:10, 20.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▏                                            | 6260/15000 [05:11<07:22, 19.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▏                                            | 6263/15000 [05:11<10:10, 14.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▏                                            | 6269/15000 [05:12<08:14, 17.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▏                                            | 6275/15000 [05:12<07:18, 19.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▏                                            | 6278/15000 [05:12<07:01, 20.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▎                                            | 6284/15000 [05:12<07:02, 20.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 42%|████████████████████████████████▎                                            | 6287/15000 [05:13<06:51, 21.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▎                                            | 6293/15000 [05:13<08:47, 16.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▎                                            | 6296/15000 [05:13<08:06, 17.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▎                                            | 6302/15000 [05:13<07:13, 20.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▍                                            | 6308/15000 [05:14<06:47, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▍                                            | 6311/15000 [05:14<06:43, 21.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▍                                            | 6317/15000 [05:14<06:42, 21.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 42%|████████████████████████████████▍                                            | 6320/15000 [05:14<09:37, 15.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▍                                            | 6324/15000 [05:15<08:58, 16.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▍                                            | 6330/15000 [05:15<09:53, 14.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▌                                            | 6335/15000 [05:15<08:32, 16.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▌                                            | 6338/15000 [05:16<07:54, 18.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▌                                            | 6344/15000 [05:16<07:04, 20.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▌                                            | 6350/15000 [05:16<06:47, 21.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▋                                            | 6356/15000 [05:16<06:31, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▋                                            | 6359/15000 [05:16<06:27, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▋                                            | 6365/15000 [05:17<06:31, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▋                                            | 6368/15000 [05:17<06:30, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 42%|████████████████████████████████▋                                            | 6374/15000 [05:17<06:29, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▊                                            | 6380/15000 [05:17<06:22, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▊                                            | 6383/15000 [05:18<06:23, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|████████████████████████████████▊                                            | 6389/15000 [05:18<06:19, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▊                                            | 6395/15000 [05:18<06:18, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▊                                            | 6398/15000 [05:18<06:19, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650] and comparison is [False  True False False False False  True False  True False  True False
 False  True False  True  True False False  True False  True False False
 False  True False False False  True False False False  True False False
 False False False False False False False  True False False False False
 False False  True False  True False  True False False False False  True
 False  True  True False]
<<<<< Epsilon in training is 0.2606398146396621>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 46

 43%|████████████████████████████████▊                                            | 6401/15000 [05:19<11:18, 12.68it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|████████████████████████████████▉                                            | 6407/15000 [05:19<08:46, 16.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▉                                            | 6413/15000 [05:19<07:31, 19.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▉                                            | 6416/15000 [05:19<07:07, 20.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▉                                            | 6422/15000 [05:20<06:40, 21.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|████████████████████████████████▉                                            | 6428/15000 [05:20<06:32, 21.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████                                            | 6431/15000 [05:20<06:25, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████                                            | 6437/15000 [05:20<06:30, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████                                            | 6443/15000 [05:21<06:22, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████                                            | 6446/15000 [05:21<06:25, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████                                            | 6452/15000 [05:21<06:51, 20.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████▏                                           | 6458/15000 [05:22<08:58, 15.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████▏                                           | 6461/15000 [05:22<08:10, 17.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████▏                                           | 6466/15000 [05:22<09:43, 14.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▏                                           | 6472/15000 [05:22<07:54, 17.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▏                                           | 6475/15000 [05:22<07:20, 19.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████▎                                           | 6481/15000 [05:23<06:45, 20.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████▎                                           | 6484/15000 [05:23<09:27, 15.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▎                                           | 6490/15000 [05:23<07:50, 18.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▎                                           | 6493/15000 [05:23<07:20, 19.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▎                                           | 6499/15000 [05:24<06:50, 20.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 43%|█████████████████████████████████▍                                           | 6505/15000 [05:24<06:33, 21.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▍                                           | 6508/15000 [05:24<06:30, 21.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▍                                           | 6514/15000 [05:24<06:22, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▍                                           | 6520/15000 [05:25<06:16, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 43%|█████████████████████████████████▍                                           | 6523/15000 [05:25<06:18, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▌                                           | 6526/15000 [05:25<06:19, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|█████████████████████████████████▌                                           | 6532/15000 [05:25<08:12, 17.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▌                                           | 6535/15000 [05:26<07:36, 18.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▌                                           | 6541/15000 [05:26<06:59, 20.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▌                                           | 6547/15000 [05:26<06:34, 21.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|█████████████████████████████████▋                                           | 6553/15000 [05:26<06:25, 21.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▋                                           | 6556/15000 [05:26<06:20, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▋                                           | 6562/15000 [05:27<06:15, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▋                                           | 6568/15000 [05:27<06:13, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▋                                           | 6571/15000 [05:27<06:14, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▊                                           | 6577/15000 [05:27<06:08, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▊                                           | 6580/15000 [05:28<06:10, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|█████████████████████████████████▊                                           | 6586/15000 [05:28<06:12, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▊                                           | 6592/15000 [05:28<06:09, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▊                                           | 6598/15000 [05:28<06:10, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|█████████████████████████████████▉                                           | 6604/15000 [05:29<06:12, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|█████████████████████████████████▉                                           | 6610/15000 [05:29<06:14, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|█████████████████████████████████▉                                           | 6613/15000 [05:29<06:11, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|█████████████████████████████████▉                                           | 6619/15000 [05:29<06:12, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|██████████████████████████████████                                           | 6625/15000 [05:30<06:10, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|██████████████████████████████████                                           | 6628/15000 [05:30<06:13, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|██████████████████████████████████                                           | 6634/15000 [05:30<06:10, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|██████████████████████████████████                                           | 6640/15000 [05:30<06:13, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|██████████████████████████████████                                           | 6646/15000 [05:30<06:23, 21.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|██████████████████████████████████▏                                          | 6649/15000 [05:31<06:17, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|██████████████████████████████████▏                                          | 6655/15000 [05:31<06:10, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False  True False False  True  True False False  True False False
 False  True  True False  True False  True False  True False  True False
  True False False  True  True False  True False  True False False False
 False False False False False False  True False False  True False False
  True False False False False False False False False  True False False
 False False  True  True]
<<<<< Epsilon in training is 0.25621230806163126>>>>>>>
<<<<< n

 44%|██████████████████████████████████▏                                          | 6658/15000 [05:31<11:13, 12.38it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 44%|██████████████████████████████████▏                                          | 6664/15000 [05:32<08:36, 16.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|██████████████████████████████████▏                                          | 6670/15000 [05:32<07:18, 19.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 44%|██████████████████████████████████▎                                          | 6673/15000 [05:32<07:01, 19.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▎                                          | 6679/15000 [05:32<06:34, 21.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▎                                          | 6685/15000 [05:33<06:21, 21.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▎                                          | 6688/15000 [05:33<06:16, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▎                                          | 6691/15000 [05:33<06:13, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▎                                          | 6696/15000 [05:33<08:44, 15.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▍                                          | 6699/15000 [05:33<07:51, 17.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▍                                          | 6703/15000 [05:34<07:33, 18.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▍                                          | 6709/15000 [05:34<06:43, 20.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▍                                          | 6715/15000 [05:34<06:22, 21.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▍                                          | 6718/15000 [05:34<06:16, 22.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▌                                          | 6724/15000 [05:35<06:08, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▌                                          | 6730/15000 [05:35<06:03, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▌                                          | 6733/15000 [05:35<06:05, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▌                                          | 6739/15000 [05:35<06:07, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▌                                          | 6744/15000 [05:36<08:30, 16.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▋                                          | 6749/15000 [05:36<07:33, 18.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▋                                          | 6755/15000 [05:36<06:41, 20.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▋                                          | 6758/15000 [05:36<06:30, 21.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▋                                          | 6764/15000 [05:37<06:14, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▊                                          | 6770/15000 [05:37<06:09, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▊                                          | 6773/15000 [05:37<06:06, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▊                                          | 6779/15000 [05:37<06:03, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▊                                          | 6785/15000 [05:37<06:00, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▊                                          | 6791/15000 [05:38<05:58, 22.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▉                                          | 6797/15000 [05:38<06:01, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|██████████████████████████████████▉                                          | 6800/15000 [05:38<06:04, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▉                                          | 6806/15000 [05:38<06:06, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▉                                          | 6809/15000 [05:39<06:05, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 45%|██████████████████████████████████▉                                          | 6815/15000 [05:39<06:00, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 45%|███████████████████████████████████                                          | 6821/15000 [05:39<05:59, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████                                          | 6827/15000 [05:39<05:59, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████                                          | 6833/15000 [05:40<05:58, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████                                          | 6839/15000 [05:40<05:56, 22.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████                                          | 6842/15000 [05:40<05:58, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▏                                         | 6848/15000 [05:40<05:58, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▏                                         | 6854/15000 [05:41<05:58, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▏                                         | 6857/15000 [05:41<05:57, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▏                                         | 6863/15000 [05:41<06:11, 21.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▏                                         | 6866/15000 [05:41<06:45, 20.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▎                                         | 6872/15000 [05:42<08:38, 15.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▎                                         | 6877/15000 [05:42<07:39, 17.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▎                                         | 6882/15000 [05:42<06:59, 19.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▎                                         | 6885/15000 [05:42<06:41, 20.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▎                                         | 6891/15000 [05:43<06:54, 19.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▍                                         | 6897/15000 [05:43<08:19, 16.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▍                                         | 6903/15000 [05:43<07:11, 18.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▍                                         | 6909/15000 [05:44<06:29, 20.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▍                                         | 6912/15000 [05:44<06:18, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650] and comparison is [False  True False False False False  True False False False  True False
 False  True False False False False False  True False False False False
  True False False False False False False False  True False False  True
 False False  True  True False False False False False False  True False
  True False False False  True  True False  True False False  True  True
 False False  True False]
<<<<< Epsilon in training is 0.2518600118444027>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 

 46%|███████████████████████████████████▍                                         | 6915/15000 [05:44<11:04, 12.17it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▌                                         | 6918/15000 [05:44<09:35, 14.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▌                                         | 6924/15000 [05:45<07:40, 17.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▌                                         | 6930/15000 [05:45<06:49, 19.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▌                                         | 6936/15000 [05:45<06:22, 21.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▌                                         | 6939/15000 [05:45<06:15, 21.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▋                                         | 6945/15000 [05:45<06:08, 21.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 46%|███████████████████████████████████▋                                         | 6948/15000 [05:46<06:05, 22.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▋                                         | 6954/15000 [05:46<05:57, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▋                                         | 6960/15000 [05:46<05:55, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▊                                         | 6966/15000 [05:46<05:53, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▊                                         | 6969/15000 [05:47<05:52, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 46%|███████████████████████████████████▊                                         | 6975/15000 [05:47<06:05, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|███████████████████████████████████▊                                         | 6978/15000 [05:47<06:00, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 47%|███████████████████████████████████▊                                         | 6984/15000 [05:47<06:04, 21.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|███████████████████████████████████▊                                         | 6987/15000 [05:47<06:00, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|███████████████████████████████████▉                                         | 6993/15000 [05:48<05:55, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|███████████████████████████████████▉                                         | 6999/15000 [05:48<05:52, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|███████████████████████████████████▉                                         | 7004/15000 [05:49<19:46,  6.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|███████████████████████████████████▉                                         | 7009/15000 [05:50<15:35,  8.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████                                         | 7015/15000 [05:50<10:15, 12.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████                                         | 7018/15000 [05:50<08:51, 15.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████                                         | 7024/15000 [05:51<07:19, 18.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████                                         | 7030/15000 [05:51<06:31, 20.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████                                         | 7036/15000 [05:51<06:09, 21.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▏                                        | 7039/15000 [05:51<06:01, 22.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▏                                        | 7045/15000 [05:51<05:53, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▏                                        | 7048/15000 [05:52<05:52, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▏                                        | 7054/15000 [05:52<05:47, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 47%|████████████████████████████████████▏                                        | 7060/15000 [05:52<05:47, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▎                                        | 7063/15000 [05:52<05:49, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▎                                        | 7069/15000 [05:53<05:51, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▎                                        | 7075/15000 [05:53<05:48, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▎                                        | 7081/15000 [05:53<05:45, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▎                                        | 7084/15000 [05:53<05:48, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 47%|████████████████████████████████████▍                                        | 7090/15000 [05:53<05:51, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 47%|████████████████████████████████████▍                                        | 7096/15000 [05:54<05:47, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▍                                        | 7099/15000 [05:54<05:48, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 47%|████████████████████████████████████▍                                        | 7105/15000 [05:54<05:45, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▌                                        | 7111/15000 [05:54<05:47, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▌                                        | 7117/15000 [05:55<06:01, 21.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 47%|████████████████████████████████████▌                                        | 7123/15000 [05:55<05:53, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▌                                        | 7126/15000 [05:55<05:51, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▌                                        | 7132/15000 [05:55<05:52, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▋                                        | 7138/15000 [05:56<05:50, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▋                                        | 7141/15000 [05:56<05:46, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▋                                        | 7147/15000 [05:56<05:44, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|████████████████████████████████████▋                                        | 7153/15000 [05:56<05:43, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▋                                        | 7156/15000 [05:56<05:44, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▊                                        | 7162/15000 [05:57<05:41, 22.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▊                                        | 7168/15000 [05:57<05:39, 23.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False  True False False  True False
  True False  True False False  True False False  True False False  True
 False False False False False False False False False False  True False
 False  True False False  True False False False False False False False
 False False False False False  True False False False False False False
 False  True False  True]
<<<<< Epsilon in training is 0.24758164838437>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 

 48%|████████████████████████████████████▊                                        | 7171/15000 [05:57<10:18, 12.67it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▊                                        | 7174/15000 [05:58<08:55, 14.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|████████████████████████████████████▊                                        | 7180/15000 [05:58<07:17, 17.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▉                                        | 7186/15000 [05:58<06:33, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|████████████████████████████████████▉                                        | 7192/15000 [05:58<06:06, 21.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|████████████████████████████████████▉                                        | 7195/15000 [05:58<06:02, 21.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|████████████████████████████████████▉                                        | 7201/15000 [05:59<05:54, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|████████████████████████████████████▉                                        | 7207/15000 [05:59<05:46, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████                                        | 7210/15000 [05:59<05:44, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████                                        | 7216/15000 [05:59<05:43, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████                                        | 7222/15000 [06:00<05:46, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████                                        | 7225/15000 [06:00<05:56, 21.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████                                        | 7231/15000 [06:00<05:50, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████▏                                       | 7237/15000 [06:00<05:50, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|█████████████████████████████████████▏                                       | 7243/15000 [06:01<05:45, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 48%|█████████████████████████████████████▏                                       | 7249/15000 [06:01<05:38, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████▏                                       | 7252/15000 [06:01<05:37, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████▎                                       | 7258/15000 [06:01<05:37, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████▎                                       | 7261/15000 [06:01<05:45, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████▎                                       | 7267/15000 [06:02<06:05, 21.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 48%|█████████████████████████████████████▎                                       | 7273/15000 [06:02<06:09, 20.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▎                                       | 7276/15000 [06:02<06:22, 20.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▎                                       | 7279/15000 [06:03<08:52, 14.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▍                                       | 7285/15000 [06:03<07:14, 17.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▍                                       | 7288/15000 [06:03<06:48, 18.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▍                                       | 7294/15000 [06:03<06:13, 20.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▍                                       | 7300/15000 [06:03<05:52, 21.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▌                                       | 7306/15000 [06:04<05:41, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▌                                       | 7309/15000 [06:04<05:39, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▌                                       | 7315/15000 [06:04<05:37, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▌                                       | 7321/15000 [06:04<05:36, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▌                                       | 7327/15000 [06:05<05:37, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▋                                       | 7330/15000 [06:05<05:47, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▋                                       | 7336/15000 [06:05<05:49, 21.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▋                                       | 7342/15000 [06:05<05:44, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▋                                       | 7345/15000 [06:05<05:52, 21.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▋                                       | 7351/15000 [06:06<06:11, 20.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▊                                       | 7357/15000 [06:06<07:46, 16.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▊                                       | 7360/15000 [06:06<07:07, 17.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▊                                       | 7366/15000 [06:07<06:24, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▊                                       | 7372/15000 [06:07<05:58, 21.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▊                                       | 7378/15000 [06:07<05:43, 22.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|█████████████████████████████████████▉                                       | 7381/15000 [06:07<05:51, 21.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▉                                       | 7387/15000 [06:08<05:54, 21.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▉                                       | 7390/15000 [06:08<05:47, 21.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▉                                       | 7396/15000 [06:08<05:37, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|█████████████████████████████████████▉                                       | 7402/15000 [06:08<05:32, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|██████████████████████████████████████                                       | 7405/15000 [06:08<05:34, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|██████████████████████████████████████                                       | 7411/15000 [06:09<05:33, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 49%|██████████████████████████████████████                                       | 7417/15000 [06:09<05:30, 22.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 49%|██████████████████████████████████████                                       | 7423/15000 [06:09<05:32, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650] and comparison is [False False False False  True False  True  True False False  True False
 False False False False False False False  True False  True  True  True
 False False False  True False  True  True False False False  True False
  True False False  True False False  True False False False False  True
 False False False False False False False  True False  True False False
 False False  True False]
<<<<< Epsilon 

 50%|██████████████████████████████████████▏                                      | 7429/15000 [06:10<08:40, 14.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▏                                      | 7432/15000 [06:10<07:46, 16.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▏                                      | 7438/15000 [06:10<06:38, 18.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▏                                      | 7444/15000 [06:10<06:03, 20.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 50%|██████████████████████████████████████▏                                      | 7447/15000 [06:11<05:52, 21.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▎                                      | 7453/15000 [06:11<05:40, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▎                                      | 7459/15000 [06:11<05:35, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▎                                      | 7465/15000 [06:11<05:32, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▎                                      | 7471/15000 [06:12<05:29, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▎                                      | 7474/15000 [06:12<05:27, 22.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▍                                      | 7480/15000 [06:12<05:26, 23.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▍                                      | 7483/15000 [06:12<05:27, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▍                                      | 7489/15000 [06:12<05:29, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▍                                      | 7495/15000 [06:13<05:30, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▍                                      | 7498/15000 [06:13<05:31, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▌                                      | 7504/15000 [06:13<05:32, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 50%|██████████████████████████████████████▌                                      | 7510/15000 [06:13<05:29, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▌                                      | 7516/15000 [06:14<05:28, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▌                                      | 7519/15000 [06:14<05:36, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 50%|██████████████████████████████████████▋                                      | 7525/15000 [06:14<07:21, 16.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▋                                      | 7531/15000 [06:14<06:21, 19.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▋                                      | 7534/15000 [06:15<06:04, 20.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 50%|██████████████████████████████████████▋                                      | 7540/15000 [06:15<05:44, 21.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▋                                      | 7543/15000 [06:15<05:42, 21.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▊                                      | 7549/15000 [06:15<05:35, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▊                                      | 7555/15000 [06:16<05:28, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▊                                      | 7561/15000 [06:16<05:24, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 50%|██████████████████████████████████████▊                                      | 7564/15000 [06:16<05:25, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 50%|██████████████████████████████████████▊                                      | 7570/15000 [06:16<05:24, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 50%|██████████████████████████████████████▊                                      | 7573/15000 [06:16<05:24, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|██████████████████████████████████████▉                                      | 7579/15000 [06:17<05:24, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|██████████████████████████████████████▉                                      | 7585/15000 [06:17<05:25, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|██████████████████████████████████████▉                                      | 7591/15000 [06:17<05:23, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|██████████████████████████████████████▉                                      | 7594/15000 [06:17<05:25, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████                                      | 7600/15000 [06:17<05:29, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 51%|███████████████████████████████████████                                      | 7606/15000 [06:18<05:24, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████                                      | 7609/15000 [06:18<05:24, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████                                      | 7615/15000 [06:18<05:25, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████                                      | 7621/15000 [06:18<05:25, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 51%|███████████████████████████████████████▏                                     | 7627/15000 [06:19<05:22, 22.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 51%|███████████████████████████████████████▏                                     | 7630/15000 [06:19<05:21, 22.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 51%|███████████████████████████████████████▏                                     | 7636/15000 [06:19<05:21, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▏                                     | 7642/15000 [06:19<05:22, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▏                                     | 7645/15000 [06:19<05:22, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▎                                     | 7651/15000 [06:20<05:24, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▎                                     | 7657/15000 [06:20<05:23, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▎                                     | 7663/15000 [06:20<05:26, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▎                                     | 7666/15000 [06:20<05:25, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▍                                     | 7672/15000 [06:21<05:27, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▍                                     | 7678/15000 [06:21<05:27, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False  True  True False  True  True False False False
  True False  True False False  True  True False False False  True False
 False  True False False False False False False False False False False
  True False False False False  True  True False False False False False
 False False  True False False  True  True False  True False  True False
  True  True False  True]
<<<<< Ep

 51%|███████████████████████████████████████▍                                     | 7684/15000 [06:22<08:26, 14.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▍                                     | 7690/15000 [06:22<06:50, 17.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▌                                     | 7696/15000 [06:22<06:03, 20.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▌                                     | 7702/15000 [06:22<05:46, 21.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▌                                     | 7705/15000 [06:22<05:47, 20.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▌                                     | 7711/15000 [06:23<05:32, 21.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▌                                     | 7717/15000 [06:23<05:24, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 51%|███████████████████████████████████████▋                                     | 7723/15000 [06:23<05:21, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▋                                     | 7726/15000 [06:23<05:21, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▋                                     | 7732/15000 [06:24<05:19, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▋                                     | 7735/15000 [06:24<05:19, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▋                                     | 7741/15000 [06:24<07:22, 16.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▊                                     | 7744/15000 [06:24<06:46, 17.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▊                                     | 7749/15000 [06:25<08:18, 14.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 52%|███████████████████████████████████████▊                                     | 7754/15000 [06:25<07:06, 16.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▊                                     | 7757/15000 [06:25<06:42, 17.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▊                                     | 7763/15000 [06:26<06:15, 19.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▉                                     | 7769/15000 [06:26<07:43, 15.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▉                                     | 7775/15000 [06:26<06:27, 18.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 52%|███████████████████████████████████████▉                                     | 7778/15000 [06:26<06:08, 19.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▉                                     | 7784/15000 [06:27<05:40, 21.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|███████████████████████████████████████▉                                     | 7790/15000 [06:27<05:28, 21.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████                                     | 7793/15000 [06:27<05:25, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 52%|████████████████████████████████████████                                     | 7799/15000 [06:27<05:19, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████                                     | 7805/15000 [06:28<05:19, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 52%|████████████████████████████████████████                                     | 7811/15000 [06:28<05:19, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 52%|████████████████████████████████████████                                     | 7814/15000 [06:28<05:17, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▏                                    | 7820/15000 [06:28<05:16, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▏                                    | 7826/15000 [06:29<05:14, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▏                                    | 7832/15000 [06:29<05:12, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▏                                    | 7835/15000 [06:29<05:12, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▎                                    | 7841/15000 [06:29<05:13, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▎                                    | 7847/15000 [06:29<05:12, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▎                                    | 7853/15000 [06:30<05:10, 23.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▎                                    | 7856/15000 [06:30<05:09, 23.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▎                                    | 7862/15000 [06:30<05:11, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 52%|████████████████████████████████████████▍                                    | 7868/15000 [06:30<05:10, 22.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 52%|████████████████████████████████████████▍                                    | 7871/15000 [06:31<05:10, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▍                                    | 7877/15000 [06:31<05:12, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▍                                    | 7880/15000 [06:31<05:12, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▍                                    | 7886/15000 [06:31<05:11, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 53%|████████████████████████████████████████▌                                    | 7892/15000 [06:31<05:09, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▌                                    | 7898/15000 [06:32<05:10, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▌                                    | 7901/15000 [06:32<05:10, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▌                                    | 7907/15000 [06:32<05:11, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▌                                    | 7913/15000 [06:32<05:09, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▋                                    | 7916/15000 [06:33<05:12, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▋                                    | 7922/15000 [06:33<05:11, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▋                                    | 7928/15000 [06:33<05:45, 20.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▋                                    | 7931/15000 [06:33<08:04, 14.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▋                                    | 7934/15000 [06:34<07:17, 16.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False  True False False False False  True False False False False  True
 False False False False  True False False False  True  True False False
 False False False False False False  True False  True False False  True
  True False  True False False False  True False False False False False
 False  True False False False False False False False  True False False
 False False False False]
<<<<< Epsilon in training is 0.23517770184676212>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 465

 53%|████████████████████████████████████████▋                                    | 7937/15000 [06:34<10:44, 10.95it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▊                                    | 7943/15000 [06:34<07:57, 14.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▊                                    | 7949/15000 [06:35<06:31, 18.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▊                                    | 7952/15000 [06:35<06:05, 19.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▊                                    | 7958/15000 [06:35<05:43, 20.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▉                                    | 7964/15000 [06:35<05:23, 21.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▉                                    | 7967/15000 [06:35<05:18, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|████████████████████████████████████████▉                                    | 7973/15000 [06:36<05:29, 21.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 53%|████████████████████████████████████████▉                                    | 7976/15000 [06:36<05:22, 21.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 53%|████████████████████████████████████████▉                                    | 7982/15000 [06:36<05:15, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████                                    | 7988/15000 [06:36<05:11, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████                                    | 7994/15000 [06:37<05:11, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████                                    | 8000/15000 [06:37<07:03, 16.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████                                    | 8005/15000 [06:39<16:47,  6.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████                                    | 8007/15000 [06:39<14:39,  7.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████                                    | 8011/15000 [06:39<12:58,  8.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████▏                                   | 8016/15000 [06:39<09:05, 12.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████▏                                   | 8019/15000 [06:39<07:49, 14.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 53%|█████████████████████████████████████████▏                                   | 8023/15000 [06:40<09:12, 12.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▏                                   | 8026/15000 [06:40<07:44, 15.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 54%|█████████████████████████████████████████▏                                   | 8031/15000 [06:40<08:47, 13.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 54%|█████████████████████████████████████████▏                                   | 8034/15000 [06:41<07:34, 15.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▎                                   | 8040/15000 [06:41<06:15, 18.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▎                                   | 8043/15000 [06:41<06:11, 18.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 54%|█████████████████████████████████████████▎                                   | 8046/15000 [06:41<08:27, 13.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▎                                   | 8051/15000 [06:42<06:57, 16.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▎                                   | 8056/15000 [06:42<08:28, 13.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▎                                   | 8059/15000 [06:42<07:24, 15.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▍                                   | 8064/15000 [06:43<08:28, 13.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▍                                   | 8070/15000 [06:43<06:37, 17.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▍                                   | 8073/15000 [06:43<06:11, 18.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▍                                   | 8079/15000 [06:43<05:40, 20.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▌                                   | 8085/15000 [06:44<05:19, 21.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▌                                   | 8088/15000 [06:44<05:13, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▌                                   | 8091/15000 [06:44<05:12, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 54%|█████████████████████████████████████████▌                                   | 8097/15000 [06:44<07:19, 15.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▌                                   | 8100/15000 [06:45<06:37, 17.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▌                                   | 8106/15000 [06:45<05:49, 19.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▋                                   | 8109/15000 [06:45<05:42, 20.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▋                                   | 8112/15000 [06:45<07:58, 14.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▋                                   | 8117/15000 [06:46<06:42, 17.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▋                                   | 8123/15000 [06:46<05:58, 19.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▋                                   | 8126/15000 [06:46<05:51, 19.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▋                                   | 8132/15000 [06:46<05:32, 20.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▊                                   | 8138/15000 [06:46<05:21, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▊                                   | 8141/15000 [06:47<05:19, 21.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▊                                   | 8147/15000 [06:47<05:20, 21.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▊                                   | 8152/15000 [06:47<07:37, 14.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▊                                   | 8155/15000 [06:48<06:46, 16.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▉                                   | 8158/15000 [06:48<06:12, 18.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▉                                   | 8161/15000 [06:48<08:37, 13.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▉                                   | 8167/15000 [06:48<06:42, 16.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 54%|█████████████████████████████████████████▉                                   | 8173/15000 [06:49<05:54, 19.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|█████████████████████████████████████████▉                                   | 8176/15000 [06:49<05:39, 20.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████                                   | 8182/15000 [06:49<05:19, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████                                   | 8188/15000 [06:49<05:13, 21.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████                                   | 8191/15000 [06:49<05:09, 21.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650] and comparison is [False False  True  True False False False False False False  True  True
 False False False  True False False False  True False False False  True
 False False False False  True False False False  True False False False
 False False False False  True False  True False False False False  True
 False False False False False  True False False False False  True False
 False False  True  True]
<<<<< Epsilon in training is 0.23118272194174547>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 46

 55%|██████████████████████████████████████████                                   | 8197/15000 [06:50<07:53, 14.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████                                   | 8200/15000 [06:50<07:00, 16.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 55%|██████████████████████████████████████████                                   | 8206/15000 [06:50<05:56, 19.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▏                                  | 8212/15000 [06:51<05:28, 20.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▏                                  | 8215/15000 [06:51<05:19, 21.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▏                                  | 8221/15000 [06:51<05:07, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▏                                  | 8227/15000 [06:51<05:10, 21.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▎                                  | 8233/15000 [06:52<05:04, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▎                                  | 8239/15000 [06:52<04:58, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▎                                  | 8242/15000 [06:52<04:57, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▎                                  | 8245/15000 [06:52<05:01, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▎                                  | 8251/15000 [06:52<05:35, 20.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 55%|██████████████████████████████████████████▍                                  | 8257/15000 [06:53<05:16, 21.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▍                                  | 8260/15000 [06:53<05:09, 21.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▍                                  | 8266/15000 [06:53<05:04, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▍                                  | 8272/15000 [06:53<05:16, 21.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▍                                  | 8278/15000 [06:54<05:07, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▌                                  | 8281/15000 [06:54<05:03, 22.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▌                                  | 8287/15000 [06:54<05:01, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▌                                  | 8293/15000 [06:54<05:13, 21.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 55%|██████████████████████████████████████████▌                                  | 8296/15000 [06:55<05:17, 21.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▌                                  | 8299/15000 [06:55<07:40, 14.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▋                                  | 8304/15000 [06:55<06:27, 17.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▋                                  | 8310/15000 [06:55<05:49, 19.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▋                                  | 8315/15000 [06:56<07:35, 14.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▋                                  | 8318/15000 [06:56<06:41, 16.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 55%|██████████████████████████████████████████▋                                  | 8324/15000 [06:56<05:55, 18.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▋                                  | 8327/15000 [06:57<08:06, 13.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▊                                  | 8333/15000 [06:57<06:23, 17.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▊                                  | 8339/15000 [06:57<05:35, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 56%|██████████████████████████████████████████▊                                  | 8342/15000 [06:57<05:25, 20.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▊                                  | 8348/15000 [06:58<05:07, 21.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 56%|██████████████████████████████████████████▉                                  | 8354/15000 [06:58<04:59, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▉                                  | 8360/15000 [06:58<04:53, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▉                                  | 8366/15000 [06:58<04:54, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▉                                  | 8369/15000 [06:58<05:03, 21.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|██████████████████████████████████████████▉                                  | 8375/15000 [06:59<07:29, 14.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████                                  | 8381/15000 [06:59<06:06, 18.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 56%|███████████████████████████████████████████                                  | 8387/15000 [06:59<05:33, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 56%|███████████████████████████████████████████                                  | 8390/15000 [07:00<05:21, 20.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████                                  | 8396/15000 [07:00<05:11, 21.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████                                  | 8399/15000 [07:00<07:35, 14.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 56%|███████████████████████████████████████████▏                                 | 8404/15000 [07:00<06:31, 16.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▏                                 | 8408/15000 [07:01<08:10, 13.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▏                                 | 8411/15000 [07:01<06:59, 15.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▏                                 | 8417/15000 [07:01<05:48, 18.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▏                                 | 8420/15000 [07:01<05:43, 19.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▏                                 | 8423/15000 [07:02<11:14,  9.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▎                                 | 8428/15000 [07:02<08:19, 13.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 56%|███████████████████████████████████████████▎                                 | 8434/15000 [07:03<06:26, 16.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▎                                 | 8437/15000 [07:03<06:08, 17.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▎                                 | 8440/15000 [07:03<05:53, 18.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▎                                 | 8446/15000 [07:03<07:19, 14.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False  True False  True False False  True False False False  True
 False False  True False  True  True False  True False False False  True
 False False  True False  True  True False  True  True  True  True False
 False False False False False False False  True False False False  True
 False False False  True False  True False False False False  True  True
 False False Fals

 56%|███████████████████████████████████████████▍                                 | 8452/15000 [07:04<08:41, 12.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▍                                 | 8457/15000 [07:04<07:03, 15.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▍                                 | 8461/15000 [07:05<08:25, 12.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▍                                 | 8467/15000 [07:05<06:19, 17.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 56%|███████████████████████████████████████████▍                                 | 8470/15000 [07:05<05:48, 18.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|███████████████████████████████████████████▌                                 | 8476/15000 [07:06<08:00, 13.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▌                                 | 8482/15000 [07:06<06:17, 17.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▌                                 | 8488/15000 [07:06<05:30, 19.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|███████████████████████████████████████████▌                                 | 8491/15000 [07:06<05:16, 20.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▌                                 | 8497/15000 [07:06<04:57, 21.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▋                                 | 8503/15000 [07:07<04:53, 22.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|███████████████████████████████████████████▋                                 | 8506/15000 [07:07<04:51, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▋                                 | 8512/15000 [07:07<04:47, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▋                                 | 8518/15000 [07:07<04:46, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▊                                 | 8524/15000 [07:08<04:43, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▊                                 | 8527/15000 [07:08<04:44, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▊                                 | 8533/15000 [07:08<04:56, 21.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▊                                 | 8539/15000 [07:08<04:51, 22.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▊                                 | 8545/15000 [07:09<04:47, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▉                                 | 8548/15000 [07:09<04:45, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|███████████████████████████████████████████▉                                 | 8554/15000 [07:09<06:28, 16.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▉                                 | 8557/15000 [07:09<05:56, 18.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▉                                 | 8563/15000 [07:10<05:17, 20.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|███████████████████████████████████████████▉                                 | 8569/15000 [07:10<07:02, 15.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|███████████████████████████████████████████▉                                 | 8571/15000 [07:11<09:27, 11.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████                                 | 8577/15000 [07:11<06:52, 15.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████                                 | 8583/15000 [07:11<05:43, 18.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|████████████████████████████████████████████                                 | 8586/15000 [07:11<05:26, 19.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████                                 | 8592/15000 [07:12<07:33, 14.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 57%|████████████████████████████████████████████▏                                | 8597/15000 [07:12<06:20, 16.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████▏                                | 8603/15000 [07:12<05:38, 18.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████▏                                | 8606/15000 [07:12<05:29, 19.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████▏                                | 8609/15000 [07:13<05:37, 18.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████▏                                | 8614/15000 [07:13<07:00, 15.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████▏                                | 8617/15000 [07:13<06:14, 17.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 57%|████████████████████████████████████████████▎                                | 8623/15000 [07:13<05:25, 19.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▎                                | 8626/15000 [07:13<05:15, 20.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 58%|████████████████████████████████████████████▎                                | 8632/15000 [07:14<05:01, 21.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▎                                | 8638/15000 [07:14<04:48, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▎                                | 8644/15000 [07:14<04:43, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▍                                | 8650/15000 [07:15<04:42, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 58%|████████████████████████████████████████████▍                                | 8653/15000 [07:15<04:41, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▍                                | 8659/15000 [07:15<04:40, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▍                                | 8665/15000 [07:15<04:39, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▍                                | 8668/15000 [07:15<04:37, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▌                                | 8674/15000 [07:16<04:48, 21.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 58%|████████████████████████████████████████████▌                                | 8680/15000 [07:16<04:59, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▌                                | 8683/15000 [07:16<07:11, 14.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▌                                | 8688/15000 [07:17<06:05, 17.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▋                                | 8694/15000 [07:17<05:26, 19.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▋                                | 8697/15000 [07:17<07:32, 13.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▋                                | 8703/15000 [07:17<06:01, 17.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False False  True False  True  True
 False  True False False False False False False False False False  True
 False False False False False  True False False False False False  True
  True False  True False  True False False False False False False False
 False False False False  True False False False False  True  True False
 False  True False False]
<<<<< Epsilon in training is 0.2233951983390067>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 

 58%|████████████████████████████████████████████▋                                | 8708/15000 [07:18<08:14, 12.73it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 58%|████████████████████████████████████████████▋                                | 8711/15000 [07:18<07:04, 14.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▋                                | 8717/15000 [07:18<05:47, 18.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▊                                | 8723/15000 [07:19<05:09, 20.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▊                                | 8726/15000 [07:19<05:00, 20.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▊                                | 8732/15000 [07:19<04:46, 21.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▊                                | 8738/15000 [07:19<04:44, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▊                                | 8741/15000 [07:19<04:54, 21.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▉                                | 8747/15000 [07:20<04:50, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▉                                | 8753/15000 [07:20<04:38, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▉                                | 8759/15000 [07:20<04:35, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|████████████████████████████████████████████▉                                | 8765/15000 [07:21<04:40, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 58%|█████████████████████████████████████████████                                | 8768/15000 [07:21<04:38, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 58%|█████████████████████████████████████████████                                | 8774/15000 [07:21<04:38, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████                                | 8780/15000 [07:21<04:34, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████                                | 8783/15000 [07:21<04:35, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████                                | 8789/15000 [07:22<04:38, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▏                               | 8794/15000 [07:22<06:42, 15.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▏                               | 8797/15000 [07:22<06:02, 17.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▏                               | 8802/15000 [07:22<05:36, 18.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▏                               | 8804/15000 [07:23<08:10, 12.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▏                               | 8810/15000 [07:23<06:07, 16.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▎                               | 8816/15000 [07:23<05:17, 19.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▎                               | 8819/15000 [07:23<05:04, 20.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▎                               | 8825/15000 [07:24<04:47, 21.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▎                               | 8831/15000 [07:24<04:39, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▎                               | 8837/15000 [07:24<04:36, 22.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▍                               | 8843/15000 [07:24<04:31, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▍                               | 8849/15000 [07:25<04:35, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▍                               | 8852/15000 [07:25<04:35, 22.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▍                               | 8858/15000 [07:25<04:32, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 59%|█████████████████████████████████████████████▌                               | 8864/15000 [07:25<04:30, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▌                               | 8867/15000 [07:26<04:31, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▌                               | 8873/15000 [07:26<04:34, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▌                               | 8876/15000 [07:26<06:57, 14.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▌                               | 8882/15000 [07:26<05:39, 18.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▋                               | 8888/15000 [07:27<05:05, 20.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▋                               | 8894/15000 [07:27<04:48, 21.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▋                               | 8897/15000 [07:27<04:45, 21.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▋                               | 8903/15000 [07:27<04:37, 21.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▋                               | 8909/15000 [07:28<04:33, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▋                               | 8912/15000 [07:28<04:33, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▊                               | 8918/15000 [07:28<04:37, 21.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 59%|█████████████████████████████████████████████▊                               | 8921/15000 [07:28<06:53, 14.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 60%|█████████████████████████████████████████████▊                               | 8927/15000 [07:29<05:37, 17.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|█████████████████████████████████████████████▊                               | 8930/15000 [07:29<05:16, 19.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|█████████████████████████████████████████████▊                               | 8936/15000 [07:29<04:49, 20.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|█████████████████████████████████████████████▉                               | 8942/15000 [07:29<04:39, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|█████████████████████████████████████████████▉                               | 8948/15000 [07:30<04:33, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|█████████████████████████████████████████████▉                               | 8954/15000 [07:30<04:30, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|█████████████████████████████████████████████▉                               | 8957/15000 [07:30<04:30, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 60%|█████████████████████████████████████████████▉                               | 8960/15000 [07:30<04:27, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False  True False False False False
 False False False False False False  True  True False False False  True
 False False False False  True False False False False False False  True
 False False False False  True False False False  True False False False
  True  True False False False False False False False  True False False
  True False False False]
<<<<< Epsilon in training is 0.2196003686368989>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 465

 60%|██████████████████████████████████████████████                               | 8966/15000 [07:31<08:45, 11.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 60%|██████████████████████████████████████████████                               | 8969/15000 [07:31<07:26, 13.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████                               | 8974/15000 [07:32<07:58, 12.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████                               | 8977/15000 [07:32<06:54, 14.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████                               | 8983/15000 [07:32<05:38, 17.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▏                              | 8989/15000 [07:32<05:05, 19.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▏                              | 8992/15000 [07:32<05:02, 19.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▏                              | 8995/15000 [07:33<05:00, 20.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▏                              | 8998/15000 [07:33<07:27, 13.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▏                              | 9004/15000 [07:35<15:09,  6.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▎                              | 9010/15000 [07:35<09:39, 10.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▎                              | 9012/15000 [07:35<08:39, 11.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▎                              | 9017/15000 [07:35<08:16, 12.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▎                              | 9020/15000 [07:36<06:59, 14.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▎                              | 9026/15000 [07:36<05:35, 17.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▎                              | 9029/15000 [07:36<05:14, 18.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▍                              | 9035/15000 [07:36<04:53, 20.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▍                              | 9041/15000 [07:37<06:19, 15.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▍                              | 9044/15000 [07:37<05:43, 17.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 60%|██████████████████████████████████████████████▍                              | 9050/15000 [07:37<05:00, 19.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▍                              | 9056/15000 [07:37<04:41, 21.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▌                              | 9062/15000 [07:38<04:29, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▌                              | 9065/15000 [07:38<04:26, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▌                              | 9071/15000 [07:38<04:22, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 60%|██████████████████████████████████████████████▌                              | 9074/15000 [07:38<04:21, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▌                              | 9080/15000 [07:38<04:31, 21.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 61%|██████████████████████████████████████████████▋                              | 9083/15000 [07:39<04:36, 21.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 61%|██████████████████████████████████████████████▋                              | 9086/15000 [07:39<06:49, 14.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▋                              | 9091/15000 [07:39<05:45, 17.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▋                              | 9097/15000 [07:39<05:00, 19.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▋                              | 9100/15000 [07:40<04:57, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▋                              | 9106/15000 [07:40<04:37, 21.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 61%|██████████████████████████████████████████████▊                              | 9112/15000 [07:40<04:27, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▊                              | 9118/15000 [07:40<04:24, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▊                              | 9121/15000 [07:41<04:31, 21.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▊                              | 9127/15000 [07:41<04:38, 21.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▊                              | 9130/15000 [07:41<04:40, 20.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▉                              | 9136/15000 [07:41<06:09, 15.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▉                              | 9139/15000 [07:42<05:45, 16.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▉                              | 9143/15000 [07:42<07:10, 13.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▉                              | 9146/15000 [07:42<06:13, 15.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▉                              | 9151/15000 [07:42<05:28, 17.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|██████████████████████████████████████████████▉                              | 9155/15000 [07:43<07:15, 13.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████                              | 9160/15000 [07:43<05:53, 16.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████                              | 9164/15000 [07:44<07:50, 12.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 61%|███████████████████████████████████████████████                              | 9167/15000 [07:44<06:32, 14.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████                              | 9173/15000 [07:44<06:59, 13.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 61%|███████████████████████████████████████████████                              | 9179/15000 [07:44<05:31, 17.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▏                             | 9185/15000 [07:45<06:31, 14.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▏                             | 9188/15000 [07:45<05:49, 16.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▏                             | 9194/15000 [07:45<05:06, 18.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 61%|███████████████████████████████████████████████▏                             | 9200/15000 [07:46<04:44, 20.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▏                             | 9203/15000 [07:46<04:49, 20.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▎                             | 9206/15000 [07:46<04:43, 20.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▎                             | 9212/15000 [07:46<04:37, 20.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▎                             | 9215/15000 [07:46<04:37, 20.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False False False False False False
 False  True False False False False False False False False False False
 False False False  True False  True False  True  True  True False False
 False False False False False False  True False False  True  True False
 False False False False False  True  True False False False False  True
 False False False False]
<<<<< Epsilon in training is 0.21587000196969544>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4

 61%|███████████████████████████████████████████████▎                             | 9221/15000 [07:47<07:00, 13.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 61%|███████████████████████████████████████████████▎                             | 9224/15000 [07:47<06:11, 15.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▍                             | 9230/15000 [07:47<05:19, 18.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▍                             | 9236/15000 [07:48<05:01, 19.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▍                             | 9239/15000 [07:48<07:11, 13.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▍                             | 9244/15000 [07:48<05:57, 16.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▍                             | 9250/15000 [07:49<05:05, 18.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▍                             | 9253/15000 [07:49<04:59, 19.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▌                             | 9259/15000 [07:49<04:42, 20.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 62%|███████████████████████████████████████████████▌                             | 9262/15000 [07:49<04:43, 20.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▌                             | 9267/15000 [07:50<06:44, 14.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▌                             | 9270/15000 [07:50<05:57, 16.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 62%|███████████████████████████████████████████████▌                             | 9276/15000 [07:50<05:07, 18.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▋                             | 9279/15000 [07:50<05:04, 18.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▋                             | 9285/15000 [07:51<04:51, 19.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▋                             | 9290/15000 [07:51<06:25, 14.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▋                             | 9296/15000 [07:51<05:22, 17.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▋                             | 9298/15000 [07:52<07:23, 12.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▊                             | 9304/15000 [07:52<05:37, 16.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▊                             | 9310/15000 [07:52<04:49, 19.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▊                             | 9316/15000 [07:52<04:31, 20.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▊                             | 9319/15000 [07:53<04:26, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▊                             | 9325/15000 [07:53<04:19, 21.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▉                             | 9331/15000 [07:53<04:20, 21.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▉                             | 9337/15000 [07:53<04:16, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|███████████████████████████████████████████████▉                             | 9340/15000 [07:53<04:16, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 62%|███████████████████████████████████████████████▉                             | 9346/15000 [07:54<04:16, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|████████████████████████████████████████████████                             | 9352/15000 [07:54<04:15, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 62%|████████████████████████████████████████████████                             | 9355/15000 [07:54<04:14, 22.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 62%|████████████████████████████████████████████████                             | 9361/15000 [07:54<04:12, 22.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 62%|████████████████████████████████████████████████                             | 9367/15000 [07:55<04:11, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 62%|████████████████████████████████████████████████                             | 9370/15000 [07:55<04:12, 22.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▏                            | 9376/15000 [07:55<04:12, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▏                            | 9379/15000 [07:55<04:12, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▏                            | 9385/15000 [07:56<06:04, 15.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▏                            | 9387/15000 [07:56<07:48, 11.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▏                            | 9393/15000 [07:56<05:48, 16.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▏                            | 9399/15000 [07:57<04:55, 18.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▎                            | 9402/15000 [07:57<04:40, 19.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▎                            | 9408/15000 [07:57<04:22, 21.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▎                            | 9411/15000 [07:57<06:20, 14.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▎                            | 9417/15000 [07:58<05:11, 17.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▎                            | 9423/15000 [07:58<04:38, 20.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▍                            | 9426/15000 [07:58<04:34, 20.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▍                            | 9431/15000 [07:58<06:06, 15.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▍                            | 9435/15000 [07:59<07:09, 12.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▍                            | 9439/15000 [07:59<07:43, 12.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▍                            | 9443/15000 [08:00<08:03, 11.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▍                            | 9447/15000 [08:00<08:22, 11.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▌                            | 9451/15000 [08:00<08:23, 11.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▌                            | 9455/15000 [08:01<08:29, 10.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▌                            | 9459/15000 [08:01<08:28, 10.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▌                            | 9464/15000 [08:02<07:38, 12.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▌                            | 9467/15000 [08:02<06:20, 14.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▌                            | 9470/15000 [08:02<05:34, 16.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650] and comparison is [False False  True  True  True False False False False False False False
  True False False False False False False False False  True False False
  True False  True  True False  True  True  True False False  True False
  True False False False False  True  True  True False False False False
 False False False False False False False  True False False False False
 False False False False]
<<<<< Epsilon in training is 0.21220300329935904>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4

 63%|████████████████████████████████████████████████▋                            | 9475/15000 [08:03<07:46, 11.83it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▋                            | 9480/15000 [08:03<06:06, 15.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▋                            | 9483/15000 [08:03<05:35, 16.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▋                            | 9489/15000 [08:03<04:47, 19.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▋                            | 9495/15000 [08:03<04:24, 20.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▊                            | 9501/15000 [08:04<04:13, 21.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▊                            | 9504/15000 [08:04<04:08, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▊                            | 9509/15000 [08:04<06:13, 14.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▊                            | 9513/15000 [08:05<07:14, 12.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 63%|████████████████████████████████████████████████▊                            | 9517/15000 [08:05<07:46, 11.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 63%|████████████████████████████████████████████████▉                            | 9522/15000 [08:05<05:56, 15.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|████████████████████████████████████████████████▉                            | 9528/15000 [08:06<05:00, 18.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|████████████████████████████████████████████████▉                            | 9530/15000 [08:06<07:07, 12.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|████████████████████████████████████████████████▉                            | 9536/15000 [08:07<07:04, 12.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|████████████████████████████████████████████████▉                            | 9540/15000 [08:07<07:39, 11.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|████████████████████████████████████████████████▉                            | 9544/15000 [08:07<07:59, 11.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████                            | 9546/15000 [08:08<09:47,  9.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|█████████████████████████████████████████████████                            | 9552/15000 [08:08<08:11, 11.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████                            | 9556/15000 [08:09<08:16, 10.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████                            | 9560/15000 [08:09<08:20, 10.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████                            | 9563/15000 [08:09<06:51, 13.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████                            | 9568/15000 [08:09<05:40, 15.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|█████████████████████████████████████████████████▏                           | 9572/15000 [08:10<06:53, 13.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|█████████████████████████████████████████████████▏                           | 9576/15000 [08:10<07:35, 11.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▏                           | 9580/15000 [08:11<07:55, 11.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▏                           | 9583/15000 [08:11<06:34, 13.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▏                           | 9588/15000 [08:11<05:30, 16.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▏                           | 9593/15000 [08:11<06:18, 14.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▎                           | 9596/15000 [08:12<05:31, 16.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|█████████████████████████████████████████████████▎                           | 9599/15000 [08:12<05:12, 17.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▎                           | 9603/15000 [08:12<06:38, 13.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▎                           | 9607/15000 [08:13<07:26, 12.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▎                           | 9611/15000 [08:13<07:50, 11.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▎                           | 9615/15000 [08:13<08:02, 11.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▍                           | 9619/15000 [08:14<08:04, 11.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▍                           | 9623/15000 [08:14<08:06, 11.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▍                           | 9628/15000 [08:15<07:25, 12.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▍                           | 9631/15000 [08:15<06:17, 14.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▍                           | 9633/15000 [08:15<08:07, 11.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|█████████████████████████████████████████████████▍                           | 9639/15000 [08:15<05:52, 15.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 64%|█████████████████████████████████████████████████▌                           | 9644/15000 [08:16<05:06, 17.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▌                           | 9650/15000 [08:16<04:28, 19.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▌                           | 9656/15000 [08:16<05:36, 15.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▌                           | 9659/15000 [08:16<05:05, 17.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▌                           | 9665/15000 [08:17<04:41, 18.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▋                           | 9668/15000 [08:17<04:33, 19.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 64%|█████████████████████████████████████████████████▋                           | 9671/15000 [08:17<06:21, 13.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▋                           | 9676/15000 [08:18<06:48, 13.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▋                           | 9680/15000 [08:18<07:23, 12.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 65%|█████████████████████████████████████████████████▋                           | 9685/15000 [08:18<07:04, 12.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▋                           | 9688/15000 [08:19<06:05, 14.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 65%|█████████████████████████████████████████████████▋                           | 9690/15000 [08:19<07:51, 11.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▊                           | 9696/15000 [08:19<07:17, 12.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 65%|█████████████████████████████████████████████████▊                           | 9700/15000 [08:20<07:44, 11.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 65%|█████████████████████████████████████████████████▊                           | 9703/15000 [08:20<06:30, 13.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▊                           | 9708/15000 [08:20<05:22, 16.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▊                           | 9713/15000 [08:20<04:47, 18.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▉                           | 9720/15000 [08:21<05:46, 15.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▉                           | 9723/15000 [08:21<05:07, 17.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|█████████████████████████████████████████████████▉                           | 9726/15000 [08:21<04:44, 18.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False  True False False False False False False False False
  True False False False False  True False False False  True False  True
 False False False False False False False False False False False  True
 False False False False  True False False  True False  True  True False
 False False False False False  True  True False False  True False  True
 False False False False]
<<<<< Epsilon in training is 0.20859829618933926>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4

 65%|█████████████████████████████████████████████████▉                           | 9732/15000 [08:22<06:42, 13.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 65%|█████████████████████████████████████████████████▉                           | 9738/15000 [08:22<05:14, 16.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████                           | 9744/15000 [08:23<05:59, 14.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████                           | 9747/15000 [08:23<05:21, 16.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████                           | 9753/15000 [08:23<04:34, 19.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████                           | 9756/15000 [08:23<04:21, 20.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████                           | 9762/15000 [08:23<04:05, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▏                          | 9768/15000 [08:24<03:56, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▏                          | 9774/15000 [08:24<03:54, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▏                          | 9777/15000 [08:24<03:51, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 65%|██████████████████████████████████████████████████▏                          | 9783/15000 [08:24<03:50, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▎                          | 9789/15000 [08:25<03:48, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▎                          | 9792/15000 [08:25<03:48, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▎                          | 9798/15000 [08:25<03:47, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▎                          | 9804/15000 [08:25<03:47, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▎                          | 9807/15000 [08:25<03:46, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▎                          | 9813/15000 [08:26<03:48, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 65%|██████████████████████████████████████████████████▍                          | 9819/15000 [08:26<03:47, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 66%|██████████████████████████████████████████████████▍                          | 9825/15000 [08:26<03:49, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▍                          | 9828/15000 [08:26<03:49, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▍                          | 9834/15000 [08:27<03:48, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▌                          | 9840/15000 [08:27<05:20, 16.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▌                          | 9844/15000 [08:28<06:28, 13.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▌                          | 9846/15000 [08:28<08:09, 10.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 66%|██████████████████████████████████████████████████▌                          | 9852/15000 [08:28<05:47, 14.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▌                          | 9855/15000 [08:28<05:18, 16.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 66%|██████████████████████████████████████████████████▌                          | 9857/15000 [08:29<07:19, 11.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▋                          | 9863/15000 [08:29<07:22, 11.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▋                          | 9866/15000 [08:29<06:03, 14.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▋                          | 9871/15000 [08:30<06:26, 13.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▋                          | 9874/15000 [08:30<05:31, 15.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▋                          | 9880/15000 [08:30<04:35, 18.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▋                          | 9886/15000 [08:30<04:12, 20.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▊                          | 9892/15000 [08:31<05:19, 15.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▊                          | 9895/15000 [08:31<04:49, 17.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▊                          | 9901/15000 [08:31<04:16, 19.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▊                          | 9904/15000 [08:31<04:17, 19.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▊                          | 9910/15000 [08:32<04:02, 20.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▉                          | 9916/15000 [08:32<03:51, 21.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▉                          | 9919/15000 [08:32<03:49, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 66%|██████████████████████████████████████████████████▉                          | 9924/15000 [08:33<05:34, 15.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▉                          | 9929/15000 [08:33<04:49, 17.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|██████████████████████████████████████████████████▉                          | 9931/15000 [08:33<04:40, 18.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 66%|███████████████████████████████████████████████████                          | 9936/15000 [08:33<05:45, 14.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████                          | 9939/15000 [08:33<05:04, 16.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████                          | 9945/15000 [08:34<04:20, 19.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████                          | 9951/15000 [08:34<04:02, 20.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████                          | 9954/15000 [08:34<03:56, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████                          | 9957/15000 [08:34<03:57, 21.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████▏                         | 9963/15000 [08:35<05:09, 16.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████▏                         | 9969/15000 [08:35<04:25, 18.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 66%|███████████████████████████████████████████████████▏                         | 9972/15000 [08:35<04:14, 19.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▏                         | 9978/15000 [08:35<03:57, 21.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▎                         | 9984/15000 [08:36<03:48, 21.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False  True False  True False False False False False False  True
  True False False False False False False False  True  True False False
 False False False  True False False False False False  True False  True
 False False  True False  True False False False False  True False False
 False False False False False False False False False False  True False
 False False False False]
<<<<< Epsilon in training is 0.20505482248858797>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 465

 67%|███████████████████████████████████████████████████▎                         | 9987/15000 [08:36<06:45, 12.37it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▎                         | 9990/15000 [08:36<05:53, 14.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▎                         | 9996/15000 [08:37<04:43, 17.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▎                         | 9999/15000 [08:37<04:24, 18.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
hardest scenarios
[143 125 127 128 129 130 131 132 133 134]
They have been chosen respectively
[1 1 1 1 1 1 1 1 1 1]
The number of timesteps played is
[85 72 44  8 27 15 24 47 49  7]
avg (across all scenarios) number of timesteps played 17.289930555555557
Time alive: [42.5 36.  22.   4.  13.5  7.5 12.  23.5 24.5  3.5]
Avg time alive: 8.660590277777779


 67%|██████████████████████████████████████████████████▋                         | 10005/15000 [08:39<16:58,  4.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▋                         | 10008/15000 [08:39<12:58,  6.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▋                         | 10014/15000 [08:40<08:12, 10.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|██████████████████████████████████████████████████▊                         | 10020/15000 [08:40<05:53, 14.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▊                         | 10023/15000 [08:40<05:14, 15.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▊                         | 10029/15000 [08:40<04:25, 18.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|██████████████████████████████████████████████████▊                         | 10035/15000 [08:41<03:59, 20.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▊                         | 10041/15000 [08:41<03:48, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▉                         | 10047/15000 [08:41<03:43, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▉                         | 10050/15000 [08:41<03:41, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▉                         | 10056/15000 [08:41<03:40, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|██████████████████████████████████████████████████▉                         | 10062/15000 [08:42<03:38, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████                         | 10068/15000 [08:42<04:56, 16.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████                         | 10071/15000 [08:42<04:36, 17.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████                         | 10077/15000 [08:43<05:33, 14.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████                         | 10080/15000 [08:43<04:57, 16.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████                         | 10086/15000 [08:43<04:15, 19.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████                         | 10089/15000 [08:43<04:12, 19.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▏                        | 10092/15000 [08:44<04:18, 19.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████▏                        | 10097/15000 [08:44<05:28, 14.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████▏                        | 10100/15000 [08:44<04:53, 16.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▏                        | 10105/15000 [08:44<04:37, 17.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▏                        | 10109/15000 [08:45<05:54, 13.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████▏                        | 10112/15000 [08:45<05:23, 15.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▎                        | 10116/15000 [08:45<06:22, 12.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 67%|███████████████████████████████████████████████████▎                        | 10119/15000 [08:46<05:39, 14.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 67%|███████████████████████████████████████████████████▎                        | 10121/15000 [08:46<07:26, 10.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▎                        | 10127/15000 [08:46<05:14, 15.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▎                        | 10133/15000 [08:46<04:21, 18.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▎                        | 10136/15000 [08:47<04:04, 19.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▍                        | 10142/15000 [08:47<03:54, 20.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▍                        | 10148/15000 [08:47<03:56, 20.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▍                        | 10153/15000 [08:48<05:19, 15.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▍                        | 10156/15000 [08:48<04:45, 16.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▍                        | 10158/15000 [08:48<06:46, 11.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 68%|███████████████████████████████████████████████████▍                        | 10164/15000 [08:49<06:21, 12.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▌                        | 10170/15000 [08:49<04:50, 16.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▌                        | 10176/15000 [08:49<04:10, 19.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 68%|███████████████████████████████████████████████████▌                        | 10179/15000 [08:49<03:58, 20.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▌                        | 10185/15000 [08:49<03:44, 21.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▌                        | 10188/15000 [08:50<03:40, 21.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▋                        | 10191/15000 [08:50<03:40, 21.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 68%|███████████████████████████████████████████████████▋                        | 10194/15000 [08:50<05:28, 14.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▋                        | 10200/15000 [08:50<04:29, 17.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 68%|███████████████████████████████████████████████████▋                        | 10206/15000 [08:51<03:59, 20.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▋                        | 10212/15000 [08:51<03:44, 21.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▊                        | 10215/15000 [08:51<03:41, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▊                        | 10218/15000 [08:51<03:46, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 68%|███████████████████████████████████████████████████▊                        | 10224/15000 [08:52<04:58, 15.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▊                        | 10227/15000 [08:52<04:45, 16.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▊                        | 10231/15000 [08:52<05:54, 13.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▊                        | 10235/15000 [08:53<06:51, 11.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▉                        | 10240/15000 [08:53<06:41, 11.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False  True False False False False
 False False  True False False False False False  True False False  True
 False False False False False False False False False False False False
 False  True False  True False False False False False False False False
 False False  True False False False False  True False False  True False
 False False False False]
<<<<< Epsilon in training is 0.20157154202094207>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 

 68%|███████████████████████████████████████████████████▉                        | 10242/15000 [08:54<09:33,  8.30it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▉                        | 10248/15000 [08:54<06:03, 13.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▉                        | 10254/15000 [08:54<04:41, 16.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|███████████████████████████████████████████████████▉                        | 10259/15000 [08:55<05:44, 13.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|████████████████████████████████████████████████████                        | 10265/15000 [08:55<04:30, 17.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 68%|████████████████████████████████████████████████████                        | 10271/15000 [08:55<03:58, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 68%|████████████████████████████████████████████████████                        | 10274/15000 [08:55<03:49, 20.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████                        | 10280/15000 [08:56<05:04, 15.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 69%|████████████████████████████████████████████████████                        | 10283/15000 [08:56<04:35, 17.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▏                       | 10289/15000 [08:56<04:00, 19.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▏                       | 10295/15000 [08:57<05:06, 15.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▏                       | 10301/15000 [08:57<04:16, 18.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▏                       | 10304/15000 [08:57<04:01, 19.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▏                       | 10310/15000 [08:57<03:42, 21.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▎                       | 10316/15000 [08:58<03:32, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 69%|████████████████████████████████████████████████████▎                       | 10319/15000 [08:58<03:32, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▎                       | 10325/15000 [08:58<03:26, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▎                       | 10331/15000 [08:58<03:25, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▎                       | 10334/15000 [08:58<03:27, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 69%|████████████████████████████████████████████████████▍                       | 10340/15000 [08:59<03:26, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 69%|████████████████████████████████████████████████████▍                       | 10346/15000 [08:59<03:23, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▍                       | 10352/15000 [08:59<03:26, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▍                       | 10358/15000 [09:00<03:25, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▍                       | 10361/15000 [09:00<03:24, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▌                       | 10367/15000 [09:00<03:24, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▌                       | 10373/15000 [09:00<03:23, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▌                       | 10376/15000 [09:00<03:22, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▌                       | 10382/15000 [09:01<03:20, 23.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▋                       | 10388/15000 [09:01<03:20, 22.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▋                       | 10394/15000 [09:01<03:20, 22.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 69%|████████████████████████████████████████████████████▋                       | 10397/15000 [09:01<03:21, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▋                       | 10403/15000 [09:01<03:21, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 69%|████████████████████████████████████████████████████▋                       | 10409/15000 [09:02<03:22, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▊                       | 10415/15000 [09:02<03:20, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▊                       | 10418/15000 [09:02<03:22, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 69%|████████████████████████████████████████████████████▊                       | 10421/15000 [09:02<03:29, 21.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|████████████████████████████████████████████████████▊                       | 10427/15000 [09:03<03:36, 21.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|████████████████████████████████████████████████████▊                       | 10430/15000 [09:03<05:22, 14.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|████████████████████████████████████████████████████▉                       | 10436/15000 [09:03<04:19, 17.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|████████████████████████████████████████████████████▉                       | 10442/15000 [09:03<03:49, 19.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|████████████████████████████████████████████████████▉                       | 10448/15000 [09:04<03:34, 21.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|████████████████████████████████████████████████████▉                       | 10451/15000 [09:04<03:33, 21.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|████████████████████████████████████████████████████▉                       | 10457/15000 [09:04<03:24, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████                       | 10463/15000 [09:04<03:29, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████                       | 10466/15000 [09:05<05:17, 14.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████                       | 10471/15000 [09:05<04:34, 16.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████                       | 10474/15000 [09:05<04:33, 16.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████                       | 10479/15000 [09:06<05:24, 13.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████                       | 10482/15000 [09:06<04:43, 15.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▏                      | 10488/15000 [09:06<03:58, 18.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▏                      | 10494/15000 [09:06<03:37, 20.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [ True False  True False False False False False False False False False
 False False False False False  True False False False False False False
 False False  True False False  True False False False False False False
 False  True False False False False  True False False False False False
 False  True False False  True False False False False False False False
 False False  True False]
<<<<< Epsilon in training is 0.19814743227978304>>>>

 70%|█████████████████████████████████████████████████████▏                      | 10500/15000 [09:07<05:20, 14.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|█████████████████████████████████████████████████████▏                      | 10505/15000 [09:07<04:27, 16.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|█████████████████████████████████████████████████████▏                      | 10508/15000 [09:07<04:04, 18.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▎                      | 10514/15000 [09:08<05:08, 14.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|█████████████████████████████████████████████████████▎                      | 10517/15000 [09:08<04:34, 16.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▎                      | 10521/15000 [09:08<04:48, 15.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▎                      | 10525/15000 [09:09<06:06, 12.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▎                      | 10528/15000 [09:09<05:23, 13.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▎                      | 10532/15000 [09:09<06:23, 11.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|█████████████████████████████████████████████████████▍                      | 10535/15000 [09:09<05:13, 14.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|█████████████████████████████████████████████████████▍                      | 10537/15000 [09:10<04:59, 14.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▍                      | 10541/15000 [09:10<06:09, 12.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 70%|█████████████████████████████████████████████████████▍                      | 10544/15000 [09:10<05:03, 14.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▍                      | 10549/15000 [09:11<06:06, 12.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▍                      | 10553/15000 [09:11<06:44, 10.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▍                      | 10556/15000 [09:11<05:29, 13.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▌                      | 10561/15000 [09:12<06:14, 11.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▌                      | 10564/15000 [09:12<05:13, 14.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▌                      | 10569/15000 [09:12<05:46, 12.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 70%|█████████████████████████████████████████████████████▌                      | 10572/15000 [09:13<04:59, 14.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▌                      | 10577/15000 [09:13<05:38, 13.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|█████████████████████████████████████████████████████▌                      | 10580/15000 [09:13<04:52, 15.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▋                      | 10585/15000 [09:13<04:24, 16.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|█████████████████████████████████████████████████████▋                      | 10590/15000 [09:14<05:14, 14.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|█████████████████████████████████████████████████████▋                      | 10596/15000 [09:14<04:07, 17.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|█████████████████████████████████████████████████████▋                      | 10602/15000 [09:15<05:09, 14.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▋                      | 10605/15000 [09:15<04:33, 16.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▊                      | 10609/15000 [09:15<04:37, 15.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▊                      | 10611/15000 [09:15<06:36, 11.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▊                      | 10616/15000 [09:16<05:04, 14.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▊                      | 10621/15000 [09:16<04:11, 17.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▊                      | 10627/15000 [09:16<03:50, 18.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▊                      | 10630/15000 [09:16<03:41, 19.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▉                      | 10636/15000 [09:17<03:26, 21.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|█████████████████████████████████████████████████████▉                      | 10639/15000 [09:17<03:35, 20.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▉                      | 10645/15000 [09:17<04:48, 15.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▉                      | 10648/15000 [09:17<04:20, 16.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|█████████████████████████████████████████████████████▉                      | 10654/15000 [09:18<03:47, 19.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████                      | 10660/15000 [09:18<03:33, 20.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████                      | 10663/15000 [09:18<03:35, 20.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|██████████████████████████████████████████████████████                      | 10669/15000 [09:18<03:23, 21.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████                      | 10672/15000 [09:19<03:19, 21.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████                      | 10677/15000 [09:19<04:48, 14.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████                      | 10680/15000 [09:19<04:17, 16.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|██████████████████████████████████████████████████████▏                     | 10685/15000 [09:20<05:23, 13.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 71%|██████████████████████████████████████████████████████▏                     | 10688/15000 [09:20<04:47, 15.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████▏                     | 10696/15000 [09:20<05:12, 13.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████▏                     | 10699/15000 [09:21<04:33, 15.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████▏                     | 10705/15000 [09:21<03:49, 18.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████▎                     | 10708/15000 [09:21<03:40, 19.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████▎                     | 10714/15000 [09:21<03:23, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 71%|██████████████████████████████████████████████████████▎                     | 10720/15000 [09:22<03:15, 21.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▎                     | 10726/15000 [09:22<03:11, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▎                     | 10729/15000 [09:22<03:10, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▍                     | 10735/15000 [09:22<03:11, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▍                     | 10738/15000 [09:22<03:10, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▍                     | 10744/15000 [09:23<03:08, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▍                     | 10750/15000 [09:23<03:12, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650] and comparison is [False False False False False False False False False False False False
 False False False  True  True  True  True False False  True False False
 False  True False False False  True False False False False False  True
 False False False False  True False False  True False False False  True
 False False False  True False  True False  True False False False False
  True  True False False]
<<<<< Epsilon in training is 

 72%|██████████████████████████████████████████████████████▍                     | 10755/15000 [09:23<05:34, 12.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▌                     | 10757/15000 [09:24<05:06, 13.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▌                     | 10762/15000 [09:24<05:34, 12.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▌                     | 10768/15000 [09:24<04:17, 16.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▌                     | 10771/15000 [09:24<03:56, 17.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▌                     | 10777/15000 [09:25<03:32, 19.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▋                     | 10783/15000 [09:25<03:18, 21.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▋                     | 10786/15000 [09:25<03:17, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▋                     | 10792/15000 [09:25<03:11, 21.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▋                     | 10795/15000 [09:26<03:11, 21.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▋                     | 10801/15000 [09:26<03:14, 21.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▋                     | 10804/15000 [09:26<03:31, 19.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▊                     | 10810/15000 [09:27<04:40, 14.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|██████████████████████████████████████████████████████▊                     | 10813/15000 [09:27<04:12, 16.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▊                     | 10818/15000 [09:27<05:11, 13.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▊                     | 10821/15000 [09:27<04:30, 15.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▊                     | 10826/15000 [09:28<05:18, 13.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▉                     | 10831/15000 [09:28<04:27, 15.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▉                     | 10833/15000 [09:28<06:11, 11.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▉                     | 10839/15000 [09:29<04:24, 15.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▉                     | 10845/15000 [09:29<03:40, 18.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▉                     | 10848/15000 [09:29<03:28, 19.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|██████████████████████████████████████████████████████▉                     | 10854/15000 [09:29<03:15, 21.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|███████████████████████████████████████████████████████                     | 10860/15000 [09:30<03:17, 20.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|███████████████████████████████████████████████████████                     | 10863/15000 [09:30<03:19, 20.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 72%|███████████████████████████████████████████████████████                     | 10869/15000 [09:30<04:32, 15.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 72%|███████████████████████████████████████████████████████                     | 10872/15000 [09:30<04:04, 16.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 73%|███████████████████████████████████████████████████████                     | 10878/15000 [09:31<03:32, 19.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 73%|███████████████████████████████████████████████████████▏                    | 10884/15000 [09:31<03:16, 20.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▏                    | 10887/15000 [09:31<03:11, 21.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▏                    | 10893/15000 [09:31<03:06, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▏                    | 10896/15000 [09:32<03:06, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▏                    | 10902/15000 [09:32<03:04, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▎                    | 10908/15000 [09:32<03:02, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▎                    | 10914/15000 [09:32<03:04, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▎                    | 10917/15000 [09:32<03:04, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▎                    | 10923/15000 [09:33<03:02, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▎                    | 10929/15000 [09:33<03:01, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▍                    | 10932/15000 [09:33<03:00, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 73%|███████████████████████████████████████████████████████▍                    | 10938/15000 [09:33<03:05, 21.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▍                    | 10944/15000 [09:34<03:01, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▍                    | 10947/15000 [09:34<03:01, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▍                    | 10953/15000 [09:34<03:05, 21.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▌                    | 10959/15000 [09:34<03:06, 21.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▌                    | 10962/15000 [09:35<03:08, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▌                    | 10968/15000 [09:35<03:08, 21.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▌                    | 10974/15000 [09:35<03:01, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▌                    | 10977/15000 [09:35<03:02, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▋                    | 10983/15000 [09:35<03:00, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▋                    | 10989/15000 [09:36<02:58, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 73%|███████████████████████████████████████████████████████▋                    | 10995/15000 [09:36<02:56, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▋                    | 10998/15000 [09:36<02:57, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▊                    | 11004/15000 [09:38<08:50,  7.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▊                    | 11007/15000 [09:38<07:03,  9.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False False False False False False
  True False  True False  True False False False False False False False
 False False False False False False False False False False False  True
 False  True False False False False  True False False False False False
 False False False False False False False False False False False False
 False False False False]
<<<<< Epsilon in training is 0.19147272150234995>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 

 73%|███████████████████████████████████████████████████████▊                    | 11011/15000 [09:38<07:42,  8.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▊                    | 11015/15000 [09:39<06:06, 10.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 73%|███████████████████████████████████████████████████████▊                    | 11018/15000 [09:39<05:00, 13.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 73%|███████████████████████████████████████████████████████▊                    | 11024/15000 [09:39<03:51, 17.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|███████████████████████████████████████████████████████▉                    | 11030/15000 [09:39<03:22, 19.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|███████████████████████████████████████████████████████▉                    | 11036/15000 [09:40<03:10, 20.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|███████████████████████████████████████████████████████▉                    | 11039/15000 [09:40<03:05, 21.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|███████████████████████████████████████████████████████▉                    | 11045/15000 [09:40<03:00, 21.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|███████████████████████████████████████████████████████▉                    | 11051/15000 [09:40<02:57, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████                    | 11057/15000 [09:40<02:54, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████                    | 11063/15000 [09:41<02:53, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████                    | 11069/15000 [09:41<02:53, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 74%|████████████████████████████████████████████████████████                    | 11072/15000 [09:41<02:53, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▏                   | 11078/15000 [09:41<02:52, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▏                   | 11084/15000 [09:42<02:52, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▏                   | 11090/15000 [09:42<02:52, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▏                   | 11093/15000 [09:42<02:51, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▏                   | 11099/15000 [09:42<02:52, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▎                   | 11105/15000 [09:43<02:52, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▎                   | 11108/15000 [09:43<02:51, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 74%|████████████████████████████████████████████████████████▎                   | 11114/15000 [09:43<02:51, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▎                   | 11120/15000 [09:43<02:50, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▎                   | 11126/15000 [09:44<02:50, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▍                   | 11129/15000 [09:44<02:50, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 74%|████████████████████████████████████████████████████████▍                   | 11135/15000 [09:44<02:50, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▍                   | 11138/15000 [09:44<02:49, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▍                   | 11144/15000 [09:44<02:50, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▍                   | 11150/15000 [09:45<02:51, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▌                   | 11153/15000 [09:45<03:02, 21.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▌                   | 11159/15000 [09:45<02:56, 21.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▌                   | 11165/15000 [09:45<02:51, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 74%|████████████████████████████████████████████████████████▌                   | 11168/15000 [09:45<02:51, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 74%|████████████████████████████████████████████████████████▌                   | 11174/15000 [09:46<02:49, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▋                   | 11180/15000 [09:46<02:48, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▋                   | 11186/15000 [09:46<02:50, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 75%|████████████████████████████████████████████████████████▋                   | 11189/15000 [09:46<02:49, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▋                   | 11195/15000 [09:47<02:50, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▊                   | 11201/15000 [09:47<02:50, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 75%|████████████████████████████████████████████████████████▊                   | 11207/15000 [09:47<02:48, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▊                   | 11210/15000 [09:47<02:47, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▊                   | 11216/15000 [09:48<02:47, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▊                   | 11222/15000 [09:48<02:46, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▊                   | 11225/15000 [09:48<02:46, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▉                   | 11231/15000 [09:48<02:47, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▉                   | 11237/15000 [09:48<02:46, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▉                   | 11243/15000 [09:49<02:45, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|████████████████████████████████████████████████████████▉                   | 11249/15000 [09:49<02:49, 22.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 75%|█████████████████████████████████████████████████████████                   | 11252/15000 [09:49<02:48, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 75%|█████████████████████████████████████████████████████████                   | 11258/15000 [09:49<02:46, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████                   | 11261/15000 [09:50<02:46, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████                   | 11264/15000 [09:50<02:47, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False False False False False False
  True False False False  True False False False  True False False False
 False False  True  True False False False False False  True False False
 False False False  True False False False False False False  True False
 False False False False False False False False False False False False
 False  True False False]
<<<<< Epsilon in training is 0.18822016112458437>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650]>>>>>>>
<<<<< next_a is [ 4650  4650  4650  4

 75%|█████████████████████████████████████████████████████████                   | 11267/15000 [09:50<05:00, 12.42it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████                   | 11270/15000 [09:50<04:20, 14.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▏                  | 11276/15000 [09:51<03:31, 17.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▏                  | 11282/15000 [09:51<03:08, 19.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▏                  | 11288/15000 [09:51<02:55, 21.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▏                  | 11291/15000 [09:51<02:52, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 75%|█████████████████████████████████████████████████████████▏                  | 11297/15000 [09:52<02:46, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▎                  | 11303/15000 [09:52<02:43, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▎                  | 11306/15000 [09:52<02:42, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 75%|█████████████████████████████████████████████████████████▎                  | 11312/15000 [09:52<02:43, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▎                  | 11318/15000 [09:52<02:44, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 75%|█████████████████████████████████████████████████████████▎                  | 11321/15000 [09:53<02:46, 22.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▍                  | 11327/15000 [09:53<02:44, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▍                  | 11330/15000 [09:53<02:43, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▍                  | 11336/15000 [09:53<02:43, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▍                  | 11342/15000 [09:54<02:43, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▍                  | 11345/15000 [09:54<02:43, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▌                  | 11351/15000 [09:54<02:42, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▌                  | 11357/15000 [09:54<02:41, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▌                  | 11363/15000 [09:54<02:39, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▌                  | 11369/15000 [09:55<02:39, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▌                  | 11372/15000 [09:55<02:39, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▋                  | 11378/15000 [09:55<02:38, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▋                  | 11381/15000 [09:55<02:42, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▋                  | 11387/15000 [09:56<02:41, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▋                  | 11393/15000 [09:56<02:39, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▊                  | 11399/15000 [09:56<02:38, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▊                  | 11405/15000 [09:56<02:39, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▊                  | 11411/15000 [09:57<02:37, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▊                  | 11414/15000 [09:57<02:41, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▊                  | 11420/15000 [09:57<02:39, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▉                  | 11426/15000 [09:57<02:39, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▉                  | 11429/15000 [09:57<02:38, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▉                  | 11435/15000 [09:58<02:37, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▉                  | 11438/15000 [09:58<02:37, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|█████████████████████████████████████████████████████████▉                  | 11444/15000 [09:58<02:44, 21.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|█████████████████████████████████████████████████████████▉                  | 11447/15000 [09:58<02:42, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|██████████████████████████████████████████████████████████                  | 11453/15000 [09:59<03:39, 16.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|██████████████████████████████████████████████████████████                  | 11456/15000 [09:59<03:24, 17.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 76%|██████████████████████████████████████████████████████████                  | 11460/15000 [09:59<04:20, 13.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|██████████████████████████████████████████████████████████                  | 11463/15000 [09:59<03:45, 15.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|██████████████████████████████████████████████████████████                  | 11468/15000 [10:00<03:14, 18.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 76%|██████████████████████████████████████████████████████████▏                 | 11474/15000 [10:00<02:54, 20.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▏                 | 11480/15000 [10:00<02:52, 20.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 77%|██████████████████████████████████████████████████████████▏                 | 11483/15000 [10:01<04:09, 14.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▏                 | 11489/15000 [10:01<03:20, 17.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▏                 | 11492/15000 [10:01<03:14, 18.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▎                 | 11498/15000 [10:01<02:53, 20.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▎                 | 11504/15000 [10:02<02:46, 21.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▎                 | 11507/15000 [10:02<02:43, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▎                 | 11513/15000 [10:02<02:39, 21.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▎                 | 11519/15000 [10:02<02:36, 22.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False  True False False False False False False False  True  True
  True False False False False False False False False False False False
 False False False False  True False False False  True False  True False
 False False False  True False False False  True  True False False False
 False False  True False False False False  True  True False False False
 False False  True False]
<<<<< Epsilon in training is 0.18502285221516376>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 

 77%|██████████████████████████████████████████████████████████▍                 | 11522/15000 [10:03<04:39, 12.45it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▍                 | 11528/15000 [10:03<03:34, 16.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▍                 | 11534/15000 [10:03<03:04, 18.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▍                 | 11540/15000 [10:03<02:48, 20.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▍                 | 11546/15000 [10:04<02:39, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▌                 | 11549/15000 [10:04<02:37, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▌                 | 11555/15000 [10:04<02:39, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 77%|██████████████████████████████████████████████████████████▌                 | 11561/15000 [10:05<03:33, 16.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▌                 | 11564/15000 [10:05<03:14, 17.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▌                 | 11570/15000 [10:05<02:52, 19.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▋                 | 11573/15000 [10:05<02:47, 20.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▋                 | 11579/15000 [10:05<02:38, 21.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▋                 | 11585/15000 [10:06<02:35, 21.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▋                 | 11591/15000 [10:06<02:37, 21.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 77%|██████████████████████████████████████████████████████████▋                 | 11594/15000 [10:06<02:42, 20.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 77%|██████████████████████████████████████████████████████████▊                 | 11600/15000 [10:06<02:36, 21.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▊                 | 11603/15000 [10:07<02:34, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▊                 | 11609/15000 [10:07<02:30, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▊                 | 11612/15000 [10:07<02:30, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▊                 | 11618/15000 [10:07<02:31, 22.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 77%|██████████████████████████████████████████████████████████▉                 | 11624/15000 [10:08<02:30, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|██████████████████████████████████████████████████████████▉                 | 11630/15000 [10:08<02:32, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 78%|██████████████████████████████████████████████████████████▉                 | 11633/15000 [10:08<02:32, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 78%|██████████████████████████████████████████████████████████▉                 | 11639/15000 [10:08<02:31, 22.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████                 | 11645/15000 [10:08<02:29, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████                 | 11651/15000 [10:09<02:27, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████                 | 11657/15000 [10:09<02:27, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████                 | 11660/15000 [10:09<02:27, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████                 | 11666/15000 [10:09<02:26, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▏                | 11672/15000 [10:10<02:25, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▏                | 11675/15000 [10:10<02:25, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▏                | 11681/15000 [10:10<02:25, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▏                | 11687/15000 [10:10<02:27, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 78%|███████████████████████████████████████████████████████████▏                | 11693/15000 [10:11<02:26, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▎                | 11696/15000 [10:11<02:25, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▎                | 11702/15000 [10:11<02:24, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 78%|███████████████████████████████████████████████████████████▎                | 11708/15000 [10:11<02:24, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▎                | 11714/15000 [10:11<02:22, 23.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▎                | 11717/15000 [10:12<02:22, 22.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 78%|███████████████████████████████████████████████████████████▍                | 11723/15000 [10:12<02:23, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▍                | 11729/15000 [10:12<02:23, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▍                | 11735/15000 [10:12<02:23, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▍                | 11738/15000 [10:13<02:23, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▌                | 11744/15000 [10:13<02:24, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▌                | 11747/15000 [10:13<02:23, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▌                | 11753/15000 [10:13<02:23, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▌                | 11759/15000 [10:13<02:22, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▌                | 11765/15000 [10:14<02:23, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 78%|███████████████████████████████████████████████████████████▌                | 11768/15000 [10:14<02:22, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 78%|███████████████████████████████████████████████████████████▋                | 11774/15000 [10:14<02:21, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False  True False False False False False False
  True False False False False False  True False False False  True False
 False  True False False False False False False False False  True False
 False False False False  True  True False False  True False False False
 False False False False False False False False False False False False
 False False False False]
<<<<< Epsilon in training is 0.18187985621356967>>>>>>>
<<<<< nex

 79%|███████████████████████████████████████████████████████████▋                | 11780/15000 [10:15<03:44, 14.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|███████████████████████████████████████████████████████████▋                | 11785/15000 [10:15<03:08, 17.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▋                | 11791/15000 [10:15<02:44, 19.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▊                | 11794/15000 [10:15<02:37, 20.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▊                | 11799/15000 [10:16<02:50, 18.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▊                | 11804/15000 [10:16<02:47, 19.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▊                | 11807/15000 [10:16<02:42, 19.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|███████████████████████████████████████████████████████████▊                | 11813/15000 [10:16<02:32, 20.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▊                | 11816/15000 [10:17<02:29, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▉                | 11822/15000 [10:17<02:24, 21.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|███████████████████████████████████████████████████████████▉                | 11828/15000 [10:17<02:21, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▉                | 11834/15000 [10:17<02:19, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|███████████████████████████████████████████████████████████▉                | 11840/15000 [10:18<02:21, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████                | 11843/15000 [10:18<02:21, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████                | 11846/15000 [10:18<02:32, 20.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|████████████████████████████████████████████████████████████                | 11852/15000 [10:18<02:26, 21.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████                | 11855/15000 [10:18<02:23, 21.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████                | 11858/15000 [10:19<03:39, 14.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████                | 11864/15000 [10:19<02:57, 17.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|████████████████████████████████████████████████████████████▏               | 11870/15000 [10:19<02:36, 19.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▏               | 11873/15000 [10:19<02:31, 20.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▏               | 11879/15000 [10:20<02:24, 21.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▏               | 11885/15000 [10:20<02:21, 21.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▏               | 11891/15000 [10:20<02:21, 21.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▎               | 11894/15000 [10:20<02:26, 21.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▎               | 11900/15000 [10:21<02:24, 21.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|████████████████████████████████████████████████████████████▎               | 11906/15000 [10:21<02:19, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 79%|████████████████████████████████████████████████████████████▎               | 11909/15000 [10:21<02:17, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▎               | 11915/15000 [10:21<02:18, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▍               | 11921/15000 [10:21<02:16, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 79%|████████████████████████████████████████████████████████████▍               | 11924/15000 [10:22<02:18, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▍               | 11930/15000 [10:22<02:16, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▍               | 11936/15000 [10:22<02:15, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▍               | 11939/15000 [10:22<02:15, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▌               | 11945/15000 [10:23<02:13, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▌               | 11951/15000 [10:23<02:13, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 80%|████████████████████████████████████████████████████████████▌               | 11957/15000 [10:23<02:14, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▌               | 11960/15000 [10:23<02:14, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▋               | 11966/15000 [10:23<02:13, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 80%|████████████████████████████████████████████████████████████▋               | 11972/15000 [10:24<02:12, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▋               | 11975/15000 [10:24<02:13, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▋               | 11981/15000 [10:24<02:13, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 80%|████████████████████████████████████████████████████████████▋               | 11984/15000 [10:24<02:14, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▋               | 11990/15000 [10:25<02:13, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▊               | 11996/15000 [10:25<02:12, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▊               | 11999/15000 [10:25<02:13, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▊               | 12005/15000 [10:26<06:31,  7.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▊               | 12011/15000 [10:27<04:18, 11.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▊               | 12014/15000 [10:27<03:39, 13.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 80%|████████████████████████████████████████████████████████████▉               | 12020/15000 [10:27<02:54, 17.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▉               | 12026/15000 [10:28<03:23, 14.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|████████████████████████████████████████████████████████████▉               | 12032/15000 [10:28<02:45, 17.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [ True False  True False False False False False False False False  True
  True  True False False False False False  True False False False False
 False False  True False False False False False False  True False False
 False False False False False False  True False False  True False False
 False  True False False False False False False False False False  True
 False  True False False]
<<<<< Epsilon in training is 0.1787902505026763>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 46

 80%|████████████████████████████████████████████████████████████▉               | 12035/15000 [10:28<04:19, 11.41it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████               | 12041/15000 [10:29<03:13, 15.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████               | 12044/15000 [10:29<02:55, 16.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████               | 12052/15000 [10:29<03:25, 14.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████               | 12055/15000 [10:30<03:00, 16.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████               | 12061/15000 [10:30<02:33, 19.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████▏              | 12067/15000 [10:30<02:21, 20.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 80%|█████████████████████████████████████████████████████████████▏              | 12070/15000 [10:30<02:17, 21.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▏              | 12076/15000 [10:30<02:12, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 81%|█████████████████████████████████████████████████████████████▏              | 12082/15000 [10:31<02:10, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▏              | 12088/15000 [10:31<02:07, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▎              | 12091/15000 [10:31<02:09, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 81%|█████████████████████████████████████████████████████████████▎              | 12097/15000 [10:31<02:12, 21.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▎              | 12100/15000 [10:32<02:19, 20.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▎              | 12103/15000 [10:32<03:25, 14.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▎              | 12109/15000 [10:32<02:47, 17.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▍              | 12115/15000 [10:32<02:25, 19.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 81%|█████████████████████████████████████████████████████████████▍              | 12118/15000 [10:33<02:19, 20.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 81%|█████████████████████████████████████████████████████████████▍              | 12124/15000 [10:33<02:12, 21.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False False False False False  True
 False False False False False  True  True  True False False False False
 False False  True False False False False False  True False False False
  True False  True False False False False False False  True False False
 False False False False  True False False False False False False False
 False False False 

 82%|██████████████████████████████████████████████████████████████▎             | 12289/15000 [10:41<03:38, 12.38it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▍            | 12532/15000 [10:52<01:47, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▌            | 12538/15000 [10:52<01:48, 22.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▌            | 12541/15000 [10:52<01:47, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▌            | 12544/15000 [10:53<01:47, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False False False False False False False False
 False False False False False False False False  True  True  True False
 False  True False  True False  True False False False  True False False
  True False False False False False False False  True False False False
 False False False False False False False False False False  True False
 False False False  True]
<<<<< Epsilon in training is 0.17276759758106217>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 46

 84%|███████████████████████████████████████████████████████████████▌            | 12547/15000 [10:53<03:14, 12.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▌            | 12553/15000 [10:53<02:29, 16.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▋            | 12559/15000 [10:54<02:07, 19.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▋            | 12565/15000 [10:54<01:56, 20.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▋            | 12568/15000 [10:54<01:53, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▋            | 12574/15000 [10:54<01:50, 22.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▋            | 12580/15000 [10:55<01:56, 20.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▊            | 12583/15000 [10:55<01:56, 20.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▊            | 12589/15000 [10:55<01:52, 21.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▊            | 12595/15000 [10:55<01:49, 22.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▊            | 12601/15000 [10:55<01:47, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▊            | 12604/15000 [10:56<01:47, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|███████████████████████████████████████████████████████████████▉            | 12610/15000 [10:56<01:46, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▉            | 12616/15000 [10:56<01:46, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▉            | 12619/15000 [10:56<01:45, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▉            | 12625/15000 [10:57<01:45, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|███████████████████████████████████████████████████████████████▉            | 12631/15000 [10:57<01:43, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|████████████████████████████████████████████████████████████████            | 12634/15000 [10:57<01:43, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|████████████████████████████████████████████████████████████████            | 12640/15000 [10:57<01:44, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|████████████████████████████████████████████████████████████████            | 12646/15000 [10:57<01:45, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|████████████████████████████████████████████████████████████████            | 12649/15000 [10:58<01:45, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|████████████████████████████████████████████████████████████████            | 12655/15000 [10:58<01:46, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 84%|████████████████████████████████████████████████████████████████▏           | 12661/15000 [10:58<01:44, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|████████████████████████████████████████████████████████████████▏           | 12664/15000 [10:58<01:45, 22.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 84%|████████████████████████████████████████████████████████████████▏           | 12670/15000 [10:59<01:43, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▏           | 12676/15000 [10:59<01:43, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▏           | 12679/15000 [10:59<01:48, 21.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 85%|████████████████████████████████████████████████████████████████▎           | 12685/15000 [10:59<01:49, 21.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▎           | 12688/15000 [10:59<01:46, 21.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▎           | 12694/15000 [11:00<01:44, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 85%|████████████████████████████████████████████████████████████████▎           | 12700/15000 [11:00<01:43, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▍           | 12706/15000 [11:00<01:45, 21.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▍           | 12709/15000 [11:00<01:47, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 85%|████████████████████████████████████████████████████████████████▍           | 12715/15000 [11:01<01:44, 21.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 85%|████████████████████████████████████████████████████████████████▍           | 12721/15000 [11:01<01:42, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▍           | 12724/15000 [11:01<01:44, 21.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▍           | 12730/15000 [11:01<01:42, 22.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▌           | 12736/15000 [11:02<01:48, 20.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▌           | 12739/15000 [11:02<01:45, 21.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▌           | 12745/15000 [11:02<01:42, 21.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▌           | 12748/15000 [11:02<01:41, 22.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▌           | 12754/15000 [11:02<01:41, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▋           | 12760/15000 [11:03<01:40, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 85%|████████████████████████████████████████████████████████████████▋           | 12766/15000 [11:03<01:40, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▋           | 12769/15000 [11:03<01:41, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 85%|████████████████████████████████████████████████████████████████▋           | 12775/15000 [11:03<01:41, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▊           | 12781/15000 [11:04<01:39, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▊           | 12784/15000 [11:04<01:39, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▊           | 12790/15000 [11:04<01:39, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▊           | 12796/15000 [11:04<01:39, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▊           | 12799/15000 [11:04<01:40, 21.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False  True False False  True False False False False False
 False False False False False False  True  True False  True False False
 False False False False False False False False False False False False
 False False False  True False  True False False False False False False
 False False False False  True False False False False False False False
  True False False False]
<<<<< Epsilon in training is 0.16983278243849356>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 465

 85%|████████████████████████████████████████████████████████████████▊           | 12802/15000 [11:05<03:00, 12.17it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▉           | 12808/15000 [11:05<02:17, 15.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▉           | 12811/15000 [11:05<02:08, 16.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▉           | 12817/15000 [11:06<01:52, 19.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 85%|████████████████████████████████████████████████████████████████▉           | 12823/15000 [11:06<01:46, 20.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|████████████████████████████████████████████████████████████████▉           | 12826/15000 [11:06<01:43, 21.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████           | 12832/15000 [11:06<01:40, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████           | 12838/15000 [11:07<01:38, 21.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████           | 12841/15000 [11:07<01:39, 21.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████           | 12847/15000 [11:07<01:39, 21.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████           | 12850/15000 [11:07<01:40, 21.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▏          | 12856/15000 [11:07<01:49, 19.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████▏          | 12862/15000 [11:08<01:43, 20.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▏          | 12865/15000 [11:08<01:42, 20.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▏          | 12868/15000 [11:08<01:41, 21.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▏          | 12874/15000 [11:08<01:41, 20.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▎          | 12880/15000 [11:09<01:38, 21.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████▎          | 12883/15000 [11:09<01:37, 21.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▎          | 12889/15000 [11:09<01:37, 21.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▎          | 12895/15000 [11:09<01:37, 21.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████▎          | 12898/15000 [11:09<01:37, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▍          | 12904/15000 [11:10<01:36, 21.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▍          | 12907/15000 [11:10<01:36, 21.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▍          | 12913/15000 [11:10<01:36, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▍          | 12919/15000 [11:10<01:37, 21.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▍          | 12922/15000 [11:11<01:36, 21.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▌          | 12928/15000 [11:11<01:36, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████▌          | 12931/15000 [11:11<01:35, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▌          | 12937/15000 [11:11<01:37, 21.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████▌          | 12940/15000 [11:11<01:36, 21.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▌          | 12946/15000 [11:12<01:37, 21.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 86%|█████████████████████████████████████████████████████████████████▌          | 12949/15000 [11:12<01:36, 21.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▋          | 12955/15000 [11:12<01:38, 20.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▋          | 12958/15000 [11:12<01:40, 20.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▋          | 12961/15000 [11:13<02:27, 13.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▋          | 12967/15000 [11:13<01:59, 17.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 86%|█████████████████████████████████████████████████████████████████▋          | 12970/15000 [11:13<01:50, 18.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 87%|█████████████████████████████████████████████████████████████████▋          | 12976/15000 [11:13<01:40, 20.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▊          | 12982/15000 [11:14<01:36, 20.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▊          | 12985/15000 [11:14<01:34, 21.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▊          | 12991/15000 [11:14<01:31, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▊          | 12997/15000 [11:14<01:30, 22.02it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▊          | 13000/15000 [11:14<01:31, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▉          | 13003/15000 [11:16<05:49,  5.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▉          | 13011/15000 [11:16<03:44,  8.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▉          | 13014/15000 [11:17<03:03, 10.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 87%|█████████████████████████████████████████████████████████████████▉          | 13019/15000 [11:17<02:18, 14.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|█████████████████████████████████████████████████████████████████▉          | 13025/15000 [11:17<01:51, 17.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████          | 13028/15000 [11:17<01:45, 18.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 87%|██████████████████████████████████████████████████████████████████          | 13034/15000 [11:18<01:37, 20.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████          | 13040/15000 [11:18<01:32, 21.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████          | 13043/15000 [11:18<01:31, 21.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 87%|██████████████████████████████████████████████████████████████████          | 13049/15000 [11:18<01:37, 19.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▏         | 13055/15000 [11:19<01:32, 20.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False False  True False False False False  True False
 False False False False False  True False False False False False  True
 False False False False False  True False False  True False False False
 False  True False False False False  True False False False False False
 False False False  True False False False False False  True False  True
 False False False False]
<<<<< Epsilon in training is 0.1669478212039589

 87%|██████████████████████████████████████████████████████████████████▏         | 13060/15000 [11:19<02:31, 12.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 87%|██████████████████████████████████████████████████████████████████▏         | 13064/15000 [11:19<02:13, 14.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▏         | 13068/15000 [11:20<02:05, 15.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▏         | 13073/15000 [11:20<01:47, 17.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▎         | 13076/15000 [11:20<01:39, 19.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▎         | 13082/15000 [11:20<01:30, 21.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▎         | 13085/15000 [11:20<01:28, 21.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▎         | 13091/15000 [11:21<01:26, 21.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▎         | 13097/15000 [11:21<01:26, 21.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▎         | 13100/15000 [11:21<01:26, 21.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▍         | 13106/15000 [11:21<01:27, 21.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 87%|██████████████████████████████████████████████████████████████████▍         | 13112/15000 [11:22<01:26, 21.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▍         | 13118/15000 [11:22<01:24, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 87%|██████████████████████████████████████████████████████████████████▍         | 13124/15000 [11:22<01:23, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▌         | 13127/15000 [11:22<01:23, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▌         | 13133/15000 [11:23<01:22, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▌         | 13136/15000 [11:23<01:26, 21.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▌         | 13142/15000 [11:23<01:25, 21.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▌         | 13148/15000 [11:23<01:24, 21.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▋         | 13154/15000 [11:24<01:22, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 88%|██████████████████████████████████████████████████████████████████▋         | 13159/15000 [11:24<01:59, 15.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▋         | 13164/15000 [11:24<01:44, 17.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▋         | 13167/15000 [11:24<01:35, 19.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▋         | 13173/15000 [11:25<01:27, 20.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 88%|██████████████████████████████████████████████████████████████████▊         | 13179/15000 [11:25<01:24, 21.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▊         | 13185/15000 [11:25<01:22, 22.13it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▊         | 13188/15000 [11:25<01:21, 22.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▊         | 13194/15000 [11:26<01:21, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▊         | 13197/15000 [11:26<01:29, 20.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▉         | 13200/15000 [11:26<02:12, 13.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 88%|██████████████████████████████████████████████████████████████████▉         | 13206/15000 [11:26<01:44, 17.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▉         | 13209/15000 [11:27<01:37, 18.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▉         | 13215/15000 [11:27<01:28, 20.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|██████████████████████████████████████████████████████████████████▉         | 13221/15000 [11:27<01:23, 21.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████         | 13227/15000 [11:27<01:20, 22.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████         | 13230/15000 [11:27<01:20, 22.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████         | 13236/15000 [11:28<01:21, 21.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 88%|███████████████████████████████████████████████████████████████████         | 13239/15000 [11:28<01:23, 21.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████         | 13245/15000 [11:28<01:20, 21.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████         | 13248/15000 [11:28<01:19, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████▏        | 13254/15000 [11:29<01:18, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████▏        | 13260/15000 [11:29<01:16, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████▏        | 13266/15000 [11:29<01:16, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████▏        | 13269/15000 [11:29<01:16, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 88%|███████████████████████████████████████████████████████████████████▎        | 13275/15000 [11:30<01:16, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 89%|███████████████████████████████████████████████████████████████████▎        | 13278/15000 [11:30<01:16, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▎        | 13284/15000 [11:30<01:15, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▎        | 13290/15000 [11:30<01:16, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▎        | 13296/15000 [11:30<01:16, 22.27it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▍        | 13299/15000 [11:31<01:16, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 89%|███████████████████████████████████████████████████████████████████▍        | 13305/15000 [11:31<01:14, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▍        | 13311/15000 [11:31<01:14, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650] and comparison is [False  True  True False False False  True False  True False False False
 False False False  True False False False False False  True  True False
 False False  True False False  True False False False False False False
 False False False False False False  True  True False False False False
 False False False False False  True False False  True False  True  True
 False False False False]
<<<<< Epsilon in training is 0.1641118670056706>>>>>>>
<<<<< next_action is [4650 4650 4650 4

 89%|███████████████████████████████████████████████████████████████████▍        | 13314/15000 [11:32<02:15, 12.40it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▍        | 13320/15000 [11:32<01:44, 16.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▌        | 13323/15000 [11:32<01:35, 17.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▌        | 13329/15000 [11:32<01:23, 20.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▌        | 13335/15000 [11:33<01:18, 21.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▌        | 13341/15000 [11:33<01:15, 22.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 89%|███████████████████████████████████████████████████████████████████▌        | 13344/15000 [11:33<01:15, 21.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 89%|███████████████████████████████████████████████████████████████████▋        | 13350/15000 [11:33<01:14, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 89%|███████████████████████████████████████████████████████████████████▋        | 13356/15000 [11:33<01:12, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▋        | 13359/15000 [11:34<01:13, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▋        | 13365/15000 [11:34<01:11, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▋        | 13371/15000 [11:34<01:11, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▊        | 13374/15000 [11:34<01:11, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▊        | 13380/15000 [11:35<01:10, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 89%|███████████████████████████████████████████████████████████████████▊        | 13386/15000 [11:35<01:37, 16.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▊        | 13389/15000 [11:35<01:29, 18.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▊        | 13395/15000 [11:35<01:20, 19.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▉        | 13401/15000 [11:36<01:15, 21.22it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▉        | 13407/15000 [11:36<01:14, 21.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▉        | 13410/15000 [11:36<01:12, 21.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▉        | 13416/15000 [11:36<01:11, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 89%|███████████████████████████████████████████████████████████████████▉        | 13419/15000 [11:37<01:13, 21.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████        | 13425/15000 [11:37<01:44, 15.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 90%|████████████████████████████████████████████████████████████████████        | 13431/15000 [11:37<01:26, 18.14it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████        | 13437/15000 [11:38<01:45, 14.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████        | 13440/15000 [11:38<01:34, 16.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▏       | 13446/15000 [11:38<01:23, 18.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▏       | 13452/15000 [11:39<01:17, 19.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 90%|████████████████████████████████████████████████████████████████████▏       | 13455/15000 [11:39<01:14, 20.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▏       | 13461/15000 [11:39<01:10, 21.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▏       | 13464/15000 [11:39<01:09, 22.00it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 90%|████████████████████████████████████████████████████████████████████▏       | 13470/15000 [11:39<01:08, 22.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▎       | 13476/15000 [11:40<01:06, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▎       | 13482/15000 [11:40<01:06, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▎       | 13485/15000 [11:40<01:06, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▎       | 13491/15000 [11:40<01:06, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▎       | 13494/15000 [11:40<01:06, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▍       | 13500/15000 [11:41<01:05, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▍       | 13506/15000 [11:41<01:05, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 90%|████████████████████████████████████████████████████████████████████▍       | 13509/15000 [11:41<01:06, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▍       | 13515/15000 [11:41<01:04, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 90%|████████████████████████████████████████████████████████████████████▌       | 13521/15000 [11:42<01:04, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▌       | 13527/15000 [11:42<01:04, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▌       | 13530/15000 [11:42<01:04, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▌       | 13536/15000 [11:42<01:04, 22.70it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▌       | 13542/15000 [11:43<01:04, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▋       | 13545/15000 [11:43<01:04, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▋       | 13551/15000 [11:43<01:04, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▋       | 13554/15000 [11:43<01:03, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 90%|████████████████████████████████████████████████████████████████████▋       | 13560/15000 [11:43<01:03, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▋       | 13566/15000 [11:44<01:03, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False  True False False False False False False False False False False
 False False False  True False False False False False  True False  True
 False False False False False False False False False False False  True
 False False False False False  True False  True False False False False
 False False  True  True False False False  True False False  True False
 False False False False]
<<<<< Epsilon in training is 0.1613240873577105>>>>

 90%|████████████████████████████████████████████████████████████████████▊       | 13572/15000 [11:44<01:40, 14.23it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 90%|████████████████████████████████████████████████████████████████████▊       | 13575/15000 [11:44<01:29, 15.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▊       | 13581/15000 [11:45<01:17, 18.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▊       | 13584/15000 [11:45<01:13, 19.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▊       | 13587/15000 [11:45<01:09, 20.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▊       | 13593/15000 [11:45<01:33, 15.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▉       | 13599/15000 [11:46<01:16, 18.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▉       | 13602/15000 [11:46<01:14, 18.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 91%|████████████████████████████████████████████████████████████████████▉       | 13608/15000 [11:46<01:07, 20.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▉       | 13614/15000 [11:46<01:03, 21.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|████████████████████████████████████████████████████████████████████▉       | 13617/15000 [11:47<01:03, 21.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████       | 13623/15000 [11:47<01:03, 21.77it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 91%|█████████████████████████████████████████████████████████████████████       | 13629/15000 [11:47<01:01, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████       | 13635/15000 [11:47<01:00, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████       | 13638/15000 [11:47<01:00, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▏      | 13644/15000 [11:48<01:00, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▏      | 13650/15000 [11:48<01:00, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▏      | 13653/15000 [11:48<01:00, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▏      | 13659/15000 [11:48<01:01, 21.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▏      | 13662/15000 [11:49<01:00, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▎      | 13668/15000 [11:49<00:59, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 91%|█████████████████████████████████████████████████████████████████████▎      | 13674/15000 [11:49<00:59, 22.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▎      | 13677/15000 [11:49<00:59, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▎      | 13683/15000 [11:49<00:58, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▎      | 13689/15000 [11:50<00:58, 22.33it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▎      | 13692/15000 [11:50<00:58, 22.40it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▍      | 13698/15000 [11:50<00:58, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 91%|█████████████████████████████████████████████████████████████████████▍      | 13704/15000 [11:50<00:57, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▍      | 13710/15000 [11:51<00:56, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▍      | 13713/15000 [11:51<00:56, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▌      | 13719/15000 [11:51<00:56, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 91%|█████████████████████████████████████████████████████████████████████▌      | 13722/15000 [11:51<00:56, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▌      | 13728/15000 [11:51<00:56, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▌      | 13734/15000 [11:52<00:57, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▌      | 13740/15000 [11:52<00:56, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▋      | 13743/15000 [11:52<00:55, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▋      | 13749/15000 [11:52<00:55, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 92%|█████████████████████████████████████████████████████████████████████▋      | 13755/15000 [11:53<00:54, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 92%|█████████████████████████████████████████████████████████████████████▋      | 13761/15000 [11:53<00:54, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▋      | 13764/15000 [11:53<00:54, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 92%|█████████████████████████████████████████████████████████████████████▊      | 13770/15000 [11:53<00:53, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▊      | 13776/15000 [11:54<00:53, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▊      | 13782/15000 [11:54<00:53, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▊      | 13785/15000 [11:54<00:53, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▊      | 13791/15000 [11:54<00:54, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▉      | 13794/15000 [11:54<00:54, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▉      | 13800/15000 [11:55<00:53, 22.49it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▉      | 13806/15000 [11:55<00:54, 21.96it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 92%|█████████████████████████████████████████████████████████████████████▉      | 13809/15000 [11:55<00:54, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|█████████████████████████████████████████████████████████████████████▉      | 13815/15000 [11:55<00:56, 21.11it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████      | 13818/15000 [11:56<00:55, 21.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████      | 13824/15000 [11:56<00:54, 21.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650] and comparison is [False  True  True False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False  True False False False False
 False False False False  True False False False False False False  True
 False False False False False  True False False False  True False False
 False False False False]
<<<<< Epsilon in training is 0.15858366391565668>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 46

 92%|██████████████████████████████████████████████████████████████████████      | 13827/15000 [11:56<01:38, 11.85it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████      | 13829/15000 [11:56<01:33, 12.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████      | 13833/15000 [11:57<01:23, 13.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████      | 13835/15000 [11:57<01:19, 14.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████      | 13840/15000 [11:57<01:26, 13.36it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▏     | 13843/15000 [11:57<01:14, 15.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▏     | 13849/15000 [11:58<01:01, 18.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▏     | 13855/15000 [11:58<00:55, 20.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▏     | 13858/15000 [11:58<00:53, 21.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▏     | 13864/15000 [11:58<00:51, 21.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▎     | 13870/15000 [11:59<00:50, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 92%|██████████████████████████████████████████████████████████████████████▎     | 13873/15000 [11:59<00:50, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▎     | 13879/15000 [11:59<00:49, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▎     | 13885/15000 [11:59<00:49, 22.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▎     | 13888/15000 [11:59<00:48, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▍     | 13894/15000 [12:00<00:48, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▍     | 13900/15000 [12:00<00:49, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▍     | 13906/15000 [12:00<00:48, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▍     | 13912/15000 [12:00<00:48, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▌     | 13915/15000 [12:01<00:47, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▌     | 13921/15000 [12:01<00:47, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▌     | 13924/15000 [12:01<00:47, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▌     | 13930/15000 [12:01<00:47, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▌     | 13936/15000 [12:02<00:47, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▋     | 13942/15000 [12:02<00:46, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▋     | 13945/15000 [12:02<00:46, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▋     | 13951/15000 [12:02<00:46, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▋     | 13957/15000 [12:02<00:45, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▋     | 13960/15000 [12:03<00:45, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▊     | 13966/15000 [12:03<00:45, 22.75it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▊     | 13972/15000 [12:03<00:45, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▊     | 13975/15000 [12:03<00:45, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▊     | 13981/15000 [12:03<00:45, 22.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▊     | 13987/15000 [12:04<00:44, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▉     | 13993/15000 [12:04<00:44, 22.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 93%|██████████████████████████████████████████████████████████████████████▉     | 13996/15000 [12:04<00:44, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▉     | 13999/15000 [12:04<00:44, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▉     | 14005/15000 [12:06<02:09,  7.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|██████████████████████████████████████████████████████████████████████▉     | 14008/15000 [12:06<01:43,  9.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|███████████████████████████████████████████████████████████████████████     | 14014/15000 [12:06<01:12, 13.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 93%|███████████████████████████████████████████████████████████████████████     | 14020/15000 [12:06<00:57, 17.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████     | 14026/15000 [12:07<00:50, 19.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████     | 14029/15000 [12:07<00:49, 19.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████     | 14035/15000 [12:07<00:46, 20.54it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▏    | 14038/15000 [12:07<00:46, 20.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▏    | 14044/15000 [12:08<01:03, 15.12it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▏    | 14047/15000 [12:08<00:56, 16.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▏    | 14052/15000 [12:08<01:07, 14.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▏    | 14055/15000 [12:09<00:59, 15.88it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▏    | 14061/15000 [12:09<00:49, 18.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▎    | 14067/15000 [12:09<00:45, 20.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▎    | 14073/15000 [12:09<00:42, 21.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▎    | 14079/15000 [12:10<00:41, 22.04it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False  True False False False False False  True False False False
 False False False False False False False False False False False False
 False False False False False False False False False  True  True False
 False False False False False False False  True False False False False
 False  True False False False False False False  True False False False
  True False False False]
<<<<< Epsilon in training is 0.15588979223636043>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 

 94%|███████████████████████████████████████████████████████████████████████▎    | 14082/15000 [12:10<01:15, 12.10it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▍    | 14088/15000 [12:10<00:58, 15.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▍    | 14091/15000 [12:11<00:52, 17.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▍    | 14097/15000 [12:11<00:46, 19.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▍    | 14103/15000 [12:11<00:42, 20.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▍    | 14106/15000 [12:11<00:42, 21.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▌    | 14112/15000 [12:11<00:40, 21.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▌    | 14118/15000 [12:12<00:39, 22.24it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▌    | 14121/15000 [12:12<00:39, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▌    | 14127/15000 [12:12<00:39, 22.34it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▌    | 14133/15000 [12:12<00:38, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▋    | 14139/15000 [12:13<00:38, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▋    | 14142/15000 [12:13<00:38, 22.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▋    | 14148/15000 [12:13<00:38, 22.31it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▋    | 14151/15000 [12:13<00:38, 22.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▋    | 14157/15000 [12:14<00:37, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▊    | 14163/15000 [12:14<00:37, 22.08it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▊    | 14169/15000 [12:14<00:37, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 94%|███████████████████████████████████████████████████████████████████████▊    | 14172/15000 [12:14<00:37, 22.21it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|███████████████████████████████████████████████████████████████████████▊    | 14178/15000 [12:14<00:36, 22.45it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|███████████████████████████████████████████████████████████████████████▊    | 14184/15000 [12:15<00:36, 22.43it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|███████████████████████████████████████████████████████████████████████▉    | 14187/15000 [12:15<00:36, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|███████████████████████████████████████████████████████████████████████▉    | 14193/15000 [12:15<00:35, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|███████████████████████████████████████████████████████████████████████▉    | 14196/15000 [12:15<00:35, 22.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|███████████████████████████████████████████████████████████████████████▉    | 14202/15000 [12:16<00:35, 22.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|███████████████████████████████████████████████████████████████████████▉    | 14208/15000 [12:16<00:35, 22.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████    | 14211/15000 [12:16<00:35, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████    | 14217/15000 [12:16<00:34, 22.41it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████    | 14223/15000 [12:16<00:34, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|████████████████████████████████████████████████████████████████████████    | 14226/15000 [12:17<00:34, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████    | 14232/15000 [12:17<00:34, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▏   | 14238/15000 [12:17<00:34, 22.18it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|████████████████████████████████████████████████████████████████████████▏   | 14241/15000 [12:17<00:34, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|████████████████████████████████████████████████████████████████████████▏   | 14247/15000 [12:18<00:33, 22.19it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▏   | 14253/15000 [12:18<00:33, 22.32it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▏   | 14256/15000 [12:18<00:33, 22.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▏   | 14259/15000 [12:18<00:33, 21.98it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▎   | 14264/15000 [12:19<00:49, 14.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▎   | 14267/15000 [12:19<00:43, 16.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▎   | 14272/15000 [12:19<00:39, 18.30it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▎   | 14278/15000 [12:19<00:35, 20.44it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|████████████████████████████████████████████████████████████████████████▎   | 14284/15000 [12:19<00:33, 21.61it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▍   | 14290/15000 [12:20<00:32, 22.17it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▍   | 14296/15000 [12:20<00:31, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▍   | 14299/15000 [12:20<00:30, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 95%|████████████████████████████████████████████████████████████████████████▍   | 14305/15000 [12:20<00:30, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▍   | 14308/15000 [12:21<00:30, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▌   | 14314/15000 [12:21<00:31, 21.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▌   | 14317/15000 [12:21<00:31, 21.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 95%|████████████████████████████████████████████████████████████████████████▌   | 14323/15000 [12:21<00:31, 21.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▌   | 14329/15000 [12:22<00:30, 22.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▌   | 14332/15000 [12:22<00:30, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▋   | 14335/15000 [12:22<00:29, 22.20it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False  True False False False  True False False False  True
 False False False False False False False False  True False False  True
 False False False False False False False False False False False False
 False False  True  True False False False False False False  True False
 False  True False False  True False False False False False False False
 False False False False]
<<<<< Epsilon in training is 0.1532416815418045>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 465

 96%|████████████████████████████████████████████████████████████████████████▋   | 14341/15000 [12:22<00:46, 14.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▋   | 14344/15000 [12:23<00:42, 15.58it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▋   | 14350/15000 [12:23<00:35, 18.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▋   | 14356/15000 [12:23<00:31, 20.23it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▊   | 14362/15000 [12:23<00:29, 21.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 96%|████████████████████████████████████████████████████████████████████████▊   | 14365/15000 [12:24<00:29, 21.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▊   | 14371/15000 [12:24<00:28, 22.46it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▊   | 14377/15000 [12:24<00:27, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▊   | 14380/15000 [12:24<00:27, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▉   | 14386/15000 [12:24<00:26, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▉   | 14389/15000 [12:25<00:26, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▉   | 14395/15000 [12:25<00:26, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 96%|████████████████████████████████████████████████████████████████████████▉   | 14401/15000 [12:25<00:26, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|████████████████████████████████████████████████████████████████████████▉   | 14407/15000 [12:25<00:25, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 96%|█████████████████████████████████████████████████████████████████████████   | 14410/15000 [12:25<00:25, 22.87it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████   | 14416/15000 [12:26<00:25, 23.03it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████   | 14422/15000 [12:26<00:25, 22.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████   | 14425/15000 [12:26<00:25, 22.93it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████   | 14431/15000 [12:26<00:24, 22.86it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▏  | 14434/15000 [12:27<00:24, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▏  | 14440/15000 [12:27<00:24, 22.82it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▏  | 14446/15000 [12:27<00:24, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 96%|█████████████████████████████████████████████████████████████████████████▏  | 14452/15000 [12:27<00:23, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▏  | 14455/15000 [12:27<00:23, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▎  | 14461/15000 [12:28<00:23, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▎  | 14467/15000 [12:28<00:23, 22.90it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 96%|█████████████████████████████████████████████████████████████████████████▎  | 14473/15000 [12:28<00:23, 22.79it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▎  | 14479/15000 [12:28<00:23, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▍  | 14485/15000 [12:29<00:22, 22.42it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▍  | 14491/15000 [12:29<00:22, 22.65it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▍  | 14494/15000 [12:29<00:22, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▍  | 14500/15000 [12:29<00:21, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▍  | 14503/15000 [12:30<00:21, 22.63it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 97%|█████████████████████████████████████████████████████████████████████████▌  | 14509/15000 [12:30<00:22, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▌  | 14515/15000 [12:30<00:21, 22.47it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▌  | 14518/15000 [12:30<00:21, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▌  | 14524/15000 [12:30<00:21, 22.64it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 97%|█████████████████████████████████████████████████████████████████████████▌  | 14530/15000 [12:31<00:21, 22.05it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▋  | 14533/15000 [12:31<00:32, 14.16it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 97%|█████████████████████████████████████████████████████████████████████████▋  | 14539/15000 [12:31<00:26, 17.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▋  | 14545/15000 [12:32<00:22, 19.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 97%|█████████████████████████████████████████████████████████████████████████▋  | 14551/15000 [12:32<00:21, 21.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▋  | 14554/15000 [12:32<00:20, 21.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 97%|█████████████████████████████████████████████████████████████████████████▊  | 14560/15000 [12:32<00:19, 22.09it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▊  | 14566/15000 [12:33<00:19, 22.37it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▊  | 14569/15000 [12:33<00:19, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▊  | 14575/15000 [12:33<00:19, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▉  | 14581/15000 [12:33<00:18, 22.57it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▉  | 14584/15000 [12:33<00:18, 22.56it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▉  | 14590/15000 [12:34<00:18, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650] and comparison is [False  True False False False False False False False False False False
 False False False False False False  True False False False False False
  True False False False False False False False False False  True False
 False False False  True False False False False False False False False
 False False False False False False False False False False False  True
 False  True False False]
<<<<< Epsilon in training is 0.15063855448697272>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 

 97%|█████████████████████████████████████████████████████████████████████████▉  | 14593/15000 [12:34<00:32, 12.59it/s]

<<<<<<<<< Target Train is getting called >>>>>>>
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▉  | 14599/15000 [12:34<00:24, 16.29it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|█████████████████████████████████████████████████████████████████████████▉  | 14605/15000 [12:35<00:20, 19.10it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|██████████████████████████████████████████████████████████████████████████  | 14608/15000 [12:35<00:19, 19.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 97%|██████████████████████████████████████████████████████████████████████████  | 14614/15000 [12:35<00:18, 21.25it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|██████████████████████████████████████████████████████████████████████████  | 14620/15000 [12:35<00:17, 22.07it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 97%|██████████████████████████████████████████████████████████████████████████  | 14623/15000 [12:35<00:16, 22.26it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████  | 14629/15000 [12:36<00:16, 22.67it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▏ | 14635/15000 [12:36<00:16, 22.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▏ | 14638/15000 [12:36<00:15, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▏ | 14644/15000 [12:36<00:15, 22.94it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 98%|██████████████████████████████████████████████████████████████████████████▏ | 14650/15000 [12:37<00:15, 23.06it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▎ | 14656/15000 [12:37<00:15, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▎ | 14659/15000 [12:37<00:15, 22.62it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▎ | 14665/15000 [12:37<00:14, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▎ | 14671/15000 [12:38<00:14, 22.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▎ | 14677/15000 [12:38<00:14, 22.97it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▍ | 14680/15000 [12:38<00:13, 22.89it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▍ | 14686/15000 [12:38<00:13, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▍ | 14689/15000 [12:38<00:13, 22.80it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▍ | 14695/15000 [12:39<00:13, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▍ | 14701/15000 [12:39<00:13, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▌ | 14707/15000 [12:39<00:12, 22.76it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▌ | 14713/15000 [12:39<00:12, 22.99it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 98%|██████████████████████████████████████████████████████████████████████████▌ | 14716/15000 [12:40<00:12, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▌ | 14722/15000 [12:40<00:12, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▌ | 14725/15000 [12:40<00:12, 22.35it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 98%|██████████████████████████████████████████████████████████████████████████▋ | 14731/15000 [12:40<00:11, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▋ | 14737/15000 [12:40<00:11, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▋ | 14740/15000 [12:41<00:11, 22.60it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 98%|██████████████████████████████████████████████████████████████████████████▋ | 14746/15000 [12:41<00:11, 22.84it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▋ | 14752/15000 [12:41<00:11, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 98%|██████████████████████████████████████████████████████████████████████████▊ | 14758/15000 [12:41<00:10, 22.68it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▊ | 14761/15000 [12:42<00:10, 22.69it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▊ | 14767/15000 [12:42<00:10, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 98%|██████████████████████████████████████████████████████████████████████████▊ | 14773/15000 [12:42<00:09, 23.01it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|██████████████████████████████████████████████████████████████████████████▊ | 14776/15000 [12:42<00:09, 22.92it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|██████████████████████████████████████████████████████████████████████████▉ | 14782/15000 [12:42<00:09, 22.91it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|██████████████████████████████████████████████████████████████████████████▉ | 14788/15000 [12:43<00:09, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|██████████████████████████████████████████████████████████████████████████▉ | 14794/15000 [12:43<00:09, 22.59it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|██████████████████████████████████████████████████████████████████████████▉ | 14797/15000 [12:43<00:09, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████ | 14803/15000 [12:43<00:08, 22.50it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 99%|███████████████████████████████████████████████████████████████████████████ | 14809/15000 [12:44<00:08, 22.81it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████ | 14812/15000 [12:44<00:08, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████ | 14818/15000 [12:44<00:07, 22.95it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████ | 14821/15000 [12:44<00:08, 21.51it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 99%|███████████████████████████████████████████████████████████████████████████ | 14827/15000 [12:44<00:07, 22.28it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▏| 14833/15000 [12:45<00:07, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▏| 14839/15000 [12:45<00:07, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▏| 14845/15000 [12:45<00:06, 22.66it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▏| 14848/15000 [12:45<00:06, 22.38it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650] and comparison is [False False False False  True False False False  True False False  True
  True False False  True  True False False False False False  True False
  True False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False  True False False False False False False False
 False  True False False]
<<<<< Epsilon in training is 0.14807964693166228>>>>>>>
<<<<< next_action is [4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650 4650
 4650 4650 4650 4650 4650 

 99%|███████████████████████████████████████████████████████████████████████████▏| 14851/15000 [12:46<00:11, 12.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▎| 14857/15000 [12:46<00:08, 16.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▎| 14863/15000 [12:46<00:07, 18.85it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▎| 14869/15000 [12:47<00:06, 20.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▎| 14875/15000 [12:47<00:05, 21.78it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▍| 14878/15000 [12:47<00:05, 22.15it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 99%|███████████████████████████████████████████████████████████████████████████▍| 14884/15000 [12:47<00:05, 22.48it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▍| 14887/15000 [12:47<00:05, 22.55it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▍| 14893/15000 [12:48<00:04, 22.52it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▍| 14899/15000 [12:48<00:04, 22.73it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▌| 14905/15000 [12:48<00:04, 22.71it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▌| 14908/15000 [12:48<00:04, 22.72it/s]

<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]


 99%|███████████████████████████████████████████████████████████████████████████▌| 14914/15000 [12:49<00:03, 22.83it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▌| 14917/15000 [12:49<00:03, 22.74it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


 99%|███████████████████████████████████████████████████████████████████████████▌| 14923/15000 [12:49<00:03, 22.39it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]


100%|███████████████████████████████████████████████████████████████████████████▋| 14929/15000 [12:49<00:03, 22.53it/s]

<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [] and comparison is [False]
<<<<<<<<<<<<<< opt_policy_rand is [4650] and comparison is [ True]
100%|████████████████████████████████████████████████████████████████████████████| 15000/15000 [12:53<00:00, 19.39it/s]


In [17]:
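# Evaluate the trained Deep SARSA agent on 10 episodes of the small
# track 1 environment, loading the saved model from ./DSARSA_Agent/model.
env = grid2op.make("l2rpn_neurips_2020_track1_small", reward_class=L2RPNReward)
evaluate(env,
         name="DeepSarsa_Agent",
         load_path="./DSARSA_Agent/model",
         logs_path="./DSARSA_Agent/logs",
         nb_episode=10,
         nb_process=1,
         max_steps=-1,
         verbose=False,
         save_gif=False)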

(<__main__.DeepQSimple at 0x26f1f483f40>,
 [('C:\\Users\\tejus_\\data_grid2op\\l2rpn_neurips_2020_track1_small\\chronics\\Scenario_april_000',
   'Scenario_april_000',
   321.8056640625,
   8,
   8062),
  ('C:\\Users\\tejus_\\data_grid2op\\l2rpn_neurips_2020_track1_small\\chronics\\Scenario_april_001',
   'Scenario_april_001',
   25791.228515625,
   511,
   8062),
  ('C:\\Users\\tejus_\\data_grid2op\\l2rpn_neurips_2020_track1_small\\chronics\\Scenario_april_002',
   'Scenario_april_002',
   317.3543395996094,
   8,
   8062),
  ('C:\\Users\\tejus_\\data_grid2op\\l2rpn_neurips_2020_track1_small\\chronics\\Scenario_april_003',
   'Scenario_april_003',
   25612.947265625,
   503,
   8062),
  ('C:\\Users\\tejus_\\data_grid2op\\l2rpn_neurips_2020_track1_small\\chronics\\Scenario_april_004',
   'Scenario_april_004',
   3164.750244140625,
   66,
   8062),
  ('C:\\Users\\tejus_\\data_grid2op\\l2rpn_neurips_2020_track1_small\\chronics\\Scenario_april_005',
   'Scenario_april_005',
   25956.78710
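
The evaluation output above is easier to read once condensed into per-episode survival ratios. The following is a minimal sketch, not part of the original notebook. It assumes each result tuple is laid out as (chronics path, scenario name, cumulative reward, steps survived, maximum steps), which is how the printed values read. Only the five complete entries visible in the output are used; the path field is left empty for brevity, and the helper name `summarize` is hypothetical.

```python
from statistics import mean

# Assumed tuple layout (inferred from the printed output):
# (chronics path, scenario name, cumulative reward, steps survived, max steps).
# Paths are left empty here; only the complete entries are listed.
results = [
    ("", "Scenario_april_000", 321.8056640625, 8, 8062),
    ("", "Scenario_april_001", 25791.228515625, 511, 8062),
    ("", "Scenario_april_002", 317.3543395996094, 8, 8062),
    ("", "Scenario_april_003", 25612.947265625, 503, 8062),
    ("", "Scenario_april_004", 3164.750244140625, 66, 8062),
]


def summarize(results):
    """Hypothetical helper: per-episode survival ratios plus aggregates."""
    per_episode = [
        {
            "name": name,
            "reward": reward,
            "steps": steps,
            # Fraction of the scenario the agent survived.
            "survival": steps / max_steps,
        }
        for _path, name, reward, steps, max_steps in results
    ]
    return {
        "per_episode": per_episode,
        "mean_steps": mean(ep["steps"] for ep in per_episode),
        "mean_reward": mean(ep["reward"] for ep in per_episode),
        # Episodes in which the agent reached the end of the scenario.
        "completed": sum(1 for ep in per_episode if ep["survival"] == 1.0),
    }


summary = summarize(results)
for ep in summary["per_episode"]:
    print(f'{ep["name"]}: {ep["steps"]} steps, survival {ep["survival"]:.2%}')
print("mean steps:", summary["mean_steps"])
print("episodes completed:", summary["completed"])
```

Reporting the survival ratio alongside the cumulative reward makes it clear whether a high reward comes from a long episode or from a few well-rewarded steps.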

$\text{width} = \frac{n}{\gamma [N_1 + N_2]}$

$\alpha$

$10^{-3}$
$\epsilon$