In thia notebook we will define the environment and test it.

### 01. Define the environment
To define the environment we use the [OpenAI Gym](https://gym.openai.com/) library for because:
1. It provides a simple skeleton to define the environment.
2. Most libraries implementing DRL algorithms support *OpenAI Gym* interface as it is the case for [Stable Baselines](https://github.com/hill-a/stable-baselines) library that we will use later for training.

In summary, the environment we will define has the following characteristics:

    Description:
        An agent is presented with series of network traffic, the goal is not to miss any attack
        or the system will fail. However, the agent cannot raise an alert every time, this disturbs the admin and 
        waste his time.

    Observation:
        Type: Box(dataset_features)
        LOW: 0 and HIGH: 1 (the data is scaled between 0 and 1)

    Actions:
        Type: Discrete(2)
        Num   Action
        0     Do nothing (benign traffic)
        1     Raise an alert (attack detected)

    Reward:
        +1.0 for correctly detecting an attack
        -1.0 for missing an attack

        +0.0 for not raising an alert when it is not needed (helps the agent to focus on detecting attacks,
        deals with the problem of: number of benign traffic > number of attacks)
        -1.0 for raising an alarm although there is no attack

    Starting State:
        Start randomly at any position

    Episode Termination:
        - Missing an attack (can cause destruction of the system)
        - Episode length exceeded 1000 steps

In [1]:
# Import the necessary libraries
import gym
import pandas as pd
import numpy as np

In [2]:
# Our custom environment Named 'IDS_Env' inherits from the 'Env' class from the 'gym' library
class IDS_Env(gym.Env):
    def __init__(self, data):
        super().__init__()
        # The class takes one parameter which is the dataset (precisely the training dataset)
        self.data = data

        # Define action and observation spaces
        self.action_space = gym.spaces.Discrete(2)  # either 0 (NORMAL) or 1 (ATTACK)
        self.observation_space = gym.spaces.Box(
            low=0, high=1, shape=(data.shape[1] - 1,), dtype=np.float64
        )

        # Define reward function
        # (true_label, action) -> reward
        self.rewards = {(0, 0): 0, (0, 1): -1, (1, 1): 1, (1, 0): -1}

        # Define the maximim episode length
        self.max_steps = 1000

        # Keep track of steps that go beyond the termination of the episode
        self.steps_beyond_done = None

    def reset(self):
        # This method should be called to start a new episode

        self.steps_beyond_done = None
        self.current_step = 0

        # start from a random point in the data
        self.i = np.random.randint(0, self.data.shape[0])

        # extarct the observation
        self.obs = self.data.iloc[self.i]

        # extract the true label of the observation
        self.label = int(self.obs.pop("label"))

        # return the observation, it should be a numpy array so the '.values' is a must
        return self.obs.values

    def step(self, action):
        # This method is called after each action the agent take

        # Verify that the action took is valid (belongs to the action space, 0 or 1)
        err_msg = "%r (%s) invalid" % (action, type(action))
        assert self.action_space.contains(action), err_msg

        # Calculate if the episode should finish
        info = {}
        done = False
        self.current_step += 1
        if self.current_step >= self.max_steps:
            done = True
            # Update the info dict to signal that the limit was exceeded
            info["reason_to_stop"] = "episode_limit"

        if self.label == 1 and action == 0:
            done = True
            # Update the info dict to signal that the agent missd an attack
            info["reason_to_stop"] = "attack_missed"

        # Calculate the reward based on the true label of the observation and the action tooked
        reward = self.rewards[(self.label, action)]

        # Calculate next state if the episode is not finished yet
        if not done:
            self.i += 1
            if self.i >= self.data.shape[0]:
                self.i = 0

            self.obs = self.data.iloc[self.i]
            self.label = int(self.obs.pop("label"))

        elif self.steps_beyond_done is None:
            self.steps_beyond_done = 0
        else:
            # Warn the user of steps after the episode termination
            if self.steps_beyond_done == 0:
                gym.logger.warn(
                    "You are calling 'step()' even though this "
                    "environment has already returned done = True. You "
                    "should always call 'reset()' once you receive 'done = "
                    "True' -- any further steps are undefined behavior."
                )
            self.steps_beyond_done += 1
            reward = 0

        # Following the 'gym' interface this method return
        # 1. the new observation (old in case of episode termination),
        # 2. the reward,
        # 3. termination of episode,
        # 4. and, an empty 'dict'.
        return self.obs.values, reward, done, info

### 02. Create the environment
Now, we can create an instance of the defined class.

In [3]:
# Load the data
train_data = pd.read_csv("processed_data/train.csv")
train_data.head()

Unnamed: 0,bwd_packet_length_min,subflow_fwd_bytes,total_length_of_fwd_packets,fwd_packet_length_mean,bwd_packet_length_std,flow_iat_min,fwd_iat_min,flow_iat_mean,flow_duration,flow_iat_std,...,init_win_bytes_forward,ack_flag_count,fwd_psh_flags,syn_flag_count,fwd_packets/s,init_win_bytes_backward,bwd_packets/s,psh_flag_count,packet_length_mean,label
0,0.043095,2.8e-05,2.8e-05,0.006733,0.0,1.416667e-07,3.333333e-08,0.000173,0.000519,0.000424,...,0.0,0.0,0.0,0.0,1.1e-05,0.0,1.6e-05,0.0,0.01774,0
1,0.000979,0.000151,0.000151,0.024295,0.138037,1.583333e-07,0.0002054167,0.000126,0.000758,0.000195,...,0.125015,0.0,0.0,0.0,1.1e-05,0.458023,2.2e-05,1.0,0.101659,0
2,0.080803,4.9e-05,4.9e-05,0.005891,0.0,1.25e-07,3.333333e-08,0.000102,0.00051,0.000199,...,0.0,0.0,0.0,0.0,2.2e-05,0.0,1.6e-05,0.0,0.021618,0
3,0.065132,2.7e-05,2.7e-05,0.012961,0.0,0.000251825,8.333333e-09,0.000252,0.000252,0.0,...,0.0,0.0,0.0,0.0,1.1e-05,0.0,1.7e-05,0.0,0.028667,0
4,0.0,0.000112,0.000112,0.017955,0.265972,5.166666e-07,4.083333e-07,3.7e-05,0.000299,0.000113,...,0.445572,0.0,0.0,0.0,2.8e-05,0.003601,8.4e-05,1.0,0.357042,1


In [4]:
env = IDS_Env(train_data)

### 03. Validate the environment
*Stable Baselines* provides a helper to check that the custom environment follows the *Gym* interface. It also optionally checks that the environment is compatible with *Stable-Baselines*.

In [5]:
from stable_baselines.common.env_checker import check_env

check_env(env, warn=True)

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])


### 04. Test the environment

We can test the environment works correctly as we want. The code below run it for 10 episodes where:
1. In the first case actions are selected randomly.
2. Second case, agent always select the action 0 (every observation is benign) .
3. Third case, agent always select the action 1 (every observation is an attack).

In [6]:
# Random actions
all_rewards = []
all_lengths = []
for i in range(10):
    print("-" * 50)
    print(f"Episode {i+1:02}")
    print("-" * 50)

    ep_reward = 0
    obs = env.reset()
    done = False
    while not done:
        t_label = env.label
        action = env.action_space.sample()
        obs, rew, done, info = env.step(action)  # perform random action

        print(
            f"true_label: {t_label}, action: {action}, reward: {rew:>2}, done: {done}, {info}"
        )
        ep_reward += rew
    print(f"\n>>Episode reward: {ep_reward}")
    print(f">>Episode length: {env.current_step}")
    print("-" * 50)
    all_rewards.append(ep_reward)
    all_lengths.append(env.current_step)

print("\nRewards", all_rewards)
print(f"Mean Episodes Reward: {np.mean(all_rewards)}")
print("\nLengths", all_lengths)
print(f"Mean Episodes Length: {np.mean(all_lengths)}")

--------------------------------------------------
Episode 01
--------------------------------------------------
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 1, action: 1, reward:  1, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 0, action: 1, reward: -1, done: False, {}
true_label: 1, action: 1, reward:  1, done: False, {}
true_label: 0, action: 

In [7]:
# Always normal traffic

all_rewards = []
all_lengths = []
for i in range(10):
    print("-" * 50)
    print(f"Episode {i+1:02}")
    print("-" * 50)

    ep_reward = 0
    obs = env.reset()
    done = False
    while not done:
        t_label = env.label
        obs, rew, done, info = env.step(0)  # always normal traffic

        print(
            f"true_label: {t_label}, action: {action}, reward: {rew:>2}, done: {done}, {info}"
        )
        ep_reward += rew
    print(f"\n>>Episode reward: {ep_reward}")
    print(f">>Episode length: {env.current_step}")
    print("-" * 50)
    all_rewards.append(ep_reward)
    all_lengths.append(env.current_step)

print("\nRewards", all_rewards)
print(f"Mean Episodes Reward: {np.mean(all_rewards)}")
print("\nLengths", all_lengths)
print(f"Mean Episodes Length: {np.mean(all_lengths)}")

--------------------------------------------------
Episode 01
--------------------------------------------------
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 0, action: 0, reward:  0, done: False, {}
true_label: 1, action: 0, reward: -1, done: True, {'reason_to_stop': 'attack_missed'}

>>Episode reward: -1
>>Episode length: 7
--------------------------------------------------
--------------------------------------------------
Episode 02
--------------------------------------------------
true_label: 1, action: 0, reward: -1, done: True, {'reason_to_stop': 'attack_missed'}

>>Episode reward: -1
>>Episode length: 1
--------------------------------------------------
--------------------------------------------------
Episode 03
------------------------------

In [8]:
# Always attack

all_rewards = []
all_lengths = []
for i in range(10):
    print("-" * 50)
    print(f"Episode {i+1:02}")
    print("-" * 50)

    ep_reward = 0
    obs = env.reset()
    done = False
    while not done:
        t_label = env.label
        action = 1
        obs, rew, done, info = env.step(action)  # always attack

        print(
            f"true_label: {t_label}, action: {action}, reward: {rew:>2}, done: {done}, {info}"
        )
        ep_reward += rew
    print(f"\n>>Episode reward: {ep_reward}")
    print(f"\n>>Episode length: {env.current_step}")
    print("-" * 50)
    all_rewards.append(ep_reward)
    all_lengths.append(env.current_step)

print("\nRewards", all_rewards)
print(f"Mean Episodes Reward: {np.mean(all_rewards)}")
print("\nLengths", all_lengths)
print(f"Mean Episodes Length: {np.mean(all_lengths)}")

--------------------------------------------------
Episode 01
--------------------------------------------------
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, re

true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, re

true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 1, action: 0, reward:  1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, reward: -1, done: False, {}
true_label: 0, action: 0, re