# DQAN (Deep Q-Adversarial Nets): DQN (Deep Q-Nets) + GAN (Gen. Adv. Nets)

In this notebook, we'll combine a DQN (deep Q-net) with a GAN (generative adversarial net) to build an agent that can learn to play games through reinforcement learning without an explicit reward function. We'll call this network DQAN (deep Q-adversarial net).
The adversarial nets learn to maximize the current reward based on the past rewards.
The Q-net learns to maximize the future rewards based on the current reward.
Given a task and a signal for when the task is done or has failed, we should be able to learn the task.

# DQN
More specifically, we'll use Q-learning to train an agent to play a game called [Cart-Pole](https://gym.openai.com/envs/CartPole-v0). In this game, a freely swinging pole is attached to a cart. The cart can move to the left and right, and the goal is to keep the pole upright as long as possible.

![Cart-Pole](assets/cart-pole.jpg)

We can simulate this game using [OpenAI Gym](https://gym.openai.com/). First, let's check out how OpenAI Gym works. Then, we'll get into training an agent to play the Cart-Pole game.

In [1]:
import gym
import tensorflow as tf
import numpy as np

>**Note:** Make sure you have OpenAI Gym cloned into the same directory as this notebook. I've included `gym` as a submodule, so you can run `git submodule update --init --recursive` to pull the contents into the `gym` directory.

In [2]:
# Create the Cart-Pole game environment
env = gym.make('CartPole-v0')
# env = gym.make('CartPole-v1')
# env = gym.make('Acrobot-v1')

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.




We interact with the simulation through `env`. To show the simulation running, you can use `env.render()` to render one frame. Passing an action as an integer to `env.step` advances the simulation by one step. You can see how many actions are possible from `env.action_space`, and you can get a random action with `env.action_space.sample()`. This is general to all Gym games. In the Cart-Pole game, there are two possible actions, moving the cart left or right, encoded as 0 and 1.

Run the code below to watch the simulation run.

In [3]:
env.reset()
rewards, states, actions, dones = [], [], [], []
for _ in range(10):
    # env.render()
    action = env.action_space.sample()
    state, reward, done, info = env.step(action) # take a random action
    states.append(state)
    rewards.append(reward)
    actions.append(action)
    dones.append(done)
    print('state, action, reward, done, info')
    print(state, action, reward, done, info)
    if done:
        # The episode is over: reset the environment before taking more steps
        env.reset()

state, action, reward, done, info
[-0.03680108 -0.23589084 -0.01401266  0.24424619] 0 1.0 False {}
state, action, reward, done, info
[-0.0415189  -0.04057158 -0.00912774 -0.05282353] 1 1.0 False {}
state, action, reward, done, info
[-0.04233033  0.15468006 -0.01018421 -0.3483723 ] 1 1.0 False {}
state, action, reward, done, info
[-0.03923673 -0.04029557 -0.01715166 -0.05891812] 0 1.0 False {}
state, action, reward, done, info
[-0.04004264  0.15506805 -0.01833002 -0.35696279] 1 1.0 False {}
state, action, reward, done, info
[-0.03694128  0.35044574 -0.02546927 -0.65536882] 1 1.0 False {}
state, action, reward, done, info
[-0.02993236  0.54591285 -0.03857665 -0.95596155] 1 1.0 False {}
state, action, reward, done, info
[-0.0190141   0.74153182 -0.05769588 -1.26051042] 1 1.0 False {}
state, action, reward, done, info
[-0.00418347  0.93734235 -0.08290609 -1.57069067] 1 1.0 False {}
state, action, reward, done, info
[ 0.01456338  1.13335019 -0.1143199  -1.88803905] 1 1.0 False {}


To shut the window showing the simulation, use `env.close()`.

If you ran the simulation above, we can look at the rewards:

In [4]:
print(rewards[-20:])
print(np.array(rewards).shape, np.array(states).shape, np.array(actions).shape, np.array(dones).shape)
print(np.array(rewards).dtype, np.array(states).dtype, np.array(actions).dtype, np.array(dones).dtype)
print(np.max(np.array(actions)), np.min(np.array(actions)))
print((np.max(np.array(actions)) - np.min(np.array(actions)))+1)
print(np.max(np.array(rewards)), np.min(np.array(rewards)))
print(np.max(np.array(states)), np.min(np.array(states)))

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
(10,) (10, 4) (10,) (10,)
float64 float64 int64 bool
1 0
2
1.0 1.0
1.1333501883040074 -1.88803904803488


The game resets after the pole has fallen past a certain angle. For each frame while the simulation is running, it returns a reward of 1.0. The longer the game runs, the more reward we get. Then, our network's goal is to maximize the reward by keeping the pole vertical. It will do this by moving the cart to the left and the right.

## Q-Network

We train our Q-learning agent using the Bellman Equation:

$$
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
$$

where $s$ is a state, $a$ is an action, $r$ is the reward received, $\gamma$ is the discount factor, and $s'$ is the next state reached from state $s$ by taking action $a$.
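
To make the equation concrete, here is a small worked example for a single transition. The reward, discount factor, and next-state Q-values are made-up numbers chosen only for illustration.

```python
# Worked example of the Bellman target for one transition.
# All numbers below are hypothetical.
gamma = 0.99                # discount factor
reward = 1.0                # reward r for taking action a in state s
next_q_values = [0.5, 2.0]  # assumed Q(s', a') for the two possible actions

# Immediate reward plus the discounted value of the best next action
target_q = reward + gamma * max(next_q_values)
print(target_q)
```

Only the best next action contributes to the target, so the second value (2.0) is the one that gets discounted and added to the reward.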

Previously, we used this equation to learn values for a Q-_table_. However, for this game there is a huge number of states available. The state has four values: the position and velocity of the cart, and the angle and angular velocity of the pole. These are all real-valued numbers, so, ignoring floating-point precision, you have practically infinite states. Instead of using a table, then, we'll replace it with a neural network that approximates the Q-table lookup function.

<img src="assets/deep-q-learning.png" width=450px>

Now, our Q value, $Q(s, a)$, is calculated by passing a state into the network, which is built from fully connected hidden layers. The output is one Q-value for each available action.

<img src="assets/q-network.png" width=550px>


As I showed before, we can define our targets for training as $\hat{Q}(s,a) = r + \gamma \max_{a'}{Q(s', a')}$. Then we update the weights by minimizing $(\hat{Q}(s,a) - Q(s,a))^2$.

For this Cart-Pole game, we have four inputs, one for each value in the state, and two outputs, one for each action. To get $\hat{Q}$, we'll first choose an action, then simulate the game using that action. This will get us the next state, $s'$, and the reward. With that, we can calculate $\hat{Q}$ then pass it back into the $Q$ network to run the optimizer and update the weights.
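
The loss described above can be sketched in plain NumPy. This is only an illustration with made-up network outputs and targets; it mirrors the idea of selecting $Q(s, a)$ for the action taken with a one-hot mask, and is not the TensorFlow graph used in this notebook.

```python
import numpy as np

# Hypothetical network outputs Q(s, .) for a batch of 3 states and 2 actions
q_values = np.array([[0.2, 0.7],
                     [1.5, 0.3],
                     [0.0, 0.9]])
actions = np.array([1, 0, 1])          # actions actually taken
target_qs = np.array([1.0, 1.0, 2.0])  # made-up targets Q-hat

# A one-hot mask picks out Q(s, a) for the action taken in each row
one_hot = np.eye(q_values.shape[1])[actions]
q_taken = np.sum(q_values * one_hot, axis=1)

# Mean squared error between the targets and the selected predictions
loss = np.mean((target_qs - q_taken) ** 2)
print(q_taken, loss)
```

Only the Q-value of the action that was actually taken enters the loss, so the other output receives no gradient for that sample.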

Below is my implementation of the Q-network. I used two fully connected layers with leaky ReLU activations. Two seems to be good enough; three might be better. Feel free to try it out.

In [5]:
def model_input(state_size):
    # States or observations as input
    states = tf.placeholder(tf.float32, [None, state_size], name='states')
    
    # Actions as output
    actions = tf.placeholder(tf.int32, [None], name='actions')

    # Target Q values for training
    targetQs = tf.placeholder(tf.float32, [None], name='targetQs')
    return states, actions, targetQs

In [6]:
# Reference: signature of tf.layers.dense
# tf.layers.dense(
#     inputs,   # required
#     units,    # required
#     activation=None,
#     use_bias=True,
#     kernel_initializer=None,
#     bias_initializer=tf.zeros_initializer(),
#     kernel_regularizer=None,
#     bias_regularizer=None,
#     activity_regularizer=None,
#     kernel_constraint=None,
#     bias_constraint=None,
#     trainable=True,
#     name=None,
#     reuse=None
# )

In [7]:
# Q function
def generator(states, action_size, hidden_size, reuse=False, alpha=0.1): #training=True ~ batchnorm
    with tf.variable_scope('generator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=states, units=hidden_size)
        #bn1 = tf.layers.batch_normalization(h1, training=training) #training=True ~ batchnorm
        nl1 = tf.maximum(alpha * h1, h1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        #bn2 = tf.layers.batch_normalization(h2, training=training) #training=True ~ batchnorm
        nl2 = tf.maximum(alpha * h2, h2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=action_size)        
        #predictions = tf.nn.softmax(logits_actions)

        return logits
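
The expression `tf.maximum(alpha * h, h)` used in the generator is a leaky ReLU (for `0 < alpha < 1`). A minimal NumPy sketch of the same expression, with made-up inputs, shows its effect; the helper name `leaky_relu` is introduced here only for illustration.

```python
import numpy as np

def leaky_relu(h, alpha=0.1):
    # Positive values pass through unchanged;
    # negative values are scaled down by alpha instead of being zeroed
    return np.maximum(alpha * h, h)

h = np.array([-2.0, -0.5, 0.0, 1.5])
out = leaky_relu(h)
print(out)
```

Unlike a plain ReLU, negative pre-activations keep a small slope, so those units still pass a gradient during training.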

In [8]:
# This is a reward function: Rt(at) or Rt(~at)
def discriminator(actions, hidden_size, reuse=False, alpha=0.1): #training=True ~ batchnorm
    with tf.variable_scope('discriminator', reuse=reuse):
        # First fully connected layer
        h1 = tf.layers.dense(inputs=actions, units=hidden_size)
        #bn1 = tf.layers.batch_normalization(h1, training=True)
        nl1 = tf.maximum(alpha * h1, h1)
        
        # Second fully connected layer
        h2 = tf.layers.dense(inputs=nl1, units=hidden_size)
        #bn2 = tf.layers.batch_normalization(h2, training=True)
        nl2 = tf.maximum(alpha * h2, h2)
        
        # Output layer
        logits = tf.layers.dense(inputs=nl2, units=1)   
        #predictions = tf.sigmoid(logits)

        # logits for loss and reward/prob/out
        return logits

In [9]:
# Qt(St, At) = Rt(St+1, At) + gamma * max(Qt+1(St+1))
def model_loss(states, actions, action_size, hidden_size, targetQs, alpha=0.1):
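    """
    Get the losses for the discriminator and generator
    :param states: real current input states or observations given
    :param actions: real actions given
    :param action_size: number of possible actions
    :param hidden_size: number of units in each hidden layer
    :param targetQs: target Q values for the Bellman loss
    :return: A tuple of (discriminator loss, generator loss, Q loss, action logits, Qs, fake rewards, real rewards)
    """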
    # The fake/generated actions
    actions_logits = generator(states=states, hidden_size=hidden_size, action_size=action_size)
    actions_fake = tf.nn.softmax(actions_logits)
    d_logits_fake = discriminator(actions=actions_fake, hidden_size=hidden_size, reuse=False)

    # The real onehot encoded actions
    actions_real = tf.one_hot(actions, action_size)
    d_logits_real = discriminator(actions=actions_real, hidden_size=hidden_size, reuse=True)

    # Training the rewarding function
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_real, labels=tf.ones_like(d_logits_real)))
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake, labels=tf.zeros_like(d_logits_fake)))
    d_loss = d_loss_real + d_loss_fake
    
    # Train the generator to maximize the current reward 0-1
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake, labels=tf.ones_like(d_logits_fake)))

    # Train the generator to maximize the future rewards: Bellman equations: loss (targetQ - Q)^2
    Qs = tf.reduce_sum(tf.multiply(actions_logits, actions_real), axis=1)
    q_loss = tf.reduce_mean(tf.square(targetQs - Qs))

    # The generated rewards for Bellman equation
    rewards_fake = tf.sigmoid(d_logits_fake)
    rewards_real = tf.sigmoid(d_logits_real)

    return d_loss, g_loss, q_loss, actions_logits, Qs, rewards_fake, rewards_real
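
To see what the discriminator and generator losses measure, here is a NumPy sketch of sigmoid cross-entropy applied to made-up logits. The helper `sigmoid_cross_entropy` is a hypothetical stand-in written for this example; it uses the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))` given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    # Numerically stable form: max(x, 0) - x * z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Made-up discriminator logits for real (one-hot) and generated actions
d_logits_real = np.array([2.0, 0.5])
d_logits_fake = np.array([-1.0, 0.3])

# Discriminator: push real actions toward label 1 and generated ones toward 0
d_loss = (sigmoid_cross_entropy(d_logits_real, np.ones_like(d_logits_real)).mean()
          + sigmoid_cross_entropy(d_logits_fake, np.zeros_like(d_logits_fake)).mean())

# Generator: wants its generated actions to be scored as label 1
g_loss = sigmoid_cross_entropy(d_logits_fake, np.ones_like(d_logits_fake)).mean()
print(d_loss, g_loss)
```

The two networks pull the same fake logits toward opposite labels, which is what makes the setup adversarial.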

In [10]:
def model_opt(d_loss, g_loss, q_loss, learning_rate):
    """
    Get optimization operations
    :param d_loss: Discriminator/Reward loss Tensor for reward function
    :param g_loss: Generator/Q-value loss Tensor for action & next state prediction
    :param q_loss: Value loss Tensor
    :param learning_rate: Learning rate for the Adam optimizers
    :return: A tuple of (discriminator training operation, generator training operation, Q-value training operation)
    """
    # Get weights and bias to update
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]
    g_vars = [var for var in t_vars if var.name.startswith('generator')]

    # Optimize
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        d_opt = tf.train.AdamOptimizer(learning_rate).minimize(d_loss, var_list=d_vars)
        g_opt = tf.train.AdamOptimizer(learning_rate).minimize(g_loss, var_list=g_vars)
        q_opt = tf.train.AdamOptimizer(learning_rate).minimize(q_loss, var_list=g_vars)

    return d_opt, g_opt, q_opt

In [11]:
class DQAN:
    def __init__(self, state_size, action_size, hidden_size, learning_rate):

        # Data of the Model: make the data available inside the framework
        self.states, self.actions, self.targetQs = model_input(state_size=state_size)

        # Create the Model: calculating the loss and forward pass
        self.d_loss, self.g_loss, self.q_loss, self.actions_logits, self.Qs, self.rewards_fake, self.rewards_real = model_loss(
            action_size=action_size, actions=self.actions, states=self.states, hidden_size=hidden_size, 
            targetQs=self.targetQs)

        # Update the model: backward pass and backprop
        self.d_opt, self.g_opt, self.q_opt = model_opt(d_loss=self.d_loss, g_loss=self.g_loss, 
                                                       q_loss=self.q_loss, learning_rate=learning_rate)

## Experience replay

Reinforcement learning algorithms can have stability issues due to correlations between states. To reduce correlations when training, we can store the agent's experiences and later draw a random mini-batch of those experiences to train on. 

Here, we'll create a `Memory` object that will store our experiences, our transitions $<s, a, r, s'>$. This memory will have a maximum capacity, so we can keep newer experiences in memory while getting rid of older experiences. Then, we'll sample a random mini-batch of transitions $<s, a, r, s'>$ and train on those.

Below, I've implemented a `Memory` object. If you're unfamiliar with `deque`, this is a double-ended queue. You can think of it like a tube open on both sides. You can put objects in either side of the tube. But if it's full, adding anything more will push an object out the other side. This is a great data structure to use for the memory buffer.

In [12]:
from collections import deque

class Memory():    
    def __init__(self, max_size = 1000):
        self.buffer = deque(maxlen=max_size)
    
    def add(self, experience):
        self.buffer.append(experience)
            
    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), 
                               size=batch_size, 
                               replace=False)
        return [self.buffer[ii] for ii in idx]
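
As a quick illustration of the eviction behaviour that `Memory` relies on, the sketch below fills a small `deque` past its capacity. The string labels are placeholders, not real transitions.

```python
from collections import deque

# A buffer that holds at most 3 items
buffer = deque(maxlen=3)
for experience in ['e1', 'e2', 'e3', 'e4']:
    buffer.append(experience)

# The oldest item, 'e1', was pushed out when 'e4' was appended
print(list(buffer))  # ['e2', 'e3', 'e4']
```

Note that `sample` draws without replacement, so `batch_size` must not exceed the number of stored experiences; otherwise `np.random.choice` raises a `ValueError`.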

## Exploration - Exploitation

To learn about the environment and rules of the game, the agent needs to explore by taking random actions. We'll do this by choosing a random action with some probability $\epsilon$ (epsilon). That is, with probability $\epsilon$ the agent will take a random action, and with probability $1 - \epsilon$ the agent will choose the action with the highest $Q(s,a)$. This is called an **$\epsilon$-greedy policy**.


At first, the agent needs to do a lot of exploring. Later when it has learned more, the agent can favor choosing actions based on what it has learned. This is called _exploitation_. We'll set it up so the agent is more likely to explore early in training, then more likely to exploit later in training.
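
One way to shift from exploring to exploiting is to decay $\epsilon$ exponentially with the number of steps taken. The sketch below uses the same decay formula and parameter values as the training loop later in this notebook; the helper names `explore_probability` and `epsilon_greedy`, and the use of NumPy's `default_rng`, are assumptions made for this standalone example.

```python
import numpy as np

explore_start = 1.0   # exploration probability at the start
explore_stop = 0.01   # minimum exploration probability
decay_rate = 0.0001   # exponential decay rate

def explore_probability(step):
    # Decays from explore_start toward explore_stop as step grows
    return explore_stop + (explore_start - explore_stop) * np.exp(-decay_rate * step)

def epsilon_greedy(q_values, epsilon, rng):
    # With probability epsilon pick a random action, otherwise the greedy one
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
for step in (0, 10000, 100000):
    eps = explore_probability(step)
    print(step, eps, epsilon_greedy([0.1, 0.9], eps, rng))
```

With these values the agent starts out fully random and is still exploring roughly a third of the time after 10,000 steps.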

## Q-Learning training algorithm

Putting all this together, we can list out the algorithm we'll use to train the network. We'll train the network in _episodes_. One *episode* is one simulation of the game. For this game, the goal is to keep the pole upright for 195 frames, so we can start a new episode once that goal is met. The game ends if the pole tilts over too far, or if the cart moves too far to the left or right. When a game ends, we'll start a new episode. Now, to train the agent:

* Initialize the memory $D$
* Initialize the action-value network $Q$ with random weights
* **For** episode = 1, $M$ **do**
  * **For** $t = 1$, $T$ **do**
     * With probability $\epsilon$ select a random action $a_t$, otherwise select $a_t = \mathrm{argmax}_a Q(s_t, a)$
     * Execute action $a_t$ in simulator and observe reward $r_{t+1}$ and new state $s_{t+1}$
     * Store transition $<s_t, a_t, r_{t+1}, s_{t+1}>$ in memory $D$
     * Sample random mini-batch from $D$: $<s_j, a_j, r_j, s'_j>$
     * Set $\hat{Q}_j = r_j$ if the episode ends at $j+1$, otherwise set $\hat{Q}_j = r_j + \gamma \max_{a'}{Q(s'_j, a')}$
     * Make a gradient descent step with loss $(\hat{Q}_j - Q(s_j, a_j))^2$
  * **endfor**
* **endfor**
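
The target-setting step of the algorithm above can be sketched for a whole mini-batch at once. The arrays below are made-up values; the point is only to show how transitions that end an episode get a target equal to the reward alone.

```python
import numpy as np

gamma = 0.99
rewards = np.array([1.0, 1.0, 1.0])
# Hypothetical Q(s'_j, a') for each sampled transition and both actions
next_q = np.array([[0.5, 2.0],
                   [1.0, 0.2],
                   [3.0, 4.0]])
# True where the episode ended after the transition
episode_ends = np.array([False, False, True])

# There is no future value after a terminal transition
max_next_q = np.where(episode_ends, 0.0, next_q.max(axis=1))
targets = rewards + gamma * max_next_q
print(targets)
```

The third transition ends an episode, so its next-state Q-values are ignored and its target is just the reward.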

## Hyperparameters

One of the more difficult aspects of reinforcement learning is the large number of hyperparameters. Not only are we tuning the network, but we're also tuning the simulation.

In [15]:
train_episodes = 10000          # max number of episodes to learn from
max_steps = 200               # max steps in an episode
gamma = 0.99                   # future reward discount

# Exploration parameters
explore_start = 1.0            # exploration probability at start
explore_stop = 0.01            # minimum exploration probability 
decay_rate = 0.0001            # exponential decay rate for exploration prob

# Network parameters
hidden_size = 64              # number of units in each Q-network hidden layer -- simulation
state_size = 4                # number of units for the input state/observation -- simulation
action_size = 2               # number of units for the output actions -- simulation

# Memory parameters
memory_size = 10000            # memory capacity
batch_size = 10                # experience mini-batch size
learning_rate = 0.001          # learning rate for adam

In [16]:
tf.reset_default_graph()
model = DQAN(action_size=action_size, hidden_size=hidden_size, state_size=state_size, 
                 learning_rate=learning_rate)

## Populate the experience memory

Here I'm re-initializing the simulation and pre-populating the memory. The agent is taking random actions and storing the transitions in memory. This will help the agent with exploring the game.

In [17]:
# Initialize the simulation
env.reset()

# Take one random step to get the pole and cart moving
state, reward, done, _ = env.step(env.action_space.sample())

# init memory
memory = Memory(max_size=memory_size)

# Make a bunch of random actions and store the experiences
for _ in range(batch_size):
    # Uncomment the line below to watch the simulation
    # env.render()

    # Make a random action
    action = env.action_space.sample()
    next_state, reward, done, _ = env.step(action)

    if done:
        # The simulation fails so no next state
        next_state = np.zeros(state.shape)
        
        # Add experience to memory
        memory.add((state, action, reward, next_state))
        
        # Start new episode
        env.reset()
        
        # Take one random step to get the pole and cart moving
        state, reward, done, _ = env.step(env.action_space.sample())
    else:
        # Add experience to memory
        memory.add((state, action, reward, next_state))
        state = next_state

## Training

Below we'll train our agent. If you want to watch it train, uncomment the `env.render()` line. This is slow because it's rendering the frames slower than the network can train. But, it's cool to watch the agent get better at the game.

In [None]:
# Now train with experiences
saver = tf.train.Saver()

# Total rewards and losses list for plotting
rewards_list, rewards_fake_list, rewards_real_list = [], [], []
d_loss_list, g_loss_list, q_loss_list = [], [], [] 

# TF session for training
with tf.Session() as sess:
    
    # Initialize variables
    sess.run(tf.global_variables_initializer())

    # Training episodes/epochs
    step = 0
    for ep in range(train_episodes):
        
        # Env/agent steps/batches/minibatches
        total_reward, rewards_fake_mean, rewards_real_mean = 0, 0, 0
        d_loss, g_loss, q_loss = 0, 0, 0
        t = 0
        while t < max_steps:
            step += 1
            
            # Uncomment this next line to watch the training
            # env.render() 
            
            # Explore or Exploit
            explore_p = explore_stop + (explore_start - explore_stop)*np.exp(-decay_rate*step) 
            if explore_p > np.random.rand():
                # Make a random action
                action = env.action_space.sample()
            else:
                # Get action from model
                feed_dict = {model.states: state.reshape((1, *state.shape))}
                actions_logits = sess.run(model.actions_logits, feed_dict)
                action = np.argmax(actions_logits)
            
            # Take action, get new state and reward
            next_state, reward, done, _ = env.step(action)
    
            # Cumulative reward
            total_reward += reward
            
            # Episode/epoch training is done/failed!
            if done:
                # the episode ends so no next state
                next_state = np.zeros(state.shape)
                t = max_steps
                
                print('-------------------------------------------------------------------------------')
                print('Episode: {}'.format(ep),
                      'Total reward: {}'.format(total_reward),
                      'Average reward fake: {}'.format(rewards_fake_mean),
                      'Average reward real: {}'.format(rewards_real_mean),
                      'Training d_loss: {:.4f}'.format(d_loss),
                      'Training g_loss: {:.4f}'.format(g_loss),
                      'Training q_loss: {:.4f}'.format(q_loss),
                      'Explore P: {:.4f}'.format(explore_p))
                print('-------------------------------------------------------------------------------')
                
                # total rewards and losses for plotting
                rewards_list.append((ep, total_reward))
                rewards_fake_list.append((ep, rewards_fake_mean))
                rewards_real_list.append((ep, rewards_real_mean))
                d_loss_list.append((ep, d_loss))
                g_loss_list.append((ep, g_loss))
                q_loss_list.append((ep, q_loss))
                
                # Add experience to memory
                memory.add((state, action, reward, next_state))
                
                # Start new episode
                env.reset()
                
                # Take one random step to get the pole and cart moving
                state, reward, done, _ = env.step(env.action_space.sample())

            else:
                # Add experience to memory
                memory.add((state, action, reward, next_state))
                state = next_state
                t += 1
            
            # Sample mini-batch from memory
            batch = memory.sample(batch_size)
            states = np.array([each[0] for each in batch])
            actions = np.array([each[1] for each in batch])
            #rewards = np.array([each[2] for each in batch])
            next_states = np.array([each[3] for each in batch])
            
            # Current reward required for the Q-learning/ targetQs
            feed_dict = {model.states: states, model.actions: actions}
            rewards_fake, rewards_real = sess.run([model.rewards_fake, model.rewards_real], feed_dict)

            # Mean/average fake and real rewards or rewarded generated/given actions
            rewards_fake_mean = np.mean(rewards_fake.reshape(-1))
            rewards_real_mean = np.mean(rewards_real.reshape(-1))

            # Next action required for the Q-learning/ targetQs
            feed_dict={model.states: next_states}
            next_actions_logits = sess.run(model.actions_logits, feed_dict)
            
            # Set target_Qs to 0 for states where episode ends
            episode_ends = (next_states == np.zeros(states[0].shape)).all(axis=1)
            next_actions_logits[episode_ends] = (0, 0)

            # Bellman equation: Qt = Rt + gamma * max(Qt+1)
            targetQs = rewards_fake.reshape(-1) + (gamma * np.max(next_actions_logits, axis=1))

            # Updating the model
            feed_dict = {model.states: states, model.actions: actions, model.targetQs: targetQs}
            d_loss, _ = sess.run([model.d_loss, model.d_opt], feed_dict)
            g_loss, _ = sess.run([model.g_loss, model.g_opt], feed_dict)
            q_loss, _ = sess.run([model.q_loss, model.q_opt], feed_dict)
            
    # Save the trained model 
    saver.save(sess, 'checkpoints/DQAN-cartpole.ckpt')

-------------------------------------------------------------------------------
Episode: 0 Total reward: 1.0 Average reward fake: 0 Average reward real: 0 Training d_loss: 0.0000 Training g_loss: 0.0000 Training q_loss: 0.0000 Explore P: 0.9999
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1 Total reward: 17.0 Average reward fake: 0.4641086459159851 Average reward real: 0.5704373121261597 Training d_loss: 1.1868 Training g_loss: 0.7745 Training q_loss: 0.9216 Explore P: 0.9982
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2 Total reward: 14.0 Average reward fake: 0.4311087131500244 Average reward real: 0.5510165095329285 Training d_loss: 1.1607 Training g_loss: 0.8514 Training q_loss: 1.0209 Explore P: 0.9968
-----------------------------------

-------------------------------------------------------------------------------
Episode: 23 Total reward: 41.0 Average reward fake: 0.36789044737815857 Average reward real: 0.6863042116165161 Training d_loss: 0.9481 Training g_loss: 1.0748 Training q_loss: 25.0769 Explore P: 0.9531
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 24 Total reward: 12.0 Average reward fake: 0.25584185123443604 Average reward real: 0.6294175386428833 Training d_loss: 0.9650 Training g_loss: 1.3231 Training q_loss: 17.4917 Explore P: 0.9520
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 25 Total reward: 20.0 Average reward fake: 0.2634539008140564 Average reward real: 0.5673037767410278 Training d_loss: 1.0687 Training g_loss: 1.3842 Training q_loss: 10.4928 Explore P:

-------------------------------------------------------------------------------
Episode: 46 Total reward: 20.0 Average reward fake: 0.2775499224662781 Average reward real: 0.5696394443511963 Training d_loss: 1.0828 Training g_loss: 1.2275 Training q_loss: 33.7069 Explore P: 0.9055
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 47 Total reward: 12.0 Average reward fake: 0.3612041771411896 Average reward real: 0.7488159537315369 Training d_loss: 0.8434 Training g_loss: 1.0438 Training q_loss: 20.6581 Explore P: 0.9044
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 48 Total reward: 24.0 Average reward fake: 0.32060888409614563 Average reward real: 0.7341236472129822 Training d_loss: 0.8234 Training g_loss: 1.1519 Training q_loss: 52.9267 Explore P: 

-------------------------------------------------------------------------------
Episode: 69 Total reward: 13.0 Average reward fake: 0.3472665846347809 Average reward real: 0.5514259338378906 Training d_loss: 1.1435 Training g_loss: 1.0847 Training q_loss: 53.4705 Explore P: 0.8574
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 70 Total reward: 17.0 Average reward fake: 0.2990943491458893 Average reward real: 0.7222402691841125 Training d_loss: 0.8294 Training g_loss: 1.2122 Training q_loss: 61.8209 Explore P: 0.8560
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 71 Total reward: 32.0 Average reward fake: 0.3197419047355652 Average reward real: 0.6753751635551453 Training d_loss: 0.9101 Training g_loss: 1.1154 Training q_loss: 220.7040 Explore P: 

-------------------------------------------------------------------------------
Episode: 92 Total reward: 16.0 Average reward fake: 0.394722044467926 Average reward real: 0.6983250379562378 Training d_loss: 0.9643 Training g_loss: 0.9724 Training q_loss: 446.3813 Explore P: 0.8156
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 93 Total reward: 17.0 Average reward fake: 0.3209111988544464 Average reward real: 0.7988066077232361 Training d_loss: 0.7201 Training g_loss: 1.1292 Training q_loss: 79.5456 Explore P: 0.8142
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 94 Total reward: 16.0 Average reward fake: 0.40575656294822693 Average reward real: 0.526252269744873 Training d_loss: 1.2380 Training g_loss: 0.9121 Training q_loss: 733.9203 Explore P: 

-------------------------------------------------------------------------------
Episode: 115 Total reward: 16.0 Average reward fake: 0.3675175607204437 Average reward real: 0.6218525767326355 Training d_loss: 1.0550 Training g_loss: 1.0430 Training q_loss: 57.3801 Explore P: 0.7839
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 116 Total reward: 13.0 Average reward fake: 0.4314154088497162 Average reward real: 0.7768546342849731 Training d_loss: 0.8912 Training g_loss: 0.8329 Training q_loss: 72.9096 Explore P: 0.7829
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 117 Total reward: 20.0 Average reward fake: 0.3928956091403961 Average reward real: 0.641146183013916 Training d_loss: 1.0460 Training g_loss: 0.9106 Training q_loss: 44.1653 Explore P:

-------------------------------------------------------------------------------
Episode: 138 Total reward: 19.0 Average reward fake: 0.34987708926200867 Average reward real: 0.7427546381950378 Training d_loss: 0.8427 Training g_loss: 1.0540 Training q_loss: 446.2812 Explore P: 0.7520
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 139 Total reward: 14.0 Average reward fake: 0.3145255446434021 Average reward real: 0.5292319059371948 Training d_loss: 1.1591 Training g_loss: 1.1533 Training q_loss: 40.2291 Explore P: 0.7509
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 140 Total reward: 14.0 Average reward fake: 0.30120837688446045 Average reward real: 0.7222092747688293 Training d_loss: 0.8326 Training g_loss: 1.1347 Training q_loss: 110.8371 Explo

-------------------------------------------------------------------------------
Episode: 161 Total reward: 12.0 Average reward fake: 0.35564154386520386 Average reward real: 0.8088920712471008 Training d_loss: 0.7436 Training g_loss: 1.0926 Training q_loss: 205.3278 Explore P: 0.7190
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 162 Total reward: 10.0 Average reward fake: 0.34890252351760864 Average reward real: 0.6101022958755493 Training d_loss: 1.0586 Training g_loss: 1.0105 Training q_loss: 159.0507 Explore P: 0.7183
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 163 Total reward: 35.0 Average reward fake: 0.36267438530921936 Average reward real: 0.4919252395629883 Training d_loss: 1.2570 Training g_loss: 0.9965 Training q_loss: 55.3187 Expl

-------------------------------------------------------------------------------
Episode: 184 Total reward: 12.0 Average reward fake: 0.3878856599330902 Average reward real: 0.6362673044204712 Training d_loss: 1.0501 Training g_loss: 0.9969 Training q_loss: 361.5169 Explore P: 0.6907
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 185 Total reward: 8.0 Average reward fake: 0.26545771956443787 Average reward real: 0.6347430944442749 Training d_loss: 0.9641 Training g_loss: 1.3383 Training q_loss: 114.8149 Explore P: 0.6901
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 186 Total reward: 25.0 Average reward fake: 0.3360412120819092 Average reward real: 0.6748534440994263 Training d_loss: 0.9350 Training g_loss: 1.0778 Training q_loss: 86.4397 Explore

-------------------------------------------------------------------------------
Episode: 207 Total reward: 17.0 Average reward fake: 0.35714930295944214 Average reward real: 0.679562509059906 Training d_loss: 0.9539 Training g_loss: 1.0882 Training q_loss: 593.7682 Explore P: 0.6628
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 208 Total reward: 11.0 Average reward fake: 0.41835516691207886 Average reward real: 0.6531885862350464 Training d_loss: 1.0596 Training g_loss: 0.8642 Training q_loss: 63.7706 Explore P: 0.6620
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 209 Total reward: 18.0 Average reward fake: 0.3461897671222687 Average reward real: 0.8094808459281921 Training d_loss: 0.7283 Training g_loss: 1.0781 Training q_loss: 86.1937 Explore

-------------------------------------------------------------------------------
Episode: 231 Total reward: 11.0 Average reward fake: 0.3846762478351593 Average reward real: 0.6408072710037231 Training d_loss: 1.0339 Training g_loss: 0.9602 Training q_loss: 62.4730 Explore P: 0.6383
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 232 Total reward: 21.0 Average reward fake: 0.33885735273361206 Average reward real: 0.6706951856613159 Training d_loss: 0.9511 Training g_loss: 1.0678 Training q_loss: 526.0009 Explore P: 0.6370
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 233 Total reward: 11.0 Average reward fake: 0.39184442162513733 Average reward real: 0.6393100023269653 Training d_loss: 1.0489 Training g_loss: 0.9594 Training q_loss: 71.9362 Explor

-------------------------------------------------------------------------------
Episode: 255 Total reward: 15.0 Average reward fake: 0.36147022247314453 Average reward real: 0.5566037893295288 Training d_loss: 1.1511 Training g_loss: 0.9661 Training q_loss: 290.0165 Explore P: 0.6148
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 256 Total reward: 9.0 Average reward fake: 0.4159548878669739 Average reward real: 0.7765793204307556 Training d_loss: 0.8699 Training g_loss: 0.9701 Training q_loss: 389.7115 Explore P: 0.6143
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 257 Total reward: 11.0 Average reward fake: 0.35326534509658813 Average reward real: 0.6145554780960083 Training d_loss: 1.0528 Training g_loss: 0.9948 Training q_loss: 461.3981 Explo

-------------------------------------------------------------------------------
Episode: 278 Total reward: 22.0 Average reward fake: 0.2993714213371277 Average reward real: 0.6783461570739746 Training d_loss: 0.8685 Training g_loss: 1.7357 Training q_loss: 71.5266 Explore P: 0.5896
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 279 Total reward: 21.0 Average reward fake: 0.34396615624427795 Average reward real: 0.7582821846008301 Training d_loss: 0.7920 Training g_loss: 1.0965 Training q_loss: 84.9986 Explore P: 0.5884
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 280 Total reward: 18.0 Average reward fake: 0.43867596983909607 Average reward real: 0.5145369172096252 Training d_loss: 1.2681 Training g_loss: 0.8472 Training q_loss: 44.9129 Explore

-------------------------------------------------------------------------------
Episode: 302 Total reward: 15.0 Average reward fake: 0.34415555000305176 Average reward real: 0.625442385673523 Training d_loss: 0.9830 Training g_loss: 1.2141 Training q_loss: 60.0915 Explore P: 0.5698
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 303 Total reward: 17.0 Average reward fake: 0.27517086267471313 Average reward real: 0.6498547196388245 Training d_loss: 0.8308 Training g_loss: 1.6968 Training q_loss: 136.3357 Explore P: 0.5689
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 304 Total reward: 26.0 Average reward fake: 0.4386724531650543 Average reward real: 0.5392141938209534 Training d_loss: 1.4701 Training g_loss: 1.2025 Training q_loss: 41.2381 Explore

-------------------------------------------------------------------------------
Episode: 325 Total reward: 15.0 Average reward fake: 0.3963608145713806 Average reward real: 0.7213808298110962 Training d_loss: 1.0190 Training g_loss: 1.1796 Training q_loss: 148.6159 Explore P: 0.5526
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 326 Total reward: 58.0 Average reward fake: 0.24938984215259552 Average reward real: 0.6515247821807861 Training d_loss: 0.7884 Training g_loss: 2.0626 Training q_loss: 32.7672 Explore P: 0.5495
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 327 Total reward: 16.0 Average reward fake: 0.3158878982067108 Average reward real: 0.6007620096206665 Training d_loss: 0.9829 Training g_loss: 1.2868 Training q_loss: 75.4059 Explore

-------------------------------------------------------------------------------
Episode: 348 Total reward: 17.0 Average reward fake: 0.34460026025772095 Average reward real: 0.702803909778595 Training d_loss: 0.9384 Training g_loss: 1.1618 Training q_loss: 65.7124 Explore P: 0.5319
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 349 Total reward: 20.0 Average reward fake: 0.42689943313598633 Average reward real: 0.6720629930496216 Training d_loss: 1.0662 Training g_loss: 1.4268 Training q_loss: 60.9785 Explore P: 0.5309
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 350 Total reward: 49.0 Average reward fake: 0.2366763800382614 Average reward real: 0.6902742981910706 Training d_loss: 0.7944 Training g_loss: 1.4886 Training q_loss: 97.6126 Explore 

-------------------------------------------------------------------------------
Episode: 371 Total reward: 33.0 Average reward fake: 0.34479591250419617 Average reward real: 0.6482375264167786 Training d_loss: 1.0862 Training g_loss: 1.1013 Training q_loss: 18.1982 Explore P: 0.5044
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 372 Total reward: 15.0 Average reward fake: 0.4583553373813629 Average reward real: 0.7259077429771423 Training d_loss: 1.0047 Training g_loss: 0.7980 Training q_loss: 58.6095 Explore P: 0.5037
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 373 Total reward: 19.0 Average reward fake: 0.3837025463581085 Average reward real: 0.7518876194953918 Training d_loss: 0.8990 Training g_loss: 1.0183 Training q_loss: 28.7251 Explore 

-------------------------------------------------------------------------------
Episode: 394 Total reward: 35.0 Average reward fake: 0.3303726613521576 Average reward real: 0.6222724914550781 Training d_loss: 0.9760 Training g_loss: 1.1570 Training q_loss: 48.7235 Explore P: 0.4834
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 395 Total reward: 18.0 Average reward fake: 0.3234071135520935 Average reward real: 0.6677039861679077 Training d_loss: 1.0702 Training g_loss: 2.3474 Training q_loss: 57.3991 Explore P: 0.4825
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 396 Total reward: 29.0 Average reward fake: 0.3796181082725525 Average reward real: 0.5836641788482666 Training d_loss: 1.0730 Training g_loss: 1.0485 Training q_loss: 76.9933 Explore P

-------------------------------------------------------------------------------
Episode: 417 Total reward: 80.0 Average reward fake: 0.4499521851539612 Average reward real: 0.5620868802070618 Training d_loss: 1.3702 Training g_loss: 1.8546 Training q_loss: 23.3915 Explore P: 0.4477
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 418 Total reward: 68.0 Average reward fake: 0.4865254759788513 Average reward real: 0.5397276878356934 Training d_loss: 1.3053 Training g_loss: 0.7171 Training q_loss: 54.3835 Explore P: 0.4447
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 419 Total reward: 45.0 Average reward fake: 0.30484503507614136 Average reward real: 0.6897244453430176 Training d_loss: 0.8189 Training g_loss: 1.9053 Training q_loss: 131.8607 Explore

-------------------------------------------------------------------------------
Episode: 440 Total reward: 28.0 Average reward fake: 0.46452078223228455 Average reward real: 0.4649282991886139 Training d_loss: 1.6156 Training g_loss: 0.8576 Training q_loss: 36.5260 Explore P: 0.3878
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 441 Total reward: 91.0 Average reward fake: 0.5253093838691711 Average reward real: 0.5222196578979492 Training d_loss: 1.4360 Training g_loss: 0.6919 Training q_loss: 70.8770 Explore P: 0.3844
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 442 Total reward: 198.0 Average reward fake: 0.4674679636955261 Average reward real: 0.5875921249389648 Training d_loss: 1.2215 Training g_loss: 1.1060 Training q_loss: 37.2094 Explore

-------------------------------------------------------------------------------
Episode: 463 Total reward: 127.0 Average reward fake: 0.4891902506351471 Average reward real: 0.5515397191047668 Training d_loss: 1.3044 Training g_loss: 1.3079 Training q_loss: 1264.9945 Explore P: 0.2832
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 464 Total reward: 109.0 Average reward fake: 0.5135067701339722 Average reward real: 0.5582572221755981 Training d_loss: 1.3438 Training g_loss: 0.8703 Training q_loss: 458.8685 Explore P: 0.2802
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 465 Total reward: 94.0 Average reward fake: 0.4809810519218445 Average reward real: 0.49269595742225647 Training d_loss: 1.3658 Training g_loss: 0.7302 Training q_loss: 57.7880 Exp

-------------------------------------------------------------------------------
Episode: 486 Total reward: 199.0 Average reward fake: 0.4596712589263916 Average reward real: 0.5772811770439148 Training d_loss: 1.2935 Training g_loss: 1.6866 Training q_loss: 42.2282 Explore P: 0.2092
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 487 Total reward: 164.0 Average reward fake: 0.5748203992843628 Average reward real: 0.5324152708053589 Training d_loss: 1.5596 Training g_loss: 0.5774 Training q_loss: 41.7616 Explore P: 0.2060
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 488 Total reward: 199.0 Average reward fake: 0.48255500197410583 Average reward real: 0.5051563382148743 Training d_loss: 1.3670 Training g_loss: 0.7445 Training q_loss: 83.3174 Explo

-------------------------------------------------------------------------------
Episode: 509 Total reward: 158.0 Average reward fake: 0.3728906512260437 Average reward real: 0.551246166229248 Training d_loss: 1.1321 Training g_loss: 2.1376 Training q_loss: 43.0386 Explore P: 0.1421
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 510 Total reward: 150.0 Average reward fake: 0.5318707227706909 Average reward real: 0.5915141105651855 Training d_loss: 1.3139 Training g_loss: 0.6560 Training q_loss: 45.3447 Explore P: 0.1401
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 511 Total reward: 199.0 Average reward fake: 0.3725840747356415 Average reward real: 0.5066894888877869 Training d_loss: 1.1933 Training g_loss: 2.0866 Training q_loss: 25.5552 Explore

-------------------------------------------------------------------------------
Episode: 532 Total reward: 199.0 Average reward fake: 0.38255301117897034 Average reward real: 0.5493301749229431 Training d_loss: 1.1757 Training g_loss: 2.6324 Training q_loss: 962.8826 Explore P: 0.0961
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 533 Total reward: 199.0 Average reward fake: 0.4161360263824463 Average reward real: 0.4822154939174652 Training d_loss: 1.3205 Training g_loss: 0.9239 Training q_loss: 27.4079 Explore P: 0.0944
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 534 Total reward: 199.0 Average reward fake: 0.5765479803085327 Average reward real: 0.6171889305114746 Training d_loss: 1.3595 Training g_loss: 0.5424 Training q_loss: 32.0731 Expl

-------------------------------------------------------------------------------
Episode: 555 Total reward: 141.0 Average reward fake: 0.4681721329689026 Average reward real: 0.5924431681632996 Training d_loss: 1.2445 Training g_loss: 1.7941 Training q_loss: 48.5923 Explore P: 0.0667
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 556 Total reward: 199.0 Average reward fake: 0.38557958602905273 Average reward real: 0.5781499743461609 Training d_loss: 1.1218 Training g_loss: 2.3136 Training q_loss: 13.6591 Explore P: 0.0656
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 557 Total reward: 138.0 Average reward fake: 0.28731244802474976 Average reward real: 0.6600494384765625 Training d_loss: 0.8383 Training g_loss: 2.9242 Training q_loss: 42.7921 Expl

-------------------------------------------------------------------------------
Episode: 578 Total reward: 199.0 Average reward fake: 0.42986607551574707 Average reward real: 0.5451110601425171 Training d_loss: 1.2255 Training g_loss: 1.8657 Training q_loss: 6.1906 Explore P: 0.0469
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 579 Total reward: 154.0 Average reward fake: 0.48505109548568726 Average reward real: 0.6571728587150574 Training d_loss: 1.1772 Training g_loss: 1.1743 Training q_loss: 98.7474 Explore P: 0.0463
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 580 Total reward: 171.0 Average reward fake: 0.3836847245693207 Average reward real: 0.572982668876648 Training d_loss: 1.1037 Training g_loss: 2.8165 Training q_loss: 10.8873 Explor

-------------------------------------------------------------------------------
Episode: 601 Total reward: 141.0 Average reward fake: 0.42773646116256714 Average reward real: 0.586247444152832 Training d_loss: 1.1593 Training g_loss: 2.3148 Training q_loss: 13.5873 Explore P: 0.0350
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 602 Total reward: 182.0 Average reward fake: 0.35675716400146484 Average reward real: 0.5526841878890991 Training d_loss: 1.0963 Training g_loss: 2.6790 Training q_loss: 16.0142 Explore P: 0.0345
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 603 Total reward: 166.0 Average reward fake: 0.53435218334198 Average reward real: 0.6365912556648254 Training d_loss: 1.2805 Training g_loss: 1.7355 Training q_loss: 20.6422 Explore

-------------------------------------------------------------------------------
Episode: 624 Total reward: 199.0 Average reward fake: 0.4827226996421814 Average reward real: 0.5104066729545593 Training d_loss: 1.3432 Training g_loss: 0.7483 Training q_loss: 1.1603 Explore P: 0.0267
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 625 Total reward: 199.0 Average reward fake: 0.40958961844444275 Average reward real: 0.5476614832878113 Training d_loss: 1.1628 Training g_loss: 0.9826 Training q_loss: 4.9290 Explore P: 0.0264
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 626 Total reward: 199.0 Average reward fake: 0.5012219548225403 Average reward real: 0.5518411993980408 Training d_loss: 1.3364 Training g_loss: 1.8558 Training q_loss: 2.8792 Explore 

-------------------------------------------------------------------------------
Episode: 647 Total reward: 199.0 Average reward fake: 0.5693159699440002 Average reward real: 0.5763232111930847 Training d_loss: 1.4096 Training g_loss: 0.5605 Training q_loss: 4.6493 Explore P: 0.0208
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 648 Total reward: 174.0 Average reward fake: 0.4591895043849945 Average reward real: 0.5108132362365723 Training d_loss: 1.3185 Training g_loss: 1.6094 Training q_loss: 14.1754 Explore P: 0.0206
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 649 Total reward: 143.0 Average reward fake: 0.49909907579421997 Average reward real: 0.6019675731658936 Training d_loss: 1.2617 Training g_loss: 2.4365 Training q_loss: 8.1512 Explore

-------------------------------------------------------------------------------
Episode: 670 Total reward: 118.0 Average reward fake: 0.5133056044578552 Average reward real: 0.5856447815895081 Training d_loss: 1.3141 Training g_loss: 2.6017 Training q_loss: 3.9509 Explore P: 0.0177
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 671 Total reward: 123.0 Average reward fake: 0.3875727355480194 Average reward real: 0.5558308362960815 Training d_loss: 1.1399 Training g_loss: 3.1599 Training q_loss: 5.4207 Explore P: 0.0176
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 672 Total reward: 118.0 Average reward fake: 0.43311262130737305 Average reward real: 0.45328307151794434 Training d_loss: 1.3625 Training g_loss: 0.8055 Training q_loss: 4.2324 Explore

-------------------------------------------------------------------------------
Episode: 693 Total reward: 56.0 Average reward fake: 0.47315534949302673 Average reward real: 0.5719746351242065 Training d_loss: 1.2385 Training g_loss: 0.9101 Training q_loss: 9.5272 Explore P: 0.0160
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 694 Total reward: 56.0 Average reward fake: 0.5084160566329956 Average reward real: 0.6815405488014221 Training d_loss: 1.2176 Training g_loss: 2.6580 Training q_loss: 10.3765 Explore P: 0.0159
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 695 Total reward: 67.0 Average reward fake: 0.3553377091884613 Average reward real: 0.583062469959259 Training d_loss: 1.0300 Training g_loss: 3.5315 Training q_loss: 2.9167 Explore P: 

-------------------------------------------------------------------------------
Episode: 716 Total reward: 47.0 Average reward fake: 0.36295467615127563 Average reward real: 0.47773200273513794 Training d_loss: 1.2305 Training g_loss: 3.3234 Training q_loss: 5.5830 Explore P: 0.0152
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 717 Total reward: 69.0 Average reward fake: 0.35534006357192993 Average reward real: 0.650126039981842 Training d_loss: 0.9719 Training g_loss: 1.9721 Training q_loss: 8.7397 Explore P: 0.0152
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 718 Total reward: 69.0 Average reward fake: 0.33499568700790405 Average reward real: 0.5036360621452332 Training d_loss: 1.1431 Training g_loss: 6.4826 Training q_loss: 15.7060 Explore 

-------------------------------------------------------------------------------
Episode: 739 Total reward: 83.0 Average reward fake: 0.41220933198928833 Average reward real: 0.5419451594352722 Training d_loss: 1.2093 Training g_loss: 5.6382 Training q_loss: 4.4681 Explore P: 0.0145
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 740 Total reward: 53.0 Average reward fake: 0.4632848799228668 Average reward real: 0.5469305515289307 Training d_loss: 1.2515 Training g_loss: 0.8301 Training q_loss: 4.8198 Explore P: 0.0145
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 741 Total reward: 76.0 Average reward fake: 0.5131424069404602 Average reward real: 0.5295937657356262 Training d_loss: 1.3606 Training g_loss: 0.6753 Training q_loss: 9.3737 Explore P: 

-------------------------------------------------------------------------------
Episode: 762 Total reward: 104.0 Average reward fake: 0.3779144585132599 Average reward real: 0.5443512201309204 Training d_loss: 1.1522 Training g_loss: 5.9892 Training q_loss: 5.1775 Explore P: 0.0136
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 763 Total reward: 73.0 Average reward fake: 0.40312376618385315 Average reward real: 0.5195084810256958 Training d_loss: 1.2340 Training g_loss: 4.4184 Training q_loss: 3.2647 Explore P: 0.0136
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 764 Total reward: 114.0 Average reward fake: 0.4782978892326355 Average reward real: 0.5340548753738403 Training d_loss: 1.3101 Training g_loss: 2.9590 Training q_loss: 3.0926 Explore P

-------------------------------------------------------------------------------
Episode: 785 Total reward: 100.0 Average reward fake: 0.4487229287624359 Average reward real: 0.5521303415298462 Training d_loss: 1.2569 Training g_loss: 5.7387 Training q_loss: 2.8722 Explore P: 0.0129
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 786 Total reward: 104.0 Average reward fake: 0.4726448059082031 Average reward real: 0.5717199444770813 Training d_loss: 1.2356 Training g_loss: 0.8742 Training q_loss: 6.9306 Explore P: 0.0128
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 787 Total reward: 118.0 Average reward fake: 0.4169798493385315 Average reward real: 0.5388566851615906 Training d_loss: 1.2099 Training g_loss: 2.1532 Training q_loss: 5.9050 Explore P

-------------------------------------------------------------------------------
Episode: 808 Total reward: 83.0 Average reward fake: 0.4073966145515442 Average reward real: 0.5199915170669556 Training d_loss: 1.2486 Training g_loss: 4.2769 Training q_loss: 5.7218 Explore P: 0.0122
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 809 Total reward: 105.0 Average reward fake: 0.48047757148742676 Average reward real: 0.5205802321434021 Training d_loss: 1.3230 Training g_loss: 0.7188 Training q_loss: 0.8193 Explore P: 0.0122
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 810 Total reward: 71.0 Average reward fake: 0.46307873725891113 Average reward real: 0.5446059107780457 Training d_loss: 1.2688 Training g_loss: 0.9934 Training q_loss: 3.3199 Explore P

-------------------------------------------------------------------------------
Episode: 831 Total reward: 109.0 Average reward fake: 0.3320329487323761 Average reward real: 0.48260369896888733 Training d_loss: 1.1649 Training g_loss: 2.7201 Training q_loss: 7.5833 Explore P: 0.0117
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 832 Total reward: 127.0 Average reward fake: 0.47683897614479065 Average reward real: 0.5246018171310425 Training d_loss: 1.3246 Training g_loss: 0.9779 Training q_loss: 2.0557 Explore P: 0.0117
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 833 Total reward: 123.0 Average reward fake: 0.5046709179878235 Average reward real: 0.5055063366889954 Training d_loss: 1.3875 Training g_loss: 0.6836 Training q_loss: 1.6024 Explore

-------------------------------------------------------------------------------
Episode: 854 Total reward: 107.0 Average reward fake: 0.5501841902732849 Average reward real: 0.5550117492675781 Training d_loss: 1.3908 Training g_loss: 0.6419 Training q_loss: 1.4498 Explore P: 0.0113
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 855 Total reward: 106.0 Average reward fake: 0.5561190247535706 Average reward real: 0.5700196027755737 Training d_loss: 1.3758 Training g_loss: 0.5969 Training q_loss: 5.4301 Explore P: 0.0113
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 856 Total reward: 98.0 Average reward fake: 0.5254127383232117 Average reward real: 0.5525498390197754 Training d_loss: 1.3525 Training g_loss: 0.6777 Training q_loss: 4.6857 Explore P:

-------------------------------------------------------------------------------
Episode: 877 Total reward: 110.0 Average reward fake: 0.46237674355506897 Average reward real: 0.5258973240852356 Training d_loss: 1.2795 Training g_loss: 0.7911 Training q_loss: 5.1976 Explore P: 0.0110
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 878 Total reward: 119.0 Average reward fake: 0.45443350076675415 Average reward real: 0.4682832360267639 Training d_loss: 1.4201 Training g_loss: 0.7835 Training q_loss: 3.7330 Explore P: 0.0110
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 879 Total reward: 137.0 Average reward fake: 0.5163917541503906 Average reward real: 0.5388663411140442 Training d_loss: 1.3511 Training g_loss: 0.6942 Training q_loss: 6.6137 Explore

-------------------------------------------------------------------------------
Episode: 900 Total reward: 132.0 Average reward fake: 0.47836628556251526 Average reward real: 0.5300610661506653 Training d_loss: 1.3214 Training g_loss: 0.8667 Training q_loss: 2.2686 Explore P: 0.0108
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 901 Total reward: 140.0 Average reward fake: 0.4810400903224945 Average reward real: 0.535477876663208 Training d_loss: 1.2973 Training g_loss: 0.7694 Training q_loss: 1.4065 Explore P: 0.0107
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 902 Total reward: 109.0 Average reward fake: 0.45199233293533325 Average reward real: 0.5074068903923035 Training d_loss: 1.3065 Training g_loss: 1.0633 Training q_loss: 2.2185 Explore 

-------------------------------------------------------------------------------
Episode: 923 Total reward: 129.0 Average reward fake: 0.5391222238540649 Average reward real: 0.541340172290802 Training d_loss: 1.4351 Training g_loss: 0.6554 Training q_loss: 8.2504 Explore P: 0.0106
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 924 Total reward: 132.0 Average reward fake: 0.5167191028594971 Average reward real: 0.5203105211257935 Training d_loss: 1.3807 Training g_loss: 0.6858 Training q_loss: 5.4211 Explore P: 0.0106
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 925 Total reward: 122.0 Average reward fake: 0.48729997873306274 Average reward real: 0.5381758809089661 Training d_loss: 1.3269 Training g_loss: 3.5532 Training q_loss: 0.8465 Explore P

-------------------------------------------------------------------------------
Episode: 946 Total reward: 127.0 Average reward fake: 0.5250502228736877 Average reward real: 0.5535750985145569 Training d_loss: 1.3468 Training g_loss: 0.6689 Training q_loss: 3.1011 Explore P: 0.0104
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 947 Total reward: 112.0 Average reward fake: 0.4712773263454437 Average reward real: 0.48946529626846313 Training d_loss: 1.3705 Training g_loss: 0.7548 Training q_loss: 16.8218 Explore P: 0.0104
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 948 Total reward: 142.0 Average reward fake: 0.4741992950439453 Average reward real: 0.576607346534729 Training d_loss: 1.2321 Training g_loss: 3.6072 Training q_loss: 7.2632 Explore 

-------------------------------------------------------------------------------
Episode: 969 Total reward: 119.0 Average reward fake: 0.40552282333374023 Average reward real: 0.518481969833374 Training d_loss: 1.2254 Training g_loss: 4.3020 Training q_loss: 1.5552 Explore P: 0.0103
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 970 Total reward: 107.0 Average reward fake: 0.5110208988189697 Average reward real: 0.5789395570755005 Training d_loss: 1.2915 Training g_loss: 0.7661 Training q_loss: 7.6241 Explore P: 0.0103
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 971 Total reward: 168.0 Average reward fake: 0.38213875889778137 Average reward real: 0.4634262025356293 Training d_loss: 1.3129 Training g_loss: 5.1219 Training q_loss: 16.1195 Explore

-------------------------------------------------------------------------------
Episode: 992 Total reward: 125.0 Average reward fake: 0.5127637982368469 Average reward real: 0.5159759521484375 Training d_loss: 1.3931 Training g_loss: 0.7052 Training q_loss: 3.4368 Explore P: 0.0102
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 993 Total reward: 115.0 Average reward fake: 0.5228455662727356 Average reward real: 0.5273541212081909 Training d_loss: 1.3935 Training g_loss: 0.6723 Training q_loss: 0.5498 Explore P: 0.0102
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 994 Total reward: 122.0 Average reward fake: 0.5505579710006714 Average reward real: 0.5511242151260376 Training d_loss: 1.4151 Training g_loss: 0.6268 Training q_loss: 1.6813 Explore P

-------------------------------------------------------------------------------
Episode: 1015 Total reward: 143.0 Average reward fake: 0.5709677934646606 Average reward real: 0.5740657448768616 Training d_loss: 1.4031 Training g_loss: 0.6276 Training q_loss: 0.9915 Explore P: 0.0102
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1016 Total reward: 115.0 Average reward fake: 0.526197075843811 Average reward real: 0.5481255054473877 Training d_loss: 1.3572 Training g_loss: 0.6743 Training q_loss: 1.0508 Explore P: 0.0102
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1017 Total reward: 135.0 Average reward fake: 0.5255584120750427 Average reward real: 0.5328856706619263 Training d_loss: 1.3757 Training g_loss: 0.7098 Training q_loss: 12.2326 Explor

-------------------------------------------------------------------------------
Episode: 1038 Total reward: 124.0 Average reward fake: 0.313409686088562 Average reward real: 0.5092520713806152 Training d_loss: 1.1764 Training g_loss: 5.8428 Training q_loss: 2.1852 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1039 Total reward: 110.0 Average reward fake: 0.5257359743118286 Average reward real: 0.5349414944648743 Training d_loss: 1.3744 Training g_loss: 0.6627 Training q_loss: 5.4308 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1040 Total reward: 103.0 Average reward fake: 0.4077078402042389 Average reward real: 0.5018383860588074 Training d_loss: 1.2434 Training g_loss: 1.7433 Training q_loss: 1.7390 Explore

-------------------------------------------------------------------------------
Episode: 1061 Total reward: 187.0 Average reward fake: 0.5519918203353882 Average reward real: 0.5548914670944214 Training d_loss: 1.3958 Training g_loss: 0.6707 Training q_loss: 4.0472 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1062 Total reward: 114.0 Average reward fake: 0.40525054931640625 Average reward real: 0.49022960662841797 Training d_loss: 1.2658 Training g_loss: 1.4152 Training q_loss: 2.7098 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1063 Total reward: 129.0 Average reward fake: 0.5117546916007996 Average reward real: 0.5166717767715454 Training d_loss: 1.3808 Training g_loss: 0.6669 Training q_loss: 1.6752 Expl

-------------------------------------------------------------------------------
Episode: 1084 Total reward: 118.0 Average reward fake: 0.5198737382888794 Average reward real: 0.5888834595680237 Training d_loss: 1.3158 Training g_loss: 0.8894 Training q_loss: 20.6596 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1085 Total reward: 141.0 Average reward fake: 0.5320419073104858 Average reward real: 0.5667861104011536 Training d_loss: 1.3524 Training g_loss: 0.7135 Training q_loss: 3.2256 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1086 Total reward: 115.0 Average reward fake: 0.4984639286994934 Average reward real: 0.5679391622543335 Training d_loss: 1.3061 Training g_loss: 3.5504 Training q_loss: 6.9558 Explo

-------------------------------------------------------------------------------
Episode: 1107 Total reward: 116.0 Average reward fake: 0.4090577960014343 Average reward real: 0.5800261497497559 Training d_loss: 1.1444 Training g_loss: 5.0996 Training q_loss: 4.9148 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1108 Total reward: 113.0 Average reward fake: 0.4066714346408844 Average reward real: 0.49640440940856934 Training d_loss: 1.2632 Training g_loss: 1.3917 Training q_loss: 10.5108 Explore P: 0.0101
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1109 Total reward: 130.0 Average reward fake: 0.46187782287597656 Average reward real: 0.5295621156692505 Training d_loss: 1.2876 Training g_loss: 3.5897 Training q_loss: 10.9645 Ex

-------------------------------------------------------------------------------
Episode: 1130 Total reward: 116.0 Average reward fake: 0.4054572582244873 Average reward real: 0.500952959060669 Training d_loss: 1.2391 Training g_loss: 3.3773 Training q_loss: 1.4173 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1131 Total reward: 112.0 Average reward fake: 0.44157034158706665 Average reward real: 0.47537001967430115 Training d_loss: 1.3350 Training g_loss: 0.8657 Training q_loss: 3.7622 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1132 Total reward: 117.0 Average reward fake: 0.47888270020484924 Average reward real: 0.4821871221065521 Training d_loss: 1.4358 Training g_loss: 0.7394 Training q_loss: 5.4951 Expl

-------------------------------------------------------------------------------
Episode: 1153 Total reward: 129.0 Average reward fake: 0.42903319001197815 Average reward real: 0.508276104927063 Training d_loss: 1.2644 Training g_loss: 3.4734 Training q_loss: 11.6085 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1154 Total reward: 109.0 Average reward fake: 0.5061822533607483 Average reward real: 0.5015473961830139 Training d_loss: 1.4131 Training g_loss: 0.6925 Training q_loss: 36.0301 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1155 Total reward: 149.0 Average reward fake: 0.42743340134620667 Average reward real: 0.4909296929836273 Training d_loss: 1.3159 Training g_loss: 1.3587 Training q_loss: 13.7058 Ex

-------------------------------------------------------------------------------
Episode: 1176 Total reward: 102.0 Average reward fake: 0.4554021954536438 Average reward real: 0.49317365884780884 Training d_loss: 1.3730 Training g_loss: 0.9750 Training q_loss: 5.9028 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1177 Total reward: 115.0 Average reward fake: 0.4856034219264984 Average reward real: 0.5163180828094482 Training d_loss: 1.3355 Training g_loss: 0.7710 Training q_loss: 1.5978 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1178 Total reward: 120.0 Average reward fake: 0.5384114980697632 Average reward real: 0.5386303663253784 Training d_loss: 1.3924 Training g_loss: 0.6496 Training q_loss: 0.7622 Explo

-------------------------------------------------------------------------------
Episode: 1199 Total reward: 108.0 Average reward fake: 0.44035762548446655 Average reward real: 0.49235406517982483 Training d_loss: 1.3159 Training g_loss: 1.7677 Training q_loss: 1.4138 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1200 Total reward: 106.0 Average reward fake: 0.5470291972160339 Average reward real: 0.5494639873504639 Training d_loss: 1.3946 Training g_loss: 0.6155 Training q_loss: 4.3737 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1201 Total reward: 110.0 Average reward fake: 0.4712947905063629 Average reward real: 0.4965815544128418 Training d_loss: 1.3509 Training g_loss: 0.7775 Training q_loss: 8.9904 Expl

-------------------------------------------------------------------------------
Episode: 1222 Total reward: 134.0 Average reward fake: 0.4834381639957428 Average reward real: 0.5326016545295715 Training d_loss: 1.3179 Training g_loss: 0.8644 Training q_loss: 40.3011 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1223 Total reward: 99.0 Average reward fake: 0.46620267629623413 Average reward real: 0.475400447845459 Training d_loss: 1.3895 Training g_loss: 0.7337 Training q_loss: 20.7389 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1224 Total reward: 110.0 Average reward fake: 0.5271692872047424 Average reward real: 0.5383912324905396 Training d_loss: 1.3706 Training g_loss: 0.6835 Training q_loss: 13.7648 Expl

-------------------------------------------------------------------------------
Episode: 1245 Total reward: 20.0 Average reward fake: 0.361400306224823 Average reward real: 0.6074838638305664 Training d_loss: 1.2839 Training g_loss: 2.8830 Training q_loss: 1576.8511 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1246 Total reward: 23.0 Average reward fake: 0.26833581924438477 Average reward real: 0.5098998546600342 Training d_loss: 1.0366 Training g_loss: 5.7484 Training q_loss: 102.0051 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1247 Total reward: 11.0 Average reward fake: 0.48661407828330994 Average reward real: 0.566306471824646 Training d_loss: 1.3027 Training g_loss: 0.6874 Training q_loss: 71.4359 Exp

-------------------------------------------------------------------------------
Episode: 1268 Total reward: 11.0 Average reward fake: 0.3275704085826874 Average reward real: 0.8606259226799011 Training d_loss: 0.6250 Training g_loss: 1.2550 Training q_loss: 10784.6436 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1269 Total reward: 10.0 Average reward fake: 0.33236733078956604 Average reward real: 0.7293275594711304 Training d_loss: 0.8483 Training g_loss: 1.0314 Training q_loss: 1133.6055 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1270 Total reward: 14.0 Average reward fake: 0.31778088212013245 Average reward real: 0.7321950197219849 Training d_loss: 0.8190 Training g_loss: 1.1779 Training q_loss: 177719.

-------------------------------------------------------------------------------
Episode: 1291 Total reward: 11.0 Average reward fake: 0.13265886902809143 Average reward real: 0.46383562684059143 Training d_loss: 1.3704 Training g_loss: 1.9845 Training q_loss: 337650.2500 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1292 Total reward: 16.0 Average reward fake: 0.2758910059928894 Average reward real: 0.8195492625236511 Training d_loss: 0.6088 Training g_loss: 3.4803 Training q_loss: 3757.0757 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1293 Total reward: 42.0 Average reward fake: 0.3674243986606598 Average reward real: 0.5938957929611206 Training d_loss: 1.0404 Training g_loss: 1.1245 Training q_loss: 2458.8

-------------------------------------------------------------------------------
Episode: 1314 Total reward: 96.0 Average reward fake: 0.43061771988868713 Average reward real: 0.47998061776161194 Training d_loss: 1.3727 Training g_loss: 2.9514 Training q_loss: 6509.1636 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1315 Total reward: 16.0 Average reward fake: 0.3659079670906067 Average reward real: 0.5385732650756836 Training d_loss: 1.2210 Training g_loss: 2.8614 Training q_loss: 2966.8384 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1316 Total reward: 13.0 Average reward fake: 0.2896396517753601 Average reward real: 0.6464258432388306 Training d_loss: 0.8782 Training g_loss: 1.4917 Training q_loss: 3676.863

-------------------------------------------------------------------------------
Episode: 1337 Total reward: 72.0 Average reward fake: 0.4757097661495209 Average reward real: 0.4757121205329895 Training d_loss: 1.4998 Training g_loss: 0.7834 Training q_loss: 859.9303 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1338 Total reward: 16.0 Average reward fake: 0.47042304277420044 Average reward real: 0.5150704383850098 Training d_loss: 1.3468 Training g_loss: 0.7664 Training q_loss: 724.5262 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1339 Total reward: 25.0 Average reward fake: 0.46063438057899475 Average reward real: 0.5213978290557861 Training d_loss: 1.3495 Training g_loss: 0.7675 Training q_loss: 410.8957 E

-------------------------------------------------------------------------------
Episode: 1361 Total reward: 10.0 Average reward fake: 0.4086046814918518 Average reward real: 0.5743745565414429 Training d_loss: 1.1644 Training g_loss: 0.9722 Training q_loss: 2769.1343 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1362 Total reward: 151.0 Average reward fake: 0.42949342727661133 Average reward real: 0.46865034103393555 Training d_loss: 1.3553 Training g_loss: 0.8100 Training q_loss: 67486.1562 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1363 Total reward: 15.0 Average reward fake: 0.49626508355140686 Average reward real: 0.5585451126098633 Training d_loss: 1.3186 Training g_loss: 0.9437 Training q_loss: 1343.

-------------------------------------------------------------------------------
Episode: 1384 Total reward: 131.0 Average reward fake: 0.5683118104934692 Average reward real: 0.5871785283088684 Training d_loss: 1.5060 Training g_loss: 1.8389 Training q_loss: 820.2256 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1385 Total reward: 179.0 Average reward fake: 0.48111122846603394 Average reward real: 0.46908092498779297 Training d_loss: 1.4243 Training g_loss: 0.7256 Training q_loss: 298.7675 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1386 Total reward: 159.0 Average reward fake: 0.40377846360206604 Average reward real: 0.5374479293823242 Training d_loss: 1.2201 Training g_loss: 1.9530 Training q_loss: 8951.8

-------------------------------------------------------------------------------
Episode: 1407 Total reward: 142.0 Average reward fake: 0.4866175055503845 Average reward real: 0.5616746544837952 Training d_loss: 1.3425 Training g_loss: 2.4762 Training q_loss: 106.9852 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1408 Total reward: 126.0 Average reward fake: 0.5513898730278015 Average reward real: 0.5484986305236816 Training d_loss: 1.4048 Training g_loss: 0.5975 Training q_loss: 81.2386 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1409 Total reward: 70.0 Average reward fake: 0.5568786859512329 Average reward real: 0.5756796598434448 Training d_loss: 1.3702 Training g_loss: 0.6245 Training q_loss: 184.3247 Ex

-------------------------------------------------------------------------------
Episode: 1430 Total reward: 124.0 Average reward fake: 0.36040452122688293 Average reward real: 0.5645569562911987 Training d_loss: 1.1286 Training g_loss: 4.4350 Training q_loss: 76.6915 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1431 Total reward: 92.0 Average reward fake: 0.47236576676368713 Average reward real: 0.5266520380973816 Training d_loss: 1.3110 Training g_loss: 2.3322 Training q_loss: 108.7840 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1432 Total reward: 111.0 Average reward fake: 0.3751078248023987 Average reward real: 0.5352877974510193 Training d_loss: 1.1622 Training g_loss: 3.8933 Training q_loss: 39.0641 E

-------------------------------------------------------------------------------
Episode: 1453 Total reward: 24.0 Average reward fake: 0.2791193127632141 Average reward real: 0.7162708044052124 Training d_loss: 0.8100 Training g_loss: 1.7947 Training q_loss: 86.4124 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1454 Total reward: 8.0 Average reward fake: 0.41984957456588745 Average reward real: 0.6272073984146118 Training d_loss: 1.0945 Training g_loss: 2.2596 Training q_loss: 6397.4800 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1455 Total reward: 178.0 Average reward fake: 0.46525439620018005 Average reward real: 0.5144920349121094 Training d_loss: 1.3213 Training g_loss: 1.9765 Training q_loss: 40.1283 Ex

-------------------------------------------------------------------------------
Episode: 1476 Total reward: 148.0 Average reward fake: 0.3352054953575134 Average reward real: 0.5063742399215698 Training d_loss: 1.1393 Training g_loss: 4.8181 Training q_loss: 11.8692 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1477 Total reward: 121.0 Average reward fake: 0.47018352150917053 Average reward real: 0.5206485986709595 Training d_loss: 1.3201 Training g_loss: 1.1916 Training q_loss: 74.9013 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1478 Total reward: 105.0 Average reward fake: 0.5156742930412292 Average reward real: 0.5864457488059998 Training d_loss: 1.3786 Training g_loss: 0.8186 Training q_loss: 12.3125 Ex

-------------------------------------------------------------------------------
Episode: 1499 Total reward: 137.0 Average reward fake: 0.42407774925231934 Average reward real: 0.5283613204956055 Training d_loss: 1.2408 Training g_loss: 1.5294 Training q_loss: 34.0972 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1500 Total reward: 132.0 Average reward fake: 0.4063146114349365 Average reward real: 0.5127202272415161 Training d_loss: 1.2356 Training g_loss: 3.1833 Training q_loss: 17.1703 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1501 Total reward: 129.0 Average reward fake: 0.47631677985191345 Average reward real: 0.48666414618492126 Training d_loss: 1.3759 Training g_loss: 0.7526 Training q_loss: 128.1685

-------------------------------------------------------------------------------
Episode: 1522 Total reward: 142.0 Average reward fake: 0.4469168186187744 Average reward real: 0.5019627809524536 Training d_loss: 1.3059 Training g_loss: 1.1098 Training q_loss: 24.4178 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1523 Total reward: 137.0 Average reward fake: 0.36676281690597534 Average reward real: 0.47148919105529785 Training d_loss: 1.2496 Training g_loss: 2.5255 Training q_loss: 39.2557 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1524 Total reward: 119.0 Average reward fake: 0.44812601804733276 Average reward real: 0.5035808086395264 Training d_loss: 1.3083 Training g_loss: 2.1658 Training q_loss: 42.6222 

-------------------------------------------------------------------------------
Episode: 1545 Total reward: 128.0 Average reward fake: 0.4916223883628845 Average reward real: 0.4916231632232666 Training d_loss: 1.3948 Training g_loss: 0.7160 Training q_loss: 16.7890 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1546 Total reward: 112.0 Average reward fake: 0.4509080946445465 Average reward real: 0.5114496946334839 Training d_loss: 1.3088 Training g_loss: 1.5981 Training q_loss: 29.6484 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1547 Total reward: 139.0 Average reward fake: 0.43928247690200806 Average reward real: 0.5182393193244934 Training d_loss: 1.2745 Training g_loss: 1.2933 Training q_loss: 37.4937 Ex

-------------------------------------------------------------------------------
Episode: 1568 Total reward: 132.0 Average reward fake: 0.4517940878868103 Average reward real: 0.5373419523239136 Training d_loss: 1.2633 Training g_loss: 1.8779 Training q_loss: 82.2641 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1569 Total reward: 140.0 Average reward fake: 0.471719354391098 Average reward real: 0.6124711036682129 Training d_loss: 1.2158 Training g_loss: 2.9179 Training q_loss: 55.3069 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1570 Total reward: 128.0 Average reward fake: 0.49417075514793396 Average reward real: 0.5578603744506836 Training d_loss: 1.3249 Training g_loss: 1.4094 Training q_loss: 67.1942 Exp

-------------------------------------------------------------------------------
Episode: 1591 Total reward: 129.0 Average reward fake: 0.5146673917770386 Average reward real: 0.5438628196716309 Training d_loss: 1.3468 Training g_loss: 0.6968 Training q_loss: 46.7297 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1592 Total reward: 106.0 Average reward fake: 0.44396448135375977 Average reward real: 0.4974868893623352 Training d_loss: 1.3106 Training g_loss: 2.0165 Training q_loss: 53.3476 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1593 Total reward: 119.0 Average reward fake: 0.4084378182888031 Average reward real: 0.5148600339889526 Training d_loss: 1.2274 Training g_loss: 1.9668 Training q_loss: 24.2098 Ex

-------------------------------------------------------------------------------
Episode: 1614 Total reward: 23.0 Average reward fake: 0.48841986060142517 Average reward real: 0.5312877893447876 Training d_loss: 1.3614 Training g_loss: 0.7256 Training q_loss: 167.6000 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1615 Total reward: 27.0 Average reward fake: 0.4672042727470398 Average reward real: 0.5665836334228516 Training d_loss: 1.2728 Training g_loss: 0.8394 Training q_loss: 19.2799 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1616 Total reward: 92.0 Average reward fake: 0.5188308954238892 Average reward real: 0.5457527041435242 Training d_loss: 1.3723 Training g_loss: 0.6782 Training q_loss: 231.0135 Exp

-------------------------------------------------------------------------------
Episode: 1637 Total reward: 121.0 Average reward fake: 0.44153207540512085 Average reward real: 0.6074353456497192 Training d_loss: 1.1691 Training g_loss: 2.9885 Training q_loss: 43.5993 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1638 Total reward: 139.0 Average reward fake: 0.44862446188926697 Average reward real: 0.5056736469268799 Training d_loss: 1.3277 Training g_loss: 1.2883 Training q_loss: 99.8343 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1639 Total reward: 157.0 Average reward fake: 0.4512598514556885 Average reward real: 0.4830660820007324 Training d_loss: 1.3344 Training g_loss: 0.8110 Training q_loss: 13.0050 E

-------------------------------------------------------------------------------
Episode: 1660 Total reward: 119.0 Average reward fake: 0.5061683058738708 Average reward real: 0.5090523958206177 Training d_loss: 1.3816 Training g_loss: 0.6735 Training q_loss: 115.1735 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1661 Total reward: 149.0 Average reward fake: 0.5097972750663757 Average reward real: 0.5098910331726074 Training d_loss: 1.3869 Training g_loss: 0.6735 Training q_loss: 24.8997 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1662 Total reward: 120.0 Average reward fake: 0.5048408508300781 Average reward real: 0.5924416780471802 Training d_loss: 1.2949 Training g_loss: 1.7745 Training q_loss: 11.2242 Ex

-------------------------------------------------------------------------------
Episode: 1683 Total reward: 153.0 Average reward fake: 0.4750703275203705 Average reward real: 0.5313067436218262 Training d_loss: 1.3102 Training g_loss: 1.8878 Training q_loss: 38.8214 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1684 Total reward: 117.0 Average reward fake: 0.5106614828109741 Average reward real: 0.5366902351379395 Training d_loss: 1.3473 Training g_loss: 0.7143 Training q_loss: 13.2796 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1685 Total reward: 110.0 Average reward fake: 0.49640774726867676 Average reward real: 0.5478660464286804 Training d_loss: 1.3272 Training g_loss: 1.8579 Training q_loss: 26.8726 Ex

-------------------------------------------------------------------------------
Episode: 1706 Total reward: 126.0 Average reward fake: 0.5899516344070435 Average reward real: 0.613911509513855 Training d_loss: 1.4121 Training g_loss: 0.5691 Training q_loss: 159.2048 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1707 Total reward: 130.0 Average reward fake: 0.463967502117157 Average reward real: 0.5342104434967041 Training d_loss: 1.2759 Training g_loss: 0.8561 Training q_loss: 142.5410 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1708 Total reward: 124.0 Average reward fake: 0.4215218424797058 Average reward real: 0.43844833970069885 Training d_loss: 1.4107 Training g_loss: 0.9375 Training q_loss: 46.5941 Ex

-------------------------------------------------------------------------------
Episode: 1729 Total reward: 146.0 Average reward fake: 0.4140484929084778 Average reward real: 0.5221914052963257 Training d_loss: 1.2482 Training g_loss: 1.6293 Training q_loss: 28.5388 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1730 Total reward: 134.0 Average reward fake: 0.4800266623497009 Average reward real: 0.49917030334472656 Training d_loss: 1.3529 Training g_loss: 0.7711 Training q_loss: 27.7238 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1731 Total reward: 144.0 Average reward fake: 0.42849355936050415 Average reward real: 0.5342718958854675 Training d_loss: 1.2414 Training g_loss: 1.6832 Training q_loss: 179.8579 

-------------------------------------------------------------------------------
Episode: 1752 Total reward: 120.0 Average reward fake: 0.42993107438087463 Average reward real: 0.4854159355163574 Training d_loss: 1.3258 Training g_loss: 1.4620 Training q_loss: 9.7502 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1753 Total reward: 161.0 Average reward fake: 0.5155826210975647 Average reward real: 0.5763036608695984 Training d_loss: 1.3217 Training g_loss: 1.7083 Training q_loss: 426.6564 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1754 Total reward: 199.0 Average reward fake: 0.4622724652290344 Average reward real: 0.5112875699996948 Training d_loss: 1.3199 Training g_loss: 1.1172 Training q_loss: 26.0443 Ex

-------------------------------------------------------------------------------
Episode: 1775 Total reward: 155.0 Average reward fake: 0.5005054473876953 Average reward real: 0.4852696359157562 Training d_loss: 1.4227 Training g_loss: 0.6715 Training q_loss: 20.5132 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1776 Total reward: 79.0 Average reward fake: 0.4229504466056824 Average reward real: 0.4792235791683197 Training d_loss: 1.3123 Training g_loss: 1.9368 Training q_loss: 24.3406 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1777 Total reward: 153.0 Average reward fake: 0.5248733162879944 Average reward real: 0.5776067972183228 Training d_loss: 1.3355 Training g_loss: 0.6878 Training q_loss: 58.5785 Expl

-------------------------------------------------------------------------------
Episode: 1798 Total reward: 116.0 Average reward fake: 0.5261661410331726 Average reward real: 0.5747723579406738 Training d_loss: 1.3480 Training g_loss: 1.2776 Training q_loss: 15.9677 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1799 Total reward: 130.0 Average reward fake: 0.4804733395576477 Average reward real: 0.5466176867485046 Training d_loss: 1.2943 Training g_loss: 1.9595 Training q_loss: 27.6418 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1800 Total reward: 119.0 Average reward fake: 0.3999420702457428 Average reward real: 0.49240928888320923 Training d_loss: 1.2724 Training g_loss: 0.9495 Training q_loss: 137.4806 E

-------------------------------------------------------------------------------
Episode: 1821 Total reward: 199.0 Average reward fake: 0.5101677179336548 Average reward real: 0.5249811410903931 Training d_loss: 1.3615 Training g_loss: 0.7016 Training q_loss: 47.5311 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1822 Total reward: 169.0 Average reward fake: 0.5327104330062866 Average reward real: 0.5370339155197144 Training d_loss: 1.3877 Training g_loss: 0.6337 Training q_loss: 8.3114 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1823 Total reward: 168.0 Average reward fake: 0.3897620439529419 Average reward real: 0.552075982093811 Training d_loss: 1.1546 Training g_loss: 3.1355 Training q_loss: 33.8379 Explo

-------------------------------------------------------------------------------
Episode: 1844 Total reward: 167.0 Average reward fake: 0.46569499373435974 Average reward real: 0.5629452466964722 Training d_loss: 1.2504 Training g_loss: 2.1241 Training q_loss: 10.0137 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1845 Total reward: 126.0 Average reward fake: 0.41474151611328125 Average reward real: 0.5396633744239807 Training d_loss: 1.2055 Training g_loss: 1.5902 Training q_loss: 9.0787 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1846 Total reward: 143.0 Average reward fake: 0.4591712951660156 Average reward real: 0.5598787665367126 Training d_loss: 1.2644 Training g_loss: 2.3598 Training q_loss: 15.4816 Ex

-------------------------------------------------------------------------------
Episode: 1867 Total reward: 162.0 Average reward fake: 0.45060521364212036 Average reward real: 0.49790114164352417 Training d_loss: 1.3247 Training g_loss: 2.3510 Training q_loss: 7.4334 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1868 Total reward: 166.0 Average reward fake: 0.30485644936561584 Average reward real: 0.5144084095954895 Training d_loss: 1.0908 Training g_loss: 5.9946 Training q_loss: 19.8940 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1869 Total reward: 188.0 Average reward fake: 0.5374490022659302 Average reward real: 0.537457287311554 Training d_loss: 1.3944 Training g_loss: 0.6138 Training q_loss: 11.0694 Ex

-------------------------------------------------------------------------------
Episode: 1890 Total reward: 173.0 Average reward fake: 0.4908461570739746 Average reward real: 0.5159760117530823 Training d_loss: 1.3431 Training g_loss: 0.7149 Training q_loss: 19.9813 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1891 Total reward: 133.0 Average reward fake: 0.5499955415725708 Average reward real: 0.5503470301628113 Training d_loss: 1.3972 Training g_loss: 0.6074 Training q_loss: 14.7009 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1892 Total reward: 190.0 Average reward fake: 0.29541319608688354 Average reward real: 0.5020337104797363 Training d_loss: 1.1006 Training g_loss: 5.9836 Training q_loss: 18.8576 Ex

-------------------------------------------------------------------------------
Episode: 1913 Total reward: 130.0 Average reward fake: 0.4739936292171478 Average reward real: 0.4985281527042389 Training d_loss: 1.3494 Training g_loss: 0.7472 Training q_loss: 64.3770 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1914 Total reward: 148.0 Average reward fake: 0.42832151055336 Average reward real: 0.5274075865745544 Training d_loss: 1.2402 Training g_loss: 1.3934 Training q_loss: 69.3795 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1915 Total reward: 120.0 Average reward fake: 0.5421221852302551 Average reward real: 0.6115821003913879 Training d_loss: 1.3277 Training g_loss: 0.6402 Training q_loss: 5216.4619 Exp

-------------------------------------------------------------------------------
Episode: 1936 Total reward: 140.0 Average reward fake: 0.3918817937374115 Average reward real: 0.45280665159225464 Training d_loss: 1.3445 Training g_loss: 1.9267 Training q_loss: 116.5945 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1937 Total reward: 138.0 Average reward fake: 0.4574356973171234 Average reward real: 0.5029247999191284 Training d_loss: 1.3393 Training g_loss: 1.2332 Training q_loss: 2649.7820 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1938 Total reward: 146.0 Average reward fake: 0.4099009931087494 Average reward real: 0.5105112195014954 Training d_loss: 1.2513 Training g_loss: 2.7391 Training q_loss: 22.3743

-------------------------------------------------------------------------------
Episode: 1959 Total reward: 15.0 Average reward fake: 0.22743971645832062 Average reward real: 0.46526771783828735 Training d_loss: 1.0877 Training g_loss: 5.3462 Training q_loss: 494.9361 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1960 Total reward: 141.0 Average reward fake: 0.5411649942398071 Average reward real: 0.5411651730537415 Training d_loss: 1.3947 Training g_loss: 0.5950 Training q_loss: 6.0249 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1961 Total reward: 157.0 Average reward fake: 0.5708208680152893 Average reward real: 0.5626407861709595 Training d_loss: 1.4415 Training g_loss: 0.5954 Training q_loss: 47.1941 Ex

-------------------------------------------------------------------------------
Episode: 1982 Total reward: 136.0 Average reward fake: 0.565522313117981 Average reward real: 0.5722638964653015 Training d_loss: 1.3964 Training g_loss: 0.5911 Training q_loss: 1573.4821 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1983 Total reward: 130.0 Average reward fake: 0.5406445264816284 Average reward real: 0.5416920781135559 Training d_loss: 1.4082 Training g_loss: 0.6203 Training q_loss: 5.9207 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 1984 Total reward: 130.0 Average reward fake: 0.3627474904060364 Average reward real: 0.5005055665969849 Training d_loss: 1.2062 Training g_loss: 2.9167 Training q_loss: 26.9922 Exp

-------------------------------------------------------------------------------
Episode: 2005 Total reward: 150.0 Average reward fake: 0.4469723701477051 Average reward real: 0.5855755805969238 Training d_loss: 1.1895 Training g_loss: 2.7992 Training q_loss: 63.5032 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2006 Total reward: 122.0 Average reward fake: 0.5602672696113586 Average reward real: 0.5813889503479004 Training d_loss: 1.3686 Training g_loss: 0.5883 Training q_loss: 26.8510 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2007 Total reward: 124.0 Average reward fake: 0.43797463178634644 Average reward real: 0.5594358444213867 Training d_loss: 1.2173 Training g_loss: 4.5391 Training q_loss: 105.1331 E

-------------------------------------------------------------------------------
Episode: 2028 Total reward: 135.0 Average reward fake: 0.3211236596107483 Average reward real: 0.5276048183441162 Training d_loss: 1.0952 Training g_loss: 5.4341 Training q_loss: 31.5925 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2029 Total reward: 133.0 Average reward fake: 0.42356353998184204 Average reward real: 0.5435510873794556 Training d_loss: 1.2263 Training g_loss: 2.8543 Training q_loss: 82.4300 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2030 Total reward: 151.0 Average reward fake: 0.39430171251296997 Average reward real: 0.5713688135147095 Training d_loss: 1.1307 Training g_loss: 3.4055 Training q_loss: 59.9212 E

-------------------------------------------------------------------------------
Episode: 2051 Total reward: 150.0 Average reward fake: 0.44970959424972534 Average reward real: 0.5069707036018372 Training d_loss: 1.3093 Training g_loss: 2.9123 Training q_loss: 6.2032 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2052 Total reward: 158.0 Average reward fake: 0.5606015920639038 Average reward real: 0.5640792846679688 Training d_loss: 1.3961 Training g_loss: 0.5900 Training q_loss: 4.3670 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2053 Total reward: 140.0 Average reward fake: 0.4583517909049988 Average reward real: 0.5622437000274658 Training d_loss: 1.2501 Training g_loss: 2.9145 Training q_loss: 76.4794 Expl

-------------------------------------------------------------------------------
Episode: 2074 Total reward: 151.0 Average reward fake: 0.45249781012535095 Average reward real: 0.5749477744102478 Training d_loss: 1.2223 Training g_loss: 4.4952 Training q_loss: 25.8836 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2075 Total reward: 116.0 Average reward fake: 0.4230380952358246 Average reward real: 0.5409854054450989 Training d_loss: 1.2248 Training g_loss: 4.7395 Training q_loss: 30.1199 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2076 Total reward: 157.0 Average reward fake: 0.390669047832489 Average reward real: 0.5843188762664795 Training d_loss: 1.1283 Training g_loss: 6.3167 Training q_loss: 22.1004 Exp

-------------------------------------------------------------------------------
Episode: 2097 Total reward: 189.0 Average reward fake: 0.4874495565891266 Average reward real: 0.5428494215011597 Training d_loss: 1.3130 Training g_loss: 2.7720 Training q_loss: 9.2431 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2098 Total reward: 140.0 Average reward fake: 0.4713045656681061 Average reward real: 0.5241335034370422 Training d_loss: 1.3135 Training g_loss: 3.3022 Training q_loss: 12.5984 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2099 Total reward: 192.0 Average reward fake: 0.47684144973754883 Average reward real: 0.5361538529396057 Training d_loss: 1.2989 Training g_loss: 0.8298 Training q_loss: 17.2804 Exp

-------------------------------------------------------------------------------
Episode: 2120 Total reward: 153.0 Average reward fake: 0.4782448410987854 Average reward real: 0.5209789872169495 Training d_loss: 1.3213 Training g_loss: 0.8370 Training q_loss: 12.4667 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2121 Total reward: 199.0 Average reward fake: 0.4984578490257263 Average reward real: 0.49957913160324097 Training d_loss: 1.3850 Training g_loss: 0.7005 Training q_loss: 30.2976 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2122 Total reward: 173.0 Average reward fake: 0.5210673213005066 Average reward real: 0.5763821005821228 Training d_loss: 1.3317 Training g_loss: 3.5269 Training q_loss: 5.5962 Exp

-------------------------------------------------------------------------------
Episode: 2143 Total reward: 161.0 Average reward fake: 0.39650753140449524 Average reward real: 0.5056045651435852 Training d_loss: 1.2313 Training g_loss: 3.4286 Training q_loss: 8.0304 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2144 Total reward: 184.0 Average reward fake: 0.5327368974685669 Average reward real: 0.5328412055969238 Training d_loss: 1.3924 Training g_loss: 0.6357 Training q_loss: 7.7134 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2145 Total reward: 169.0 Average reward fake: 0.5039133429527283 Average reward real: 0.5117405652999878 Training d_loss: 1.3754 Training g_loss: 0.6883 Training q_loss: 19.3798 Expl

-------------------------------------------------------------------------------
Episode: 2166 Total reward: 136.0 Average reward fake: 0.4893629550933838 Average reward real: 0.5407912731170654 Training d_loss: 1.3312 Training g_loss: 3.7479 Training q_loss: 9.0189 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2167 Total reward: 162.0 Average reward fake: 0.47516489028930664 Average reward real: 0.5286473035812378 Training d_loss: 1.3131 Training g_loss: 1.4544 Training q_loss: 6.4707 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2168 Total reward: 159.0 Average reward fake: 0.39725741744041443 Average reward real: 0.5053485035896301 Training d_loss: 1.2378 Training g_loss: 6.0089 Training q_loss: 4.5889 Expl

-------------------------------------------------------------------------------
Episode: 2189 Total reward: 167.0 Average reward fake: 0.3730963170528412 Average reward real: 0.5281680822372437 Training d_loss: 1.1713 Training g_loss: 7.5145 Training q_loss: 89.1673 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2190 Total reward: 184.0 Average reward fake: 0.4108700752258301 Average reward real: 0.5070797204971313 Training d_loss: 1.2621 Training g_loss: 3.5315 Training q_loss: 60.5887 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2191 Total reward: 172.0 Average reward fake: 0.35752469301223755 Average reward real: 0.5085422396659851 Training d_loss: 1.1815 Training g_loss: 6.1725 Training q_loss: 16.3666 Ex

-------------------------------------------------------------------------------
Episode: 2212 Total reward: 192.0 Average reward fake: 0.488590806722641 Average reward real: 0.5545212030410767 Training d_loss: 1.2993 Training g_loss: 2.5706 Training q_loss: 29.5051 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2213 Total reward: 199.0 Average reward fake: 0.45077061653137207 Average reward real: 0.5584815144538879 Training d_loss: 1.2472 Training g_loss: 4.4135 Training q_loss: 3.2797 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2214 Total reward: 190.0 Average reward fake: 0.41019517183303833 Average reward real: 0.4493195414543152 Training d_loss: 1.3682 Training g_loss: 4.1581 Training q_loss: 23.7793 Exp

-------------------------------------------------------------------------------
Episode: 2235 Total reward: 184.0 Average reward fake: 0.33258718252182007 Average reward real: 0.47439664602279663 Training d_loss: 1.2040 Training g_loss: 6.7740 Training q_loss: 111.7815 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2236 Total reward: 199.0 Average reward fake: 0.461151659488678 Average reward real: 0.5866755247116089 Training d_loss: 1.2222 Training g_loss: 5.4026 Training q_loss: 18.0716 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2237 Total reward: 199.0 Average reward fake: 0.5517438650131226 Average reward real: 0.5520204305648804 Training d_loss: 1.3967 Training g_loss: 0.5917 Training q_loss: 40.1121 E

-------------------------------------------------------------------------------
Episode: 2258 Total reward: 199.0 Average reward fake: 0.5657106637954712 Average reward real: 0.5660408735275269 Training d_loss: 1.4099 Training g_loss: 0.5758 Training q_loss: 11.7025 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2259 Total reward: 199.0 Average reward fake: 0.29476842284202576 Average reward real: 0.4415695071220398 Training d_loss: 1.2320 Training g_loss: 4.0599 Training q_loss: 44.2887 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2260 Total reward: 199.0 Average reward fake: 0.5398627519607544 Average reward real: 0.5399410724639893 Training d_loss: 1.3932 Training g_loss: 0.6356 Training q_loss: 30.9153 Ex

-------------------------------------------------------------------------------
Episode: 2281 Total reward: 191.0 Average reward fake: 0.37517470121383667 Average reward real: 0.5900164842605591 Training d_loss: 1.0864 Training g_loss: 7.0856 Training q_loss: 38.3997 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2282 Total reward: 116.0 Average reward fake: 0.42311960458755493 Average reward real: 0.5798717737197876 Training d_loss: 1.1609 Training g_loss: 6.0874 Training q_loss: 18.4196 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2283 Total reward: 199.0 Average reward fake: 0.5111573934555054 Average reward real: 0.5011434555053711 Training d_loss: 1.4144 Training g_loss: 0.6366 Training q_loss: 46.9466 E

-------------------------------------------------------------------------------
Episode: 2304 Total reward: 199.0 Average reward fake: 0.5172762870788574 Average reward real: 0.5173077583312988 Training d_loss: 1.3958 Training g_loss: 0.6679 Training q_loss: 3.6162 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2305 Total reward: 199.0 Average reward fake: 0.43287330865859985 Average reward real: 0.4876459538936615 Training d_loss: 1.3097 Training g_loss: 4.6849 Training q_loss: 20.9078 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2306 Total reward: 159.0 Average reward fake: 0.4889759421348572 Average reward real: 0.5442438125610352 Training d_loss: 1.3138 Training g_loss: 4.4759 Training q_loss: 24.2353 Exp

-------------------------------------------------------------------------------
Episode: 2327 Total reward: 199.0 Average reward fake: 0.4716639518737793 Average reward real: 0.5289571285247803 Training d_loss: 1.3048 Training g_loss: 1.1412 Training q_loss: 7.5526 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2328 Total reward: 161.0 Average reward fake: 0.4697260856628418 Average reward real: 0.5267413854598999 Training d_loss: 1.3121 Training g_loss: 4.7318 Training q_loss: 14.7792 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2329 Total reward: 169.0 Average reward fake: 0.5337440371513367 Average reward real: 0.5428370237350464 Training d_loss: 1.3772 Training g_loss: 0.6391 Training q_loss: 40.6643 Expl

-------------------------------------------------------------------------------
Episode: 2350 Total reward: 199.0 Average reward fake: 0.4577743113040924 Average reward real: 0.5023900270462036 Training d_loss: 1.3236 Training g_loss: 0.9558 Training q_loss: 46.6183 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2351 Total reward: 199.0 Average reward fake: 0.4995017945766449 Average reward real: 0.5137590169906616 Training d_loss: 1.3640 Training g_loss: 0.7168 Training q_loss: 28.4053 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2352 Total reward: 199.0 Average reward fake: 0.5065504312515259 Average reward real: 0.5347422361373901 Training d_loss: 1.3461 Training g_loss: 0.7464 Training q_loss: 11.1731 Exp

-------------------------------------------------------------------------------
Episode: 2373 Total reward: 170.0 Average reward fake: 0.5093172192573547 Average reward real: 0.5094026923179626 Training d_loss: 1.3866 Training g_loss: 0.6640 Training q_loss: 60.2360 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2374 Total reward: 146.0 Average reward fake: 0.5304824113845825 Average reward real: 0.5476828813552856 Training d_loss: 1.3669 Training g_loss: 0.6790 Training q_loss: 15.5315 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2375 Total reward: 144.0 Average reward fake: 0.36027300357818604 Average reward real: 0.5037962198257446 Training d_loss: 1.1761 Training g_loss: 4.8146 Training q_loss: 24.4041 Ex

-------------------------------------------------------------------------------
Episode: 2396 Total reward: 199.0 Average reward fake: 0.5687838196754456 Average reward real: 0.5816316604614258 Training d_loss: 1.3884 Training g_loss: 0.5724 Training q_loss: 54.4558 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2397 Total reward: 199.0 Average reward fake: 0.5468384623527527 Average reward real: 0.5501962900161743 Training d_loss: 1.3936 Training g_loss: 0.6040 Training q_loss: 20.0250 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2398 Total reward: 199.0 Average reward fake: 0.44391974806785583 Average reward real: 0.5666347742080688 Training d_loss: 1.2351 Training g_loss: 3.3954 Training q_loss: 39.1851 Ex

-------------------------------------------------------------------------------
Episode: 2419 Total reward: 199.0 Average reward fake: 0.44048041105270386 Average reward real: 0.5501374006271362 Training d_loss: 1.2375 Training g_loss: 6.7054 Training q_loss: 20.7937 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2420 Total reward: 199.0 Average reward fake: 0.42800840735435486 Average reward real: 0.5427751541137695 Training d_loss: 1.2116 Training g_loss: 5.9040 Training q_loss: 5.1739 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2421 Total reward: 199.0 Average reward fake: 0.5133494734764099 Average reward real: 0.5143860578536987 Training d_loss: 1.3880 Training g_loss: 0.6583 Training q_loss: 16.3747 Ex

-------------------------------------------------------------------------------
Episode: 2442 Total reward: 177.0 Average reward fake: 0.5424100756645203 Average reward real: 0.5439333915710449 Training d_loss: 1.3953 Training g_loss: 0.6112 Training q_loss: 10.3439 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2443 Total reward: 199.0 Average reward fake: 0.46081313490867615 Average reward real: 0.6026076078414917 Training d_loss: 1.2004 Training g_loss: 5.4177 Training q_loss: 10.6168 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2444 Total reward: 161.0 Average reward fake: 0.32981884479522705 Average reward real: 0.5354669690132141 Training d_loss: 1.1076 Training g_loss: 12.7670 Training q_loss: 5.8866 E

-------------------------------------------------------------------------------
Episode: 2465 Total reward: 129.0 Average reward fake: 0.3474074900150299 Average reward real: 0.5029116868972778 Training d_loss: 1.1742 Training g_loss: 10.3418 Training q_loss: 1684.0105 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2466 Total reward: 199.0 Average reward fake: 0.4685676097869873 Average reward real: 0.538622260093689 Training d_loss: 1.2883 Training g_loss: 6.4131 Training q_loss: 11.2037 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2467 Total reward: 199.0 Average reward fake: 0.5163890719413757 Average reward real: 0.5683028697967529 Training d_loss: 1.3311 Training g_loss: 0.9525 Training q_loss: 13.5638 E

-------------------------------------------------------------------------------
Episode: 2488 Total reward: 199.0 Average reward fake: 0.42471200227737427 Average reward real: 0.5257483720779419 Training d_loss: 1.2420 Training g_loss: 6.6448 Training q_loss: 212.3383 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2489 Total reward: 199.0 Average reward fake: 0.36478063464164734 Average reward real: 0.4792179465293884 Training d_loss: 1.2472 Training g_loss: 5.6684 Training q_loss: 33.7262 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2490 Total reward: 199.0 Average reward fake: 0.4403045177459717 Average reward real: 0.5630297660827637 Training d_loss: 1.2173 Training g_loss: 7.7334 Training q_loss: 31.0864 

-------------------------------------------------------------------------------
Episode: 2511 Total reward: 51.0 Average reward fake: 0.3233502507209778 Average reward real: 0.6594581604003906 Training d_loss: 0.9009 Training g_loss: 5.3846 Training q_loss: 885.3234 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2512 Total reward: 10.0 Average reward fake: 0.37504369020462036 Average reward real: 0.5033085346221924 Training d_loss: 1.4671 Training g_loss: 1.0149 Training q_loss: 109.5928 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2513 Total reward: 26.0 Average reward fake: 0.42119598388671875 Average reward real: 0.5394867062568665 Training d_loss: 1.1811 Training g_loss: 0.9303 Training q_loss: 239.3221 E

-------------------------------------------------------------------------------
Episode: 2534 Total reward: 9.0 Average reward fake: 0.3201825022697449 Average reward real: 0.6059659719467163 Training d_loss: 1.1680 Training g_loss: 7.8985 Training q_loss: 65.3391 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2535 Total reward: 27.0 Average reward fake: 0.4560033679008484 Average reward real: 0.5756610631942749 Training d_loss: 1.1761 Training g_loss: 0.8148 Training q_loss: 532.5746 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2536 Total reward: 11.0 Average reward fake: 0.47666066884994507 Average reward real: 0.6435960531234741 Training d_loss: 1.2979 Training g_loss: 4.2552 Training q_loss: 79.0596 Explo

-------------------------------------------------------------------------------
Episode: 2557 Total reward: 9.0 Average reward fake: 0.30207523703575134 Average reward real: 0.6527377963066101 Training d_loss: 0.9621 Training g_loss: 16.0637 Training q_loss: 2092.0142 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2558 Total reward: 11.0 Average reward fake: 0.43464452028274536 Average reward real: 0.6073979735374451 Training d_loss: 1.2322 Training g_loss: 1.3879 Training q_loss: 78.3966 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2559 Total reward: 46.0 Average reward fake: 0.5174279808998108 Average reward real: 0.6246005296707153 Training d_loss: 1.3475 Training g_loss: 5.8767 Training q_loss: 64.7711 Ex

-------------------------------------------------------------------------------
Episode: 2580 Total reward: 48.0 Average reward fake: 0.389112263917923 Average reward real: 0.5661994814872742 Training d_loss: 1.1508 Training g_loss: 6.3813 Training q_loss: 2375.0469 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2581 Total reward: 68.0 Average reward fake: 0.49105411767959595 Average reward real: 0.5123985409736633 Training d_loss: 1.3842 Training g_loss: 0.7421 Training q_loss: 43.1513 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2582 Total reward: 83.0 Average reward fake: 0.3985908627510071 Average reward real: 0.4982174336910248 Training d_loss: 1.2482 Training g_loss: 5.9499 Training q_loss: 1543.7667 Ex

-------------------------------------------------------------------------------
Episode: 2603 Total reward: 100.0 Average reward fake: 0.44357866048812866 Average reward real: 0.5362070798873901 Training d_loss: 1.2422 Training g_loss: 5.1744 Training q_loss: 900.5539 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2604 Total reward: 65.0 Average reward fake: 0.47464480996131897 Average reward real: 0.5078088045120239 Training d_loss: 1.3488 Training g_loss: 0.7974 Training q_loss: 3491.6289 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2605 Total reward: 83.0 Average reward fake: 0.5663134455680847 Average reward real: 0.5665464997291565 Training d_loss: 1.4244 Training g_loss: 0.5762 Training q_loss: 262.8293

-------------------------------------------------------------------------------
Episode: 2626 Total reward: 132.0 Average reward fake: 0.45134973526000977 Average reward real: 0.5222405791282654 Training d_loss: 1.2895 Training g_loss: 0.8230 Training q_loss: 801.0266 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2627 Total reward: 110.0 Average reward fake: 0.46781620383262634 Average reward real: 0.522809624671936 Training d_loss: 1.3072 Training g_loss: 1.0221 Training q_loss: 887.8469 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2628 Total reward: 117.0 Average reward fake: 0.5661726593971252 Average reward real: 0.572475254535675 Training d_loss: 1.4218 Training g_loss: 0.5841 Training q_loss: 344.3505 

-------------------------------------------------------------------------------
Episode: 2649 Total reward: 115.0 Average reward fake: 0.38878193497657776 Average reward real: 0.5150322914123535 Training d_loss: 1.2001 Training g_loss: 3.1760 Training q_loss: 425.9957 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2650 Total reward: 110.0 Average reward fake: 0.30446168780326843 Average reward real: 0.4962776303291321 Training d_loss: 1.1235 Training g_loss: 14.9836 Training q_loss: 189.3732 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2651 Total reward: 129.0 Average reward fake: 0.3823583722114563 Average reward real: 0.555749773979187 Training d_loss: 1.1421 Training g_loss: 13.9994 Training q_loss: 262.56

-------------------------------------------------------------------------------
Episode: 2672 Total reward: 178.0 Average reward fake: 0.40736645460128784 Average reward real: 0.5503005981445312 Training d_loss: 1.1899 Training g_loss: 6.3962 Training q_loss: 155.2588 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2673 Total reward: 190.0 Average reward fake: 0.5451167821884155 Average reward real: 0.5451194047927856 Training d_loss: 1.4144 Training g_loss: 0.6028 Training q_loss: 62.8229 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2674 Total reward: 185.0 Average reward fake: 0.47393307089805603 Average reward real: 0.5354830622673035 Training d_loss: 1.3024 Training g_loss: 2.0168 Training q_loss: 22.0689 

-------------------------------------------------------------------------------
Episode: 2695 Total reward: 199.0 Average reward fake: 0.5270217657089233 Average reward real: 0.5312334895133972 Training d_loss: 1.3845 Training g_loss: 0.6398 Training q_loss: 27.7094 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2696 Total reward: 195.0 Average reward fake: 0.4699316620826721 Average reward real: 0.5751113891601562 Training d_loss: 1.2575 Training g_loss: 2.0134 Training q_loss: 60.2632 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2697 Total reward: 82.0 Average reward fake: 0.5102354884147644 Average reward real: 0.5947762727737427 Training d_loss: 1.2889 Training g_loss: 0.9793 Training q_loss: 91.5865 Expl

-------------------------------------------------------------------------------
Episode: 2718 Total reward: 199.0 Average reward fake: 0.4722938537597656 Average reward real: 0.5452605485916138 Training d_loss: 1.2760 Training g_loss: 0.8509 Training q_loss: 29.7312 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2719 Total reward: 199.0 Average reward fake: 0.5235366821289062 Average reward real: 0.5951284170150757 Training d_loss: 1.3056 Training g_loss: 1.5531 Training q_loss: 44.1252 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2720 Total reward: 199.0 Average reward fake: 0.4187021851539612 Average reward real: 0.4750896394252777 Training d_loss: 1.3146 Training g_loss: 6.2488 Training q_loss: 41.8582 Exp

-------------------------------------------------------------------------------
Episode: 2741 Total reward: 199.0 Average reward fake: 0.5069402456283569 Average reward real: 0.5703179240226746 Training d_loss: 1.3084 Training g_loss: 2.4199 Training q_loss: 32.2158 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2742 Total reward: 199.0 Average reward fake: 0.4589482247829437 Average reward real: 0.5704008340835571 Training d_loss: 1.2405 Training g_loss: 7.2650 Training q_loss: 41.8197 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2743 Total reward: 125.0 Average reward fake: 0.4632642865180969 Average reward real: 0.5549160242080688 Training d_loss: 1.2575 Training g_loss: 0.9452 Training q_loss: 30.2986 Exp

-------------------------------------------------------------------------------
Episode: 2764 Total reward: 181.0 Average reward fake: 0.42344847321510315 Average reward real: 0.5551108717918396 Training d_loss: 1.2069 Training g_loss: 5.4280 Training q_loss: 19.6989 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2765 Total reward: 117.0 Average reward fake: 0.5176529884338379 Average reward real: 0.590206503868103 Training d_loss: 1.2997 Training g_loss: 2.0318 Training q_loss: 453.0030 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2766 Total reward: 199.0 Average reward fake: 0.34147173166275024 Average reward real: 0.48112908005714417 Training d_loss: 1.1989 Training g_loss: 6.5818 Training q_loss: 21.4624 

-------------------------------------------------------------------------------
Episode: 2787 Total reward: 163.0 Average reward fake: 0.46820059418678284 Average reward real: 0.5267127752304077 Training d_loss: 1.3055 Training g_loss: 7.9917 Training q_loss: 12.0714 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2788 Total reward: 188.0 Average reward fake: 0.42476168274879456 Average reward real: 0.5328348278999329 Training d_loss: 1.2400 Training g_loss: 8.4558 Training q_loss: 14.5462 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2789 Total reward: 199.0 Average reward fake: 0.510823667049408 Average reward real: 0.5670154094696045 Training d_loss: 1.3220 Training g_loss: 5.3093 Training q_loss: 32.4092 Ex

-------------------------------------------------------------------------------
Episode: 2810 Total reward: 125.0 Average reward fake: 0.4609929621219635 Average reward real: 0.5188143849372864 Training d_loss: 1.3136 Training g_loss: 6.7750 Training q_loss: 29.5518 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2811 Total reward: 199.0 Average reward fake: 0.4581308960914612 Average reward real: 0.5122846961021423 Training d_loss: 1.3024 Training g_loss: 0.8994 Training q_loss: 10.4561 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2812 Total reward: 153.0 Average reward fake: 0.4166586995124817 Average reward real: 0.5998974442481995 Training d_loss: 1.1485 Training g_loss: 12.2094 Training q_loss: 10.1480 Ex

-------------------------------------------------------------------------------
Episode: 2833 Total reward: 165.0 Average reward fake: 0.5344166159629822 Average reward real: 0.5344504117965698 Training d_loss: 1.3918 Training g_loss: 0.6404 Training q_loss: 14.2493 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2834 Total reward: 138.0 Average reward fake: 0.467355340719223 Average reward real: 0.531045138835907 Training d_loss: 1.3057 Training g_loss: 1.1361 Training q_loss: 23.8058 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2835 Total reward: 183.0 Average reward fake: 0.5136090517044067 Average reward real: 0.51381915807724 Training d_loss: 1.3872 Training g_loss: 0.6524 Training q_loss: 24.4747 Explore

-------------------------------------------------------------------------------
Episode: 2856 Total reward: 190.0 Average reward fake: 0.519991397857666 Average reward real: 0.5358449220657349 Training d_loss: 1.3752 Training g_loss: 0.7047 Training q_loss: 22.6690 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2857 Total reward: 199.0 Average reward fake: 0.5660731792449951 Average reward real: 0.5706826448440552 Training d_loss: 1.4023 Training g_loss: 0.5690 Training q_loss: 25.4138 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2858 Total reward: 149.0 Average reward fake: 0.46392855048179626 Average reward real: 0.5824946165084839 Training d_loss: 1.2358 Training g_loss: 12.5496 Training q_loss: 9.9673 Exp

-------------------------------------------------------------------------------
Episode: 2879 Total reward: 199.0 Average reward fake: 0.3413757085800171 Average reward real: 0.5319165587425232 Training d_loss: 1.1103 Training g_loss: 1.8443 Training q_loss: 60.0371 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2880 Total reward: 184.0 Average reward fake: 0.5620278120040894 Average reward real: 0.5629509091377258 Training d_loss: 1.4039 Training g_loss: 0.5783 Training q_loss: 25.1241 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2881 Total reward: 199.0 Average reward fake: 0.4593620300292969 Average reward real: 0.5626474618911743 Training d_loss: 1.2487 Training g_loss: 1.0956 Training q_loss: 13.9226 Exp

-------------------------------------------------------------------------------
Episode: 2902 Total reward: 199.0 Average reward fake: 0.4743095338344574 Average reward real: 0.5461486577987671 Training d_loss: 1.2831 Training g_loss: 3.6925 Training q_loss: 58.6255 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2903 Total reward: 199.0 Average reward fake: 0.43574056029319763 Average reward real: 0.4969523549079895 Training d_loss: 1.3100 Training g_loss: 2.1618 Training q_loss: 14.6778 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2904 Total reward: 199.0 Average reward fake: 0.4313758909702301 Average reward real: 0.5811209678649902 Training d_loss: 1.1829 Training g_loss: 10.0327 Training q_loss: 4.6270 Ex

-------------------------------------------------------------------------------
Episode: 2925 Total reward: 199.0 Average reward fake: 0.521040678024292 Average reward real: 0.5213395953178406 Training d_loss: 1.3877 Training g_loss: 0.7054 Training q_loss: 9.7022 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2926 Total reward: 199.0 Average reward fake: 0.2870842516422272 Average reward real: 0.49228420853614807 Training d_loss: 1.0987 Training g_loss: 11.8292 Training q_loss: 41.7086 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2927 Total reward: 199.0 Average reward fake: 0.4184403419494629 Average reward real: 0.5364545583724976 Training d_loss: 1.2212 Training g_loss: 10.6383 Training q_loss: 7.5442 Exp

-------------------------------------------------------------------------------
Episode: 2948 Total reward: 199.0 Average reward fake: 0.46044525504112244 Average reward real: 0.5744910836219788 Training d_loss: 1.2429 Training g_loss: 5.7105 Training q_loss: 12.6733 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2949 Total reward: 199.0 Average reward fake: 0.384552001953125 Average reward real: 0.5441686511039734 Training d_loss: 1.1737 Training g_loss: 15.1992 Training q_loss: 16.5542 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2950 Total reward: 199.0 Average reward fake: 0.4302882254123688 Average reward real: 0.5534113049507141 Training d_loss: 1.2119 Training g_loss: 16.3419 Training q_loss: 11.0697 E

-------------------------------------------------------------------------------
Episode: 2971 Total reward: 199.0 Average reward fake: 0.4399283826351166 Average reward real: 0.5721888542175293 Training d_loss: 1.2024 Training g_loss: 20.0489 Training q_loss: 15.5587 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2972 Total reward: 199.0 Average reward fake: 0.4449027478694916 Average reward real: 0.5702125430107117 Training d_loss: 1.2140 Training g_loss: 21.5165 Training q_loss: 6.5810 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2973 Total reward: 199.0 Average reward fake: 0.4166622757911682 Average reward real: 0.5699241161346436 Training d_loss: 1.1621 Training g_loss: 3.4993 Training q_loss: 19.4840 Ex

-------------------------------------------------------------------------------
Episode: 2994 Total reward: 71.0 Average reward fake: 0.46298885345458984 Average reward real: 0.5169153213500977 Training d_loss: 1.3102 Training g_loss: 11.5228 Training q_loss: 35.1361 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2995 Total reward: 199.0 Average reward fake: 0.36948129534721375 Average reward real: 0.5355855822563171 Training d_loss: 1.1503 Training g_loss: 23.2269 Training q_loss: 189.7744 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 2996 Total reward: 154.0 Average reward fake: 0.48277974128723145 Average reward real: 0.5641894340515137 Training d_loss: 1.2721 Training g_loss: 1.3249 Training q_loss: 193.61

-------------------------------------------------------------------------------
Episode: 3017 Total reward: 19.0 Average reward fake: 0.32164639234542847 Average reward real: 0.5885474681854248 Training d_loss: 1.0731 Training g_loss: 1.1677 Training q_loss: 8028.1826 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3018 Total reward: 14.0 Average reward fake: 0.34173935651779175 Average reward real: 0.4710676670074463 Training d_loss: 1.2795 Training g_loss: 1.0387 Training q_loss: 9872.2461 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3019 Total reward: 13.0 Average reward fake: 0.36183589696884155 Average reward real: 0.6755236387252808 Training d_loss: 0.9629 Training g_loss: 1.0160 Training q_loss: 14164.9

-------------------------------------------------------------------------------
Episode: 3041 Total reward: 14.0 Average reward fake: 0.3366391360759735 Average reward real: 0.6957966089248657 Training d_loss: 0.8861 Training g_loss: 1.1003 Training q_loss: 15054.1592 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3042 Total reward: 17.0 Average reward fake: 0.3195144534111023 Average reward real: 0.5714604258537292 Training d_loss: 1.0903 Training g_loss: 1.1516 Training q_loss: 5993.1421 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3043 Total reward: 17.0 Average reward fake: 0.34296727180480957 Average reward real: 0.5213740468025208 Training d_loss: 1.1884 Training g_loss: 1.0544 Training q_loss: 5136.337

-------------------------------------------------------------------------------
Episode: 3064 Total reward: 21.0 Average reward fake: 0.45306673645973206 Average reward real: 0.5269543528556824 Training d_loss: 1.2978 Training g_loss: 0.8474 Training q_loss: 1037.5817 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3065 Total reward: 28.0 Average reward fake: 0.5201354622840881 Average reward real: 0.5201355218887329 Training d_loss: 1.4191 Training g_loss: 0.6749 Training q_loss: 1012.7416 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3066 Total reward: 40.0 Average reward fake: 0.3515518307685852 Average reward real: 0.5218005180358887 Training d_loss: 1.1427 Training g_loss: 26.1974 Training q_loss: 2344.531

-------------------------------------------------------------------------------
Episode: 3087 Total reward: 151.0 Average reward fake: 0.527511477470398 Average reward real: 0.5542024374008179 Training d_loss: 1.3544 Training g_loss: 0.6388 Training q_loss: 1179.8817 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3088 Total reward: 132.0 Average reward fake: 0.4910784363746643 Average reward real: 0.48839688301086426 Training d_loss: 1.3928 Training g_loss: 0.7172 Training q_loss: 2740.6511 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3089 Total reward: 199.0 Average reward fake: 0.46465787291526794 Average reward real: 0.522588849067688 Training d_loss: 1.3031 Training g_loss: 9.0535 Training q_loss: 3014.49

-------------------------------------------------------------------------------
Episode: 3110 Total reward: 199.0 Average reward fake: 0.48724764585494995 Average reward real: 0.5246716737747192 Training d_loss: 1.3329 Training g_loss: 0.7343 Training q_loss: 1643.6152 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3111 Total reward: 158.0 Average reward fake: 0.4851958155632019 Average reward real: 0.5198181867599487 Training d_loss: 1.3289 Training g_loss: 0.7434 Training q_loss: 403.5362 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3112 Total reward: 199.0 Average reward fake: 0.3575788140296936 Average reward real: 0.49473077058792114 Training d_loss: 1.2106 Training g_loss: 5.4770 Training q_loss: 940.52

-------------------------------------------------------------------------------
Episode: 3133 Total reward: 199.0 Average reward fake: 0.45833906531333923 Average reward real: 0.539811909198761 Training d_loss: 1.2958 Training g_loss: 8.9264 Training q_loss: 212.2897 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3134 Total reward: 199.0 Average reward fake: 0.41910067200660706 Average reward real: 0.6036694645881653 Training d_loss: 1.1588 Training g_loss: 2.0628 Training q_loss: 268.3026 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3135 Total reward: 199.0 Average reward fake: 0.35155290365219116 Average reward real: 0.5812610387802124 Training d_loss: 1.0481 Training g_loss: 10.1026 Training q_loss: 223.87

-------------------------------------------------------------------------------
Episode: 3156 Total reward: 199.0 Average reward fake: 0.48446089029312134 Average reward real: 0.5907869338989258 Training d_loss: 1.3098 Training g_loss: 8.7532 Training q_loss: 496.5477 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3157 Total reward: 199.0 Average reward fake: 0.37266743183135986 Average reward real: 0.5051614046096802 Training d_loss: 1.3046 Training g_loss: 5.9026 Training q_loss: 113.3928 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3158 Total reward: 199.0 Average reward fake: 0.4591611921787262 Average reward real: 0.48411646485328674 Training d_loss: 1.3608 Training g_loss: 0.8063 Training q_loss: 85.664

-------------------------------------------------------------------------------
Episode: 3179 Total reward: 199.0 Average reward fake: 0.32936084270477295 Average reward real: 0.5108615159988403 Training d_loss: 1.1257 Training g_loss: 17.8590 Training q_loss: 41.7807 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3180 Total reward: 199.0 Average reward fake: 0.4483146667480469 Average reward real: 0.5952070355415344 Training d_loss: 1.1734 Training g_loss: 1.2005 Training q_loss: 81.9763 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3181 Total reward: 175.0 Average reward fake: 0.4091005325317383 Average reward real: 0.5055252909660339 Training d_loss: 1.2974 Training g_loss: 1.3791 Training q_loss: 66.4183 E

-------------------------------------------------------------------------------
Episode: 3202 Total reward: 199.0 Average reward fake: 0.3080597221851349 Average reward real: 0.4950772225856781 Training d_loss: 1.1190 Training g_loss: 25.7914 Training q_loss: 31.4226 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3203 Total reward: 199.0 Average reward fake: 0.34280505776405334 Average reward real: 0.5713182687759399 Training d_loss: 1.0703 Training g_loss: 16.1413 Training q_loss: 91.1536 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3204 Total reward: 199.0 Average reward fake: 0.34364861249923706 Average reward real: 0.5452484488487244 Training d_loss: 1.1254 Training g_loss: 26.1877 Training q_loss: 40.686

-------------------------------------------------------------------------------
Episode: 3225 Total reward: 105.0 Average reward fake: 0.458055317401886 Average reward real: 0.5216398239135742 Training d_loss: 1.3219 Training g_loss: 0.8001 Training q_loss: 74.8494 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3226 Total reward: 107.0 Average reward fake: 0.5558406710624695 Average reward real: 0.6362358927726746 Training d_loss: 1.3353 Training g_loss: 0.6185 Training q_loss: 157.7955 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3227 Total reward: 102.0 Average reward fake: 0.354805052280426 Average reward real: 0.4539501667022705 Training d_loss: 1.2686 Training g_loss: 16.6657 Training q_loss: 51.2870 Exp

-------------------------------------------------------------------------------
Episode: 3248 Total reward: 34.0 Average reward fake: 0.3982706069946289 Average reward real: 0.520139753818512 Training d_loss: 1.2733 Training g_loss: 1.0299 Training q_loss: 41.0963 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3249 Total reward: 48.0 Average reward fake: 0.3396853804588318 Average reward real: 0.5281142592430115 Training d_loss: 1.1111 Training g_loss: 11.1793 Training q_loss: 86.4272 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3250 Total reward: 16.0 Average reward fake: 0.30458325147628784 Average reward real: 0.5416210889816284 Training d_loss: 1.0512 Training g_loss: 8.4707 Training q_loss: 139.1049 Expl

-------------------------------------------------------------------------------
Episode: 3271 Total reward: 40.0 Average reward fake: 0.3790854811668396 Average reward real: 0.5467556715011597 Training d_loss: 1.1339 Training g_loss: 1.1904 Training q_loss: 8111.0117 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3272 Total reward: 38.0 Average reward fake: 0.44678130745887756 Average reward real: 0.5810778141021729 Training d_loss: 1.2346 Training g_loss: 7.5903 Training q_loss: 26721.0000 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3273 Total reward: 45.0 Average reward fake: 0.33643588423728943 Average reward real: 0.6847032308578491 Training d_loss: 0.8729 Training g_loss: 6.6704 Training q_loss: 250.221

-------------------------------------------------------------------------------
Episode: 3294 Total reward: 162.0 Average reward fake: 0.45249873399734497 Average reward real: 0.4902917742729187 Training d_loss: 1.3411 Training g_loss: 0.9740 Training q_loss: 489.3270 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3295 Total reward: 170.0 Average reward fake: 0.48508110642433167 Average reward real: 0.5195678472518921 Training d_loss: 1.3327 Training g_loss: 0.7768 Training q_loss: 231.7520 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3296 Total reward: 199.0 Average reward fake: 0.540342390537262 Average reward real: 0.517086386680603 Training d_loss: 1.4432 Training g_loss: 0.6145 Training q_loss: 199.4764 

-------------------------------------------------------------------------------
Episode: 3317 Total reward: 163.0 Average reward fake: 0.4707600474357605 Average reward real: 0.5520325899124146 Training d_loss: 1.2831 Training g_loss: 7.7216 Training q_loss: 972.0829 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3318 Total reward: 151.0 Average reward fake: 0.5003244876861572 Average reward real: 0.5619572401046753 Training d_loss: 1.3166 Training g_loss: 3.2850 Training q_loss: 256.5147 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3319 Total reward: 155.0 Average reward fake: 0.437995582818985 Average reward real: 0.49862080812454224 Training d_loss: 1.2996 Training g_loss: 8.0379 Training q_loss: 189.3888 

-------------------------------------------------------------------------------
Episode: 3340 Total reward: 136.0 Average reward fake: 0.41004300117492676 Average reward real: 0.5076457858085632 Training d_loss: 1.2434 Training g_loss: 8.6757 Training q_loss: 60.3036 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3341 Total reward: 151.0 Average reward fake: 0.5428556799888611 Average reward real: 0.5576754808425903 Training d_loss: 1.3698 Training g_loss: 0.6011 Training q_loss: 103.7217 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3342 Total reward: 161.0 Average reward fake: 0.369511216878891 Average reward real: 0.5809370279312134 Training d_loss: 1.0824 Training g_loss: 10.7659 Training q_loss: 91.8743 E

-------------------------------------------------------------------------------
Episode: 3363 Total reward: 164.0 Average reward fake: 0.3963385224342346 Average reward real: 0.5955003499984741 Training d_loss: 1.1222 Training g_loss: 11.8675 Training q_loss: 18.1940 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3364 Total reward: 165.0 Average reward fake: 0.4129944443702698 Average reward real: 0.592459499835968 Training d_loss: 1.1645 Training g_loss: 8.5934 Training q_loss: 46.5023 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3365 Total reward: 199.0 Average reward fake: 0.5726445913314819 Average reward real: 0.5792611837387085 Training d_loss: 1.3973 Training g_loss: 0.5554 Training q_loss: 70.8052 Exp

-------------------------------------------------------------------------------
Episode: 3386 Total reward: 129.0 Average reward fake: 0.37242376804351807 Average reward real: 0.5282412767410278 Training d_loss: 1.1856 Training g_loss: 21.3232 Training q_loss: 83.5603 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3387 Total reward: 111.0 Average reward fake: 0.40596145391464233 Average reward real: 0.5647903680801392 Training d_loss: 1.1666 Training g_loss: 15.0008 Training q_loss: 30.6493 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3388 Total reward: 106.0 Average reward fake: 0.547775149345398 Average reward real: 0.5599795579910278 Training d_loss: 1.3889 Training g_loss: 0.6473 Training q_loss: 201.8792

-------------------------------------------------------------------------------
Episode: 3409 Total reward: 199.0 Average reward fake: 0.3960488438606262 Average reward real: 0.5155889391899109 Training d_loss: 1.2144 Training g_loss: 10.2032 Training q_loss: 163.6859 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3410 Total reward: 199.0 Average reward fake: 0.4486088752746582 Average reward real: 0.5649572610855103 Training d_loss: 1.2326 Training g_loss: 8.5582 Training q_loss: 78.2744 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3411 Total reward: 199.0 Average reward fake: 0.5480791926383972 Average reward real: 0.5426853895187378 Training d_loss: 1.4058 Training g_loss: 0.5969 Training q_loss: 115.4151 

-------------------------------------------------------------------------------
Episode: 3432 Total reward: 110.0 Average reward fake: 0.47345924377441406 Average reward real: 0.5318964719772339 Training d_loss: 1.3148 Training g_loss: 0.8483 Training q_loss: 72.6526 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3433 Total reward: 121.0 Average reward fake: 0.36344021558761597 Average reward real: 0.578161358833313 Training d_loss: 1.0670 Training g_loss: 21.3189 Training q_loss: 156.0114 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3434 Total reward: 124.0 Average reward fake: 0.30192479491233826 Average reward real: 0.5352510213851929 Training d_loss: 1.0550 Training g_loss: 25.8340 Training q_loss: 117.10

-------------------------------------------------------------------------------
Episode: 3455 Total reward: 199.0 Average reward fake: 0.3026182949542999 Average reward real: 0.5943247079849243 Training d_loss: 0.9798 Training g_loss: 5.0059 Training q_loss: 89.3323 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3456 Total reward: 188.0 Average reward fake: 0.47914284467697144 Average reward real: 0.5970943570137024 Training d_loss: 1.2909 Training g_loss: 9.1958 Training q_loss: 74.0950 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3457 Total reward: 199.0 Average reward fake: 0.4134082794189453 Average reward real: 0.5024830102920532 Training d_loss: 1.2537 Training g_loss: 1.0113 Training q_loss: 391.4105 E

-------------------------------------------------------------------------------
Episode: 3478 Total reward: 50.0 Average reward fake: 0.34804949164390564 Average reward real: 0.6103118658065796 Training d_loss: 1.0100 Training g_loss: 17.4171 Training q_loss: 56.9421 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3479 Total reward: 162.0 Average reward fake: 0.45359963178634644 Average reward real: 0.5676653981208801 Training d_loss: 1.2470 Training g_loss: 16.2969 Training q_loss: 51.2048 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3480 Total reward: 128.0 Average reward fake: 0.31518182158470154 Average reward real: 0.5174539685249329 Training d_loss: 1.0998 Training g_loss: 17.0313 Training q_loss: 72.441

-------------------------------------------------------------------------------
Episode: 3501 Total reward: 139.0 Average reward fake: 0.5540553331375122 Average reward real: 0.5801551938056946 Training d_loss: 1.3665 Training g_loss: 0.6045 Training q_loss: 1011.5551 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3502 Total reward: 55.0 Average reward fake: 0.4578213095664978 Average reward real: 0.5699364542961121 Training d_loss: 1.2346 Training g_loss: 1.7377 Training q_loss: 5342.0771 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3503 Total reward: 38.0 Average reward fake: 0.5171371102333069 Average reward real: 0.6071559190750122 Training d_loss: 1.2697 Training g_loss: 0.6662 Training q_loss: 255.8258 

-------------------------------------------------------------------------------
Episode: 3524 Total reward: 199.0 Average reward fake: 0.4411400854587555 Average reward real: 0.5283923149108887 Training d_loss: 1.3069 Training g_loss: 5.9349 Training q_loss: 2606.6392 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3525 Total reward: 199.0 Average reward fake: 0.43713030219078064 Average reward real: 0.5518863201141357 Training d_loss: 1.2602 Training g_loss: 0.8457 Training q_loss: 590.5063 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3526 Total reward: 199.0 Average reward fake: 0.41908520460128784 Average reward real: 0.6404017210006714 Training d_loss: 1.1063 Training g_loss: 0.9002 Training q_loss: 1727.0

-------------------------------------------------------------------------------
Episode: 3547 Total reward: 199.0 Average reward fake: 0.49597182869911194 Average reward real: 0.5010894536972046 Training d_loss: 1.3774 Training g_loss: 0.7016 Training q_loss: 6109.8013 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3548 Total reward: 199.0 Average reward fake: 0.48039817810058594 Average reward real: 0.4975529611110687 Training d_loss: 1.3595 Training g_loss: 0.7422 Training q_loss: 2427.7539 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3549 Total reward: 111.0 Average reward fake: 0.5034735798835754 Average reward real: 0.5400208234786987 Training d_loss: 1.4469 Training g_loss: 0.7540 Training q_loss: 40351

-------------------------------------------------------------------------------
Episode: 3570 Total reward: 199.0 Average reward fake: 0.38508933782577515 Average reward real: 0.6670472621917725 Training d_loss: 0.9620 Training g_loss: 0.9659 Training q_loss: 3609.0293 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3571 Total reward: 20.0 Average reward fake: 0.44175633788108826 Average reward real: 0.7746026515960693 Training d_loss: 1.0733 Training g_loss: 0.9031 Training q_loss: 1976.8336 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3572 Total reward: 147.0 Average reward fake: 0.35898444056510925 Average reward real: 0.4996749460697174 Training d_loss: 1.2185 Training g_loss: 1.0223 Training q_loss: 581.5

-------------------------------------------------------------------------------
Episode: 3594 Total reward: 15.0 Average reward fake: 0.34764862060546875 Average reward real: 0.6216732263565063 Training d_loss: 1.0105 Training g_loss: 1.0522 Training q_loss: 1759.7816 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3595 Total reward: 17.0 Average reward fake: 0.3980053961277008 Average reward real: 0.5951972007751465 Training d_loss: 1.3180 Training g_loss: 8.1008 Training q_loss: 992295.3125 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3596 Total reward: 18.0 Average reward fake: 0.3621406853199005 Average reward real: 0.6411768794059753 Training d_loss: 0.9698 Training g_loss: 1.0142 Training q_loss: 2946.51

-------------------------------------------------------------------------------
Episode: 3617 Total reward: 199.0 Average reward fake: 0.5135592222213745 Average reward real: 0.5961966514587402 Training d_loss: 1.3077 Training g_loss: 0.7029 Training q_loss: 1488.6869 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3618 Total reward: 199.0 Average reward fake: 0.4779898524284363 Average reward real: 0.5166733264923096 Training d_loss: 1.3440 Training g_loss: 0.7534 Training q_loss: 1698.2556 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3619 Total reward: 199.0 Average reward fake: 0.46828046441078186 Average reward real: 0.5091889500617981 Training d_loss: 1.3413 Training g_loss: 7.5457 Training q_loss: 2065.1

-------------------------------------------------------------------------------
Episode: 3640 Total reward: 199.0 Average reward fake: 0.4497576653957367 Average reward real: 0.5084553956985474 Training d_loss: 1.3148 Training g_loss: 5.8039 Training q_loss: 1016.9861 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3641 Total reward: 199.0 Average reward fake: 0.4666743278503418 Average reward real: 0.49437135457992554 Training d_loss: 1.4036 Training g_loss: 0.7883 Training q_loss: 2356.1414 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3642 Total reward: 199.0 Average reward fake: 0.4699557423591614 Average reward real: 0.5736554861068726 Training d_loss: 1.2558 Training g_loss: 7.4271 Training q_loss: 808.60

-------------------------------------------------------------------------------
Episode: 3663 Total reward: 199.0 Average reward fake: 0.3296256959438324 Average reward real: 0.5142412781715393 Training d_loss: 1.1356 Training g_loss: 7.4233 Training q_loss: 1000.4811 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3664 Total reward: 164.0 Average reward fake: 0.2950749099254608 Average reward real: 0.7270950078964233 Training d_loss: 0.7326 Training g_loss: 5.7950 Training q_loss: 316.3749 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3665 Total reward: 170.0 Average reward fake: 0.36302101612091064 Average reward real: 0.4559924006462097 Training d_loss: 1.2961 Training g_loss: 0.9993 Training q_loss: 548.361

-------------------------------------------------------------------------------
Episode: 3686 Total reward: 199.0 Average reward fake: 0.39169153571128845 Average reward real: 0.433053582906723 Training d_loss: 1.3654 Training g_loss: 5.5326 Training q_loss: 113.9351 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3687 Total reward: 199.0 Average reward fake: 0.4597128927707672 Average reward real: 0.48021167516708374 Training d_loss: 1.3576 Training g_loss: 0.8581 Training q_loss: 341.9144 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3688 Total reward: 199.0 Average reward fake: 0.4507690370082855 Average reward real: 0.5524309873580933 Training d_loss: 1.2522 Training g_loss: 3.1928 Training q_loss: 359.9606

-------------------------------------------------------------------------------
Episode: 3709 Total reward: 186.0 Average reward fake: 0.33460140228271484 Average reward real: 0.6906417608261108 Training d_loss: 0.8901 Training g_loss: 1.1026 Training q_loss: 1598.0178 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3710 Total reward: 110.0 Average reward fake: 0.38337746262550354 Average reward real: 0.9746180772781372 Training d_loss: 0.7371 Training g_loss: 1.0478 Training q_loss: 3653.6851 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3711 Total reward: 25.0 Average reward fake: 0.36909082531929016 Average reward real: 0.6725603342056274 Training d_loss: 0.9711 Training g_loss: 0.9992 Training q_loss: 10537

-------------------------------------------------------------------------------
Episode: 3732 Total reward: 15.0 Average reward fake: 0.3538247048854828 Average reward real: 0.7099413871765137 Training d_loss: 0.8847 Training g_loss: 1.0335 Training q_loss: 24617.1914 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3733 Total reward: 14.0 Average reward fake: 0.4289821684360504 Average reward real: 0.6603654623031616 Training d_loss: 1.2377 Training g_loss: 0.8954 Training q_loss: 22930.2441 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3734 Total reward: 11.0 Average reward fake: 0.3604748249053955 Average reward real: 0.6531049609184265 Training d_loss: 0.9851 Training g_loss: 1.0305 Training q_loss: 45190.78

-------------------------------------------------------------------------------
Episode: 3755 Total reward: 177.0 Average reward fake: 0.4818182587623596 Average reward real: 0.6330119371414185 Training d_loss: 1.3598 Training g_loss: 0.7890 Training q_loss: 1364194.8750 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3756 Total reward: 199.0 Average reward fake: 0.3765254020690918 Average reward real: 0.7268157005310059 Training d_loss: 0.8574 Training g_loss: 0.9810 Training q_loss: 14420.7129 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3757 Total reward: 199.0 Average reward fake: 0.26455894112586975 Average reward real: 0.7286756634712219 Training d_loss: 0.7585 Training g_loss: 1.5314 Training q_loss: 15

-------------------------------------------------------------------------------
Episode: 3778 Total reward: 23.0 Average reward fake: 0.4235352575778961 Average reward real: 0.607377290725708 Training d_loss: 1.3933 Training g_loss: 0.9105 Training q_loss: 8322.6299 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3779 Total reward: 20.0 Average reward fake: 0.33869868516921997 Average reward real: 0.5937337875366211 Training d_loss: 1.0727 Training g_loss: 1.0835 Training q_loss: 8346.7129 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3780 Total reward: 22.0 Average reward fake: 0.3514567017555237 Average reward real: 0.5379199385643005 Training d_loss: 1.1732 Training g_loss: 1.0399 Training q_loss: 9802.2969 

-------------------------------------------------------------------------------
Episode: 3801 Total reward: 179.0 Average reward fake: 0.39982250332832336 Average reward real: 0.5265005826950073 Training d_loss: 1.2401 Training g_loss: 4.8005 Training q_loss: 1830.2340 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3802 Total reward: 197.0 Average reward fake: 0.4096696972846985 Average reward real: 0.47350770235061646 Training d_loss: 1.3354 Training g_loss: 4.6423 Training q_loss: 11808.0049 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3803 Total reward: 189.0 Average reward fake: 0.40910395979881287 Average reward real: 0.5301786661148071 Training d_loss: 1.2382 Training g_loss: 2.2272 Training q_loss: 504

-------------------------------------------------------------------------------
Episode: 3824 Total reward: 184.0 Average reward fake: 0.462186723947525 Average reward real: 0.5581339597702026 Training d_loss: 1.2501 Training g_loss: 0.8014 Training q_loss: 117927.9141 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3825 Total reward: 166.0 Average reward fake: 0.5118618607521057 Average reward real: 0.5377101302146912 Training d_loss: 1.3799 Training g_loss: 0.7018 Training q_loss: 1939.1801 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3826 Total reward: 164.0 Average reward fake: 0.4417521357536316 Average reward real: 0.5598143339157104 Training d_loss: 1.2459 Training g_loss: 6.0921 Training q_loss: 3801.2

-------------------------------------------------------------------------------
Episode: 3847 Total reward: 199.0 Average reward fake: 0.45506539940834045 Average reward real: 0.5383607149124146 Training d_loss: 1.3699 Training g_loss: 0.8145 Training q_loss: 374.0470 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3848 Total reward: 199.0 Average reward fake: 0.36750251054763794 Average reward real: 0.6892805695533752 Training d_loss: 1.0296 Training g_loss: 5.5388 Training q_loss: 1075.6595 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3849 Total reward: 186.0 Average reward fake: 0.47888249158859253 Average reward real: 0.595919132232666 Training d_loss: 1.2920 Training g_loss: 0.7702 Training q_loss: 1756.8

-------------------------------------------------------------------------------
Episode: 3870 Total reward: 199.0 Average reward fake: 0.500711977481842 Average reward real: 0.5544058680534363 Training d_loss: 1.3231 Training g_loss: 1.7160 Training q_loss: 323.3403 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3871 Total reward: 199.0 Average reward fake: 0.5335690379142761 Average reward real: 0.5013465881347656 Training d_loss: 1.4712 Training g_loss: 0.6400 Training q_loss: 494.5707 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3872 Total reward: 199.0 Average reward fake: 0.36619651317596436 Average reward real: 0.5210224986076355 Training d_loss: 1.1705 Training g_loss: 14.0765 Training q_loss: 448.1181

-------------------------------------------------------------------------------
Episode: 3893 Total reward: 199.0 Average reward fake: 0.47165733575820923 Average reward real: 0.5090185403823853 Training d_loss: 1.3268 Training g_loss: 0.8410 Training q_loss: 182.5462 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3894 Total reward: 199.0 Average reward fake: 0.3190733790397644 Average reward real: 0.5330835580825806 Training d_loss: 1.0775 Training g_loss: 9.3665 Training q_loss: 57.0898 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3895 Total reward: 199.0 Average reward fake: 0.44225797057151794 Average reward real: 0.5663819313049316 Training d_loss: 1.2288 Training g_loss: 6.7344 Training q_loss: 130.8347

-------------------------------------------------------------------------------
Episode: 3916 Total reward: 199.0 Average reward fake: 0.4707411825656891 Average reward real: 0.6304404735565186 Training d_loss: 1.2825 Training g_loss: 6.2983 Training q_loss: 498.3934 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3917 Total reward: 199.0 Average reward fake: 0.4088849127292633 Average reward real: 0.44926223158836365 Training d_loss: 1.4120 Training g_loss: 0.9203 Training q_loss: 390.9892 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3918 Total reward: 199.0 Average reward fake: 0.44414201378822327 Average reward real: 0.5361766815185547 Training d_loss: 1.3912 Training g_loss: 0.8719 Training q_loss: 606.467

-------------------------------------------------------------------------------
Episode: 3940 Total reward: 13.0 Average reward fake: 0.3732548654079437 Average reward real: 0.7383536100387573 Training d_loss: 0.8725 Training g_loss: 0.9804 Training q_loss: 20089.0469 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3941 Total reward: 12.0 Average reward fake: 0.37654682993888855 Average reward real: 0.739587664604187 Training d_loss: 0.8743 Training g_loss: 0.9856 Training q_loss: 26793.4844 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3942 Total reward: 15.0 Average reward fake: 0.33709239959716797 Average reward real: 0.5946940183639526 Training d_loss: 1.0712 Training g_loss: 1.0939 Training q_loss: 32061.6

-------------------------------------------------------------------------------
Episode: 3963 Total reward: 17.0 Average reward fake: 0.31899479031562805 Average reward real: 0.6478941440582275 Training d_loss: 0.9672 Training g_loss: 1.1533 Training q_loss: 8755.0195 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3964 Total reward: 13.0 Average reward fake: 0.33335280418395996 Average reward real: 0.587180495262146 Training d_loss: 1.0777 Training g_loss: 1.0744 Training q_loss: 5460.7197 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3965 Total reward: 18.0 Average reward fake: 0.3570480942726135 Average reward real: 0.6601237058639526 Training d_loss: 0.9754 Training g_loss: 1.0250 Training q_loss: 2512.5730

-------------------------------------------------------------------------------
Episode: 3987 Total reward: 9.0 Average reward fake: 0.3173636794090271 Average reward real: 0.6518607139587402 Training d_loss: 0.9625 Training g_loss: 1.1460 Training q_loss: 16949.1836 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3988 Total reward: 11.0 Average reward fake: 0.31604406237602234 Average reward real: 0.7861936688423157 Training d_loss: 0.7341 Training g_loss: 1.1566 Training q_loss: 4004.7839 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 3989 Total reward: 12.0 Average reward fake: 0.3346428871154785 Average reward real: 0.7265472412109375 Training d_loss: 0.8527 Training g_loss: 1.0891 Training q_loss: 6331.1514

-------------------------------------------------------------------------------
Episode: 4010 Total reward: 38.0 Average reward fake: 0.4538545608520508 Average reward real: 0.5680798292160034 Training d_loss: 1.5006 Training g_loss: 0.8839 Training q_loss: 5106.4917 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4011 Total reward: 58.0 Average reward fake: 0.3668063282966614 Average reward real: 0.6502050757408142 Training d_loss: 0.9633 Training g_loss: 0.9906 Training q_loss: 4540254.0000 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4012 Total reward: 36.0 Average reward fake: 0.448777437210083 Average reward real: 0.6819452047348022 Training d_loss: 1.1441 Training g_loss: 0.8714 Training q_loss: 3602322.

-------------------------------------------------------------------------------
Episode: 4033 Total reward: 199.0 Average reward fake: 0.4710209369659424 Average reward real: 0.533854067325592 Training d_loss: 1.2991 Training g_loss: 4.9861 Training q_loss: 614.0986 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4034 Total reward: 199.0 Average reward fake: 0.44051307439804077 Average reward real: 0.4635864794254303 Training d_loss: 1.4022 Training g_loss: 3.0741 Training q_loss: 3665.6504 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4035 Total reward: 199.0 Average reward fake: 0.3922264277935028 Average reward real: 0.5475481152534485 Training d_loss: 1.2191 Training g_loss: 1.0540 Training q_loss: 3038.403

-------------------------------------------------------------------------------
Episode: 4056 Total reward: 199.0 Average reward fake: 0.45530620217323303 Average reward real: 0.4883336126804352 Training d_loss: 1.3404 Training g_loss: 0.7906 Training q_loss: 6652.8291 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4057 Total reward: 199.0 Average reward fake: 0.4769977629184723 Average reward real: 0.48995786905288696 Training d_loss: 1.3650 Training g_loss: 0.7513 Training q_loss: 9209.3916 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4058 Total reward: 199.0 Average reward fake: 0.4431247115135193 Average reward real: 0.4978675842285156 Training d_loss: 1.3136 Training g_loss: 2.4367 Training q_loss: 1951.

-------------------------------------------------------------------------------
Episode: 4079 Total reward: 138.0 Average reward fake: 0.4035979211330414 Average reward real: 0.5930787920951843 Training d_loss: 1.3109 Training g_loss: 1.1495 Training q_loss: 21206.3574 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4080 Total reward: 123.0 Average reward fake: 0.3612874150276184 Average reward real: 0.6192768216133118 Training d_loss: 1.0228 Training g_loss: 1.0247 Training q_loss: 17381.2012 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4081 Total reward: 95.0 Average reward fake: 0.350294828414917 Average reward real: 0.6368745565414429 Training d_loss: 0.9955 Training g_loss: 1.0369 Training q_loss: 10498.9

-------------------------------------------------------------------------------
Episode: 4102 Total reward: 105.0 Average reward fake: 0.4804512560367584 Average reward real: 0.5099164843559265 Training d_loss: 1.4085 Training g_loss: 0.7662 Training q_loss: 3300.8281 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4103 Total reward: 117.0 Average reward fake: 0.4622787833213806 Average reward real: 0.5611532330513 Training d_loss: 1.2479 Training g_loss: 0.8074 Training q_loss: 3296.1133 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4104 Total reward: 150.0 Average reward fake: 0.47139501571655273 Average reward real: 0.5049999952316284 Training d_loss: 1.3254 Training g_loss: 0.7644 Training q_loss: 3119.4910

-------------------------------------------------------------------------------
Episode: 4125 Total reward: 179.0 Average reward fake: 0.4922337532043457 Average reward real: 0.5042086839675903 Training d_loss: 1.3642 Training g_loss: 0.7205 Training q_loss: 1856.3773 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4126 Total reward: 179.0 Average reward fake: 0.4489389955997467 Average reward real: 0.5025637745857239 Training d_loss: 1.2987 Training g_loss: 1.0299 Training q_loss: 615.3408 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4127 Total reward: 199.0 Average reward fake: 0.526910126209259 Average reward real: 0.60073322057724 Training d_loss: 1.3005 Training g_loss: 0.8481 Training q_loss: 2349.5356 E

-------------------------------------------------------------------------------
Episode: 4149 Total reward: 8.0 Average reward fake: 0.35123491287231445 Average reward real: 0.6044530868530273 Training d_loss: 1.0668 Training g_loss: 1.0482 Training q_loss: 323006.3125 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4150 Total reward: 9.0 Average reward fake: 0.33891990780830383 Average reward real: 0.7278414964675903 Training d_loss: 0.8545 Training g_loss: 1.0868 Training q_loss: 213260.7031 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4151 Total reward: 10.0 Average reward fake: 0.32570332288742065 Average reward real: 0.7904218435287476 Training d_loss: 0.7379 Training g_loss: 1.1232 Training q_loss: 27644

-------------------------------------------------------------------------------
Episode: 4173 Total reward: 12.0 Average reward fake: 0.33609724044799805 Average reward real: 0.7926979064941406 Training d_loss: 0.7449 Training g_loss: 1.0919 Training q_loss: 741395.3750 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4174 Total reward: 9.0 Average reward fake: 0.39808177947998047 Average reward real: 0.7926472425460815 Training d_loss: 1.1610 Training g_loss: 0.9918 Training q_loss: 26445830.0000 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4175 Total reward: 12.0 Average reward fake: 0.39930447936058044 Average reward real: 0.856340229511261 Training d_loss: 1.0294 Training g_loss: 0.9938 Training q_loss: 252

-------------------------------------------------------------------------------
Episode: 4197 Total reward: 11.0 Average reward fake: 0.3322863280773163 Average reward real: 0.5659282803535461 Training d_loss: 1.0999 Training g_loss: 1.1004 Training q_loss: 130598.9375 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4198 Total reward: 16.0 Average reward fake: 0.41257160902023315 Average reward real: 0.6378114223480225 Training d_loss: 1.2063 Training g_loss: 0.9383 Training q_loss: 87994.4219 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4199 Total reward: 18.0 Average reward fake: 0.35398635268211365 Average reward real: 0.4613134264945984 Training d_loss: 1.2909 Training g_loss: 1.0318 Training q_loss: 11129

-------------------------------------------------------------------------------
Episode: 4220 Total reward: 9.0 Average reward fake: 0.29613083600997925 Average reward real: 0.6901131868362427 Training d_loss: 0.8670 Training g_loss: 1.2197 Training q_loss: 36810.9453 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4221 Total reward: 9.0 Average reward fake: 0.26079174876213074 Average reward real: 0.6899118423461914 Training d_loss: 0.8300 Training g_loss: 4.2029 Training q_loss: 206365.3750 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4222 Total reward: 10.0 Average reward fake: 0.45075124502182007 Average reward real: 0.7697669267654419 Training d_loss: 1.3286 Training g_loss: 0.9031 Training q_loss: 60179.

-------------------------------------------------------------------------------
Episode: 4244 Total reward: 9.0 Average reward fake: 0.31216564774513245 Average reward real: 0.7881997227668762 Training d_loss: 0.7289 Training g_loss: 1.1577 Training q_loss: 34160.0117 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4245 Total reward: 12.0 Average reward fake: 0.43571799993515015 Average reward real: 0.6824402213096619 Training d_loss: 1.3872 Training g_loss: 0.8751 Training q_loss: 75472.9453 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4246 Total reward: 10.0 Average reward fake: 0.3784940242767334 Average reward real: 0.6218059062957764 Training d_loss: 1.0638 Training g_loss: 0.9743 Training q_loss: 96500.6

-------------------------------------------------------------------------------
Episode: 4268 Total reward: 10.0 Average reward fake: 0.39369234442710876 Average reward real: 0.6309412717819214 Training d_loss: 1.2141 Training g_loss: 0.9901 Training q_loss: 91870.0156 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4269 Total reward: 11.0 Average reward fake: 0.38402098417282104 Average reward real: 0.5652274489402771 Training d_loss: 1.3215 Training g_loss: 1.0227 Training q_loss: 117735.3984 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4270 Total reward: 11.0 Average reward fake: 0.38234442472457886 Average reward real: 0.5028898119926453 Training d_loss: 1.4251 Training g_loss: 1.0187 Training q_loss: 1477

-------------------------------------------------------------------------------
Episode: 4292 Total reward: 8.0 Average reward fake: 0.340449720621109 Average reward real: 0.7844696640968323 Training d_loss: 0.7573 Training g_loss: 1.0885 Training q_loss: 64861.8516 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4293 Total reward: 12.0 Average reward fake: 0.32868340611457825 Average reward real: 0.523708701133728 Training d_loss: 1.1838 Training g_loss: 1.1135 Training q_loss: 40765.6172 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4294 Total reward: 11.0 Average reward fake: 0.31399914622306824 Average reward real: 0.5808337330818176 Training d_loss: 1.0795 Training g_loss: 1.1583 Training q_loss: 298223.15

-------------------------------------------------------------------------------
Episode: 4315 Total reward: 199.0 Average reward fake: 0.5002095103263855 Average reward real: 0.5038067102432251 Training d_loss: 1.3803 Training g_loss: 0.6993 Training q_loss: 13208.2295 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4316 Total reward: 122.0 Average reward fake: 0.45771536231040955 Average reward real: 0.49935927987098694 Training d_loss: 1.3603 Training g_loss: 3.4081 Training q_loss: 231496.0312 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4317 Total reward: 82.0 Average reward fake: 0.44564110040664673 Average reward real: 0.5167151093482971 Training d_loss: 1.2709 Training g_loss: 0.8148 Training q_loss: 71

-------------------------------------------------------------------------------
Episode: 4338 Total reward: 199.0 Average reward fake: 0.4940613806247711 Average reward real: 0.5043888688087463 Training d_loss: 1.3676 Training g_loss: 0.7057 Training q_loss: 235870.3438 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4339 Total reward: 199.0 Average reward fake: 0.5177183151245117 Average reward real: 0.5177183151245117 Training d_loss: 1.3970 Training g_loss: 0.6625 Training q_loss: 171035.0938 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4340 Total reward: 199.0 Average reward fake: 0.408914715051651 Average reward real: 0.523656964302063 Training d_loss: 1.2048 Training g_loss: 3.7699 Training q_loss: 14336

-------------------------------------------------------------------------------
Episode: 4361 Total reward: 199.0 Average reward fake: 0.464851438999176 Average reward real: 0.48736056685447693 Training d_loss: 1.3547 Training g_loss: 0.7805 Training q_loss: 48933.3438 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4362 Total reward: 199.0 Average reward fake: 0.5465366244316101 Average reward real: 0.5717138051986694 Training d_loss: 1.4092 Training g_loss: 0.6343 Training q_loss: 17635.9609 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4363 Total reward: 199.0 Average reward fake: 0.5075079202651978 Average reward real: 0.5006879568099976 Training d_loss: 1.4046 Training g_loss: 0.6809 Training q_loss: 8299.

-------------------------------------------------------------------------------
Episode: 4384 Total reward: 148.0 Average reward fake: 0.43062537908554077 Average reward real: 0.565296471118927 Training d_loss: 1.2210 Training g_loss: 0.8824 Training q_loss: 20786.2910 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4385 Total reward: 151.0 Average reward fake: 0.4248853325843811 Average reward real: 0.49826326966285706 Training d_loss: 1.2885 Training g_loss: 0.8667 Training q_loss: 17898.9824 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4386 Total reward: 137.0 Average reward fake: 0.40308713912963867 Average reward real: 0.5671712160110474 Training d_loss: 1.1488 Training g_loss: 0.9307 Training q_loss: 763

-------------------------------------------------------------------------------
Episode: 4407 Total reward: 164.0 Average reward fake: 0.5399110913276672 Average reward real: 0.6006883382797241 Training d_loss: 1.3760 Training g_loss: 0.6555 Training q_loss: 8897.2998 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4408 Total reward: 158.0 Average reward fake: 0.4577367901802063 Average reward real: 0.48150673508644104 Training d_loss: 1.3473 Training g_loss: 0.7871 Training q_loss: 9146.0684 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4409 Total reward: 193.0 Average reward fake: 0.4995346665382385 Average reward real: 0.5004332661628723 Training d_loss: 1.3846 Training g_loss: 0.6930 Training q_loss: 3333.2

-------------------------------------------------------------------------------
Episode: 4430 Total reward: 26.0 Average reward fake: 0.33871597051620483 Average reward real: 0.7598825693130493 Training d_loss: 0.7814 Training g_loss: 1.0844 Training q_loss: 44910.3164 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4431 Total reward: 18.0 Average reward fake: 0.31310683488845825 Average reward real: 0.7576560974121094 Training d_loss: 0.7612 Training g_loss: 1.1609 Training q_loss: 104840.5156 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4432 Total reward: 15.0 Average reward fake: 0.35487836599349976 Average reward real: 0.5359459519386292 Training d_loss: 1.1762 Training g_loss: 0.9885 Training q_loss: 6211

-------------------------------------------------------------------------------
Episode: 4453 Total reward: 174.0 Average reward fake: 0.4996362328529358 Average reward real: 0.4907149374485016 Training d_loss: 1.4117 Training g_loss: 0.6980 Training q_loss: 6904.9507 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4454 Total reward: 159.0 Average reward fake: 0.5034393072128296 Average reward real: 0.4966355264186859 Training d_loss: 1.4007 Training g_loss: 0.6861 Training q_loss: 17636.0820 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4455 Total reward: 187.0 Average reward fake: 0.4365202486515045 Average reward real: 0.5119482278823853 Training d_loss: 1.2730 Training g_loss: 0.9338 Training q_loss: 9755.3

-------------------------------------------------------------------------------
Episode: 4477 Total reward: 11.0 Average reward fake: 0.35048478841781616 Average reward real: 0.7385947108268738 Training d_loss: 0.8525 Training g_loss: 1.0589 Training q_loss: 128203.8984 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4478 Total reward: 10.0 Average reward fake: 0.3290582299232483 Average reward real: 0.5296226739883423 Training d_loss: 1.1779 Training g_loss: 1.1116 Training q_loss: 321234.8125 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4479 Total reward: 12.0 Average reward fake: 0.34747499227523804 Average reward real: 0.6726325154304504 Training d_loss: 0.9565 Training g_loss: 1.0537 Training q_loss: 1138

-------------------------------------------------------------------------------
Episode: 4500 Total reward: 12.0 Average reward fake: 0.3535889983177185 Average reward real: 0.5462514162063599 Training d_loss: 1.1653 Training g_loss: 1.0339 Training q_loss: 16764432.0000 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4501 Total reward: 9.0 Average reward fake: 0.3603453040122986 Average reward real: 0.6145058870315552 Training d_loss: 1.0609 Training g_loss: 1.0224 Training q_loss: 117285.5625 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4502 Total reward: 11.0 Average reward fake: 0.3526710867881775 Average reward real: 0.8672657012939453 Training d_loss: 0.6466 Training g_loss: 1.0517 Training q_loss: 11320

-------------------------------------------------------------------------------
Episode: 4523 Total reward: 15.0 Average reward fake: 0.3269701898097992 Average reward real: 0.48897314071655273 Training d_loss: 1.2213 Training g_loss: 1.1188 Training q_loss: 91508.7969 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4524 Total reward: 15.0 Average reward fake: 0.3635547161102295 Average reward real: 0.658688485622406 Training d_loss: 1.0943 Training g_loss: 1.0749 Training q_loss: 297178.7500 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4525 Total reward: 14.0 Average reward fake: 0.3792601525783539 Average reward real: 0.6624404191970825 Training d_loss: 1.0929 Training g_loss: 1.0343 Training q_loss: 114630.1484 Explore P: 0.0100
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
Episode: 4546 Total reward: 33.0 Average reward fake: 0.4360821843147278 Average reward real: 0.5695722103118896 Training d_loss: 1.3008 Training g_loss: 0.8851 Training q_loss: 112106.2266 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4547 Total reward: 29.0 Average reward fake: 0.4415244460105896 Average reward real: 0.6457314491271973 Training d_loss: 1.1404 Training g_loss: 0.8675 Training q_loss: 59235.3438 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4548 Total reward: 31.0 Average reward fake: 0.4607730805873871 Average reward real: 0.5978802442550659 Training d_loss: 1.2231 Training g_loss: 0.8175 Training q_loss: 29753.2

-------------------------------------------------------------------------------
Episode: 4569 Total reward: 159.0 Average reward fake: 0.5316766500473022 Average reward real: 0.49294835329055786 Training d_loss: 1.4823 Training g_loss: 0.6402 Training q_loss: 25987.7930 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4570 Total reward: 153.0 Average reward fake: 0.509601891040802 Average reward real: 0.5024611353874207 Training d_loss: 1.4015 Training g_loss: 0.6758 Training q_loss: 17204.8262 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4571 Total reward: 199.0 Average reward fake: 0.47832632064819336 Average reward real: 0.5119101405143738 Training d_loss: 1.3314 Training g_loss: 0.7375 Training q_loss: 3928

-------------------------------------------------------------------------------
Episode: 4592 Total reward: 199.0 Average reward fake: 0.504581093788147 Average reward real: 0.50458163022995 Training d_loss: 1.3874 Training g_loss: 0.6841 Training q_loss: 38615.0117 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4593 Total reward: 189.0 Average reward fake: 0.4952108860015869 Average reward real: 0.47489237785339355 Training d_loss: 1.4379 Training g_loss: 0.7082 Training q_loss: 41677.5664 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4594 Total reward: 199.0 Average reward fake: 0.45513850450515747 Average reward real: 0.5077502131462097 Training d_loss: 1.3121 Training g_loss: 2.8043 Training q_loss: 17325.

-------------------------------------------------------------------------------
Episode: 4615 Total reward: 199.0 Average reward fake: 0.3987186551094055 Average reward real: 0.6411999464035034 Training d_loss: 1.2310 Training g_loss: 0.9838 Training q_loss: 33879.1367 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4616 Total reward: 199.0 Average reward fake: 0.34299683570861816 Average reward real: 0.7115408182144165 Training d_loss: 0.8743 Training g_loss: 1.0841 Training q_loss: 190381.8438 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4617 Total reward: 199.0 Average reward fake: 0.3677692115306854 Average reward real: 0.6735595464706421 Training d_loss: 0.9691 Training g_loss: 1.0002 Training q_loss: 260

-------------------------------------------------------------------------------
Episode: 4638 Total reward: 151.0 Average reward fake: 0.38972434401512146 Average reward real: 0.5949780941009521 Training d_loss: 1.1846 Training g_loss: 0.9913 Training q_loss: 60840.9141 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4639 Total reward: 142.0 Average reward fake: 0.47239217162132263 Average reward real: 0.5539548397064209 Training d_loss: 1.3905 Training g_loss: 0.8139 Training q_loss: 17864.9375 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4640 Total reward: 150.0 Average reward fake: 0.4392065107822418 Average reward real: 0.5698586702346802 Training d_loss: 1.2984 Training g_loss: 0.8803 Training q_loss: 131

-------------------------------------------------------------------------------
Episode: 4661 Total reward: 199.0 Average reward fake: 0.5159175992012024 Average reward real: 0.5057367086410522 Training d_loss: 1.4167 Training g_loss: 0.6674 Training q_loss: 24845.6855 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4662 Total reward: 199.0 Average reward fake: 0.4965667128562927 Average reward real: 0.507095217704773 Training d_loss: 1.3665 Training g_loss: 0.7008 Training q_loss: 16247.7451 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4663 Total reward: 199.0 Average reward fake: 0.42752447724342346 Average reward real: 0.5278056263923645 Training d_loss: 1.2482 Training g_loss: 0.8997 Training q_loss: 12885

-------------------------------------------------------------------------------
Episode: 4684 Total reward: 199.0 Average reward fake: 0.39833465218544006 Average reward real: 0.5374576449394226 Training d_loss: 1.1780 Training g_loss: 1.0931 Training q_loss: 3482.5425 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4685 Total reward: 199.0 Average reward fake: 0.5187759399414062 Average reward real: 0.5187759399414062 Training d_loss: 1.3902 Training g_loss: 0.6603 Training q_loss: 1237.7229 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4686 Total reward: 185.0 Average reward fake: 0.5013987421989441 Average reward real: 0.4999365210533142 Training d_loss: 1.3904 Training g_loss: 0.6973 Training q_loss: 1583.8

-------------------------------------------------------------------------------
Episode: 4707 Total reward: 199.0 Average reward fake: 0.5167299509048462 Average reward real: 0.5195364356040955 Training d_loss: 1.3826 Training g_loss: 0.6591 Training q_loss: 48293.0156 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4708 Total reward: 180.0 Average reward fake: 0.48641437292099 Average reward real: 0.48641437292099 Training d_loss: 1.3892 Training g_loss: 0.7223 Training q_loss: 711.7551 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4709 Total reward: 176.0 Average reward fake: 0.5127955079078674 Average reward real: 0.5128316283226013 Training d_loss: 1.3898 Training g_loss: 0.6683 Training q_loss: 9960.5371 E

-------------------------------------------------------------------------------
Episode: 4730 Total reward: 184.0 Average reward fake: 0.48435401916503906 Average reward real: 0.5366442203521729 Training d_loss: 1.3182 Training g_loss: 3.0453 Training q_loss: 757.3446 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4731 Total reward: 199.0 Average reward fake: 0.46160298585891724 Average reward real: 0.5157433748245239 Training d_loss: 1.3089 Training g_loss: 1.0437 Training q_loss: 1197.7306 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4732 Total reward: 199.0 Average reward fake: 0.5069464445114136 Average reward real: 0.5132948756217957 Training d_loss: 1.3746 Training g_loss: 0.6808 Training q_loss: 567.27

-------------------------------------------------------------------------------
Episode: 4753 Total reward: 199.0 Average reward fake: 0.3813759386539459 Average reward real: 0.5358667373657227 Training d_loss: 1.1654 Training g_loss: 1.4999 Training q_loss: 276.5375 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4754 Total reward: 199.0 Average reward fake: 0.5103339552879333 Average reward real: 0.5072440505027771 Training d_loss: 1.3942 Training g_loss: 0.6730 Training q_loss: 62.4523 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4755 Total reward: 199.0 Average reward fake: 0.48810163140296936 Average reward real: 0.519365668296814 Training d_loss: 1.3386 Training g_loss: 0.7509 Training q_loss: 6676.5205 

-------------------------------------------------------------------------------
Episode: 4776 Total reward: 199.0 Average reward fake: 0.4537765383720398 Average reward real: 0.505352795124054 Training d_loss: 1.3145 Training g_loss: 3.1885 Training q_loss: 393.6221 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4777 Total reward: 180.0 Average reward fake: 0.47510188817977905 Average reward real: 0.5260986089706421 Training d_loss: 1.3278 Training g_loss: 3.6739 Training q_loss: 622.0399 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4778 Total reward: 149.0 Average reward fake: 0.5158370137214661 Average reward real: 0.5212495923042297 Training d_loss: 1.3797 Training g_loss: 0.6645 Training q_loss: 340.2913 

-------------------------------------------------------------------------------
Episode: 4799 Total reward: 199.0 Average reward fake: 0.5079625248908997 Average reward real: 0.5079635977745056 Training d_loss: 1.3925 Training g_loss: 0.6796 Training q_loss: 256.2911 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4800 Total reward: 199.0 Average reward fake: 0.42239275574684143 Average reward real: 0.4830215871334076 Training d_loss: 1.3004 Training g_loss: 3.5615 Training q_loss: 1026.6083 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4801 Total reward: 199.0 Average reward fake: 0.46827489137649536 Average reward real: 0.4852023124694824 Training d_loss: 1.3731 Training g_loss: 0.7290 Training q_loss: 421.62

-------------------------------------------------------------------------------
Episode: 4822 Total reward: 181.0 Average reward fake: 0.5203356146812439 Average reward real: 0.5203703045845032 Training d_loss: 1.3881 Training g_loss: 0.6535 Training q_loss: 684.9960 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4823 Total reward: 199.0 Average reward fake: 0.500044047832489 Average reward real: 0.4982040822505951 Training d_loss: 1.3903 Training g_loss: 0.6942 Training q_loss: 451.8048 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4824 Total reward: 199.0 Average reward fake: 0.4564472734928131 Average reward real: 0.5086116194725037 Training d_loss: 1.3129 Training g_loss: 1.4211 Training q_loss: 778.4615 E

-------------------------------------------------------------------------------
Episode: 4845 Total reward: 177.0 Average reward fake: 0.42166075110435486 Average reward real: 0.5324355959892273 Training d_loss: 1.2376 Training g_loss: 5.0134 Training q_loss: 682.8162 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4846 Total reward: 199.0 Average reward fake: 0.43009844422340393 Average reward real: 0.4851357936859131 Training d_loss: 1.3146 Training g_loss: 3.5178 Training q_loss: 452.0209 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4847 Total reward: 199.0 Average reward fake: 0.4927193522453308 Average reward real: 0.4927193522453308 Training d_loss: 1.4037 Training g_loss: 0.7028 Training q_loss: 1450.50

-------------------------------------------------------------------------------
Episode: 4868 Total reward: 199.0 Average reward fake: 0.46886664628982544 Average reward real: 0.5171043276786804 Training d_loss: 1.3224 Training g_loss: 3.1838 Training q_loss: 762.9852 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4869 Total reward: 184.0 Average reward fake: 0.5225790739059448 Average reward real: 0.5225824117660522 Training d_loss: 1.3905 Training g_loss: 0.6482 Training q_loss: 256.8575 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4870 Total reward: 199.0 Average reward fake: 0.5058423280715942 Average reward real: 0.5058423280715942 Training d_loss: 1.3864 Training g_loss: 0.6812 Training q_loss: 277.7556

-------------------------------------------------------------------------------
Episode: 4891 Total reward: 199.0 Average reward fake: 0.5069435834884644 Average reward real: 0.5263987183570862 Training d_loss: 1.3504 Training g_loss: 0.7901 Training q_loss: 1440.6481 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4892 Total reward: 116.0 Average reward fake: 0.4892459511756897 Average reward real: 0.540738582611084 Training d_loss: 1.3248 Training g_loss: 3.6165 Training q_loss: 171.9878 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4893 Total reward: 107.0 Average reward fake: 0.4341837763786316 Average reward real: 0.5009269714355469 Training d_loss: 1.3156 Training g_loss: 3.6519 Training q_loss: 170.2123 

-------------------------------------------------------------------------------
Episode: 4914 Total reward: 199.0 Average reward fake: 0.460129976272583 Average reward real: 0.5123255252838135 Training d_loss: 1.3133 Training g_loss: 3.5397 Training q_loss: 279.9088 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4915 Total reward: 199.0 Average reward fake: 0.5237144231796265 Average reward real: 0.5219736099243164 Training d_loss: 1.3921 Training g_loss: 0.6480 Training q_loss: 901.9471 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4916 Total reward: 192.0 Average reward fake: 0.3873819410800934 Average reward real: 0.5187327861785889 Training d_loss: 1.1924 Training g_loss: 2.7944 Training q_loss: 2390.4641 

-------------------------------------------------------------------------------
Episode: 4937 Total reward: 171.0 Average reward fake: 0.46133488416671753 Average reward real: 0.5138086080551147 Training d_loss: 1.3128 Training g_loss: 3.5878 Training q_loss: 292.2840 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4938 Total reward: 199.0 Average reward fake: 0.4598774015903473 Average reward real: 0.5143991708755493 Training d_loss: 1.3007 Training g_loss: 0.7846 Training q_loss: 66840.0156 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4939 Total reward: 199.0 Average reward fake: 0.4380768835544586 Average reward real: 0.4663909375667572 Training d_loss: 1.4050 Training g_loss: 0.7330 Training q_loss: 2034.2

-------------------------------------------------------------------------------
Episode: 4960 Total reward: 148.0 Average reward fake: 0.5087825059890747 Average reward real: 0.5087884664535522 Training d_loss: 1.3938 Training g_loss: 0.6784 Training q_loss: 707.0717 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4961 Total reward: 157.0 Average reward fake: 0.46459752321243286 Average reward real: 0.5152751803398132 Training d_loss: 1.3167 Training g_loss: 3.3953 Training q_loss: 1179.6704 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4962 Total reward: 147.0 Average reward fake: 0.4592514932155609 Average reward real: 0.5180663466453552 Training d_loss: 1.3164 Training g_loss: 3.1680 Training q_loss: 1011.78

-------------------------------------------------------------------------------
Episode: 4983 Total reward: 199.0 Average reward fake: 0.481762558221817 Average reward real: 0.5023530721664429 Training d_loss: 1.3550 Training g_loss: 0.7532 Training q_loss: 922.2151 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4984 Total reward: 152.0 Average reward fake: 0.5264792442321777 Average reward real: 0.531136691570282 Training d_loss: 1.3824 Training g_loss: 0.6437 Training q_loss: 15836.9004 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 4985 Total reward: 144.0 Average reward fake: 0.538019061088562 Average reward real: 0.6024566888809204 Training d_loss: 1.3422 Training g_loss: 2.3267 Training q_loss: 665.6639 E

-------------------------------------------------------------------------------
Episode: 5006 Total reward: 134.0 Average reward fake: 0.4854581952095032 Average reward real: 0.5242751240730286 Training d_loss: 1.3235 Training g_loss: 0.7234 Training q_loss: 1994.9268 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5007 Total reward: 100.0 Average reward fake: 0.456338107585907 Average reward real: 0.5397465825080872 Training d_loss: 1.2663 Training g_loss: 3.3638 Training q_loss: 1582.7627 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5008 Total reward: 163.0 Average reward fake: 0.4452778398990631 Average reward real: 0.5056400299072266 Training d_loss: 1.3019 Training g_loss: 3.4546 Training q_loss: 1081.980

-------------------------------------------------------------------------------
Episode: 5029 Total reward: 164.0 Average reward fake: 0.41246867179870605 Average reward real: 0.49830254912376404 Training d_loss: 1.2641 Training g_loss: 1.8524 Training q_loss: 845.3604 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5030 Total reward: 120.0 Average reward fake: 0.49820202589035034 Average reward real: 0.5280560255050659 Training d_loss: 1.3698 Training g_loss: 0.7096 Training q_loss: 3622.7651 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5031 Total reward: 141.0 Average reward fake: 0.44927549362182617 Average reward real: 0.4999631345272064 Training d_loss: 1.3072 Training g_loss: 0.8050 Training q_loss: 5685

-------------------------------------------------------------------------------
Episode: 5052 Total reward: 149.0 Average reward fake: 0.5499721765518188 Average reward real: 0.5316294431686401 Training d_loss: 1.4374 Training g_loss: 0.6028 Training q_loss: 616.2448 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5053 Total reward: 199.0 Average reward fake: 0.5433856844902039 Average reward real: 0.5478121638298035 Training d_loss: 1.3859 Training g_loss: 0.6103 Training q_loss: 942.6945 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5054 Total reward: 131.0 Average reward fake: 0.4845965504646301 Average reward real: 0.5227720141410828 Training d_loss: 1.3488 Training g_loss: 3.1860 Training q_loss: 682.3522 

-------------------------------------------------------------------------------
Episode: 5075 Total reward: 123.0 Average reward fake: 0.46093934774398804 Average reward real: 0.5078744888305664 Training d_loss: 1.3110 Training g_loss: 0.7822 Training q_loss: 1728.8763 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5076 Total reward: 159.0 Average reward fake: 0.5029484033584595 Average reward real: 0.5249849557876587 Training d_loss: 1.3848 Training g_loss: 0.8794 Training q_loss: 1636.4779 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5077 Total reward: 116.0 Average reward fake: 0.524243175983429 Average reward real: 0.5308179259300232 Training d_loss: 1.4132 Training g_loss: 0.8019 Training q_loss: 4960.90

-------------------------------------------------------------------------------
Episode: 5098 Total reward: 139.0 Average reward fake: 0.4765295088291168 Average reward real: 0.5192909836769104 Training d_loss: 1.3341 Training g_loss: 3.0782 Training q_loss: 636.5527 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5099 Total reward: 199.0 Average reward fake: 0.4965416491031647 Average reward real: 0.5058646202087402 Training d_loss: 1.3684 Training g_loss: 0.7004 Training q_loss: 1274.0410 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5100 Total reward: 123.0 Average reward fake: 0.5268049836158752 Average reward real: 0.5393255949020386 Training d_loss: 1.3798 Training g_loss: 0.6473 Training q_loss: 397.0511

-------------------------------------------------------------------------------
Episode: 5121 Total reward: 199.0 Average reward fake: 0.45124921202659607 Average reward real: 0.503976583480835 Training d_loss: 1.3128 Training g_loss: 3.2513 Training q_loss: 1402.9485 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5122 Total reward: 129.0 Average reward fake: 0.41868066787719727 Average reward real: 0.5183283090591431 Training d_loss: 1.2515 Training g_loss: 4.0387 Training q_loss: 518.5278 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5123 Total reward: 167.0 Average reward fake: 0.4372691512107849 Average reward real: 0.5686618089675903 Training d_loss: 1.1705 Training g_loss: 1.5822 Training q_loss: 396.245

-------------------------------------------------------------------------------
Episode: 5144 Total reward: 153.0 Average reward fake: 0.4256382882595062 Average reward real: 0.5335204005241394 Training d_loss: 1.2349 Training g_loss: 3.3223 Training q_loss: 482.7726 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5145 Total reward: 199.0 Average reward fake: 0.49884724617004395 Average reward real: 0.544231116771698 Training d_loss: 1.4226 Training g_loss: 2.8641 Training q_loss: 1470.9006 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5146 Total reward: 70.0 Average reward fake: 0.44020143151283264 Average reward real: 0.5293148756027222 Training d_loss: 1.2364 Training g_loss: 0.8351 Training q_loss: 1507.578

-------------------------------------------------------------------------------
Episode: 5167 Total reward: 184.0 Average reward fake: 0.43410158157348633 Average reward real: 0.48041313886642456 Training d_loss: 1.3288 Training g_loss: 3.1075 Training q_loss: 1010.9846 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5168 Total reward: 199.0 Average reward fake: 0.4366256594657898 Average reward real: 0.5372354388237 Training d_loss: 1.2455 Training g_loss: 3.1538 Training q_loss: 231.9665 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5169 Total reward: 199.0 Average reward fake: 0.45202556252479553 Average reward real: 0.5263185501098633 Training d_loss: 1.2769 Training g_loss: 3.1310 Training q_loss: 587.4734

-------------------------------------------------------------------------------
Episode: 5190 Total reward: 172.0 Average reward fake: 0.4281395971775055 Average reward real: 0.5506337881088257 Training d_loss: 1.2142 Training g_loss: 4.4252 Training q_loss: 299.4247 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5191 Total reward: 119.0 Average reward fake: 0.41237109899520874 Average reward real: 0.5273973941802979 Training d_loss: 1.2116 Training g_loss: 3.0943 Training q_loss: 1082.6809 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5192 Total reward: 171.0 Average reward fake: 0.31176164746284485 Average reward real: 0.5352746844291687 Training d_loss: 1.0687 Training g_loss: 9.6585 Training q_loss: 633.26

-------------------------------------------------------------------------------
Episode: 5213 Total reward: 193.0 Average reward fake: 0.5519198179244995 Average reward real: 0.5595143437385559 Training d_loss: 1.3868 Training g_loss: 0.5993 Training q_loss: 1129.3616 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5214 Total reward: 137.0 Average reward fake: 0.3778378665447235 Average reward real: 0.5187217593193054 Training d_loss: 1.1899 Training g_loss: 5.7100 Training q_loss: 218.1898 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5215 Total reward: 146.0 Average reward fake: 0.4156225323677063 Average reward real: 0.5966018438339233 Training d_loss: 1.1293 Training g_loss: 4.2365 Training q_loss: 493.5433

-------------------------------------------------------------------------------
Episode: 5236 Total reward: 199.0 Average reward fake: 0.5092300176620483 Average reward real: 0.5167576670646667 Training d_loss: 1.3733 Training g_loss: 0.6745 Training q_loss: 322.6162 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5237 Total reward: 130.0 Average reward fake: 0.42588290572166443 Average reward real: 0.570472776889801 Training d_loss: 1.1973 Training g_loss: 2.1007 Training q_loss: 386.9157 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5238 Total reward: 199.0 Average reward fake: 0.49019837379455566 Average reward real: 0.5513473749160767 Training d_loss: 1.3248 Training g_loss: 1.4202 Training q_loss: 380.6637

-------------------------------------------------------------------------------
Episode: 5259 Total reward: 182.0 Average reward fake: 0.5122004747390747 Average reward real: 0.5171467065811157 Training d_loss: 1.3797 Training g_loss: 0.6782 Training q_loss: 90.5704 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5260 Total reward: 199.0 Average reward fake: 0.3356298804283142 Average reward real: 0.4493374228477478 Training d_loss: 1.2482 Training g_loss: 1.4013 Training q_loss: 673.4103 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5261 Total reward: 199.0 Average reward fake: 0.4129679799079895 Average reward real: 0.5753263831138611 Training d_loss: 1.1698 Training g_loss: 1.6343 Training q_loss: 823.8583 E

-------------------------------------------------------------------------------
Episode: 5282 Total reward: 12.0 Average reward fake: 0.3530103862285614 Average reward real: 0.7403192520141602 Training d_loss: 0.8528 Training g_loss: 1.0391 Training q_loss: 79069.3906 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5283 Total reward: 10.0 Average reward fake: 0.3405696749687195 Average reward real: 0.6697582006454468 Training d_loss: 0.9555 Training g_loss: 1.0819 Training q_loss: 61534.3047 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5284 Total reward: 7.0 Average reward fake: 0.3291657567024231 Average reward real: 0.6641616821289062 Training d_loss: 0.9553 Training g_loss: 1.1176 Training q_loss: 59996.824

-------------------------------------------------------------------------------
Episode: 5305 Total reward: 26.0 Average reward fake: 0.35620397329330444 Average reward real: 0.6059142351150513 Training d_loss: 1.0345 Training g_loss: 1.0296 Training q_loss: 13769.6221 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5306 Total reward: 28.0 Average reward fake: 0.42753228545188904 Average reward real: 0.5928956270217896 Training d_loss: 1.1875 Training g_loss: 0.8803 Training q_loss: 14154.0137 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5307 Total reward: 61.0 Average reward fake: 0.48159098625183105 Average reward real: 0.5073367357254028 Training d_loss: 1.3507 Training g_loss: 0.7368 Training q_loss: 6270.

Episode: 5327 Total reward: 14.0 Average reward fake: 0.3289952278137207 Average reward real: 0.6481670141220093 Training d_loss: 0.9714 Training g_loss: 1.1148 Training q_loss: 17787.1758 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5328 Total reward: 16.0 Average reward fake: 0.31845852732658386 Average reward real: 0.6475294828414917 Training d_loss: 0.9674 Training g_loss: 1.1474 Training q_loss: 2099382.2500 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5329 Total reward: 16.0 Average reward fake: 0.2950030267238617 Average reward real: 0.7195704579353333 Training d_loss: 0.8153 Training g_loss: 2.7661 Training q_loss: 33947.9141 Explore P: 0.0100
--------------------------------------------------------

-------------------------------------------------------------------------------
Episode: 5350 Total reward: 33.0 Average reward fake: 0.34223538637161255 Average reward real: 0.6715700626373291 Training d_loss: 0.9170 Training g_loss: 1.0723 Training q_loss: 16393.1992 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5351 Total reward: 56.0 Average reward fake: 0.45923247933387756 Average reward real: 0.5471365451812744 Training d_loss: 1.3826 Training g_loss: 0.8321 Training q_loss: 47021.1484 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5352 Total reward: 76.0 Average reward fake: 0.48172396421432495 Average reward real: 0.6247951984405518 Training d_loss: 1.3520 Training g_loss: 0.8207 Training q_loss: 54889

-------------------------------------------------------------------------------
Episode: 5373 Total reward: 199.0 Average reward fake: 0.4425439238548279 Average reward real: 0.5854719877243042 Training d_loss: 1.1770 Training g_loss: 0.8440 Training q_loss: 12713.2891 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5374 Total reward: 199.0 Average reward fake: 0.48853740096092224 Average reward real: 0.5403906106948853 Training d_loss: 1.3089 Training g_loss: 0.7305 Training q_loss: 13379.9502 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5375 Total reward: 199.0 Average reward fake: 0.4852082133293152 Average reward real: 0.5525100827217102 Training d_loss: 1.3018 Training g_loss: 0.7445 Training q_loss: 6724

-------------------------------------------------------------------------------
Episode: 5396 Total reward: 199.0 Average reward fake: 0.5309557318687439 Average reward real: 0.5538503527641296 Training d_loss: 1.3967 Training g_loss: 0.6568 Training q_loss: 39936.1172 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5397 Total reward: 199.0 Average reward fake: 0.38960498571395874 Average reward real: 0.5251635313034058 Training d_loss: 1.2698 Training g_loss: 0.9815 Training q_loss: 3684.3430 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5398 Total reward: 199.0 Average reward fake: 0.4327474534511566 Average reward real: 0.5000372529029846 Training d_loss: 1.3582 Training g_loss: 2.6852 Training q_loss: 8408.

-------------------------------------------------------------------------------
Episode: 5419 Total reward: 199.0 Average reward fake: 0.3121294379234314 Average reward real: 0.4463520050048828 Training d_loss: 1.3090 Training g_loss: 1.1598 Training q_loss: 5367.5029 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5420 Total reward: 114.0 Average reward fake: 0.3347788453102112 Average reward real: 0.5981325507164001 Training d_loss: 1.0670 Training g_loss: 1.0967 Training q_loss: 14379.4873 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5421 Total reward: 84.0 Average reward fake: 0.3513334393501282 Average reward real: 0.6706129312515259 Training d_loss: 0.9609 Training g_loss: 1.0448 Training q_loss: 41801.8

-------------------------------------------------------------------------------
Episode: 5442 Total reward: 9.0 Average reward fake: 0.3342651426792145 Average reward real: 0.5995155572891235 Training d_loss: 1.0654 Training g_loss: 1.0936 Training q_loss: 70484.4609 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5443 Total reward: 9.0 Average reward fake: 0.3461698889732361 Average reward real: 0.5412558913230896 Training d_loss: 1.1686 Training g_loss: 1.0565 Training q_loss: 113148.2656 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5444 Total reward: 9.0 Average reward fake: 0.3571214973926544 Average reward real: 0.8683168292045593 Training d_loss: 0.6508 Training g_loss: 1.0350 Training q_loss: 31917772.0

-------------------------------------------------------------------------------
Episode: 5465 Total reward: 9.0 Average reward fake: 0.2793862223625183 Average reward real: 0.6393914222717285 Training d_loss: 0.9655 Training g_loss: 1.2820 Training q_loss: 24361.7070 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5466 Total reward: 11.0 Average reward fake: 0.2876668870449066 Average reward real: 0.6435330510139465 Training d_loss: 0.9625 Training g_loss: 1.2352 Training q_loss: 29576.4316 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5467 Total reward: 11.0 Average reward fake: 0.31677699089050293 Average reward real: 0.5897504687309265 Training d_loss: 1.0710 Training g_loss: 1.1456 Training q_loss: 35211.25

-------------------------------------------------------------------------------
Episode: 5489 Total reward: 10.0 Average reward fake: 0.3468132019042969 Average reward real: 0.7364873290061951 Training d_loss: 0.8517 Training g_loss: 1.0622 Training q_loss: 12711.6631 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5490 Total reward: 8.0 Average reward fake: 0.403756320476532 Average reward real: 0.6672411561012268 Training d_loss: 1.4836 Training g_loss: 0.9756 Training q_loss: 3332803.5000 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5491 Total reward: 105.0 Average reward fake: 0.517912745475769 Average reward real: 0.5030924081802368 Training d_loss: 1.4188 Training g_loss: 0.6643 Training q_loss: 40026.96

-------------------------------------------------------------------------------
Episode: 5512 Total reward: 185.0 Average reward fake: 0.5003763437271118 Average reward real: 0.5003763437271118 Training d_loss: 1.3867 Training g_loss: 0.6936 Training q_loss: 148886.9688 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5513 Total reward: 88.0 Average reward fake: 0.44395899772644043 Average reward real: 0.5244592428207397 Training d_loss: 1.2487 Training g_loss: 0.8258 Training q_loss: 13076.2275 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5514 Total reward: 93.0 Average reward fake: 0.5312736630439758 Average reward real: 0.5892767906188965 Training d_loss: 1.3672 Training g_loss: 0.6717 Training q_loss: 27325

-------------------------------------------------------------------------------
Episode: 5535 Total reward: 199.0 Average reward fake: 0.5176480412483215 Average reward real: 0.5013771057128906 Training d_loss: 1.4465 Training g_loss: 0.6706 Training q_loss: 2111.5513 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5536 Total reward: 199.0 Average reward fake: 0.4833036959171295 Average reward real: 0.5871895551681519 Training d_loss: 1.2372 Training g_loss: 0.7870 Training q_loss: 3485.6699 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5537 Total reward: 199.0 Average reward fake: 0.45225396752357483 Average reward real: 0.4522729516029358 Training d_loss: 1.4482 Training g_loss: 0.8191 Training q_loss: 16327.

-------------------------------------------------------------------------------
Episode: 5558 Total reward: 199.0 Average reward fake: 0.557242214679718 Average reward real: 0.5250416398048401 Training d_loss: 1.4611 Training g_loss: 0.6007 Training q_loss: 7025.6748 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5559 Total reward: 199.0 Average reward fake: 0.5015157461166382 Average reward real: 0.5051789283752441 Training d_loss: 1.3793 Training g_loss: 0.6910 Training q_loss: 1896.6199 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5560 Total reward: 192.0 Average reward fake: 0.5042213797569275 Average reward real: 0.5104268789291382 Training d_loss: 1.3745 Training g_loss: 0.6872 Training q_loss: 1660.137

-------------------------------------------------------------------------------
Episode: 5581 Total reward: 180.0 Average reward fake: 0.47364625334739685 Average reward real: 0.6324995160102844 Training d_loss: 1.1671 Training g_loss: 2.1427 Training q_loss: 455.4430 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5582 Total reward: 199.0 Average reward fake: 0.3941724896430969 Average reward real: 0.416421502828598 Training d_loss: 1.4221 Training g_loss: 0.8493 Training q_loss: 625.3511 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5583 Total reward: 199.0 Average reward fake: 0.4883180260658264 Average reward real: 0.5363653302192688 Training d_loss: 1.3355 Training g_loss: 1.6272 Training q_loss: 960.7757 

-------------------------------------------------------------------------------
Episode: 5604 Total reward: 129.0 Average reward fake: 0.4616064131259918 Average reward real: 0.5803318619728088 Training d_loss: 1.3517 Training g_loss: 2.2873 Training q_loss: 687.6960 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5605 Total reward: 199.0 Average reward fake: 0.4237927794456482 Average reward real: 0.5478160381317139 Training d_loss: 1.2607 Training g_loss: 0.8900 Training q_loss: 674.6164 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5606 Total reward: 168.0 Average reward fake: 0.3973531723022461 Average reward real: 0.4769766926765442 Training d_loss: 1.3431 Training g_loss: 0.9483 Training q_loss: 426.5434 

-------------------------------------------------------------------------------
Episode: 5627 Total reward: 78.0 Average reward fake: 0.4221371114253998 Average reward real: 0.5858149528503418 Training d_loss: 1.1376 Training g_loss: 2.3692 Training q_loss: 543.6949 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5628 Total reward: 63.0 Average reward fake: 0.40005844831466675 Average reward real: 0.5934424996376038 Training d_loss: 1.1380 Training g_loss: 2.3557 Training q_loss: 1039.1475 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5629 Total reward: 70.0 Average reward fake: 0.4384703040122986 Average reward real: 0.5887936353683472 Training d_loss: 1.1828 Training g_loss: 2.3442 Training q_loss: 25017.9863

-------------------------------------------------------------------------------
Episode: 5650 Total reward: 53.0 Average reward fake: 0.5359723567962646 Average reward real: 0.536191463470459 Training d_loss: 1.4219 Training g_loss: 0.6409 Training q_loss: 1584.9607 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5651 Total reward: 104.0 Average reward fake: 0.5046069025993347 Average reward real: 0.5108965039253235 Training d_loss: 1.3766 Training g_loss: 0.6869 Training q_loss: 4347.1162 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5652 Total reward: 108.0 Average reward fake: 0.5554325580596924 Average reward real: 0.533328652381897 Training d_loss: 1.4428 Training g_loss: 0.5928 Training q_loss: 1317.6708 

-------------------------------------------------------------------------------
Episode: 5673 Total reward: 113.0 Average reward fake: 0.45993977785110474 Average reward real: 0.5596277117729187 Training d_loss: 1.2853 Training g_loss: 0.8111 Training q_loss: 1802.7360 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5674 Total reward: 110.0 Average reward fake: 0.4888893663883209 Average reward real: 0.6025879383087158 Training d_loss: 1.3389 Training g_loss: 1.9810 Training q_loss: 1546.6564 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5675 Total reward: 126.0 Average reward fake: 0.5285576581954956 Average reward real: 0.6695376634597778 Training d_loss: 1.2127 Training g_loss: 0.6701 Training q_loss: 41482.

-------------------------------------------------------------------------------
Episode: 5696 Total reward: 10.0 Average reward fake: 0.36815768480300903 Average reward real: 0.6828851699829102 Training d_loss: 0.9269 Training g_loss: 1.0015 Training q_loss: 58543.8359 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5697 Total reward: 9.0 Average reward fake: 0.41019004583358765 Average reward real: 0.5698458552360535 Training d_loss: 1.2824 Training g_loss: 0.9405 Training q_loss: 86214.5938 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5698 Total reward: 11.0 Average reward fake: 0.4073491096496582 Average reward real: 0.7213937640190125 Training d_loss: 1.0068 Training g_loss: 0.9490 Training q_loss: 84247.1

-------------------------------------------------------------------------------
Episode: 5719 Total reward: 184.0 Average reward fake: 0.4937669634819031 Average reward real: 0.5269633531570435 Training d_loss: 1.3477 Training g_loss: 0.7197 Training q_loss: 30582.3848 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5720 Total reward: 136.0 Average reward fake: 0.41483601927757263 Average reward real: 0.5295548439025879 Training d_loss: 1.2401 Training g_loss: 2.3664 Training q_loss: 53942.0117 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5721 Total reward: 157.0 Average reward fake: 0.45797595381736755 Average reward real: 0.4710441529750824 Training d_loss: 1.3738 Training g_loss: 0.7853 Training q_loss: 229

-------------------------------------------------------------------------------
Episode: 5742 Total reward: 128.0 Average reward fake: 0.4776069223880768 Average reward real: 0.5144141912460327 Training d_loss: 1.3449 Training g_loss: 0.7532 Training q_loss: 103736.6953 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5743 Total reward: 106.0 Average reward fake: 0.5126866698265076 Average reward real: 0.5126866698265076 Training d_loss: 1.4906 Training g_loss: 0.7105 Training q_loss: 1464317.2500 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5744 Total reward: 104.0 Average reward fake: 0.46183571219444275 Average reward real: 0.4809611439704895 Training d_loss: 1.3788 Training g_loss: 0.7850 Training q_loss: 2

-------------------------------------------------------------------------------
Episode: 5765 Total reward: 127.0 Average reward fake: 0.513434886932373 Average reward real: 0.5111732482910156 Training d_loss: 1.3919 Training g_loss: 0.6696 Training q_loss: 14257.7314 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5766 Total reward: 152.0 Average reward fake: 0.4990430772304535 Average reward real: 0.49904316663742065 Training d_loss: 1.3885 Training g_loss: 0.6963 Training q_loss: 15326.0840 Explore P: 0.0100
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Episode: 5767 Total reward: 159.0 Average reward fake: 0.5005778670310974 Average reward real: 0.513576328754425 Training d_loss: 1.3645 Training g_loss: 0.6948 Training q_loss: 15922.

## Visualizing training

Below I'll plot the total reward for each episode in grey, along with a 10-episode rolling average in blue.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0)) 
    return (cumsum[N:] - cumsum[:-N]) / N
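
As a quick sanity check on `running_mean`, here is a small sketch that repeats the definition above and applies it to a made-up five-element array (the input values are illustrative, not taken from the training run):

```python
import numpy as np

def running_mean(x, N):
    # Same definition as the cell above: prefix sums with a leading 0,
    # then the difference of sums N apart gives each window's total.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = running_mean(x, 2)
print(smoothed)        # [1.5 2.5 3.5 4.5]
print(len(smoothed))   # 4, i.e. len(x) - N + 1
```

The smoothed series is `N - 1` entries shorter than the input, which is why the plotting cells index the x-axis with `eps[-len(smoothed_arr):]`: each averaged value is lined up with the last episode in its window.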

In [None]:
eps, arr = np.array(rewards_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Total rewards')

In [None]:
eps, arr = np.array(d_loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('D losses')

In [None]:
eps, arr = np.array(g_loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('G losses')

In [None]:
eps, arr = np.array(q_loss_list).T
smoothed_arr = running_mean(arr, 10)
plt.plot(eps[-len(smoothed_arr):], smoothed_arr)
plt.plot(eps, arr, color='grey', alpha=0.3)
plt.xlabel('Episode')
plt.ylabel('Q losses')
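
The Q loss in the log above swings across several orders of magnitude, so a single spike can dominate a linear average. The sketch below uses three consecutive `q_loss` values copied from the log (episodes 5327 to 5329) to compare an arithmetic mean with a mean taken in log space. Averaging in log space is my suggestion here, not something the notebook does.

```python
import numpy as np

# Three consecutive q_loss values copied from the training log above
# (episodes 5327-5329); the middle one is a large spike.
q_losses = np.array([17787.1758, 2099382.2500, 33947.9141])

raw_mean = q_losses.mean()
log_mean = np.log10(q_losses).mean()

print(raw_mean)         # arithmetic mean, dominated by the spike
print(10 ** log_mean)   # geometric mean, closer to the typical values
```

If the Q-loss plot turns out to be hard to read for the same reason, calling `plt.yscale('log')` in that cell is one way to try a log-scaled axis.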

## Testing

Let's check out how our trained agent plays the game.

In [41]:
test_episodes = 10
test_max_steps = 1000
# Keep the initial observation so `state` is defined before the first step
state = env.reset()
with tf.Session() as sess:
    #saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    # Restore the trained model
    saver.restore(sess, 'checkpoints/DQAN-cartpole.ckpt')
    
    # Run the test episodes
    for ep in range(test_episodes):
        
        # Number of environment steps taken in this episode
        t = 0
        while t < test_max_steps:
            env.render()
            
            # Get action from DQAN
            feed_dict = {model.states: state.reshape((1, *state.shape))}
            actions_logits = sess.run(model.actions_logits, feed_dict)
            action = np.argmax(actions_logits)
            
            # Take action, get new state and reward
            next_state, reward, done, _ = env.step(action)
            
            # If the episode is over, end this loop and reset the environment
            if done:
                t = test_max_steps
                env.reset()
                
                # Take one random step to get the pole and cart moving
                state, reward, done, _ = env.step(env.action_space.sample())
            else:
                state = next_state
                t += 1

INFO:tensorflow:Restoring parameters from checkpoints/DQAN-cartpole.ckpt
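
The action-selection step in the test loop can be tried on its own without a trained network. This is a minimal sketch with a hypothetical observation and made-up logits standing in for `sess.run(model.actions_logits, feed_dict)`. Only the reshape and `np.argmax` calls mirror the cell above.

```python
import numpy as np

# Hypothetical CartPole observation: cart position, cart velocity,
# pole angle, pole angular velocity.
state = np.array([0.02, -0.01, 0.03, 0.04])

# Add a batch dimension, as in the test loop: shape (4,) -> (1, 4)
batch = state.reshape((1, *state.shape))

# Made-up network output for that batch: one row of two action scores.
actions_logits = np.array([[0.3, 1.2]])

# np.argmax without an axis works on the flattened array, so for a
# single-row batch it returns the index of the highest-scoring action.
action = int(np.argmax(actions_logits))
print(batch.shape, action)  # (1, 4) 1
```

Because `np.argmax` flattens its input when no axis is given, this only picks a valid action index when the batch holds a single state. For larger batches you would pass `axis=1`.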


In [26]:
env.close()

## Extending this to Deep Convolutional QAN

Cart-Pole is a pretty simple game. However, the same model can be used to train an agent to play something much more complicated, like Pong or Space Invaders. Instead of the four-number state we're using here, though, you'd want to use convolutional layers to extract the state from the screen images.

![Deep Q-Learning Atari](assets/atari-network.png)

I'll leave it as a challenge for you to use deep Q-learning to train an agent to play Atari games. Here's the original paper, which will get you started: http://www.davidqiu.com:8888/research/nature14236.pdf.
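
As a starting point for that challenge, here is a rough sketch of how screen images might be turned into an input for convolutional layers. The helper names `preprocess` and `stack_frames` are hypothetical, the frames are random stand-ins for 210x160 RGB Atari screens, and the every-second-pixel downsampling is a simplification of my own. The paper uses its own preprocessing (84x84 grayscale images, with the most recent four frames stacked).

```python
import numpy as np

def preprocess(frame):
    """Convert an RGB frame of shape (H, W, 3) to a small grayscale image in [0, 1]."""
    gray = frame.astype(np.float32).mean(axis=2)  # average the color channels
    small = gray[::2, ::2]                        # keep every second pixel
    return small / 255.0

def stack_frames(frames):
    """Stack preprocessed frames along a last (channel) axis."""
    return np.stack(frames, axis=-1)

# Random stand-ins for four consecutive 210x160 RGB screens.
rng = np.random.default_rng(0)
screens = [rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
           for _ in range(4)]

state = stack_frames([preprocess(s) for s in screens])
print(state.shape)  # (105, 80, 4)
```

Stacking several consecutive frames lets the network see motion, such as the direction the ball is travelling in Pong, which a single still image cannot convey.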