# CPSC 533V: Assignment 3 - Behavioral Cloning and Deep Q Learning

## 48 points total (9% of final grade)

Name: Haomiao Zhang Student Number: 33074155

---
This assignment will help you transition from tabular approaches, the topic of HW 2, to deep neural network approaches. You will implement the [Atari DQN / Deep Q-Learning](https://arxiv.org/abs/1312.5602) algorithm, which arguably kicked off the modern Deep Reinforcement Learning craze.

In this assignment we will use PyTorch as our deep learning framework. To familiarize yourself with PyTorch, your first task is to use a behavior cloning (BC) approach to learn a policy. Behavior cloning is a supervised learning method in which there exists a dataset of expert demonstrations (state-action pairs) and the goal is to learn a policy $\pi$ that mimics this expert. At any given state, your policy should choose the same action the expert would.

Since BC avoids the need to collect data from the policy you are trying to learn, it is relatively simple. 
This makes it a nice stepping stone for implementing DQN. Furthermore, BC is relevant to modern approaches: for example, it is used as an initialization for systems like [AlphaGo][go] and [AlphaStar][star], which then use RL to further adapt the BC result.

<!--

I feel like this might be better suited to going lower in the document:

Unfortunately, in many tasks it is not possible to collect good expert demonstrations for an environment, and this is where reinforcement learning comes in handy. Through the reward signal obtained by interacting with the environment, the agent learns by itself what a good policy is and can learn to outperform the experts.

-->

Goals:
- Familiarize yourself with PyTorch and its API, including models, datasets, and dataloaders.
- Implement a supervised learning approach (behavioral cloning) to learn a policy.
- Implement the DQN objective and learn a policy through environment interaction.

[go]:  https://deepmind.com/research/case-studies/alphago-the-story-so-far
[star]: https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii

## Submission information

- Complete the assignment by editing and executing the associated Python files.
- Copy and paste the code and the terminal output requested into the predefined cells in this Jupyter notebook.
- When done, upload the completed Jupyter notebook (ipynb file) to Canvas.

## Task 0: Preliminaries

### PyTorch

If you have never used PyTorch before, we recommend you follow the [60 Minute Blitz][blitz] tutorial from the official website. It should give you enough context to be able to complete the assignment.


**If you have issues, post questions to Piazza**

### Installation

To install all required python packages:

```
python3 -m pip install -r requirements.txt
```

### Debugging


You can include `import ipdb; ipdb.set_trace()` in your code. Execution will pause at that point and drop you into an interactive debugger, where you can inspect variables and test out expressions. We recommend this as an effective method to debug the algorithms.


[blitz]: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

## Task 1: Behavioral Cloning

Behavioral Cloning is a type of supervised learning in which you are given a dataset of expert demonstration tuples $(s, a)$ and the goal is to learn a policy function $\hat a = \pi(s)$, such that $\hat a = a$.

The optimization objective is $\min_\theta D(\pi(s), a)$ where $\theta$ are the parameters of the policy $\pi$, in our case the weights of a neural network, and where $D$ represents some difference between the actions.
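
For discrete actions, one common choice for $D$ is the cross-entropy (negative log-likelihood) of the expert action under the policy's predicted distribution. The sketch below is only an illustration of that idea: the helper names `softmax` and `bc_loss` and the example scores are assumptions, not part of the provided files.

```python
import math

def softmax(scores):
    """Turn unnormalized action scores into probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bc_loss(scores, expert_action):
    """Cross-entropy between the policy's prediction and one expert action."""
    probs = softmax(scores)
    return -math.log(probs[expert_action])

# Hypothetical policy scores for a two-action problem such as CartPole.
scores = [2.0, 0.0]
loss_agree = bc_loss(scores, expert_action=0)     # policy prefers the expert's action
loss_disagree = bc_loss(scores, expert_action=1)  # policy prefers the other action
print(f"agree: {loss_agree:.4f}, disagree: {loss_disagree:.4f}")
```

The loss is small when the policy puts most of its probability on the expert's action and large otherwise, which is what minimizing $D$ over $\theta$ exploits.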

---

Before starting, we suggest reading through the provided files.

For Behavioral Cloning, the important files to understand are: `model.py`, `dataset.py` and `bc.py`.

- The file `model.py` has the skeleton for the model (which you will have to complete in the following questions),

- The file `dataset.py` has the skeleton for the dataset the model is being trained with,

- and, `bc.py` will have all the structure for training the model with the dataset.


### 1.1 Dataset

We provide a pickle file with pre-collected expert demonstrations on CartPole from which to learn the policy $\pi$. The data has been collected from an expert policy on the environment, with the addition of a small amount of Gaussian noise to the actions.

The pickle file contains a list of (state, action) tuples stored as `numpy` values, in the following form:

```
[(state s, action a), (state s, action a), (state s, action a), ...]
```

In the `dataset.py` file, we provide skeleton code for creating a custom dataset. The provided code shows how to load the file.

Your goal is to override the `__getitem__` function so that it returns a dictionary of tensors of the correct type.

Hint: Look in the `bc.py` file to understand how the dataset is used.
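
To make the dataset and dataloader interplay concrete, here is a minimal sketch with synthetic data. `ToyExpertDataset` is a hypothetical stand-in for the class in `dataset.py`, and the dictionary keys `state` and `action` are assumptions about what the training loop expects.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyExpertDataset(Dataset):
    """Hypothetical stand-in for the dataset in dataset.py, filled with synthetic data."""

    def __init__(self, num_samples=8):
        rng = np.random.default_rng(0)
        # Same layout as the pickle file: a list of (state, action) tuples.
        self.data = [
            (rng.normal(size=4).astype(np.float32), int(rng.integers(0, 2)))
            for _ in range(num_samples)
        ]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        state, action = self.data[index]
        return {
            "state": torch.as_tensor(state, dtype=torch.float32),
            "action": torch.as_tensor(action, dtype=torch.long),
        }

dataset = ToyExpertDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch = next(iter(loader))  # the default collate function stacks each dictionary entry
print(batch["state"].shape, batch["action"].shape)
```

Returning a dictionary from `__getitem__` means each batch is also a dictionary, with one stacked tensor per key.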

Answer the following questions:

- **[QUESTION 2 points]** Insert your code in the placeholder below.

In [5]:
# PLACEHOLDER TO INSERT YOUR __getitem__ method here

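def __getitem__(self, index):
    state, action = self.data[index]  # one (state, action) pair stored as numpy values
    # float32 states match the network weights; integer class labels suit a classification loss
    return {'state': torch.as_tensor(state, dtype=torch.float32),
            'action': torch.as_tensor(action, dtype=torch.long)}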

- **[QUESTION 2 points]** How big is the dataset provided?

The dataset has a length of 99660.

- **[QUESTION 2 points]** What is the dimensionality of $s$ and what range does each dimension of $s$ span?  I.e., how much of the state space does the expert data cover?

The state has 4 dimensions. The ranges covered by the expert data in each dimension are [-0.7227, 2.3995], [-0.4330, 1.8470], [-0.0501, 0.1464], and [-0.3812, 0.4714]. These numbers were obtained by taking the minimum and maximum of each state dimension over the dataset.

- **[QUESTION 2 points]** What are the dimensionalities and ranges of the action $a$ in the dataset (how much of the action space does the expert data cover)?

The action has 1 dimension, and the expert data covers both values of the action space, {0, 1}, which is 100% coverage.
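
The per-dimension ranges and the set of observed actions can be computed along the following lines. This is a sketch on synthetic data: the `data` list below only imitates the layout of the pickle file, so the printed numbers are not the ones reported above.

```python
import numpy as np

# Synthetic stand-in for the list of (state, action) tuples loaded from the pickle file.
rng = np.random.default_rng(0)
data = [(rng.uniform(-1.0, 1.0, size=4), int(rng.integers(0, 2))) for _ in range(1000)]

states = np.stack([state for state, _ in data])     # shape: (num_samples, 4)
actions = np.array([action for _, action in data])  # shape: (num_samples,)

state_min = states.min(axis=0)  # per-dimension minimum
state_max = states.max(axis=0)  # per-dimension maximum
unique_actions = np.unique(actions)

for dim, (low, high) in enumerate(zip(state_min, state_max)):
    print(f"state dim {dim}: [{low:.4f}, {high:.4f}]")
print("actions seen:", unique_actions.tolist())
```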


### 1.2 Environment

Recall the state and action space of CartPole, from the previous assignment.

- **[QUESTION 2 points]** Considering the full state and action spaces, do you think the provided expert dataset has good coverage?  Why or why not? How might this impact the performance of our cloned policy?

The expert data does not cover all of the state space. For example, it does not cover cases where the cart is far in the negative position, has a large negative velocity, or where the pole has a large negative angle. Insufficient coverage is likely to decrease the performance of our cloned policy: the policy has seen no demonstrations for those states, and there may be better behaviors that the expert never demonstrated.

### 1.3 Model

The file `model.py` provides skeleton code for the model. Your goal is to create the architecture of the network by adding layers that map the input to output.

You will need to update the `__init__` method and the `forward` method.

The `select_action` method has already been written for you.  This should be used when running the policy in the environment, while the `forward` function should be used at training time.

- **[QUESTION 5 points]** Insert your code in the placeholder below.

In [4]:
# PLACEHOLDER TO INSERT YOUR MyModel class here

class MyModel(nn.Module):
    def __init__(self, state_size, action_size):
        super(MyModel, self).__init__()

        # Two hidden layers of 64 units each, following the advice on Piazza;
        # the syntax follows the 60 Minute Blitz tutorial.
        self.hd1 = nn.Linear(state_size, 64)
        self.hd2 = nn.Linear(64, 64)
        self.output = nn.Linear(64, action_size)

    def forward(self, x):
        # Cast the input to float32 before the first layer so it matches the weights.
        x = F.relu(self.hd1(x.float()))
        x = F.relu(self.hd2(x))
        return self.output(x)  # one raw score per action

    def select_action(self, state):
        self.eval()
        x = self.forward(state)
        self.train()
        return x.max(1)[1].view(1, 1).to(torch.long)

Answer the following questions:

- **[QUESTION 2 points]** What is the input of the network?

The input of the network is the state. The dimension is 4 for each example.

- **[QUESTION 2 points]** What is the output?

The output of the network is one unnormalized score (logit) per action; applying a softmax to these scores gives the probability of each action. The dimension is 2 for each example.
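
When a network ends in a plain linear layer and is trained with `torch.nn.CrossEntropyLoss`, the softmax is applied inside the loss, so the raw outputs are scores rather than normalized probabilities. A small illustration with made-up values:

```python
import torch

# Hypothetical raw network outputs for a batch of two states and two actions.
logits = torch.tensor([[2.0, 0.0], [-1.0, 1.0]])

probs = torch.softmax(logits, dim=1)   # each row now sums to 1
greedy_actions = logits.argmax(dim=1)  # softmax preserves the ordering within a row
print(probs)
print(greedy_actions)
```

Because softmax is monotonic, taking the argmax of the raw scores selects the same action as taking the argmax of the probabilities.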


### 1.4 Training

The file `bc.py` is the entry point for training your behavioral cloning model. The skeleton and the main components are already there.

The missing parts for you to do are:

- Initializing the model
- Choosing a loss function
- Choosing an optimizer
- Playing with hyperparameters to train your model.

- **[QUESTION 5 points]** Insert your code in the placeholder below.

In [5]:
# PLACEHOLDER FOR YOUR CODE HERE
# HOW DID YOU INITIALIZE YOUR MODEL, OPTIMIZER AND LOSS FUNCTIONS? PASTE HERE YOUR FINAL CODE
# NOTE: YOU CAN KEEP THE FOLLOWING LINES COMMENTED OUT, AS RUNNING THIS CELL WILL PROBABLY RESULT IN ERRORS

model = MyModel(4,2)
optimizer = torch.optim.SGD(model.parameters(),lr=LEARNING_RATE)

# cross-entropy is a loss function for classification, and its expected input format matches the model output
loss_function = torch.nn.CrossEntropyLoss()
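
For reference, the way a model, an optimizer, and a loss function fit together in one supervised training loop can be sketched as follows. This is not the loop from `bc.py`: the data is synthetic, the network is a small stand-in, and the learning rate and step count are arbitrary choices for the illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic, linearly separable data: the label is 1 when the first feature is positive.
states = torch.randn(256, 4)
actions = (states[:, 0] > 0).long()

# Small stand-in network; the sizes mirror a 4-dimensional state and 2 actions.
model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = nn.CrossEntropyLoss()

losses = []
for step in range(200):
    logits = model(states)                 # forward pass on the whole toy batch
    loss = loss_function(logits, actions)  # compare predictions with the labels
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss.backward()                        # backpropagate
    optimizer.step()                       # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```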

You can run your code by doing:

```
python3 bc.py
```

**During all of this assignment, the code in `eval_policy.py` will be your best friend.** At any time, you can test your model by giving as argument the path to the model weights and the environment name using the following command:

```
python3 eval_policy.py --model-path /path/to/model/weights --env ENV_NAME
```

In [None]:
# output of training phase, running python3 bc.py
[epoch    1/100] [iter       0] [loss 0.69714]
[epoch    1/100] [iter     500] [loss 0.68122]
[epoch    1/100] [iter    1000] [loss 0.66725]
[epoch    1/100] [iter    1500] [loss 0.65466]
[epoch    2/100] [iter    2000] [loss 0.63796]
[epoch    2/100] [iter    2500] [loss 0.62442]
[epoch    2/100] [iter    3000] [loss 0.60190]
[Test on environment] [epoch 2/100] [score 53.40]
[epoch    3/100] [iter    3500] [loss 0.58616]
[epoch    3/100] [iter    4000] [loss 0.56452]
[epoch    3/100] [iter    4500] [loss 0.53839]
[epoch    4/100] [iter    5000] [loss 0.52452]
[epoch    4/100] [iter    5500] [loss 0.48137]
[epoch    4/100] [iter    6000] [loss 0.47635]
[Test on environment] [epoch 4/100] [score 61.50]
[epoch    5/100] [iter    6500] [loss 0.45083]
[epoch    5/100] [iter    7000] [loss 0.43100]
[epoch    5/100] [iter    7500] [loss 0.37357]
[epoch    6/100] [iter    8000] [loss 0.38259]
[epoch    6/100] [iter    8500] [loss 0.36017]
[epoch    6/100] [iter    9000] [loss 0.36235]
[Test on environment] [epoch 6/100] [score 60.50]
[epoch    7/100] [iter    9500] [loss 0.36913]
[epoch    7/100] [iter   10000] [loss 0.30188]
[epoch    7/100] [iter   10500] [loss 0.34008]
[epoch    8/100] [iter   11000] [loss 0.33769]
[epoch    8/100] [iter   11500] [loss 0.29680]
[epoch    8/100] [iter   12000] [loss 0.21036]
[Test on environment] [epoch 8/100] [score 69.60]
[epoch    9/100] [iter   12500] [loss 0.29051]
[epoch    9/100] [iter   13000] [loss 0.25852]
[epoch    9/100] [iter   13500] [loss 0.25409]
[epoch    9/100] [iter   14000] [loss 0.36082]
[epoch   10/100] [iter   14500] [loss 0.30018]
[epoch   10/100] [iter   15000] [loss 0.29064]
[epoch   10/100] [iter   15500] [loss 0.21888]
[Test on environment] [epoch 10/100] [score 77.20]
[epoch   11/100] [iter   16000] [loss 0.24531]
[epoch   11/100] [iter   16500] [loss 0.24884]
[epoch   11/100] [iter   17000] [loss 0.24639]
[epoch   12/100] [iter   17500] [loss 0.22901]
[epoch   12/100] [iter   18000] [loss 0.28408]
[epoch   12/100] [iter   18500] [loss 0.25482]
[Test on environment] [epoch 12/100] [score 108.40]
[epoch   13/100] [iter   19000] [loss 0.31272]
[epoch   13/100] [iter   19500] [loss 0.18302]
[epoch   13/100] [iter   20000] [loss 0.17613]
[epoch   14/100] [iter   20500] [loss 0.24361]
[epoch   14/100] [iter   21000] [loss 0.19487]
[epoch   14/100] [iter   21500] [loss 0.25849]
[Test on environment] [epoch 14/100] [score 116.70]
[epoch   15/100] [iter   22000] [loss 0.24766]
[epoch   15/100] [iter   22500] [loss 0.21922]
[epoch   15/100] [iter   23000] [loss 0.16931]
[epoch   16/100] [iter   23500] [loss 0.26442]
[epoch   16/100] [iter   24000] [loss 0.11117]
[epoch   16/100] [iter   24500] [loss 0.28343]
[Test on environment] [epoch 16/100] [score 140.90]
[epoch   17/100] [iter   25000] [loss 0.17825]
[epoch   17/100] [iter   25500] [loss 0.16142]
[epoch   17/100] [iter   26000] [loss 0.20124]
[epoch   18/100] [iter   26500] [loss 0.14321]
[epoch   18/100] [iter   27000] [loss 0.23828]
[epoch   18/100] [iter   27500] [loss 0.20905]
[epoch   18/100] [iter   28000] [loss 0.32970]
[Test on environment] [epoch 18/100] [score 155.10]
[epoch   19/100] [iter   28500] [loss 0.12756]
[epoch   19/100] [iter   29000] [loss 0.24492]
[epoch   19/100] [iter   29500] [loss 0.23846]
[epoch   20/100] [iter   30000] [loss 0.14152]
[epoch   20/100] [iter   30500] [loss 0.13408]
[epoch   20/100] [iter   31000] [loss 0.23141]
[Test on environment] [epoch 20/100] [score 186.90]
[epoch   21/100] [iter   31500] [loss 0.21407]
[epoch   21/100] [iter   32000] [loss 0.14571]
[epoch   21/100] [iter   32500] [loss 0.19157]
[epoch   22/100] [iter   33000] [loss 0.09238]
[epoch   22/100] [iter   33500] [loss 0.12021]
[epoch   22/100] [iter   34000] [loss 0.13936]
[Test on environment] [epoch 22/100] [score 159.00]
[epoch   23/100] [iter   34500] [loss 0.17434]
[epoch   23/100] [iter   35000] [loss 0.12500]
[epoch   23/100] [iter   35500] [loss 0.24270]
[epoch   24/100] [iter   36000] [loss 0.17178]
[epoch   24/100] [iter   36500] [loss 0.14341]
[epoch   24/100] [iter   37000] [loss 0.13779]
[Test on environment] [epoch 24/100] [score 173.80]
[epoch   25/100] [iter   37500] [loss 0.11098]
[epoch   25/100] [iter   38000] [loss 0.11585]
[epoch   25/100] [iter   38500] [loss 0.12512]
[epoch   26/100] [iter   39000] [loss 0.09220]
[epoch   26/100] [iter   39500] [loss 0.13928]
[epoch   26/100] [iter   40000] [loss 0.20441]
[epoch   26/100] [iter   40500] [loss 0.13053]
[Test on environment] [epoch 26/100] [score 190.00]
[epoch   27/100] [iter   41000] [loss 0.13350]
[epoch   27/100] [iter   41500] [loss 0.12575]
[epoch   27/100] [iter   42000] [loss 0.07754]
[epoch   28/100] [iter   42500] [loss 0.12791]
[epoch   28/100] [iter   43000] [loss 0.11287]
[epoch   28/100] [iter   43500] [loss 0.25251]
[Test on environment] [epoch 28/100] [score 189.60]
[epoch   29/100] [iter   44000] [loss 0.16524]
[epoch   29/100] [iter   44500] [loss 0.12574]
[epoch   29/100] [iter   45000] [loss 0.05651]
[epoch   30/100] [iter   45500] [loss 0.15205]
[epoch   30/100] [iter   46000] [loss 0.08784]
[epoch   30/100] [iter   46500] [loss 0.12887]
[Test on environment] [epoch 30/100] [score 196.70]
[epoch   31/100] [iter   47000] [loss 0.08194]
[epoch   31/100] [iter   47500] [loss 0.12054]
[epoch   31/100] [iter   48000] [loss 0.10460]
[epoch   32/100] [iter   48500] [loss 0.12993]
[epoch   32/100] [iter   49000] [loss 0.10380]
[epoch   32/100] [iter   49500] [loss 0.11424]
[Test on environment] [epoch 32/100] [score 194.00]
[epoch   33/100] [iter   50000] [loss 0.12923]
[epoch   33/100] [iter   50500] [loss 0.07099]
[epoch   33/100] [iter   51000] [loss 0.08171]
[epoch   34/100] [iter   51500] [loss 0.09515]
[epoch   34/100] [iter   52000] [loss 0.07492]
[epoch   34/100] [iter   52500] [loss 0.13357]
[Test on environment] [epoch 34/100] [score 200.00]
[epoch   35/100] [iter   53000] [loss 0.13564]
[epoch   35/100] [iter   53500] [loss 0.08119]
[epoch   35/100] [iter   54000] [loss 0.06457]
[epoch   35/100] [iter   54500] [loss 0.18283]
[epoch   36/100] [iter   55000] [loss 0.10442]
[epoch   36/100] [iter   55500] [loss 0.10068]
[epoch   36/100] [iter   56000] [loss 0.07301]
[Test on environment] [epoch 36/100] [score 198.20]
[epoch   37/100] [iter   56500] [loss 0.14000]
[epoch   37/100] [iter   57000] [loss 0.13257]
[epoch   37/100] [iter   57500] [loss 0.07662]
[epoch   38/100] [iter   58000] [loss 0.08104]
[epoch   38/100] [iter   58500] [loss 0.13946]
[epoch   38/100] [iter   59000] [loss 0.04721]
[Test on environment] [epoch 38/100] [score 198.00]
[epoch   39/100] [iter   59500] [loss 0.09174]
[epoch   39/100] [iter   60000] [loss 0.13207]
[epoch   39/100] [iter   60500] [loss 0.08296]
[epoch   40/100] [iter   61000] [loss 0.08297]
[epoch   40/100] [iter   61500] [loss 0.08201]
[epoch   40/100] [iter   62000] [loss 0.06275]
[Test on environment] [epoch 40/100] [score 199.60]
[epoch   41/100] [iter   62500] [loss 0.07500]
[epoch   41/100] [iter   63000] [loss 0.07306]
[epoch   41/100] [iter   63500] [loss 0.00711]
[epoch   42/100] [iter   64000] [loss 0.07645]
[epoch   42/100] [iter   64500] [loss 0.12512]
[epoch   42/100] [iter   65000] [loss 0.07498]
[Test on environment] [epoch 42/100] [score 198.70]
[epoch   43/100] [iter   65500] [loss 0.05814]
[epoch   43/100] [iter   66000] [loss 0.14734]
[epoch   43/100] [iter   66500] [loss 0.09420]
[epoch   44/100] [iter   67000] [loss 0.06559]
[epoch   44/100] [iter   67500] [loss 0.08785]
[epoch   44/100] [iter   68000] [loss 0.01799]
[epoch   44/100] [iter   68500] [loss 0.04748]
[Test on environment] [epoch 44/100] [score 198.90]
[epoch   45/100] [iter   69000] [loss 0.06215]
[epoch   45/100] [iter   69500] [loss 0.14878]
[epoch   45/100] [iter   70000] [loss 0.04379]
[epoch   46/100] [iter   70500] [loss 0.08242]
[epoch   46/100] [iter   71000] [loss 0.14611]
[epoch   46/100] [iter   71500] [loss 0.03529]
[Test on environment] [epoch 46/100] [score 199.30]
[epoch   47/100] [iter   72000] [loss 0.11181]
[epoch   47/100] [iter   72500] [loss 0.05420]
[epoch   47/100] [iter   73000] [loss 0.06542]
[epoch   48/100] [iter   73500] [loss 0.02776]
[epoch   48/100] [iter   74000] [loss 0.06160]
[epoch   48/100] [iter   74500] [loss 0.05815]
[Test on environment] [epoch 48/100] [score 198.20]
[epoch   49/100] [iter   75000] [loss 0.07618]
[epoch   49/100] [iter   75500] [loss 0.04137]
[epoch   49/100] [iter   76000] [loss 0.06780]
[epoch   50/100] [iter   76500] [loss 0.05133]
[epoch   50/100] [iter   77000] [loss 0.05768]
[epoch   50/100] [iter   77500] [loss 0.05279]
[Test on environment] [epoch 50/100] [score 196.50]
[epoch   51/100] [iter   78000] [loss 0.06458]
[epoch   51/100] [iter   78500] [loss 0.02706]
[epoch   51/100] [iter   79000] [loss 0.15182]
[epoch   52/100] [iter   79500] [loss 0.02669]
[epoch   52/100] [iter   80000] [loss 0.03993]
[epoch   52/100] [iter   80500] [loss 0.04363]
[epoch   52/100] [iter   81000] [loss 0.04641]
[Test on environment] [epoch 52/100] [score 199.80]
[epoch   53/100] [iter   81500] [loss 0.04811]
[epoch   53/100] [iter   82000] [loss 0.07831]
[epoch   53/100] [iter   82500] [loss 0.13590]
[epoch   54/100] [iter   83000] [loss 0.05363]
[epoch   54/100] [iter   83500] [loss 0.01846]
[epoch   54/100] [iter   84000] [loss 0.07821]
[Test on environment] [epoch 54/100] [score 195.70]
[epoch   55/100] [iter   84500] [loss 0.06859]
[epoch   55/100] [iter   85000] [loss 0.04701]
[epoch   55/100] [iter   85500] [loss 0.05324]
[epoch   56/100] [iter   86000] [loss 0.06712]
[epoch   56/100] [iter   86500] [loss 0.06254]
[epoch   56/100] [iter   87000] [loss 0.04174]
[Test on environment] [epoch 56/100] [score 199.00]
[epoch   57/100] [iter   87500] [loss 0.11631]
[epoch   57/100] [iter   88000] [loss 0.05255]
[epoch   57/100] [iter   88500] [loss 0.06074]
[epoch   58/100] [iter   89000] [loss 0.08504]
[epoch   58/100] [iter   89500] [loss 0.05822]
[epoch   58/100] [iter   90000] [loss 0.04474]
[Test on environment] [epoch 58/100] [score 198.00]
[epoch   59/100] [iter   90500] [loss 0.04967]
[epoch   59/100] [iter   91000] [loss 0.04785]
[epoch   59/100] [iter   91500] [loss 0.04461]
[epoch   60/100] [iter   92000] [loss 0.02715]
[epoch   60/100] [iter   92500] [loss 0.04390]
[epoch   60/100] [iter   93000] [loss 0.03712]
[Test on environment] [epoch 60/100] [score 200.00]
[epoch   61/100] [iter   93500] [loss 0.04944]
[epoch   61/100] [iter   94000] [loss 0.06080]
[epoch   61/100] [iter   94500] [loss 0.02200]
[epoch   61/100] [iter   95000] [loss 0.05759]
[epoch   62/100] [iter   95500] [loss 0.04561]
[epoch   62/100] [iter   96000] [loss 0.05548]
[epoch   62/100] [iter   96500] [loss 0.05217]
[Test on environment] [epoch 62/100] [score 197.60]
[epoch   63/100] [iter   97000] [loss 0.08831]
[epoch   63/100] [iter   97500] [loss 0.02987]
[epoch   63/100] [iter   98000] [loss 0.04199]
[epoch   64/100] [iter   98500] [loss 0.04198]
[epoch   64/100] [iter   99000] [loss 0.03689]
[epoch   64/100] [iter   99500] [loss 0.04334]
[Test on environment] [epoch 64/100] [score 198.50]
[epoch   65/100] [iter  100000] [loss 0.04235]
[epoch   65/100] [iter  100500] [loss 0.05102]
[epoch   65/100] [iter  101000] [loss 0.07311]
[epoch   66/100] [iter  101500] [loss 0.06371]
[epoch   66/100] [iter  102000] [loss 0.03966]
[epoch   66/100] [iter  102500] [loss 0.04935]
[Test on environment] [epoch 66/100] [score 198.50]
[epoch   67/100] [iter  103000] [loss 0.05879]
[epoch   67/100] [iter  103500] [loss 0.06695]
[epoch   67/100] [iter  104000] [loss 0.05565]
[epoch   68/100] [iter  104500] [loss 0.03269]
[epoch   68/100] [iter  105000] [loss 0.08685]
[epoch   68/100] [iter  105500] [loss 0.06411]
[Test on environment] [epoch 68/100] [score 200.00]
[epoch   69/100] [iter  106000] [loss 0.04937]
[epoch   69/100] [iter  106500] [loss 0.04324]
[epoch   69/100] [iter  107000] [loss 0.05792]
[epoch   69/100] [iter  107500] [loss 0.04539]
[epoch   70/100] [iter  108000] [loss 0.14215]
[epoch   70/100] [iter  108500] [loss 0.04782]
[epoch   70/100] [iter  109000] [loss 0.06069]
[Test on environment] [epoch 70/100] [score 199.60]
[epoch   71/100] [iter  109500] [loss 0.03340]
[epoch   71/100] [iter  110000] [loss 0.06787]
[epoch   71/100] [iter  110500] [loss 0.06662]
[epoch   72/100] [iter  111000] [loss 0.03403]
[epoch   72/100] [iter  111500] [loss 0.08459]
[epoch   72/100] [iter  112000] [loss 0.04408]
[Test on environment] [epoch 72/100] [score 200.00]
[epoch   73/100] [iter  112500] [loss 0.02905]
[epoch   73/100] [iter  113000] [loss 0.04234]
[epoch   73/100] [iter  113500] [loss 0.04563]
[epoch   74/100] [iter  114000] [loss 0.04804]
[epoch   74/100] [iter  114500] [loss 0.05245]
[epoch   74/100] [iter  115000] [loss 0.05299]
[Test on environment] [epoch 74/100] [score 200.00]
[epoch   75/100] [iter  115500] [loss 0.02796]
[epoch   75/100] [iter  116000] [loss 0.05554]
[epoch   75/100] [iter  116500] [loss 0.03132]
[epoch   76/100] [iter  117000] [loss 0.03151]
[epoch   76/100] [iter  117500] [loss 0.02985]
[epoch   76/100] [iter  118000] [loss 0.02627]
[Test on environment] [epoch 76/100] [score 198.70]
[epoch   77/100] [iter  118500] [loss 0.04323]
[epoch   77/100] [iter  119000] [loss 0.04289]
[epoch   77/100] [iter  119500] [loss 0.03098]
[epoch   78/100] [iter  120000] [loss 0.03663]
[epoch   78/100] [iter  120500] [loss 0.04377]
[epoch   78/100] [iter  121000] [loss 0.04071]
[epoch   78/100] [iter  121500] [loss 0.06311]
[Test on environment] [epoch 78/100] [score 200.00]
[epoch   79/100] [iter  122000] [loss 0.01713]
[epoch   79/100] [iter  122500] [loss 0.07172]
[epoch   79/100] [iter  123000] [loss 0.04676]
[epoch   80/100] [iter  123500] [loss 0.05198]
[epoch   80/100] [iter  124000] [loss 0.02204]
[epoch   80/100] [iter  124500] [loss 0.06332]
[Test on environment] [epoch 80/100] [score 198.90]
[epoch   81/100] [iter  125000] [loss 0.04733]
[epoch   81/100] [iter  125500] [loss 0.02210]
[epoch   81/100] [iter  126000] [loss 0.03909]
[epoch   82/100] [iter  126500] [loss 0.01423]
[epoch   82/100] [iter  127000] [loss 0.03103]
[epoch   82/100] [iter  127500] [loss 0.07605]
[Test on environment] [epoch 82/100] [score 200.00]
[epoch   83/100] [iter  128000] [loss 0.04798]
[epoch   83/100] [iter  128500] [loss 0.07016]
[epoch   83/100] [iter  129000] [loss 0.01939]
[epoch   84/100] [iter  129500] [loss 0.02238]
[epoch   84/100] [iter  130000] [loss 0.13405]
[epoch   84/100] [iter  130500] [loss 0.02822]
[Test on environment] [epoch 84/100] [score 200.00]
[epoch   85/100] [iter  131000] [loss 0.01473]
[epoch   85/100] [iter  131500] [loss 0.05388]
[epoch   85/100] [iter  132000] [loss 0.06503]
[epoch   86/100] [iter  132500] [loss 0.01272]
[epoch   86/100] [iter  133000] [loss 0.05862]
[epoch   86/100] [iter  133500] [loss 0.06432]
[Test on environment] [epoch 86/100] [score 198.30]
[epoch   87/100] [iter  134000] [loss 0.04333]
[epoch   87/100] [iter  134500] [loss 0.04263]
[epoch   87/100] [iter  135000] [loss 0.03157]
[epoch   87/100] [iter  135500] [loss 0.02746]
[epoch   88/100] [iter  136000] [loss 0.03820]
[epoch   88/100] [iter  136500] [loss 0.01769]
[epoch   88/100] [iter  137000] [loss 0.03962]
[Test on environment] [epoch 88/100] [score 200.00]
[epoch   89/100] [iter  137500] [loss 0.02568]
[epoch   89/100] [iter  138000] [loss 0.04549]
[epoch   89/100] [iter  138500] [loss 0.03729]
[epoch   90/100] [iter  139000] [loss 0.11098]
[epoch   90/100] [iter  139500] [loss 0.04339]
[epoch   90/100] [iter  140000] [loss 0.08004]
[Test on environment] [epoch 90/100] [score 197.90]
[epoch   91/100] [iter  140500] [loss 0.01768]
[epoch   91/100] [iter  141000] [loss 0.08420]
[epoch   91/100] [iter  141500] [loss 0.03575]
[epoch   92/100] [iter  142000] [loss 0.01235]
[epoch   92/100] [iter  142500] [loss 0.04670]
[epoch   92/100] [iter  143000] [loss 0.02943]
[Test on environment] [epoch 92/100] [score 200.00]
[epoch   93/100] [iter  143500] [loss 0.03243]
[epoch   93/100] [iter  144000] [loss 0.03741]
[epoch   93/100] [iter  144500] [loss 0.03913]
[epoch   94/100] [iter  145000] [loss 0.02893]
[epoch   94/100] [iter  145500] [loss 0.02154]
[epoch   94/100] [iter  146000] [loss 0.02874]
[Test on environment] [epoch 94/100] [score 200.00]
[epoch   95/100] [iter  146500] [loss 0.03196]
[epoch   95/100] [iter  147000] [loss 0.01311]
[epoch   95/100] [iter  147500] [loss 0.03302]
[epoch   95/100] [iter  148000] [loss 0.02416]
[epoch   96/100] [iter  148500] [loss 0.02109]
[epoch   96/100] [iter  149000] [loss 0.02331]
[epoch   96/100] [iter  149500] [loss 0.05027]
[Test on environment] [epoch 96/100] [score 200.00]
[epoch   97/100] [iter  150000] [loss 0.05223]
[epoch   97/100] [iter  150500] [loss 0.02322]
[epoch   97/100] [iter  151000] [loss 0.01922]
[epoch   98/100] [iter  151500] [loss 0.05662]
[epoch   98/100] [iter  152000] [loss 0.13024]
[epoch   98/100] [iter  152500] [loss 0.02138]
[Test on environment] [epoch 98/100] [score 199.20]
[epoch   99/100] [iter  153000] [loss 0.02181]
[epoch   99/100] [iter  153500] [loss 0.04091]
[epoch   99/100] [iter  154000] [loss 0.04907]
[epoch  100/100] [iter  154500] [loss 0.01671]
[epoch  100/100] [iter  155000] [loss 0.02109]
[epoch  100/100] [iter  155500] [loss 0.03168]
[Test on environment] [epoch 100/100] [score 199.90]
Saving model as behavioral_cloning_CartPole-v0.pt

In [4]:
## PASTE YOUR TERMINAL OUTPUT HERE
# NOTE: TO HAVE LESS LINES PRINTED, YOU CAN SET THE VARIABLE PRINT_INTERVAL TO 5 or 10

# Below is the result of python3 eval_policy.py --model-path behavioral_cloning_CartPole-v0.pt --env CartPole-v0
[Episode    0/10] [reward 200.0]
[Episode    1/10] [reward 200.0]
[Episode    2/10] [reward 200.0]
[Episode    3/10] [reward 200.0]
[Episode    4/10] [reward 200.0]
[Episode    5/10] [reward 200.0]
[Episode    6/10] [reward 200.0]
[Episode    7/10] [reward 200.0]
[Episode    8/10] [reward 200.0]
[Episode    9/10] [reward 200.0]

**[QUESTION 2 points]** Did you manage to learn a good policy? How consistent is the reward you are getting?

Based on the amount of reward, I think the neural network learned a good policy. The reward is also rather consistent. However, in terms of variety, I don't think it learned a good policy: there seems to be a bias toward moving to the right, which I think is because the expert data contains more samples on the right side.

## Task 2: Deep Q Learning

There are two main issues with the behavior cloning approach.

- First, we are not always lucky enough to have access to a dataset of expert demonstrations.
- Second, replicating an expert policy suffers from compounding error. The policy $\pi$ only sees these "perfect" examples and has no knowledge on how to recover from states not visited by the expert. For this reason, as soon as it is presented with a state that is off the expert trajectory, it will perform poorly and will continue to deviate from a good trajectory without the possibility of recovering from errors.

---
The second task consists of solving the environment from scratch using RL, and more specifically the DQN algorithm, to learn a policy $\pi$.

For this task, familiarize yourself with the file `dqn.py`. We are going to re-use the file `model.py` for the model you created in the previous task.

Your task is very similar to the one in the previous assignment, to implement the Q-learning algorithm, but in this version, our Q-function is approximated with a neural network.

The algorithm (excerpted from Section 6.5 of [Sutton's book](http://incompleteideas.net/book/RLbook2018.pdf)) is given below:

![DQN algorithm](https://i.imgur.com/Mh4Uxta.png)

### 2.0 Think about your model...



**[QUESTION 2 points]** In DQN, we are using the same model as in task 1 for behavioral cloning. In both tasks the model receives as input the state and in both tasks the model outputs something that has the same dimensionality as the number of actions. These two outputs, though, represent very different things. What is each one representing?

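In behavioral cloning, the outputs are unnormalized scores (logits) for each action, which a softmax turns into the probability that the expert would take that action. In DQN, the outputs are the estimated Q-values $Q(s, a)$ for each action, i.e., the expected discounted return after taking that action in the given state.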

### 2.1 Update your Q-function

Complete the `optimize_model` function. This function receives as input a `state`, an `action`, the `next_state`, the `reward`, and `done`, representing the tuple $(s_t, a_t, s_{t+1}, r_t, done_t)$. Your task is to update your Q-function as shown in the [Atari DQN paper](https://arxiv.org/abs/1312.5602). For now, don't be concerned with the experience replay buffer; we'll get to that later.

![Loss function](https://i.imgur.com/tpTsV8m.png)

- **[QUESTION 8 points]** Insert your code in the placeholder below.

In [7]:
## PLACEHOLDER TO INSERT YOUR optimize_model function here:

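def optimize_model(state, action, next_state, reward, done):
    # Given a tuple (s_t, a_t, s_{t+1}, r_t, done_t), take one gradient step on the TD error.
    loss_function = torch.nn.MSELoss()

    # TD target: r_t if the episode ended, else r_t + GAMMA * max_a' Q_target(s_{t+1}, a').
    y = torch.tensor(float(reward))
    if not done:
        with torch.no_grad():  # no gradient should flow through the target network
            y = y + GAMMA * torch.max(target(torch.from_numpy(next_state).float()))

    # Q-value predicted by the online network for the action that was actually taken.
    q_value = model(torch.from_numpy(state).float())[int(action)]
    loss = loss_function(q_value, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()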
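
As a worked example of the target used in this update, the sketch below computes the one-step target for a single hypothetical transition. The function name `td_target`, the default discount of 0.99, and the example numbers are assumptions made for the illustration.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target for a single transition.

    `next_q_values` holds the target network's estimates for every action
    in the next state. The default `gamma` is an illustrative assumption.
    """
    if done:
        return reward  # no future return after a terminal state
    return reward + gamma * max(next_q_values)

# Hypothetical transition: reward 1.0, next-state estimates of 0.5 and 2.0.
y_running = td_target(1.0, [0.5, 2.0], done=False)
y_terminal = td_target(1.0, [0.5, 2.0], done=True)
print(y_running, y_terminal)
```

For the non-terminal case the target is $1.0 + 0.99 \cdot 2.0 = 2.98$; for the terminal case it is just the reward, $1.0$.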

### 2.2 $\epsilon$-greedy strategy

You will need a strategy to explore your environment. The standard strategy is to use $\epsilon$-greedy. Implement it in the `choose_action` function template.

- **[QUESTION 5 points]** Insert your code in the placeholder below.

In [9]:
## PLACEHOLDER TO INSERT YOUR choose_action function here:

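def choose_action(state, test_mode=False):
    # Epsilon-greedy: explore with probability EPS_EXPLORATION (never in test mode),
    # otherwise pick the action with the highest predicted Q-value.
    if not test_mode and torch.rand(1).item() < EPS_EXPLORATION:
        action = env.action_space.sample()
    else:
        with torch.no_grad():  # action selection needs no gradients
            action = torch.argmax(model(torch.from_numpy(state).float())).item()
    return torch.tensor(action)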
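
The effect of $\epsilon$ on action selection can be checked empirically with a small standalone sketch. The helper `epsilon_greedy`, the action values, and $\epsilon = 0.2$ are hypothetical choices for the illustration and are independent of `dqn.py`.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
q_values = [0.1, 0.9]  # hypothetical action values; action 1 is greedy
choices = [epsilon_greedy(q_values, epsilon=0.2, rng=rng) for _ in range(10_000)]
greedy_fraction = choices.count(1) / len(choices)
print(f"fraction of greedy choices: {greedy_fraction:.3f}")
```

With two actions and $\epsilon = 0.2$, the greedy action should be chosen about 90% of the time, since a random draw also picks it half the time.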

### 2.3 Train your model

Try to train a model in this way.

You can run your code by doing:

```
python3 dqn.py
```

**[QUESTION 2 points]** How many episodes does it take to learn (i.e., reach a good reward)?

4000 episodes seem sufficient: every evaluation episode reaches a reward of 200. Interestingly, the policy converges to a different behavior on each run. In some runs the cart continuously moves to one side, and in others it balances the pole around the initial position.
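
One way to turn "how many episodes does it take" into a number is to track a running mean of the episode rewards and report when it first crosses a threshold. The sketch below does this on a synthetic reward curve; the function name, the window of 100 episodes, and the threshold of 195 are illustrative assumptions rather than values taken from `dqn.py`.

```python
def episodes_to_solve(rewards, window=100, threshold=195.0):
    """Return the 1-based episode at which the running mean first reaches the threshold.

    Returns None if that never happens. `window` and `threshold` are
    illustrative choices, not values taken from dqn.py.
    """
    running_sum = 0.0
    for i, reward in enumerate(rewards):
        running_sum += reward
        if i >= window:
            running_sum -= rewards[i - window]  # drop the reward that left the window
        if i + 1 >= window and running_sum / window >= threshold:
            return i + 1
    return None

# Synthetic reward curve: 150 poor episodes followed by 200 perfect ones.
rewards = [10.0] * 150 + [200.0] * 200
solved_at = episodes_to_solve(rewards)
print("running mean reached the threshold at episode", solved_at)
```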

In [1]:
## PASTE YOUR TERMINAL OUTPUT HERE
## Evaluation Output
[Episode    0/10] [reward 200.0]
[Episode    1/10] [reward 200.0]
[Episode    2/10] [reward 200.0]
[Episode    3/10] [reward 200.0]
[Episode    4/10] [reward 200.0]
[Episode    5/10] [reward 200.0]
[Episode    6/10] [reward 200.0]
[Episode    7/10] [reward 200.0]
[Episode    8/10] [reward 200.0]
[Episode    9/10] [reward 200.0]
## Training Output
[Episode    1/4000] [Steps   11] [reward 12.0]
[Episode    2/4000] [Steps    9] [reward 10.0]
[Episode    3/4000] [Steps    8] [reward 9.0]
[Episode    4/4000] [Steps   11] [reward 12.0]
[Episode    5/4000] [Steps    8] [reward 9.0]
[Episode    6/4000] [Steps   12] [reward 13.0]
[Episode    7/4000] [Steps    9] [reward 10.0]
[Episode    8/4000] [Steps    9] [reward 10.0]
[Episode    9/4000] [Steps   10] [reward 11.0]
[Episode   10/4000] [Steps    9] [reward 10.0]
[Episode   11/4000] [Steps    9] [reward 10.0]
[Episode   12/4000] [Steps   10] [reward 11.0]
[Episode   13/4000] [Steps    9] [reward 10.0]
[Episode   14/4000] [Steps   12] [reward 13.0]
[Episode   15/4000] [Steps   12] [reward 13.0]
[Episode   16/4000] [Steps   11] [reward 12.0]
[Episode   17/4000] [Steps    9] [reward 10.0]
[Episode   18/4000] [Steps    9] [reward 10.0]
[Episode   19/4000] [Steps   12] [reward 13.0]
[Episode   20/4000] [Steps    9] [reward 10.0]
[Episode   21/4000] [Steps    8] [reward 9.0]
[Episode   22/4000] [Steps    8] [reward 9.0]
[Episode   23/4000] [Steps    9] [reward 10.0]
[Episode   24/4000] [Steps   12] [reward 13.0]
[Episode   25/4000] [Steps    7] [reward 8.0]
----------
saving model.
[TEST Episode 25] [Average Reward 9.1]
----------
[Episode   26/4000] [Steps    7] [reward 8.0]
[Episode   27/4000] [Steps    8] [reward 9.0]
[Episode   28/4000] [Steps    9] [reward 10.0]
[Episode   29/4000] [Steps    9] [reward 10.0]
[Episode   30/4000] [Steps    9] [reward 10.0]
[Episode   31/4000] [Steps    8] [reward 9.0]
[Episode   32/4000] [Steps   12] [reward 13.0]
[Episode   33/4000] [Steps   10] [reward 11.0]
[Episode   34/4000] [Steps    8] [reward 9.0]
[Episode   35/4000] [Steps   10] [reward 11.0]
[Episode   36/4000] [Steps    9] [reward 10.0]
[Episode   37/4000] [Steps   11] [reward 12.0]
[Episode   38/4000] [Steps   11] [reward 12.0]
[Episode   39/4000] [Steps   10] [reward 11.0]
[Episode   40/4000] [Steps   10] [reward 11.0]
[Episode   41/4000] [Steps    8] [reward 9.0]
[Episode   42/4000] [Steps    9] [reward 10.0]
[Episode   43/4000] [Steps   16] [reward 17.0]
[Episode   44/4000] [Steps    7] [reward 8.0]
[Episode   45/4000] [Steps   10] [reward 11.0]
[Episode   46/4000] [Steps   11] [reward 12.0]
[Episode   47/4000] [Steps    9] [reward 10.0]
[Episode   48/4000] [Steps   10] [reward 11.0]
[Episode   49/4000] [Steps   11] [reward 12.0]
[Episode   50/4000] [Steps    9] [reward 10.0]
----------
saving model.
[TEST Episode 50] [Average Reward 9.5]
----------
[Episode   51/4000] [Steps    8] [reward 9.0]
[Episode   52/4000] [Steps    8] [reward 9.0]
[Episode   53/4000] [Steps    9] [reward 10.0]
[Episode   54/4000] [Steps    9] [reward 10.0]
[Episode   55/4000] [Steps   11] [reward 12.0]
[Episode   56/4000] [Steps   12] [reward 13.0]
[Episode   57/4000] [Steps   11] [reward 12.0]
[Episode   58/4000] [Steps    9] [reward 10.0]
[Episode   59/4000] [Steps   10] [reward 11.0]
[Episode   60/4000] [Steps    8] [reward 9.0]
[Episode   61/4000] [Steps   10] [reward 11.0]
[Episode   62/4000] [Steps    9] [reward 10.0]
[Episode   63/4000] [Steps   10] [reward 11.0]
[Episode   64/4000] [Steps    8] [reward 9.0]
[Episode   65/4000] [Steps   10] [reward 11.0]
[Episode   66/4000] [Steps   13] [reward 14.0]
[Episode   67/4000] [Steps   10] [reward 11.0]
[Episode   68/4000] [Steps   13] [reward 14.0]
[Episode   69/4000] [Steps   11] [reward 12.0]
[Episode   70/4000] [Steps   11] [reward 12.0]
[Episode   71/4000] [Steps   14] [reward 15.0]
[Episode   72/4000] [Steps   11] [reward 12.0]
[Episode   73/4000] [Steps    9] [reward 10.0]
[Episode   74/4000] [Steps    9] [reward 10.0]
[Episode   75/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 75] [Average Reward 9.2]
----------
[Episode   76/4000] [Steps    8] [reward 9.0]
[Episode   77/4000] [Steps    8] [reward 9.0]
[Episode   78/4000] [Steps   10] [reward 11.0]
[Episode   79/4000] [Steps    8] [reward 9.0]
[Episode   80/4000] [Steps   10] [reward 11.0]
[Episode   81/4000] [Steps    9] [reward 10.0]
[Episode   82/4000] [Steps    7] [reward 8.0]
[Episode   83/4000] [Steps    8] [reward 9.0]
[Episode   84/4000] [Steps   11] [reward 12.0]
[Episode   85/4000] [Steps    9] [reward 10.0]
[Episode   86/4000] [Steps   13] [reward 14.0]
[Episode   87/4000] [Steps    9] [reward 10.0]
[Episode   88/4000] [Steps    9] [reward 10.0]
[Episode   89/4000] [Steps   11] [reward 12.0]
[Episode   90/4000] [Steps    8] [reward 9.0]
[Episode   91/4000] [Steps    7] [reward 8.0]
[Episode   92/4000] [Steps    7] [reward 8.0]
[Episode   93/4000] [Steps   10] [reward 11.0]
[Episode   94/4000] [Steps    8] [reward 9.0]
[Episode   95/4000] [Steps    8] [reward 9.0]
[Episode   96/4000] [Steps   11] [reward 12.0]
[Episode   97/4000] [Steps    9] [reward 10.0]
[Episode   98/4000] [Steps    9] [reward 10.0]
[Episode   99/4000] [Steps    9] [reward 10.0]
[Episode  100/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 100] [Average Reward 9.2]
----------
[Episode  101/4000] [Steps   10] [reward 11.0]
[Episode  102/4000] [Steps    9] [reward 10.0]
[Episode  103/4000] [Steps    9] [reward 10.0]
[Episode  104/4000] [Steps   10] [reward 11.0]
[Episode  105/4000] [Steps    9] [reward 10.0]
[Episode  106/4000] [Steps    8] [reward 9.0]
[Episode  107/4000] [Steps   12] [reward 13.0]
[Episode  108/4000] [Steps    7] [reward 8.0]
[Episode  109/4000] [Steps   14] [reward 15.0]
[Episode  110/4000] [Steps   12] [reward 13.0]
[Episode  111/4000] [Steps    9] [reward 10.0]
[Episode  112/4000] [Steps    8] [reward 9.0]
[Episode  113/4000] [Steps    8] [reward 9.0]
[Episode  114/4000] [Steps   14] [reward 15.0]
[Episode  115/4000] [Steps    7] [reward 8.0]
[Episode  116/4000] [Steps    8] [reward 9.0]
[Episode  117/4000] [Steps   11] [reward 12.0]
[Episode  118/4000] [Steps    8] [reward 9.0]
[Episode  119/4000] [Steps    8] [reward 9.0]
[Episode  120/4000] [Steps   12] [reward 13.0]
[Episode  121/4000] [Steps    9] [reward 10.0]
[Episode  122/4000] [Steps   10] [reward 11.0]
[Episode  123/4000] [Steps   12] [reward 13.0]
[Episode  124/4000] [Steps   13] [reward 14.0]
[Episode  125/4000] [Steps    9] [reward 10.0]
----------
saving model.
[TEST Episode 125] [Average Reward 10.2]
----------
[Episode  126/4000] [Steps   13] [reward 14.0]
[Episode  127/4000] [Steps   11] [reward 12.0]
[Episode  128/4000] [Steps   11] [reward 12.0]
[Episode  129/4000] [Steps    9] [reward 10.0]
[Episode  130/4000] [Steps   11] [reward 12.0]
[Episode  131/4000] [Steps   21] [reward 22.0]
[Episode  132/4000] [Steps   33] [reward 34.0]
[Episode  133/4000] [Steps   11] [reward 12.0]
[Episode  134/4000] [Steps   31] [reward 32.0]
[Episode  135/4000] [Steps   24] [reward 25.0]
[Episode  136/4000] [Steps   32] [reward 33.0]
[Episode  137/4000] [Steps    9] [reward 10.0]
[Episode  138/4000] [Steps   11] [reward 12.0]
[Episode  139/4000] [Steps    8] [reward 9.0]
[Episode  140/4000] [Steps    9] [reward 10.0]
[Episode  141/4000] [Steps   10] [reward 11.0]
[Episode  142/4000] [Steps   10] [reward 11.0]
[Episode  143/4000] [Steps   10] [reward 11.0]
[Episode  144/4000] [Steps    9] [reward 10.0]
[Episode  145/4000] [Steps    9] [reward 10.0]
[Episode  146/4000] [Steps   10] [reward 11.0]
[Episode  147/4000] [Steps    9] [reward 10.0]
[Episode  148/4000] [Steps    8] [reward 9.0]
[Episode  149/4000] [Steps   12] [reward 13.0]
[Episode  150/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 150] [Average Reward 9.5]
----------
[Episode  151/4000] [Steps    8] [reward 9.0]
[Episode  152/4000] [Steps    9] [reward 10.0]
[Episode  153/4000] [Steps   12] [reward 13.0]
[Episode  154/4000] [Steps   14] [reward 15.0]
[Episode  155/4000] [Steps   11] [reward 12.0]
[Episode  156/4000] [Steps    9] [reward 10.0]
[Episode  157/4000] [Steps   16] [reward 17.0]
[Episode  158/4000] [Steps   14] [reward 15.0]
[Episode  159/4000] [Steps   37] [reward 38.0]
[Episode  160/4000] [Steps   25] [reward 26.0]
[Episode  161/4000] [Steps   12] [reward 13.0]
[Episode  162/4000] [Steps   12] [reward 13.0]
[Episode  163/4000] [Steps   18] [reward 19.0]
[Episode  164/4000] [Steps   10] [reward 11.0]
[Episode  165/4000] [Steps   25] [reward 26.0]
[Episode  166/4000] [Steps   15] [reward 16.0]
[Episode  167/4000] [Steps   64] [reward 65.0]
[Episode  168/4000] [Steps   12] [reward 13.0]
[Episode  169/4000] [Steps    7] [reward 8.0]
[Episode  170/4000] [Steps    9] [reward 10.0]
[Episode  171/4000] [Steps    8] [reward 9.0]
[Episode  172/4000] [Steps    8] [reward 9.0]
[Episode  173/4000] [Steps   13] [reward 14.0]
[Episode  174/4000] [Steps    9] [reward 10.0]
[Episode  175/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 175] [Average Reward 9.1]
----------
[Episode  176/4000] [Steps   12] [reward 13.0]
[Episode  177/4000] [Steps    9] [reward 10.0]
[Episode  178/4000] [Steps   10] [reward 11.0]
[Episode  179/4000] [Steps   13] [reward 14.0]
[Episode  180/4000] [Steps   14] [reward 15.0]
[Episode  181/4000] [Steps   10] [reward 11.0]
[Episode  182/4000] [Steps   17] [reward 18.0]
[Episode  183/4000] [Steps   31] [reward 32.0]
[Episode  184/4000] [Steps   40] [reward 41.0]
[Episode  185/4000] [Steps   45] [reward 46.0]
[Episode  186/4000] [Steps    8] [reward 9.0]
[Episode  187/4000] [Steps   11] [reward 12.0]
[Episode  188/4000] [Steps   11] [reward 12.0]
[Episode  189/4000] [Steps   10] [reward 11.0]
[Episode  190/4000] [Steps   15] [reward 16.0]
[Episode  191/4000] [Steps   12] [reward 13.0]
[Episode  192/4000] [Steps   13] [reward 14.0]
[Episode  193/4000] [Steps    9] [reward 10.0]
[Episode  194/4000] [Steps   18] [reward 19.0]
[Episode  195/4000] [Steps   14] [reward 15.0]
[Episode  196/4000] [Steps    7] [reward 8.0]
[Episode  197/4000] [Steps   18] [reward 19.0]
[Episode  198/4000] [Steps   65] [reward 66.0]
[Episode  199/4000] [Steps   94] [reward 95.0]
[Episode  200/4000] [Steps   55] [reward 56.0]
----------
saving model.
[TEST Episode 200] [Average Reward 11.3]
----------
[Episode  201/4000] [Steps   29] [reward 30.0]
[Episode  202/4000] [Steps   12] [reward 13.0]
[Episode  203/4000] [Steps    7] [reward 8.0]
[Episode  204/4000] [Steps    7] [reward 8.0]
[Episode  205/4000] [Steps   10] [reward 11.0]
[Episode  206/4000] [Steps    9] [reward 10.0]
[Episode  207/4000] [Steps    9] [reward 10.0]
[Episode  208/4000] [Steps   10] [reward 11.0]
[Episode  209/4000] [Steps   16] [reward 17.0]
[Episode  210/4000] [Steps    8] [reward 9.0]
[Episode  211/4000] [Steps   10] [reward 11.0]
[Episode  212/4000] [Steps    8] [reward 9.0]
[Episode  213/4000] [Steps   12] [reward 13.0]
[Episode  214/4000] [Steps   11] [reward 12.0]
[Episode  215/4000] [Steps   23] [reward 24.0]
[Episode  216/4000] [Steps   10] [reward 11.0]
[Episode  217/4000] [Steps   10] [reward 11.0]
[Episode  218/4000] [Steps    9] [reward 10.0]
[Episode  219/4000] [Steps    9] [reward 10.0]
[Episode  220/4000] [Steps    8] [reward 9.0]
[Episode  221/4000] [Steps   13] [reward 14.0]
[Episode  222/4000] [Steps   49] [reward 50.0]
[Episode  223/4000] [Steps  120] [reward 121.0]
[Episode  224/4000] [Steps   12] [reward 13.0]
[Episode  225/4000] [Steps   10] [reward 11.0]
----------
saving model.
[TEST Episode 225] [Average Reward 13.0]
----------
[Episode  226/4000] [Steps    8] [reward 9.0]
[Episode  227/4000] [Steps    8] [reward 9.0]
[Episode  228/4000] [Steps    8] [reward 9.0]
[Episode  229/4000] [Steps   11] [reward 12.0]
[Episode  230/4000] [Steps    8] [reward 9.0]
[Episode  231/4000] [Steps    9] [reward 10.0]
[Episode  232/4000] [Steps    7] [reward 8.0]
[Episode  233/4000] [Steps   10] [reward 11.0]
[Episode  234/4000] [Steps    9] [reward 10.0]
[Episode  235/4000] [Steps   10] [reward 11.0]
[Episode  236/4000] [Steps    9] [reward 10.0]
[Episode  237/4000] [Steps  138] [reward 139.0]
[Episode  238/4000] [Steps    8] [reward 9.0]
[Episode  239/4000] [Steps    7] [reward 8.0]
[Episode  240/4000] [Steps    8] [reward 9.0]
[Episode  241/4000] [Steps   14] [reward 15.0]
[Episode  242/4000] [Steps   35] [reward 36.0]
[Episode  243/4000] [Steps    9] [reward 10.0]
[Episode  244/4000] [Steps   23] [reward 24.0]
[Episode  245/4000] [Steps   17] [reward 18.0]
[Episode  246/4000] [Steps    8] [reward 9.0]
[Episode  247/4000] [Steps    9] [reward 10.0]
[Episode  248/4000] [Steps   26] [reward 27.0]
[Episode  249/4000] [Steps   32] [reward 33.0]
[Episode  250/4000] [Steps   30] [reward 31.0]
----------
saving model.
[TEST Episode 250] [Average Reward 21.3]
----------
[Episode  251/4000] [Steps   40] [reward 41.0]
[Episode  252/4000] [Steps    7] [reward 8.0]
[Episode  253/4000] [Steps    8] [reward 9.0]
[Episode  254/4000] [Steps   12] [reward 13.0]
[Episode  255/4000] [Steps   11] [reward 12.0]
[Episode  256/4000] [Steps   11] [reward 12.0]
[Episode  257/4000] [Steps    9] [reward 10.0]
[Episode  258/4000] [Steps    8] [reward 9.0]
[Episode  259/4000] [Steps    8] [reward 9.0]
[Episode  260/4000] [Steps    9] [reward 10.0]
[Episode  261/4000] [Steps    8] [reward 9.0]
[Episode  262/4000] [Steps   12] [reward 13.0]
[Episode  263/4000] [Steps    9] [reward 10.0]
[Episode  264/4000] [Steps   11] [reward 12.0]
[Episode  265/4000] [Steps    9] [reward 10.0]
[Episode  266/4000] [Steps    8] [reward 9.0]
[Episode  267/4000] [Steps   14] [reward 15.0]
[Episode  268/4000] [Steps    8] [reward 9.0]
[Episode  269/4000] [Steps    8] [reward 9.0]
[Episode  270/4000] [Steps    8] [reward 9.0]
[Episode  271/4000] [Steps    7] [reward 8.0]
[Episode  272/4000] [Steps    9] [reward 10.0]
[Episode  273/4000] [Steps    7] [reward 8.0]
[Episode  274/4000] [Steps    8] [reward 9.0]
[Episode  275/4000] [Steps    7] [reward 8.0]
----------
[TEST Episode 275] [Average Reward 9.6]
----------
[Episode  276/4000] [Steps    7] [reward 8.0]
[Episode  277/4000] [Steps    9] [reward 10.0]
[Episode  278/4000] [Steps    7] [reward 8.0]
[Episode  279/4000] [Steps   10] [reward 11.0]
[Episode  280/4000] [Steps    7] [reward 8.0]
[Episode  281/4000] [Steps    9] [reward 10.0]
[Episode  282/4000] [Steps    7] [reward 8.0]
[Episode  283/4000] [Steps    8] [reward 9.0]
[Episode  284/4000] [Steps   11] [reward 12.0]
[Episode  285/4000] [Steps   10] [reward 11.0]
[Episode  286/4000] [Steps   10] [reward 11.0]
[Episode  287/4000] [Steps   16] [reward 17.0]
[Episode  288/4000] [Steps    8] [reward 9.0]
[Episode  289/4000] [Steps   12] [reward 13.0]
[Episode  290/4000] [Steps    9] [reward 10.0]
[Episode  291/4000] [Steps    8] [reward 9.0]
[Episode  292/4000] [Steps    9] [reward 10.0]
[Episode  293/4000] [Steps    9] [reward 10.0]
[Episode  294/4000] [Steps   10] [reward 11.0]
[Episode  295/4000] [Steps    8] [reward 9.0]
[Episode  296/4000] [Steps    9] [reward 10.0]
[Episode  297/4000] [Steps    9] [reward 10.0]
[Episode  298/4000] [Steps    8] [reward 9.0]
[Episode  299/4000] [Steps    8] [reward 9.0]
[Episode  300/4000] [Steps   12] [reward 13.0]
----------
[TEST Episode 300] [Average Reward 9.5]
----------
[Episode  301/4000] [Steps    9] [reward 10.0]
[Episode  302/4000] [Steps    8] [reward 9.0]
[Episode  303/4000] [Steps    7] [reward 8.0]
[Episode  304/4000] [Steps   10] [reward 11.0]
[Episode  305/4000] [Steps   12] [reward 13.0]
[Episode  306/4000] [Steps    9] [reward 10.0]
[Episode  307/4000] [Steps    9] [reward 10.0]
[Episode  308/4000] [Steps   11] [reward 12.0]
[Episode  309/4000] [Steps    8] [reward 9.0]
[Episode  310/4000] [Steps    9] [reward 10.0]
[Episode  311/4000] [Steps   12] [reward 13.0]
[Episode  312/4000] [Steps    7] [reward 8.0]
[Episode  313/4000] [Steps    9] [reward 10.0]
[Episode  314/4000] [Steps    8] [reward 9.0]
[Episode  315/4000] [Steps    9] [reward 10.0]
[Episode  316/4000] [Steps   11] [reward 12.0]
[Episode  317/4000] [Steps    9] [reward 10.0]
[Episode  318/4000] [Steps    8] [reward 9.0]
[Episode  319/4000] [Steps   11] [reward 12.0]
[Episode  320/4000] [Steps    9] [reward 10.0]
[Episode  321/4000] [Steps    8] [reward 9.0]
[Episode  322/4000] [Steps   10] [reward 11.0]
[Episode  323/4000] [Steps    8] [reward 9.0]
[Episode  324/4000] [Steps   14] [reward 15.0]
[Episode  325/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 325] [Average Reward 9.5]
----------
[Episode  326/4000] [Steps    9] [reward 10.0]
[Episode  327/4000] [Steps    9] [reward 10.0]
[Episode  328/4000] [Steps   10] [reward 11.0]
[Episode  329/4000] [Steps   10] [reward 11.0]
[Episode  330/4000] [Steps    9] [reward 10.0]
[Episode  331/4000] [Steps    7] [reward 8.0]
[Episode  332/4000] [Steps   12] [reward 13.0]
[Episode  333/4000] [Steps    9] [reward 10.0]
[Episode  334/4000] [Steps    8] [reward 9.0]
[Episode  335/4000] [Steps   11] [reward 12.0]
[Episode  336/4000] [Steps    7] [reward 8.0]
[Episode  337/4000] [Steps   12] [reward 13.0]
[Episode  338/4000] [Steps    8] [reward 9.0]
[Episode  339/4000] [Steps    9] [reward 10.0]
[Episode  340/4000] [Steps    9] [reward 10.0]
[Episode  341/4000] [Steps   12] [reward 13.0]
[Episode  342/4000] [Steps    8] [reward 9.0]
[Episode  343/4000] [Steps    9] [reward 10.0]
[Episode  344/4000] [Steps   10] [reward 11.0]
[Episode  345/4000] [Steps   11] [reward 12.0]
[Episode  346/4000] [Steps    9] [reward 10.0]
[Episode  347/4000] [Steps   13] [reward 14.0]
[Episode  348/4000] [Steps   11] [reward 12.0]
[Episode  349/4000] [Steps   11] [reward 12.0]
[Episode  350/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 350] [Average Reward 9.3]
----------
[Episode  351/4000] [Steps   10] [reward 11.0]
[Episode  352/4000] [Steps    9] [reward 10.0]
[Episode  353/4000] [Steps    8] [reward 9.0]
[Episode  354/4000] [Steps    8] [reward 9.0]
[Episode  355/4000] [Steps    9] [reward 10.0]
[Episode  356/4000] [Steps    8] [reward 9.0]
[Episode  357/4000] [Steps   10] [reward 11.0]
[Episode  358/4000] [Steps   11] [reward 12.0]
[Episode  359/4000] [Steps   10] [reward 11.0]
[Episode  360/4000] [Steps   12] [reward 13.0]
[Episode  361/4000] [Steps    8] [reward 9.0]
[Episode  362/4000] [Steps   10] [reward 11.0]
[Episode  363/4000] [Steps    8] [reward 9.0]
[Episode  364/4000] [Steps    9] [reward 10.0]
[Episode  365/4000] [Steps    8] [reward 9.0]
[Episode  366/4000] [Steps    9] [reward 10.0]
[Episode  367/4000] [Steps    9] [reward 10.0]
[Episode  368/4000] [Steps   11] [reward 12.0]
[Episode  369/4000] [Steps   10] [reward 11.0]
[Episode  370/4000] [Steps    8] [reward 9.0]
[Episode  371/4000] [Steps    9] [reward 10.0]
[Episode  372/4000] [Steps    8] [reward 9.0]
[Episode  373/4000] [Steps    9] [reward 10.0]
[Episode  374/4000] [Steps   10] [reward 11.0]
[Episode  375/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 375] [Average Reward 9.7]
----------
[Episode  376/4000] [Steps    7] [reward 8.0]
[Episode  377/4000] [Steps   10] [reward 11.0]
[Episode  378/4000] [Steps    8] [reward 9.0]
[Episode  379/4000] [Steps   11] [reward 12.0]
[Episode  380/4000] [Steps    7] [reward 8.0]
[Episode  381/4000] [Steps    9] [reward 10.0]
[Episode  382/4000] [Steps   14] [reward 15.0]
[Episode  383/4000] [Steps    9] [reward 10.0]
[Episode  384/4000] [Steps   13] [reward 14.0]
[Episode  385/4000] [Steps    9] [reward 10.0]
[Episode  386/4000] [Steps   13] [reward 14.0]
[Episode  387/4000] [Steps   13] [reward 14.0]
[Episode  388/4000] [Steps    9] [reward 10.0]
[Episode  389/4000] [Steps   12] [reward 13.0]
[Episode  390/4000] [Steps   10] [reward 11.0]
[Episode  391/4000] [Steps    8] [reward 9.0]
[Episode  392/4000] [Steps    8] [reward 9.0]
[Episode  393/4000] [Steps   10] [reward 11.0]
[Episode  394/4000] [Steps    8] [reward 9.0]
[Episode  395/4000] [Steps   10] [reward 11.0]
[Episode  396/4000] [Steps   12] [reward 13.0]
[Episode  397/4000] [Steps   10] [reward 11.0]
[Episode  398/4000] [Steps    9] [reward 10.0]
[Episode  399/4000] [Steps   13] [reward 14.0]
[Episode  400/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 400] [Average Reward 12.6]
----------
[Episode  401/4000] [Steps   14] [reward 15.0]
[Episode  402/4000] [Steps   15] [reward 16.0]
[Episode  403/4000] [Steps    9] [reward 10.0]
[Episode  404/4000] [Steps   10] [reward 11.0]
[Episode  405/4000] [Steps   14] [reward 15.0]
[Episode  406/4000] [Steps   17] [reward 18.0]
[Episode  407/4000] [Steps   13] [reward 14.0]
[Episode  408/4000] [Steps   17] [reward 18.0]
[Episode  409/4000] [Steps   15] [reward 16.0]
[Episode  410/4000] [Steps   13] [reward 14.0]
[Episode  411/4000] [Steps   10] [reward 11.0]
[Episode  412/4000] [Steps   10] [reward 11.0]
[Episode  413/4000] [Steps    8] [reward 9.0]
[Episode  414/4000] [Steps    9] [reward 10.0]
[Episode  415/4000] [Steps   16] [reward 17.0]
[Episode  416/4000] [Steps   10] [reward 11.0]
[Episode  417/4000] [Steps   10] [reward 11.0]
[Episode  418/4000] [Steps   10] [reward 11.0]
[Episode  419/4000] [Steps   10] [reward 11.0]
[Episode  420/4000] [Steps    8] [reward 9.0]
[Episode  421/4000] [Steps   12] [reward 13.0]
[Episode  422/4000] [Steps   10] [reward 11.0]
[Episode  423/4000] [Steps    9] [reward 10.0]
[Episode  424/4000] [Steps   14] [reward 15.0]
[Episode  425/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 425] [Average Reward 9.5]
----------
[Episode  426/4000] [Steps   11] [reward 12.0]
[Episode  427/4000] [Steps    9] [reward 10.0]
[Episode  428/4000] [Steps    8] [reward 9.0]
[Episode  429/4000] [Steps    9] [reward 10.0]
[Episode  430/4000] [Steps   12] [reward 13.0]
[Episode  431/4000] [Steps   10] [reward 11.0]
[Episode  432/4000] [Steps   11] [reward 12.0]
[Episode  433/4000] [Steps   10] [reward 11.0]
[Episode  434/4000] [Steps   12] [reward 13.0]
[Episode  435/4000] [Steps   12] [reward 13.0]
[Episode  436/4000] [Steps    7] [reward 8.0]
[Episode  437/4000] [Steps    8] [reward 9.0]
[Episode  438/4000] [Steps   13] [reward 14.0]
[Episode  439/4000] [Steps   11] [reward 12.0]
[Episode  440/4000] [Steps   10] [reward 11.0]
[Episode  441/4000] [Steps   10] [reward 11.0]
[Episode  442/4000] [Steps   12] [reward 13.0]
[Episode  443/4000] [Steps   11] [reward 12.0]
[Episode  444/4000] [Steps    9] [reward 10.0]
[Episode  445/4000] [Steps   13] [reward 14.0]
[Episode  446/4000] [Steps    9] [reward 10.0]
[Episode  447/4000] [Steps    9] [reward 10.0]
[Episode  448/4000] [Steps   10] [reward 11.0]
[Episode  449/4000] [Steps    8] [reward 9.0]
[Episode  450/4000] [Steps   11] [reward 12.0]
----------
[TEST Episode 450] [Average Reward 10.0]
----------
[Episode  451/4000] [Steps    7] [reward 8.0]
[Episode  452/4000] [Steps    9] [reward 10.0]
[Episode  453/4000] [Steps    9] [reward 10.0]
[Episode  454/4000] [Steps   11] [reward 12.0]
[Episode  455/4000] [Steps    9] [reward 10.0]
[Episode  456/4000] [Steps   10] [reward 11.0]
[Episode  457/4000] [Steps   11] [reward 12.0]
[Episode  458/4000] [Steps    9] [reward 10.0]
[Episode  459/4000] [Steps   10] [reward 11.0]
[Episode  460/4000] [Steps   12] [reward 13.0]
[Episode  461/4000] [Steps    8] [reward 9.0]
[Episode  462/4000] [Steps   12] [reward 13.0]
[Episode  463/4000] [Steps   15] [reward 16.0]
[Episode  464/4000] [Steps   19] [reward 20.0]
[Episode  465/4000] [Steps   16] [reward 17.0]
[Episode  466/4000] [Steps   19] [reward 20.0]
[Episode  467/4000] [Steps   16] [reward 17.0]
[Episode  468/4000] [Steps   18] [reward 19.0]
[Episode  469/4000] [Steps   18] [reward 19.0]
[Episode  470/4000] [Steps    9] [reward 10.0]
[Episode  471/4000] [Steps   15] [reward 16.0]
[Episode  472/4000] [Steps   12] [reward 13.0]
[Episode  473/4000] [Steps   14] [reward 15.0]
[Episode  474/4000] [Steps   12] [reward 13.0]
[Episode  475/4000] [Steps   12] [reward 13.0]
----------
[TEST Episode 475] [Average Reward 14.2]
----------
[Episode  476/4000] [Steps   11] [reward 12.0]
[Episode  477/4000] [Steps   11] [reward 12.0]
[Episode  478/4000] [Steps    8] [reward 9.0]
[Episode  479/4000] [Steps    8] [reward 9.0]
[Episode  480/4000] [Steps    8] [reward 9.0]
[Episode  481/4000] [Steps    7] [reward 8.0]
[Episode  482/4000] [Steps    9] [reward 10.0]
[Episode  483/4000] [Steps   11] [reward 12.0]
[Episode  484/4000] [Steps    9] [reward 10.0]
[Episode  485/4000] [Steps    7] [reward 8.0]
[Episode  486/4000] [Steps   10] [reward 11.0]
[Episode  487/4000] [Steps    9] [reward 10.0]
[Episode  488/4000] [Steps   10] [reward 11.0]
[Episode  489/4000] [Steps   15] [reward 16.0]
[Episode  490/4000] [Steps   13] [reward 14.0]
[Episode  491/4000] [Steps    8] [reward 9.0]
[Episode  492/4000] [Steps   13] [reward 14.0]
[Episode  493/4000] [Steps   10] [reward 11.0]
[Episode  494/4000] [Steps   12] [reward 13.0]
[Episode  495/4000] [Steps    8] [reward 9.0]
[Episode  496/4000] [Steps    9] [reward 10.0]
[Episode  497/4000] [Steps    9] [reward 10.0]
[Episode  498/4000] [Steps   13] [reward 14.0]
[Episode  499/4000] [Steps   13] [reward 14.0]
[Episode  500/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 500] [Average Reward 15.0]
----------
[Episode  501/4000] [Steps   18] [reward 19.0]
[Episode  502/4000] [Steps   16] [reward 17.0]
[Episode  503/4000] [Steps   17] [reward 18.0]
[Episode  504/4000] [Steps    8] [reward 9.0]
[Episode  505/4000] [Steps    9] [reward 10.0]
[Episode  506/4000] [Steps   13] [reward 14.0]
[Episode  507/4000] [Steps   12] [reward 13.0]
[Episode  508/4000] [Steps   10] [reward 11.0]
[Episode  509/4000] [Steps    9] [reward 10.0]
[Episode  510/4000] [Steps   10] [reward 11.0]
[Episode  511/4000] [Steps    8] [reward 9.0]
[Episode  512/4000] [Steps   11] [reward 12.0]
[Episode  513/4000] [Steps   12] [reward 13.0]
[Episode  514/4000] [Steps   16] [reward 17.0]
[Episode  515/4000] [Steps   13] [reward 14.0]
[Episode  516/4000] [Steps   12] [reward 13.0]
[Episode  517/4000] [Steps   14] [reward 15.0]
[Episode  518/4000] [Steps   11] [reward 12.0]
[Episode  519/4000] [Steps   10] [reward 11.0]
[Episode  520/4000] [Steps    9] [reward 10.0]
[Episode  521/4000] [Steps   12] [reward 13.0]
[Episode  522/4000] [Steps   14] [reward 15.0]
[Episode  523/4000] [Steps   25] [reward 26.0]
[Episode  524/4000] [Steps   15] [reward 16.0]
[Episode  525/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 525] [Average Reward 14.2]
----------
[Episode  526/4000] [Steps   17] [reward 18.0]
[Episode  527/4000] [Steps   13] [reward 14.0]
[Episode  528/4000] [Steps   13] [reward 14.0]
[Episode  529/4000] [Steps   12] [reward 13.0]
[Episode  530/4000] [Steps   10] [reward 11.0]
[Episode  531/4000] [Steps    8] [reward 9.0]
[Episode  532/4000] [Steps   12] [reward 13.0]
[Episode  533/4000] [Steps    9] [reward 10.0]
[Episode  534/4000] [Steps    9] [reward 10.0]
[Episode  535/4000] [Steps    8] [reward 9.0]
[Episode  536/4000] [Steps    9] [reward 10.0]
[Episode  537/4000] [Steps   12] [reward 13.0]
[Episode  538/4000] [Steps    8] [reward 9.0]
[Episode  539/4000] [Steps   10] [reward 11.0]
[Episode  540/4000] [Steps    8] [reward 9.0]
[Episode  541/4000] [Steps    9] [reward 10.0]
[Episode  542/4000] [Steps   12] [reward 13.0]
[Episode  543/4000] [Steps   16] [reward 17.0]
[Episode  544/4000] [Steps   16] [reward 17.0]
[Episode  545/4000] [Steps   27] [reward 28.0]
[Episode  546/4000] [Steps   17] [reward 18.0]
[Episode  547/4000] [Steps   30] [reward 31.0]
[Episode  548/4000] [Steps   25] [reward 26.0]
[Episode  549/4000] [Steps   30] [reward 31.0]
[Episode  550/4000] [Steps   25] [reward 26.0]
----------
saving model.
[TEST Episode 550] [Average Reward 38.0]
----------
[Episode  551/4000] [Steps   28] [reward 29.0]
[Episode  552/4000] [Steps   36] [reward 37.0]
[Episode  553/4000] [Steps   39] [reward 40.0]
[Episode  554/4000] [Steps   59] [reward 60.0]
[Episode  555/4000] [Steps   38] [reward 39.0]
[Episode  556/4000] [Steps   17] [reward 18.0]
[Episode  557/4000] [Steps   11] [reward 12.0]
[Episode  558/4000] [Steps    9] [reward 10.0]
[Episode  559/4000] [Steps    7] [reward 8.0]
[Episode  560/4000] [Steps   24] [reward 25.0]
[Episode  561/4000] [Steps   17] [reward 18.0]
[Episode  562/4000] [Steps   12] [reward 13.0]
[Episode  563/4000] [Steps    8] [reward 9.0]
[Episode  564/4000] [Steps    8] [reward 9.0]
[Episode  565/4000] [Steps    8] [reward 9.0]
[Episode  566/4000] [Steps    9] [reward 10.0]
[Episode  567/4000] [Steps    8] [reward 9.0]
[Episode  568/4000] [Steps    8] [reward 9.0]
[Episode  569/4000] [Steps    8] [reward 9.0]
[Episode  570/4000] [Steps    9] [reward 10.0]
[Episode  571/4000] [Steps   11] [reward 12.0]
[Episode  572/4000] [Steps    9] [reward 10.0]
[Episode  573/4000] [Steps    8] [reward 9.0]
[Episode  574/4000] [Steps   11] [reward 12.0]
[Episode  575/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 575] [Average Reward 9.4]
----------
[Episode  576/4000] [Steps   10] [reward 11.0]
[Episode  577/4000] [Steps    9] [reward 10.0]
[Episode  578/4000] [Steps    9] [reward 10.0]
[Episode  579/4000] [Steps   10] [reward 11.0]
[Episode  580/4000] [Steps    8] [reward 9.0]
[Episode  581/4000] [Steps    8] [reward 9.0]
[Episode  582/4000] [Steps   13] [reward 14.0]
[Episode  583/4000] [Steps   12] [reward 13.0]
[Episode  584/4000] [Steps   11] [reward 12.0]
[Episode  585/4000] [Steps    8] [reward 9.0]
[Episode  586/4000] [Steps   10] [reward 11.0]
[Episode  587/4000] [Steps   11] [reward 12.0]
[Episode  588/4000] [Steps   10] [reward 11.0]
[Episode  589/4000] [Steps    9] [reward 10.0]
[Episode  590/4000] [Steps    8] [reward 9.0]
[Episode  591/4000] [Steps    8] [reward 9.0]
[Episode  592/4000] [Steps    8] [reward 9.0]
[Episode  593/4000] [Steps   11] [reward 12.0]
[Episode  594/4000] [Steps   10] [reward 11.0]
[Episode  595/4000] [Steps   11] [reward 12.0]
[Episode  596/4000] [Steps   24] [reward 25.0]
[Episode  597/4000] [Steps   25] [reward 26.0]
[Episode  598/4000] [Steps   20] [reward 21.0]
[Episode  599/4000] [Steps   32] [reward 33.0]
[Episode  600/4000] [Steps   55] [reward 56.0]
----------
[TEST Episode 600] [Average Reward 30.7]
----------
[Episode  601/4000] [Steps   40] [reward 41.0]
[Episode  602/4000] [Steps   45] [reward 46.0]
[Episode  603/4000] [Steps   27] [reward 28.0]
[Episode  604/4000] [Steps   30] [reward 31.0]
[Episode  605/4000] [Steps   27] [reward 28.0]
[Episode  606/4000] [Steps   22] [reward 23.0]
[Episode  607/4000] [Steps   44] [reward 45.0]
[Episode  608/4000] [Steps   27] [reward 28.0]
[Episode  609/4000] [Steps   36] [reward 37.0]
[Episode  610/4000] [Steps   23] [reward 24.0]
[Episode  611/4000] [Steps   34] [reward 35.0]
[Episode  612/4000] [Steps   31] [reward 32.0]
[Episode  613/4000] [Steps   41] [reward 42.0]
[Episode  614/4000] [Steps   24] [reward 25.0]
[Episode  615/4000] [Steps   31] [reward 32.0]
[Episode  616/4000] [Steps   19] [reward 20.0]
[Episode  617/4000] [Steps   33] [reward 34.0]
[Episode  618/4000] [Steps   25] [reward 26.0]
[Episode  619/4000] [Steps   28] [reward 29.0]
[Episode  620/4000] [Steps   28] [reward 29.0]
[Episode  621/4000] [Steps   36] [reward 37.0]
[Episode  622/4000] [Steps   23] [reward 24.0]
[Episode  623/4000] [Steps   15] [reward 16.0]
[Episode  624/4000] [Steps   15] [reward 16.0]
[Episode  625/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 625] [Average Reward 9.2]
----------
[Episode  626/4000] [Steps   12] [reward 13.0]
[Episode  627/4000] [Steps    9] [reward 10.0]
[Episode  628/4000] [Steps   11] [reward 12.0]
[Episode  629/4000] [Steps    8] [reward 9.0]
[Episode  630/4000] [Steps    9] [reward 10.0]
[Episode  631/4000] [Steps   10] [reward 11.0]
[Episode  632/4000] [Steps   10] [reward 11.0]
[Episode  633/4000] [Steps   11] [reward 12.0]
[Episode  634/4000] [Steps    9] [reward 10.0]
[Episode  635/4000] [Steps    8] [reward 9.0]
[Episode  636/4000] [Steps    8] [reward 9.0]
[Episode  637/4000] [Steps    9] [reward 10.0]
[Episode  638/4000] [Steps   15] [reward 16.0]
[Episode  639/4000] [Steps   23] [reward 24.0]
[Episode  640/4000] [Steps   30] [reward 31.0]
[Episode  641/4000] [Steps   33] [reward 34.0]
[Episode  642/4000] [Steps   18] [reward 19.0]
[Episode  643/4000] [Steps   21] [reward 22.0]
[Episode  644/4000] [Steps   13] [reward 14.0]
[Episode  645/4000] [Steps   11] [reward 12.0]
[Episode  646/4000] [Steps   28] [reward 29.0]
[Episode  647/4000] [Steps   24] [reward 25.0]
[Episode  648/4000] [Steps   38] [reward 39.0]
[Episode  649/4000] [Steps   29] [reward 30.0]
[Episode  650/4000] [Steps   25] [reward 26.0]
----------
[TEST Episode 650] [Average Reward 21.2]
----------
[Episode  651/4000] [Steps   33] [reward 34.0]
[Episode  652/4000] [Steps   39] [reward 40.0]
[Episode  653/4000] [Steps   25] [reward 26.0]
[Episode  654/4000] [Steps   25] [reward 26.0]
[Episode  655/4000] [Steps   33] [reward 34.0]
[Episode  656/4000] [Steps   39] [reward 40.0]
[Episode  657/4000] [Steps   32] [reward 33.0]
[Episode  658/4000] [Steps   43] [reward 44.0]
[Episode  659/4000] [Steps   40] [reward 41.0]
[Episode  660/4000] [Steps   41] [reward 42.0]
[Episode  661/4000] [Steps   26] [reward 27.0]
[Episode  662/4000] [Steps   31] [reward 32.0]
[Episode  663/4000] [Steps   34] [reward 35.0]
[Episode  664/4000] [Steps   31] [reward 32.0]
[Episode  665/4000] [Steps   24] [reward 25.0]
[Episode  666/4000] [Steps   30] [reward 31.0]
[Episode  667/4000] [Steps   45] [reward 46.0]
[Episode  668/4000] [Steps   28] [reward 29.0]
[Episode  669/4000] [Steps   33] [reward 34.0]
[Episode  670/4000] [Steps   17] [reward 18.0]
[Episode  671/4000] [Steps   20] [reward 21.0]
[Episode  672/4000] [Steps   29] [reward 30.0]
[Episode  673/4000] [Steps   37] [reward 38.0]
[Episode  674/4000] [Steps   23] [reward 24.0]
[Episode  675/4000] [Steps   27] [reward 28.0]
----------
saving model.
[TEST Episode 675] [Average Reward 44.8]
----------
[Episode  676/4000] [Steps   44] [reward 45.0]
[Episode  677/4000] [Steps   47] [reward 48.0]
[Episode  678/4000] [Steps   29] [reward 30.0]
[Episode  679/4000] [Steps   22] [reward 23.0]
[Episode  680/4000] [Steps   14] [reward 15.0]
[Episode  681/4000] [Steps   32] [reward 33.0]
[Episode  682/4000] [Steps   34] [reward 35.0]
[Episode  683/4000] [Steps   41] [reward 42.0]
[Episode  684/4000] [Steps   34] [reward 35.0]
[Episode  685/4000] [Steps  121] [reward 122.0]
[Episode  686/4000] [Steps   71] [reward 72.0]
[Episode  687/4000] [Steps   54] [reward 55.0]
[Episode  688/4000] [Steps   36] [reward 37.0]
[Episode  689/4000] [Steps   44] [reward 45.0]
[Episode  690/4000] [Steps   36] [reward 37.0]
[Episode  691/4000] [Steps   29] [reward 30.0]
[Episode  692/4000] [Steps   42] [reward 43.0]
[Episode  693/4000] [Steps   20] [reward 21.0]
[Episode  694/4000] [Steps   28] [reward 29.0]
[Episode  695/4000] [Steps   22] [reward 23.0]
[Episode  696/4000] [Steps   32] [reward 33.0]
[Episode  697/4000] [Steps   26] [reward 27.0]
[Episode  698/4000] [Steps   29] [reward 30.0]
[Episode  699/4000] [Steps   27] [reward 28.0]
[Episode  700/4000] [Steps   24] [reward 25.0]
----------
[TEST Episode 700] [Average Reward 25.8]
----------
[Episode  701/4000] [Steps   33] [reward 34.0]
[Episode  702/4000] [Steps   48] [reward 49.0]
[Episode  703/4000] [Steps   28] [reward 29.0]
[Episode  704/4000] [Steps   39] [reward 40.0]
[Episode  705/4000] [Steps   35] [reward 36.0]
[Episode  706/4000] [Steps   38] [reward 39.0]
[Episode  707/4000] [Steps   37] [reward 38.0]
[Episode  708/4000] [Steps   47] [reward 48.0]
[Episode  709/4000] [Steps   39] [reward 40.0]
[Episode  710/4000] [Steps   45] [reward 46.0]
[Episode  711/4000] [Steps   48] [reward 49.0]
[Episode  712/4000] [Steps   27] [reward 28.0]
[Episode  713/4000] [Steps   29] [reward 30.0]
[Episode  714/4000] [Steps   47] [reward 48.0]
[Episode  715/4000] [Steps   40] [reward 41.0]
[Episode  716/4000] [Steps   25] [reward 26.0]
[Episode  717/4000] [Steps   38] [reward 39.0]
[Episode  718/4000] [Steps   39] [reward 40.0]
[Episode  719/4000] [Steps   39] [reward 40.0]
[Episode  720/4000] [Steps   33] [reward 34.0]
[Episode  721/4000] [Steps   23] [reward 24.0]
[Episode  722/4000] [Steps   37] [reward 38.0]
[Episode  723/4000] [Steps   25] [reward 26.0]
[Episode  724/4000] [Steps   24] [reward 25.0]
[Episode  725/4000] [Steps   62] [reward 63.0]
----------
saving model.
[TEST Episode 725] [Average Reward 161.5]
----------
[Episode  726/4000] [Steps   89] [reward 90.0]
[Episode  727/4000] [Steps   91] [reward 92.0]
[Episode  728/4000] [Steps   61] [reward 62.0]
[Episode  729/4000] [Steps  109] [reward 110.0]
[Episode  730/4000] [Steps   95] [reward 96.0]
[Episode  731/4000] [Steps   84] [reward 85.0]
[Episode  732/4000] [Steps   99] [reward 100.0]
[Episode  733/4000] [Steps   64] [reward 65.0]
[Episode  734/4000] [Steps   72] [reward 73.0]
[Episode  735/4000] [Steps   44] [reward 45.0]
[Episode  736/4000] [Steps   48] [reward 49.0]
[Episode  737/4000] [Steps   29] [reward 30.0]
[Episode  738/4000] [Steps   29] [reward 30.0]
[Episode  739/4000] [Steps   26] [reward 27.0]
[Episode  740/4000] [Steps   26] [reward 27.0]
[Episode  741/4000] [Steps   64] [reward 65.0]
[Episode  742/4000] [Steps   94] [reward 95.0]
[Episode  743/4000] [Steps   11] [reward 12.0]
[Episode  744/4000] [Steps   12] [reward 13.0]
[Episode  745/4000] [Steps   28] [reward 29.0]
[Episode  746/4000] [Steps   13] [reward 14.0]
[Episode  747/4000] [Steps   18] [reward 19.0]
[Episode  748/4000] [Steps   25] [reward 26.0]
[Episode  749/4000] [Steps   49] [reward 50.0]
[Episode  750/4000] [Steps   24] [reward 25.0]
----------
[TEST Episode 750] [Average Reward 31.2]
----------
[Episode  751/4000] [Steps   38] [reward 39.0]
[Episode  752/4000] [Steps   25] [reward 26.0]
[Episode  753/4000] [Steps   28] [reward 29.0]
[Episode  754/4000] [Steps   45] [reward 46.0]
[Episode  755/4000] [Steps   34] [reward 35.0]
[Episode  756/4000] [Steps   32] [reward 33.0]
[Episode  757/4000] [Steps   28] [reward 29.0]
[Episode  758/4000] [Steps   28] [reward 29.0]
[Episode  759/4000] [Steps   31] [reward 32.0]
[Episode  760/4000] [Steps   37] [reward 38.0]
[Episode  761/4000] [Steps   29] [reward 30.0]
[Episode  762/4000] [Steps   21] [reward 22.0]
[Episode  763/4000] [Steps   12] [reward 13.0]
[Episode  764/4000] [Steps    9] [reward 10.0]
[Episode  765/4000] [Steps   11] [reward 12.0]
[Episode  766/4000] [Steps   10] [reward 11.0]
[Episode  767/4000] [Steps   12] [reward 13.0]
[Episode  768/4000] [Steps   11] [reward 12.0]
[Episode  769/4000] [Steps    8] [reward 9.0]
[Episode  770/4000] [Steps    9] [reward 10.0]
[Episode  771/4000] [Steps    7] [reward 8.0]
[Episode  772/4000] [Steps   10] [reward 11.0]
[Episode  773/4000] [Steps    9] [reward 10.0]
[Episode  774/4000] [Steps   40] [reward 41.0]
[Episode  775/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 775] [Average Reward 21.6]
----------
[Episode  776/4000] [Steps   24] [reward 25.0]
[Episode  777/4000] [Steps   93] [reward 94.0]
[Episode  778/4000] [Steps   57] [reward 58.0]
[Episode  779/4000] [Steps   25] [reward 26.0]
[Episode  780/4000] [Steps   16] [reward 17.0]
[Episode  781/4000] [Steps   29] [reward 30.0]
[Episode  782/4000] [Steps   46] [reward 47.0]
[Episode  783/4000] [Steps   51] [reward 52.0]
[Episode  784/4000] [Steps   61] [reward 62.0]
[Episode  785/4000] [Steps   32] [reward 33.0]
[Episode  786/4000] [Steps   23] [reward 24.0]
[Episode  787/4000] [Steps   26] [reward 27.0]
[Episode  788/4000] [Steps   14] [reward 15.0]
[Episode  789/4000] [Steps   34] [reward 35.0]
[Episode  790/4000] [Steps   18] [reward 19.0]
[Episode  791/4000] [Steps   21] [reward 22.0]
[Episode  792/4000] [Steps   17] [reward 18.0]
[Episode  793/4000] [Steps   25] [reward 26.0]
[Episode  794/4000] [Steps   19] [reward 20.0]
[Episode  795/4000] [Steps   17] [reward 18.0]
[Episode  796/4000] [Steps   27] [reward 28.0]
[Episode  797/4000] [Steps   30] [reward 31.0]
[Episode  798/4000] [Steps   18] [reward 19.0]
[Episode  799/4000] [Steps   25] [reward 26.0]
[Episode  800/4000] [Steps   34] [reward 35.0]
----------
[TEST Episode 800] [Average Reward 23.5]
----------
[Episode  801/4000] [Steps   46] [reward 47.0]
[Episode  802/4000] [Steps   95] [reward 96.0]
[Episode  803/4000] [Steps  181] [reward 182.0]
[Episode  804/4000] [Steps  160] [reward 161.0]
[Episode  805/4000] [Steps  151] [reward 152.0]
[Episode  806/4000] [Steps  151] [reward 152.0]
[Episode  807/4000] [Steps  118] [reward 119.0]
[Episode  808/4000] [Steps  191] [reward 192.0]
[Episode  809/4000] [Steps  177] [reward 178.0]
[Episode  810/4000] [Steps  110] [reward 111.0]
[Episode  811/4000] [Steps   69] [reward 70.0]
[Episode  812/4000] [Steps   20] [reward 21.0]
[Episode  813/4000] [Steps   36] [reward 37.0]
[Episode  814/4000] [Steps   32] [reward 33.0]
[Episode  815/4000] [Steps   34] [reward 35.0]
[Episode  816/4000] [Steps   30] [reward 31.0]
[Episode  817/4000] [Steps   34] [reward 35.0]
[Episode  818/4000] [Steps   34] [reward 35.0]
[Episode  819/4000] [Steps   26] [reward 27.0]
[Episode  820/4000] [Steps   22] [reward 23.0]
[Episode  821/4000] [Steps   33] [reward 34.0]
[Episode  822/4000] [Steps   27] [reward 28.0]
[Episode  823/4000] [Steps   27] [reward 28.0]
[Episode  824/4000] [Steps  121] [reward 122.0]
[Episode  825/4000] [Steps  199] [reward 200.0]
----------
saving model.
[TEST Episode 825] [Average Reward 185.2]
----------
[Episode  826/4000] [Steps   70] [reward 71.0]
[Episode  827/4000] [Steps  144] [reward 145.0]
[Episode  828/4000] [Steps   34] [reward 35.0]
[Episode  829/4000] [Steps   11] [reward 12.0]
[Episode  830/4000] [Steps   28] [reward 29.0]
[Episode  831/4000] [Steps   10] [reward 11.0]
[Episode  832/4000] [Steps   26] [reward 27.0]
[Episode  833/4000] [Steps   21] [reward 22.0]
[Episode  834/4000] [Steps   16] [reward 17.0]
[Episode  835/4000] [Steps   30] [reward 31.0]
[Episode  836/4000] [Steps   14] [reward 15.0]
[Episode  837/4000] [Steps   14] [reward 15.0]
[Episode  838/4000] [Steps   27] [reward 28.0]
[Episode  839/4000] [Steps   14] [reward 15.0]
[Episode  840/4000] [Steps   16] [reward 17.0]
[Episode  841/4000] [Steps   11] [reward 12.0]
[Episode  842/4000] [Steps  105] [reward 106.0]
[Episode  843/4000] [Steps  199] [reward 200.0]
[Episode  844/4000] [Steps   28] [reward 29.0]
[Episode  845/4000] [Steps   15] [reward 16.0]
[Episode  846/4000] [Steps   10] [reward 11.0]
[Episode  847/4000] [Steps   20] [reward 21.0]
[Episode  848/4000] [Steps   16] [reward 17.0]
[Episode  849/4000] [Steps   16] [reward 17.0]
[Episode  850/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 850] [Average Reward 12.3]
----------
[Episode  851/4000] [Steps   15] [reward 16.0]
[Episode  852/4000] [Steps   21] [reward 22.0]
[Episode  853/4000] [Steps   48] [reward 49.0]
[Episode  854/4000] [Steps   14] [reward 15.0]
[Episode  855/4000] [Steps   19] [reward 20.0]
[Episode  856/4000] [Steps   17] [reward 18.0]
[Episode  857/4000] [Steps   20] [reward 21.0]
[Episode  858/4000] [Steps   19] [reward 20.0]
[Episode  859/4000] [Steps   20] [reward 21.0]
[Episode  860/4000] [Steps   11] [reward 12.0]
[Episode  861/4000] [Steps   53] [reward 54.0]
[Episode  862/4000] [Steps   13] [reward 14.0]
[Episode  863/4000] [Steps   11] [reward 12.0]
[Episode  864/4000] [Steps   12] [reward 13.0]
[Episode  865/4000] [Steps   11] [reward 12.0]
[Episode  866/4000] [Steps   11] [reward 12.0]
[Episode  867/4000] [Steps   17] [reward 18.0]
[Episode  868/4000] [Steps   16] [reward 17.0]
[Episode  869/4000] [Steps   11] [reward 12.0]
[Episode  870/4000] [Steps   14] [reward 15.0]
[Episode  871/4000] [Steps   12] [reward 13.0]
[Episode  872/4000] [Steps   12] [reward 13.0]
[Episode  873/4000] [Steps   11] [reward 12.0]
[Episode  874/4000] [Steps   25] [reward 26.0]
[Episode  875/4000] [Steps   22] [reward 23.0]
----------
[TEST Episode 875] [Average Reward 9.1]
----------
[Episode  876/4000] [Steps   10] [reward 11.0]
[Episode  877/4000] [Steps    9] [reward 10.0]
[Episode  878/4000] [Steps   11] [reward 12.0]
[Episode  879/4000] [Steps   18] [reward 19.0]
[Episode  880/4000] [Steps   13] [reward 14.0]
[Episode  881/4000] [Steps    8] [reward 9.0]
[Episode  882/4000] [Steps    8] [reward 9.0]
[Episode  883/4000] [Steps   10] [reward 11.0]
[Episode  884/4000] [Steps   10] [reward 11.0]
[Episode  885/4000] [Steps    9] [reward 10.0]
[Episode  886/4000] [Steps    9] [reward 10.0]
[Episode  887/4000] [Steps   10] [reward 11.0]
[Episode  888/4000] [Steps    8] [reward 9.0]
[Episode  889/4000] [Steps    9] [reward 10.0]
[Episode  890/4000] [Steps   10] [reward 11.0]
[Episode  891/4000] [Steps    9] [reward 10.0]
[Episode  892/4000] [Steps    7] [reward 8.0]
[Episode  893/4000] [Steps    8] [reward 9.0]
[Episode  894/4000] [Steps    9] [reward 10.0]
[Episode  895/4000] [Steps    9] [reward 10.0]
[Episode  896/4000] [Steps   41] [reward 42.0]
[Episode  897/4000] [Steps   11] [reward 12.0]
[Episode  898/4000] [Steps   14] [reward 15.0]
[Episode  899/4000] [Steps   11] [reward 12.0]
[Episode  900/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 900] [Average Reward 14.1]
----------
[Episode  901/4000] [Steps   15] [reward 16.0]
[Episode  902/4000] [Steps  199] [reward 200.0]
[Episode  903/4000] [Steps   33] [reward 34.0]
[Episode  904/4000] [Steps   62] [reward 63.0]
[Episode  905/4000] [Steps  199] [reward 200.0]
[Episode  906/4000] [Steps   38] [reward 39.0]
[Episode  907/4000] [Steps   85] [reward 86.0]
[Episode  908/4000] [Steps  199] [reward 200.0]
[Episode  909/4000] [Steps  199] [reward 200.0]
[Episode  910/4000] [Steps  199] [reward 200.0]
[Episode  911/4000] [Steps   60] [reward 61.0]
[Episode  912/4000] [Steps   15] [reward 16.0]
[Episode  913/4000] [Steps    7] [reward 8.0]
[Episode  914/4000] [Steps   96] [reward 97.0]
[Episode  915/4000] [Steps   27] [reward 28.0]
[Episode  916/4000] [Steps   23] [reward 24.0]
[Episode  917/4000] [Steps   11] [reward 12.0]
[Episode  918/4000] [Steps    9] [reward 10.0]
[Episode  919/4000] [Steps   11] [reward 12.0]
[Episode  920/4000] [Steps   11] [reward 12.0]
[Episode  921/4000] [Steps   11] [reward 12.0]
[Episode  922/4000] [Steps   16] [reward 17.0]
[Episode  923/4000] [Steps   31] [reward 32.0]
[Episode  924/4000] [Steps   74] [reward 75.0]
[Episode  925/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 925] [Average Reward 9.3]
----------
[Episode  926/4000] [Steps   12] [reward 13.0]
[Episode  927/4000] [Steps   19] [reward 20.0]
[Episode  928/4000] [Steps   12] [reward 13.0]
[Episode  929/4000] [Steps    8] [reward 9.0]
[Episode  930/4000] [Steps    8] [reward 9.0]
[Episode  931/4000] [Steps    8] [reward 9.0]
[Episode  932/4000] [Steps    8] [reward 9.0]
[Episode  933/4000] [Steps    9] [reward 10.0]
[Episode  934/4000] [Steps    9] [reward 10.0]
[Episode  935/4000] [Steps   81] [reward 82.0]
[Episode  936/4000] [Steps    9] [reward 10.0]
[Episode  937/4000] [Steps    9] [reward 10.0]
[Episode  938/4000] [Steps   11] [reward 12.0]
[Episode  939/4000] [Steps    8] [reward 9.0]
[Episode  940/4000] [Steps   10] [reward 11.0]
[Episode  941/4000] [Steps   42] [reward 43.0]
[Episode  942/4000] [Steps   14] [reward 15.0]
[Episode  943/4000] [Steps   15] [reward 16.0]
[Episode  944/4000] [Steps   15] [reward 16.0]
[Episode  945/4000] [Steps   12] [reward 13.0]
[Episode  946/4000] [Steps   10] [reward 11.0]
[Episode  947/4000] [Steps   14] [reward 15.0]
[Episode  948/4000] [Steps   20] [reward 21.0]
[Episode  949/4000] [Steps   15] [reward 16.0]
[Episode  950/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 950] [Average Reward 36.7]
----------
[Episode  951/4000] [Steps   25] [reward 26.0]
[Episode  952/4000] [Steps    9] [reward 10.0]
[Episode  953/4000] [Steps    8] [reward 9.0]
[Episode  954/4000] [Steps    9] [reward 10.0]
[Episode  955/4000] [Steps    8] [reward 9.0]
[Episode  956/4000] [Steps    9] [reward 10.0]
[Episode  957/4000] [Steps    9] [reward 10.0]
[Episode  958/4000] [Steps    9] [reward 10.0]
[Episode  959/4000] [Steps   10] [reward 11.0]
[Episode  960/4000] [Steps    8] [reward 9.0]
[Episode  961/4000] [Steps   12] [reward 13.0]
[Episode  962/4000] [Steps   10] [reward 11.0]
[Episode  963/4000] [Steps    9] [reward 10.0]
[Episode  964/4000] [Steps    9] [reward 10.0]
[Episode  965/4000] [Steps    8] [reward 9.0]
[Episode  966/4000] [Steps   12] [reward 13.0]
[Episode  967/4000] [Steps   16] [reward 17.0]
[Episode  968/4000] [Steps   15] [reward 16.0]
[Episode  969/4000] [Steps   18] [reward 19.0]
[Episode  970/4000] [Steps   16] [reward 17.0]
[Episode  971/4000] [Steps    7] [reward 8.0]
[Episode  972/4000] [Steps    8] [reward 9.0]
[Episode  973/4000] [Steps   11] [reward 12.0]
[Episode  974/4000] [Steps   12] [reward 13.0]
[Episode  975/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 975] [Average Reward 9.3]
----------
[Episode  976/4000] [Steps   17] [reward 18.0]
[Episode  977/4000] [Steps   16] [reward 17.0]
[Episode  978/4000] [Steps    9] [reward 10.0]
[Episode  979/4000] [Steps   32] [reward 33.0]
[Episode  980/4000] [Steps   35] [reward 36.0]
[Episode  981/4000] [Steps   66] [reward 67.0]
[Episode  982/4000] [Steps   91] [reward 92.0]
[Episode  983/4000] [Steps  168] [reward 169.0]
[Episode  984/4000] [Steps  169] [reward 170.0]
[Episode  985/4000] [Steps   42] [reward 43.0]
[Episode  986/4000] [Steps   62] [reward 63.0]
[Episode  987/4000] [Steps   62] [reward 63.0]
[Episode  988/4000] [Steps   56] [reward 57.0]
[Episode  989/4000] [Steps   57] [reward 58.0]
[Episode  990/4000] [Steps   38] [reward 39.0]
[Episode  991/4000] [Steps   31] [reward 32.0]
[Episode  992/4000] [Steps   90] [reward 91.0]
[Episode  993/4000] [Steps  125] [reward 126.0]
[Episode  994/4000] [Steps   53] [reward 54.0]
[Episode  995/4000] [Steps   27] [reward 28.0]
[Episode  996/4000] [Steps   10] [reward 11.0]
[Episode  997/4000] [Steps   25] [reward 26.0]
[Episode  998/4000] [Steps   24] [reward 25.0]
[Episode  999/4000] [Steps   19] [reward 20.0]
[Episode 1000/4000] [Steps   11] [reward 12.0]
----------
[TEST Episode 1000] [Average Reward 9.9]
----------
[Episode 1001/4000] [Steps  199] [reward 200.0]
[Episode 1002/4000] [Steps  120] [reward 121.0]
[Episode 1003/4000] [Steps  199] [reward 200.0]
[Episode 1004/4000] [Steps   13] [reward 14.0]
[Episode 1005/4000] [Steps   15] [reward 16.0]
[Episode 1006/4000] [Steps   12] [reward 13.0]
[Episode 1007/4000] [Steps    8] [reward 9.0]
[Episode 1008/4000] [Steps    8] [reward 9.0]
[Episode 1009/4000] [Steps   11] [reward 12.0]
[Episode 1010/4000] [Steps    7] [reward 8.0]
[Episode 1011/4000] [Steps    9] [reward 10.0]
[Episode 1012/4000] [Steps   12] [reward 13.0]
[Episode 1013/4000] [Steps   13] [reward 14.0]
[Episode 1014/4000] [Steps   34] [reward 35.0]
[Episode 1015/4000] [Steps   76] [reward 77.0]
[Episode 1016/4000] [Steps   43] [reward 44.0]
[Episode 1017/4000] [Steps   69] [reward 70.0]
[Episode 1018/4000] [Steps   34] [reward 35.0]
[Episode 1019/4000] [Steps   29] [reward 30.0]
[Episode 1020/4000] [Steps   60] [reward 61.0]
[Episode 1021/4000] [Steps   50] [reward 51.0]
[Episode 1022/4000] [Steps   58] [reward 59.0]
[Episode 1023/4000] [Steps   99] [reward 100.0]
[Episode 1024/4000] [Steps   17] [reward 18.0]
[Episode 1025/4000] [Steps   98] [reward 99.0]
----------
[TEST Episode 1025] [Average Reward 19.8]
----------
[Episode 1026/4000] [Steps   24] [reward 25.0]
[Episode 1027/4000] [Steps   35] [reward 36.0]
[Episode 1028/4000] [Steps   33] [reward 34.0]
[Episode 1029/4000] [Steps   25] [reward 26.0]
[Episode 1030/4000] [Steps   18] [reward 19.0]
[Episode 1031/4000] [Steps   24] [reward 25.0]
[Episode 1032/4000] [Steps   42] [reward 43.0]
[Episode 1033/4000] [Steps   11] [reward 12.0]
[Episode 1034/4000] [Steps    9] [reward 10.0]
[Episode 1035/4000] [Steps    9] [reward 10.0]
[Episode 1036/4000] [Steps   11] [reward 12.0]
[Episode 1037/4000] [Steps   13] [reward 14.0]
[Episode 1038/4000] [Steps    8] [reward 9.0]
[Episode 1039/4000] [Steps    9] [reward 10.0]
[Episode 1040/4000] [Steps   59] [reward 60.0]
[Episode 1041/4000] [Steps  199] [reward 200.0]
[Episode 1042/4000] [Steps   99] [reward 100.0]
[Episode 1043/4000] [Steps  199] [reward 200.0]
[Episode 1044/4000] [Steps   49] [reward 50.0]
[Episode 1045/4000] [Steps  199] [reward 200.0]
[Episode 1046/4000] [Steps  199] [reward 200.0]
[Episode 1047/4000] [Steps  177] [reward 178.0]
[Episode 1048/4000] [Steps  199] [reward 200.0]
[Episode 1049/4000] [Steps  199] [reward 200.0]
[Episode 1050/4000] [Steps  199] [reward 200.0]
----------
saving model.
[TEST Episode 1050] [Average Reward 200.0]
----------
[Episode 1051/4000] [Steps  183] [reward 184.0]
[Episode 1052/4000] [Steps  107] [reward 108.0]
[Episode 1053/4000] [Steps   32] [reward 33.0]
[Episode 1054/4000] [Steps  199] [reward 200.0]
[Episode 1055/4000] [Steps   18] [reward 19.0]
[Episode 1056/4000] [Steps   12] [reward 13.0]
[Episode 1057/4000] [Steps   16] [reward 17.0]
[Episode 1058/4000] [Steps   16] [reward 17.0]
[Episode 1059/4000] [Steps    9] [reward 10.0]
[Episode 1060/4000] [Steps   10] [reward 11.0]
[Episode 1061/4000] [Steps  135] [reward 136.0]
[Episode 1062/4000] [Steps   11] [reward 12.0]
[Episode 1063/4000] [Steps   11] [reward 12.0]
[Episode 1064/4000] [Steps    8] [reward 9.0]
[Episode 1065/4000] [Steps   10] [reward 11.0]
[Episode 1066/4000] [Steps   10] [reward 11.0]
[Episode 1067/4000] [Steps    9] [reward 10.0]
[Episode 1068/4000] [Steps   11] [reward 12.0]
[Episode 1069/4000] [Steps    8] [reward 9.0]
[Episode 1070/4000] [Steps   14] [reward 15.0]
[Episode 1071/4000] [Steps   53] [reward 54.0]
[Episode 1072/4000] [Steps  199] [reward 200.0]
[Episode 1073/4000] [Steps  111] [reward 112.0]
[Episode 1074/4000] [Steps   35] [reward 36.0]
[Episode 1075/4000] [Steps   36] [reward 37.0]
----------
[TEST Episode 1075] [Average Reward 61.9]
----------
[Episode 1076/4000] [Steps  199] [reward 200.0]
[Episode 1077/4000] [Steps   78] [reward 79.0]
[Episode 1078/4000] [Steps   59] [reward 60.0]
[Episode 1079/4000] [Steps  103] [reward 104.0]
[Episode 1080/4000] [Steps  170] [reward 171.0]
[Episode 1081/4000] [Steps  199] [reward 200.0]
[Episode 1082/4000] [Steps   13] [reward 14.0]
[Episode 1083/4000] [Steps   10] [reward 11.0]
[Episode 1084/4000] [Steps    8] [reward 9.0]
[Episode 1085/4000] [Steps    8] [reward 9.0]
[Episode 1086/4000] [Steps    9] [reward 10.0]
[Episode 1087/4000] [Steps   12] [reward 13.0]
[Episode 1088/4000] [Steps    9] [reward 10.0]
[Episode 1089/4000] [Steps    9] [reward 10.0]
[Episode 1090/4000] [Steps  199] [reward 200.0]
[Episode 1091/4000] [Steps   11] [reward 12.0]
[Episode 1092/4000] [Steps   11] [reward 12.0]
[Episode 1093/4000] [Steps  199] [reward 200.0]
[Episode 1094/4000] [Steps   10] [reward 11.0]
[Episode 1095/4000] [Steps  199] [reward 200.0]
[Episode 1096/4000] [Steps   10] [reward 11.0]
[Episode 1097/4000] [Steps   13] [reward 14.0]
[Episode 1098/4000] [Steps   11] [reward 12.0]
[Episode 1099/4000] [Steps   13] [reward 14.0]
[Episode 1100/4000] [Steps   77] [reward 78.0]
----------
[TEST Episode 1100] [Average Reward 157.0]
----------
[Episode 1101/4000] [Steps  167] [reward 168.0]
[Episode 1102/4000] [Steps    9] [reward 10.0]
[Episode 1103/4000] [Steps    9] [reward 10.0]
[Episode 1104/4000] [Steps  139] [reward 140.0]
[Episode 1105/4000] [Steps   31] [reward 32.0]
[Episode 1106/4000] [Steps   17] [reward 18.0]
[Episode 1107/4000] [Steps    8] [reward 9.0]
[Episode 1108/4000] [Steps    8] [reward 9.0]
[Episode 1109/4000] [Steps    8] [reward 9.0]
[Episode 1110/4000] [Steps   10] [reward 11.0]
[Episode 1111/4000] [Steps   28] [reward 29.0]
[Episode 1112/4000] [Steps   71] [reward 72.0]
[Episode 1113/4000] [Steps   21] [reward 22.0]
[Episode 1114/4000] [Steps   58] [reward 59.0]
[Episode 1115/4000] [Steps   65] [reward 66.0]
[Episode 1116/4000] [Steps   70] [reward 71.0]
[Episode 1117/4000] [Steps  101] [reward 102.0]
[Episode 1118/4000] [Steps   25] [reward 26.0]
[Episode 1119/4000] [Steps   48] [reward 49.0]
[Episode 1120/4000] [Steps   40] [reward 41.0]
[Episode 1121/4000] [Steps   53] [reward 54.0]
[Episode 1122/4000] [Steps   44] [reward 45.0]
[Episode 1123/4000] [Steps   45] [reward 46.0]
[Episode 1124/4000] [Steps   54] [reward 55.0]
[Episode 1125/4000] [Steps   80] [reward 81.0]
----------
[TEST Episode 1125] [Average Reward 200.0]
----------
[Episode 1126/4000] [Steps  199] [reward 200.0]
[Episode 1127/4000] [Steps   17] [reward 18.0]
[Episode 1128/4000] [Steps  153] [reward 154.0]
[Episode 1129/4000] [Steps   18] [reward 19.0]
[Episode 1130/4000] [Steps   52] [reward 53.0]
[Episode 1131/4000] [Steps   67] [reward 68.0]
[Episode 1132/4000] [Steps  104] [reward 105.0]
[Episode 1133/4000] [Steps   14] [reward 15.0]
[Episode 1134/4000] [Steps   13] [reward 14.0]
[Episode 1135/4000] [Steps   15] [reward 16.0]
[Episode 1136/4000] [Steps   12] [reward 13.0]
[Episode 1137/4000] [Steps   14] [reward 15.0]
[Episode 1138/4000] [Steps   12] [reward 13.0]
[Episode 1139/4000] [Steps   15] [reward 16.0]
[Episode 1140/4000] [Steps   74] [reward 75.0]
[Episode 1141/4000] [Steps   96] [reward 97.0]
[Episode 1142/4000] [Steps   13] [reward 14.0]
[Episode 1143/4000] [Steps   28] [reward 29.0]
[Episode 1144/4000] [Steps   42] [reward 43.0]
[Episode 1145/4000] [Steps   20] [reward 21.0]
[Episode 1146/4000] [Steps   14] [reward 15.0]
[Episode 1147/4000] [Steps   11] [reward 12.0]
[Episode 1148/4000] [Steps    9] [reward 10.0]
[Episode 1149/4000] [Steps   12] [reward 13.0]
[Episode 1150/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1150] [Average Reward 18.3]
----------
[Episode 1151/4000] [Steps   29] [reward 30.0]
[Episode 1152/4000] [Steps  199] [reward 200.0]
[Episode 1153/4000] [Steps  178] [reward 179.0]
[Episode 1154/4000] [Steps  171] [reward 172.0]
[Episode 1155/4000] [Steps   84] [reward 85.0]
[Episode 1156/4000] [Steps   33] [reward 34.0]
[Episode 1157/4000] [Steps   55] [reward 56.0]
[Episode 1158/4000] [Steps  127] [reward 128.0]
[Episode 1159/4000] [Steps   51] [reward 52.0]
[Episode 1160/4000] [Steps   56] [reward 57.0]
[Episode 1161/4000] [Steps   35] [reward 36.0]
[Episode 1162/4000] [Steps   13] [reward 14.0]
[Episode 1163/4000] [Steps   66] [reward 67.0]
[Episode 1164/4000] [Steps   24] [reward 25.0]
[Episode 1165/4000] [Steps   14] [reward 15.0]
[Episode 1166/4000] [Steps   16] [reward 17.0]
[Episode 1167/4000] [Steps   52] [reward 53.0]
[Episode 1168/4000] [Steps   38] [reward 39.0]
[Episode 1169/4000] [Steps   48] [reward 49.0]
[Episode 1170/4000] [Steps   18] [reward 19.0]
[Episode 1171/4000] [Steps   11] [reward 12.0]
[Episode 1172/4000] [Steps   23] [reward 24.0]
[Episode 1173/4000] [Steps   71] [reward 72.0]
[Episode 1174/4000] [Steps   33] [reward 34.0]
[Episode 1175/4000] [Steps   96] [reward 97.0]
----------
[TEST Episode 1175] [Average Reward 22.7]
----------
[Episode 1176/4000] [Steps  111] [reward 112.0]
[Episode 1177/4000] [Steps   12] [reward 13.0]
[Episode 1178/4000] [Steps   19] [reward 20.0]
[Episode 1179/4000] [Steps  135] [reward 136.0]
[Episode 1180/4000] [Steps   96] [reward 97.0]
[Episode 1181/4000] [Steps  124] [reward 125.0]
[Episode 1182/4000] [Steps  122] [reward 123.0]
[Episode 1183/4000] [Steps   96] [reward 97.0]
[Episode 1184/4000] [Steps  142] [reward 143.0]
[Episode 1185/4000] [Steps  110] [reward 111.0]
[Episode 1186/4000] [Steps  153] [reward 154.0]
[Episode 1187/4000] [Steps  106] [reward 107.0]
[Episode 1188/4000] [Steps   90] [reward 91.0]
[Episode 1189/4000] [Steps  124] [reward 125.0]
[Episode 1190/4000] [Steps  139] [reward 140.0]
[Episode 1191/4000] [Steps  199] [reward 200.0]
[Episode 1192/4000] [Steps   10] [reward 11.0]
[Episode 1193/4000] [Steps   10] [reward 11.0]
[Episode 1194/4000] [Steps    9] [reward 10.0]
[Episode 1195/4000] [Steps    9] [reward 10.0]
[Episode 1196/4000] [Steps    9] [reward 10.0]
[Episode 1197/4000] [Steps   89] [reward 90.0]
[Episode 1198/4000] [Steps   14] [reward 15.0]
[Episode 1199/4000] [Steps   11] [reward 12.0]
[Episode 1200/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 1200] [Average Reward 10.7]
----------
[Episode 1201/4000] [Steps  189] [reward 190.0]
[Episode 1202/4000] [Steps  179] [reward 180.0]
[Episode 1203/4000] [Steps  193] [reward 194.0]
[Episode 1204/4000] [Steps  199] [reward 200.0]
[Episode 1205/4000] [Steps  145] [reward 146.0]
[Episode 1206/4000] [Steps  161] [reward 162.0]
[Episode 1207/4000] [Steps  199] [reward 200.0]
[Episode 1208/4000] [Steps  199] [reward 200.0]
[Episode 1209/4000] [Steps   24] [reward 25.0]
[Episode 1210/4000] [Steps   36] [reward 37.0]
[Episode 1211/4000] [Steps  199] [reward 200.0]
[Episode 1212/4000] [Steps  177] [reward 178.0]
[Episode 1213/4000] [Steps  149] [reward 150.0]
[Episode 1214/4000] [Steps  142] [reward 143.0]
[Episode 1215/4000] [Steps   38] [reward 39.0]
[Episode 1216/4000] [Steps  129] [reward 130.0]
[Episode 1217/4000] [Steps  180] [reward 181.0]
[Episode 1218/4000] [Steps  113] [reward 114.0]
[Episode 1219/4000] [Steps  140] [reward 141.0]
[Episode 1220/4000] [Steps  191] [reward 192.0]
[Episode 1221/4000] [Steps   31] [reward 32.0]
[Episode 1222/4000] [Steps    8] [reward 9.0]
[Episode 1223/4000] [Steps    8] [reward 9.0]
[Episode 1224/4000] [Steps   10] [reward 11.0]
[Episode 1225/4000] [Steps   41] [reward 42.0]
----------
[TEST Episode 1225] [Average Reward 37.6]
----------
[Episode 1226/4000] [Steps   44] [reward 45.0]
[Episode 1227/4000] [Steps   11] [reward 12.0]
[Episode 1228/4000] [Steps   11] [reward 12.0]
[Episode 1229/4000] [Steps   11] [reward 12.0]
[Episode 1230/4000] [Steps   10] [reward 11.0]
[Episode 1231/4000] [Steps   43] [reward 44.0]
[Episode 1232/4000] [Steps    9] [reward 10.0]
[Episode 1233/4000] [Steps   21] [reward 22.0]
[Episode 1234/4000] [Steps    9] [reward 10.0]
[Episode 1235/4000] [Steps   11] [reward 12.0]
[Episode 1236/4000] [Steps   12] [reward 13.0]
[Episode 1237/4000] [Steps   19] [reward 20.0]
[Episode 1238/4000] [Steps   12] [reward 13.0]
[Episode 1239/4000] [Steps   10] [reward 11.0]
[Episode 1240/4000] [Steps    9] [reward 10.0]
[Episode 1241/4000] [Steps  104] [reward 105.0]
[Episode 1242/4000] [Steps   24] [reward 25.0]
[Episode 1243/4000] [Steps  199] [reward 200.0]
[Episode 1244/4000] [Steps  199] [reward 200.0]
[Episode 1245/4000] [Steps  199] [reward 200.0]
[Episode 1246/4000] [Steps  199] [reward 200.0]
[Episode 1247/4000] [Steps  192] [reward 193.0]
[Episode 1248/4000] [Steps  132] [reward 133.0]
[Episode 1249/4000] [Steps  199] [reward 200.0]
[Episode 1250/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1250] [Average Reward 17.1]
----------
[Episode 1251/4000] [Steps   12] [reward 13.0]
[Episode 1252/4000] [Steps  199] [reward 200.0]
[Episode 1253/4000] [Steps   65] [reward 66.0]
[Episode 1254/4000] [Steps   35] [reward 36.0]
[Episode 1255/4000] [Steps   46] [reward 47.0]
[Episode 1256/4000] [Steps  199] [reward 200.0]
[Episode 1257/4000] [Steps   10] [reward 11.0]
[Episode 1258/4000] [Steps    8] [reward 9.0]
[Episode 1259/4000] [Steps   10] [reward 11.0]
[Episode 1260/4000] [Steps   10] [reward 11.0]
[Episode 1261/4000] [Steps  115] [reward 116.0]
[Episode 1262/4000] [Steps  125] [reward 126.0]
[Episode 1263/4000] [Steps  118] [reward 119.0]
[Episode 1264/4000] [Steps  111] [reward 112.0]
[Episode 1265/4000] [Steps   80] [reward 81.0]
[Episode 1266/4000] [Steps  103] [reward 104.0]
[Episode 1267/4000] [Steps  107] [reward 108.0]
[Episode 1268/4000] [Steps  112] [reward 113.0]
[Episode 1269/4000] [Steps  119] [reward 120.0]
[Episode 1270/4000] [Steps  127] [reward 128.0]
[Episode 1271/4000] [Steps  102] [reward 103.0]
[Episode 1272/4000] [Steps  106] [reward 107.0]
[Episode 1273/4000] [Steps  140] [reward 141.0]
[Episode 1274/4000] [Steps  103] [reward 104.0]
[Episode 1275/4000] [Steps   72] [reward 73.0]
----------
[TEST Episode 1275] [Average Reward 58.4]
----------
[Episode 1276/4000] [Steps   80] [reward 81.0]
[Episode 1277/4000] [Steps   71] [reward 72.0]
[Episode 1278/4000] [Steps   64] [reward 65.0]
[Episode 1279/4000] [Steps   57] [reward 58.0]
[Episode 1280/4000] [Steps   82] [reward 83.0]
[Episode 1281/4000] [Steps   84] [reward 85.0]
[Episode 1282/4000] [Steps   70] [reward 71.0]
[Episode 1283/4000] [Steps   92] [reward 93.0]
[Episode 1284/4000] [Steps  121] [reward 122.0]
[Episode 1285/4000] [Steps  101] [reward 102.0]
[Episode 1286/4000] [Steps   76] [reward 77.0]
[Episode 1287/4000] [Steps  109] [reward 110.0]
[Episode 1288/4000] [Steps  116] [reward 117.0]
[Episode 1289/4000] [Steps   88] [reward 89.0]
[Episode 1290/4000] [Steps   89] [reward 90.0]
[Episode 1291/4000] [Steps  105] [reward 106.0]
[Episode 1292/4000] [Steps  101] [reward 102.0]
[Episode 1293/4000] [Steps  115] [reward 116.0]
[Episode 1294/4000] [Steps  115] [reward 116.0]
[Episode 1295/4000] [Steps  121] [reward 122.0]
[Episode 1296/4000] [Steps   99] [reward 100.0]
[Episode 1297/4000] [Steps  131] [reward 132.0]
[Episode 1298/4000] [Steps  150] [reward 151.0]
[Episode 1299/4000] [Steps   92] [reward 93.0]
[Episode 1300/4000] [Steps   91] [reward 92.0]
----------
[TEST Episode 1300] [Average Reward 176.9]
----------
[Episode 1301/4000] [Steps   90] [reward 91.0]
[Episode 1302/4000] [Steps  117] [reward 118.0]
[Episode 1303/4000] [Steps   70] [reward 71.0]
[Episode 1304/4000] [Steps   64] [reward 65.0]
[Episode 1305/4000] [Steps   66] [reward 67.0]
[Episode 1306/4000] [Steps  142] [reward 143.0]
[Episode 1307/4000] [Steps   10] [reward 11.0]
[Episode 1308/4000] [Steps   15] [reward 16.0]
[Episode 1309/4000] [Steps   12] [reward 13.0]
[Episode 1310/4000] [Steps   12] [reward 13.0]
[Episode 1311/4000] [Steps  169] [reward 170.0]
[Episode 1312/4000] [Steps  107] [reward 108.0]
[Episode 1313/4000] [Steps   82] [reward 83.0]
[Episode 1314/4000] [Steps  161] [reward 162.0]
[Episode 1315/4000] [Steps   60] [reward 61.0]
[Episode 1316/4000] [Steps   64] [reward 65.0]
[Episode 1317/4000] [Steps   67] [reward 68.0]
[Episode 1318/4000] [Steps  129] [reward 130.0]
[Episode 1319/4000] [Steps   80] [reward 81.0]
[Episode 1320/4000] [Steps   64] [reward 65.0]
[Episode 1321/4000] [Steps  122] [reward 123.0]
[Episode 1322/4000] [Steps   98] [reward 99.0]
[Episode 1323/4000] [Steps  103] [reward 104.0]
[Episode 1324/4000] [Steps   48] [reward 49.0]
[Episode 1325/4000] [Steps   40] [reward 41.0]
----------
[TEST Episode 1325] [Average Reward 45.2]
----------
[Episode 1326/4000] [Steps   41] [reward 42.0]
[Episode 1327/4000] [Steps   54] [reward 55.0]
[Episode 1328/4000] [Steps   49] [reward 50.0]
[Episode 1329/4000] [Steps   59] [reward 60.0]
[Episode 1330/4000] [Steps   35] [reward 36.0]
[Episode 1331/4000] [Steps   46] [reward 47.0]
[Episode 1332/4000] [Steps   17] [reward 18.0]
[Episode 1333/4000] [Steps   10] [reward 11.0]
[Episode 1334/4000] [Steps    8] [reward 9.0]
[Episode 1335/4000] [Steps   17] [reward 18.0]
[Episode 1336/4000] [Steps    9] [reward 10.0]
[Episode 1337/4000] [Steps    8] [reward 9.0]
[Episode 1338/4000] [Steps    9] [reward 10.0]
[Episode 1339/4000] [Steps    8] [reward 9.0]
[Episode 1340/4000] [Steps   15] [reward 16.0]
[Episode 1341/4000] [Steps   10] [reward 11.0]
[Episode 1342/4000] [Steps   15] [reward 16.0]
[Episode 1343/4000] [Steps   11] [reward 12.0]
[Episode 1344/4000] [Steps    9] [reward 10.0]
[Episode 1345/4000] [Steps    8] [reward 9.0]
[Episode 1346/4000] [Steps    8] [reward 9.0]
[Episode 1347/4000] [Steps   10] [reward 11.0]
[Episode 1348/4000] [Steps   12] [reward 13.0]
[Episode 1349/4000] [Steps    8] [reward 9.0]
[Episode 1350/4000] [Steps   11] [reward 12.0]
----------
[TEST Episode 1350] [Average Reward 10.1]
----------
[Episode 1351/4000] [Steps    9] [reward 10.0]
[Episode 1352/4000] [Steps    9] [reward 10.0]
[Episode 1353/4000] [Steps    8] [reward 9.0]
[Episode 1354/4000] [Steps   13] [reward 14.0]
[Episode 1355/4000] [Steps  175] [reward 176.0]
[Episode 1356/4000] [Steps  136] [reward 137.0]
[Episode 1357/4000] [Steps   91] [reward 92.0]
[Episode 1358/4000] [Steps   82] [reward 83.0]
[Episode 1359/4000] [Steps  114] [reward 115.0]
[Episode 1360/4000] [Steps  159] [reward 160.0]
[Episode 1361/4000] [Steps  123] [reward 124.0]
[Episode 1362/4000] [Steps   47] [reward 48.0]
[Episode 1363/4000] [Steps  199] [reward 200.0]
[Episode 1364/4000] [Steps   10] [reward 11.0]
[Episode 1365/4000] [Steps   11] [reward 12.0]
[Episode 1366/4000] [Steps   10] [reward 11.0]
[Episode 1367/4000] [Steps    9] [reward 10.0]
[Episode 1368/4000] [Steps   10] [reward 11.0]
[Episode 1369/4000] [Steps   11] [reward 12.0]
[Episode 1370/4000] [Steps    8] [reward 9.0]
[Episode 1371/4000] [Steps   36] [reward 37.0]
[Episode 1372/4000] [Steps    9] [reward 10.0]
[Episode 1373/4000] [Steps    8] [reward 9.0]
[Episode 1374/4000] [Steps    8] [reward 9.0]
[Episode 1375/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 1375] [Average Reward 10.3]
----------
[Episode 1376/4000] [Steps   14] [reward 15.0]
[Episode 1377/4000] [Steps    9] [reward 10.0]
[Episode 1378/4000] [Steps   10] [reward 11.0]
[Episode 1379/4000] [Steps    8] [reward 9.0]
[Episode 1380/4000] [Steps   10] [reward 11.0]
[Episode 1381/4000] [Steps    8] [reward 9.0]
[Episode 1382/4000] [Steps   23] [reward 24.0]
[Episode 1383/4000] [Steps    9] [reward 10.0]
[Episode 1384/4000] [Steps   10] [reward 11.0]
[Episode 1385/4000] [Steps   33] [reward 34.0]
[Episode 1386/4000] [Steps   11] [reward 12.0]
[Episode 1387/4000] [Steps  132] [reward 133.0]
[Episode 1388/4000] [Steps   59] [reward 60.0]
[Episode 1389/4000] [Steps   48] [reward 49.0]
[Episode 1390/4000] [Steps   43] [reward 44.0]
[Episode 1391/4000] [Steps    9] [reward 10.0]
[Episode 1392/4000] [Steps   12] [reward 13.0]
[Episode 1393/4000] [Steps   49] [reward 50.0]
[Episode 1394/4000] [Steps   13] [reward 14.0]
[Episode 1395/4000] [Steps   14] [reward 15.0]
[Episode 1396/4000] [Steps   10] [reward 11.0]
[Episode 1397/4000] [Steps   46] [reward 47.0]
[Episode 1398/4000] [Steps   26] [reward 27.0]
[Episode 1399/4000] [Steps   23] [reward 24.0]
[Episode 1400/4000] [Steps   30] [reward 31.0]
----------
[TEST Episode 1400] [Average Reward 18.5]
----------
[Episode 1401/4000] [Steps   15] [reward 16.0]
[Episode 1402/4000] [Steps   43] [reward 44.0]
[Episode 1403/4000] [Steps   52] [reward 53.0]
[Episode 1404/4000] [Steps   91] [reward 92.0]
[Episode 1405/4000] [Steps   39] [reward 40.0]
[Episode 1406/4000] [Steps   70] [reward 71.0]
[Episode 1407/4000] [Steps   68] [reward 69.0]
[Episode 1408/4000] [Steps   12] [reward 13.0]
[Episode 1409/4000] [Steps   22] [reward 23.0]
[Episode 1410/4000] [Steps   71] [reward 72.0]
[Episode 1411/4000] [Steps  120] [reward 121.0]
[Episode 1412/4000] [Steps   75] [reward 76.0]
[Episode 1413/4000] [Steps  110] [reward 111.0]
[Episode 1414/4000] [Steps   55] [reward 56.0]
[Episode 1415/4000] [Steps  119] [reward 120.0]
[Episode 1416/4000] [Steps   16] [reward 17.0]
[Episode 1417/4000] [Steps   58] [reward 59.0]
[Episode 1418/4000] [Steps   45] [reward 46.0]
[Episode 1419/4000] [Steps   16] [reward 17.0]
[Episode 1420/4000] [Steps   66] [reward 67.0]
[Episode 1421/4000] [Steps  106] [reward 107.0]
[Episode 1422/4000] [Steps   31] [reward 32.0]
[Episode 1423/4000] [Steps   93] [reward 94.0]
[Episode 1424/4000] [Steps   93] [reward 94.0]
[Episode 1425/4000] [Steps   98] [reward 99.0]
----------
[TEST Episode 1425] [Average Reward 15.2]
----------
[Episode 1426/4000] [Steps   97] [reward 98.0]
[Episode 1427/4000] [Steps  132] [reward 133.0]
[Episode 1428/4000] [Steps   91] [reward 92.0]
[Episode 1429/4000] [Steps   14] [reward 15.0]
[Episode 1430/4000] [Steps   12] [reward 13.0]
[Episode 1431/4000] [Steps   13] [reward 14.0]
[Episode 1432/4000] [Steps   55] [reward 56.0]
[Episode 1433/4000] [Steps   47] [reward 48.0]
[Episode 1434/4000] [Steps   75] [reward 76.0]
[Episode 1435/4000] [Steps   10] [reward 11.0]
[Episode 1436/4000] [Steps   10] [reward 11.0]
[Episode 1437/4000] [Steps   11] [reward 12.0]
[Episode 1438/4000] [Steps    8] [reward 9.0]
[Episode 1439/4000] [Steps   15] [reward 16.0]
[Episode 1440/4000] [Steps   11] [reward 12.0]
[Episode 1441/4000] [Steps   40] [reward 41.0]
[Episode 1442/4000] [Steps  129] [reward 130.0]
[Episode 1443/4000] [Steps  199] [reward 200.0]
[Episode 1444/4000] [Steps   14] [reward 15.0]
[Episode 1445/4000] [Steps  154] [reward 155.0]
[Episode 1446/4000] [Steps   10] [reward 11.0]
[Episode 1447/4000] [Steps   15] [reward 16.0]
[Episode 1448/4000] [Steps   12] [reward 13.0]
[Episode 1449/4000] [Steps   15] [reward 16.0]
[Episode 1450/4000] [Steps   88] [reward 89.0]
----------
[TEST Episode 1450] [Average Reward 97.1]
----------
[Episode 1451/4000] [Steps  127] [reward 128.0]
[Episode 1452/4000] [Steps  199] [reward 200.0]
[Episode 1453/4000] [Steps  156] [reward 157.0]
[Episode 1454/4000] [Steps   12] [reward 13.0]
[Episode 1455/4000] [Steps   12] [reward 13.0]
[Episode 1456/4000] [Steps  199] [reward 200.0]
[Episode 1457/4000] [Steps  192] [reward 193.0]
[Episode 1458/4000] [Steps  199] [reward 200.0]
[Episode 1459/4000] [Steps  129] [reward 130.0]
[Episode 1460/4000] [Steps  199] [reward 200.0]
[Episode 1461/4000] [Steps  199] [reward 200.0]
[Episode 1462/4000] [Steps  186] [reward 187.0]
[Episode 1463/4000] [Steps  199] [reward 200.0]
[Episode 1464/4000] [Steps  110] [reward 111.0]
[Episode 1465/4000] [Steps   75] [reward 76.0]
[Episode 1466/4000] [Steps    9] [reward 10.0]
[Episode 1467/4000] [Steps  199] [reward 200.0]
[Episode 1468/4000] [Steps   48] [reward 49.0]
[Episode 1469/4000] [Steps   23] [reward 24.0]
[Episode 1470/4000] [Steps   77] [reward 78.0]
[Episode 1471/4000] [Steps   76] [reward 77.0]
[Episode 1472/4000] [Steps   89] [reward 90.0]
[Episode 1473/4000] [Steps  153] [reward 154.0]
[Episode 1474/4000] [Steps  102] [reward 103.0]
[Episode 1475/4000] [Steps  132] [reward 133.0]
----------
[TEST Episode 1475] [Average Reward 140.8]
----------
[Episode 1476/4000] [Steps  163] [reward 164.0]
[Episode 1477/4000] [Steps  199] [reward 200.0]
[Episode 1478/4000] [Steps   91] [reward 92.0]
[Episode 1479/4000] [Steps  199] [reward 200.0]
[Episode 1480/4000] [Steps  172] [reward 173.0]
[Episode 1481/4000] [Steps  199] [reward 200.0]
[Episode 1482/4000] [Steps   18] [reward 19.0]
[Episode 1483/4000] [Steps   31] [reward 32.0]
[Episode 1484/4000] [Steps  199] [reward 200.0]
[Episode 1485/4000] [Steps  139] [reward 140.0]
[Episode 1486/4000] [Steps  199] [reward 200.0]
[Episode 1487/4000] [Steps   69] [reward 70.0]
[Episode 1488/4000] [Steps  118] [reward 119.0]
[Episode 1489/4000] [Steps   12] [reward 13.0]
[Episode 1490/4000] [Steps   12] [reward 13.0]
[Episode 1491/4000] [Steps   25] [reward 26.0]
[Episode 1492/4000] [Steps  103] [reward 104.0]
[Episode 1493/4000] [Steps  147] [reward 148.0]
[Episode 1494/4000] [Steps  140] [reward 141.0]
[Episode 1495/4000] [Steps  154] [reward 155.0]
[Episode 1496/4000] [Steps  106] [reward 107.0]
[Episode 1497/4000] [Steps  110] [reward 111.0]
[Episode 1498/4000] [Steps  123] [reward 124.0]
[Episode 1499/4000] [Steps   13] [reward 14.0]
[Episode 1500/4000] [Steps   28] [reward 29.0]
----------
[TEST Episode 1500] [Average Reward 73.1]
----------
[Episode 1501/4000] [Steps   48] [reward 49.0]
[Episode 1502/4000] [Steps   11] [reward 12.0]
[Episode 1503/4000] [Steps   33] [reward 34.0]
[Episode 1504/4000] [Steps   63] [reward 64.0]
[Episode 1505/4000] [Steps  141] [reward 142.0]
[Episode 1506/4000] [Steps  143] [reward 144.0]
[Episode 1507/4000] [Steps   78] [reward 79.0]
[Episode 1508/4000] [Steps  148] [reward 149.0]
[Episode 1509/4000] [Steps   53] [reward 54.0]
[Episode 1510/4000] [Steps   54] [reward 55.0]
[Episode 1511/4000] [Steps  199] [reward 200.0]
[Episode 1512/4000] [Steps  199] [reward 200.0]
[Episode 1513/4000] [Steps   59] [reward 60.0]
[Episode 1514/4000] [Steps  199] [reward 200.0]
[Episode 1515/4000] [Steps  199] [reward 200.0]
[Episode 1516/4000] [Steps  199] [reward 200.0]
[Episode 1517/4000] [Steps   20] [reward 21.0]
[Episode 1518/4000] [Steps    8] [reward 9.0]
[Episode 1519/4000] [Steps   43] [reward 44.0]
[Episode 1520/4000] [Steps  199] [reward 200.0]
[Episode 1521/4000] [Steps   91] [reward 92.0]
[Episode 1522/4000] [Steps  199] [reward 200.0]
[Episode 1523/4000] [Steps  199] [reward 200.0]
[Episode 1524/4000] [Steps  140] [reward 141.0]
[Episode 1525/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1525] [Average Reward 200.0]
----------
[Episode 1526/4000] [Steps  105] [reward 106.0]
[Episode 1527/4000] [Steps    9] [reward 10.0]
[Episode 1528/4000] [Steps   14] [reward 15.0]
[Episode 1529/4000] [Steps    9] [reward 10.0]
[Episode 1530/4000] [Steps   35] [reward 36.0]
[Episode 1531/4000] [Steps   74] [reward 75.0]
[Episode 1532/4000] [Steps   10] [reward 11.0]
[Episode 1533/4000] [Steps   10] [reward 11.0]
[Episode 1534/4000] [Steps   10] [reward 11.0]
[Episode 1535/4000] [Steps    9] [reward 10.0]
[Episode 1536/4000] [Steps    9] [reward 10.0]
[Episode 1537/4000] [Steps   10] [reward 11.0]
[Episode 1538/4000] [Steps    8] [reward 9.0]
[Episode 1539/4000] [Steps    9] [reward 10.0]
[Episode 1540/4000] [Steps    8] [reward 9.0]
[Episode 1541/4000] [Steps  113] [reward 114.0]
[Episode 1542/4000] [Steps  110] [reward 111.0]
[Episode 1543/4000] [Steps  199] [reward 200.0]
[Episode 1544/4000] [Steps    9] [reward 10.0]
[Episode 1545/4000] [Steps   10] [reward 11.0]
[Episode 1546/4000] [Steps  199] [reward 200.0]
[Episode 1547/4000] [Steps  199] [reward 200.0]
[Episode 1548/4000] [Steps  150] [reward 151.0]
[Episode 1549/4000] [Steps  199] [reward 200.0]
[Episode 1550/4000] [Steps  137] [reward 138.0]
----------
[TEST Episode 1550] [Average Reward 115.6]
----------
[Episode 1551/4000] [Steps   71] [reward 72.0]
[Episode 1552/4000] [Steps   45] [reward 46.0]
[Episode 1553/4000] [Steps   82] [reward 83.0]
[Episode 1554/4000] [Steps  116] [reward 117.0]
[Episode 1555/4000] [Steps  199] [reward 200.0]
[Episode 1556/4000] [Steps  107] [reward 108.0]
[Episode 1557/4000] [Steps  163] [reward 164.0]
[Episode 1558/4000] [Steps   86] [reward 87.0]
[Episode 1559/4000] [Steps  199] [reward 200.0]
[Episode 1560/4000] [Steps   39] [reward 40.0]
[Episode 1561/4000] [Steps   57] [reward 58.0]
[Episode 1562/4000] [Steps   45] [reward 46.0]
[Episode 1563/4000] [Steps   10] [reward 11.0]
[Episode 1564/4000] [Steps    8] [reward 9.0]
[Episode 1565/4000] [Steps   34] [reward 35.0]
[Episode 1566/4000] [Steps   32] [reward 33.0]
[Episode 1567/4000] [Steps   34] [reward 35.0]
[Episode 1568/4000] [Steps   42] [reward 43.0]
[Episode 1569/4000] [Steps   53] [reward 54.0]
[Episode 1570/4000] [Steps    8] [reward 9.0]
[Episode 1571/4000] [Steps   49] [reward 50.0]
[Episode 1572/4000] [Steps   48] [reward 49.0]
[Episode 1573/4000] [Steps   50] [reward 51.0]
[Episode 1574/4000] [Steps   79] [reward 80.0]
[Episode 1575/4000] [Steps   36] [reward 37.0]
----------
[TEST Episode 1575] [Average Reward 21.6]
----------
[Episode 1576/4000] [Steps   78] [reward 79.0]
[Episode 1577/4000] [Steps  108] [reward 109.0]
[Episode 1578/4000] [Steps   87] [reward 88.0]
[Episode 1579/4000] [Steps   63] [reward 64.0]
[Episode 1580/4000] [Steps   78] [reward 79.0]
[Episode 1581/4000] [Steps   49] [reward 50.0]
[Episode 1582/4000] [Steps  128] [reward 129.0]
[Episode 1583/4000] [Steps   70] [reward 71.0]
[Episode 1584/4000] [Steps   58] [reward 59.0]
[Episode 1585/4000] [Steps   39] [reward 40.0]
[Episode 1586/4000] [Steps   15] [reward 16.0]
[Episode 1587/4000] [Steps   19] [reward 20.0]
[Episode 1588/4000] [Steps   13] [reward 14.0]
[Episode 1589/4000] [Steps   29] [reward 30.0]
[Episode 1590/4000] [Steps   33] [reward 34.0]
[Episode 1591/4000] [Steps   42] [reward 43.0]
[Episode 1592/4000] [Steps   16] [reward 17.0]
[Episode 1593/4000] [Steps   15] [reward 16.0]
[Episode 1594/4000] [Steps   41] [reward 42.0]
[Episode 1595/4000] [Steps   39] [reward 40.0]
[Episode 1596/4000] [Steps   36] [reward 37.0]
[Episode 1597/4000] [Steps   21] [reward 22.0]
[Episode 1598/4000] [Steps   22] [reward 23.0]
[Episode 1599/4000] [Steps   17] [reward 18.0]
[Episode 1600/4000] [Steps   18] [reward 19.0]
----------
[TEST Episode 1600] [Average Reward 19.5]
----------
[Episode 1601/4000] [Steps   58] [reward 59.0]
[Episode 1602/4000] [Steps   39] [reward 40.0]
[Episode 1603/4000] [Steps   18] [reward 19.0]
[Episode 1604/4000] [Steps   41] [reward 42.0]
[Episode 1605/4000] [Steps   17] [reward 18.0]
[Episode 1606/4000] [Steps   28] [reward 29.0]
[Episode 1607/4000] [Steps   14] [reward 15.0]
[Episode 1608/4000] [Steps   34] [reward 35.0]
[Episode 1609/4000] [Steps    9] [reward 10.0]
[Episode 1610/4000] [Steps   10] [reward 11.0]
[Episode 1611/4000] [Steps    9] [reward 10.0]
[Episode 1612/4000] [Steps   11] [reward 12.0]
[Episode 1613/4000] [Steps    9] [reward 10.0]
[Episode 1614/4000] [Steps    7] [reward 8.0]
[Episode 1615/4000] [Steps   32] [reward 33.0]
[Episode 1616/4000] [Steps   10] [reward 11.0]
[Episode 1617/4000] [Steps   11] [reward 12.0]
[Episode 1618/4000] [Steps    9] [reward 10.0]
[Episode 1619/4000] [Steps   16] [reward 17.0]
[Episode 1620/4000] [Steps   14] [reward 15.0]
[Episode 1621/4000] [Steps   62] [reward 63.0]
[Episode 1622/4000] [Steps   10] [reward 11.0]
[Episode 1623/4000] [Steps   21] [reward 22.0]
[Episode 1624/4000] [Steps    8] [reward 9.0]
[Episode 1625/4000] [Steps   43] [reward 44.0]
----------
[TEST Episode 1625] [Average Reward 33.5]
----------
[Episode 1626/4000] [Steps   47] [reward 48.0]
[Episode 1627/4000] [Steps  199] [reward 200.0]
[Episode 1628/4000] [Steps  179] [reward 180.0]
[Episode 1629/4000] [Steps  199] [reward 200.0]
[Episode 1630/4000] [Steps  199] [reward 200.0]
[Episode 1631/4000] [Steps   37] [reward 38.0]
[Episode 1632/4000] [Steps   49] [reward 50.0]
[Episode 1633/4000] [Steps   10] [reward 11.0]
[Episode 1634/4000] [Steps    9] [reward 10.0]
[Episode 1635/4000] [Steps    9] [reward 10.0]
[Episode 1636/4000] [Steps    9] [reward 10.0]
[Episode 1637/4000] [Steps   10] [reward 11.0]
[Episode 1638/4000] [Steps    9] [reward 10.0]
[Episode 1639/4000] [Steps    8] [reward 9.0]
[Episode 1640/4000] [Steps   13] [reward 14.0]
[Episode 1641/4000] [Steps   15] [reward 16.0]
[Episode 1642/4000] [Steps   35] [reward 36.0]
[Episode 1643/4000] [Steps   23] [reward 24.0]
[Episode 1644/4000] [Steps   28] [reward 29.0]
[Episode 1645/4000] [Steps   35] [reward 36.0]
[Episode 1646/4000] [Steps   21] [reward 22.0]
[Episode 1647/4000] [Steps   37] [reward 38.0]
[Episode 1648/4000] [Steps   22] [reward 23.0]
[Episode 1649/4000] [Steps   20] [reward 21.0]
[Episode 1650/4000] [Steps   20] [reward 21.0]
----------
[TEST Episode 1650] [Average Reward 24.5]
----------
[Episode 1651/4000] [Steps   28] [reward 29.0]
[Episode 1652/4000] [Steps   35] [reward 36.0]
[Episode 1653/4000] [Steps   70] [reward 71.0]
[Episode 1654/4000] [Steps   52] [reward 53.0]
[Episode 1655/4000] [Steps   82] [reward 83.0]
[Episode 1656/4000] [Steps   25] [reward 26.0]
[Episode 1657/4000] [Steps   20] [reward 21.0]
[Episode 1658/4000] [Steps   22] [reward 23.0]
[Episode 1659/4000] [Steps   28] [reward 29.0]
[Episode 1660/4000] [Steps   70] [reward 71.0]
[Episode 1661/4000] [Steps   39] [reward 40.0]
[Episode 1662/4000] [Steps   27] [reward 28.0]
[Episode 1663/4000] [Steps   32] [reward 33.0]
[Episode 1664/4000] [Steps   31] [reward 32.0]
[Episode 1665/4000] [Steps   32] [reward 33.0]
[Episode 1666/4000] [Steps   66] [reward 67.0]
[Episode 1667/4000] [Steps   49] [reward 50.0]
[Episode 1668/4000] [Steps   96] [reward 97.0]
[Episode 1669/4000] [Steps  108] [reward 109.0]
[Episode 1670/4000] [Steps   70] [reward 71.0]
[Episode 1671/4000] [Steps   86] [reward 87.0]
[Episode 1672/4000] [Steps  144] [reward 145.0]
[Episode 1673/4000] [Steps   38] [reward 39.0]
[Episode 1674/4000] [Steps  179] [reward 180.0]
[Episode 1675/4000] [Steps   94] [reward 95.0]
----------
[TEST Episode 1675] [Average Reward 127.8]
----------
[Episode 1676/4000] [Steps   10] [reward 11.0]
[Episode 1677/4000] [Steps  199] [reward 200.0]
[Episode 1678/4000] [Steps    9] [reward 10.0]
[Episode 1679/4000] [Steps   10] [reward 11.0]
[Episode 1680/4000] [Steps   86] [reward 87.0]
[Episode 1681/4000] [Steps   98] [reward 99.0]
[Episode 1682/4000] [Steps   19] [reward 20.0]
[Episode 1683/4000] [Steps  126] [reward 127.0]
[Episode 1684/4000] [Steps  132] [reward 133.0]
[Episode 1685/4000] [Steps  160] [reward 161.0]
[Episode 1686/4000] [Steps  119] [reward 120.0]
[Episode 1687/4000] [Steps  199] [reward 200.0]
[Episode 1688/4000] [Steps  195] [reward 196.0]
[Episode 1689/4000] [Steps  199] [reward 200.0]
[Episode 1690/4000] [Steps  186] [reward 187.0]
[Episode 1691/4000] [Steps  199] [reward 200.0]
[Episode 1692/4000] [Steps  128] [reward 129.0]
[Episode 1693/4000] [Steps   11] [reward 12.0]
[Episode 1694/4000] [Steps   15] [reward 16.0]
[Episode 1695/4000] [Steps    9] [reward 10.0]
[Episode 1696/4000] [Steps   10] [reward 11.0]
[Episode 1697/4000] [Steps    9] [reward 10.0]
[Episode 1698/4000] [Steps    9] [reward 10.0]
[Episode 1699/4000] [Steps  141] [reward 142.0]
[Episode 1700/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1700] [Average Reward 67.5]
----------
[Episode 1701/4000] [Steps   91] [reward 92.0]
[Episode 1702/4000] [Steps  199] [reward 200.0]
[Episode 1703/4000] [Steps  170] [reward 171.0]
[Episode 1704/4000] [Steps  166] [reward 167.0]
[Episode 1705/4000] [Steps  175] [reward 176.0]
[Episode 1706/4000] [Steps  170] [reward 171.0]
[Episode 1707/4000] [Steps  199] [reward 200.0]
[Episode 1708/4000] [Steps  199] [reward 200.0]
[Episode 1709/4000] [Steps  161] [reward 162.0]
[Episode 1710/4000] [Steps  185] [reward 186.0]
[Episode 1711/4000] [Steps  143] [reward 144.0]
[Episode 1712/4000] [Steps  170] [reward 171.0]
[Episode 1713/4000] [Steps  183] [reward 184.0]
[Episode 1714/4000] [Steps  199] [reward 200.0]
[Episode 1715/4000] [Steps  111] [reward 112.0]
[Episode 1716/4000] [Steps   35] [reward 36.0]
[Episode 1717/4000] [Steps  156] [reward 157.0]
[Episode 1718/4000] [Steps  113] [reward 114.0]
[Episode 1719/4000] [Steps   12] [reward 13.0]
[Episode 1720/4000] [Steps  188] [reward 189.0]
[Episode 1721/4000] [Steps   86] [reward 87.0]
[Episode 1722/4000] [Steps  105] [reward 106.0]
[Episode 1723/4000] [Steps   94] [reward 95.0]
[Episode 1724/4000] [Steps   77] [reward 78.0]
[Episode 1725/4000] [Steps  102] [reward 103.0]
----------
[TEST Episode 1725] [Average Reward 85.0]
----------
[Episode 1726/4000] [Steps   88] [reward 89.0]
[Episode 1727/4000] [Steps  199] [reward 200.0]
[Episode 1728/4000] [Steps  104] [reward 105.0]
[Episode 1729/4000] [Steps  105] [reward 106.0]
[Episode 1730/4000] [Steps   51] [reward 52.0]
[Episode 1731/4000] [Steps   70] [reward 71.0]
[Episode 1732/4000] [Steps   81] [reward 82.0]
[Episode 1733/4000] [Steps   77] [reward 78.0]
[Episode 1734/4000] [Steps   84] [reward 85.0]
[Episode 1735/4000] [Steps   93] [reward 94.0]
[Episode 1736/4000] [Steps   95] [reward 96.0]
[Episode 1737/4000] [Steps   78] [reward 79.0]
[Episode 1738/4000] [Steps   93] [reward 94.0]
[Episode 1739/4000] [Steps  109] [reward 110.0]
[Episode 1740/4000] [Steps   95] [reward 96.0]
[Episode 1741/4000] [Steps   73] [reward 74.0]
[Episode 1742/4000] [Steps   87] [reward 88.0]
[Episode 1743/4000] [Steps   52] [reward 53.0]
[Episode 1744/4000] [Steps   85] [reward 86.0]
[Episode 1745/4000] [Steps   60] [reward 61.0]
[Episode 1746/4000] [Steps   22] [reward 23.0]
[Episode 1747/4000] [Steps   10] [reward 11.0]
[Episode 1748/4000] [Steps   10] [reward 11.0]
[Episode 1749/4000] [Steps  136] [reward 137.0]
[Episode 1750/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 1750] [Average Reward 9.0]
----------
[Episode 1751/4000] [Steps    7] [reward 8.0]
[Episode 1752/4000] [Steps   10] [reward 11.0]
[Episode 1753/4000] [Steps    9] [reward 10.0]
[Episode 1754/4000] [Steps   11] [reward 12.0]
[Episode 1755/4000] [Steps   32] [reward 33.0]
[Episode 1756/4000] [Steps   11] [reward 12.0]
[Episode 1757/4000] [Steps    9] [reward 10.0]
[Episode 1758/4000] [Steps   16] [reward 17.0]
[Episode 1759/4000] [Steps   11] [reward 12.0]
[Episode 1760/4000] [Steps   16] [reward 17.0]
[Episode 1761/4000] [Steps  106] [reward 107.0]
[Episode 1762/4000] [Steps  146] [reward 147.0]
[Episode 1763/4000] [Steps   55] [reward 56.0]
[Episode 1764/4000] [Steps  103] [reward 104.0]
[Episode 1765/4000] [Steps   31] [reward 32.0]
[Episode 1766/4000] [Steps  133] [reward 134.0]
[Episode 1767/4000] [Steps   92] [reward 93.0]
[Episode 1768/4000] [Steps   11] [reward 12.0]
[Episode 1769/4000] [Steps   49] [reward 50.0]
[Episode 1770/4000] [Steps   74] [reward 75.0]
[Episode 1771/4000] [Steps   51] [reward 52.0]
[Episode 1772/4000] [Steps   63] [reward 64.0]
[Episode 1773/4000] [Steps   77] [reward 78.0]
[Episode 1774/4000] [Steps   49] [reward 50.0]
[Episode 1775/4000] [Steps   81] [reward 82.0]
----------
[TEST Episode 1775] [Average Reward 87.0]
----------
[Episode 1776/4000] [Steps   60] [reward 61.0]
[Episode 1777/4000] [Steps  121] [reward 122.0]
[Episode 1778/4000] [Steps  143] [reward 144.0]
[Episode 1779/4000] [Steps  154] [reward 155.0]
[Episode 1780/4000] [Steps  117] [reward 118.0]
[Episode 1781/4000] [Steps  133] [reward 134.0]
[Episode 1782/4000] [Steps   78] [reward 79.0]
[Episode 1783/4000] [Steps   72] [reward 73.0]
[Episode 1784/4000] [Steps   67] [reward 68.0]
[Episode 1785/4000] [Steps   64] [reward 65.0]
[Episode 1786/4000] [Steps   10] [reward 11.0]
[Episode 1787/4000] [Steps   63] [reward 64.0]
[Episode 1788/4000] [Steps   13] [reward 14.0]
[Episode 1789/4000] [Steps    9] [reward 10.0]
[Episode 1790/4000] [Steps  126] [reward 127.0]
[Episode 1791/4000] [Steps   14] [reward 15.0]
[Episode 1792/4000] [Steps   60] [reward 61.0]
[Episode 1793/4000] [Steps   11] [reward 12.0]
[Episode 1794/4000] [Steps    7] [reward 8.0]
[Episode 1795/4000] [Steps    9] [reward 10.0]
[Episode 1796/4000] [Steps    8] [reward 9.0]
[Episode 1797/4000] [Steps   10] [reward 11.0]
[Episode 1798/4000] [Steps   11] [reward 12.0]
[Episode 1799/4000] [Steps    9] [reward 10.0]
[Episode 1800/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 1800] [Average Reward 10.3]
----------
[Episode 1801/4000] [Steps    9] [reward 10.0]
[Episode 1802/4000] [Steps   10] [reward 11.0]
[Episode 1803/4000] [Steps   90] [reward 91.0]
[Episode 1804/4000] [Steps  103] [reward 104.0]
[Episode 1805/4000] [Steps  147] [reward 148.0]
[Episode 1806/4000] [Steps  149] [reward 150.0]
[Episode 1807/4000] [Steps  176] [reward 177.0]
[Episode 1808/4000] [Steps  153] [reward 154.0]
[Episode 1809/4000] [Steps   66] [reward 67.0]
[Episode 1810/4000] [Steps   89] [reward 90.0]
[Episode 1811/4000] [Steps  134] [reward 135.0]
[Episode 1812/4000] [Steps   10] [reward 11.0]
[Episode 1813/4000] [Steps   13] [reward 14.0]
[Episode 1814/4000] [Steps    8] [reward 9.0]
[Episode 1815/4000] [Steps   10] [reward 11.0]
[Episode 1816/4000] [Steps    9] [reward 10.0]
[Episode 1817/4000] [Steps   12] [reward 13.0]
[Episode 1818/4000] [Steps    8] [reward 9.0]
[Episode 1819/4000] [Steps    9] [reward 10.0]
[Episode 1820/4000] [Steps    9] [reward 10.0]
[Episode 1821/4000] [Steps  102] [reward 103.0]
[Episode 1822/4000] [Steps  149] [reward 150.0]
[Episode 1823/4000] [Steps  134] [reward 135.0]
[Episode 1824/4000] [Steps  127] [reward 128.0]
[Episode 1825/4000] [Steps  142] [reward 143.0]
----------
[TEST Episode 1825] [Average Reward 65.8]
----------
[Episode 1826/4000] [Steps  146] [reward 147.0]
[Episode 1827/4000] [Steps  135] [reward 136.0]
[Episode 1828/4000] [Steps  144] [reward 145.0]
[Episode 1829/4000] [Steps  128] [reward 129.0]
[Episode 1830/4000] [Steps  112] [reward 113.0]
[Episode 1831/4000] [Steps  128] [reward 129.0]
[Episode 1832/4000] [Steps   47] [reward 48.0]
[Episode 1833/4000] [Steps   13] [reward 14.0]
[Episode 1834/4000] [Steps   30] [reward 31.0]
[Episode 1835/4000] [Steps   12] [reward 13.0]
[Episode 1836/4000] [Steps   13] [reward 14.0]
[Episode 1837/4000] [Steps   11] [reward 12.0]
[Episode 1838/4000] [Steps    9] [reward 10.0]
[Episode 1839/4000] [Steps    8] [reward 9.0]
[Episode 1840/4000] [Steps   19] [reward 20.0]
[Episode 1841/4000] [Steps    9] [reward 10.0]
[Episode 1842/4000] [Steps   12] [reward 13.0]
[Episode 1843/4000] [Steps   12] [reward 13.0]
[Episode 1844/4000] [Steps   10] [reward 11.0]
[Episode 1845/4000] [Steps    9] [reward 10.0]
[Episode 1846/4000] [Steps    8] [reward 9.0]
[Episode 1847/4000] [Steps   31] [reward 32.0]
[Episode 1848/4000] [Steps   11] [reward 12.0]
[Episode 1849/4000] [Steps    9] [reward 10.0]
[Episode 1850/4000] [Steps    7] [reward 8.0]
----------
[TEST Episode 1850] [Average Reward 16.6]
----------
[Episode 1851/4000] [Steps   21] [reward 22.0]
[Episode 1852/4000] [Steps   10] [reward 11.0]
[Episode 1853/4000] [Steps   40] [reward 41.0]
[Episode 1854/4000] [Steps   21] [reward 22.0]
[Episode 1855/4000] [Steps    9] [reward 10.0]
[Episode 1856/4000] [Steps   10] [reward 11.0]
[Episode 1857/4000] [Steps   10] [reward 11.0]
[Episode 1858/4000] [Steps   88] [reward 89.0]
[Episode 1859/4000] [Steps   94] [reward 95.0]
[Episode 1860/4000] [Steps   77] [reward 78.0]
[Episode 1861/4000] [Steps   56] [reward 57.0]
[Episode 1862/4000] [Steps   10] [reward 11.0]
[Episode 1863/4000] [Steps   57] [reward 58.0]
[Episode 1864/4000] [Steps   10] [reward 11.0]
[Episode 1865/4000] [Steps   10] [reward 11.0]
[Episode 1866/4000] [Steps  142] [reward 143.0]
[Episode 1867/4000] [Steps   57] [reward 58.0]
[Episode 1868/4000] [Steps   10] [reward 11.0]
[Episode 1869/4000] [Steps   11] [reward 12.0]
[Episode 1870/4000] [Steps    8] [reward 9.0]
[Episode 1871/4000] [Steps   37] [reward 38.0]
[Episode 1872/4000] [Steps  148] [reward 149.0]
[Episode 1873/4000] [Steps   80] [reward 81.0]
[Episode 1874/4000] [Steps  144] [reward 145.0]
[Episode 1875/4000] [Steps   26] [reward 27.0]
----------
[TEST Episode 1875] [Average Reward 16.8]
----------
[Episode 1876/4000] [Steps   52] [reward 53.0]
[Episode 1877/4000] [Steps   34] [reward 35.0]
[Episode 1878/4000] [Steps   43] [reward 44.0]
[Episode 1879/4000] [Steps   66] [reward 67.0]
[Episode 1880/4000] [Steps   81] [reward 82.0]
[Episode 1881/4000] [Steps   50] [reward 51.0]
[Episode 1882/4000] [Steps   31] [reward 32.0]
[Episode 1883/4000] [Steps   11] [reward 12.0]
[Episode 1884/4000] [Steps   22] [reward 23.0]
[Episode 1885/4000] [Steps   28] [reward 29.0]
[Episode 1886/4000] [Steps   22] [reward 23.0]
[Episode 1887/4000] [Steps    8] [reward 9.0]
[Episode 1888/4000] [Steps    9] [reward 10.0]
[Episode 1889/4000] [Steps    8] [reward 9.0]
[Episode 1890/4000] [Steps    9] [reward 10.0]
[Episode 1891/4000] [Steps    8] [reward 9.0]
[Episode 1892/4000] [Steps   10] [reward 11.0]
[Episode 1893/4000] [Steps   11] [reward 12.0]
[Episode 1894/4000] [Steps   10] [reward 11.0]
[Episode 1895/4000] [Steps    8] [reward 9.0]
[Episode 1896/4000] [Steps    8] [reward 9.0]
[Episode 1897/4000] [Steps    9] [reward 10.0]
[Episode 1898/4000] [Steps   12] [reward 13.0]
[Episode 1899/4000] [Steps    9] [reward 10.0]
[Episode 1900/4000] [Steps   12] [reward 13.0]
----------
[TEST Episode 1900] [Average Reward 36.2]
----------
[Episode 1901/4000] [Steps   14] [reward 15.0]
[Episode 1902/4000] [Steps    8] [reward 9.0]
[Episode 1903/4000] [Steps   20] [reward 21.0]
[Episode 1904/4000] [Steps    8] [reward 9.0]
[Episode 1905/4000] [Steps   11] [reward 12.0]
[Episode 1906/4000] [Steps    9] [reward 10.0]
[Episode 1907/4000] [Steps   18] [reward 19.0]
[Episode 1908/4000] [Steps   13] [reward 14.0]
[Episode 1909/4000] [Steps   21] [reward 22.0]
[Episode 1910/4000] [Steps   10] [reward 11.0]
[Episode 1911/4000] [Steps   13] [reward 14.0]
[Episode 1912/4000] [Steps    8] [reward 9.0]
[Episode 1913/4000] [Steps   16] [reward 17.0]
[Episode 1914/4000] [Steps   12] [reward 13.0]
[Episode 1915/4000] [Steps   12] [reward 13.0]
[Episode 1916/4000] [Steps   16] [reward 17.0]
[Episode 1917/4000] [Steps   11] [reward 12.0]
[Episode 1918/4000] [Steps    9] [reward 10.0]
[Episode 1919/4000] [Steps   40] [reward 41.0]
[Episode 1920/4000] [Steps   11] [reward 12.0]
[Episode 1921/4000] [Steps   19] [reward 20.0]
[Episode 1922/4000] [Steps   14] [reward 15.0]
[Episode 1923/4000] [Steps    8] [reward 9.0]
[Episode 1924/4000] [Steps   19] [reward 20.0]
[Episode 1925/4000] [Steps   12] [reward 13.0]
----------
[TEST Episode 1925] [Average Reward 9.2]
----------
[Episode 1926/4000] [Steps    8] [reward 9.0]
[Episode 1927/4000] [Steps    9] [reward 10.0]
[Episode 1928/4000] [Steps   29] [reward 30.0]
[Episode 1929/4000] [Steps   12] [reward 13.0]
[Episode 1930/4000] [Steps   12] [reward 13.0]
[Episode 1931/4000] [Steps   16] [reward 17.0]
[Episode 1932/4000] [Steps   38] [reward 39.0]
[Episode 1933/4000] [Steps   10] [reward 11.0]
[Episode 1934/4000] [Steps   25] [reward 26.0]
[Episode 1935/4000] [Steps   18] [reward 19.0]
[Episode 1936/4000] [Steps   16] [reward 17.0]
[Episode 1937/4000] [Steps   15] [reward 16.0]
[Episode 1938/4000] [Steps   18] [reward 19.0]
[Episode 1939/4000] [Steps   36] [reward 37.0]
[Episode 1940/4000] [Steps   20] [reward 21.0]
[Episode 1941/4000] [Steps    9] [reward 10.0]
[Episode 1942/4000] [Steps   10] [reward 11.0]
[Episode 1943/4000] [Steps   29] [reward 30.0]
[Episode 1944/4000] [Steps   24] [reward 25.0]
[Episode 1945/4000] [Steps   31] [reward 32.0]
[Episode 1946/4000] [Steps   15] [reward 16.0]
[Episode 1947/4000] [Steps   19] [reward 20.0]
[Episode 1948/4000] [Steps   17] [reward 18.0]
[Episode 1949/4000] [Steps    9] [reward 10.0]
[Episode 1950/4000] [Steps   22] [reward 23.0]
----------
[TEST Episode 1950] [Average Reward 48.0]
----------
[Episode 1951/4000] [Steps   16] [reward 17.0]
[Episode 1952/4000] [Steps   26] [reward 27.0]
[Episode 1953/4000] [Steps   29] [reward 30.0]
[Episode 1954/4000] [Steps    9] [reward 10.0]
[Episode 1955/4000] [Steps   14] [reward 15.0]
[Episode 1956/4000] [Steps   17] [reward 18.0]
[Episode 1957/4000] [Steps   31] [reward 32.0]
[Episode 1958/4000] [Steps   14] [reward 15.0]
[Episode 1959/4000] [Steps   23] [reward 24.0]
[Episode 1960/4000] [Steps   13] [reward 14.0]
[Episode 1961/4000] [Steps   10] [reward 11.0]
[Episode 1962/4000] [Steps   10] [reward 11.0]
[Episode 1963/4000] [Steps   34] [reward 35.0]
[Episode 1964/4000] [Steps   31] [reward 32.0]
[Episode 1965/4000] [Steps   20] [reward 21.0]
[Episode 1966/4000] [Steps   24] [reward 25.0]
[Episode 1967/4000] [Steps   16] [reward 17.0]
[Episode 1968/4000] [Steps   11] [reward 12.0]
[Episode 1969/4000] [Steps   12] [reward 13.0]
[Episode 1970/4000] [Steps   25] [reward 26.0]
[Episode 1971/4000] [Steps   24] [reward 25.0]
[Episode 1972/4000] [Steps   25] [reward 26.0]
[Episode 1973/4000] [Steps   18] [reward 19.0]
[Episode 1974/4000] [Steps   19] [reward 20.0]
[Episode 1975/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 1975] [Average Reward 36.9]
----------
[Episode 1976/4000] [Steps   25] [reward 26.0]
[Episode 1977/4000] [Steps   10] [reward 11.0]
[Episode 1978/4000] [Steps   21] [reward 22.0]
[Episode 1979/4000] [Steps    9] [reward 10.0]
[Episode 1980/4000] [Steps    8] [reward 9.0]
[Episode 1981/4000] [Steps   27] [reward 28.0]
[Episode 1982/4000] [Steps   14] [reward 15.0]
[Episode 1983/4000] [Steps   11] [reward 12.0]
[Episode 1984/4000] [Steps   10] [reward 11.0]
[Episode 1985/4000] [Steps   12] [reward 13.0]
[Episode 1986/4000] [Steps   31] [reward 32.0]
[Episode 1987/4000] [Steps   25] [reward 26.0]
[Episode 1988/4000] [Steps   35] [reward 36.0]
[Episode 1989/4000] [Steps   24] [reward 25.0]
[Episode 1990/4000] [Steps   12] [reward 13.0]
[Episode 1991/4000] [Steps   21] [reward 22.0]
[Episode 1992/4000] [Steps   28] [reward 29.0]
[Episode 1993/4000] [Steps    9] [reward 10.0]
[Episode 1994/4000] [Steps   28] [reward 29.0]
[Episode 1995/4000] [Steps   23] [reward 24.0]
[Episode 1996/4000] [Steps   32] [reward 33.0]
[Episode 1997/4000] [Steps   26] [reward 27.0]
[Episode 1998/4000] [Steps   21] [reward 22.0]
[Episode 1999/4000] [Steps   42] [reward 43.0]
[Episode 2000/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 2000] [Average Reward 34.9]
----------
[Episode 2001/4000] [Steps   19] [reward 20.0]
[Episode 2002/4000] [Steps   26] [reward 27.0]
[Episode 2003/4000] [Steps   27] [reward 28.0]
[Episode 2004/4000] [Steps   14] [reward 15.0]
[Episode 2005/4000] [Steps   35] [reward 36.0]
[Episode 2006/4000] [Steps   34] [reward 35.0]
[Episode 2007/4000] [Steps   11] [reward 12.0]
[Episode 2008/4000] [Steps    7] [reward 8.0]
[Episode 2009/4000] [Steps   29] [reward 30.0]
[Episode 2010/4000] [Steps   11] [reward 12.0]
[Episode 2011/4000] [Steps   27] [reward 28.0]
[Episode 2012/4000] [Steps   16] [reward 17.0]
[Episode 2013/4000] [Steps   35] [reward 36.0]
[Episode 2014/4000] [Steps   13] [reward 14.0]
[Episode 2015/4000] [Steps   43] [reward 44.0]
[Episode 2016/4000] [Steps   23] [reward 24.0]
[Episode 2017/4000] [Steps   20] [reward 21.0]
[Episode 2018/4000] [Steps   32] [reward 33.0]
[Episode 2019/4000] [Steps   22] [reward 23.0]
[Episode 2020/4000] [Steps   13] [reward 14.0]
[Episode 2021/4000] [Steps   15] [reward 16.0]
[Episode 2022/4000] [Steps   15] [reward 16.0]
[Episode 2023/4000] [Steps   28] [reward 29.0]
[Episode 2024/4000] [Steps   31] [reward 32.0]
[Episode 2025/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 2025] [Average Reward 29.5]
----------
[Episode 2026/4000] [Steps   13] [reward 14.0]
[Episode 2027/4000] [Steps   31] [reward 32.0]
[Episode 2028/4000] [Steps   41] [reward 42.0]
[Episode 2029/4000] [Steps   23] [reward 24.0]
[Episode 2030/4000] [Steps   10] [reward 11.0]
[Episode 2031/4000] [Steps   20] [reward 21.0]
[Episode 2032/4000] [Steps   21] [reward 22.0]
[Episode 2033/4000] [Steps   33] [reward 34.0]
[Episode 2034/4000] [Steps   14] [reward 15.0]
[Episode 2035/4000] [Steps   35] [reward 36.0]
[Episode 2036/4000] [Steps   12] [reward 13.0]
[Episode 2037/4000] [Steps   40] [reward 41.0]
[Episode 2038/4000] [Steps    9] [reward 10.0]
[Episode 2039/4000] [Steps   25] [reward 26.0]
[Episode 2040/4000] [Steps   38] [reward 39.0]
[Episode 2041/4000] [Steps   19] [reward 20.0]
[Episode 2042/4000] [Steps   25] [reward 26.0]
[Episode 2043/4000] [Steps   47] [reward 48.0]
[Episode 2044/4000] [Steps   24] [reward 25.0]
[Episode 2045/4000] [Steps   14] [reward 15.0]
[Episode 2046/4000] [Steps   20] [reward 21.0]
[Episode 2047/4000] [Steps   28] [reward 29.0]
[Episode 2048/4000] [Steps    8] [reward 9.0]
[Episode 2049/4000] [Steps   14] [reward 15.0]
[Episode 2050/4000] [Steps   73] [reward 74.0]
----------
[TEST Episode 2050] [Average Reward 37.7]
----------
[Episode 2051/4000] [Steps   15] [reward 16.0]
[Episode 2052/4000] [Steps   39] [reward 40.0]
[Episode 2053/4000] [Steps   11] [reward 12.0]
[Episode 2054/4000] [Steps   29] [reward 30.0]
[Episode 2055/4000] [Steps   28] [reward 29.0]
[Episode 2056/4000] [Steps   25] [reward 26.0]
[Episode 2057/4000] [Steps   17] [reward 18.0]
[Episode 2058/4000] [Steps   21] [reward 22.0]
[Episode 2059/4000] [Steps   13] [reward 14.0]
[Episode 2060/4000] [Steps   22] [reward 23.0]
[Episode 2061/4000] [Steps   21] [reward 22.0]
[Episode 2062/4000] [Steps   31] [reward 32.0]
[Episode 2063/4000] [Steps   43] [reward 44.0]
[Episode 2064/4000] [Steps   14] [reward 15.0]
[Episode 2065/4000] [Steps   35] [reward 36.0]
[Episode 2066/4000] [Steps   23] [reward 24.0]
[Episode 2067/4000] [Steps   27] [reward 28.0]
[Episode 2068/4000] [Steps   18] [reward 19.0]
[Episode 2069/4000] [Steps   10] [reward 11.0]
[Episode 2070/4000] [Steps   46] [reward 47.0]
[Episode 2071/4000] [Steps   12] [reward 13.0]
[Episode 2072/4000] [Steps   27] [reward 28.0]
[Episode 2073/4000] [Steps   30] [reward 31.0]
[Episode 2074/4000] [Steps   26] [reward 27.0]
[Episode 2075/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 2075] [Average Reward 33.1]
----------
[Episode 2076/4000] [Steps   34] [reward 35.0]
[Episode 2077/4000] [Steps   35] [reward 36.0]
[Episode 2078/4000] [Steps   25] [reward 26.0]
[Episode 2079/4000] [Steps   26] [reward 27.0]
[Episode 2080/4000] [Steps   52] [reward 53.0]
[Episode 2081/4000] [Steps   16] [reward 17.0]
[Episode 2082/4000] [Steps   21] [reward 22.0]
[Episode 2083/4000] [Steps   19] [reward 20.0]
[Episode 2084/4000] [Steps   37] [reward 38.0]
[Episode 2085/4000] [Steps   42] [reward 43.0]
[Episode 2086/4000] [Steps   40] [reward 41.0]
[Episode 2087/4000] [Steps   13] [reward 14.0]
[Episode 2088/4000] [Steps   34] [reward 35.0]
[Episode 2089/4000] [Steps   10] [reward 11.0]
[Episode 2090/4000] [Steps   21] [reward 22.0]
[Episode 2091/4000] [Steps   30] [reward 31.0]
[Episode 2092/4000] [Steps   15] [reward 16.0]
[Episode 2093/4000] [Steps    8] [reward 9.0]
[Episode 2094/4000] [Steps   35] [reward 36.0]
[Episode 2095/4000] [Steps   10] [reward 11.0]
[Episode 2096/4000] [Steps   18] [reward 19.0]
[Episode 2097/4000] [Steps   23] [reward 24.0]
[Episode 2098/4000] [Steps   45] [reward 46.0]
[Episode 2099/4000] [Steps   14] [reward 15.0]
[Episode 2100/4000] [Steps   35] [reward 36.0]
----------
[TEST Episode 2100] [Average Reward 38.2]
----------
[Episode 2101/4000] [Steps   23] [reward 24.0]
[Episode 2102/4000] [Steps    8] [reward 9.0]
[Episode 2103/4000] [Steps   25] [reward 26.0]
[Episode 2104/4000] [Steps   27] [reward 28.0]
[Episode 2105/4000] [Steps   10] [reward 11.0]
[Episode 2106/4000] [Steps   25] [reward 26.0]
[Episode 2107/4000] [Steps   16] [reward 17.0]
[Episode 2108/4000] [Steps   16] [reward 17.0]
[Episode 2109/4000] [Steps   31] [reward 32.0]
[Episode 2110/4000] [Steps   24] [reward 25.0]
[Episode 2111/4000] [Steps   30] [reward 31.0]
[Episode 2112/4000] [Steps   38] [reward 39.0]
[Episode 2113/4000] [Steps   26] [reward 27.0]
[Episode 2114/4000] [Steps   17] [reward 18.0]
[Episode 2115/4000] [Steps   23] [reward 24.0]
[Episode 2116/4000] [Steps   14] [reward 15.0]
[Episode 2117/4000] [Steps   27] [reward 28.0]
[Episode 2118/4000] [Steps   45] [reward 46.0]
[Episode 2119/4000] [Steps   32] [reward 33.0]
[Episode 2120/4000] [Steps   18] [reward 19.0]
[Episode 2121/4000] [Steps   15] [reward 16.0]
[Episode 2122/4000] [Steps   49] [reward 50.0]
[Episode 2123/4000] [Steps   26] [reward 27.0]
[Episode 2124/4000] [Steps   15] [reward 16.0]
[Episode 2125/4000] [Steps   30] [reward 31.0]
----------
[TEST Episode 2125] [Average Reward 30.7]
----------
[Episode 2126/4000] [Steps   31] [reward 32.0]
[Episode 2127/4000] [Steps   27] [reward 28.0]
[Episode 2128/4000] [Steps    8] [reward 9.0]
[Episode 2129/4000] [Steps   13] [reward 14.0]
[Episode 2130/4000] [Steps   26] [reward 27.0]
[Episode 2131/4000] [Steps   24] [reward 25.0]
[Episode 2132/4000] [Steps   36] [reward 37.0]
[Episode 2133/4000] [Steps   24] [reward 25.0]
[Episode 2134/4000] [Steps   11] [reward 12.0]
[Episode 2135/4000] [Steps   32] [reward 33.0]
[Episode 2136/4000] [Steps   29] [reward 30.0]
[Episode 2137/4000] [Steps   11] [reward 12.0]
[Episode 2138/4000] [Steps   13] [reward 14.0]
[Episode 2139/4000] [Steps   16] [reward 17.0]
[Episode 2140/4000] [Steps   16] [reward 17.0]
[Episode 2141/4000] [Steps   18] [reward 19.0]
[Episode 2142/4000] [Steps   20] [reward 21.0]
[Episode 2143/4000] [Steps   33] [reward 34.0]
[Episode 2144/4000] [Steps   29] [reward 30.0]
[Episode 2145/4000] [Steps   30] [reward 31.0]
[Episode 2146/4000] [Steps   43] [reward 44.0]
[Episode 2147/4000] [Steps   25] [reward 26.0]
[Episode 2148/4000] [Steps   14] [reward 15.0]
[Episode 2149/4000] [Steps   15] [reward 16.0]
[Episode 2150/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 2150] [Average Reward 36.0]
----------
[Episode 2151/4000] [Steps   28] [reward 29.0]
[Episode 2152/4000] [Steps   48] [reward 49.0]
[Episode 2153/4000] [Steps   27] [reward 28.0]
[Episode 2154/4000] [Steps    8] [reward 9.0]
[Episode 2155/4000] [Steps   35] [reward 36.0]
[Episode 2156/4000] [Steps   20] [reward 21.0]
[Episode 2157/4000] [Steps   15] [reward 16.0]
[Episode 2158/4000] [Steps   13] [reward 14.0]
[Episode 2159/4000] [Steps   33] [reward 34.0]
[Episode 2160/4000] [Steps    9] [reward 10.0]
[Episode 2161/4000] [Steps   23] [reward 24.0]
[Episode 2162/4000] [Steps   48] [reward 49.0]
[Episode 2163/4000] [Steps   13] [reward 14.0]
[Episode 2164/4000] [Steps   32] [reward 33.0]
[Episode 2165/4000] [Steps    7] [reward 8.0]
[Episode 2166/4000] [Steps   50] [reward 51.0]
[Episode 2167/4000] [Steps   73] [reward 74.0]
[Episode 2168/4000] [Steps   10] [reward 11.0]
[Episode 2169/4000] [Steps   24] [reward 25.0]
[Episode 2170/4000] [Steps   35] [reward 36.0]
[Episode 2171/4000] [Steps   23] [reward 24.0]
[Episode 2172/4000] [Steps   30] [reward 31.0]
[Episode 2173/4000] [Steps   43] [reward 44.0]
[Episode 2174/4000] [Steps   33] [reward 34.0]
[Episode 2175/4000] [Steps   31] [reward 32.0]
----------
[TEST Episode 2175] [Average Reward 35.3]
----------
[Episode 2176/4000] [Steps   39] [reward 40.0]
[Episode 2177/4000] [Steps   45] [reward 46.0]
[Episode 2178/4000] [Steps   36] [reward 37.0]
[Episode 2179/4000] [Steps   30] [reward 31.0]
[Episode 2180/4000] [Steps   26] [reward 27.0]
[Episode 2181/4000] [Steps    8] [reward 9.0]
[Episode 2182/4000] [Steps   30] [reward 31.0]
[Episode 2183/4000] [Steps   31] [reward 32.0]
[Episode 2184/4000] [Steps   26] [reward 27.0]
[Episode 2185/4000] [Steps   23] [reward 24.0]
[Episode 2186/4000] [Steps   31] [reward 32.0]
[Episode 2187/4000] [Steps   25] [reward 26.0]
[Episode 2188/4000] [Steps   41] [reward 42.0]
[Episode 2189/4000] [Steps   24] [reward 25.0]
[Episode 2190/4000] [Steps   17] [reward 18.0]
[Episode 2191/4000] [Steps   50] [reward 51.0]
[Episode 2192/4000] [Steps   47] [reward 48.0]
[Episode 2193/4000] [Steps   45] [reward 46.0]
[Episode 2194/4000] [Steps   30] [reward 31.0]
[Episode 2195/4000] [Steps   43] [reward 44.0]
[Episode 2196/4000] [Steps   33] [reward 34.0]
[Episode 2197/4000] [Steps   33] [reward 34.0]
[Episode 2198/4000] [Steps   51] [reward 52.0]
[Episode 2199/4000] [Steps   56] [reward 57.0]
[Episode 2200/4000] [Steps   39] [reward 40.0]
----------
[TEST Episode 2200] [Average Reward 38.4]
----------
[Episode 2201/4000] [Steps   38] [reward 39.0]
[Episode 2202/4000] [Steps   29] [reward 30.0]
[Episode 2203/4000] [Steps    8] [reward 9.0]
[Episode 2204/4000] [Steps   27] [reward 28.0]
[Episode 2205/4000] [Steps   25] [reward 26.0]
[Episode 2206/4000] [Steps   37] [reward 38.0]
[Episode 2207/4000] [Steps    9] [reward 10.0]
[Episode 2208/4000] [Steps   39] [reward 40.0]
[Episode 2209/4000] [Steps   41] [reward 42.0]
[Episode 2210/4000] [Steps   14] [reward 15.0]
[Episode 2211/4000] [Steps   57] [reward 58.0]
[Episode 2212/4000] [Steps   23] [reward 24.0]
[Episode 2213/4000] [Steps   32] [reward 33.0]
[Episode 2214/4000] [Steps   45] [reward 46.0]
[Episode 2215/4000] [Steps   29] [reward 30.0]
[Episode 2216/4000] [Steps   33] [reward 34.0]
[Episode 2217/4000] [Steps   31] [reward 32.0]
[Episode 2218/4000] [Steps   13] [reward 14.0]
[Episode 2219/4000] [Steps   31] [reward 32.0]
[Episode 2220/4000] [Steps   34] [reward 35.0]
[Episode 2221/4000] [Steps   28] [reward 29.0]
[Episode 2222/4000] [Steps   28] [reward 29.0]
[Episode 2223/4000] [Steps   53] [reward 54.0]
[Episode 2224/4000] [Steps   16] [reward 17.0]
[Episode 2225/4000] [Steps   33] [reward 34.0]
----------
[TEST Episode 2225] [Average Reward 42.9]
----------
[Episode 2226/4000] [Steps   31] [reward 32.0]
[Episode 2227/4000] [Steps   20] [reward 21.0]
[Episode 2228/4000] [Steps   21] [reward 22.0]
[Episode 2229/4000] [Steps   36] [reward 37.0]
[Episode 2230/4000] [Steps   32] [reward 33.0]
[Episode 2231/4000] [Steps   26] [reward 27.0]
[Episode 2232/4000] [Steps   31] [reward 32.0]
[Episode 2233/4000] [Steps   43] [reward 44.0]
[Episode 2234/4000] [Steps   14] [reward 15.0]
[Episode 2235/4000] [Steps   44] [reward 45.0]
[Episode 2236/4000] [Steps   31] [reward 32.0]
[Episode 2237/4000] [Steps   52] [reward 53.0]
[Episode 2238/4000] [Steps   19] [reward 20.0]
[Episode 2239/4000] [Steps   59] [reward 60.0]
[Episode 2240/4000] [Steps   34] [reward 35.0]
[Episode 2241/4000] [Steps   45] [reward 46.0]
[Episode 2242/4000] [Steps   42] [reward 43.0]
[Episode 2243/4000] [Steps   27] [reward 28.0]
[Episode 2244/4000] [Steps   70] [reward 71.0]
[Episode 2245/4000] [Steps   59] [reward 60.0]
[Episode 2246/4000] [Steps   25] [reward 26.0]
[Episode 2247/4000] [Steps   13] [reward 14.0]
[Episode 2248/4000] [Steps   54] [reward 55.0]
[Episode 2249/4000] [Steps   38] [reward 39.0]
[Episode 2250/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 2250] [Average Reward 47.5]
----------
[Episode 2251/4000] [Steps   67] [reward 68.0]
[Episode 2252/4000] [Steps   39] [reward 40.0]
[Episode 2253/4000] [Steps   18] [reward 19.0]
[Episode 2254/4000] [Steps   53] [reward 54.0]
[Episode 2255/4000] [Steps   34] [reward 35.0]
[Episode 2256/4000] [Steps   21] [reward 22.0]
[Episode 2257/4000] [Steps   53] [reward 54.0]
[Episode 2258/4000] [Steps   64] [reward 65.0]
[Episode 2259/4000] [Steps   46] [reward 47.0]
[Episode 2260/4000] [Steps   53] [reward 54.0]
[Episode 2261/4000] [Steps   60] [reward 61.0]
[Episode 2262/4000] [Steps   35] [reward 36.0]
[Episode 2263/4000] [Steps   44] [reward 45.0]
[Episode 2264/4000] [Steps  167] [reward 168.0]
[Episode 2265/4000] [Steps   32] [reward 33.0]
[Episode 2266/4000] [Steps   62] [reward 63.0]
[Episode 2267/4000] [Steps   42] [reward 43.0]
[Episode 2268/4000] [Steps   55] [reward 56.0]
[Episode 2269/4000] [Steps   59] [reward 60.0]
[Episode 2270/4000] [Steps   44] [reward 45.0]
[Episode 2271/4000] [Steps   13] [reward 14.0]
[Episode 2272/4000] [Steps   42] [reward 43.0]
[Episode 2273/4000] [Steps   12] [reward 13.0]
[Episode 2274/4000] [Steps   36] [reward 37.0]
[Episode 2275/4000] [Steps   74] [reward 75.0]
----------
[TEST Episode 2275] [Average Reward 61.3]
----------
[Episode 2276/4000] [Steps   95] [reward 96.0]
[Episode 2277/4000] [Steps   76] [reward 77.0]
[Episode 2278/4000] [Steps   88] [reward 89.0]
[Episode 2279/4000] [Steps   34] [reward 35.0]
[Episode 2280/4000] [Steps   52] [reward 53.0]
[Episode 2281/4000] [Steps   75] [reward 76.0]
[Episode 2282/4000] [Steps  107] [reward 108.0]
[Episode 2283/4000] [Steps   87] [reward 88.0]
[Episode 2284/4000] [Steps   16] [reward 17.0]
[Episode 2285/4000] [Steps   10] [reward 11.0]
[Episode 2286/4000] [Steps    7] [reward 8.0]
[Episode 2287/4000] [Steps   10] [reward 11.0]
[Episode 2288/4000] [Steps    9] [reward 10.0]
[Episode 2289/4000] [Steps    8] [reward 9.0]
[Episode 2290/4000] [Steps    9] [reward 10.0]
[Episode 2291/4000] [Steps   10] [reward 11.0]
[Episode 2292/4000] [Steps   54] [reward 55.0]
[Episode 2293/4000] [Steps   66] [reward 67.0]
[Episode 2294/4000] [Steps   62] [reward 63.0]
[Episode 2295/4000] [Steps   45] [reward 46.0]
[Episode 2296/4000] [Steps   52] [reward 53.0]
[Episode 2297/4000] [Steps   59] [reward 60.0]
[Episode 2298/4000] [Steps   71] [reward 72.0]
[Episode 2299/4000] [Steps   70] [reward 71.0]
[Episode 2300/4000] [Steps   37] [reward 38.0]
----------
[TEST Episode 2300] [Average Reward 56.9]
----------
[Episode 2301/4000] [Steps   51] [reward 52.0]
[Episode 2302/4000] [Steps   49] [reward 50.0]
[Episode 2303/4000] [Steps   48] [reward 49.0]
[Episode 2304/4000] [Steps   42] [reward 43.0]
[Episode 2305/4000] [Steps   54] [reward 55.0]
[Episode 2306/4000] [Steps   55] [reward 56.0]
[Episode 2307/4000] [Steps   64] [reward 65.0]
[Episode 2308/4000] [Steps   78] [reward 79.0]
[Episode 2309/4000] [Steps  143] [reward 144.0]
[Episode 2310/4000] [Steps  134] [reward 135.0]
[Episode 2311/4000] [Steps   52] [reward 53.0]
[Episode 2312/4000] [Steps   52] [reward 53.0]
[Episode 2313/4000] [Steps  145] [reward 146.0]
[Episode 2314/4000] [Steps   86] [reward 87.0]
[Episode 2315/4000] [Steps   77] [reward 78.0]
[Episode 2316/4000] [Steps   58] [reward 59.0]
[Episode 2317/4000] [Steps  100] [reward 101.0]
[Episode 2318/4000] [Steps  117] [reward 118.0]
[Episode 2319/4000] [Steps   58] [reward 59.0]
[Episode 2320/4000] [Steps   77] [reward 78.0]
[Episode 2321/4000] [Steps   73] [reward 74.0]
[Episode 2322/4000] [Steps   27] [reward 28.0]
[Episode 2323/4000] [Steps   10] [reward 11.0]
[Episode 2324/4000] [Steps   22] [reward 23.0]
[Episode 2325/4000] [Steps   67] [reward 68.0]
----------
[TEST Episode 2325] [Average Reward 71.6]
----------
[Episode 2326/4000] [Steps   54] [reward 55.0]
[Episode 2327/4000] [Steps   72] [reward 73.0]
[Episode 2328/4000] [Steps   64] [reward 65.0]
[Episode 2329/4000] [Steps   58] [reward 59.0]
[Episode 2330/4000] [Steps   27] [reward 28.0]
[Episode 2331/4000] [Steps   20] [reward 21.0]
[Episode 2332/4000] [Steps   39] [reward 40.0]
[Episode 2333/4000] [Steps   46] [reward 47.0]
[Episode 2334/4000] [Steps   54] [reward 55.0]
[Episode 2335/4000] [Steps   67] [reward 68.0]
[Episode 2336/4000] [Steps   82] [reward 83.0]
[Episode 2337/4000] [Steps   63] [reward 64.0]
[Episode 2338/4000] [Steps   59] [reward 60.0]
[Episode 2339/4000] [Steps   83] [reward 84.0]
[Episode 2340/4000] [Steps   60] [reward 61.0]
[Episode 2341/4000] [Steps   51] [reward 52.0]
[Episode 2342/4000] [Steps   28] [reward 29.0]
[Episode 2343/4000] [Steps   48] [reward 49.0]
[Episode 2344/4000] [Steps   42] [reward 43.0]
[Episode 2345/4000] [Steps   98] [reward 99.0]
[Episode 2346/4000] [Steps   96] [reward 97.0]
[Episode 2347/4000] [Steps   51] [reward 52.0]
[Episode 2348/4000] [Steps   42] [reward 43.0]
[Episode 2349/4000] [Steps   87] [reward 88.0]
[Episode 2350/4000] [Steps  114] [reward 115.0]
----------
[TEST Episode 2350] [Average Reward 67.0]
----------
[Episode 2351/4000] [Steps   56] [reward 57.0]
[Episode 2352/4000] [Steps   44] [reward 45.0]
[Episode 2353/4000] [Steps   89] [reward 90.0]
[Episode 2354/4000] [Steps  150] [reward 151.0]
[Episode 2355/4000] [Steps   62] [reward 63.0]
[Episode 2356/4000] [Steps   57] [reward 58.0]
[Episode 2357/4000] [Steps   10] [reward 11.0]
[Episode 2358/4000] [Steps    8] [reward 9.0]
[Episode 2359/4000] [Steps   11] [reward 12.0]
[Episode 2360/4000] [Steps    7] [reward 8.0]
[Episode 2361/4000] [Steps    9] [reward 10.0]
[Episode 2362/4000] [Steps    9] [reward 10.0]
[Episode 2363/4000] [Steps   41] [reward 42.0]
[Episode 2364/4000] [Steps   65] [reward 66.0]
[Episode 2365/4000] [Steps   42] [reward 43.0]
[Episode 2366/4000] [Steps   26] [reward 27.0]
[Episode 2367/4000] [Steps   27] [reward 28.0]
[Episode 2368/4000] [Steps   44] [reward 45.0]
[Episode 2369/4000] [Steps   46] [reward 47.0]
[Episode 2370/4000] [Steps   53] [reward 54.0]
[Episode 2371/4000] [Steps   37] [reward 38.0]
[Episode 2372/4000] [Steps   48] [reward 49.0]
[Episode 2373/4000] [Steps   24] [reward 25.0]
[Episode 2374/4000] [Steps   32] [reward 33.0]
[Episode 2375/4000] [Steps   44] [reward 45.0]
----------
[TEST Episode 2375] [Average Reward 43.0]
----------
[Episode 2376/4000] [Steps   32] [reward 33.0]
[Episode 2377/4000] [Steps   46] [reward 47.0]
[Episode 2378/4000] [Steps   27] [reward 28.0]
[Episode 2379/4000] [Steps   37] [reward 38.0]
[Episode 2380/4000] [Steps   24] [reward 25.0]
[Episode 2381/4000] [Steps   27] [reward 28.0]
[Episode 2382/4000] [Steps   23] [reward 24.0]
[Episode 2383/4000] [Steps   48] [reward 49.0]
[Episode 2384/4000] [Steps   29] [reward 30.0]
[Episode 2385/4000] [Steps   30] [reward 31.0]
[Episode 2386/4000] [Steps   26] [reward 27.0]
[Episode 2387/4000] [Steps   26] [reward 27.0]
[Episode 2388/4000] [Steps   37] [reward 38.0]
[Episode 2389/4000] [Steps   36] [reward 37.0]
[Episode 2390/4000] [Steps   33] [reward 34.0]
[Episode 2391/4000] [Steps  101] [reward 102.0]
[Episode 2392/4000] [Steps   14] [reward 15.0]
[Episode 2393/4000] [Steps   16] [reward 17.0]
[Episode 2394/4000] [Steps   29] [reward 30.0]
[Episode 2395/4000] [Steps   71] [reward 72.0]
[Episode 2396/4000] [Steps   14] [reward 15.0]
[Episode 2397/4000] [Steps    9] [reward 10.0]
[Episode 2398/4000] [Steps    7] [reward 8.0]
[Episode 2399/4000] [Steps   10] [reward 11.0]
[Episode 2400/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 2400] [Average Reward 9.6]
----------
[Episode 2401/4000] [Steps    9] [reward 10.0]
[Episode 2402/4000] [Steps    8] [reward 9.0]
[Episode 2403/4000] [Steps    8] [reward 9.0]
[Episode 2404/4000] [Steps    8] [reward 9.0]
[Episode 2405/4000] [Steps    9] [reward 10.0]
[Episode 2406/4000] [Steps   11] [reward 12.0]
[Episode 2407/4000] [Steps    9] [reward 10.0]
[Episode 2408/4000] [Steps   14] [reward 15.0]
[Episode 2409/4000] [Steps   17] [reward 18.0]
[Episode 2410/4000] [Steps    8] [reward 9.0]
[Episode 2411/4000] [Steps    9] [reward 10.0]
[Episode 2412/4000] [Steps   20] [reward 21.0]
[Episode 2413/4000] [Steps   24] [reward 25.0]
[Episode 2414/4000] [Steps   25] [reward 26.0]
[Episode 2415/4000] [Steps   25] [reward 26.0]
[Episode 2416/4000] [Steps   43] [reward 44.0]
[Episode 2417/4000] [Steps   37] [reward 38.0]
[Episode 2418/4000] [Steps   47] [reward 48.0]
[Episode 2419/4000] [Steps   46] [reward 47.0]
[Episode 2420/4000] [Steps   48] [reward 49.0]
[Episode 2421/4000] [Steps   26] [reward 27.0]
[Episode 2422/4000] [Steps   11] [reward 12.0]
[Episode 2423/4000] [Steps   38] [reward 39.0]
[Episode 2424/4000] [Steps   48] [reward 49.0]
[Episode 2425/4000] [Steps   25] [reward 26.0]
----------
[TEST Episode 2425] [Average Reward 30.1]
----------
[Episode 2426/4000] [Steps   15] [reward 16.0]
[Episode 2427/4000] [Steps   14] [reward 15.0]
[Episode 2428/4000] [Steps    9] [reward 10.0]
[Episode 2429/4000] [Steps   12] [reward 13.0]
[Episode 2430/4000] [Steps   18] [reward 19.0]
[Episode 2431/4000] [Steps    9] [reward 10.0]
[Episode 2432/4000] [Steps    9] [reward 10.0]
[Episode 2433/4000] [Steps   63] [reward 64.0]
[Episode 2434/4000] [Steps   25] [reward 26.0]
[Episode 2435/4000] [Steps   19] [reward 20.0]
[Episode 2436/4000] [Steps   16] [reward 17.0]
[Episode 2437/4000] [Steps   21] [reward 22.0]
[Episode 2438/4000] [Steps   16] [reward 17.0]
[Episode 2439/4000] [Steps    9] [reward 10.0]
[Episode 2440/4000] [Steps    8] [reward 9.0]
[Episode 2441/4000] [Steps   13] [reward 14.0]
[Episode 2442/4000] [Steps   10] [reward 11.0]
[Episode 2443/4000] [Steps   12] [reward 13.0]
[Episode 2444/4000] [Steps   12] [reward 13.0]
[Episode 2445/4000] [Steps   11] [reward 12.0]
[Episode 2446/4000] [Steps   10] [reward 11.0]
[Episode 2447/4000] [Steps   14] [reward 15.0]
[Episode 2448/4000] [Steps   19] [reward 20.0]
[Episode 2449/4000] [Steps   13] [reward 14.0]
[Episode 2450/4000] [Steps   19] [reward 20.0]
----------
[TEST Episode 2450] [Average Reward 16.9]
----------
[Episode 2451/4000] [Steps   16] [reward 17.0]
[Episode 2452/4000] [Steps   18] [reward 19.0]
[Episode 2453/4000] [Steps   16] [reward 17.0]
[Episode 2454/4000] [Steps   15] [reward 16.0]
[Episode 2455/4000] [Steps   18] [reward 19.0]
[Episode 2456/4000] [Steps   15] [reward 16.0]
[Episode 2457/4000] [Steps   18] [reward 19.0]
[Episode 2458/4000] [Steps    8] [reward 9.0]
[Episode 2459/4000] [Steps   16] [reward 17.0]
[Episode 2460/4000] [Steps   12] [reward 13.0]
[Episode 2461/4000] [Steps   17] [reward 18.0]
[Episode 2462/4000] [Steps   17] [reward 18.0]
[Episode 2463/4000] [Steps   16] [reward 17.0]
[Episode 2464/4000] [Steps   21] [reward 22.0]
[Episode 2465/4000] [Steps   20] [reward 21.0]
[Episode 2466/4000] [Steps   17] [reward 18.0]
[Episode 2467/4000] [Steps   17] [reward 18.0]
[Episode 2468/4000] [Steps   17] [reward 18.0]
[Episode 2469/4000] [Steps   21] [reward 22.0]
[Episode 2470/4000] [Steps   12] [reward 13.0]
[Episode 2471/4000] [Steps   18] [reward 19.0]
[Episode 2472/4000] [Steps   16] [reward 17.0]
[Episode 2473/4000] [Steps   15] [reward 16.0]
[Episode 2474/4000] [Steps   15] [reward 16.0]
[Episode 2475/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 2475] [Average Reward 15.6]
----------
[Episode 2476/4000] [Steps   12] [reward 13.0]
[Episode 2477/4000] [Steps   22] [reward 23.0]
[Episode 2478/4000] [Steps   14] [reward 15.0]
[Episode 2479/4000] [Steps   14] [reward 15.0]
[Episode 2480/4000] [Steps   14] [reward 15.0]
[Episode 2481/4000] [Steps   20] [reward 21.0]
[Episode 2482/4000] [Steps   21] [reward 22.0]
[Episode 2483/4000] [Steps   20] [reward 21.0]
[Episode 2484/4000] [Steps   25] [reward 26.0]
[Episode 2485/4000] [Steps   18] [reward 19.0]
[Episode 2486/4000] [Steps   17] [reward 18.0]
[Episode 2487/4000] [Steps   19] [reward 20.0]
[Episode 2488/4000] [Steps   22] [reward 23.0]
[Episode 2489/4000] [Steps   16] [reward 17.0]
[Episode 2490/4000] [Steps   18] [reward 19.0]
[Episode 2491/4000] [Steps   20] [reward 21.0]
[Episode 2492/4000] [Steps   16] [reward 17.0]
[Episode 2493/4000] [Steps   16] [reward 17.0]
[Episode 2494/4000] [Steps   18] [reward 19.0]
[Episode 2495/4000] [Steps   13] [reward 14.0]
[Episode 2496/4000] [Steps    8] [reward 9.0]
[Episode 2497/4000] [Steps   19] [reward 20.0]
[Episode 2498/4000] [Steps   18] [reward 19.0]
[Episode 2499/4000] [Steps   19] [reward 20.0]
[Episode 2500/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 2500] [Average Reward 17.3]
----------
[Episode 2501/4000] [Steps   19] [reward 20.0]
[Episode 2502/4000] [Steps   20] [reward 21.0]
[Episode 2503/4000] [Steps   11] [reward 12.0]
[Episode 2504/4000] [Steps   15] [reward 16.0]
[Episode 2505/4000] [Steps   16] [reward 17.0]
[Episode 2506/4000] [Steps   21] [reward 22.0]
[Episode 2507/4000] [Steps   16] [reward 17.0]
[Episode 2508/4000] [Steps   14] [reward 15.0]
[Episode 2509/4000] [Steps   17] [reward 18.0]
[Episode 2510/4000] [Steps   17] [reward 18.0]
[Episode 2511/4000] [Steps   19] [reward 20.0]
[Episode 2512/4000] [Steps   15] [reward 16.0]
[Episode 2513/4000] [Steps   17] [reward 18.0]
[Episode 2514/4000] [Steps   13] [reward 14.0]
[Episode 2515/4000] [Steps   23] [reward 24.0]
[Episode 2516/4000] [Steps   18] [reward 19.0]
[Episode 2517/4000] [Steps   20] [reward 21.0]
[Episode 2518/4000] [Steps   14] [reward 15.0]
[Episode 2519/4000] [Steps   17] [reward 18.0]
[Episode 2520/4000] [Steps   18] [reward 19.0]
[Episode 2521/4000] [Steps   15] [reward 16.0]
[Episode 2522/4000] [Steps   16] [reward 17.0]
[Episode 2523/4000] [Steps   19] [reward 20.0]
[Episode 2524/4000] [Steps   20] [reward 21.0]
[Episode 2525/4000] [Steps   19] [reward 20.0]
----------
[TEST Episode 2525] [Average Reward 16.6]
----------
[Episode 2526/4000] [Steps   15] [reward 16.0]
[Episode 2527/4000] [Steps   17] [reward 18.0]
[Episode 2528/4000] [Steps   15] [reward 16.0]
[Episode 2529/4000] [Steps   16] [reward 17.0]
[Episode 2530/4000] [Steps   15] [reward 16.0]
[Episode 2531/4000] [Steps   16] [reward 17.0]
[Episode 2532/4000] [Steps   21] [reward 22.0]
[Episode 2533/4000] [Steps   17] [reward 18.0]
[Episode 2534/4000] [Steps   13] [reward 14.0]
[Episode 2535/4000] [Steps   14] [reward 15.0]
[Episode 2536/4000] [Steps   15] [reward 16.0]
[Episode 2537/4000] [Steps   24] [reward 25.0]
[Episode 2538/4000] [Steps   21] [reward 22.0]
[Episode 2539/4000] [Steps   18] [reward 19.0]
[Episode 2540/4000] [Steps   16] [reward 17.0]
[Episode 2541/4000] [Steps   18] [reward 19.0]
[Episode 2542/4000] [Steps   14] [reward 15.0]
[Episode 2543/4000] [Steps   12] [reward 13.0]
[Episode 2544/4000] [Steps   16] [reward 17.0]
[Episode 2545/4000] [Steps   14] [reward 15.0]
[Episode 2546/4000] [Steps   17] [reward 18.0]
[Episode 2547/4000] [Steps   17] [reward 18.0]
[Episode 2548/4000] [Steps   19] [reward 20.0]
[Episode 2549/4000] [Steps   18] [reward 19.0]
[Episode 2550/4000] [Steps   17] [reward 18.0]
----------
[TEST Episode 2550] [Average Reward 17.1]
----------
[Episode 2551/4000] [Steps   14] [reward 15.0]
[Episode 2552/4000] [Steps   17] [reward 18.0]
[Episode 2553/4000] [Steps   17] [reward 18.0]
[Episode 2554/4000] [Steps   24] [reward 25.0]
[Episode 2555/4000] [Steps   18] [reward 19.0]
[Episode 2556/4000] [Steps   17] [reward 18.0]
[Episode 2557/4000] [Steps   15] [reward 16.0]
[Episode 2558/4000] [Steps   20] [reward 21.0]
[Episode 2559/4000] [Steps   19] [reward 20.0]
[Episode 2560/4000] [Steps   15] [reward 16.0]
[Episode 2561/4000] [Steps   15] [reward 16.0]
[Episode 2562/4000] [Steps   19] [reward 20.0]
[Episode 2563/4000] [Steps   17] [reward 18.0]
[Episode 2564/4000] [Steps   18] [reward 19.0]
[Episode 2565/4000] [Steps   13] [reward 14.0]
[Episode 2566/4000] [Steps   18] [reward 19.0]
[Episode 2567/4000] [Steps   15] [reward 16.0]
[Episode 2568/4000] [Steps   20] [reward 21.0]
[Episode 2569/4000] [Steps   15] [reward 16.0]
[Episode 2570/4000] [Steps   16] [reward 17.0]
[Episode 2571/4000] [Steps   18] [reward 19.0]
[Episode 2572/4000] [Steps   17] [reward 18.0]
[Episode 2573/4000] [Steps   16] [reward 17.0]
[Episode 2574/4000] [Steps   17] [reward 18.0]
[Episode 2575/4000] [Steps   18] [reward 19.0]
----------
[TEST Episode 2575] [Average Reward 18.8]
----------
[Episode 2576/4000] [Steps   18] [reward 19.0]
[Episode 2577/4000] [Steps   21] [reward 22.0]
[Episode 2578/4000] [Steps   19] [reward 20.0]
[Episode 2579/4000] [Steps   11] [reward 12.0]
[Episode 2580/4000] [Steps   15] [reward 16.0]
[Episode 2581/4000] [Steps   15] [reward 16.0]
[Episode 2582/4000] [Steps   16] [reward 17.0]
[Episode 2583/4000] [Steps   16] [reward 17.0]
[Episode 2584/4000] [Steps   19] [reward 20.0]
[Episode 2585/4000] [Steps   16] [reward 17.0]
[Episode 2586/4000] [Steps   19] [reward 20.0]
[Episode 2587/4000] [Steps   17] [reward 18.0]
[Episode 2588/4000] [Steps   13] [reward 14.0]
[Episode 2589/4000] [Steps   17] [reward 18.0]
[Episode 2590/4000] [Steps   20] [reward 21.0]
[Episode 2591/4000] [Steps   19] [reward 20.0]
[Episode 2592/4000] [Steps   16] [reward 17.0]
[Episode 2593/4000] [Steps   17] [reward 18.0]
[Episode 2594/4000] [Steps   15] [reward 16.0]
[Episode 2595/4000] [Steps   16] [reward 17.0]
[Episode 2596/4000] [Steps   15] [reward 16.0]
[Episode 2597/4000] [Steps   21] [reward 22.0]
[Episode 2598/4000] [Steps   19] [reward 20.0]
[Episode 2599/4000] [Steps   16] [reward 17.0]
[Episode 2600/4000] [Steps   21] [reward 22.0]
----------
[TEST Episode 2600] [Average Reward 17.3]
----------
[Episode 2601/4000] [Steps   36] [reward 37.0]
[Episode 2602/4000] [Steps   19] [reward 20.0]
[Episode 2603/4000] [Steps   18] [reward 19.0]
[Episode 2604/4000] [Steps   16] [reward 17.0]
[Episode 2605/4000] [Steps   19] [reward 20.0]
[Episode 2606/4000] [Steps   14] [reward 15.0]
[Episode 2607/4000] [Steps   13] [reward 14.0]
[Episode 2608/4000] [Steps   18] [reward 19.0]
[Episode 2609/4000] [Steps   22] [reward 23.0]
[Episode 2610/4000] [Steps   18] [reward 19.0]
[Episode 2611/4000] [Steps   19] [reward 20.0]
[Episode 2612/4000] [Steps   19] [reward 20.0]
[Episode 2613/4000] [Steps   16] [reward 17.0]
[Episode 2614/4000] [Steps   25] [reward 26.0]
[Episode 2615/4000] [Steps   16] [reward 17.0]
[Episode 2616/4000] [Steps   20] [reward 21.0]
[Episode 2617/4000] [Steps   19] [reward 20.0]
[Episode 2618/4000] [Steps   19] [reward 20.0]
[Episode 2619/4000] [Steps   18] [reward 19.0]
[Episode 2620/4000] [Steps   18] [reward 19.0]
[Episode 2621/4000] [Steps   20] [reward 21.0]
[Episode 2622/4000] [Steps   14] [reward 15.0]
[Episode 2623/4000] [Steps   17] [reward 18.0]
[Episode 2624/4000] [Steps   24] [reward 25.0]
[Episode 2625/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 2625] [Average Reward 17.0]
----------
[Episode 2626/4000] [Steps   17] [reward 18.0]
[Episode 2627/4000] [Steps   29] [reward 30.0]
[Episode 2628/4000] [Steps   19] [reward 20.0]
[Episode 2629/4000] [Steps   16] [reward 17.0]
[Episode 2630/4000] [Steps   17] [reward 18.0]
[Episode 2631/4000] [Steps   16] [reward 17.0]
[Episode 2632/4000] [Steps   20] [reward 21.0]
[Episode 2633/4000] [Steps   21] [reward 22.0]
[Episode 2634/4000] [Steps   18] [reward 19.0]
[Episode 2635/4000] [Steps   16] [reward 17.0]
[Episode 2636/4000] [Steps   11] [reward 12.0]
[Episode 2637/4000] [Steps   16] [reward 17.0]
[Episode 2638/4000] [Steps   18] [reward 19.0]
[Episode 2639/4000] [Steps   20] [reward 21.0]
[Episode 2640/4000] [Steps   17] [reward 18.0]
[Episode 2641/4000] [Steps   19] [reward 20.0]
[Episode 2642/4000] [Steps   20] [reward 21.0]
[Episode 2643/4000] [Steps   17] [reward 18.0]
[Episode 2644/4000] [Steps   22] [reward 23.0]
[Episode 2645/4000] [Steps   19] [reward 20.0]
[Episode 2646/4000] [Steps   15] [reward 16.0]
[Episode 2647/4000] [Steps   17] [reward 18.0]
[Episode 2648/4000] [Steps   15] [reward 16.0]
[Episode 2649/4000] [Steps   15] [reward 16.0]
[Episode 2650/4000] [Steps   18] [reward 19.0]
----------
[TEST Episode 2650] [Average Reward 18.3]
----------
[Episode 2651/4000] [Steps   20] [reward 21.0]
[Episode 2652/4000] [Steps   17] [reward 18.0]
[Episode 2653/4000] [Steps   23] [reward 24.0]
[Episode 2654/4000] [Steps   14] [reward 15.0]
[Episode 2655/4000] [Steps   17] [reward 18.0]
[Episode 2656/4000] [Steps   18] [reward 19.0]
[Episode 2657/4000] [Steps   17] [reward 18.0]
[Episode 2658/4000] [Steps   21] [reward 22.0]
[Episode 2659/4000] [Steps   14] [reward 15.0]
[Episode 2660/4000] [Steps   17] [reward 18.0]
[Episode 2661/4000] [Steps   18] [reward 19.0]
[Episode 2662/4000] [Steps   19] [reward 20.0]
[Episode 2663/4000] [Steps   17] [reward 18.0]
[Episode 2664/4000] [Steps   21] [reward 22.0]
[Episode 2665/4000] [Steps   11] [reward 12.0]
[Episode 2666/4000] [Steps   21] [reward 22.0]
[Episode 2667/4000] [Steps   13] [reward 14.0]
[Episode 2668/4000] [Steps   18] [reward 19.0]
[Episode 2669/4000] [Steps   21] [reward 22.0]
[Episode 2670/4000] [Steps   21] [reward 22.0]
[Episode 2671/4000] [Steps   14] [reward 15.0]
[Episode 2672/4000] [Steps   16] [reward 17.0]
[Episode 2673/4000] [Steps   12] [reward 13.0]
[Episode 2674/4000] [Steps   14] [reward 15.0]
[Episode 2675/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 2675] [Average Reward 17.5]
----------
[Episode 2676/4000] [Steps   18] [reward 19.0]
[Episode 2677/4000] [Steps   12] [reward 13.0]
[Episode 2678/4000] [Steps   17] [reward 18.0]
[Episode 2679/4000] [Steps   21] [reward 22.0]
[Episode 2680/4000] [Steps   13] [reward 14.0]
[Episode 2681/4000] [Steps   18] [reward 19.0]
[Episode 2682/4000] [Steps   19] [reward 20.0]
[Episode 2683/4000] [Steps   16] [reward 17.0]
[Episode 2684/4000] [Steps   15] [reward 16.0]
[Episode 2685/4000] [Steps   16] [reward 17.0]
[Episode 2686/4000] [Steps   17] [reward 18.0]
[Episode 2687/4000] [Steps   28] [reward 29.0]
[Episode 2688/4000] [Steps   22] [reward 23.0]
[Episode 2689/4000] [Steps   18] [reward 19.0]
[Episode 2690/4000] [Steps   14] [reward 15.0]
[Episode 2691/4000] [Steps   19] [reward 20.0]
[Episode 2692/4000] [Steps   12] [reward 13.0]
[Episode 2693/4000] [Steps   14] [reward 15.0]
[Episode 2694/4000] [Steps   17] [reward 18.0]
[Episode 2695/4000] [Steps   14] [reward 15.0]
[Episode 2696/4000] [Steps   14] [reward 15.0]
[Episode 2697/4000] [Steps   16] [reward 17.0]
[Episode 2698/4000] [Steps   15] [reward 16.0]
[Episode 2699/4000] [Steps   22] [reward 23.0]
[Episode 2700/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 2700] [Average Reward 17.6]
----------
[Episode 2701/4000] [Steps   15] [reward 16.0]
[Episode 2702/4000] [Steps   16] [reward 17.0]
[Episode 2703/4000] [Steps   16] [reward 17.0]
[Episode 2704/4000] [Steps   14] [reward 15.0]
[Episode 2705/4000] [Steps   16] [reward 17.0]
[Episode 2706/4000] [Steps   13] [reward 14.0]
[Episode 2707/4000] [Steps   15] [reward 16.0]
[Episode 2708/4000] [Steps   15] [reward 16.0]
[Episode 2709/4000] [Steps   18] [reward 19.0]
[Episode 2710/4000] [Steps   19] [reward 20.0]
[Episode 2711/4000] [Steps   27] [reward 28.0]
[Episode 2712/4000] [Steps   19] [reward 20.0]
[Episode 2713/4000] [Steps   19] [reward 20.0]
[Episode 2714/4000] [Steps   21] [reward 22.0]
[Episode 2715/4000] [Steps   11] [reward 12.0]
[Episode 2716/4000] [Steps   20] [reward 21.0]
[Episode 2717/4000] [Steps   19] [reward 20.0]
[Episode 2718/4000] [Steps   15] [reward 16.0]
[Episode 2719/4000] [Steps   17] [reward 18.0]
[Episode 2720/4000] [Steps   21] [reward 22.0]
[Episode 2721/4000] [Steps   20] [reward 21.0]
[Episode 2722/4000] [Steps   23] [reward 24.0]
[Episode 2723/4000] [Steps   18] [reward 19.0]
[Episode 2724/4000] [Steps   17] [reward 18.0]
[Episode 2725/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 2725] [Average Reward 17.5]
----------
[Episode 2726/4000] [Steps   15] [reward 16.0]
[Episode 2727/4000] [Steps   17] [reward 18.0]
[Episode 2728/4000] [Steps   20] [reward 21.0]
[Episode 2729/4000] [Steps   21] [reward 22.0]
[Episode 2730/4000] [Steps   16] [reward 17.0]
[Episode 2731/4000] [Steps   16] [reward 17.0]
[Episode 2732/4000] [Steps   15] [reward 16.0]
[Episode 2733/4000] [Steps   19] [reward 20.0]
[Episode 2734/4000] [Steps   18] [reward 19.0]
[Episode 2735/4000] [Steps   16] [reward 17.0]
[Episode 2736/4000] [Steps   20] [reward 21.0]
[Episode 2737/4000] [Steps   19] [reward 20.0]
[Episode 2738/4000] [Steps   18] [reward 19.0]
[Episode 2739/4000] [Steps   17] [reward 18.0]
[Episode 2740/4000] [Steps   14] [reward 15.0]
[Episode 2741/4000] [Steps   21] [reward 22.0]
[Episode 2742/4000] [Steps   22] [reward 23.0]
[Episode 2743/4000] [Steps   19] [reward 20.0]
[Episode 2744/4000] [Steps   15] [reward 16.0]
[Episode 2745/4000] [Steps   19] [reward 20.0]
[Episode 2746/4000] [Steps   21] [reward 22.0]
[Episode 2747/4000] [Steps   19] [reward 20.0]
[Episode 2748/4000] [Steps   19] [reward 20.0]
[Episode 2749/4000] [Steps    9] [reward 10.0]
[Episode 2750/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 2750] [Average Reward 27.0]
----------
[Episode 2751/4000] [Steps  115] [reward 116.0]
[Episode 2752/4000] [Steps   20] [reward 21.0]
[Episode 2753/4000] [Steps   20] [reward 21.0]
[Episode 2754/4000] [Steps   12] [reward 13.0]
[Episode 2755/4000] [Steps   18] [reward 19.0]
[Episode 2756/4000] [Steps   17] [reward 18.0]
[Episode 2757/4000] [Steps   15] [reward 16.0]
[Episode 2758/4000] [Steps   13] [reward 14.0]
[Episode 2759/4000] [Steps   15] [reward 16.0]
[Episode 2760/4000] [Steps   14] [reward 15.0]
[Episode 2761/4000] [Steps   21] [reward 22.0]
[Episode 2762/4000] [Steps   14] [reward 15.0]
[Episode 2763/4000] [Steps   14] [reward 15.0]
[Episode 2764/4000] [Steps   16] [reward 17.0]
[Episode 2765/4000] [Steps   20] [reward 21.0]
[Episode 2766/4000] [Steps   15] [reward 16.0]
[Episode 2767/4000] [Steps   12] [reward 13.0]
[Episode 2768/4000] [Steps   16] [reward 17.0]
[Episode 2769/4000] [Steps   18] [reward 19.0]
[Episode 2770/4000] [Steps   15] [reward 16.0]
[Episode 2771/4000] [Steps   17] [reward 18.0]
[Episode 2772/4000] [Steps   15] [reward 16.0]
[Episode 2773/4000] [Steps   94] [reward 95.0]
[Episode 2774/4000] [Steps   18] [reward 19.0]
[Episode 2775/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 2775] [Average Reward 17.2]
----------
[Episode 2776/4000] [Steps   21] [reward 22.0]
[Episode 2777/4000] [Steps   13] [reward 14.0]
[Episode 2778/4000] [Steps   17] [reward 18.0]
[Episode 2779/4000] [Steps   17] [reward 18.0]
[Episode 2780/4000] [Steps   18] [reward 19.0]
[Episode 2781/4000] [Steps   20] [reward 21.0]
[Episode 2782/4000] [Steps   20] [reward 21.0]
[Episode 2783/4000] [Steps   20] [reward 21.0]
[Episode 2784/4000] [Steps   18] [reward 19.0]
[Episode 2785/4000] [Steps   16] [reward 17.0]
[Episode 2786/4000] [Steps   13] [reward 14.0]
[Episode 2787/4000] [Steps   16] [reward 17.0]
[Episode 2788/4000] [Steps   19] [reward 20.0]
[Episode 2789/4000] [Steps   97] [reward 98.0]
[Episode 2790/4000] [Steps   18] [reward 19.0]
[Episode 2791/4000] [Steps   19] [reward 20.0]
[Episode 2792/4000] [Steps   17] [reward 18.0]
[Episode 2793/4000] [Steps   18] [reward 19.0]
[Episode 2794/4000] [Steps   15] [reward 16.0]
[Episode 2795/4000] [Steps   87] [reward 88.0]
[Episode 2796/4000] [Steps   16] [reward 17.0]
[Episode 2797/4000] [Steps   19] [reward 20.0]
[Episode 2798/4000] [Steps   14] [reward 15.0]
[Episode 2799/4000] [Steps   13] [reward 14.0]
[Episode 2800/4000] [Steps   15] [reward 16.0]
----------
[TEST Episode 2800] [Average Reward 18.0]
----------
[Episode 2801/4000] [Steps   18] [reward 19.0]
[Episode 2802/4000] [Steps   13] [reward 14.0]
[Episode 2803/4000] [Steps   16] [reward 17.0]
[Episode 2804/4000] [Steps   14] [reward 15.0]
[Episode 2805/4000] [Steps   18] [reward 19.0]
[Episode 2806/4000] [Steps   17] [reward 18.0]
[Episode 2807/4000] [Steps   20] [reward 21.0]
[Episode 2808/4000] [Steps   13] [reward 14.0]
[Episode 2809/4000] [Steps   15] [reward 16.0]
[Episode 2810/4000] [Steps   17] [reward 18.0]
[Episode 2811/4000] [Steps   16] [reward 17.0]
[Episode 2812/4000] [Steps   16] [reward 17.0]
[Episode 2813/4000] [Steps   13] [reward 14.0]
[Episode 2814/4000] [Steps   21] [reward 22.0]
[Episode 2815/4000] [Steps   17] [reward 18.0]
[Episode 2816/4000] [Steps   20] [reward 21.0]
[Episode 2817/4000] [Steps   15] [reward 16.0]
[Episode 2818/4000] [Steps   14] [reward 15.0]
[Episode 2819/4000] [Steps   17] [reward 18.0]
[Episode 2820/4000] [Steps   17] [reward 18.0]
[Episode 2821/4000] [Steps   20] [reward 21.0]
[Episode 2822/4000] [Steps   17] [reward 18.0]
[Episode 2823/4000] [Steps   17] [reward 18.0]
[Episode 2824/4000] [Steps   18] [reward 19.0]
[Episode 2825/4000] [Steps   19] [reward 20.0]
----------
[TEST Episode 2825] [Average Reward 18.8]
----------
[Episode 2826/4000] [Steps   14] [reward 15.0]
[Episode 2827/4000] [Steps   15] [reward 16.0]
[Episode 2828/4000] [Steps   14] [reward 15.0]
[Episode 2829/4000] [Steps   14] [reward 15.0]
[Episode 2830/4000] [Steps   21] [reward 22.0]
[Episode 2831/4000] [Steps   14] [reward 15.0]
[Episode 2832/4000] [Steps   17] [reward 18.0]
[Episode 2833/4000] [Steps   15] [reward 16.0]
[Episode 2834/4000] [Steps   15] [reward 16.0]
[Episode 2835/4000] [Steps   24] [reward 25.0]
[Episode 2836/4000] [Steps   17] [reward 18.0]
[Episode 2837/4000] [Steps   20] [reward 21.0]
[Episode 2838/4000] [Steps   94] [reward 95.0]
[Episode 2839/4000] [Steps   22] [reward 23.0]
[Episode 2840/4000] [Steps   15] [reward 16.0]
[Episode 2841/4000] [Steps   17] [reward 18.0]
[Episode 2842/4000] [Steps   15] [reward 16.0]
[Episode 2843/4000] [Steps   45] [reward 46.0]
[Episode 2844/4000] [Steps   20] [reward 21.0]
[Episode 2845/4000] [Steps   15] [reward 16.0]
[Episode 2846/4000] [Steps   14] [reward 15.0]
[Episode 2847/4000] [Steps   15] [reward 16.0]
[Episode 2848/4000] [Steps   16] [reward 17.0]
[Episode 2849/4000] [Steps   21] [reward 22.0]
[Episode 2850/4000] [Steps   88] [reward 89.0]
----------
[TEST Episode 2850] [Average Reward 24.2]
----------
[Episode 2851/4000] [Steps   19] [reward 20.0]
[Episode 2852/4000] [Steps   20] [reward 21.0]
[Episode 2853/4000] [Steps   14] [reward 15.0]
[Episode 2854/4000] [Steps   16] [reward 17.0]
[Episode 2855/4000] [Steps   15] [reward 16.0]
[Episode 2856/4000] [Steps   17] [reward 18.0]
[Episode 2857/4000] [Steps   19] [reward 20.0]
[Episode 2858/4000] [Steps   17] [reward 18.0]
[Episode 2859/4000] [Steps   15] [reward 16.0]
[Episode 2860/4000] [Steps   19] [reward 20.0]
[Episode 2861/4000] [Steps   88] [reward 89.0]
[Episode 2862/4000] [Steps   89] [reward 90.0]
[Episode 2863/4000] [Steps   74] [reward 75.0]
[Episode 2864/4000] [Steps   15] [reward 16.0]
[Episode 2865/4000] [Steps   61] [reward 62.0]
[Episode 2866/4000] [Steps   61] [reward 62.0]
[Episode 2867/4000] [Steps   11] [reward 12.0]
[Episode 2868/4000] [Steps   13] [reward 14.0]
[Episode 2869/4000] [Steps   15] [reward 16.0]
[Episode 2870/4000] [Steps   19] [reward 20.0]
[Episode 2871/4000] [Steps   14] [reward 15.0]
[Episode 2872/4000] [Steps   16] [reward 17.0]
[Episode 2873/4000] [Steps   83] [reward 84.0]
[Episode 2874/4000] [Steps   18] [reward 19.0]
[Episode 2875/4000] [Steps   21] [reward 22.0]
----------
[TEST Episode 2875] [Average Reward 23.3]
----------
[Episode 2876/4000] [Steps   14] [reward 15.0]
[Episode 2877/4000] [Steps   81] [reward 82.0]
[Episode 2878/4000] [Steps   18] [reward 19.0]
[Episode 2879/4000] [Steps   21] [reward 22.0]
[Episode 2880/4000] [Steps   20] [reward 21.0]
[Episode 2881/4000] [Steps   14] [reward 15.0]
[Episode 2882/4000] [Steps   19] [reward 20.0]
[Episode 2883/4000] [Steps   17] [reward 18.0]
[Episode 2884/4000] [Steps   16] [reward 17.0]
[Episode 2885/4000] [Steps   16] [reward 17.0]
[Episode 2886/4000] [Steps   17] [reward 18.0]
[Episode 2887/4000] [Steps   22] [reward 23.0]
[Episode 2888/4000] [Steps   73] [reward 74.0]
[Episode 2889/4000] [Steps   66] [reward 67.0]
[Episode 2890/4000] [Steps   16] [reward 17.0]
[Episode 2891/4000] [Steps   20] [reward 21.0]
[Episode 2892/4000] [Steps   18] [reward 19.0]
[Episode 2893/4000] [Steps   11] [reward 12.0]
[Episode 2894/4000] [Steps   13] [reward 14.0]
[Episode 2895/4000] [Steps   67] [reward 68.0]
[Episode 2896/4000] [Steps   19] [reward 20.0]
[Episode 2897/4000] [Steps   19] [reward 20.0]
[Episode 2898/4000] [Steps   20] [reward 21.0]
[Episode 2899/4000] [Steps   14] [reward 15.0]
[Episode 2900/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 2900] [Average Reward 33.9]
----------
[Episode 2901/4000] [Steps   23] [reward 24.0]
[Episode 2902/4000] [Steps   17] [reward 18.0]
[Episode 2903/4000] [Steps   58] [reward 59.0]
[Episode 2904/4000] [Steps   58] [reward 59.0]
[Episode 2905/4000] [Steps   16] [reward 17.0]
[Episode 2906/4000] [Steps   55] [reward 56.0]
[Episode 2907/4000] [Steps   12] [reward 13.0]
[Episode 2908/4000] [Steps   21] [reward 22.0]
[Episode 2909/4000] [Steps   22] [reward 23.0]
[Episode 2910/4000] [Steps   72] [reward 73.0]
[Episode 2911/4000] [Steps   12] [reward 13.0]
[Episode 2912/4000] [Steps   74] [reward 75.0]
[Episode 2913/4000] [Steps   18] [reward 19.0]
[Episode 2914/4000] [Steps   17] [reward 18.0]
[Episode 2915/4000] [Steps   19] [reward 20.0]
[Episode 2916/4000] [Steps   25] [reward 26.0]
[Episode 2917/4000] [Steps   63] [reward 64.0]
[Episode 2918/4000] [Steps   21] [reward 22.0]
[Episode 2919/4000] [Steps   55] [reward 56.0]
[Episode 2920/4000] [Steps   55] [reward 56.0]
[Episode 2921/4000] [Steps   22] [reward 23.0]
[Episode 2922/4000] [Steps   18] [reward 19.0]
[Episode 2923/4000] [Steps   60] [reward 61.0]
[Episode 2924/4000] [Steps   16] [reward 17.0]
[Episode 2925/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 2925] [Average Reward 59.9]
----------
[Episode 2926/4000] [Steps   56] [reward 57.0]
[Episode 2927/4000] [Steps   49] [reward 50.0]
[Episode 2928/4000] [Steps   66] [reward 67.0]
[Episode 2929/4000] [Steps   53] [reward 54.0]
[Episode 2930/4000] [Steps   45] [reward 46.0]
[Episode 2931/4000] [Steps    8] [reward 9.0]
[Episode 2932/4000] [Steps   25] [reward 26.0]
[Episode 2933/4000] [Steps    9] [reward 10.0]
[Episode 2934/4000] [Steps    8] [reward 9.0]
[Episode 2935/4000] [Steps    7] [reward 8.0]
[Episode 2936/4000] [Steps    9] [reward 10.0]
[Episode 2937/4000] [Steps   11] [reward 12.0]
[Episode 2938/4000] [Steps    8] [reward 9.0]
[Episode 2939/4000] [Steps    8] [reward 9.0]
[Episode 2940/4000] [Steps   10] [reward 11.0]
[Episode 2941/4000] [Steps   29] [reward 30.0]
[Episode 2942/4000] [Steps    7] [reward 8.0]
[Episode 2943/4000] [Steps   10] [reward 11.0]
[Episode 2944/4000] [Steps    9] [reward 10.0]
[Episode 2945/4000] [Steps   10] [reward 11.0]
[Episode 2946/4000] [Steps    8] [reward 9.0]
[Episode 2947/4000] [Steps    8] [reward 9.0]
[Episode 2948/4000] [Steps   15] [reward 16.0]
[Episode 2949/4000] [Steps    8] [reward 9.0]
[Episode 2950/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 2950] [Average Reward 9.5]
----------
[Episode 2951/4000] [Steps   11] [reward 12.0]
[Episode 2952/4000] [Steps   10] [reward 11.0]
[Episode 2953/4000] [Steps    9] [reward 10.0]
[Episode 2954/4000] [Steps    9] [reward 10.0]
[Episode 2955/4000] [Steps   10] [reward 11.0]
[Episode 2956/4000] [Steps   13] [reward 14.0]
[Episode 2957/4000] [Steps    8] [reward 9.0]
[Episode 2958/4000] [Steps    8] [reward 9.0]
[Episode 2959/4000] [Steps   11] [reward 12.0]
[Episode 2960/4000] [Steps    8] [reward 9.0]
[Episode 2961/4000] [Steps   11] [reward 12.0]
[Episode 2962/4000] [Steps    9] [reward 10.0]
[Episode 2963/4000] [Steps    9] [reward 10.0]
[Episode 2964/4000] [Steps   10] [reward 11.0]
[Episode 2965/4000] [Steps   11] [reward 12.0]
[Episode 2966/4000] [Steps    9] [reward 10.0]
[Episode 2967/4000] [Steps   13] [reward 14.0]
[Episode 2968/4000] [Steps    9] [reward 10.0]
[Episode 2969/4000] [Steps    8] [reward 9.0]
[Episode 2970/4000] [Steps   13] [reward 14.0]
[Episode 2971/4000] [Steps   20] [reward 21.0]
[Episode 2972/4000] [Steps   11] [reward 12.0]
[Episode 2973/4000] [Steps   10] [reward 11.0]
[Episode 2974/4000] [Steps   12] [reward 13.0]
[Episode 2975/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 2975] [Average Reward 10.7]
----------
[Episode 2976/4000] [Steps   11] [reward 12.0]
[Episode 2977/4000] [Steps   13] [reward 14.0]
[Episode 2978/4000] [Steps   10] [reward 11.0]
[Episode 2979/4000] [Steps   17] [reward 18.0]
[Episode 2980/4000] [Steps   11] [reward 12.0]
[Episode 2981/4000] [Steps   13] [reward 14.0]
[Episode 2982/4000] [Steps    9] [reward 10.0]
[Episode 2983/4000] [Steps   12] [reward 13.0]
[Episode 2984/4000] [Steps    9] [reward 10.0]
[Episode 2985/4000] [Steps    9] [reward 10.0]
[Episode 2986/4000] [Steps    8] [reward 9.0]
[Episode 2987/4000] [Steps   10] [reward 11.0]
[Episode 2988/4000] [Steps    9] [reward 10.0]
[Episode 2989/4000] [Steps   12] [reward 13.0]
[Episode 2990/4000] [Steps    8] [reward 9.0]
[Episode 2991/4000] [Steps   10] [reward 11.0]
[Episode 2992/4000] [Steps   10] [reward 11.0]
[Episode 2993/4000] [Steps   10] [reward 11.0]
[Episode 2994/4000] [Steps    8] [reward 9.0]
[Episode 2995/4000] [Steps    7] [reward 8.0]
[Episode 2996/4000] [Steps    8] [reward 9.0]
[Episode 2997/4000] [Steps    8] [reward 9.0]
[Episode 2998/4000] [Steps    8] [reward 9.0]
[Episode 2999/4000] [Steps    9] [reward 10.0]
[Episode 3000/4000] [Steps   15] [reward 16.0]
----------
[TEST Episode 3000] [Average Reward 10.2]
----------
[Episode 3001/4000] [Steps   11] [reward 12.0]
[Episode 3002/4000] [Steps   11] [reward 12.0]
[Episode 3003/4000] [Steps   15] [reward 16.0]
[Episode 3004/4000] [Steps    9] [reward 10.0]
[Episode 3005/4000] [Steps   10] [reward 11.0]
[Episode 3006/4000] [Steps   12] [reward 13.0]
[Episode 3007/4000] [Steps    9] [reward 10.0]
[Episode 3008/4000] [Steps    8] [reward 9.0]
[Episode 3009/4000] [Steps    8] [reward 9.0]
[Episode 3010/4000] [Steps   12] [reward 13.0]
[Episode 3011/4000] [Steps   10] [reward 11.0]
[Episode 3012/4000] [Steps   11] [reward 12.0]
[Episode 3013/4000] [Steps   11] [reward 12.0]
[Episode 3014/4000] [Steps    9] [reward 10.0]
[Episode 3015/4000] [Steps    9] [reward 10.0]
[Episode 3016/4000] [Steps    7] [reward 8.0]
[Episode 3017/4000] [Steps    8] [reward 9.0]
[Episode 3018/4000] [Steps   15] [reward 16.0]
[Episode 3019/4000] [Steps    9] [reward 10.0]
[Episode 3020/4000] [Steps   13] [reward 14.0]
[Episode 3021/4000] [Steps   11] [reward 12.0]
[Episode 3022/4000] [Steps   11] [reward 12.0]
[Episode 3023/4000] [Steps   13] [reward 14.0]
[Episode 3024/4000] [Steps   14] [reward 15.0]
[Episode 3025/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 3025] [Average Reward 11.2]
----------
[Episode 3026/4000] [Steps   11] [reward 12.0]
[Episode 3027/4000] [Steps    8] [reward 9.0]
[Episode 3028/4000] [Steps    9] [reward 10.0]
[Episode 3029/4000] [Steps    9] [reward 10.0]
[Episode 3030/4000] [Steps   10] [reward 11.0]
[Episode 3031/4000] [Steps    9] [reward 10.0]
[Episode 3032/4000] [Steps    9] [reward 10.0]
[Episode 3033/4000] [Steps   11] [reward 12.0]
[Episode 3034/4000] [Steps   13] [reward 14.0]
[Episode 3035/4000] [Steps   15] [reward 16.0]
[Episode 3036/4000] [Steps   10] [reward 11.0]
[Episode 3037/4000] [Steps   19] [reward 20.0]
[Episode 3038/4000] [Steps   29] [reward 30.0]
[Episode 3039/4000] [Steps   37] [reward 38.0]
[Episode 3040/4000] [Steps   91] [reward 92.0]
[Episode 3041/4000] [Steps   15] [reward 16.0]
[Episode 3042/4000] [Steps   19] [reward 20.0]
[Episode 3043/4000] [Steps  199] [reward 200.0]
[Episode 3044/4000] [Steps   18] [reward 19.0]
[Episode 3045/4000] [Steps   17] [reward 18.0]
[Episode 3046/4000] [Steps   20] [reward 21.0]
[Episode 3047/4000] [Steps   20] [reward 21.0]
[Episode 3048/4000] [Steps   14] [reward 15.0]
[Episode 3049/4000] [Steps   16] [reward 17.0]
[Episode 3050/4000] [Steps  105] [reward 106.0]
----------
[TEST Episode 3050] [Average Reward 18.3]
----------
[Episode 3051/4000] [Steps  114] [reward 115.0]
[Episode 3052/4000] [Steps   35] [reward 36.0]
[Episode 3053/4000] [Steps   26] [reward 27.0]
[Episode 3054/4000] [Steps   35] [reward 36.0]
[Episode 3055/4000] [Steps   62] [reward 63.0]
[Episode 3056/4000] [Steps   36] [reward 37.0]
[Episode 3057/4000] [Steps   36] [reward 37.0]
[Episode 3058/4000] [Steps   85] [reward 86.0]
[Episode 3059/4000] [Steps   23] [reward 24.0]
[Episode 3060/4000] [Steps   23] [reward 24.0]
[Episode 3061/4000] [Steps   30] [reward 31.0]
[Episode 3062/4000] [Steps  155] [reward 156.0]
[Episode 3063/4000] [Steps   39] [reward 40.0]
[Episode 3064/4000] [Steps   27] [reward 28.0]
[Episode 3065/4000] [Steps   84] [reward 85.0]
[Episode 3066/4000] [Steps   94] [reward 95.0]
[Episode 3067/4000] [Steps   88] [reward 89.0]
[Episode 3068/4000] [Steps   69] [reward 70.0]
[Episode 3069/4000] [Steps   22] [reward 23.0]
[Episode 3070/4000] [Steps   21] [reward 22.0]
[Episode 3071/4000] [Steps   21] [reward 22.0]
[Episode 3072/4000] [Steps   20] [reward 21.0]
[Episode 3073/4000] [Steps   17] [reward 18.0]
[Episode 3074/4000] [Steps   18] [reward 19.0]
[Episode 3075/4000] [Steps   15] [reward 16.0]
----------
[TEST Episode 3075] [Average Reward 43.3]
----------
[Episode 3076/4000] [Steps  199] [reward 200.0]
[Episode 3077/4000] [Steps   57] [reward 58.0]
[Episode 3078/4000] [Steps   20] [reward 21.0]
[Episode 3079/4000] [Steps   65] [reward 66.0]
[Episode 3080/4000] [Steps   38] [reward 39.0]
[Episode 3081/4000] [Steps   17] [reward 18.0]
[Episode 3082/4000] [Steps   14] [reward 15.0]
[Episode 3083/4000] [Steps   33] [reward 34.0]
[Episode 3084/4000] [Steps  104] [reward 105.0]
[Episode 3085/4000] [Steps   17] [reward 18.0]
[Episode 3086/4000] [Steps  134] [reward 135.0]
[Episode 3087/4000] [Steps   19] [reward 20.0]
[Episode 3088/4000] [Steps   18] [reward 19.0]
[Episode 3089/4000] [Steps   14] [reward 15.0]
[Episode 3090/4000] [Steps   16] [reward 17.0]
[Episode 3091/4000] [Steps   32] [reward 33.0]
[Episode 3092/4000] [Steps   22] [reward 23.0]
[Episode 3093/4000] [Steps   16] [reward 17.0]
[Episode 3094/4000] [Steps   16] [reward 17.0]
[Episode 3095/4000] [Steps   15] [reward 16.0]
[Episode 3096/4000] [Steps   30] [reward 31.0]
[Episode 3097/4000] [Steps   50] [reward 51.0]
[Episode 3098/4000] [Steps   25] [reward 26.0]
[Episode 3099/4000] [Steps   23] [reward 24.0]
[Episode 3100/4000] [Steps   19] [reward 20.0]
----------
[TEST Episode 3100] [Average Reward 154.6]
----------
[Episode 3101/4000] [Steps   18] [reward 19.0]
[Episode 3102/4000] [Steps  143] [reward 144.0]
[Episode 3103/4000] [Steps   37] [reward 38.0]
[Episode 3104/4000] [Steps   13] [reward 14.0]
[Episode 3105/4000] [Steps   34] [reward 35.0]
[Episode 3106/4000] [Steps   25] [reward 26.0]
[Episode 3107/4000] [Steps   13] [reward 14.0]
[Episode 3108/4000] [Steps   56] [reward 57.0]
[Episode 3109/4000] [Steps   12] [reward 13.0]
[Episode 3110/4000] [Steps   56] [reward 57.0]
[Episode 3111/4000] [Steps   23] [reward 24.0]
[Episode 3112/4000] [Steps   43] [reward 44.0]
[Episode 3113/4000] [Steps   25] [reward 26.0]
[Episode 3114/4000] [Steps  109] [reward 110.0]
[Episode 3115/4000] [Steps  169] [reward 170.0]
[Episode 3116/4000] [Steps  143] [reward 144.0]
[Episode 3117/4000] [Steps   21] [reward 22.0]
[Episode 3118/4000] [Steps   11] [reward 12.0]
[Episode 3119/4000] [Steps  154] [reward 155.0]
[Episode 3120/4000] [Steps   82] [reward 83.0]
[Episode 3121/4000] [Steps  132] [reward 133.0]
[Episode 3122/4000] [Steps   38] [reward 39.0]
[Episode 3123/4000] [Steps  117] [reward 118.0]
[Episode 3124/4000] [Steps   23] [reward 24.0]
[Episode 3125/4000] [Steps   42] [reward 43.0]
----------
[TEST Episode 3125] [Average Reward 39.5]
----------
[Episode 3126/4000] [Steps   35] [reward 36.0]
[Episode 3127/4000] [Steps   61] [reward 62.0]
[Episode 3128/4000] [Steps   25] [reward 26.0]
[Episode 3129/4000] [Steps   15] [reward 16.0]
[Episode 3130/4000] [Steps   18] [reward 19.0]
[Episode 3131/4000] [Steps   37] [reward 38.0]
[Episode 3132/4000] [Steps   89] [reward 90.0]
[Episode 3133/4000] [Steps   92] [reward 93.0]
[Episode 3134/4000] [Steps   17] [reward 18.0]
[Episode 3135/4000] [Steps   15] [reward 16.0]
[Episode 3136/4000] [Steps   18] [reward 19.0]
[Episode 3137/4000] [Steps  150] [reward 151.0]
[Episode 3138/4000] [Steps   67] [reward 68.0]
[Episode 3139/4000] [Steps   96] [reward 97.0]
[Episode 3140/4000] [Steps   27] [reward 28.0]
[Episode 3141/4000] [Steps  100] [reward 101.0]
[Episode 3142/4000] [Steps  109] [reward 110.0]
[Episode 3143/4000] [Steps   59] [reward 60.0]
[Episode 3144/4000] [Steps   87] [reward 88.0]
[Episode 3145/4000] [Steps   67] [reward 68.0]
[Episode 3146/4000] [Steps   78] [reward 79.0]
[Episode 3147/4000] [Steps  120] [reward 121.0]
[Episode 3148/4000] [Steps   18] [reward 19.0]
[Episode 3149/4000] [Steps   35] [reward 36.0]
[Episode 3150/4000] [Steps  102] [reward 103.0]
----------
[TEST Episode 3150] [Average Reward 111.2]
----------
[Episode 3151/4000] [Steps   12] [reward 13.0]
[Episode 3152/4000] [Steps   46] [reward 47.0]
[Episode 3153/4000] [Steps   91] [reward 92.0]
[Episode 3154/4000] [Steps  102] [reward 103.0]
[Episode 3155/4000] [Steps   89] [reward 90.0]
[Episode 3156/4000] [Steps   13] [reward 14.0]
[Episode 3157/4000] [Steps   16] [reward 17.0]
[Episode 3158/4000] [Steps   56] [reward 57.0]
[Episode 3159/4000] [Steps   73] [reward 74.0]
[Episode 3160/4000] [Steps   54] [reward 55.0]
[Episode 3161/4000] [Steps  103] [reward 104.0]
[Episode 3162/4000] [Steps   19] [reward 20.0]
[Episode 3163/4000] [Steps   18] [reward 19.0]
[Episode 3164/4000] [Steps   85] [reward 86.0]
[Episode 3165/4000] [Steps   61] [reward 62.0]
[Episode 3166/4000] [Steps   64] [reward 65.0]
[Episode 3167/4000] [Steps   36] [reward 37.0]
[Episode 3168/4000] [Steps   15] [reward 16.0]
[Episode 3169/4000] [Steps   12] [reward 13.0]
[Episode 3170/4000] [Steps   57] [reward 58.0]
[Episode 3171/4000] [Steps   27] [reward 28.0]
[Episode 3172/4000] [Steps   17] [reward 18.0]
[Episode 3173/4000] [Steps   32] [reward 33.0]
[Episode 3174/4000] [Steps   27] [reward 28.0]
[Episode 3175/4000] [Steps   17] [reward 18.0]
----------
[TEST Episode 3175] [Average Reward 48.4]
----------
[Episode 3176/4000] [Steps   68] [reward 69.0]
[Episode 3177/4000] [Steps   81] [reward 82.0]
[Episode 3178/4000] [Steps   13] [reward 14.0]
[Episode 3179/4000] [Steps   11] [reward 12.0]
[Episode 3180/4000] [Steps   14] [reward 15.0]
[Episode 3181/4000] [Steps  110] [reward 111.0]
[Episode 3182/4000] [Steps   15] [reward 16.0]
[Episode 3183/4000] [Steps   41] [reward 42.0]
[Episode 3184/4000] [Steps   36] [reward 37.0]
[Episode 3185/4000] [Steps   12] [reward 13.0]
[Episode 3186/4000] [Steps   34] [reward 35.0]
[Episode 3187/4000] [Steps   41] [reward 42.0]
[Episode 3188/4000] [Steps   11] [reward 12.0]
[Episode 3189/4000] [Steps   71] [reward 72.0]
[Episode 3190/4000] [Steps   62] [reward 63.0]
[Episode 3191/4000] [Steps   84] [reward 85.0]
[Episode 3192/4000] [Steps   18] [reward 19.0]
[Episode 3193/4000] [Steps   34] [reward 35.0]
[Episode 3194/4000] [Steps   96] [reward 97.0]
[Episode 3195/4000] [Steps  117] [reward 118.0]
[Episode 3196/4000] [Steps   15] [reward 16.0]
[Episode 3197/4000] [Steps   71] [reward 72.0]
[Episode 3198/4000] [Steps   59] [reward 60.0]
[Episode 3199/4000] [Steps  106] [reward 107.0]
[Episode 3200/4000] [Steps   18] [reward 19.0]
----------
[TEST Episode 3200] [Average Reward 97.9]
----------
[Episode 3201/4000] [Steps   62] [reward 63.0]
[Episode 3202/4000] [Steps   14] [reward 15.0]
[Episode 3203/4000] [Steps   74] [reward 75.0]
[Episode 3204/4000] [Steps   11] [reward 12.0]
[Episode 3205/4000] [Steps   80] [reward 81.0]
[Episode 3206/4000] [Steps    9] [reward 10.0]
[Episode 3207/4000] [Steps   93] [reward 94.0]
[Episode 3208/4000] [Steps    9] [reward 10.0]
[Episode 3209/4000] [Steps   81] [reward 82.0]
[Episode 3210/4000] [Steps   63] [reward 64.0]
[Episode 3211/4000] [Steps   15] [reward 16.0]
[Episode 3212/4000] [Steps   10] [reward 11.0]
[Episode 3213/4000] [Steps   14] [reward 15.0]
[Episode 3214/4000] [Steps   13] [reward 14.0]
[Episode 3215/4000] [Steps   15] [reward 16.0]
[Episode 3216/4000] [Steps    9] [reward 10.0]
[Episode 3217/4000] [Steps   12] [reward 13.0]
[Episode 3218/4000] [Steps   11] [reward 12.0]
[Episode 3219/4000] [Steps   11] [reward 12.0]
[Episode 3220/4000] [Steps   12] [reward 13.0]
[Episode 3221/4000] [Steps   11] [reward 12.0]
[Episode 3222/4000] [Steps   11] [reward 12.0]
[Episode 3223/4000] [Steps   16] [reward 17.0]
[Episode 3224/4000] [Steps   13] [reward 14.0]
[Episode 3225/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 3225] [Average Reward 12.4]
----------
[Episode 3226/4000] [Steps    9] [reward 10.0]
[Episode 3227/4000] [Steps   16] [reward 17.0]
[Episode 3228/4000] [Steps   14] [reward 15.0]
[Episode 3229/4000] [Steps   19] [reward 20.0]
[Episode 3230/4000] [Steps   11] [reward 12.0]
[Episode 3231/4000] [Steps   11] [reward 12.0]
[Episode 3232/4000] [Steps   11] [reward 12.0]
[Episode 3233/4000] [Steps   12] [reward 13.0]
[Episode 3234/4000] [Steps   10] [reward 11.0]
[Episode 3235/4000] [Steps   11] [reward 12.0]
[Episode 3236/4000] [Steps   12] [reward 13.0]
[Episode 3237/4000] [Steps   15] [reward 16.0]
[Episode 3238/4000] [Steps   13] [reward 14.0]
[Episode 3239/4000] [Steps   42] [reward 43.0]
[Episode 3240/4000] [Steps   17] [reward 18.0]
[Episode 3241/4000] [Steps   70] [reward 71.0]
[Episode 3242/4000] [Steps   59] [reward 60.0]
[Episode 3243/4000] [Steps   13] [reward 14.0]
[Episode 3244/4000] [Steps   72] [reward 73.0]
[Episode 3245/4000] [Steps  108] [reward 109.0]
[Episode 3246/4000] [Steps   63] [reward 64.0]
[Episode 3247/4000] [Steps   52] [reward 53.0]
[Episode 3248/4000] [Steps   51] [reward 52.0]
[Episode 3249/4000] [Steps   51] [reward 52.0]
[Episode 3250/4000] [Steps   11] [reward 12.0]
----------
[TEST Episode 3250] [Average Reward 49.0]
----------
[Episode 3251/4000] [Steps   41] [reward 42.0]
[Episode 3252/4000] [Steps   49] [reward 50.0]
[Episode 3253/4000] [Steps   51] [reward 52.0]
[Episode 3254/4000] [Steps   41] [reward 42.0]
[Episode 3255/4000] [Steps   49] [reward 50.0]
[Episode 3256/4000] [Steps   45] [reward 46.0]
[Episode 3257/4000] [Steps   41] [reward 42.0]
[Episode 3258/4000] [Steps   47] [reward 48.0]
[Episode 3259/4000] [Steps   51] [reward 52.0]
[Episode 3260/4000] [Steps   41] [reward 42.0]
[Episode 3261/4000] [Steps   39] [reward 40.0]
[Episode 3262/4000] [Steps   49] [reward 50.0]
[Episode 3263/4000] [Steps   45] [reward 46.0]
[Episode 3264/4000] [Steps   78] [reward 79.0]
[Episode 3265/4000] [Steps   55] [reward 56.0]
[Episode 3266/4000] [Steps   48] [reward 49.0]
[Episode 3267/4000] [Steps   46] [reward 47.0]
[Episode 3268/4000] [Steps   45] [reward 46.0]
[Episode 3269/4000] [Steps   22] [reward 23.0]
[Episode 3270/4000] [Steps   38] [reward 39.0]
[Episode 3271/4000] [Steps   50] [reward 51.0]
[Episode 3272/4000] [Steps   65] [reward 66.0]
[Episode 3273/4000] [Steps   72] [reward 73.0]
[Episode 3274/4000] [Steps   58] [reward 59.0]
[Episode 3275/4000] [Steps   41] [reward 42.0]
----------
[TEST Episode 3275] [Average Reward 50.8]
----------
[Episode 3276/4000] [Steps   46] [reward 47.0]
[Episode 3277/4000] [Steps   47] [reward 48.0]
[Episode 3278/4000] [Steps   50] [reward 51.0]
[Episode 3279/4000] [Steps   59] [reward 60.0]
[Episode 3280/4000] [Steps   51] [reward 52.0]
[Episode 3281/4000] [Steps   47] [reward 48.0]
[Episode 3282/4000] [Steps   58] [reward 59.0]
[Episode 3283/4000] [Steps   90] [reward 91.0]
[Episode 3284/4000] [Steps   24] [reward 25.0]
[Episode 3285/4000] [Steps   65] [reward 66.0]
[Episode 3286/4000] [Steps   93] [reward 94.0]
[Episode 3287/4000] [Steps   90] [reward 91.0]
[Episode 3288/4000] [Steps  120] [reward 121.0]
[Episode 3289/4000] [Steps   31] [reward 32.0]
[Episode 3290/4000] [Steps   59] [reward 60.0]
[Episode 3291/4000] [Steps   69] [reward 70.0]
[Episode 3292/4000] [Steps   68] [reward 69.0]
[Episode 3293/4000] [Steps   56] [reward 57.0]
[Episode 3294/4000] [Steps   66] [reward 67.0]
[Episode 3295/4000] [Steps   99] [reward 100.0]
[Episode 3296/4000] [Steps  120] [reward 121.0]
[Episode 3297/4000] [Steps   63] [reward 64.0]
[Episode 3298/4000] [Steps   13] [reward 14.0]
[Episode 3299/4000] [Steps   60] [reward 61.0]
[Episode 3300/4000] [Steps   56] [reward 57.0]
----------
[TEST Episode 3300] [Average Reward 63.6]
----------
[Episode 3301/4000] [Steps   68] [reward 69.0]
[Episode 3302/4000] [Steps  146] [reward 147.0]
[Episode 3303/4000] [Steps   62] [reward 63.0]
[Episode 3304/4000] [Steps   49] [reward 50.0]
[Episode 3305/4000] [Steps   63] [reward 64.0]
[Episode 3306/4000] [Steps   77] [reward 78.0]
[Episode 3307/4000] [Steps   73] [reward 74.0]
[Episode 3308/4000] [Steps   77] [reward 78.0]
[Episode 3309/4000] [Steps   82] [reward 83.0]
[Episode 3310/4000] [Steps   68] [reward 69.0]
[Episode 3311/4000] [Steps  110] [reward 111.0]
[Episode 3312/4000] [Steps   67] [reward 68.0]
[Episode 3313/4000] [Steps   59] [reward 60.0]
[Episode 3314/4000] [Steps   58] [reward 59.0]
[Episode 3315/4000] [Steps   49] [reward 50.0]
[Episode 3316/4000] [Steps   78] [reward 79.0]
[Episode 3317/4000] [Steps   15] [reward 16.0]
[Episode 3318/4000] [Steps   75] [reward 76.0]
[Episode 3319/4000] [Steps   62] [reward 63.0]
[Episode 3320/4000] [Steps   54] [reward 55.0]
[Episode 3321/4000] [Steps   48] [reward 49.0]
[Episode 3322/4000] [Steps   56] [reward 57.0]
[Episode 3323/4000] [Steps   56] [reward 57.0]
[Episode 3324/4000] [Steps   47] [reward 48.0]
[Episode 3325/4000] [Steps   65] [reward 66.0]
----------
[TEST Episode 3325] [Average Reward 63.6]
----------
[Episode 3326/4000] [Steps   55] [reward 56.0]
[Episode 3327/4000] [Steps   13] [reward 14.0]
[Episode 3328/4000] [Steps   60] [reward 61.0]
[Episode 3329/4000] [Steps   54] [reward 55.0]
[Episode 3330/4000] [Steps   76] [reward 77.0]
[Episode 3331/4000] [Steps   15] [reward 16.0]
[Episode 3332/4000] [Steps   13] [reward 14.0]
[Episode 3333/4000] [Steps   14] [reward 15.0]
[Episode 3334/4000] [Steps   51] [reward 52.0]
[Episode 3335/4000] [Steps   10] [reward 11.0]
[Episode 3336/4000] [Steps   93] [reward 94.0]
[Episode 3337/4000] [Steps    8] [reward 9.0]
[Episode 3338/4000] [Steps   66] [reward 67.0]
[Episode 3339/4000] [Steps   11] [reward 12.0]
[Episode 3340/4000] [Steps    9] [reward 10.0]
[Episode 3341/4000] [Steps   62] [reward 63.0]
[Episode 3342/4000] [Steps   12] [reward 13.0]
[Episode 3343/4000] [Steps   78] [reward 79.0]
[Episode 3344/4000] [Steps   66] [reward 67.0]
[Episode 3345/4000] [Steps  108] [reward 109.0]
[Episode 3346/4000] [Steps   18] [reward 19.0]
[Episode 3347/4000] [Steps   17] [reward 18.0]
[Episode 3348/4000] [Steps   86] [reward 87.0]
[Episode 3349/4000] [Steps   14] [reward 15.0]
[Episode 3350/4000] [Steps  108] [reward 109.0]
----------
[TEST Episode 3350] [Average Reward 94.3]
----------
[Episode 3351/4000] [Steps   76] [reward 77.0]
[Episode 3352/4000] [Steps   92] [reward 93.0]
[Episode 3353/4000] [Steps   72] [reward 73.0]
[Episode 3354/4000] [Steps   20] [reward 21.0]
[Episode 3355/4000] [Steps  106] [reward 107.0]
[Episode 3356/4000] [Steps   69] [reward 70.0]
[Episode 3357/4000] [Steps  110] [reward 111.0]
[Episode 3358/4000] [Steps   31] [reward 32.0]
[Episode 3359/4000] [Steps   77] [reward 78.0]
[Episode 3360/4000] [Steps   17] [reward 18.0]
[Episode 3361/4000] [Steps   13] [reward 14.0]
[Episode 3362/4000] [Steps   67] [reward 68.0]
[Episode 3363/4000] [Steps   58] [reward 59.0]
[Episode 3364/4000] [Steps   75] [reward 76.0]
[Episode 3365/4000] [Steps   76] [reward 77.0]
[Episode 3366/4000] [Steps   11] [reward 12.0]
[Episode 3367/4000] [Steps   51] [reward 52.0]
[Episode 3368/4000] [Steps  110] [reward 111.0]
[Episode 3369/4000] [Steps  122] [reward 123.0]
[Episode 3370/4000] [Steps  199] [reward 200.0]
[Episode 3371/4000] [Steps   43] [reward 44.0]
[Episode 3372/4000] [Steps   77] [reward 78.0]
[Episode 3373/4000] [Steps   57] [reward 58.0]
[Episode 3374/4000] [Steps   61] [reward 62.0]
[Episode 3375/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 3375] [Average Reward 77.0]
----------
[Episode 3376/4000] [Steps   53] [reward 54.0]
[Episode 3377/4000] [Steps   74] [reward 75.0]
[Episode 3378/4000] [Steps   70] [reward 71.0]
[Episode 3379/4000] [Steps   51] [reward 52.0]
[Episode 3380/4000] [Steps   37] [reward 38.0]
[Episode 3381/4000] [Steps   37] [reward 38.0]
[Episode 3382/4000] [Steps   74] [reward 75.0]
[Episode 3383/4000] [Steps   46] [reward 47.0]
[Episode 3384/4000] [Steps   13] [reward 14.0]
[Episode 3385/4000] [Steps    8] [reward 9.0]
[Episode 3386/4000] [Steps   11] [reward 12.0]
[Episode 3387/4000] [Steps    9] [reward 10.0]
[Episode 3388/4000] [Steps   13] [reward 14.0]
[Episode 3389/4000] [Steps  125] [reward 126.0]
[Episode 3390/4000] [Steps    9] [reward 10.0]
[Episode 3391/4000] [Steps    8] [reward 9.0]
[Episode 3392/4000] [Steps  199] [reward 200.0]
[Episode 3393/4000] [Steps   70] [reward 71.0]
[Episode 3394/4000] [Steps  199] [reward 200.0]
[Episode 3395/4000] [Steps   17] [reward 18.0]
[Episode 3396/4000] [Steps  199] [reward 200.0]
[Episode 3397/4000] [Steps  199] [reward 200.0]
[Episode 3398/4000] [Steps  199] [reward 200.0]
[Episode 3399/4000] [Steps  199] [reward 200.0]
[Episode 3400/4000] [Steps   51] [reward 52.0]
----------
[TEST Episode 3400] [Average Reward 200.0]
----------
[Episode 3401/4000] [Steps  199] [reward 200.0]
[Episode 3402/4000] [Steps  199] [reward 200.0]
[Episode 3403/4000] [Steps   95] [reward 96.0]
[Episode 3404/4000] [Steps  199] [reward 200.0]
[Episode 3405/4000] [Steps  199] [reward 200.0]
[Episode 3406/4000] [Steps  143] [reward 144.0]
[Episode 3407/4000] [Steps  199] [reward 200.0]
[Episode 3408/4000] [Steps  199] [reward 200.0]
[Episode 3409/4000] [Steps   17] [reward 18.0]
[Episode 3410/4000] [Steps  138] [reward 139.0]
[Episode 3411/4000] [Steps  199] [reward 200.0]
[Episode 3412/4000] [Steps  125] [reward 126.0]
[Episode 3413/4000] [Steps  107] [reward 108.0]
[Episode 3414/4000] [Steps   84] [reward 85.0]
[Episode 3415/4000] [Steps  199] [reward 200.0]
[Episode 3416/4000] [Steps  134] [reward 135.0]
[Episode 3417/4000] [Steps  141] [reward 142.0]
[Episode 3418/4000] [Steps  104] [reward 105.0]
[Episode 3419/4000] [Steps   66] [reward 67.0]
[Episode 3420/4000] [Steps  199] [reward 200.0]
[Episode 3421/4000] [Steps  154] [reward 155.0]
[Episode 3422/4000] [Steps  117] [reward 118.0]
[Episode 3423/4000] [Steps  199] [reward 200.0]
[Episode 3424/4000] [Steps  134] [reward 135.0]
[Episode 3425/4000] [Steps  135] [reward 136.0]
----------
[TEST Episode 3425] [Average Reward 111.8]
----------
[Episode 3426/4000] [Steps   63] [reward 64.0]
[Episode 3427/4000] [Steps  199] [reward 200.0]
[Episode 3428/4000] [Steps   79] [reward 80.0]
[Episode 3429/4000] [Steps   25] [reward 26.0]
[Episode 3430/4000] [Steps  144] [reward 145.0]
[Episode 3431/4000] [Steps  166] [reward 167.0]
[Episode 3432/4000] [Steps   67] [reward 68.0]
[Episode 3433/4000] [Steps  111] [reward 112.0]
[Episode 3434/4000] [Steps   90] [reward 91.0]
[Episode 3435/4000] [Steps  120] [reward 121.0]
[Episode 3436/4000] [Steps  199] [reward 200.0]
[Episode 3437/4000] [Steps   22] [reward 23.0]
[Episode 3438/4000] [Steps  129] [reward 130.0]
[Episode 3439/4000] [Steps  143] [reward 144.0]
[Episode 3440/4000] [Steps   26] [reward 27.0]
[Episode 3441/4000] [Steps   24] [reward 25.0]
[Episode 3442/4000] [Steps   21] [reward 22.0]
[Episode 3443/4000] [Steps   19] [reward 20.0]
[Episode 3444/4000] [Steps   19] [reward 20.0]
[Episode 3445/4000] [Steps   24] [reward 25.0]
[Episode 3446/4000] [Steps  103] [reward 104.0]
[Episode 3447/4000] [Steps   33] [reward 34.0]
[Episode 3448/4000] [Steps   79] [reward 80.0]
[Episode 3449/4000] [Steps   13] [reward 14.0]
[Episode 3450/4000] [Steps   74] [reward 75.0]
----------
[TEST Episode 3450] [Average Reward 114.5]
----------
[Episode 3451/4000] [Steps  138] [reward 139.0]
[Episode 3452/4000] [Steps  100] [reward 101.0]
[Episode 3453/4000] [Steps   85] [reward 86.0]
[Episode 3454/4000] [Steps   73] [reward 74.0]
[Episode 3455/4000] [Steps   13] [reward 14.0]
[Episode 3456/4000] [Steps   13] [reward 14.0]
[Episode 3457/4000] [Steps   12] [reward 13.0]
[Episode 3458/4000] [Steps   10] [reward 11.0]
[Episode 3459/4000] [Steps   18] [reward 19.0]
[Episode 3460/4000] [Steps   11] [reward 12.0]
[Episode 3461/4000] [Steps   13] [reward 14.0]
[Episode 3462/4000] [Steps   11] [reward 12.0]
[Episode 3463/4000] [Steps   13] [reward 14.0]
[Episode 3464/4000] [Steps   11] [reward 12.0]
[Episode 3465/4000] [Steps   61] [reward 62.0]
[Episode 3466/4000] [Steps   22] [reward 23.0]
[Episode 3467/4000] [Steps   17] [reward 18.0]
[Episode 3468/4000] [Steps   19] [reward 20.0]
[Episode 3469/4000] [Steps   14] [reward 15.0]
[Episode 3470/4000] [Steps   17] [reward 18.0]
[Episode 3471/4000] [Steps   19] [reward 20.0]
[Episode 3472/4000] [Steps   15] [reward 16.0]
[Episode 3473/4000] [Steps   16] [reward 17.0]
[Episode 3474/4000] [Steps   20] [reward 21.0]
[Episode 3475/4000] [Steps   29] [reward 30.0]
----------
[TEST Episode 3475] [Average Reward 26.8]
----------
[Episode 3476/4000] [Steps   10] [reward 11.0]
[Episode 3477/4000] [Steps   10] [reward 11.0]
[Episode 3478/4000] [Steps   11] [reward 12.0]
[Episode 3479/4000] [Steps   10] [reward 11.0]
[Episode 3480/4000] [Steps   13] [reward 14.0]
[Episode 3481/4000] [Steps   12] [reward 13.0]
[Episode 3482/4000] [Steps   12] [reward 13.0]
[Episode 3483/4000] [Steps  134] [reward 135.0]
[Episode 3484/4000] [Steps   75] [reward 76.0]
[Episode 3485/4000] [Steps   14] [reward 15.0]
[Episode 3486/4000] [Steps   14] [reward 15.0]
[Episode 3487/4000] [Steps   13] [reward 14.0]
[Episode 3488/4000] [Steps   13] [reward 14.0]
[Episode 3489/4000] [Steps   16] [reward 17.0]
[Episode 3490/4000] [Steps  115] [reward 116.0]
[Episode 3491/4000] [Steps  129] [reward 130.0]
[Episode 3492/4000] [Steps  101] [reward 102.0]
[Episode 3493/4000] [Steps   11] [reward 12.0]
[Episode 3494/4000] [Steps  131] [reward 132.0]
[Episode 3495/4000] [Steps   13] [reward 14.0]
[Episode 3496/4000] [Steps  101] [reward 102.0]
[Episode 3497/4000] [Steps  119] [reward 120.0]
[Episode 3498/4000] [Steps  146] [reward 147.0]
[Episode 3499/4000] [Steps   14] [reward 15.0]
[Episode 3500/4000] [Steps   12] [reward 13.0]
----------
[TEST Episode 3500] [Average Reward 146.4]
----------
[Episode 3501/4000] [Steps  161] [reward 162.0]
[Episode 3502/4000] [Steps  199] [reward 200.0]
[Episode 3503/4000] [Steps  188] [reward 189.0]
[Episode 3504/4000] [Steps  199] [reward 200.0]
[Episode 3505/4000] [Steps  140] [reward 141.0]
[Episode 3506/4000] [Steps  124] [reward 125.0]
[Episode 3507/4000] [Steps  199] [reward 200.0]
[Episode 3508/4000] [Steps   19] [reward 20.0]
[Episode 3509/4000] [Steps  139] [reward 140.0]
[Episode 3510/4000] [Steps  108] [reward 109.0]
[Episode 3511/4000] [Steps   17] [reward 18.0]
[Episode 3512/4000] [Steps  136] [reward 137.0]
[Episode 3513/4000] [Steps  128] [reward 129.0]
[Episode 3514/4000] [Steps  116] [reward 117.0]
[Episode 3515/4000] [Steps   11] [reward 12.0]
[Episode 3516/4000] [Steps   14] [reward 15.0]
[Episode 3517/4000] [Steps   12] [reward 13.0]
[Episode 3518/4000] [Steps  110] [reward 111.0]
[Episode 3519/4000] [Steps  104] [reward 105.0]
[Episode 3520/4000] [Steps   86] [reward 87.0]
[Episode 3521/4000] [Steps   80] [reward 81.0]
[Episode 3522/4000] [Steps   39] [reward 40.0]
[Episode 3523/4000] [Steps   74] [reward 75.0]
[Episode 3524/4000] [Steps   13] [reward 14.0]
[Episode 3525/4000] [Steps  109] [reward 110.0]
----------
[TEST Episode 3525] [Average Reward 135.6]
----------
[Episode 3526/4000] [Steps  111] [reward 112.0]
[Episode 3527/4000] [Steps  107] [reward 108.0]
[Episode 3528/4000] [Steps   84] [reward 85.0]
[Episode 3529/4000] [Steps   94] [reward 95.0]
[Episode 3530/4000] [Steps  117] [reward 118.0]
[Episode 3531/4000] [Steps   42] [reward 43.0]
[Episode 3532/4000] [Steps  140] [reward 141.0]
[Episode 3533/4000] [Steps   11] [reward 12.0]
[Episode 3534/4000] [Steps   14] [reward 15.0]
[Episode 3535/4000] [Steps   13] [reward 14.0]
[Episode 3536/4000] [Steps    8] [reward 9.0]
[Episode 3537/4000] [Steps   16] [reward 17.0]
[Episode 3538/4000] [Steps    9] [reward 10.0]
[Episode 3539/4000] [Steps   12] [reward 13.0]
[Episode 3540/4000] [Steps   10] [reward 11.0]
[Episode 3541/4000] [Steps   13] [reward 14.0]
[Episode 3542/4000] [Steps    9] [reward 10.0]
[Episode 3543/4000] [Steps   13] [reward 14.0]
[Episode 3544/4000] [Steps    9] [reward 10.0]
[Episode 3545/4000] [Steps   11] [reward 12.0]
[Episode 3546/4000] [Steps    8] [reward 9.0]
[Episode 3547/4000] [Steps   13] [reward 14.0]
[Episode 3548/4000] [Steps   15] [reward 16.0]
[Episode 3549/4000] [Steps   11] [reward 12.0]
[Episode 3550/4000] [Steps   10] [reward 11.0]
----------
[TEST Episode 3550] [Average Reward 42.0]
----------
[Episode 3551/4000] [Steps   10] [reward 11.0]
[Episode 3552/4000] [Steps   10] [reward 11.0]
[Episode 3553/4000] [Steps  133] [reward 134.0]
[Episode 3554/4000] [Steps   13] [reward 14.0]
[Episode 3555/4000] [Steps  199] [reward 200.0]
[Episode 3556/4000] [Steps  126] [reward 127.0]
[Episode 3557/4000] [Steps   72] [reward 73.0]
[Episode 3558/4000] [Steps  199] [reward 200.0]
[Episode 3559/4000] [Steps  125] [reward 126.0]
[Episode 3560/4000] [Steps  113] [reward 114.0]
[Episode 3561/4000] [Steps   29] [reward 30.0]
[Episode 3562/4000] [Steps  180] [reward 181.0]
[Episode 3563/4000] [Steps  105] [reward 106.0]
[Episode 3564/4000] [Steps  154] [reward 155.0]
[Episode 3565/4000] [Steps  145] [reward 146.0]
[Episode 3566/4000] [Steps   36] [reward 37.0]
[Episode 3567/4000] [Steps   98] [reward 99.0]
[Episode 3568/4000] [Steps   13] [reward 14.0]
[Episode 3569/4000] [Steps  100] [reward 101.0]
[Episode 3570/4000] [Steps   13] [reward 14.0]
[Episode 3571/4000] [Steps   67] [reward 68.0]
[Episode 3572/4000] [Steps   68] [reward 69.0]
[Episode 3573/4000] [Steps   31] [reward 32.0]
[Episode 3574/4000] [Steps   12] [reward 13.0]
[Episode 3575/4000] [Steps   15] [reward 16.0]
----------
[TEST Episode 3575] [Average Reward 17.9]
----------
[Episode 3576/4000] [Steps   19] [reward 20.0]
[Episode 3577/4000] [Steps   18] [reward 19.0]
[Episode 3578/4000] [Steps   12] [reward 13.0]
[Episode 3579/4000] [Steps  128] [reward 129.0]
[Episode 3580/4000] [Steps   60] [reward 61.0]
[Episode 3581/4000] [Steps   56] [reward 57.0]
[Episode 3582/4000] [Steps   14] [reward 15.0]
[Episode 3583/4000] [Steps   18] [reward 19.0]
[Episode 3584/4000] [Steps   35] [reward 36.0]
[Episode 3585/4000] [Steps   66] [reward 67.0]
[Episode 3586/4000] [Steps   39] [reward 40.0]
[Episode 3587/4000] [Steps  149] [reward 150.0]
[Episode 3588/4000] [Steps  127] [reward 128.0]
[Episode 3589/4000] [Steps   12] [reward 13.0]
[Episode 3590/4000] [Steps   24] [reward 25.0]
[Episode 3591/4000] [Steps   15] [reward 16.0]
[Episode 3592/4000] [Steps   14] [reward 15.0]
[Episode 3593/4000] [Steps   12] [reward 13.0]
[Episode 3594/4000] [Steps   20] [reward 21.0]
[Episode 3595/4000] [Steps   20] [reward 21.0]
[Episode 3596/4000] [Steps   13] [reward 14.0]
[Episode 3597/4000] [Steps   15] [reward 16.0]
[Episode 3598/4000] [Steps   15] [reward 16.0]
[Episode 3599/4000] [Steps   19] [reward 20.0]
[Episode 3600/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 3600] [Average Reward 15.9]
----------
[Episode 3601/4000] [Steps   18] [reward 19.0]
[Episode 3602/4000] [Steps   16] [reward 17.0]
[Episode 3603/4000] [Steps   17] [reward 18.0]
[Episode 3604/4000] [Steps   63] [reward 64.0]
[Episode 3605/4000] [Steps   19] [reward 20.0]
[Episode 3606/4000] [Steps   56] [reward 57.0]
[Episode 3607/4000] [Steps    8] [reward 9.0]
[Episode 3608/4000] [Steps   55] [reward 56.0]
[Episode 3609/4000] [Steps  128] [reward 129.0]
[Episode 3610/4000] [Steps   81] [reward 82.0]
[Episode 3611/4000] [Steps   31] [reward 32.0]
[Episode 3612/4000] [Steps  151] [reward 152.0]
[Episode 3613/4000] [Steps   13] [reward 14.0]
[Episode 3614/4000] [Steps   19] [reward 20.0]
[Episode 3615/4000] [Steps   13] [reward 14.0]
[Episode 3616/4000] [Steps   19] [reward 20.0]
[Episode 3617/4000] [Steps   15] [reward 16.0]
[Episode 3618/4000] [Steps   13] [reward 14.0]
[Episode 3619/4000] [Steps   87] [reward 88.0]
[Episode 3620/4000] [Steps   78] [reward 79.0]
[Episode 3621/4000] [Steps   66] [reward 67.0]
[Episode 3622/4000] [Steps   32] [reward 33.0]
[Episode 3623/4000] [Steps   19] [reward 20.0]
[Episode 3624/4000] [Steps  104] [reward 105.0]
[Episode 3625/4000] [Steps   14] [reward 15.0]
----------
[TEST Episode 3625] [Average Reward 92.7]
----------
[Episode 3626/4000] [Steps   96] [reward 97.0]
[Episode 3627/4000] [Steps   14] [reward 15.0]
[Episode 3628/4000] [Steps   17] [reward 18.0]
[Episode 3629/4000] [Steps   10] [reward 11.0]
[Episode 3630/4000] [Steps   13] [reward 14.0]
[Episode 3631/4000] [Steps   11] [reward 12.0]
[Episode 3632/4000] [Steps   10] [reward 11.0]
[Episode 3633/4000] [Steps   12] [reward 13.0]
[Episode 3634/4000] [Steps    9] [reward 10.0]
[Episode 3635/4000] [Steps   10] [reward 11.0]
[Episode 3636/4000] [Steps   15] [reward 16.0]
[Episode 3637/4000] [Steps  102] [reward 103.0]
[Episode 3638/4000] [Steps    9] [reward 10.0]
[Episode 3639/4000] [Steps   76] [reward 77.0]
[Episode 3640/4000] [Steps   87] [reward 88.0]
[Episode 3641/4000] [Steps  117] [reward 118.0]
[Episode 3642/4000] [Steps   11] [reward 12.0]
[Episode 3643/4000] [Steps   13] [reward 14.0]
[Episode 3644/4000] [Steps   42] [reward 43.0]
[Episode 3645/4000] [Steps  155] [reward 156.0]
[Episode 3646/4000] [Steps   17] [reward 18.0]
[Episode 3647/4000] [Steps   14] [reward 15.0]
[Episode 3648/4000] [Steps  101] [reward 102.0]
[Episode 3649/4000] [Steps   11] [reward 12.0]
[Episode 3650/4000] [Steps   19] [reward 20.0]
----------
[TEST Episode 3650] [Average Reward 17.6]
----------
[Episode 3651/4000] [Steps   15] [reward 16.0]
[Episode 3652/4000] [Steps  106] [reward 107.0]
[Episode 3653/4000] [Steps   29] [reward 30.0]
[Episode 3654/4000] [Steps  120] [reward 121.0]
[Episode 3655/4000] [Steps  140] [reward 141.0]
[Episode 3656/4000] [Steps   30] [reward 31.0]
[Episode 3657/4000] [Steps   12] [reward 13.0]
[Episode 3658/4000] [Steps   15] [reward 16.0]
[Episode 3659/4000] [Steps  123] [reward 124.0]
[Episode 3660/4000] [Steps  120] [reward 121.0]
[Episode 3661/4000] [Steps   13] [reward 14.0]
[Episode 3662/4000] [Steps   13] [reward 14.0]
[Episode 3663/4000] [Steps   10] [reward 11.0]
[Episode 3664/4000] [Steps  108] [reward 109.0]
[Episode 3665/4000] [Steps   13] [reward 14.0]
[Episode 3666/4000] [Steps  104] [reward 105.0]
[Episode 3667/4000] [Steps  105] [reward 106.0]
[Episode 3668/4000] [Steps   30] [reward 31.0]
[Episode 3669/4000] [Steps   11] [reward 12.0]
[Episode 3670/4000] [Steps   87] [reward 88.0]
[Episode 3671/4000] [Steps   41] [reward 42.0]
[Episode 3672/4000] [Steps   14] [reward 15.0]
[Episode 3673/4000] [Steps   30] [reward 31.0]
[Episode 3674/4000] [Steps    8] [reward 9.0]
[Episode 3675/4000] [Steps   11] [reward 12.0]
----------
[TEST Episode 3675] [Average Reward 13.4]
----------
[Episode 3676/4000] [Steps   10] [reward 11.0]
[Episode 3677/4000] [Steps   11] [reward 12.0]
[Episode 3678/4000] [Steps    9] [reward 10.0]
[Episode 3679/4000] [Steps    8] [reward 9.0]
[Episode 3680/4000] [Steps   18] [reward 19.0]
[Episode 3681/4000] [Steps    9] [reward 10.0]
[Episode 3682/4000] [Steps   14] [reward 15.0]
[Episode 3683/4000] [Steps   10] [reward 11.0]
[Episode 3684/4000] [Steps   10] [reward 11.0]
[Episode 3685/4000] [Steps   12] [reward 13.0]
[Episode 3686/4000] [Steps   12] [reward 13.0]
[Episode 3687/4000] [Steps   13] [reward 14.0]
[Episode 3688/4000] [Steps  119] [reward 120.0]
[Episode 3689/4000] [Steps   32] [reward 33.0]
[Episode 3690/4000] [Steps  107] [reward 108.0]
[Episode 3691/4000] [Steps  120] [reward 121.0]
[Episode 3692/4000] [Steps   11] [reward 12.0]
[Episode 3693/4000] [Steps  125] [reward 126.0]
[Episode 3694/4000] [Steps   16] [reward 17.0]
[Episode 3695/4000] [Steps  128] [reward 129.0]
[Episode 3696/4000] [Steps  118] [reward 119.0]
[Episode 3697/4000] [Steps  121] [reward 122.0]
[Episode 3698/4000] [Steps   18] [reward 19.0]
[Episode 3699/4000] [Steps   14] [reward 15.0]
[Episode 3700/4000] [Steps  146] [reward 147.0]
----------
[TEST Episode 3700] [Average Reward 17.7]
----------
[Episode 3701/4000] [Steps   21] [reward 22.0]
[Episode 3702/4000] [Steps   13] [reward 14.0]
[Episode 3703/4000] [Steps  131] [reward 132.0]
[Episode 3704/4000] [Steps   11] [reward 12.0]
[Episode 3705/4000] [Steps   10] [reward 11.0]
[Episode 3706/4000] [Steps   13] [reward 14.0]
[Episode 3707/4000] [Steps   11] [reward 12.0]
[Episode 3708/4000] [Steps   11] [reward 12.0]
[Episode 3709/4000] [Steps   11] [reward 12.0]
[Episode 3710/4000] [Steps   10] [reward 11.0]
[Episode 3711/4000] [Steps  179] [reward 180.0]
[Episode 3712/4000] [Steps  161] [reward 162.0]
[Episode 3713/4000] [Steps  147] [reward 148.0]
[Episode 3714/4000] [Steps  199] [reward 200.0]
[Episode 3715/4000] [Steps  199] [reward 200.0]
[Episode 3716/4000] [Steps   19] [reward 20.0]
[Episode 3717/4000] [Steps  117] [reward 118.0]
[Episode 3718/4000] [Steps  177] [reward 178.0]
[Episode 3719/4000] [Steps  199] [reward 200.0]
[Episode 3720/4000] [Steps  115] [reward 116.0]
[Episode 3721/4000] [Steps  103] [reward 104.0]
[Episode 3722/4000] [Steps   95] [reward 96.0]
[Episode 3723/4000] [Steps  134] [reward 135.0]
[Episode 3724/4000] [Steps  169] [reward 170.0]
[Episode 3725/4000] [Steps   52] [reward 53.0]
----------
[TEST Episode 3725] [Average Reward 141.3]
----------
[Episode 3726/4000] [Steps   22] [reward 23.0]
[Episode 3727/4000] [Steps  132] [reward 133.0]
[Episode 3728/4000] [Steps   16] [reward 17.0]
[Episode 3729/4000] [Steps   88] [reward 89.0]
[Episode 3730/4000] [Steps   12] [reward 13.0]
[Episode 3731/4000] [Steps   95] [reward 96.0]
[Episode 3732/4000] [Steps   18] [reward 19.0]
[Episode 3733/4000] [Steps   59] [reward 60.0]
[Episode 3734/4000] [Steps   31] [reward 32.0]
[Episode 3735/4000] [Steps   22] [reward 23.0]
[Episode 3736/4000] [Steps  133] [reward 134.0]
[Episode 3737/4000] [Steps  121] [reward 122.0]
[Episode 3738/4000] [Steps   58] [reward 59.0]
[Episode 3739/4000] [Steps  161] [reward 162.0]
[Episode 3740/4000] [Steps  199] [reward 200.0]
[Episode 3741/4000] [Steps  103] [reward 104.0]
[Episode 3742/4000] [Steps   18] [reward 19.0]
[Episode 3743/4000] [Steps  105] [reward 106.0]
[Episode 3744/4000] [Steps  168] [reward 169.0]
[Episode 3745/4000] [Steps   17] [reward 18.0]
[Episode 3746/4000] [Steps   19] [reward 20.0]
[Episode 3747/4000] [Steps   15] [reward 16.0]
[Episode 3748/4000] [Steps   82] [reward 83.0]
[Episode 3749/4000] [Steps   17] [reward 18.0]
[Episode 3750/4000] [Steps   17] [reward 18.0]
----------
[TEST Episode 3750] [Average Reward 63.1]
----------
[Episode 3751/4000] [Steps   40] [reward 41.0]
[Episode 3752/4000] [Steps  163] [reward 164.0]
[Episode 3753/4000] [Steps   67] [reward 68.0]
[Episode 3754/4000] [Steps   42] [reward 43.0]
[Episode 3755/4000] [Steps   97] [reward 98.0]
[Episode 3756/4000] [Steps  165] [reward 166.0]
[Episode 3757/4000] [Steps   39] [reward 40.0]
[Episode 3758/4000] [Steps   13] [reward 14.0]
[Episode 3759/4000] [Steps  133] [reward 134.0]
[Episode 3760/4000] [Steps   67] [reward 68.0]
[Episode 3761/4000] [Steps  145] [reward 146.0]
[Episode 3762/4000] [Steps  154] [reward 155.0]
[Episode 3763/4000] [Steps   65] [reward 66.0]
[Episode 3764/4000] [Steps  199] [reward 200.0]
[Episode 3765/4000] [Steps  131] [reward 132.0]
[Episode 3766/4000] [Steps  113] [reward 114.0]
[Episode 3767/4000] [Steps  112] [reward 113.0]
[Episode 3768/4000] [Steps  152] [reward 153.0]
[Episode 3769/4000] [Steps  137] [reward 138.0]
[Episode 3770/4000] [Steps  113] [reward 114.0]
[Episode 3771/4000] [Steps  106] [reward 107.0]
[Episode 3772/4000] [Steps  114] [reward 115.0]
[Episode 3773/4000] [Steps  128] [reward 129.0]
[Episode 3774/4000] [Steps  140] [reward 141.0]
[Episode 3775/4000] [Steps   34] [reward 35.0]
----------
[TEST Episode 3775] [Average Reward 123.9]
----------
[Episode 3776/4000] [Steps  113] [reward 114.0]
[Episode 3777/4000] [Steps   34] [reward 35.0]
[Episode 3778/4000] [Steps  106] [reward 107.0]
[Episode 3779/4000] [Steps   20] [reward 21.0]
[Episode 3780/4000] [Steps  115] [reward 116.0]
[Episode 3781/4000] [Steps   11] [reward 12.0]
[Episode 3782/4000] [Steps  123] [reward 124.0]
[Episode 3783/4000] [Steps  117] [reward 118.0]
[Episode 3784/4000] [Steps   11] [reward 12.0]
[Episode 3785/4000] [Steps   69] [reward 70.0]
[Episode 3786/4000] [Steps  140] [reward 141.0]
[Episode 3787/4000] [Steps   21] [reward 22.0]
[Episode 3788/4000] [Steps  102] [reward 103.0]
[Episode 3789/4000] [Steps   36] [reward 37.0]
[Episode 3790/4000] [Steps  185] [reward 186.0]
[Episode 3791/4000] [Steps  124] [reward 125.0]
[Episode 3792/4000] [Steps  143] [reward 144.0]
[Episode 3793/4000] [Steps  199] [reward 200.0]
[Episode 3794/4000] [Steps   12] [reward 13.0]
[Episode 3795/4000] [Steps   10] [reward 11.0]
[Episode 3796/4000] [Steps  131] [reward 132.0]
[Episode 3797/4000] [Steps   12] [reward 13.0]
[Episode 3798/4000] [Steps   78] [reward 79.0]
[Episode 3799/4000] [Steps   33] [reward 34.0]
[Episode 3800/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 3800] [Average Reward 66.4]
----------
[Episode 3801/4000] [Steps  133] [reward 134.0]
[Episode 3802/4000] [Steps   15] [reward 16.0]
[Episode 3803/4000] [Steps   13] [reward 14.0]
[Episode 3804/4000] [Steps   23] [reward 24.0]
[Episode 3805/4000] [Steps   15] [reward 16.0]
[Episode 3806/4000] [Steps   43] [reward 44.0]
[Episode 3807/4000] [Steps   15] [reward 16.0]
[Episode 3808/4000] [Steps  106] [reward 107.0]
[Episode 3809/4000] [Steps   71] [reward 72.0]
[Episode 3810/4000] [Steps   81] [reward 82.0]
[Episode 3811/4000] [Steps   16] [reward 17.0]
[Episode 3812/4000] [Steps   15] [reward 16.0]
[Episode 3813/4000] [Steps   12] [reward 13.0]
[Episode 3814/4000] [Steps  118] [reward 119.0]
[Episode 3815/4000] [Steps   30] [reward 31.0]
[Episode 3816/4000] [Steps   11] [reward 12.0]
[Episode 3817/4000] [Steps   25] [reward 26.0]
[Episode 3818/4000] [Steps   10] [reward 11.0]
[Episode 3819/4000] [Steps   12] [reward 13.0]
[Episode 3820/4000] [Steps   13] [reward 14.0]
[Episode 3821/4000] [Steps   12] [reward 13.0]
[Episode 3822/4000] [Steps  130] [reward 131.0]
[Episode 3823/4000] [Steps  127] [reward 128.0]
[Episode 3824/4000] [Steps   12] [reward 13.0]
[Episode 3825/4000] [Steps   16] [reward 17.0]
----------
[TEST Episode 3825] [Average Reward 81.5]
----------
[Episode 3826/4000] [Steps   78] [reward 79.0]
[Episode 3827/4000] [Steps  135] [reward 136.0]
[Episode 3828/4000] [Steps   28] [reward 29.0]
[Episode 3829/4000] [Steps  114] [reward 115.0]
[Episode 3830/4000] [Steps  127] [reward 128.0]
[Episode 3831/4000] [Steps  130] [reward 131.0]
[Episode 3832/4000] [Steps  108] [reward 109.0]
[Episode 3833/4000] [Steps   72] [reward 73.0]
[Episode 3834/4000] [Steps   94] [reward 95.0]
[Episode 3835/4000] [Steps   11] [reward 12.0]
[Episode 3836/4000] [Steps   91] [reward 92.0]
[Episode 3837/4000] [Steps   89] [reward 90.0]
[Episode 3838/4000] [Steps   93] [reward 94.0]
[Episode 3839/4000] [Steps   59] [reward 60.0]
[Episode 3840/4000] [Steps   21] [reward 22.0]
[Episode 3841/4000] [Steps   92] [reward 93.0]
[Episode 3842/4000] [Steps   94] [reward 95.0]
[Episode 3843/4000] [Steps  117] [reward 118.0]
[Episode 3844/4000] [Steps   88] [reward 89.0]
[Episode 3845/4000] [Steps   55] [reward 56.0]
[Episode 3846/4000] [Steps  199] [reward 200.0]
[Episode 3847/4000] [Steps  102] [reward 103.0]
[Episode 3848/4000] [Steps   31] [reward 32.0]
[Episode 3849/4000] [Steps   42] [reward 43.0]
[Episode 3850/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 3850] [Average Reward 73.0]
----------
[Episode 3851/4000] [Steps   14] [reward 15.0]
[Episode 3852/4000] [Steps   82] [reward 83.0]
[Episode 3853/4000] [Steps   17] [reward 18.0]
[Episode 3854/4000] [Steps  137] [reward 138.0]
[Episode 3855/4000] [Steps   42] [reward 43.0]
[Episode 3856/4000] [Steps   95] [reward 96.0]
[Episode 3857/4000] [Steps   71] [reward 72.0]
[Episode 3858/4000] [Steps  103] [reward 104.0]
[Episode 3859/4000] [Steps   84] [reward 85.0]
[Episode 3860/4000] [Steps   13] [reward 14.0]
[Episode 3861/4000] [Steps   22] [reward 23.0]
[Episode 3862/4000] [Steps   61] [reward 62.0]
[Episode 3863/4000] [Steps   45] [reward 46.0]
[Episode 3864/4000] [Steps   63] [reward 64.0]
[Episode 3865/4000] [Steps  146] [reward 147.0]
[Episode 3866/4000] [Steps   38] [reward 39.0]
[Episode 3867/4000] [Steps  128] [reward 129.0]
[Episode 3868/4000] [Steps  127] [reward 128.0]
[Episode 3869/4000] [Steps   37] [reward 38.0]
[Episode 3870/4000] [Steps  113] [reward 114.0]
[Episode 3871/4000] [Steps  113] [reward 114.0]
[Episode 3872/4000] [Steps  137] [reward 138.0]
[Episode 3873/4000] [Steps   32] [reward 33.0]
[Episode 3874/4000] [Steps  109] [reward 110.0]
[Episode 3875/4000] [Steps   96] [reward 97.0]
----------
[TEST Episode 3875] [Average Reward 69.1]
----------
[Episode 3876/4000] [Steps   98] [reward 99.0]
[Episode 3877/4000] [Steps   81] [reward 82.0]
[Episode 3878/4000] [Steps  133] [reward 134.0]
[Episode 3879/4000] [Steps  103] [reward 104.0]
[Episode 3880/4000] [Steps   98] [reward 99.0]
[Episode 3881/4000] [Steps   33] [reward 34.0]
[Episode 3882/4000] [Steps   90] [reward 91.0]
[Episode 3883/4000] [Steps   16] [reward 17.0]
[Episode 3884/4000] [Steps   17] [reward 18.0]
[Episode 3885/4000] [Steps  199] [reward 200.0]
[Episode 3886/4000] [Steps   31] [reward 32.0]
[Episode 3887/4000] [Steps   12] [reward 13.0]
[Episode 3888/4000] [Steps  135] [reward 136.0]
[Episode 3889/4000] [Steps  199] [reward 200.0]
[Episode 3890/4000] [Steps  109] [reward 110.0]
[Episode 3891/4000] [Steps   48] [reward 49.0]
[Episode 3892/4000] [Steps   23] [reward 24.0]
[Episode 3893/4000] [Steps   36] [reward 37.0]
[Episode 3894/4000] [Steps  106] [reward 107.0]
[Episode 3895/4000] [Steps   82] [reward 83.0]
[Episode 3896/4000] [Steps   14] [reward 15.0]
[Episode 3897/4000] [Steps  111] [reward 112.0]
[Episode 3898/4000] [Steps   66] [reward 67.0]
[Episode 3899/4000] [Steps  199] [reward 200.0]
[Episode 3900/4000] [Steps  120] [reward 121.0]
----------
[TEST Episode 3900] [Average Reward 71.9]
----------
[Episode 3901/4000] [Steps   16] [reward 17.0]
[Episode 3902/4000] [Steps  104] [reward 105.0]
[Episode 3903/4000] [Steps   10] [reward 11.0]
[Episode 3904/4000] [Steps   14] [reward 15.0]
[Episode 3905/4000] [Steps  120] [reward 121.0]
[Episode 3906/4000] [Steps    8] [reward 9.0]
[Episode 3907/4000] [Steps   14] [reward 15.0]
[Episode 3908/4000] [Steps   11] [reward 12.0]
[Episode 3909/4000] [Steps   10] [reward 11.0]
[Episode 3910/4000] [Steps   15] [reward 16.0]
[Episode 3911/4000] [Steps   12] [reward 13.0]
[Episode 3912/4000] [Steps   62] [reward 63.0]
[Episode 3913/4000] [Steps   92] [reward 93.0]
[Episode 3914/4000] [Steps  180] [reward 181.0]
[Episode 3915/4000] [Steps  194] [reward 195.0]
[Episode 3916/4000] [Steps   13] [reward 14.0]
[Episode 3917/4000] [Steps  199] [reward 200.0]
[Episode 3918/4000] [Steps  197] [reward 198.0]
[Episode 3919/4000] [Steps  199] [reward 200.0]
[Episode 3920/4000] [Steps   11] [reward 12.0]
[Episode 3921/4000] [Steps   16] [reward 17.0]
[Episode 3922/4000] [Steps   22] [reward 23.0]
[Episode 3923/4000] [Steps  199] [reward 200.0]
[Episode 3924/4000] [Steps  199] [reward 200.0]
[Episode 3925/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3925] [Average Reward 64.5]
----------
[Episode 3926/4000] [Steps  199] [reward 200.0]
[Episode 3927/4000] [Steps   52] [reward 53.0]
[Episode 3928/4000] [Steps  199] [reward 200.0]
[Episode 3929/4000] [Steps  155] [reward 156.0]
[Episode 3930/4000] [Steps  199] [reward 200.0]
[Episode 3931/4000] [Steps  199] [reward 200.0]
[Episode 3932/4000] [Steps  112] [reward 113.0]
[Episode 3933/4000] [Steps  199] [reward 200.0]
[Episode 3934/4000] [Steps  199] [reward 200.0]
[Episode 3935/4000] [Steps   92] [reward 93.0]
[Episode 3936/4000] [Steps  199] [reward 200.0]
[Episode 3937/4000] [Steps   85] [reward 86.0]
[Episode 3938/4000] [Steps  176] [reward 177.0]
[Episode 3939/4000] [Steps  199] [reward 200.0]
[Episode 3940/4000] [Steps  199] [reward 200.0]
[Episode 3941/4000] [Steps  199] [reward 200.0]
[Episode 3942/4000] [Steps   16] [reward 17.0]
[Episode 3943/4000] [Steps   24] [reward 25.0]
[Episode 3944/4000] [Steps  199] [reward 200.0]
[Episode 3945/4000] [Steps  199] [reward 200.0]
[Episode 3946/4000] [Steps   97] [reward 98.0]
[Episode 3947/4000] [Steps  199] [reward 200.0]
[Episode 3948/4000] [Steps  104] [reward 105.0]
[Episode 3949/4000] [Steps  199] [reward 200.0]
[Episode 3950/4000] [Steps   78] [reward 79.0]
----------
[TEST Episode 3950] [Average Reward 200.0]
----------
[Episode 3951/4000] [Steps  199] [reward 200.0]
[Episode 3952/4000] [Steps  199] [reward 200.0]
[Episode 3953/4000] [Steps   17] [reward 18.0]
[Episode 3954/4000] [Steps   65] [reward 66.0]
[Episode 3955/4000] [Steps  199] [reward 200.0]
[Episode 3956/4000] [Steps  199] [reward 200.0]
[Episode 3957/4000] [Steps   16] [reward 17.0]
[Episode 3958/4000] [Steps   27] [reward 28.0]
[Episode 3959/4000] [Steps   18] [reward 19.0]
[Episode 3960/4000] [Steps  199] [reward 200.0]
[Episode 3961/4000] [Steps   40] [reward 41.0]
[Episode 3962/4000] [Steps  199] [reward 200.0]
[Episode 3963/4000] [Steps  199] [reward 200.0]
[Episode 3964/4000] [Steps   13] [reward 14.0]
[Episode 3965/4000] [Steps  199] [reward 200.0]
[Episode 3966/4000] [Steps   58] [reward 59.0]
[Episode 3967/4000] [Steps  199] [reward 200.0]
[Episode 3968/4000] [Steps  199] [reward 200.0]
[Episode 3969/4000] [Steps   56] [reward 57.0]
[Episode 3970/4000] [Steps  199] [reward 200.0]
[Episode 3971/4000] [Steps  199] [reward 200.0]
[Episode 3972/4000] [Steps  199] [reward 200.0]
[Episode 3973/4000] [Steps  199] [reward 200.0]
[Episode 3974/4000] [Steps  199] [reward 200.0]
[Episode 3975/4000] [Steps   29] [reward 30.0]
----------
[TEST Episode 3975] [Average Reward 200.0]
----------
[Episode 3976/4000] [Steps  199] [reward 200.0]
[Episode 3977/4000] [Steps  149] [reward 150.0]
[Episode 3978/4000] [Steps  199] [reward 200.0]
[Episode 3979/4000] [Steps  199] [reward 200.0]
[Episode 3980/4000] [Steps  199] [reward 200.0]
[Episode 3981/4000] [Steps  199] [reward 200.0]
[Episode 3982/4000] [Steps  131] [reward 132.0]
[Episode 3983/4000] [Steps   55] [reward 56.0]
[Episode 3984/4000] [Steps   85] [reward 86.0]
[Episode 3985/4000] [Steps  199] [reward 200.0]
[Episode 3986/4000] [Steps  164] [reward 165.0]
[Episode 3987/4000] [Steps   94] [reward 95.0]
[Episode 3988/4000] [Steps  147] [reward 148.0]
[Episode 3989/4000] [Steps  199] [reward 200.0]
[Episode 3990/4000] [Steps   73] [reward 74.0]
[Episode 3991/4000] [Steps   66] [reward 67.0]
[Episode 3992/4000] [Steps  137] [reward 138.0]
[Episode 3993/4000] [Steps   88] [reward 89.0]
[Episode 3994/4000] [Steps  133] [reward 134.0]
[Episode 3995/4000] [Steps  107] [reward 108.0]
[Episode 3996/4000] [Steps  135] [reward 136.0]
[Episode 3997/4000] [Steps  139] [reward 140.0]
[Episode 3998/4000] [Steps  199] [reward 200.0]
[Episode 3999/4000] [Steps   83] [reward 84.0]
[Episode 4000/4000] [Steps   79] [reward 80.0]
----------
[TEST Episode 4000] [Average Reward 137.1]
----------


### 2.4 Add the Experience Replay Buffer

If you read the DQN paper (and as you can see from the algorithm picture above), the authors make use of an experience replay buffer to learn faster. We provide an implementation in the file `replay_buffer.py`. Update the `train_reinforcement_learning` code to push a tuple to the replay buffer and to sample a batch for the `optimize_model` function.

**[QUESTION 5 points]** How does the replay buffer improve performance?

The replay buffer speeds up training and makes the results converge earlier. In these trials, DQN with the replay buffer converged at around 1000 episodes, which is about 4 times earlier than without it. Sampling previously stored transitions keeps the training on more general cases, such as the motions seen in early episodes, and reduces the correlation between each tuple and the tuple immediately before it. This results in better training because the training samples are less dependent on each other.
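The push-and-sample pattern described above can be sketched as follows. This is a minimal illustration and not the contents of the provided `replay_buffer.py`: the class name `SketchReplayBuffer`, the `Transition` field names, and the capacity are all assumptions made for this example.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition layout; the provided replay_buffer.py may differ.
Transition = namedtuple(
    "Transition", ["state", "action", "next_state", "reward", "done"]
)


class SketchReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        """Store one (state, action, next_state, reward, done) tuple."""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions uniformly at random."""
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage with dummy integer "states": consecutive steps are pushed in order,
# but a sampled batch mixes transitions from different points in time.
random.seed(0)
demo = SketchReplayBuffer(capacity=100)
for step in range(150):
    demo.push(step, step % 2, step + 1, 1.0, False)

batch = demo.sample(8)
# Transpose a list of Transitions into one tuple per field.
states, actions, next_states, rewards, dones = zip(*batch)
```

Because each batch is drawn uniformly from the whole buffer, neighbouring samples in a batch are usually not neighbouring steps of one episode, which is the decorrelation effect mentioned in the answer. In a training loop one would typically wait until `len(buffer) >= batch_size` before calling `sample`.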

In [12]:
## PASTE YOUR TERMINAL OUTPUT HERE
# NOTE: TO HAVE LESS LINES PRINTED, YOU CAN SET THE VARIABLE PRINT_INTERVAL TO 5 or 10
[Episode   10/4000] [Steps    8] [reward 9.0]
[Episode   20/4000] [Steps   14] [reward 15.0]
----------
saving model.
[TEST Episode 25] [Average Reward 9.6]
----------
[Episode   30/4000] [Steps   12] [reward 13.0]
[Episode   40/4000] [Steps    8] [reward 9.0]
[Episode   50/4000] [Steps   10] [reward 11.0]
----------
saving model.
[TEST Episode 50] [Average Reward 9.7]
----------
[Episode   60/4000] [Steps   10] [reward 11.0]
[Episode   70/4000] [Steps    9] [reward 10.0]
----------
[TEST Episode 75] [Average Reward 9.0]
----------
[Episode   80/4000] [Steps    9] [reward 10.0]
[Episode   90/4000] [Steps    8] [reward 9.0]
[Episode  100/4000] [Steps    8] [reward 9.0]
----------
[TEST Episode 100] [Average Reward 9.5]
----------
[Episode  110/4000] [Steps   12] [reward 13.0]
[Episode  120/4000] [Steps   10] [reward 11.0]
----------
saving model.
[TEST Episode 125] [Average Reward 11.2]
----------
[Episode  130/4000] [Steps    9] [reward 10.0]
[Episode  140/4000] [Steps   11] [reward 12.0]
[Episode  150/4000] [Steps   12] [reward 13.0]
----------
saving model.
[TEST Episode 150] [Average Reward 14.1]
----------
[Episode  160/4000] [Steps   35] [reward 36.0]
[Episode  170/4000] [Steps   13] [reward 14.0]
----------
saving model.
[TEST Episode 175] [Average Reward 28.6]
----------
[Episode  180/4000] [Steps   11] [reward 12.0]
[Episode  190/4000] [Steps   14] [reward 15.0]
[Episode  200/4000] [Steps   32] [reward 33.0]
----------
saving model.
[TEST Episode 200] [Average Reward 36.9]
----------
[Episode  210/4000] [Steps   23] [reward 24.0]
[Episode  220/4000] [Steps   67] [reward 68.0]
----------
saving model.
[TEST Episode 225] [Average Reward 38.6]
----------
[Episode  230/4000] [Steps   79] [reward 80.0]
[Episode  240/4000] [Steps   41] [reward 42.0]
[Episode  250/4000] [Steps   30] [reward 31.0]
----------
saving model.
[TEST Episode 250] [Average Reward 54.4]
----------
[Episode  260/4000] [Steps   33] [reward 34.0]
[Episode  270/4000] [Steps   25] [reward 26.0]
----------
[TEST Episode 275] [Average Reward 40.8]
----------
[Episode  280/4000] [Steps   27] [reward 28.0]
[Episode  290/4000] [Steps   38] [reward 39.0]
[Episode  300/4000] [Steps   34] [reward 35.0]
----------
[TEST Episode 300] [Average Reward 46.0]
----------
[Episode  310/4000] [Steps   39] [reward 40.0]
[Episode  320/4000] [Steps   46] [reward 47.0]
----------
saving model.
[TEST Episode 325] [Average Reward 82.9]
----------
[Episode  330/4000] [Steps  199] [reward 200.0]
[Episode  340/4000] [Steps  133] [reward 134.0]
[Episode  350/4000] [Steps  199] [reward 200.0]
----------
saving model.
[TEST Episode 350] [Average Reward 196.2]
----------
[Episode  360/4000] [Steps  199] [reward 200.0]
[Episode  370/4000] [Steps  187] [reward 188.0]
----------
[TEST Episode 375] [Average Reward 190.5]
----------
[Episode  380/4000] [Steps  149] [reward 150.0]
[Episode  390/4000] [Steps  137] [reward 138.0]
[Episode  400/4000] [Steps  116] [reward 117.0]
----------
[TEST Episode 400] [Average Reward 122.5]
----------
[Episode  410/4000] [Steps   42] [reward 43.0]
[Episode  420/4000] [Steps   22] [reward 23.0]
----------
[TEST Episode 425] [Average Reward 104.9]
----------
[Episode  430/4000] [Steps  109] [reward 110.0]
[Episode  440/4000] [Steps  123] [reward 124.0]
[Episode  450/4000] [Steps  119] [reward 120.0]
----------
[TEST Episode 450] [Average Reward 115.3]
----------
[Episode  460/4000] [Steps  158] [reward 159.0]
[Episode  470/4000] [Steps  120] [reward 121.0]
----------
[TEST Episode 475] [Average Reward 167.0]
----------
[Episode  480/4000] [Steps  182] [reward 183.0]
[Episode  490/4000] [Steps  141] [reward 142.0]
[Episode  500/4000] [Steps  144] [reward 145.0]
----------
[TEST Episode 500] [Average Reward 151.4]
----------
[Episode  510/4000] [Steps  143] [reward 144.0]
[Episode  520/4000] [Steps   28] [reward 29.0]
----------
[TEST Episode 525] [Average Reward 136.0]
----------
[Episode  530/4000] [Steps  135] [reward 136.0]
[Episode  540/4000] [Steps  135] [reward 136.0]
[Episode  550/4000] [Steps  123] [reward 124.0]
----------
[TEST Episode 550] [Average Reward 139.1]
----------
[Episode  560/4000] [Steps  127] [reward 128.0]
[Episode  570/4000] [Steps  146] [reward 147.0]
----------
[TEST Episode 575] [Average Reward 141.8]
----------
[Episode  580/4000] [Steps  136] [reward 137.0]
[Episode  590/4000] [Steps  143] [reward 144.0]
[Episode  600/4000] [Steps  145] [reward 146.0]
----------
[TEST Episode 600] [Average Reward 152.9]
----------
[Episode  610/4000] [Steps   43] [reward 44.0]
[Episode  620/4000] [Steps  136] [reward 137.0]
----------
[TEST Episode 625] [Average Reward 147.8]
----------
[Episode  630/4000] [Steps  144] [reward 145.0]
[Episode  640/4000] [Steps  146] [reward 147.0]
[Episode  650/4000] [Steps  145] [reward 146.0]
----------
[TEST Episode 650] [Average Reward 169.2]
----------
[Episode  660/4000] [Steps  167] [reward 168.0]
[Episode  670/4000] [Steps  158] [reward 159.0]
----------
[TEST Episode 675] [Average Reward 166.7]
----------
[Episode  680/4000] [Steps  153] [reward 154.0]
[Episode  690/4000] [Steps  168] [reward 169.0]
[Episode  700/4000] [Steps  151] [reward 152.0]
----------
[TEST Episode 700] [Average Reward 164.1]
----------
[Episode  710/4000] [Steps  137] [reward 138.0]
[Episode  720/4000] [Steps  141] [reward 142.0]
----------
[TEST Episode 725] [Average Reward 151.8]
----------
[Episode  730/4000] [Steps  153] [reward 154.0]
[Episode  740/4000] [Steps  142] [reward 143.0]
[Episode  750/4000] [Steps  144] [reward 145.0]
----------
[TEST Episode 750] [Average Reward 147.2]
----------
[Episode  760/4000] [Steps  140] [reward 141.0]
[Episode  770/4000] [Steps  136] [reward 137.0]
----------
[TEST Episode 775] [Average Reward 151.4]
----------
[Episode  780/4000] [Steps  141] [reward 142.0]
[Episode  790/4000] [Steps  159] [reward 160.0]
[Episode  800/4000] [Steps  155] [reward 156.0]
----------
[TEST Episode 800] [Average Reward 160.3]
----------
[Episode  810/4000] [Steps  132] [reward 133.0]
[Episode  820/4000] [Steps  155] [reward 156.0]
----------
[TEST Episode 825] [Average Reward 179.8]
----------
[Episode  830/4000] [Steps  156] [reward 157.0]
[Episode  840/4000] [Steps  199] [reward 200.0]
[Episode  850/4000] [Steps  119] [reward 120.0]
----------
[TEST Episode 850] [Average Reward 194.0]
----------
[Episode  860/4000] [Steps   92] [reward 93.0]
[Episode  870/4000] [Steps  160] [reward 161.0]
----------
[TEST Episode 875] [Average Reward 178.6]
----------
[Episode  880/4000] [Steps  199] [reward 200.0]
[Episode  890/4000] [Steps  199] [reward 200.0]
[Episode  900/4000] [Steps  199] [reward 200.0]
----------
saving model.
[TEST Episode 900] [Average Reward 199.0]
----------
[Episode  910/4000] [Steps  189] [reward 190.0]
[Episode  920/4000] [Steps  162] [reward 163.0]
----------
saving model.
[TEST Episode 925] [Average Reward 199.3]
----------
[Episode  930/4000] [Steps  199] [reward 200.0]
[Episode  940/4000] [Steps  199] [reward 200.0]
[Episode  950/4000] [Steps  199] [reward 200.0]
----------
saving model.
[TEST Episode 950] [Average Reward 200.0]
----------
[Episode  960/4000] [Steps  131] [reward 132.0]
[Episode  970/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 975] [Average Reward 200.0]
----------
[Episode  980/4000] [Steps  199] [reward 200.0]
[Episode  990/4000] [Steps  199] [reward 200.0]
[Episode 1000/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1000] [Average Reward 200.0]
----------
[Episode 1010/4000] [Steps  199] [reward 200.0]
[Episode 1020/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1025] [Average Reward 200.0]
----------
[Episode 1030/4000] [Steps  199] [reward 200.0]
[Episode 1040/4000] [Steps  199] [reward 200.0]
[Episode 1050/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1050] [Average Reward 200.0]
----------
[Episode 1060/4000] [Steps   61] [reward 62.0]
[Episode 1070/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1075] [Average Reward 200.0]
----------
[Episode 1080/4000] [Steps  199] [reward 200.0]
[Episode 1090/4000] [Steps  199] [reward 200.0]
[Episode 1100/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1100] [Average Reward 200.0]
----------
[Episode 1110/4000] [Steps  199] [reward 200.0]
[Episode 1120/4000] [Steps  175] [reward 176.0]
----------
[TEST Episode 1125] [Average Reward 194.6]
----------
[Episode 1130/4000] [Steps  168] [reward 169.0]
[Episode 1140/4000] [Steps  199] [reward 200.0]
[Episode 1150/4000] [Steps  169] [reward 170.0]
----------
[TEST Episode 1150] [Average Reward 182.7]
----------
[Episode 1160/4000] [Steps  199] [reward 200.0]
[Episode 1170/4000] [Steps  194] [reward 195.0]
----------
[TEST Episode 1175] [Average Reward 176.6]
----------
[Episode 1180/4000] [Steps  199] [reward 200.0]
[Episode 1190/4000] [Steps  197] [reward 198.0]
[Episode 1200/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1200] [Average Reward 195.9]
----------
[Episode 1210/4000] [Steps  199] [reward 200.0]
[Episode 1220/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1225] [Average Reward 200.0]
----------
[Episode 1230/4000] [Steps  199] [reward 200.0]
[Episode 1240/4000] [Steps  199] [reward 200.0]
[Episode 1250/4000] [Steps  193] [reward 194.0]
----------
[TEST Episode 1250] [Average Reward 200.0]
----------
[Episode 1260/4000] [Steps  199] [reward 200.0]
[Episode 1270/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1275] [Average Reward 200.0]
----------
[Episode 1280/4000] [Steps  199] [reward 200.0]
[Episode 1290/4000] [Steps  199] [reward 200.0]
[Episode 1300/4000] [Steps   95] [reward 96.0]
----------
[TEST Episode 1300] [Average Reward 200.0]
----------
[Episode 1310/4000] [Steps  199] [reward 200.0]
[Episode 1320/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1325] [Average Reward 200.0]
----------
[Episode 1330/4000] [Steps  199] [reward 200.0]
[Episode 1340/4000] [Steps  199] [reward 200.0]
[Episode 1350/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1350] [Average Reward 200.0]
----------
[Episode 1360/4000] [Steps  199] [reward 200.0]
[Episode 1370/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1375] [Average Reward 200.0]
----------
[Episode 1380/4000] [Steps  199] [reward 200.0]
[Episode 1390/4000] [Steps  158] [reward 159.0]
[Episode 1400/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1400] [Average Reward 200.0]
----------
[Episode 1410/4000] [Steps  199] [reward 200.0]
[Episode 1420/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1425] [Average Reward 199.0]
----------
[Episode 1430/4000] [Steps  197] [reward 198.0]
[Episode 1440/4000] [Steps  199] [reward 200.0]
[Episode 1450/4000] [Steps  196] [reward 197.0]
----------
[TEST Episode 1450] [Average Reward 200.0]
----------
[Episode 1460/4000] [Steps  199] [reward 200.0]
[Episode 1470/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1475] [Average Reward 198.2]
----------
[Episode 1480/4000] [Steps  199] [reward 200.0]
[Episode 1490/4000] [Steps  199] [reward 200.0]
[Episode 1500/4000] [Steps  166] [reward 167.0]
----------
[TEST Episode 1500] [Average Reward 185.2]
----------
[Episode 1510/4000] [Steps  199] [reward 200.0]
[Episode 1520/4000] [Steps  184] [reward 185.0]
----------
[TEST Episode 1525] [Average Reward 200.0]
----------
[Episode 1530/4000] [Steps  199] [reward 200.0]
[Episode 1540/4000] [Steps  199] [reward 200.0]
[Episode 1550/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1550] [Average Reward 182.0]
----------
[Episode 1560/4000] [Steps  178] [reward 179.0]
[Episode 1570/4000] [Steps  197] [reward 198.0]
----------
[TEST Episode 1575] [Average Reward 196.6]
----------
[Episode 1580/4000] [Steps  168] [reward 169.0]
[Episode 1590/4000] [Steps  186] [reward 187.0]
[Episode 1600/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1600] [Average Reward 200.0]
----------
[Episode 1610/4000] [Steps  199] [reward 200.0]
[Episode 1620/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1625] [Average Reward 200.0]
----------
[Episode 1630/4000] [Steps  199] [reward 200.0]
[Episode 1640/4000] [Steps  199] [reward 200.0]
[Episode 1650/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1650] [Average Reward 200.0]
----------
[Episode 1660/4000] [Steps  199] [reward 200.0]
[Episode 1670/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1675] [Average Reward 200.0]
----------
[Episode 1680/4000] [Steps  199] [reward 200.0]
[Episode 1690/4000] [Steps  199] [reward 200.0]
[Episode 1700/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1700] [Average Reward 200.0]
----------
[Episode 1710/4000] [Steps  199] [reward 200.0]
[Episode 1720/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1725] [Average Reward 200.0]
----------
[Episode 1730/4000] [Steps  199] [reward 200.0]
[Episode 1740/4000] [Steps  199] [reward 200.0]
[Episode 1750/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1750] [Average Reward 200.0]
----------
[Episode 1760/4000] [Steps  199] [reward 200.0]
[Episode 1770/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1775] [Average Reward 200.0]
----------
[Episode 1780/4000] [Steps  199] [reward 200.0]
[Episode 1790/4000] [Steps  199] [reward 200.0]
[Episode 1800/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1800] [Average Reward 200.0]
----------
[Episode 1810/4000] [Steps  199] [reward 200.0]
[Episode 1820/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1825] [Average Reward 200.0]
----------
[Episode 1830/4000] [Steps  199] [reward 200.0]
[Episode 1840/4000] [Steps  163] [reward 164.0]
[Episode 1850/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1850] [Average Reward 200.0]
----------
[Episode 1860/4000] [Steps  199] [reward 200.0]
[Episode 1870/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1875] [Average Reward 200.0]
----------
[Episode 1880/4000] [Steps  199] [reward 200.0]
[Episode 1890/4000] [Steps  199] [reward 200.0]
[Episode 1900/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1900] [Average Reward 200.0]
----------
[Episode 1910/4000] [Steps   14] [reward 15.0]
[Episode 1920/4000] [Steps   49] [reward 50.0]
----------
[TEST Episode 1925] [Average Reward 200.0]
----------
[Episode 1930/4000] [Steps  199] [reward 200.0]
[Episode 1940/4000] [Steps  199] [reward 200.0]
[Episode 1950/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1950] [Average Reward 200.0]
----------
[Episode 1960/4000] [Steps  199] [reward 200.0]
[Episode 1970/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 1975] [Average Reward 200.0]
----------
[Episode 1980/4000] [Steps  199] [reward 200.0]
[Episode 1990/4000] [Steps  199] [reward 200.0]
[Episode 2000/4000] [Steps  190] [reward 191.0]
----------
[TEST Episode 2000] [Average Reward 200.0]
----------
[Episode 2010/4000] [Steps  199] [reward 200.0]
[Episode 2020/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2025] [Average Reward 200.0]
----------
[Episode 2030/4000] [Steps  199] [reward 200.0]
[Episode 2040/4000] [Steps  199] [reward 200.0]
[Episode 2050/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2050] [Average Reward 200.0]
----------
[Episode 2060/4000] [Steps  199] [reward 200.0]
[Episode 2070/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2075] [Average Reward 200.0]
----------
[Episode 2080/4000] [Steps  199] [reward 200.0]
[Episode 2090/4000] [Steps  199] [reward 200.0]
[Episode 2100/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2100] [Average Reward 200.0]
----------
[Episode 2110/4000] [Steps  199] [reward 200.0]
[Episode 2120/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2125] [Average Reward 200.0]
----------
[Episode 2130/4000] [Steps  199] [reward 200.0]
[Episode 2140/4000] [Steps  199] [reward 200.0]
[Episode 2150/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2150] [Average Reward 200.0]
----------
[Episode 2160/4000] [Steps  199] [reward 200.0]
[Episode 2170/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2175] [Average Reward 200.0]
----------
[Episode 2180/4000] [Steps  199] [reward 200.0]
[Episode 2190/4000] [Steps  199] [reward 200.0]
[Episode 2200/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2200] [Average Reward 200.0]
----------
[Episode 2210/4000] [Steps   26] [reward 27.0]
[Episode 2220/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2225] [Average Reward 200.0]
----------
[Episode 2230/4000] [Steps  199] [reward 200.0]
[Episode 2240/4000] [Steps  199] [reward 200.0]
[Episode 2250/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2250] [Average Reward 200.0]
----------
[Episode 2260/4000] [Steps  199] [reward 200.0]
[Episode 2270/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2275] [Average Reward 200.0]
----------
[Episode 2280/4000] [Steps  199] [reward 200.0]
[Episode 2290/4000] [Steps  199] [reward 200.0]
[Episode 2300/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2300] [Average Reward 200.0]
----------
[Episode 2310/4000] [Steps  136] [reward 137.0]
[Episode 2320/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2325] [Average Reward 200.0]
----------
[Episode 2330/4000] [Steps  199] [reward 200.0]
[Episode 2340/4000] [Steps  199] [reward 200.0]
[Episode 2350/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2350] [Average Reward 200.0]
----------
[Episode 2360/4000] [Steps  199] [reward 200.0]
[Episode 2370/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2375] [Average Reward 200.0]
----------
[Episode 2380/4000] [Steps  199] [reward 200.0]
[Episode 2390/4000] [Steps  199] [reward 200.0]
[Episode 2400/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2400] [Average Reward 200.0]
----------
[Episode 2410/4000] [Steps  199] [reward 200.0]
[Episode 2420/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2425] [Average Reward 200.0]
----------
[Episode 2430/4000] [Steps  199] [reward 200.0]
[Episode 2440/4000] [Steps  199] [reward 200.0]
[Episode 2450/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2450] [Average Reward 200.0]
----------
[Episode 2460/4000] [Steps  199] [reward 200.0]
[Episode 2470/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2475] [Average Reward 200.0]
----------
[Episode 2480/4000] [Steps  199] [reward 200.0]
[Episode 2490/4000] [Steps  199] [reward 200.0]
[Episode 2500/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2500] [Average Reward 200.0]
----------
[Episode 2510/4000] [Steps  199] [reward 200.0]
[Episode 2520/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2525] [Average Reward 200.0]
----------
[Episode 2530/4000] [Steps  199] [reward 200.0]
[Episode 2540/4000] [Steps  199] [reward 200.0]
[Episode 2550/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2550] [Average Reward 200.0]
----------
[Episode 2560/4000] [Steps  199] [reward 200.0]
[Episode 2570/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2575] [Average Reward 200.0]
----------
[Episode 2580/4000] [Steps  199] [reward 200.0]
[Episode 2590/4000] [Steps  199] [reward 200.0]
[Episode 2600/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2600] [Average Reward 200.0]
----------
[Episode 2610/4000] [Steps  199] [reward 200.0]
[Episode 2620/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2625] [Average Reward 200.0]
----------
[Episode 2630/4000] [Steps  199] [reward 200.0]
[Episode 2640/4000] [Steps  199] [reward 200.0]
[Episode 2650/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2650] [Average Reward 200.0]
----------
[Episode 2660/4000] [Steps  199] [reward 200.0]
[Episode 2670/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2675] [Average Reward 200.0]
----------
[Episode 2680/4000] [Steps  199] [reward 200.0]
[Episode 2690/4000] [Steps  199] [reward 200.0]
[Episode 2700/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2700] [Average Reward 200.0]
----------
[Episode 2710/4000] [Steps  199] [reward 200.0]
[Episode 2720/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2725] [Average Reward 200.0]
----------
[Episode 2730/4000] [Steps  199] [reward 200.0]
[Episode 2740/4000] [Steps  199] [reward 200.0]
[Episode 2750/4000] [Steps   21] [reward 22.0]
----------
[TEST Episode 2750] [Average Reward 200.0]
----------
[Episode 2760/4000] [Steps  199] [reward 200.0]
[Episode 2770/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2775] [Average Reward 200.0]
----------
[Episode 2780/4000] [Steps  199] [reward 200.0]
[Episode 2790/4000] [Steps  199] [reward 200.0]
[Episode 2800/4000] [Steps  152] [reward 153.0]
----------
[TEST Episode 2800] [Average Reward 200.0]
----------
[Episode 2810/4000] [Steps   35] [reward 36.0]
[Episode 2820/4000] [Steps  167] [reward 168.0]
----------
[TEST Episode 2825] [Average Reward 200.0]
----------
[Episode 2830/4000] [Steps  167] [reward 168.0]
[Episode 2840/4000] [Steps  199] [reward 200.0]
[Episode 2850/4000] [Steps   50] [reward 51.0]
----------
[TEST Episode 2850] [Average Reward 200.0]
----------
[Episode 2860/4000] [Steps   32] [reward 33.0]
[Episode 2870/4000] [Steps  142] [reward 143.0]
----------
[TEST Episode 2875] [Average Reward 200.0]
----------
[Episode 2880/4000] [Steps  199] [reward 200.0]
[Episode 2890/4000] [Steps  199] [reward 200.0]
[Episode 2900/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2900] [Average Reward 200.0]
----------
[Episode 2910/4000] [Steps  199] [reward 200.0]
[Episode 2920/4000] [Steps  119] [reward 120.0]
----------
[TEST Episode 2925] [Average Reward 200.0]
----------
[Episode 2930/4000] [Steps   11] [reward 12.0]
[Episode 2940/4000] [Steps  199] [reward 200.0]
[Episode 2950/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2950] [Average Reward 200.0]
----------
[Episode 2960/4000] [Steps  199] [reward 200.0]
[Episode 2970/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 2975] [Average Reward 200.0]
----------
[Episode 2980/4000] [Steps  199] [reward 200.0]
[Episode 2990/4000] [Steps  199] [reward 200.0]
[Episode 3000/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3000] [Average Reward 200.0]
----------
[Episode 3010/4000] [Steps  199] [reward 200.0]
[Episode 3020/4000] [Steps  148] [reward 149.0]
----------
[TEST Episode 3025] [Average Reward 200.0]
----------
[Episode 3030/4000] [Steps  199] [reward 200.0]
[Episode 3040/4000] [Steps  163] [reward 164.0]
[Episode 3050/4000] [Steps  131] [reward 132.0]
----------
[TEST Episode 3050] [Average Reward 133.1]
----------
[Episode 3060/4000] [Steps  116] [reward 117.0]
[Episode 3070/4000] [Steps   96] [reward 97.0]
----------
[TEST Episode 3075] [Average Reward 107.9]
----------
[Episode 3080/4000] [Steps  101] [reward 102.0]
[Episode 3090/4000] [Steps   26] [reward 27.0]
[Episode 3100/4000] [Steps  107] [reward 108.0]
----------
[TEST Episode 3100] [Average Reward 100.1]
----------
[Episode 3110/4000] [Steps  101] [reward 102.0]
[Episode 3120/4000] [Steps  103] [reward 104.0]
----------
[TEST Episode 3125] [Average Reward 108.2]
----------
[Episode 3130/4000] [Steps  104] [reward 105.0]
[Episode 3140/4000] [Steps  100] [reward 101.0]
[Episode 3150/4000] [Steps   23] [reward 24.0]
----------
[TEST Episode 3150] [Average Reward 100.4]
----------
[Episode 3160/4000] [Steps   98] [reward 99.0]
[Episode 3170/4000] [Steps   98] [reward 99.0]
----------
[TEST Episode 3175] [Average Reward 107.2]
----------
[Episode 3180/4000] [Steps  105] [reward 106.0]
[Episode 3190/4000] [Steps  112] [reward 113.0]
[Episode 3200/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3200] [Average Reward 151.4]
----------
[Episode 3210/4000] [Steps  156] [reward 157.0]
[Episode 3220/4000] [Steps   13] [reward 14.0]
----------
[TEST Episode 3225] [Average Reward 192.1]
----------
[Episode 3230/4000] [Steps  157] [reward 158.0]
[Episode 3240/4000] [Steps   52] [reward 53.0]
[Episode 3250/4000] [Steps  115] [reward 116.0]
----------
[TEST Episode 3250] [Average Reward 135.3]
----------
[Episode 3260/4000] [Steps  199] [reward 200.0]
[Episode 3270/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3275] [Average Reward 200.0]
----------
[Episode 3280/4000] [Steps  199] [reward 200.0]
[Episode 3290/4000] [Steps  188] [reward 189.0]
[Episode 3300/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3300] [Average Reward 200.0]
----------
[Episode 3310/4000] [Steps  199] [reward 200.0]
[Episode 3320/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3325] [Average Reward 199.9]
----------
[Episode 3330/4000] [Steps  100] [reward 101.0]
[Episode 3340/4000] [Steps  177] [reward 178.0]
[Episode 3350/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3350] [Average Reward 180.4]
----------
[Episode 3360/4000] [Steps  199] [reward 200.0]
[Episode 3370/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3375] [Average Reward 200.0]
----------
[Episode 3380/4000] [Steps  199] [reward 200.0]
[Episode 3390/4000] [Steps  199] [reward 200.0]
[Episode 3400/4000] [Steps  101] [reward 102.0]
----------
[TEST Episode 3400] [Average Reward 200.0]
----------
[Episode 3410/4000] [Steps  199] [reward 200.0]
[Episode 3420/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3425] [Average Reward 200.0]
----------
[Episode 3430/4000] [Steps  188] [reward 189.0]
[Episode 3440/4000] [Steps  199] [reward 200.0]
[Episode 3450/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3450] [Average Reward 200.0]
----------
[Episode 3460/4000] [Steps  199] [reward 200.0]
[Episode 3470/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3475] [Average Reward 200.0]
----------
[Episode 3480/4000] [Steps  199] [reward 200.0]
[Episode 3490/4000] [Steps  199] [reward 200.0]
[Episode 3500/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3500] [Average Reward 200.0]
----------
[Episode 3510/4000] [Steps  101] [reward 102.0]
[Episode 3520/4000] [Steps  125] [reward 126.0]
----------
[TEST Episode 3525] [Average Reward 179.5]
----------
[Episode 3530/4000] [Steps   82] [reward 83.0]
[Episode 3540/4000] [Steps  199] [reward 200.0]
[Episode 3550/4000] [Steps   92] [reward 93.0]
----------
[TEST Episode 3550] [Average Reward 200.0]
----------
[Episode 3560/4000] [Steps  199] [reward 200.0]
[Episode 3570/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3575] [Average Reward 200.0]
----------
[Episode 3580/4000] [Steps   83] [reward 84.0]
[Episode 3590/4000] [Steps   97] [reward 98.0]
[Episode 3600/4000] [Steps   34] [reward 35.0]
----------
[TEST Episode 3600] [Average Reward 200.0]
----------
[Episode 3610/4000] [Steps  181] [reward 182.0]
[Episode 3620/4000] [Steps  195] [reward 196.0]
----------
[TEST Episode 3625] [Average Reward 200.0]
----------
[Episode 3630/4000] [Steps  199] [reward 200.0]
[Episode 3640/4000] [Steps  199] [reward 200.0]
[Episode 3650/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3650] [Average Reward 200.0]
----------
[Episode 3660/4000] [Steps  175] [reward 176.0]
[Episode 3670/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3675] [Average Reward 127.6]
----------
[Episode 3680/4000] [Steps  199] [reward 200.0]
[Episode 3690/4000] [Steps  199] [reward 200.0]
[Episode 3700/4000] [Steps   71] [reward 72.0]
----------
[TEST Episode 3700] [Average Reward 127.4]
----------
[Episode 3710/4000] [Steps  199] [reward 200.0]
[Episode 3720/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3725] [Average Reward 200.0]
----------
[Episode 3730/4000] [Steps  199] [reward 200.0]
[Episode 3740/4000] [Steps  199] [reward 200.0]
[Episode 3750/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3750] [Average Reward 200.0]
----------
[Episode 3760/4000] [Steps  199] [reward 200.0]
[Episode 3770/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3775] [Average Reward 200.0]
----------
[Episode 3780/4000] [Steps  135] [reward 136.0]
[Episode 3790/4000] [Steps  151] [reward 152.0]
[Episode 3800/4000] [Steps   81] [reward 82.0]
----------
[TEST Episode 3800] [Average Reward 184.9]
----------
[Episode 3810/4000] [Steps  103] [reward 104.0]
[Episode 3820/4000] [Steps   17] [reward 18.0]
----------
[TEST Episode 3825] [Average Reward 150.0]
----------
[Episode 3830/4000] [Steps   56] [reward 57.0]
[Episode 3840/4000] [Steps   14] [reward 15.0]
[Episode 3850/4000] [Steps  173] [reward 174.0]
----------
[TEST Episode 3850] [Average Reward 158.7]
----------
[Episode 3860/4000] [Steps  171] [reward 172.0]
[Episode 3870/4000] [Steps  156] [reward 157.0]
----------
[TEST Episode 3875] [Average Reward 160.0]
----------
[Episode 3880/4000] [Steps   45] [reward 46.0]
[Episode 3890/4000] [Steps  148] [reward 149.0]
[Episode 3900/4000] [Steps  158] [reward 159.0]
----------
[TEST Episode 3900] [Average Reward 178.0]
----------
[Episode 3910/4000] [Steps  199] [reward 200.0]
[Episode 3920/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3925] [Average Reward 200.0]
----------
[Episode 3930/4000] [Steps  173] [reward 174.0]
[Episode 3940/4000] [Steps  199] [reward 200.0]
[Episode 3950/4000] [Steps  199] [reward 200.0]
----------
[TEST Episode 3950] [Average Reward 200.0]
----------
[Episode 3960/4000] [Steps  199] [reward 200.0]
[Episode 3970/4000] [Steps   94] [reward 95.0]
----------
[TEST Episode 3975] [Average Reward 200.0]
----------
[Episode 3980/4000] [Steps   54] [reward 55.0]
[Episode 3990/4000] [Steps   22] [reward 23.0]
[Episode 4000/4000] [Steps   20] [reward 21.0]
----------
[TEST Episode 4000] [Average Reward 200.0]
----------


## Task 3: Extra

Ideas to experiment with:

- Is the $\epsilon$-greedy strategy the best strategy available? Why not try something different?
- Why not make use of the model you have trained in the behavioral cloning part and fine-tune it with RL? How does that affect performance?
- You are perhaps bored with `CartPole-v0` by now. Another environment we suggest trying is `LunarLander-v2`. It will be harder to learn, but with experimentation you will find the optimizations needed for success. Piazza is also your friend :)
- What about learning from images? This requires more work because you have to extract the image from the environment. However, would it be possible? How much more challenging might you expect the learning to be in this case?
- The `ReplayBuffer` implementation provided is very simple. In class we briefly mentioned Prioritized Experience Replay; how would using it change the learning process?
- An improvement over DQN is Double DQN, which requires only a small change to the current code.
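
As one possible answer to the first idea in the list above, here is a minimal sketch of Boltzmann (softmax) exploration, which samples actions in proportion to $\exp(Q/\tau)$ instead of choosing uniformly at random with probability $\epsilon$. The function name `boltzmann_action` and the `temperature` parameter are illustrative assumptions and are not part of the provided assignment code.

```python
import math
import random


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability proportional to exp(Q / temperature).

    Hypothetical helper for illustration; `q_values` is a plain list of floats,
    e.g. the Q-network output for one state converted with `.tolist()`.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max Q-value before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw a single index according to the softmax probabilities.
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, probs
```

A high temperature makes the policy close to uniform, while a low temperature makes it close to greedy, so annealing the temperature plays a role similar to decaying $\epsilon$.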

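
For the Prioritized Experience Replay idea, the sketch below shows a simplified proportional variant that uses plain lists rather than a sum-tree. The class name `PrioritizedReplayBuffer`, its methods, and the default values of `alpha`, `beta`, and `eps` are assumptions made for illustration; the buffer provided with the assignment may have a different interface.

```python
import random


class PrioritizedReplayBuffer:
    """Simplified proportional prioritized replay (O(N) sampling, no sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def __len__(self):
        return len(self.data)

    def push(self, transition):
        # New transitions get the current maximum priority so they are likely replayed soon.
        priority = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Sampling is with replacement, which keeps the sketch short.
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform sampling;
        # here they are normalised by the largest weight in the batch.
        n = len(self.data)
        is_weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(is_weights)
        is_weights = [w / max_w for w in is_weights]
        return [self.data[i] for i in idxs], idxs, is_weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, set each sampled priority to its new |TD error|.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In the training loop, the per-sample loss would be multiplied by the returned importance-sampling weights before averaging, and `update_priorities` would be called with the TD errors from that batch.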

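
For the Double DQN idea, the only change relative to standard DQN is how the bootstrap target is computed: the online network selects the greedy next action, and the target network evaluates it. The sketch below assumes PyTorch and batched tensors; the function name `double_dqn_targets` and its argument names are hypothetical and would need to be adapted to the variable names in your own training loop.

```python
import torch
import torch.nn as nn


def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    Assumed shapes: rewards and dones are float tensors of shape (B,),
    next_states has shape (B, state_dim), and both networks map a batch of
    states to Q-values of shape (B, num_actions).
    """
    with torch.no_grad():
        # The online network chooses the greedy action in the next state...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...and the target network supplies the value of that action.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # Terminal transitions (dones == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_q
```

Standard DQN would instead use `target_net(next_states).max(dim=1).values` for `next_q`; decoupling action selection from evaluation is intended to reduce the overestimation of Q-values.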

In [13]:
# YOU CAN USE THIS CODEBLOCK AND ADD ANY BLOCK BELOW AS YOU NEED
# TO SHOW US THE IDEAS AND EXTRA EXPERIMENTS YOU RUN.
# HAVE FUN!