# Continuous Control

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [None]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`
- **Linux** (x86, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86"`
- **Linux** (x86_64, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Reacher.app")
```
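If the notebook runs on more than one machine, you can automate the path choice from the list above. The sketch below maps `platform.system()` and `platform.machine()` onto those folder names. The helper name `reacher_path` and its `base_dir` argument are hypothetical, and it assumes the downloads were unzipped with their default folder names. The cell below actually loads a 20-agent Linux build (`Reacher_Linux_20agents`), which is not in the list, so in that case pass the path explicitly instead.

```python
import os
import platform
from typing import Optional


def reacher_path(base_dir: str = ".", system: Optional[str] = None,
                 machine: Optional[str] = None, headless: bool = False) -> str:
    """Hypothetical helper: choose the Reacher build path for a platform.

    Assumes the folder layout listed above (default unzip names).
    """
    system = system or platform.system()          # 'Darwin', 'Windows' or 'Linux'
    machine = (machine or platform.machine()).lower()
    is_64bit = machine in ("x86_64", "amd64")     # Windows reports 'AMD64'

    if system == "Darwin":
        # The Mac download is a single app bundle.
        rel = "Reacher.app"
    elif system == "Windows":
        folder = "Reacher_Windows_x86_64" if is_64bit else "Reacher_Windows_x86"
        rel = os.path.join(folder, "Reacher.exe")
    elif system == "Linux":
        folder = "Reacher_Linux_NoVis" if headless else "Reacher_Linux"
        binary = "Reacher.x86_64" if is_64bit else "Reacher.x86"
        rel = os.path.join(folder, binary)
    else:
        raise ValueError(f"unsupported platform: {system}")
    return os.path.join(base_dir, rel)
```

With this helper, `env = UnityEnvironment(file_name=reacher_path())` would pick the visual build for the current machine, and `reacher_path(headless=True)` would pick the headless Linux build for a remote server.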

In [None]:
env = UnityEnvironment(file_name='Reacher_Linux_20agents/Reacher.x86_64')

Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [None]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, a double-jointed arm can move to target locations. A reward of `+0.1` is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.
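With the `+0.1` per-step reward stated above, an agent's episode score is 0.1 times the number of steps its hand spends in the goal. The small sketch below shows that arithmetic. The environment only reports rewards, not an in-goal flag, so the `in_goal` array and the `episode_score` name are hypothetical and serve only to illustrate the relationship.

```python
import numpy as np

REWARD_PER_STEP = 0.1  # reward for each step the hand is inside the goal (stated above)


def episode_score(in_goal: np.ndarray) -> float:
    """Score for one agent, given a hypothetical per-step boolean in-goal array."""
    return REWARD_PER_STEP * float(np.count_nonzero(in_goal))


# The hand is in the goal for 3 of 5 steps, so the score is 0.1 * 3 = 0.3.
print(round(episode_score(np.array([True, False, True, True, False])), 6))  # 0.3
```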

The observation space consists of `33` variables corresponding to position, rotation, velocity, and angular velocities of the arm.  Each action is a vector with four numbers, corresponding to torque applicable to two joints.  Every entry in the action vector must be a number between `-1` and `1`.
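The action constraints above (four values per agent, each in `[-1, 1]`) can be enforced in one place before actions are sent to the environment. This is a minimal sketch: `prepare_actions` and its constants are hypothetical names. It mirrors the `np.clip` call used in the random-action cell further down and adds a reshape to one row per agent.

```python
import numpy as np

ACTION_SIZE = 4                  # four torque values per agent, as described above
ACTION_LOW, ACTION_HIGH = -1.0, 1.0


def prepare_actions(raw, num_agents: int) -> np.ndarray:
    """Reshape raw policy output to (num_agents, ACTION_SIZE) and clip to [-1, 1].

    numpy raises ValueError if `raw` does not hold num_agents * ACTION_SIZE values.
    """
    actions = np.asarray(raw, dtype=np.float32).reshape(num_agents, ACTION_SIZE)
    return np.clip(actions, ACTION_LOW, ACTION_HIGH)
```

Clipping rather than rescaling keeps in-range actions unchanged. A policy that regularly produces out-of-range values will, however, lose the information beyond the bounds.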

Run the code cell below to print some information about the environment.

In [None]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you will watch the agent's performance when it selects an action at random at each time step.  A window should pop up that allows you to observe the agent as it moves through the environment.

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [None]:
env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
states = env_info.vector_observations                  # get the current state (for each agent)
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
while True:
    actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += rewards                                  # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))
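When you replace random actions with a learned policy, it helps to wrap the loop above in a function that accepts any policy. The sketch below is hypothetical (`run_episode`, `policy`, and `max_steps` are not part of the notebook). It uses only the `reset`/`step` calls and the `env_info` fields already used here. `max_steps` is an added safety cap that the original `while True` loop does not have.

```python
from typing import Callable

import numpy as np


def run_episode(env, brain_name: str, policy: Callable[[np.ndarray], np.ndarray],
                train_mode: bool = True, max_steps: int = 10_000) -> np.ndarray:
    """Play one episode with `policy` and return each agent's undiscounted score."""
    env_info = env.reset(train_mode=train_mode)[brain_name]
    states = env_info.vector_observations          # shape: (num_agents, state_size)
    scores = np.zeros(len(env_info.agents))
    for _ in range(max_steps):
        actions = np.clip(policy(states), -1, 1)   # keep every action in [-1, 1]
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards                 # accumulate per-agent rewards
        states = env_info.vector_observations
        if np.any(env_info.local_done):            # stop when any agent's episode ends
            break
    return scores
```

For example, `run_episode(env, brain_name, lambda s: np.random.randn(num_agents, action_size), train_mode=False)` reproduces the random-agent cell. While the environment is still open, `np.mean` of the result gives the same per-episode average.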

When finished, you can close the environment.

In [None]:
env.close()

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import torch
from unityagents import UnityEnvironment
import ppo
import ppo_agent
import utils

In [2]:
env = UnityEnvironment(file_name="Reacher_Linux_20agents/Reacher.x86_64")
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


In [3]:
agent = ppo_agent.Agent(state_size=len(env_info.vector_observations[0]),
                        action_size=brain.vector_action_space_size,
                        hidden_sizes=[512, 1024],
                        seed=237)

In [4]:
mean_rewards = ppo.train_ppo(env, agent, report_every=10)

  0%|          | 10/10000 [01:46<29:26:26, 10.61s/it]

Episode: 10, score: 1.518500, window mean: 1.286850


  0%|          | 20/10000 [03:40<31:13:34, 11.26s/it]

Episode: 20, score: 1.037000, window mean: 1.264650


  0%|          | 30/10000 [05:34<31:56:49, 11.54s/it]

Episode: 30, score: 1.151500, window mean: 1.285600


  0%|          | 40/10000 [07:25<30:27:46, 11.01s/it]

Episode: 40, score: 1.083000, window mean: 1.268512


  0%|          | 50/10000 [09:20<31:37:38, 11.44s/it]

Episode: 50, score: 1.194000, window mean: 1.227930


  1%|          | 60/10000 [11:16<31:34:43, 11.44s/it]

Episode: 60, score: 0.981000, window mean: 1.200325


  1%|          | 70/10000 [13:12<33:27:54, 12.13s/it]

Episode: 70, score: 0.966500, window mean: 1.179607


  1%|          | 80/10000 [15:14<34:21:51, 12.47s/it]

Episode: 80, score: 1.078500, window mean: 1.153906


  1%|          | 90/10000 [17:03<30:00:02, 10.90s/it]

Episode: 90, score: 1.131000, window mean: 1.137022


  1%|          | 100/10000 [18:57<32:14:30, 11.72s/it]

Episode: 100, score: 0.852500, window mean: 1.126685


  1%|          | 110/10000 [20:41<28:29:53, 10.37s/it]

Episode: 110, score: 1.105500, window mean: 1.092440


  1%|          | 120/10000 [22:25<28:24:21, 10.35s/it]

Episode: 120, score: 0.828500, window mean: 1.069570


  1%|▏         | 130/10000 [24:08<28:21:23, 10.34s/it]

Episode: 130, score: 0.955500, window mean: 1.040575


  1%|▏         | 140/10000 [25:52<28:17:29, 10.33s/it]

Episode: 140, score: 1.273500, window mean: 1.038965


  2%|▏         | 150/10000 [27:36<28:13:58, 10.32s/it]

Episode: 150, score: 0.827500, window mean: 1.030440


  2%|▏         | 160/10000 [29:19<28:25:57, 10.40s/it]

Episode: 160, score: 0.967000, window mean: 1.046390


  2%|▏         | 170/10000 [31:03<28:21:55, 10.39s/it]

Episode: 170, score: 1.281000, window mean: 1.056430


  2%|▏         | 180/10000 [32:46<28:04:56, 10.30s/it]

Episode: 180, score: 1.149500, window mean: 1.079875


  2%|▏         | 190/10000 [34:29<28:06:33, 10.32s/it]

Episode: 190, score: 1.173000, window mean: 1.107400


  2%|▏         | 200/10000 [36:13<28:06:58, 10.33s/it]

Episode: 200, score: 1.510500, window mean: 1.128975


  2%|▏         | 210/10000 [37:56<28:03:32, 10.32s/it]

Episode: 210, score: 1.359000, window mean: 1.171330


  2%|▏         | 220/10000 [39:40<28:00:12, 10.31s/it]

Episode: 220, score: 1.588500, window mean: 1.208515


  2%|▏         | 230/10000 [41:23<28:06:25, 10.36s/it]

Episode: 230, score: 1.185000, window mean: 1.231045


  2%|▏         | 240/10000 [43:06<28:08:28, 10.38s/it]

Episode: 240, score: 1.300000, window mean: 1.242040


  2%|▎         | 250/10000 [44:53<30:25:55, 11.24s/it]

Episode: 250, score: 1.611000, window mean: 1.283035


  3%|▎         | 260/10000 [46:38<28:01:54, 10.36s/it]

Episode: 260, score: 1.435500, window mean: 1.301015


  3%|▎         | 270/10000 [48:22<27:59:44, 10.36s/it]

Episode: 270, score: 1.304500, window mean: 1.319230


  3%|▎         | 280/10000 [50:06<27:48:31, 10.30s/it]

Episode: 280, score: 1.641000, window mean: 1.335710


  3%|▎         | 290/10000 [51:51<28:46:33, 10.67s/it]

Episode: 290, score: 1.311000, window mean: 1.339380


  3%|▎         | 300/10000 [53:38<28:22:36, 10.53s/it]

Episode: 300, score: 1.473000, window mean: 1.351875


  3%|▎         | 310/10000 [55:22<27:48:41, 10.33s/it]

Episode: 310, score: 1.493000, window mean: 1.354950


  3%|▎         | 320/10000 [57:10<29:37:58, 11.02s/it]

Episode: 320, score: 1.417000, window mean: 1.363345


  3%|▎         | 330/10000 [59:00<28:58:49, 10.79s/it]

Episode: 330, score: 1.325000, window mean: 1.368295


  3%|▎         | 340/10000 [1:00:43<27:33:43, 10.27s/it]

Episode: 340, score: 1.313000, window mean: 1.371920


  4%|▎         | 350/10000 [1:02:25<27:12:38, 10.15s/it]

Episode: 350, score: 1.669500, window mean: 1.389890


  4%|▎         | 360/10000 [1:04:07<27:04:45, 10.11s/it]

Episode: 360, score: 1.394000, window mean: 1.413720


  4%|▎         | 370/10000 [1:05:53<28:27:44, 10.64s/it]

Episode: 370, score: 2.166500, window mean: 1.437465


  4%|▍         | 380/10000 [1:07:46<30:13:36, 11.31s/it]

Episode: 380, score: 1.352000, window mean: 1.462970


  4%|▍         | 390/10000 [1:09:38<28:54:44, 10.83s/it]

Episode: 390, score: 1.688500, window mean: 1.497680


  4%|▍         | 400/10000 [1:11:34<30:06:01, 11.29s/it]

Episode: 400, score: 1.907000, window mean: 1.525925


  4%|▍         | 410/10000 [1:13:28<30:53:47, 11.60s/it]

Episode: 410, score: 1.974000, window mean: 1.544510


  4%|▍         | 420/10000 [1:15:29<32:22:08, 12.16s/it]

Episode: 420, score: 1.611500, window mean: 1.554000


  4%|▍         | 430/10000 [1:17:17<27:54:05, 10.50s/it]

Episode: 430, score: 1.817500, window mean: 1.589540


  4%|▍         | 440/10000 [1:18:59<27:09:53, 10.23s/it]

Episode: 440, score: 1.245000, window mean: 1.608700


  4%|▍         | 450/10000 [1:20:40<26:50:08, 10.12s/it]

Episode: 450, score: 2.227000, window mean: 1.624820


  5%|▍         | 460/10000 [1:22:22<27:00:11, 10.19s/it]

Episode: 460, score: 1.763000, window mean: 1.642580


  5%|▍         | 470/10000 [1:24:04<26:46:10, 10.11s/it]

Episode: 470, score: 1.322500, window mean: 1.670560


  5%|▍         | 480/10000 [1:25:45<26:49:24, 10.14s/it]

Episode: 480, score: 1.797500, window mean: 1.712710


  5%|▍         | 490/10000 [1:27:27<27:01:07, 10.23s/it]

Episode: 490, score: 2.777000, window mean: 1.747045


  5%|▌         | 500/10000 [1:29:08<26:58:44, 10.22s/it]

Episode: 500, score: 1.757500, window mean: 1.773520


  5%|▌         | 510/10000 [1:30:49<26:40:04, 10.12s/it]

Episode: 510, score: 1.778000, window mean: 1.809890


  5%|▌         | 520/10000 [1:32:31<26:45:45, 10.16s/it]

Episode: 520, score: 1.920000, window mean: 1.839365


  5%|▌         | 530/10000 [1:34:13<26:38:33, 10.13s/it]

Episode: 530, score: 1.440500, window mean: 1.851680


  5%|▌         | 540/10000 [1:35:54<26:42:04, 10.16s/it]

Episode: 540, score: 1.509500, window mean: 1.890970


  6%|▌         | 550/10000 [1:37:36<26:36:19, 10.14s/it]

Episode: 550, score: 2.156000, window mean: 1.908175


  6%|▌         | 560/10000 [1:39:18<26:49:19, 10.23s/it]

Episode: 560, score: 1.951500, window mean: 1.920580


  6%|▌         | 570/10000 [1:40:59<26:49:22, 10.24s/it]

Episode: 570, score: 1.312500, window mean: 1.918830


  6%|▌         | 580/10000 [1:42:41<26:40:51, 10.20s/it]

Episode: 580, score: 1.637000, window mean: 1.884935


  6%|▌         | 590/10000 [1:44:23<26:35:15, 10.17s/it]

Episode: 590, score: 1.360500, window mean: 1.842195


  6%|▌         | 600/10000 [1:46:11<27:02:40, 10.36s/it]

Episode: 600, score: 1.491500, window mean: 1.804405


  6%|▌         | 610/10000 [1:47:54<26:46:29, 10.27s/it]

Episode: 610, score: 1.164500, window mean: 1.758885


  6%|▌         | 620/10000 [1:49:36<26:44:31, 10.26s/it]

Episode: 620, score: 1.356000, window mean: 1.711440


  6%|▋         | 630/10000 [1:51:19<26:34:39, 10.21s/it]

Episode: 630, score: 1.344000, window mean: 1.688370


  6%|▋         | 640/10000 [1:53:01<26:53:18, 10.34s/it]

Episode: 640, score: 1.162000, window mean: 1.637470


  6%|▋         | 650/10000 [1:54:43<26:29:27, 10.20s/it]

Episode: 650, score: 1.529500, window mean: 1.596245


  7%|▋         | 660/10000 [1:56:26<26:32:20, 10.23s/it]

Episode: 660, score: 1.750500, window mean: 1.552625


  7%|▋         | 670/10000 [1:58:09<26:32:51, 10.24s/it]

Episode: 670, score: 1.309500, window mean: 1.508700


  7%|▋         | 680/10000 [1:59:52<26:36:05, 10.28s/it]

Episode: 680, score: 1.378000, window mean: 1.487335


  7%|▋         | 690/10000 [2:01:34<26:33:45, 10.27s/it]

Episode: 690, score: 1.595000, window mean: 1.480255


  7%|▋         | 700/10000 [2:03:16<26:32:50, 10.28s/it]

Episode: 700, score: 1.579000, window mean: 1.482405


  7%|▋         | 710/10000 [2:04:59<26:55:31, 10.43s/it]

Episode: 710, score: 1.511000, window mean: 1.486015


  7%|▋         | 720/10000 [2:06:43<26:25:54, 10.25s/it]

Episode: 720, score: 1.512500, window mean: 1.502495


  7%|▋         | 730/10000 [2:08:25<26:14:49, 10.19s/it]

Episode: 730, score: 1.881000, window mean: 1.514875


  7%|▋         | 740/10000 [2:10:08<26:15:31, 10.21s/it]

Episode: 740, score: 1.848500, window mean: 1.539040


  8%|▊         | 750/10000 [2:11:50<26:18:58, 10.24s/it]

Episode: 750, score: 1.852000, window mean: 1.548070


  8%|▊         | 760/10000 [2:13:32<26:05:27, 10.17s/it]

Episode: 760, score: 1.714500, window mean: 1.557370


  8%|▊         | 770/10000 [2:15:20<28:19:46, 11.05s/it]

Episode: 770, score: 2.007500, window mean: 1.602135


  8%|▊         | 780/10000 [2:17:02<26:21:01, 10.29s/it]

Episode: 780, score: 1.436500, window mean: 1.620785


  8%|▊         | 790/10000 [2:18:44<26:08:57, 10.22s/it]

Episode: 790, score: 1.423500, window mean: 1.650520


  8%|▊         | 800/10000 [2:20:27<26:06:24, 10.22s/it]

Episode: 800, score: 1.967500, window mean: 1.693135


  8%|▊         | 810/10000 [2:22:09<26:10:38, 10.25s/it]

Episode: 810, score: 2.296000, window mean: 1.738160


  8%|▊         | 820/10000 [2:23:52<26:06:18, 10.24s/it]

Episode: 820, score: 1.638000, window mean: 1.769385


  8%|▊         | 830/10000 [2:25:34<26:05:35, 10.24s/it]

Episode: 830, score: 1.663500, window mean: 1.778105


  8%|▊         | 840/10000 [2:27:17<26:11:58, 10.30s/it]

Episode: 840, score: 1.620500, window mean: 1.782055


  8%|▊         | 850/10000 [2:29:00<26:15:09, 10.33s/it]

Episode: 850, score: 1.829500, window mean: 1.797790


  9%|▊         | 860/10000 [2:30:42<25:56:29, 10.22s/it]

Episode: 860, score: 1.546500, window mean: 1.827385


  9%|▊         | 870/10000 [2:32:24<25:51:29, 10.20s/it]

Episode: 870, score: 1.751000, window mean: 1.824375


  9%|▉         | 880/10000 [2:34:06<25:51:17, 10.21s/it]

Episode: 880, score: 1.643500, window mean: 1.845185


  9%|▉         | 890/10000 [2:35:49<25:55:21, 10.24s/it]

Episode: 890, score: 2.180500, window mean: 1.867800


  9%|▉         | 900/10000 [2:37:31<25:56:35, 10.26s/it]

Episode: 900, score: 1.877500, window mean: 1.867870


  9%|▉         | 910/10000 [2:39:14<25:54:55, 10.26s/it]

Episode: 910, score: 2.079000, window mean: 1.864130


  9%|▉         | 920/10000 [2:40:56<25:43:14, 10.20s/it]

Episode: 920, score: 1.476500, window mean: 1.868825


  9%|▉         | 930/10000 [2:42:38<25:43:57, 10.21s/it]

Episode: 930, score: 1.365500, window mean: 1.882670


  9%|▉         | 940/10000 [2:44:21<25:34:24, 10.16s/it]

Episode: 940, score: 1.916000, window mean: 1.911630


 10%|▉         | 950/10000 [2:46:08<26:04:17, 10.37s/it]

Episode: 950, score: 1.884000, window mean: 1.927285


 10%|▉         | 960/10000 [2:47:51<25:53:57, 10.31s/it]

Episode: 960, score: 2.351000, window mean: 1.937310


 10%|▉         | 970/10000 [2:49:34<25:44:23, 10.26s/it]

Episode: 970, score: 1.977500, window mean: 1.981815


 10%|▉         | 980/10000 [2:51:16<25:49:26, 10.31s/it]

Episode: 980, score: 2.108000, window mean: 2.002050


 10%|▉         | 990/10000 [2:52:59<25:54:03, 10.35s/it]

Episode: 990, score: 1.821500, window mean: 2.022240


 10%|█         | 1000/10000 [2:54:41<25:33:11, 10.22s/it]

Episode: 1000, score: 2.102500, window mean: 2.038700


 10%|█         | 1010/10000 [2:56:24<25:39:04, 10.27s/it]

Episode: 1010, score: 1.965000, window mean: 2.063765


 10%|█         | 1020/10000 [2:58:06<25:25:59, 10.20s/it]

Episode: 1020, score: 2.867500, window mean: 2.093195


 10%|█         | 1030/10000 [2:59:48<25:25:11, 10.20s/it]

Episode: 1030, score: 2.239000, window mean: 2.122925


 10%|█         | 1040/10000 [3:01:31<25:34:44, 10.28s/it]

Episode: 1040, score: 2.570500, window mean: 2.140380


 10%|█         | 1050/10000 [3:03:13<25:29:37, 10.25s/it]

Episode: 1050, score: 1.654000, window mean: 2.148470


 11%|█         | 1060/10000 [3:04:55<25:23:55, 10.23s/it]

Episode: 1060, score: 2.571000, window mean: 2.170390


 11%|█         | 1070/10000 [3:06:37<25:21:17, 10.22s/it]

Episode: 1070, score: 1.985000, window mean: 2.143025


 11%|█         | 1080/10000 [3:08:20<25:17:26, 10.21s/it]

Episode: 1080, score: 1.826000, window mean: 2.124910


 11%|█         | 1090/10000 [3:10:02<25:15:27, 10.21s/it]

Episode: 1090, score: 2.666000, window mean: 2.100255


 11%|█         | 1100/10000 [3:11:44<25:16:55, 10.23s/it]

Episode: 1100, score: 1.792000, window mean: 2.069890


 11%|█         | 1110/10000 [3:13:27<25:20:56, 10.27s/it]

Episode: 1110, score: 1.587500, window mean: 2.028640


 11%|█         | 1120/10000 [3:15:15<27:46:20, 11.26s/it]

Episode: 1120, score: 1.602500, window mean: 1.983725


 11%|█▏        | 1130/10000 [3:16:57<25:13:35, 10.24s/it]

Episode: 1130, score: 1.774500, window mean: 1.952215


 11%|█▏        | 1140/10000 [3:18:40<25:14:47, 10.26s/it]

Episode: 1140, score: 1.849000, window mean: 1.926975


 12%|█▏        | 1150/10000 [3:20:22<25:03:59, 10.20s/it]

Episode: 1150, score: 2.212500, window mean: 1.946230


 12%|█▏        | 1160/10000 [3:22:04<25:03:28, 10.20s/it]

Episode: 1160, score: 1.726500, window mean: 1.924380


 12%|█▏        | 1170/10000 [3:23:47<25:03:32, 10.22s/it]

Episode: 1170, score: 2.187000, window mean: 1.919865


 12%|█▏        | 1180/10000 [3:25:29<25:03:09, 10.23s/it]

Episode: 1180, score: 2.058000, window mean: 1.917580


 12%|█▏        | 1190/10000 [3:27:12<25:11:03, 10.29s/it]

Episode: 1190, score: 1.535000, window mean: 1.909465


 12%|█▏        | 1200/10000 [3:28:54<25:00:54, 10.23s/it]

Episode: 1200, score: 1.958500, window mean: 1.899345


 12%|█▏        | 1210/10000 [3:30:36<24:59:12, 10.23s/it]

Episode: 1210, score: 1.909000, window mean: 1.909445


 12%|█▏        | 1220/10000 [3:32:19<24:54:33, 10.21s/it]

Episode: 1220, score: 1.843000, window mean: 1.919115


 12%|█▏        | 1230/10000 [3:34:01<24:53:57, 10.22s/it]

Episode: 1230, score: 1.639000, window mean: 1.922410


 12%|█▏        | 1240/10000 [3:35:44<24:56:48, 10.25s/it]

Episode: 1240, score: 2.087500, window mean: 1.920490


 12%|█▎        | 1250/10000 [3:37:27<24:54:38, 10.25s/it]

Episode: 1250, score: 2.048000, window mean: 1.885370


 13%|█▎        | 1260/10000 [3:39:09<24:54:39, 10.26s/it]

Episode: 1260, score: 1.927500, window mean: 1.875870


 13%|█▎        | 1270/10000 [3:40:51<24:45:36, 10.21s/it]

Episode: 1270, score: 1.697000, window mean: 1.870865


 13%|█▎        | 1280/10000 [3:42:33<24:39:27, 10.18s/it]

Episode: 1280, score: 1.304500, window mean: 1.870585


 13%|█▎        | 1290/10000 [3:44:15<24:39:18, 10.19s/it]

Episode: 1290, score: 2.180000, window mean: 1.877490


 13%|█▎        | 1300/10000 [3:46:03<25:08:02, 10.40s/it]

Episode: 1300, score: 2.218000, window mean: 1.903775


 13%|█▎        | 1310/10000 [3:47:45<24:39:26, 10.21s/it]

Episode: 1310, score: 2.231500, window mean: 1.916730


 13%|█▎        | 1320/10000 [3:49:28<24:43:43, 10.26s/it]

Episode: 1320, score: 1.804500, window mean: 1.924295


 13%|█▎        | 1330/10000 [3:51:10<24:48:31, 10.30s/it]

Episode: 1330, score: 2.077500, window mean: 1.933395


 13%|█▎        | 1340/10000 [3:52:52<24:34:58, 10.22s/it]

Episode: 1340, score: 2.002500, window mean: 1.924975


 14%|█▎        | 1350/10000 [3:54:35<24:31:00, 10.20s/it]

Episode: 1350, score: 1.588000, window mean: 1.924495


 14%|█▎        | 1360/10000 [3:56:17<24:32:10, 10.22s/it]

Episode: 1360, score: 1.477000, window mean: 1.915935


 14%|█▎        | 1370/10000 [3:58:00<24:26:44, 10.20s/it]

Episode: 1370, score: 2.044500, window mean: 1.910550


 14%|█▍        | 1380/10000 [3:59:42<24:32:32, 10.25s/it]

Episode: 1380, score: 1.862500, window mean: 1.934920


 14%|█▍        | 1390/10000 [4:01:25<24:41:02, 10.32s/it]

Episode: 1390, score: 2.148500, window mean: 1.926475


 14%|█▍        | 1400/10000 [4:03:07<24:38:22, 10.31s/it]

Episode: 1400, score: 2.111000, window mean: 1.932690


 14%|█▍        | 1410/10000 [4:04:50<24:28:42, 10.26s/it]

Episode: 1410, score: 2.239000, window mean: 1.925185


 14%|█▍        | 1420/10000 [4:06:32<24:16:44, 10.19s/it]

Episode: 1420, score: 2.210000, window mean: 1.916190


 14%|█▍        | 1430/10000 [4:08:15<24:18:19, 10.21s/it]

Episode: 1430, score: 2.404500, window mean: 1.921800


 14%|█▍        | 1440/10000 [4:09:57<24:16:43, 10.21s/it]

Episode: 1440, score: 1.643000, window mean: 1.929630


 14%|█▍        | 1450/10000 [4:11:39<24:16:40, 10.22s/it]

Episode: 1450, score: 1.393500, window mean: 1.948910


 15%|█▍        | 1460/10000 [4:13:22<24:21:57, 10.27s/it]

Episode: 1460, score: 1.992000, window mean: 1.975335


 15%|█▍        | 1470/10000 [4:15:10<27:01:05, 11.40s/it]

Episode: 1470, score: 2.474500, window mean: 2.001545


 15%|█▍        | 1480/10000 [4:16:52<24:20:13, 10.28s/it]

Episode: 1480, score: 1.799000, window mean: 1.983755


 15%|█▍        | 1490/10000 [4:18:34<24:13:43, 10.25s/it]

Episode: 1490, score: 1.374500, window mean: 1.974630


 15%|█▌        | 1500/10000 [4:20:17<24:04:01, 10.19s/it]

Episode: 1500, score: 1.371000, window mean: 1.961545


 15%|█▌        | 1510/10000 [4:21:59<24:08:57, 10.24s/it]

Episode: 1510, score: 1.950000, window mean: 1.975330


 15%|█▌        | 1520/10000 [4:23:41<24:04:21, 10.22s/it]

Episode: 1520, score: 2.073000, window mean: 1.988220


 15%|█▌        | 1530/10000 [4:25:24<24:14:02, 10.30s/it]

Episode: 1530, score: 1.658000, window mean: 1.991190


 15%|█▌        | 1540/10000 [4:27:07<24:16:40, 10.33s/it]

Episode: 1540, score: 1.613000, window mean: 2.003615


 16%|█▌        | 1550/10000 [4:28:49<23:59:42, 10.22s/it]

Episode: 1550, score: 1.896000, window mean: 2.006230


 16%|█▌        | 1560/10000 [4:30:32<23:57:41, 10.22s/it]

Episode: 1560, score: 3.093000, window mean: 2.012495


 16%|█▌        | 1570/10000 [4:32:14<23:56:49, 10.23s/it]

Episode: 1570, score: 2.387500, window mean: 2.005335


 16%|█▌        | 1580/10000 [4:33:57<23:57:09, 10.24s/it]

Episode: 1580, score: 2.289000, window mean: 2.027930


 16%|█▌        | 1590/10000 [4:35:39<23:53:16, 10.23s/it]

Episode: 1590, score: 2.208000, window mean: 2.064315


 16%|█▌        | 1600/10000 [4:37:22<23:55:30, 10.25s/it]

Episode: 1600, score: 2.081000, window mean: 2.086225


 16%|█▌        | 1610/10000 [4:39:04<24:00:47, 10.30s/it]

Episode: 1610, score: 1.919000, window mean: 2.090055


 16%|█▌        | 1620/10000 [4:40:46<23:42:22, 10.18s/it]

Episode: 1620, score: 1.535500, window mean: 2.092280


 16%|█▋        | 1630/10000 [4:42:29<23:48:52, 10.24s/it]

Episode: 1630, score: 2.294000, window mean: 2.084900


 16%|█▋        | 1640/10000 [4:44:11<23:41:29, 10.20s/it]

Episode: 1640, score: 2.343500, window mean: 2.089585


 16%|█▋        | 1650/10000 [4:45:59<24:10:38, 10.42s/it]

Episode: 1650, score: 1.673500, window mean: 2.087950


 17%|█▋        | 1660/10000 [4:47:41<23:36:48, 10.19s/it]

Episode: 1660, score: 1.847500, window mean: 2.081925


 17%|█▋        | 1670/10000 [4:49:24<23:39:23, 10.22s/it]

Episode: 1670, score: 2.322000, window mean: 2.077300


 17%|█▋        | 1680/10000 [4:51:06<23:57:34, 10.37s/it]

Episode: 1680, score: 1.616000, window mean: 2.063885


 17%|█▋        | 1690/10000 [4:52:49<23:35:24, 10.22s/it]

Episode: 1690, score: 2.183500, window mean: 2.071545


 17%|█▋        | 1700/10000 [4:54:31<23:37:16, 10.25s/it]

Episode: 1700, score: 1.867000, window mean: 2.053445


 17%|█▋        | 1710/10000 [4:56:14<23:29:16, 10.20s/it]

Episode: 1710, score: 2.289500, window mean: 2.052830


 17%|█▋        | 1720/10000 [4:57:56<23:27:14, 10.20s/it]

Episode: 1720, score: 1.780000, window mean: 2.067595


 17%|█▋        | 1730/10000 [4:59:38<23:27:40, 10.21s/it]

Episode: 1730, score: 2.102500, window mean: 2.071210


 17%|█▋        | 1740/10000 [5:01:20<23:30:38, 10.25s/it]

Episode: 1740, score: 2.124000, window mean: 2.054200


 18%|█▊        | 1750/10000 [5:03:03<23:39:43, 10.33s/it]

Episode: 1750, score: 2.033000, window mean: 2.046925


 18%|█▊        | 1760/10000 [5:04:45<23:18:11, 10.18s/it]

Episode: 1760, score: 2.046500, window mean: 2.037935


 18%|█▊        | 1770/10000 [5:06:27<23:19:03, 10.20s/it]

Episode: 1770, score: 2.945500, window mean: 2.068980


 18%|█▊        | 1780/10000 [5:08:09<23:15:33, 10.19s/it]

Episode: 1780, score: 1.801000, window mean: 2.068030


 18%|█▊        | 1790/10000 [5:09:52<23:19:24, 10.23s/it]

Episode: 1790, score: 1.771500, window mean: 2.047600


 18%|█▊        | 1800/10000 [5:11:34<23:14:56, 10.21s/it]

Episode: 1800, score: 2.288500, window mean: 2.063655


 18%|█▊        | 1810/10000 [5:13:16<23:19:32, 10.25s/it]

Episode: 1810, score: 1.752000, window mean: 2.076260


 18%|█▊        | 1820/10000 [5:15:04<26:37:46, 11.72s/it]

Episode: 1820, score: 2.051500, window mean: 2.070385


 18%|█▊        | 1830/10000 [5:16:46<23:19:50, 10.28s/it]

Episode: 1830, score: 1.720500, window mean: 2.073995


 18%|█▊        | 1840/10000 [5:18:29<23:22:45, 10.31s/it]

Episode: 1840, score: 2.610000, window mean: 2.114610


 18%|█▊        | 1850/10000 [5:20:11<23:06:56, 10.21s/it]

Episode: 1850, score: 2.219000, window mean: 2.135175


 19%|█▊        | 1860/10000 [5:21:54<23:10:14, 10.25s/it]

Episode: 1860, score: 2.232000, window mean: 2.137205


 19%|█▊        | 1870/10000 [5:23:36<23:09:23, 10.25s/it]

Episode: 1870, score: 2.151000, window mean: 2.124025


 19%|█▉        | 1880/10000 [5:25:19<23:11:28, 10.28s/it]

Episode: 1880, score: 1.472000, window mean: 2.118405


 19%|█▉        | 1890/10000 [5:27:01<23:22:54, 10.38s/it]

Episode: 1890, score: 2.097500, window mean: 2.133025


 19%|█▉        | 1900/10000 [5:28:44<22:58:20, 10.21s/it]

Episode: 1900, score: 2.231000, window mean: 2.135920


 19%|█▉        | 1910/10000 [5:30:26<22:54:15, 10.19s/it]

Episode: 1910, score: 2.120500, window mean: 2.135370


 19%|█▉        | 1920/10000 [5:32:08<22:51:46, 10.19s/it]

Episode: 1920, score: 2.258500, window mean: 2.129770


 19%|█▉        | 1930/10000 [5:33:51<22:58:44, 10.25s/it]

Episode: 1930, score: 2.276500, window mean: 2.119775


 19%|█▉        | 1940/10000 [5:35:33<22:54:05, 10.23s/it]

Episode: 1940, score: 1.646500, window mean: 2.100555


 20%|█▉        | 1950/10000 [5:37:15<22:55:29, 10.25s/it]

Episode: 1950, score: 1.799500, window mean: 2.079885


 20%|█▉        | 1960/10000 [5:38:57<22:47:41, 10.21s/it]

Episode: 1960, score: 2.055000, window mean: 2.092165


 20%|█▉        | 1970/10000 [5:40:40<22:45:24, 10.20s/it]

Episode: 1970, score: 2.019500, window mean: 2.089380


 20%|█▉        | 1980/10000 [5:42:22<22:44:56, 10.21s/it]

Episode: 1980, score: 1.690000, window mean: 2.097615


 20%|█▉        | 1990/10000 [5:44:05<22:46:17, 10.23s/it]

Episode: 1990, score: 2.301500, window mean: 2.099790


 20%|██        | 2000/10000 [5:45:53<23:20:24, 10.50s/it]

Episode: 2000, score: 2.101500, window mean: 2.098200


 20%|██        | 2010/10000 [5:47:35<22:45:03, 10.25s/it]

Episode: 2010, score: 2.438500, window mean: 2.100225


 20%|██        | 2020/10000 [5:49:18<22:53:41, 10.33s/it]

Episode: 2020, score: 2.070500, window mean: 2.111815


 20%|██        | 2030/10000 [5:51:01<22:48:46, 10.30s/it]

Episode: 2030, score: 1.960500, window mean: 2.114380


 20%|██        | 2040/10000 [5:52:43<22:34:57, 10.21s/it]

Episode: 2040, score: 2.207000, window mean: 2.118795


 20%|██        | 2050/10000 [5:54:25<22:33:30, 10.22s/it]

Episode: 2050, score: 2.032500, window mean: 2.132070


 21%|██        | 2060/10000 [5:56:07<22:27:49, 10.19s/it]

Episode: 2060, score: 2.191000, window mean: 2.128460


 21%|██        | 2070/10000 [5:57:50<22:27:50, 10.20s/it]

Episode: 2070, score: 2.354000, window mean: 2.112005


 21%|██        | 2080/10000 [5:59:32<22:34:38, 10.26s/it]

Episode: 2080, score: 2.449500, window mean: 2.118330


 21%|██        | 2090/10000 [6:01:15<22:33:24, 10.27s/it]

Episode: 2090, score: 1.795000, window mean: 2.097735


 21%|██        | 2100/10000 [6:02:56<22:21:42, 10.19s/it]

Episode: 2100, score: 2.443000, window mean: 2.081265


 21%|██        | 2110/10000 [6:04:39<22:19:45, 10.19s/it]

Episode: 2110, score: 2.336000, window mean: 2.086635


 21%|██        | 2120/10000 [6:06:21<22:18:37, 10.19s/it]

Episode: 2120, score: 2.073500, window mean: 2.096730


 21%|██▏       | 2130/10000 [6:08:04<22:20:54, 10.22s/it]

Episode: 2130, score: 2.353500, window mean: 2.113685


 21%|██▏       | 2140/10000 [6:09:47<22:24:09, 10.26s/it]

Episode: 2140, score: 2.183000, window mean: 2.105630


 22%|██▏       | 2150/10000 [6:11:29<22:21:35, 10.25s/it]

Episode: 2150, score: 2.019000, window mean: 2.100910


 22%|██▏       | 2160/10000 [6:13:12<22:41:12, 10.42s/it]

Episode: 2160, score: 2.135000, window mean: 2.101485


 22%|██▏       | 2170/10000 [6:14:58<24:36:01, 11.31s/it]

Episode: 2170, score: 1.839500, window mean: 2.113370


 22%|██▏       | 2180/10000 [6:16:41<22:15:49, 10.25s/it]

Episode: 2180, score: 2.260000, window mean: 2.113095


 22%|██▏       | 2190/10000 [6:18:24<22:14:54, 10.26s/it]

Episode: 2190, score: 1.945000, window mean: 2.138305


 22%|██▏       | 2200/10000 [6:20:06<22:09:18, 10.23s/it]

Episode: 2200, score: 1.948000, window mean: 2.144700


 22%|██▏       | 2210/10000 [6:21:49<22:09:10, 10.24s/it]

Episode: 2210, score: 2.731000, window mean: 2.138060


 22%|██▏       | 2220/10000 [6:23:32<22:11:31, 10.27s/it]

Episode: 2220, score: 2.370500, window mean: 2.129505


 22%|██▏       | 2230/10000 [6:25:14<22:03:55, 10.22s/it]

Episode: 2230, score: 1.693500, window mean: 2.099530


 22%|██▏       | 2240/10000 [6:26:56<21:58:58, 10.20s/it]

Episode: 2240, score: 2.912000, window mean: 2.127010


 22%|██▎       | 2250/10000 [6:28:38<21:55:38, 10.19s/it]

Episode: 2250, score: 2.124500, window mean: 2.154600


 23%|██▎       | 2260/10000 [6:30:21<22:01:50, 10.25s/it]

Episode: 2260, score: 2.683500, window mean: 2.176915


 23%|██▎       | 2270/10000 [6:32:03<21:55:24, 10.21s/it]

Episode: 2270, score: 1.778500, window mean: 2.207370


 23%|██▎       | 2280/10000 [6:34:12<28:43:19, 13.39s/it]

Episode: 2280, score: 3.160500, window mean: 2.234775


 23%|██▎       | 2290/10000 [6:35:58<22:28:12, 10.49s/it]

Episode: 2290, score: 1.442000, window mean: 2.224320


 23%|██▎       | 2300/10000 [6:37:41<22:03:23, 10.31s/it]

Episode: 2300, score: 2.056000, window mean: 2.231380


 23%|██▎       | 2310/10000 [6:39:24<22:11:26, 10.39s/it]

Episode: 2310, score: 2.500000, window mean: 2.233565


 23%|██▎       | 2320/10000 [6:41:08<22:06:35, 10.36s/it]

Episode: 2320, score: 2.565500, window mean: 2.243660


 23%|██▎       | 2330/10000 [6:42:50<21:54:42, 10.28s/it]

Episode: 2330, score: 2.104500, window mean: 2.263940


 23%|██▎       | 2340/10000 [6:44:34<21:51:37, 10.27s/it]

Episode: 2340, score: 2.653000, window mean: 2.258820


 24%|██▎       | 2350/10000 [6:46:23<22:04:03, 10.38s/it]

Episode: 2350, score: 2.709500, window mean: 2.260015


 24%|██▎       | 2360/10000 [6:48:09<23:17:20, 10.97s/it]

Episode: 2360, score: 2.182000, window mean: 2.248325


 24%|██▎       | 2368/10000 [6:49:37<22:00:11, 10.38s/it]


KeyboardInterrupt: 
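Training was stopped with `KeyboardInterrupt`, so `ppo.train_ppo` never returned and the assignment to `mean_rewards` did not happen. The printed progress lines are the only record of this run. The sketch below recovers them for plotting. It assumes the cell output has been copied into a string. `parse_training_log` is a hypothetical helper, and the pattern matches only the `Episode: N, score: X, window mean: Y` format shown above.

```python
import re
from typing import List, Tuple

# Matches lines such as "Episode: 10, score: 1.518500, window mean: 1.286850".
_LOG_LINE = re.compile(
    r"Episode:\s*(\d+),\s*score:\s*([-\d.]+),\s*window mean:\s*([-\d.]+)")


def parse_training_log(text: str) -> Tuple[List[int], List[float], List[float]]:
    """Extract (episodes, scores, window_means) from printed training output."""
    episodes, scores, window_means = [], [], []
    for match in _LOG_LINE.finditer(text):
        episodes.append(int(match.group(1)))
        scores.append(float(match.group(2)))
        window_means.append(float(match.group(3)))
    return episodes, scores, window_means
```

The tqdm progress-bar lines do not match the pattern and are skipped. You could then pass the recovered lists to the already-imported `matplotlib.pyplot`, for example `plt.plot(episodes, window_means)`.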

In [None]:
agent.save('checkpoint.pth')
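The log reports a "window mean" but does not state the window length, and the notebook gives no stopping target. The sketch below assumes a 100-episode window and the Udacity project's solve criterion of an average score of +30 over 100 consecutive episodes, with each episode's score already averaged over the agents. Treat both numbers as assumptions, and treat `rolling_means` and `first_solved_episode` as hypothetical names.

```python
from collections import deque
from typing import Iterable, List, Optional


def rolling_means(scores: Iterable[float], window: int = 100) -> List[float]:
    """Mean of the last `window` scores after each episode (fewer at the start)."""
    recent = deque(maxlen=window)
    means = []
    for s in scores:
        recent.append(s)
        means.append(sum(recent) / len(recent))
    return means


def first_solved_episode(scores: Iterable[float], window: int = 100,
                         target: float = 30.0) -> Optional[int]:
    """1-based episode at which a full window of scores first averages >= target."""
    recent = deque(maxlen=window)
    for episode, s in enumerate(scores, start=1):
        recent.append(s)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

This check requires a full window before it declares success, so a few high-scoring early episodes cannot trigger it.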