# Continuous Control

---

Congratulations on completing the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program!  In this notebook, you will learn how to control an agent in a more challenging environment, where the goal is to train a creature with four arms to walk forward.  **Note that this exercise is optional!**

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Crawler.app"`
- **Windows** (x86): `"path/to/Crawler_Windows_x86/Crawler.exe"`
- **Windows** (x86_64): `"path/to/Crawler_Windows_x86_64/Crawler.exe"`
- **Linux** (x86): `"path/to/Crawler_Linux/Crawler.x86"`
- **Linux** (x86_64): `"path/to/Crawler_Linux/Crawler.x86_64"`
- **Linux** (x86, headless): `"path/to/Crawler_Linux_NoVis/Crawler.x86"`
- **Linux** (x86_64, headless): `"path/to/Crawler_Linux_NoVis/Crawler.x86_64"`

For instance, if you are using a Mac, then you downloaded `Crawler.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Crawler.app")
```

In [2]:
env = UnityEnvironment(file_name='d:/DRL/app/Crawler_Windows_x86_64/Crawler.exe')

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: CrawlerBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 129
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 20
        Vector Action descriptions: , , , , , , , , , , , , , , , , , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [4]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

Run the code cell below to print some information about the environment.

In [5]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 12
Size of each action: 20
There are 12 agents. Each observes a state with length: 129
The state for the first agent looks like: [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  2.25000000e+00
  1.00000000e+00  0.00000000e+00  1.78813934e-07  0.00000000e+00
  1.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  6.06093168e-01 -1.42857209e-01 -6.06078804e-01  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  1.33339906e+00 -1.42857209e-01
 -1.33341408e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
 -6.0609

### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you can watch how the agent performs when it selects an action at random at each time step.  A window should pop up that lets you observe the agent as it moves through the environment.

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
states = env_info.vector_observations                  # get the current state (for each agent)
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
while True:
    actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += env_info.rewards                         # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))

Total score (averaged over agents) this episode: 0.8618722101673484
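
The random policy above draws each action from a standard normal distribution and then clips it to the range [-1, 1]. The sketch below reproduces only that sampling step, without the Unity environment, to show the resulting shape and how many entries the clip changes. The agent count and action size are taken from the printed output above; the seeded generator is an assumption added so the sketch is repeatable.

```python
import numpy as np

# Shapes taken from the cell output above: 12 agents, 20 action dimensions.
num_agents, action_size = 12, 20

rng = np.random.default_rng(0)                          # seeded only for repeatability
raw = rng.standard_normal((num_agents, action_size))    # same distribution as np.random.randn
actions = np.clip(raw, -1, 1)                           # out-of-range entries become -1 or 1

saturated = np.mean(np.abs(raw) > 1)                    # fraction of entries changed by the clip
print('shape:', actions.shape)
print('min/max:', actions.min(), actions.max())
print('fraction saturated: {:.2f}'.format(saturated))
```

Roughly a third of standard normal samples fall outside [-1, 1], so a noticeable share of the random actions sit exactly on the boundary. Sampling with `rng.uniform(-1, 1, size=(num_agents, action_size))` would be one alternative if evenly spread random actions are preferred.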


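When finished, you can close the environment with `env.close()`.  Keep it open for now, since the training in the next section still needs it; the closing cell is at the end of the notebook.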

### 4. Take Actions with DDPG

In [6]:
import torch
from collections import deque
from ddpg_agent import Agent
import matplotlib.pyplot as plt
%matplotlib inline
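
The `ddpg_agent` module is not included in this notebook. Judging only from the calls made in the training function below, `Agent` needs a constructor taking `state_size`, `action_size` and `random_seed`, plus the methods `reset()`, `act(state)` and `step(t, state, action, reward, next_state, done)`. The class below is a hypothetical stand-in with that interface, useful for testing the training loop; it acts at random and is not the actual DDPG implementation.

```python
import numpy as np

class RandomAgent:
    """Hypothetical stand-in exposing the interface the training loop calls."""

    def __init__(self, state_size, action_size, random_seed):
        self.state_size = state_size
        self.action_size = action_size
        self.rng = np.random.default_rng(random_seed)
        self.memory = []  # placeholder for a replay buffer

    def reset(self):
        """Called once per episode; a DDPG agent would typically reset its exploration noise."""
        pass

    def act(self, state):
        """Return one action vector in [-1, 1] for a single agent's state."""
        return self.rng.uniform(-1, 1, size=self.action_size)

    def step(self, t, state, action, reward, next_state, done):
        """Store one transition; a learning agent would also update its networks here."""
        self.memory.append((state, action, reward, next_state, done))


# Sizes taken from the environment summary printed earlier in the notebook.
agent = RandomAgent(state_size=129, action_size=20, random_seed=0)
agent.reset()
state = np.zeros(129)
action = agent.act(state)
agent.step(0, state, action, 0.0, state, False)
print(action.shape, len(agent.memory))
```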

In [7]:
def ddpg(n_episodes=18000, max_t=1000, target_score=15):
    """ Deep Deterministic Policy Gradients
    Params
    ======
        n_episodes (int): maximum number of training episodes
        max_t (int): maximum number of timesteps per episode
        target_score (float): average score over the last 100 episodes at which training stops
    """
    scores_window = deque(maxlen=100)
    scores = np.zeros(num_agents)
    scores_episode = []
    
    agents =[] 
    
    for i in range(num_agents):
        agents.append(Agent(state_size, action_size, random_seed=0))
    
    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations
        
        for agent in agents:
            agent.reset()
            
        scores = np.zeros(num_agents)
            
        for t in range(max_t):
            actions = np.array([agents[i].act(states[i]) for i in range(num_agents)])
            env_info = env.step(actions)[brain_name]        # send the action to the environment
            next_states = env_info.vector_observations     # get the next state
            rewards = env_info.rewards                     # get the reward
            dones = env_info.local_done        
            
            for i in range(num_agents):
                agents[i].step(t,states[i], actions[i], rewards[i], next_states[i], dones[i]) 
 
            states = next_states
            scores += rewards
            if t % 20 == 0:                                 # report progress every 20 timesteps
                print('\rTimestep {}\tScore: {:.2f}\tmin: {:.2f}\tmax: {:.2f}'
                      .format(t, np.mean(scores), np.min(scores), np.max(scores)), end="") 
            if np.any(dones):
                break 
        score = np.mean(scores)
        scores_window.append(score)       # save most recent score
        scores_episode.append(score)

        print('\rEpisode {}\tScore: {:.2f}\tAverage Score: {:.2f}'.format(i_episode, score, np.mean(scores_window)), end="\n")
        if i_episode % 100 == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
        if np.mean(scores_window)>=target_score:
            print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_window)))
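            for i, agent in enumerate(agents):              # save each instance; Agent is the class itself
                torch.save(agent.actor_local.state_dict(), 'models/crawler_actor_{}.pth'.format(i))
                torch.save(agent.critic_local.state_dict(), 'models/crawler_critic_{}.pth'.format(i))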
            break
            
    return scores_episode

scores = ddpg()


Initialising ReplayBuffer
Episode 1	Score: 2.01	Average Score: 2.0121
Episode 2	Score: 1.19	Average Score: 1.6054
Episode 3	Score: 2.38	Average Score: 1.8685
Episode 4	Score: 1.91	Average Score: 1.8730
Episode 5	Score: 2.08	Average Score: 1.9271
Episode 6	Score: 2.20	Average Score: 1.9649
Episode 7	Score: 2.26	Average Score: 2.0080
Episode 8	Score: 2.15	Average Score: 2.0272
Episode 9	Score: 2.04	Average Score: 2.0243
Episode 10	Score: 2.25	Average Score: 2.055
Episode 11	Score: 2.14	Average Score: 2.067
Episode 12	Score: 2.29	Average Score: 2.082
Episode 13	Score: 2.10	Average Score: 2.087
Episode 14	Score: 2.20	Average Score: 2.098
Episode 15	Score: 1.98	Average Score: 2.080
Episode 16	Score: 2.26	Average Score: 2.095
Episode 17	Score: 2.22	Average Score: 2.105
Episode 18	Score: 2.07	Average Score: 2.102
Episode 19	Score: 2.47	Average Score: 2.125
Episode 20	Score: 2.09	Average Score: 2.125
Episode 21	Score: 2.23	Average Score: 2.126
Episode 22	Score: 1.90	Average Score: 2.117
Episod

Episode 186	Score: 2.99	Average Score: 1.99
Episode 187	Score: 2.80	Average Score: 2.01
Episode 188	Score: 2.81	Average Score: 2.03
Episode 189	Score: 2.81	Average Score: 2.04
Episode 190	Score: 2.94	Average Score: 2.06
Episode 191	Score: 2.77	Average Score: 2.08
Episode 192	Score: 2.74	Average Score: 2.10
Episode 193	Score: 2.71	Average Score: 2.12
Episode 194	Score: 2.76	Average Score: 2.13
Episode 195	Score: 2.77	Average Score: 2.15
Episode 196	Score: 3.00	Average Score: 2.16
Episode 197	Score: 2.70	Average Score: 2.17
Episode 198	Score: 2.99	Average Score: 2.19
Episode 199	Score: 2.90	Average Score: 2.20
Episode 200	Score: 2.96	Average Score: 2.22
Episode 200	Average Score: 2.22
Episode 201	Score: 3.00	Average Score: 2.23
Episode 202	Score: 2.84	Average Score: 2.25
Episode 203	Score: 3.08	Average Score: 2.26
Episode 204	Score: 2.93	Average Score: 2.28
Episode 205	Score: 3.15	Average Score: 2.29
Episode 206	Score: 3.16	Average Score: 2.31
Episode 207	Score: 2.52	Average Score: 2.32


Episode 370	Score: 5.10	Average Score: 6.70
Episode 371	Score: 4.47	Average Score: 6.68
Episode 372	Score: 6.48	Average Score: 6.68
Episode 373	Score: 6.17	Average Score: 6.69
Episode 374	Score: 6.02	Average Score: 6.69
Episode 375	Score: 6.24	Average Score: 6.69
Episode 376	Score: 9.52	Average Score: 6.744
Episode 377	Score: 7.09	Average Score: 6.77
Episode 378	Score: 8.41	Average Score: 6.80
Episode 379	Score: 8.45	Average Score: 6.841
Episode 380	Score: 5.18	Average Score: 6.84
Episode 381	Score: 4.64	Average Score: 6.84
Episode 382	Score: 4.57	Average Score: 6.84
Episode 383	Score: 7.74	Average Score: 6.90
Episode 384	Score: 4.48	Average Score: 6.94
Episode 385	Score: 6.58	Average Score: 6.94
Episode 386	Score: 9.31	Average Score: 6.974
Episode 387	Score: 6.52	Average Score: 6.98
Episode 388	Score: 10.28	Average Score: 7.017
Episode 389	Score: 10.54	Average Score: 7.042
Episode 390	Score: 10.50	Average Score: 7.084
Episode 391	Score: 10.89	Average Score: 7.121
Episode 392	Score: 9.

Episode 723	Score: 16.60	Average Score: 16.127
Episode 724	Score: 15.18	Average Score: 16.10
Episode 725	Score: 21.21	Average Score: 16.169
Episode 726	Score: 17.04	Average Score: 16.192
Episode 727	Score: 16.84	Average Score: 16.202
Episode 728	Score: 16.56	Average Score: 16.212
Episode 729	Score: 16.15	Average Score: 16.213
Episode 730	Score: 14.20	Average Score: 16.204
Episode 731	Score: 14.18	Average Score: 16.183
Episode 732	Score: 16.41	Average Score: 16.192
Episode 733	Score: 17.08	Average Score: 16.202
Episode 734	Score: 19.18	Average Score: 16.231
Episode 735	Score: 12.74	Average Score: 16.226
Episode 736	Score: 23.20	Average Score: 16.296
Episode 737	Score: 18.16	Average Score: 16.302
Episode 738	Score: 16.67	Average Score: 16.264
Episode 739	Score: 22.85	Average Score: 16.330
Episode 740	Score: 14.11	Average Score: 16.320
Episode 741	Score: 17.78	Average Score: 16.344
Episode 742	Score: 16.92	Average Score: 16.343
Episode 743	Score: 15.27	Average Score: 16.344
Episode 744	Sc

Episode 901	Score: 0.12	Average Score: 4.06
Episode 902	Score: 0.03	Average Score: 4.00
Episode 903	Score: 0.24	Average Score: 3.94
Episode 904	Score: -0.04	Average Score: 3.94
Episode 905	Score: 0.35	Average Score: 3.86
Episode 906	Score: 0.36	Average Score: 3.81
Episode 907	Score: 0.40	Average Score: 3.76
Episode 908	Score: 0.12	Average Score: 3.71
Episode 909	Score: 6.18	Average Score: 3.754
Episode 910	Score: 14.30	Average Score: 3.9242
Episode 911	Score: 1.05	Average Score: 3.93
Episode 912	Score: 3.57	Average Score: 3.99
Episode 913	Score: 10.58	Average Score: 4.1374
Episode 914	Score: 2.72	Average Score: 4.14
Episode 915	Score: 4.88	Average Score: 4.178
Episode 916	Score: 3.63	Average Score: 4.212
Episode 917	Score: 2.02	Average Score: 4.23
Episode 918	Score: 5.06	Average Score: 4.299
Episode 919	Score: -0.56	Average Score: 4.271
Episode 920	Score: -5.69	Average Score: 4.2297
Episode 921	Score: -4.34	Average Score: 4.180
Episode 922	Score: -4.01	Average Score: 4.1325
Episode 923

Episode 1259	Score: 3.95	Average Score: 5.07
Episode 1260	Score: 4.45	Average Score: 5.09
Episode 1261	Score: 3.92	Average Score: 5.092
Episode 1262	Score: 4.39	Average Score: 5.10
Episode 1263	Score: 3.19	Average Score: 5.09
Episode 1264	Score: 4.30	Average Score: 5.04
Episode 1265	Score: 1.67	Average Score: 4.99
Episode 1266	Score: 3.48	Average Score: 4.95
Episode 1267	Score: 6.10	Average Score: 4.95
Episode 1268	Score: 4.85	Average Score: 4.88
Episode 1269	Score: 4.11	Average Score: 4.89
Episode 1270	Score: 5.36	Average Score: 4.904
Episode 1271	Score: 3.93	Average Score: 4.89
Episode 1272	Score: 2.56	Average Score: 4.86
Episode 1273	Score: 4.37	Average Score: 4.820
Episode 1274	Score: 2.45	Average Score: 4.80
Episode 1275	Score: 5.04	Average Score: 4.81
Episode 1276	Score: 3.34	Average Score: 4.78
Episode 1277	Score: 4.15	Average Score: 4.77
Episode 1278	Score: 6.98	Average Score: 4.672
Episode 1279	Score: 4.15	Average Score: 4.632
Episode 1280	Score: 5.01	Average Score: 4.656
Epis

Episode 1617	Score: -0.39	Average Score: 2.35
Episode 1618	Score: 3.11	Average Score: 2.35
Episode 1619	Score: 1.72	Average Score: 2.35
Episode 1620	Score: 1.10	Average Score: 2.32
Episode 1621	Score: 2.00	Average Score: 2.31
Episode 1622	Score: 1.36	Average Score: 2.294
Episode 1623	Score: 2.22	Average Score: 2.28
Episode 1624	Score: -0.82	Average Score: 2.25
Episode 1625	Score: 1.86	Average Score: 2.23
Episode 1626	Score: 1.76	Average Score: 2.21
Episode 1627	Score: -0.14	Average Score: 2.176
Episode 1628	Score: -2.73	Average Score: 2.120
Episode 1629	Score: 1.74	Average Score: 2.10
Episode 1630	Score: 2.17	Average Score: 2.09
Episode 1631	Score: 1.98	Average Score: 2.07
Episode 1632	Score: 1.27	Average Score: 2.06
Episode 1633	Score: 1.62	Average Score: 2.04
Episode 1634	Score: 1.60	Average Score: 2.02
Episode 1635	Score: 1.92	Average Score: 2.02
Episode 1636	Score: 0.69	Average Score: 1.99
Episode 1637	Score: 0.38	Average Score: 1.975
Episode 1638	Score: 1.57	Average Score: 1.95
Ep

KeyboardInterrupt: 
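
The training function keeps recent episode scores in a `deque(maxlen=100)` and stops as soon as their mean reaches `target_score`. The sketch below isolates that stopping rule with the standard library only; the helper name `episodes_to_solve` and the toy scores are made up for illustration and do not come from the training run above.

```python
from collections import deque

def episodes_to_solve(episode_scores, target_score=15, window=100):
    """Return the 1-based episode at which the moving average first
    reaches target_score, or None if it never does."""
    scores_window = deque(maxlen=window)  # the oldest score is dropped automatically
    for i_episode, score in enumerate(episode_scores, start=1):
        scores_window.append(score)
        average = sum(scores_window) / len(scores_window)
        if average >= target_score:
            return i_episode
    return None

# Made-up scores for illustration only: three low episodes, then high ones.
toy_scores = [0, 0, 0, 30, 30, 30]
print(episodes_to_solve(toy_scores, target_score=15, window=3))

# With a partly filled window the mean covers fewer episodes,
# so a single lucky first episode already satisfies the rule.
print(episodes_to_solve([20], target_score=15, window=100))
```

If the criterion should only apply to a full window, one option is to also require `len(scores_window) == window` before comparing against the target.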

### 5. Plot the Scores

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()
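
Per-episode scores are noisy, so a trailing average can make the trend easier to read. The helper below is a minimal sketch of one way to compute it with NumPy; the name `moving_average` is an assumption and not part of the original notebook. For the first episodes, where fewer than `window` scores exist, it averages over what is available, which matches how the "Average Score" column is computed during training.

```python
import numpy as np

def moving_average(scores, window=100):
    """Trailing mean over up to `window` most recent scores, one value per episode."""
    scores = np.asarray(scores, dtype=float)
    cumsum = np.cumsum(np.insert(scores, 0, 0.0))   # cumsum[k] = sum of the first k scores
    idx = np.arange(1, len(scores) + 1)             # end index (exclusive) of each window
    start = np.maximum(idx - window, 0)             # start index, clamped at the first episode
    return (cumsum[idx] - cumsum[start]) / (idx - start)

toy = [1, 2, 3, 4, 5]                               # made-up scores for illustration
print(moving_average(toy, window=2))
```

The result has the same length as `scores`, so it could be overlaid on the plot above with an extra `plt.plot(moving_average(scores))` call before `plt.show()`.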

In [None]:
env.close()