# Continuous Control

---

You are welcome to use this coding environment to train your agent for the project.  Follow the instructions below to get started!

### 1. Start the Environment

Run the next code cell to import the required packages.

In [1]:
from unityagents import UnityEnvironment
import numpy as np

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  A few **important notes**:
- When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
- To structure your work, you're welcome to work directly in this Jupyter notebook, or you might like to start over with a new file!  You can see the list of files in the workspace by clicking on **_Jupyter_** in the top left corner of the notebook.
- In this coding environment, you will not be able to watch the agents while they are training.  However, **_after training the agents_**, you can download the saved model weights to watch the agents on your own machine! 

In [2]:
from unityagents import UnityEnvironment
import numpy as np
from ddpg_agent import Agent
from collections import deque

my_env = UnityEnvironment(file_name='./Reacher_Linux/Reacher.x86_64')
brain_name = my_env.brain_names[0]
brain = my_env.brains[brain_name]
env_info = my_env.reset(train_mode=True)[brain_name]

# number of agents
print('Number of agents:', len(env_info.agents))

# size of each action
print('Size of each action:', brain.vector_action_space_size)

# size of each state (observation) vector
print('Number of states:', env_info.vector_observations.shape[1])

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Number of agents: 20
Size of each action: 4
Number of states: 33
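
Before training, it can help to sanity-check the shapes reported above. The sketch below builds one batch of random actions for 20 agents with 4 action values each, using only the standard library. The helper name `random_actions` and the clipping range of [-1, 1] are assumptions made for illustration; the range is not stated in the output above.

```python
import random

def random_actions(num_agents, action_size, low=-1.0, high=1.0, seed=None):
    """Return one random action vector per agent as a list of lists.

    Hypothetical helper: the [low, high] range is an assumption, not taken
    from the environment output.
    """
    rng = random.Random(seed)
    actions = []
    for _ in range(num_agents):
        # Sample from a standard normal, then clip into [low, high].
        row = [min(high, max(low, rng.gauss(0.0, 1.0))) for _ in range(action_size)]
        actions.append(row)
    return actions

# 20 agents and 4 action values per agent, as printed by the cell above.
actions = random_actions(num_agents=20, action_size=4, seed=0)
print(len(actions), len(actions[0]))
```

A batch like this could stand in for `agent.act(states)` when checking that the environment loop runs at all. Converting it with `np.array(actions)` first is probably the safer choice, since whether the environment accepts plain lists has not been verified here.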


In [3]:
def my_ddpg(my_env, agent, n_episodes=1000, max_t=300, print_every=100):
    """Train `agent` on the multi-agent environment; return the mean score of each episode."""
    scores_deque = deque(maxlen=print_every)               # scores of the last `print_every` episodes
    all_scores = []

    for i_episode in range(1, n_episodes+1):
        env_info = my_env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations              # get current state (for each agent)
        agent.reset()
        scores = np.zeros(len(env_info.agents))            # initialize the score (for each agent)

        for t in range(max_t):
            actions = agent.act(states)
            env_info = my_env.step(actions)[brain_name]        # send all actions to the environment
            next_states = env_info.vector_observations         # get next state (for each agent)
            rewards = env_info.rewards                         # get reward (for each agent)
            dones = env_info.local_done                        # see if episode finished
            scores += rewards                                  # update the score (for each agent)
            # agent.step() takes five arguments besides self. Passing the timestep `t`
            # as an extra argument raised:
            #   TypeError: step() takes 6 positional arguments but 7 were given
            for state, action, reward, next_state, done in zip(states, actions, rewards, next_states, dones):
                agent.step(state, action, reward, next_state, done)
            states = next_states
            if np.any(dones):
                break

        mean_score = np.mean(scores)                       # episode score averaged over all agents
        scores_deque.append(mean_score)
        all_scores.append(mean_score)
        # Report the average over the last `print_every` episodes.
        print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)), end="")
        # To save checkpoints, add `import torch` and uncomment:
        #torch.save(agent.actor_local.state_dict(), 'checkpoint_actor.pth')
        #torch.save(agent.critic_local.state_dict(), 'checkpoint_critic.pth')
        if i_episode % print_every == 0:
            print('\rEpisode {}\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)))

    return all_scores

agent = Agent(state_size=env_info.vector_observations.shape[1],
              action_size=brain.vector_action_space_size, random_seed=2)
scores = my_ddpg(my_env, agent)
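
The `TypeError` reported for `agent.step` counts `self` among the positional arguments: a method with five parameters besides `self` takes 6, and passing the extra timestep `t` makes 7. The sketch below reproduces this with a stand-in class. `DummyAgent` is hypothetical and only mirrors the signature implied by the error message, not the real `ddpg_agent.Agent`.

```python
class DummyAgent:
    """Hypothetical stand-in mirroring the signature implied by the error."""

    def __init__(self):
        self.memory = []

    def step(self, state, action, reward, next_state, done):
        # Store one experience tuple; a real agent would also learn from it.
        self.memory.append((state, action, reward, next_state, done))

agent = DummyAgent()
t = 0

try:
    # Seven positional arguments once `self` is counted, so this raises TypeError.
    agent.step([0.0], [0.0], 0.0, [0.0], False, t)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Dropping `t` matches the signature and succeeds.
agent.step([0.0], [0.0], 0.0, [0.0], False)

print(error_message)
print(len(agent.memory))
```

If the timestep is actually needed (for example, to learn only every few steps), the alternative is to add a `t` parameter to the agent's own `step` method rather than removing it from the call.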

In [1]:
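# matplotlib was never imported, which raised: NameError: name 'plt' is not defined
import matplotlib.pyplot as plt

# Assumes `np` and `scores` from the cells above are still defined in the kernel.
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(1, len(scores)+1), scores)
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.show()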
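
Per-episode scores can be noisy, so a rolling mean is often plotted alongside the raw curve. The sketch below computes one with `collections.deque`, which the training cell already imports. The function name `rolling_mean` and the sample scores are made up for illustration; they are not results from this notebook.

```python
from collections import deque

def rolling_mean(scores, window=100):
    """Return the mean of the last `window` scores at each episode."""
    recent = deque(maxlen=window)   # drops the oldest score once full
    means = []
    for score in scores:
        recent.append(score)
        means.append(sum(recent) / len(recent))
    return means

# Illustrative values only, with a small window to keep the example short.
sample_scores = [1.0, 2.0, 3.0, 4.0]
smoothed = rolling_mean(sample_scores, window=2)
print(smoothed)  # [1.0, 1.5, 2.5, 3.5]
```

With real training results, `rolling_mean(scores)` could be passed to a second `plt.plot` call over the same episode axis.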