# Udacity Deep Reinforcement Learning Nanodegree Continuous Control Project

# Method

<img src="https://static.wixstatic.com/media/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png/v1/fill/w_240,h_216/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png">

## Neural network architecture

<img src="https://static.wixstatic.com/media/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png/v1/fill/w_240,h_216/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png">

## Hyper-parameters

<img src="https://static.wixstatic.com/media/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png/v1/fill/w_240,h_216/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png">

# Implementation

## Initial setup

Please follow the instructions in `README.md` (located in the same folder as this notebook) to install the required packages and download the Unity environment.

## Imports

The following packages are required for the notebook to work.

In [1]:
import numpy as np

from IPython.display import Audio
from unityagents import UnityEnvironment

## Unity Environment

Modify the next line so that the `file_name` parameter matches the location of the Unity environment you downloaded.
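Before launching, it can help to confirm that the path actually points at something on disk, since a wrong `file_name` is an easy mistake to make. The sketch below is only an illustration: `check_env_path` is a hypothetical helper, not part of `unityagents`, and the path is simply the one used in this notebook.

```python
from pathlib import Path


def check_env_path(file_name):
    """Hypothetical helper: return True if something exists at file_name."""
    return Path(file_name).exists()


# Path used in this notebook; adjust it to wherever you unpacked the environment.
env_path = "Reacher_Windows_x86_64/Reacher.exe"

if not check_env_path(env_path):
    print("Unity environment not found at {!r}; see README.md for the download step.".format(env_path))
```

If the message is printed, fix the path before running the next cell.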

In [2]:
env = UnityEnvironment(file_name="Reacher_Windows_x86_64/Reacher.exe")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


An empty Unity window should have opened. If you switch to it during training or testing, you will be able to watch the agent in action.

**Note**: if the environment is set to training mode, the simulation runs so fast that it may be uncomfortable to watch.

Once the environment is launched, we need to connect to it via a *brain name*. In some Unity environments, there is more than one brain/agent to control, but in the present case, there is only one.

In [3]:
# Connect to the default (and only) brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

## Random Agent

The following random agent was provided by Udacity as a way to check that the environment is properly set up.

In [4]:
env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
num_agents = len(env_info.agents)                      # number of agents
action_size = brain.vector_action_space_size           # size of each action
states = env_info.vector_observations                  # get the current state (for each agent)
scores = np.zeros(num_agents)                          # initialize the score (for each agent)
while True:
    actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
    actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
    env_info = env.step(actions)[brain_name]           # send all actions to the environment
    next_states = env_info.vector_observations         # get next state (for each agent)
    rewards = env_info.rewards                         # get reward (for each agent)
    dones = env_info.local_done                        # see if episode finished
    scores += env_info.rewards                         # update the score (for each agent)
    states = next_states                               # roll over states to next time step
    if np.any(dones):                                  # exit loop if episode finished
        break
print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))
Audio('notification.wav', autoplay=True)

Total score (averaged over agents) this episode: 0.15949999643489718
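The random agent draws each action component from a standard normal distribution and then clips it to [-1, 1]. As a rough, NumPy-only illustration of what that sampling scheme produces (no Unity environment involved), the sketch below draws a large batch of such actions. `n_samples` and the fixed seed are assumptions made for the illustration; `action_size = 4` is taken from the environment log above.

```python
import numpy as np

action_size = 4        # from the environment log: action space size per agent
n_samples = 100000     # assumption: large batch, only to estimate the fraction

rng = np.random.RandomState(0)                  # fixed seed for repeatability
raw = rng.randn(n_samples, action_size)         # same distribution as np.random.randn
actions = np.clip(raw, -1, 1)                   # same clipping as the random agent

# Fraction of components that were pushed onto a bound (-1 or 1) by the clip.
clipped_fraction = np.mean(np.abs(actions) == 1.0)
print("Fraction of clipped components: {:.3f}".format(clipped_fraction))
```

For a standard normal, the probability of falling outside [-1, 1] is about 0.317, so roughly a third of the action components end up exactly on a bound. This is harmless for a sanity check of the environment, but worth keeping in mind if the same scheme is reused for exploration.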


## Closing the Unity Environment

In [5]:
env.close()
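If a cell raises an exception before `env.close()` is reached, the Unity window may be left open. One possible way to guard against that is `contextlib.closing` from the standard library, which calls `close()` on exit even when an error occurs. The sketch below demonstrates the pattern with `DummyEnv`, a hypothetical stand-in that only models `close()`; it does not use `unityagents` and is not how this notebook manages the environment.

```python
from contextlib import closing


class DummyEnv:
    """Hypothetical stand-in for the environment; only close() is modelled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


env_stub = DummyEnv()
try:
    with closing(env_stub):
        # Simulate a failure in the middle of training or testing.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(env_stub.closed)  # close() was still called despite the exception
```

The same pattern would presumably apply to any object exposing a `close()` method, which is all `closing` requires.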

# Conclusion

<img src="https://static.wixstatic.com/media/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png/v1/fill/w_240,h_216/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png">

# References

<img src="https://static.wixstatic.com/media/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png/v1/fill/w_240,h_216/eebf4a_fea65753056c41ee9c02e4d402ef922c~mv2.png">

# Credits

- The neural network architecture diagram was generated with [NN SVG](http://alexlenail.me/NN-SVG/index.html) then edited with [Inkscape](https://inkscape.org/)
- The GIF recordings were made with [ScreenToGif](https://www.screentogif.com/)
- The notification sound was found on [WavSource.com](http://www.wavsource.com/video_games/pac-man.htm)