# Description of Implementation
## Learning Algorithm
* ### Algorithm - Deep Deterministic Policy Gradients (DDPG) Implementation
This project implements [Deep Deterministic Policy Gradients](https://arxiv.org/abs/1509.02971) (DDPG), a model-free, actor-critic algorithm based on the deterministic policy gradient that can operate over continuous action spaces.

![DDPG Algorithm](images/DDPG.png)
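
As a rough illustration of two building blocks of DDPG, the sketch below computes the critic's bootstrapped target and the soft update of target-network parameters on plain Python floats. The function names `td_target` and `soft_update`, and the use of flat lists in place of network tensors, are assumptions made for illustration and are not taken from this project's code. The default values for `gamma` and `tau` match the hyperparameters listed in this document.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic target: y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    # `next_q` stands in for the target critic's value of the target actor's action.
    return reward + gamma * next_q * (1.0 - float(done))


def soft_update(local_params, target_params, tau=1e-3):
    """Blend local parameters into the targets: t <- tau * l + (1 - tau) * t."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]
```

With a small `tau`, the target parameters move only slightly toward the local ones on each call, which is what keeps the critic's targets changing slowly.
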
* ### Hyperparameters
    * Replay buffer size
      ```
      BUFFER_SIZE = int(1e5)
      ```
    * Minibatch size
      ```
      BATCH_SIZE = 250
      ```
    * Discount factor
      ```
      GAMMA = 0.99
      ```
    * Interpolation factor for the soft update of target parameters
      ```
      TAU = 1e-3
      ```
    * Learning rate of the actor
      ```
      LR_ACTOR = 1e-4
      ```
    * Learning rate of the critic
      ```
      LR_CRITIC = 1e-3
      ```
    * L2 weight decay
      ```
      WEIGHT_DECAY = 0
      ```
    * Ornstein-Uhlenbeck (OU) noise process
      ```
      mu = 0.
      theta = 0.15
      sigma = 0.2
      ```
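
To show how the OU parameters interact, here is a minimal sketch of an Ornstein-Uhlenbeck noise process using only the standard library. The class name `OUNoise`, its method names, and the choice of standard normal increments are assumptions; this document does not state how the project draws its random increments.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, size, seed=0, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = [mu] * size
        self.theta = theta  # pull strength back toward the mean
        self.sigma = sigma  # scale of the random increments
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Reset the internal state to the long-run mean."""
        self.state = list(self.mu)

    def sample(self):
        """Advance one step: x <- x + theta * (mu - x) + sigma * N(0, 1)."""
        self.state = [
            x + self.theta * (m - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x, m in zip(self.state, self.mu)
        ]
        return list(self.state)
```

A larger `theta` pulls the noise back to `mu` faster, while a larger `sigma` makes each step more random.
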
* ### Accelerate Training

    * In `ddpg_agent.py`, the agent learns only when enough samples are available in memory:
    ```
    if len(self.memory) > BATCH_SIZE:
        experiences = self.memory.sample()
        self.learn(experiences, GAMMA)
    ```
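
The check above relies on the memory object supporting `len()` and `sample()`. A minimal sketch of such a buffer is given below, assuming a fixed-size deque with uniform sampling; the `Experience` tuple, the `seed` argument, and the `maybe_learn` helper are hypothetical names introduced here, not the project's own code.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size buffer; the oldest experiences are dropped once it is full."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw a uniform random minibatch without replacement."""
        return self.rng.sample(list(self.memory), self.batch_size)

    def __len__(self):
        return len(self.memory)


def maybe_learn(memory, learn_fn):
    """Call `learn_fn` on a minibatch only if the buffer holds more than one batch."""
    if len(memory) > memory.batch_size:
        learn_fn(memory.sample())
        return True
    return False
```
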
   
* ### Neural Network Architecture
    The actor network takes states as input and outputs actions; the critic takes the states and actions of all agents as input and outputs a Q-value estimate.
    * Actor
    ```
    fc1_units = 200
    fc2_units = 150
    # ReLU activations are applied
    # The final output is generated through tanh
    ```
    * Critic
    ```
    # For multiple agents, the first layer takes the states and actions of all agents as input
    self.fcs1 = nn.Linear((state_size + action_size) * num_agents, fcs1_units)
    fcs1_units = 200
    fc2_units = 150
    # ReLU activations are applied
    ```
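
The actor described above can be sketched as a forward pass in plain Python, without any deep learning library. Everything here is illustrative: the helper names, the weight initialization, and the assumption that ReLU follows each of the two hidden layers are not confirmed by this document. `critic_input_size` simply evaluates the expression used for the critic's first layer.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (the scheme is an assumption)."""
    limit = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_actor(state_size, action_size, fc1_units=200, fc2_units=150, seed=0):
    """Build the three layers: state -> fc1 -> fc2 -> action."""
    rng = random.Random(seed)
    sizes = [state_size, fc1_units, fc2_units, action_size]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def actor_forward(layers, state):
    """ReLU after each hidden layer, tanh on the output so actions lie in [-1, 1]."""
    x = state
    for weights, bias in layers[:-1]:
        x = [max(0.0, v) for v in dense(x, weights, bias)]
    weights, bias = layers[-1]
    return [math.tanh(v) for v in dense(x, weights, bias)]


def critic_input_size(state_size, action_size, num_agents):
    """Width of the critic's input when it sees every agent's state and action."""
    return (state_size + action_size) * num_agents
```

For example, with hypothetical sizes `state_size=24`, `action_size=2` and `num_agents=2`, `actor_forward(make_actor(24, 2), [0.1] * 24)` returns two values in [-1, 1], and `critic_input_size(24, 2, 2)` evaluates to 52.
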
* ### Setup for Multiple Agents
    * Define a shared replay buffer that stores the experiences of all agents
       ```
       sharedBuffer = ReplayBuffer(BUFFER_SIZE, BATCH_SIZE)
       ```
    * Initialize multiple agents
       ```
       self.agents = [DDPGAgent(state_size, action_size, random_seed) for _ in range(num_agents)]
       ```
    * Add the current (states, actions, rewards, next_states, dones) to `sharedBuffer`
       ```
       sharedBuffer.add(states, actions, rewards, next_states, dones)
       ```
    * Let each agent choose its action from its own state
       ```
       for index, agent in enumerate(self.agents):
            actions[index, :] = agent.act(states[index], add_noise)
       ```
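
The pieces above can be tied together in a small wrapper. The sketch below is only an approximation of that structure: `StubAgent` is a hypothetical stand-in for `DDPGAgent`, a plain list stands in for the shared `ReplayBuffer`, and the `MultiAgent` class name and `step` method are assumptions.

```python
class StubAgent:
    """Hypothetical stand-in for DDPGAgent: scales the first state entries."""

    def __init__(self, action_size, gain):
        self.action_size = action_size
        self.gain = gain

    def act(self, state, add_noise=True):
        raw = [self.gain * s for s in state[: self.action_size]]
        # Clip to the valid action range, mirroring np.clip(action, -1, 1).
        return [max(-1.0, min(1.0, a)) for a in raw]


class MultiAgent:
    """Holds one agent per player plus a single buffer shared by all of them."""

    def __init__(self, agents):
        self.agents = agents
        self.shared_buffer = []  # plain list standing in for ReplayBuffer

    def act(self, states, add_noise=True):
        # Each agent sees only its own row of `states`.
        return [agent.act(states[i], add_noise) for i, agent in enumerate(self.agents)]

    def step(self, states, actions, rewards, next_states, dones):
        # Store the joint transition once, so every agent samples the same data.
        self.shared_buffer.append((states, actions, rewards, next_states, dones))
```

Storing the joint transition in one buffer fits the critic's design, since its input covers the states and actions of all agents.
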

 
                
* ### Clip the action between -1 and 1
    Each action is clipped to the range [-1, 1] before it is returned: `return np.clip(action, -1, 1)`

## Plot of Rewards

Running DDPG with the above hyperparameters and neural network architecture, the agent achieves an average reward (over 100 episodes) of at least `+0.5` after `1563` episodes.

![Episode Solution](images/Episode_solution.png)
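
The solve criterion, an average reward of at least `+0.5` over 100 episodes, can be checked with a rolling window. This is a minimal sketch under the assumption that the window must be full before the average counts; the function name `first_solved_episode` is hypothetical, and how this project counts the solving episode is not stated here.

```python
from collections import deque


def first_solved_episode(scores, target=0.5, window=100):
    """Return the 1-based episode at which the rolling mean first reaches `target`, or None."""
    recent = deque(maxlen=window)  # keeps only the latest `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```
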

## Ideas for Future Work
* ### Read the benchmarking paper to compare the performance of various deep RL algorithms on continuous control tasks
    * Implement REINFORCE, TNPG, RWR, REPS, TRPO, CEM, CMA-ES and DDPG.
    * Reference: [Benchmarking Deep Reinforcement Learning for Continuous Control](https://arxiv.org/abs/1604.06778).