Deep Deterministic Policy Gradient (DDPG) actor-critic method for solving the continuous control Reacher problem.
The continuous control Reacher environment is a Unity environment consisting of a double-jointed arm that can move to target locations. The goal is to keep the agent's hand in the target area for as long as possible.
- State space: 33 dimensions corresponding to position, rotation, velocity, and angular velocities of the arm.
- Action space: 4 dimensions corresponding to torque applicable to two joints (each with value in [-1,1]).
- Rewards: a reward of +0.1 is provided for each step that the agent's hand is in the goal location.
The environment is considered solved when the agents achieve an average reward of +30 (over 100 consecutive episodes, and over all agents).
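As a rough illustration of the solve criterion, the sketch below steps the 20-agent environment and tracks the average score over all agents and the last 100 episodes. It assumes the `unityagents` package from the Udacity repository and a local Reacher build path; the build file name and the random-action stand-in (in place of the trained agent's `act` method) are assumptions, not part of this repository's verified interface.

```python
# Minimal sketch (assumptions noted above): step the 20-agent Reacher
# environment and check the +30-over-100-episodes solve criterion.
from collections import deque
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Reacher.app")   # path to the downloaded build (assumed)
brain_name = env.brain_names[0]

scores_window = deque(maxlen=100)                 # scores of the last 100 episodes

for episode in range(1, 201):
    env_info = env.reset(train_mode=True)[brain_name]
    states = env_info.vector_observations          # shape: (20, 33)
    scores = np.zeros(len(env_info.agents))        # one running score per agent

    while True:
        actions = np.random.uniform(-1, 1, (len(scores), 4))  # stand-in for agent.act(states)
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards
        states = env_info.vector_observations
        if np.any(env_info.local_done):            # all agents finish together in this env
            break

    scores_window.append(np.mean(scores))          # average over the 20 agents
    if len(scores_window) == 100 and np.mean(scores_window) >= 30.0:
        print(f"Solved in {episode} episodes; average score {np.mean(scores_window):.2f}")
        break

env.close()
```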
The code in this project is based heavily on the Udacity Deep Reinforcement Learning ddpg-bipedal code and tuned based on discussion and code from Dmitry G. in the Udacity mentor chat.
Follow the general setup instructions in the Udacity Deep Reinforcement Learning repository. Specific instructions for installing and downloading the files required for this project are located in Project 2.
Run `control.ipynb` to train the 20-agent model and visualize the scores over time. The logic for the agent and the neural networks is in `ddpg_agent.py` and `model.py`, respectively. The model weights for the successful agent are saved in `checkpoint_actor.pth` and `checkpoint_critic.pth`. Note that there is an alternative approach for the single-agent model in the files suffixed with `_vanilla`.
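As a hedged sketch of how the saved checkpoints might be restored for evaluation, the snippet below assumes the `Actor` and `Critic` classes in `model.py` take `(state_size, action_size, seed)` constructor arguments, matching the Udacity ddpg-bipedal template; that signature is an assumption here, not a documented interface of this repository.

```python
# Sketch: restore trained weights for evaluation (constructor signature is assumed).
import torch
from model import Actor, Critic

state_size, action_size = 33, 4
actor = Actor(state_size, action_size, seed=0)
critic = Critic(state_size, action_size, seed=0)

# map_location allows loading GPU-trained weights on a CPU-only machine
actor.load_state_dict(torch.load("checkpoint_actor.pth", map_location="cpu"))
critic.load_state_dict(torch.load("checkpoint_critic.pth", map_location="cpu"))
actor.eval()

# Act without exploration noise on a batch of 20 states
# (random states here as a stand-in for real environment observations)
with torch.no_grad():
    actions = actor(torch.randn(20, state_size)).clamp(-1, 1).numpy()
```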