This is an application of the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm to learn voltage-level control of a two-wheeled differential-drive mobile robot. Normally, voltage-level control is handled by dedicated controllers running at kilohertz rates; higher-level path planners then command the vehicle using velocity or position setpoints.
In this project, I show that a simple MLP with 2 hidden layers, running at 20 Hz, can learn control of a highly nonlinear vehicle. The model operates on a stack of 4 binary 85x48 images (state space dimensionality: 16,320) and outputs a continuous voltage for each motor. The model is tasked with facing a square target visible in the binary image. The reward is the change in heading error: how much closer (or farther) the robot's heading is to pointing exactly at the target compared to the previous step.
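The reward described above can be sketched as follows. This is a minimal illustration, not the repo's actual code; the function names and the angle-wrapping convention are assumptions:

```python
import numpy as np

def heading_error(robot_heading, robot_pos, target_pos):
    """Angle between the robot's heading and the direction to the target."""
    to_target = np.arctan2(target_pos[1] - robot_pos[1],
                           target_pos[0] - robot_pos[0])
    err = to_target - robot_heading
    # Wrap to [-pi, pi] so turning the short way is always preferred
    return (err + np.pi) % (2 * np.pi) - np.pi

def reward(prev_err, curr_err):
    """Positive when the heading error shrinks, negative when it grows."""
    return abs(prev_err) - abs(curr_err)
```

With this shaping, the agent is rewarded each step for reducing its angular offset from the target, and penalized for turning away.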
This repo has 2 main sections:
- DDPG Folder: An existing PyTorch implementation of DDPG, with slight modifications, from this repo.
- Simulator Folder: A custom simulator combining a differential-drive vehicle model, brushed DC motors, and a pinhole camera.
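To give a feel for what the simulator models, here is a hedged sketch of a differential-drive robot driven by motor voltages. All constants are illustrative, and the motor is reduced to a steady-state voltage-to-speed gain; the repo's simulator presumably models the DC motor dynamics (back-EMF, inertia) in more detail:

```python
import numpy as np

# Illustrative constants (not taken from the actual simulator)
WHEEL_RADIUS = 0.03   # m
WHEEL_BASE = 0.15     # m, distance between the wheels
MOTOR_K = 20.0        # rad/s per volt, crude steady-state motor gain

def step(state, v_left, v_right, dt=0.05):
    """One Euler step of differential-drive kinematics from motor voltages.

    state = (x, y, theta). Each wheel's angular velocity is approximated
    as voltage * gain; a full DC motor model would add electrical and
    mechanical dynamics.
    """
    x, y, theta = state
    wl = MOTOR_K * v_left                           # left wheel speed
    wr = MOTOR_K * v_right                          # right wheel speed
    v = WHEEL_RADIUS * (wl + wr) / 2.0              # forward speed
    omega = WHEEL_RADIUS * (wr - wl) / WHEEL_BASE   # turn rate
    return (x + v * np.cos(theta) * dt,
            y + v * np.sin(theta) * dt,
            theta + omega * dt)
```

Equal voltages drive the robot straight, while opposite voltages spin it in place, which is the nonlinearity the policy has to master at the voltage level.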
First, install the dependencies from requirements.txt. This can be done easily with pip:
pip install -r requirements.txt
To train the model from scratch, simply run the train.py file. Results can optionally be logged to Weights & Biases with the --wandb flag.
python train.py
To run a pretrained model, first download the checkpoints from here and extract them into the project directory. Then run all cells in the evaluate.ipynb notebook. At the bottom, a video showing the model controlling the robot should appear.
DDPG is a rather unstable algorithm. This can result in the policy repeatedly converging and then collapsing, as shown in the reward graph below. Interestingly, because the task of facing a target is rather open-ended, the algorithm converges on several distinct policies over the course of training.
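One of DDPG's main defenses against this instability is Polyak averaging of the target networks, which slows down the critic's bootstrap targets. A minimal sketch (the `tau` value and network are placeholders, not the settings used in this repo):

```python
import copy
import torch
import torch.nn as nn

def soft_update(target, source, tau=0.005):
    """Polyak-average source parameters into the target network.

    A small tau makes the critic's bootstrap targets drift slowly,
    damping the divergence/collapse cycles seen in the reward graph.
    """
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)

# Toy usage: the target critic starts as a frozen copy of the critic
critic = nn.Linear(4, 1)
target_critic = copy.deepcopy(critic)
soft_update(target_critic, critic, tau=0.005)
```

Even with this mechanism, the actor can still over-exploit a transiently overestimated Q-function, which is consistent with the repeated collapses observed here.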
Initially, the algorithm performs very poorly, as shown in the following video: the vehicle turns quickly away from the target it should face, accumulating a large negative reward. Note that the following videos are best viewed in full screen.
epoch_100.mp4
Around epoch 1800, the first useful policy of moving forward and facing the target emerges. Note that this corresponds to a spike in the above reward graph as well:
epoch_1800.mp4
At epoch 2000, this policy changes slightly, resulting in underdamped control of the system:
epoch_2000.mp4
Near epoch 2400, a new policy emerges, with the robot facing the target and driving backwards:
epoch_2400.mp4
At epoch 3000, underdamped behavior of this backwards policy emerges as well:
