
Training


Train the Network

Introduction

This chapter explains how to train a new network for DeepDriving.

Prerequisites

  • The Caffe framework from this repository is built and installed correctly.
  • Training data is available (for example, the training data provided on the DeepDriving webpage: http://deepdriving.cs.princeton.edu/).
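One possible way to fetch and unpack the provided training data into the location expected by the default training model is sketched below. This assumes the data is distributed as a tar archive; the archive name is only a placeholder, take the actual download link from the webpage above.

cd <repository-path>/torcs/pre_trained
wget http://deepdriving.cs.princeton.edu/<training-data-archive>
tar xf <training-data-archive>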

Environment Setup

For an easy start, a couple of environment variables can be defined:

  • DEEPDRIVING_CAFFE_PATH must contain the path to the Caffe installation.
export DEEPDRIVING_CAFFE_PATH=<path-to-installation>
  • (optional) DEEPDRIVING_SOLVER_PATH must contain the path to the Caffe solver file (for example <repository-path>/torcs/pre_trained/driving_solver_1F.prototxt). If this variable is not defined, the standard solver file from this repository is used.

  • (optional) DEEPDRIVING_GPU must contain the ID of the GPU which should be used for training. On a single-GPU system, this ID is normally 0. If this variable is not defined or contains the value -1, the CPU is used instead of the GPU, which leads to poor performance. Note that Caffe must be compiled with CUDA and cuDNN support to allow the DeepDriving network to run on the GPU.
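For example, a complete setup on a single-GPU machine could look like this (the paths are placeholders; the two optional variables can be omitted to fall back to the defaults described above):

export DEEPDRIVING_CAFFE_PATH=<path-to-installation>
export DEEPDRIVING_SOLVER_PATH=<repository-path>/torcs/pre_trained/driving_solver_1F.prototxt
export DEEPDRIVING_GPU=0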

Start the Training

  • Change to the torcs script directory.
cd <repository-path>/torcs
  • Start the training from scratch
./torcs_train.sh
  • (optional) Alternatively, resume the training from a saved solver state.
    • In this case, the GPU ID must be given as the first argument (-1 if the CPU should be used).
./torcs_train.sh <gpu-id> <path-to-solverstate>
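For example, to resume on GPU 0 from a previously saved solver state (the solver-state path is a placeholder; Caffe writes snapshot files ending in .solverstate during training, see the training process below):

./torcs_train.sh 0 <path-to-solverstate>

To resume on the CPU instead, pass -1 as the GPU ID:

./torcs_train.sh -1 <path-to-solverstate>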

Training Process

  • The default training model in this repository expects the training data inside the directory <repository-path>/torcs/pre_trained/TORCS_Training_1F. To use a different directory as the source for the training data, adapt the training-data path inside the file <repository-path>/torcs/pre_trained/driving_train_1F.prototxt (the value of the "source" parameter in the "data" layer), for example as shown below.
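A quick way to find the line that has to be changed is sketched here; the exact layout of the prototxt file may differ, but in Caffe data layers the parameter usually appears as source: "<path>".

# print the configured training-data path together with its line number
grep -n "source" <repository-path>/torcs/pre_trained/driving_train_1F.prototxt

Edit the printed line so that it points to the new training-data directory, then restart the training.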

  • During training, the stochastic gradient descent solver outputs the current batch loss every 100 iterations. Furthermore, the current solver state is stored every 2000 iterations.

  • On an i7 4770k processor (32 GB RAM) with an NVIDIA GTX 680 (4 GB RAM) and training samples stored on an SSD, a training rate of 2.2 iterations per second can be achieved. Thus a full training run of around 140000 iterations can be finished within around 18 hours (140000 / 2.2 ≈ 63600 seconds ≈ 17.7 hours).

  • Due to the random access to the LevelDB training database, one major bottleneck is reading the training samples from the hard disk. Thus, storing the training samples on an SSD improves the training speed significantly, as sketched below.
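If the repository itself does not reside on an SSD, one possible way to move only the training data there without touching the prototxt file is a symbolic link (the SSD mount point is a placeholder):

# move the LevelDB database to the SSD and link it back to the expected location
mv <repository-path>/torcs/pre_trained/TORCS_Training_1F /mnt/ssd/TORCS_Training_1F
ln -s /mnt/ssd/TORCS_Training_1F <repository-path>/torcs/pre_trained/TORCS_Training_1F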

Next Step