Learning to drive from pixel inputs. This is the repository for a software project at the Freie Universtität.
See the requirements.txt
file. The packages can be install with $ pip install -r requirements.txt
.
Most code is written in Jupyter Notebooks. Here is a short overview. A more detailed description can be found inside the notebooks.
- rosbag_to_hdf5.ipynb: Converts the rosbag files to a train and test hdf5 file
- train.ipynb: Train the neural network.
- driver.ipynb: Connects to the car, loads, and runs the neural network live.
data.py
: Contains augmentation functions and other helper functionsmodel.py
: The neural network architecture is defined here.
For collecting the data we build a little test race circuit. For the first generation of data we were simply driving the trip manually controlling the car with the android app. The driving time was about 1 hour. We sticked to driving in the same direction. In the end we had 3 big files with trainig data.
For the second generation we placed various obstacles on the driving road. We used orange soccer balls and some big chess statues. This time we were also driving in both directions. The driving time was about 1 1/2 hour. Again the driving was done manually with the android app. This time we split our trainig data into many smaller data sets.
Finally we combined the data to create our test and train set.
We used the following command for recording the data:
$ rosbag record \
/manual_control/speed \
/manual_control/steering \
/model_car/yaw \
/deepcar/resize_img80x60/compressed
The data can be downloaded from here: https://drive.google.com/open?id=0B4-Jw9T9VL8nYTVTbmRfZHVrVDA
Check the sha1sum:
527d3561561deae40300da706bc0467a5175719c rosbags.tar.gz
The input of the neural network is of shape (48, 64)
.
Given a single frame, the network predicts the steering.
The neural network is made from multiple convolutional blocks. A convolutional block looks like this:
- Convolutional layer with filter size
5x5
. The feature size vary by depth. The edges are padded such that the output has the same shape as the input (type same). - Batch Normalization
- Relu activation.
- Max-Pooling layer scales the feature map down by a factor of two.
This block is repeat 5 times. The first block has a feature size of 32. For every block, we double the feature size, e.g. 32, 64, 128, and so on.
After the last convolutional block, we use an averaging layer and then a dense layer that projects the features to a single scalar which represents the steeering.
We use mean squared error as loss and train the network with the Adam optimizer.
All notebooks can be run remotely. It is preferrable to scale the camera images on the car to save bandwidth. Start the appropiate script on the car:
crop_img.py
: This crops the image to (64, 40) and converts them to grayscale such that they can be used as input to the neural network. The topic name is/deepcar/crop_img64x48/compressed
.resize_img80x60.py
: Crops the image to (80, 60) and leaves them in RGB color space. Run this script if you want to record data. The topic name is/deepcar/resize_img80x60/compressed
.
You have to install tensorflow on the car. We used the car 103 which has tensorflow installed:
- Install the repository by running
pip install -e .
inside this directory. - Download the network weights from here into the
data
directory. The sha1 sum should be:7ae7c7aa2db79cd6c0df149cd3257868dc8e99a7 steering_model.tar.gz
- Extract the weights:
$ tar -xvf steering_model.tar.gz
The weights should now be located atdata/steering_model
where the path is relative to the git repository. - Run the network:
$ cd scripts $ python2 driver.py
- Release the handbreak and give gas:
$ rostopic pub -r 1 manual_control/stop_start std_msgs/Int16 '{data: 1}' # press control+c after 2 seconds $ rostopic pub -r 1 manual_control/speed std_msgs/Int32 '{data: -200}' # press control+c after 2 seconds
There exists multiple possible directions for future work:
- Reinformcement learning: Give an reward for good driving maneuver. A very simple driving enviroment exists in OpenAI's gym.
- Network compression: Improve the performance by using neural network compression, e.g. see this paper.
- Intelligent planning: currently we process only a single frame. How could more frames be taken into account?
- Use Tegra: Run the code on a car with the Tegra chip set.