Train YOLO or TensorFlow for object detection


This is code for training a CNN for object detection for the RAS project, using YOLO on darknet and a few networks with TensorFlow, and for generating a learning curve. Look in the individual code files for more documentation. Below is the general process; you will have to adapt it to your particular setup.

Generating Dataset

Getting code

Download this repository and all the submodules:

git clone --recursive
cd ras-object-detection

Images from video

You can record on the robot or use your phone, for instance.

rosrun image_view video_recorder image:=/camera/rgb/image_raw _filename:=video1.avi

I then extract every 10th frame from all the video files and scale down / crop to 640x480:

mkdir images && for i in *.mp4; do
    ffmpeg -i "$i" -vf scale=-2:480 tmp.mp4
    ffmpeg -i tmp.mp4 -vf "select=not(mod(n\,10)),crop=640:in_h" -vsync vfr "images/${i//.mp4/} - %03d.png"
    rm tmp.mp4
done
Then I go through the images with ristretto . and delete the really bad ones, ones without objects, nearly identical images, etc. Ristretto allows easy deleting without a confirmation prompt.

Images from bag file

To record the camera images in a bag file:

rosbag record /camera/rgb/image_raw /camera/depth/image_raw

Note: to run this script, you need roscore running.

python datasets/NewDataset/ data.bag

Edit config

Edit the config file to set which dataset you wish to work with. Note that this file is used by both Bash and Python, so remember to keep the syntax valid in both.
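A hypothetical sketch of what such dual-syntax config lines can look like (the variable names here are made up): stick to name="value" assignments with no spaces around the equals sign and no shell interpolation, since both Bash and Python accept that form.

```shell
# Hypothetical config lines valid in both Bash (when sourced) and Python
# (when exec'd): simple quoted assignments only, no $-interpolation.
dataset="SmartHome"
datasetFolder="datasets/SmartHome"
```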

Label the images

Output a JSON file with all the images and no annotations yet.

./ # default: every 10th image
./ 1 # or: every image
./ 2 # or: every other image

Open up Sloth (see my Arch PKGBUILD) and then start drawing bounding boxes around objects.
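For reference, here is a hypothetical example of one entry in the resulting Sloth JSON file after labeling; the class name "ball" and the coordinates are made up, and the exact annotation keys depend on your Sloth label configuration.

```shell
# Write a small example of the labeled Sloth JSON format (keys per a typical
# rectangle-label configuration; adjust to match your own setup).
cat > example_sloth.json <<'EOF'
[
    {
        "class": "image",
        "filename": "images/video1 - 001.png",
        "annotations": [
            {"class": "ball", "x": 120.0, "y": 85.0, "width": 40.0, "height": 38.0}
        ]
    }
]
EOF
```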


Convert the JSON file to the formats needed for YOLO and TensorFlow.


./ # set learningCurve option to true/false if you wish
./ rfcn_resnet101 ssd_mobilenet_v1 ssd_inception_v2 faster_rcnn_resnet101

Note: you might want to manually edit the TensorFlow config afterwards to adjust the data augmentation, e.g. random crops, brightness changes, etc. If you have huge class imbalances, you might want to use JPEG images rather than PNG: my tftrain.record file ended up being 53 GiB with PNG but only 25 GiB with JPEG.

Training / Testing

Getting starting weights

For YOLO, download the pretrained weights and put them where you specified in the config file.

For TensorFlow, download Faster RCNN, extract the model.ckpt.* files into datasets/YourDataSet/. Prepend each filename with faster_rcnn_resnet101_.

For TensorFlow, download RFCN, extract the model.ckpt.* files into datasets/YourDataSet/. Prepend each filename with rfcn_resnet101_.

For TensorFlow, download SSD MobileNet, extract the model.ckpt.* files into datasets/YourDataSet/. Prepend each filename with ssd_mobilenet_v1_.

For TensorFlow, download SSD Inception, extract the model.ckpt.* files into datasets/YourDataSet/. Prepend each filename with ssd_inception_v2_.
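The renaming step is the same for all four models; below is a sketch for one of them. The folder name "ssd_mobilenet_v1_coco_2017_11_17" is whatever your model zoo tarball extracts to, and dummy files stand in here for the real checkpoint.

```shell
# Stand-in for the extracted model zoo folder (use your real tarball's name).
mkdir -p ssd_mobilenet_v1_coco_2017_11_17 datasets/YourDataSet
touch ssd_mobilenet_v1_coco_2017_11_17/model.ckpt.{index,meta,data-00000-of-00001}

# Copy each checkpoint file into the dataset, prepending the model name.
for f in ssd_mobilenet_v1_coco_2017_11_17/model.ckpt.*; do
    cp "$f" "datasets/YourDataSet/ssd_mobilenet_v1_$(basename "$f")"
done
```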

Before uploading to Kamiak, since protoc is not installed there, make sure you run:

cd models/research
protoc object_detection/protos/*.proto --python_out=.

Note: if you're generating a learning curve, you need one of these model files for each training-set percentage:

for i in 100 90 80 70 60 50 40 30 20 10; do
    cp ssd_mobilenet_v1_model{,_$i}.ckpt.meta
    cp ssd_mobilenet_v1_model{,_$i}.ckpt.index
    cp ssd_mobilenet_v1_model{,_$i}.ckpt.data-00000-of-00001
done

Copy files over to Kamiak


Compile the modified darknet on Kamiak in an idev session

ssh kamiak
idev --gres=gpu:1 # get on a node to build your code (not the login node)
module load git/2.6.3 gcc/5.2.0 cuda/8.0.44 cudnn/5.1_cuda8.0
cd /data/vcea/matt.taylor/Projects/ras-object-detection/darknet
make

Training and Testing

Start the train job; then, after it has output some weights, you can start testing the weights it has produced.

sbatch yolo_train.srun
sbatch yolo_test.srun
sbatch yolo_test_iterations.srun

sbatch tf_train.srun rfcn_resnet101
sbatch tf_train.srun ssd_mobilenet_v1
sbatch tf_train.srun ssd_inception_v2
sbatch tf_train.srun faster_rcnn_resnet101

# Note: run eval jobs as soon as the training starts outputting to tflogs/
sbatch tf_eval.srun rfcn_resnet101
sbatch tf_eval.srun ssd_mobilenet_v1
sbatch tf_eval.srun ssd_inception_v2
sbatch tf_eval.srun faster_rcnn_resnet101

# If using learning curve then instead for training/evaluation:
sbatch tf_train.srun ssd_mobilenet_v1 10
sbatch tf_train.srun ssd_mobilenet_v1 20
sbatch tf_train.srun ssd_mobilenet_v1 30
sbatch tf_train.srun ssd_mobilenet_v1 40
sbatch tf_train.srun ssd_mobilenet_v1 50
sbatch tf_train.srun ssd_mobilenet_v1 60
sbatch tf_train.srun ssd_mobilenet_v1 70
sbatch tf_train.srun ssd_mobilenet_v1 80
sbatch tf_train.srun ssd_mobilenet_v1 90
sbatch tf_train.srun ssd_mobilenet_v1 100

# And after they're all done...
sbatch tf_eval.srun ssd_mobilenet_v1 10
sbatch tf_eval.srun ssd_mobilenet_v1 20
sbatch tf_eval.srun ssd_mobilenet_v1 30
sbatch tf_eval.srun ssd_mobilenet_v1 40
sbatch tf_eval.srun ssd_mobilenet_v1 50
sbatch tf_eval.srun ssd_mobilenet_v1 60
sbatch tf_eval.srun ssd_mobilenet_v1 70
sbatch tf_eval.srun ssd_mobilenet_v1 80
sbatch tf_eval.srun ssd_mobilenet_v1 90
sbatch tf_eval.srun ssd_mobilenet_v1 100

After you're done training with TensorFlow, you can export the networks:

sbatch tf_export.srun rfcn_resnet101
sbatch tf_export.srun ssd_mobilenet_v1
sbatch tf_export.srun ssd_inception_v2
sbatch tf_export.srun faster_rcnn_resnet101

Monitor progress


watch -n 1 squeue -A taylor -l
tail -f slurm_logs/yolo_train.{out,err}

Get YOLO results:



./ # Sync TF log directory every 30 seconds
tensorboard --logdir datasets/SmartHome/tflogs

COCO Dataset (Optional)

If you wish to include some data from the COCO dataset (e.g. I wanted to include humans), first download the many gigabytes of files:

aursync google-cloud-sdk-minimal # For Arch Linux -- install gsutil somehow
mkdir COCO
mkdir annotations
gsutil -m rsync gs:// annotations
mkdir train2017
gsutil -m rsync gs:// train2017

Then extract all the annotations for the classes you want (see the script):

python COCO/annotations/annotations_trainval2017/annotations/instances_train2017.json \
    coco_train2017 > output.json

Then copy the images for those annotations to your dataset:

mkdir -p datasets/SmartHome3/coco_train2017
grep filename output.json | sort -u | \
    grep -o '\"[^"]*"$' | sed 's/"//g' | \
    sed 's#.*/#../COCO/train2017/#g' | \
    xargs -i cp --reflink=always {} datasets/SmartHome3/coco_train2017/

Combine output.json with the manually annotated datasets/SmartHome3/sloth.json file (concatenate the two arrays, i.e. remove the extra "] [" in the middle of the file).
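The merge can also be scripted; a sketch using Python's json module, shown with tiny stand-in input files (point the paths at your real output.json and sloth.json):

```shell
# Tiny stand-in inputs; replace with your real annotation files.
echo '[{"filename": "coco.png"}]' > output.json
echo '[{"filename": "mine.png"}]' > sloth.json

# Concatenate the two top-level JSON arrays into one file.
python3 - <<'EOF'
import json
a = json.load(open("output.json"))
b = json.load(open("sloth.json"))
json.dump(a + b, open("combined.json", "w"), indent=4)
EOF
```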

To see the class imbalance:
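The project's own script is not shown here, but a quick grep-based approximation works, assuming each annotation in the Sloth JSON carries a "class" key (a small stand-in file is used below):

```shell
# Quick approximation (not the project's script): count occurrences of each
# "class" label in the Sloth JSON.
cat > sloth_sample.json <<'EOF'
[{"class": "image", "filename": "a.png", "annotations": [
    {"class": "ball", "x": 1, "y": 1, "width": 5, "height": 5},
    {"class": "ball", "x": 9, "y": 9, "width": 5, "height": 5},
    {"class": "cube", "x": 3, "y": 3, "width": 4, "height": 4}]}]
EOF
grep -o '"class": *"[^"]*"' sloth_sample.json | sort | uniq -c | sort -rn
```

Note that the "image" rows in the output count the images themselves, not object labels.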


On Kamiak, make sure you copy the pretrained weights:

cd datasets/SmartHome2/
cp -a faster_rcnn_resnet101_model.ckpt.* rfcn_resnet101_model.ckpt.* \
    ssd_inception_v2_model.ckpt.* ssd_mobilenet_v1_model.ckpt.* ../SmartHome3/