YOLO (You Only Look Once) is an object detection and classification network that runs in real time, at up to 155 frames per second in its fast variant. We implement YOLO here to detect and classify two objects in a scene; it can be extended to detect and classify more classes. YOLO runs on the Darknet framework. More details about Darknet can be found here: http://pjreddie.com/darknet/
I present a method to train your own YOLO network. To get a better understanding of the network, please refer to the publication: [YOLO](http://pjreddie.com/media/files/papers/yolo_1.pdf)
To begin, download the repository. Everything needed to run the code is in the darknet-2-class folder provided in the repo. You will also need to download weights to run the code, [which can be found here](http://guanghan.info/download/yolo_2class_box11_3000.weights). Add these weights to the darknet-2-class folder provided in the repository.
To train YOLO, like all neural networks, you will need images and corresponding labels. I have provided images and labels in the format required by Darknet. To use the provided images and start training directly, you only need to change the training list, provided in the darknet-2-class folder [here](https://github.com/Yaffa1607/Pandas-Python/blob/master/darknet-2-class/training_list.txt), so that it points to the path where your images are saved. For example, the image paths are written in the training list as /Users/yashajain/Desktop/darknet-2-class/images/yieldsign/; change these to the path of the images/... folder on your machine. The images folder is a subdirectory of the darknet-2-class folder, [here](https://github.com/Yaffa1607/Pandas-Python/tree/master/darknet-2-class/images)
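If editing the list by hand is tedious, a short script can rewrite the path prefix for you. This is a minimal sketch; both prefixes are assumptions you should adjust to your own setup:

```python
# Sketch: rewrite the image-path prefix in training_list.txt.
# Both prefixes are placeholders -- adjust them to your setup.
old_prefix = "/Users/yashajain/Desktop/darknet-2-class/"
new_prefix = "/path/to/your/darknet-2-class/"  # hypothetical local path

with open("training_list.txt") as f:
    lines = f.read().splitlines()

with open("training_list.txt", "w") as f:
    for line in lines:
        f.write(line.replace(old_prefix, new_prefix) + "\n")
```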
You can now skip to step 4 and start training directly.
Otherwise, to train with your own images and data, do the following:
(1) For videos, create static images by grabbing frames from the video and saving them.
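For example, here is a minimal OpenCV sketch for grabbing frames; the video path, output folder, and sampling interval are placeholders:

```python
import os
import cv2  # pip install opencv-python

video_path = "hands.mp4"   # placeholder video file
out_dir = "images/hands"   # placeholder output folder
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_id, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_id % 10 == 0:  # keep every 10th frame to reduce redundancy
        cv2.imwrite(os.path.join(out_dir, "Frame_%d.jpg" % saved), frame)
        saved += 1
    frame_id += 1
cap.release()
```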
(2) Now that you have images, you need to label them. To train the network, you need bounding box information for each image: find the x, y coordinates of the region of interest in each image. For example, if you want to detect hands in an image, draw bounding boxes around the hands and save the x, y coordinates for each image.
You can use the simple MATLAB image labeling tool to label objects. It outputs the selected regions of interest as a .mat file; how to use the image labeler is described [here](https://www.mathworks.com/help/vision/ug/label-images-for-classification-model-training.html). Export these to an Excel sheet.
There are many bounding box labeling tools available; we can also use the [BBox-Label-Tool](https://github.com/puzzledqs/BBox-Label-Tool).
There are two things required for training:
- Training list
- Bounding box information for each image
To label your own data: at this step, you should have the x, y coordinates for each image. Export the .mat file to an Excel sheet. Upon labeling, convert the format of annotations generated by the MATLAB image labeling tool to:
box1_x1 box1_y1 box1_width box1_height
box2_x1 box2_y1 box2_width box2_height
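If you prefer to skip the Excel step, the .mat file can also be read directly in Python. This is only a sketch: the file name and the rois variable name are assumptions and depend on how you saved the labels in MATLAB.

```python
import scipy.io  # pip install scipy

# "labels.mat" and "rois" are assumed names -- inspect mat.keys()
# to find the actual variable your MATLAB session saved.
mat = scipy.io.loadmat("labels.mat")
boxes = mat["rois"]  # expected shape: (num_boxes, 4) as [x1, y1, width, height]

for x1, y1, w, h in boxes:
    print(x1, y1, w, h)  # one "box_x1 box_y1 box_width box_height" line per box
```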
The format for training Darknet is a little more complicated. You will need to convert the bounding box information to make it compatible for training. I have provided the required math in the Excel sheet, here.
After conversion, the annotation format is: box1_x1_ratio box1_y1_ratio box1_width_ratio box1_height_ratio box2_x1_ratio box2_y1_ratio box2_width_ratio box2_height_ratio
Now, we need to add the class number to each image:
class_number box1_x1_ratio box1_y1_ratio box1_width_ratio box1_height_ratio class_number box2_x1_ratio box2_y1_ratio box2_width_ratio box2_height_ratio
The class_numbers I have used here are 0 and 1 for the two classes.
The exact math for converting the bounding boxes to the required format can be found in scripts/convert.py.
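The sketch below shows the general shape of that conversion, assuming the ratios are simply the pixel coordinates divided by the image width and height; treat scripts/convert.py as the authoritative math (some YOLO pipelines use the box center rather than the corner).

```python
def to_darknet_line(class_number, x1, y1, w, h, img_w, img_h):
    """Convert one pixel-space box to the ratio format described above."""
    return "%d %.6f %.6f %.6f %.6f" % (
        class_number, x1 / float(img_w), y1 / float(img_h),
        w / float(img_w), h / float(img_h))

# Example: a 100x50 box at (200, 150) in a 640x480 image, class 0.
print(to_darknet_line(0, 200, 150, 100, 50, 640, 480))
# -> "0 0.312500 0.312500 0.156250 0.104167"
```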
Note that each image corresponds to an annotation file that contains its bounding box information, but we only need a single training list of images. Remember to put the "images" folder and the "annotations" folder in the same parent directory, as the Darknet code looks for annotation files this way (by default).
You can download some examples to understand the format:
[after_conversion.txt](https://github.com/Yaffa1607/Pandas-Python/blob/master/darknet-2-class/labels/yieldsign/Frame_movie_1_1.txt)
[training_list.txt](https://github.com/Yaffa1607/Pandas-Python/blob/master/darknet-2-class/training_list.txt)
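If you generate your own images, you can build the training list with a short script rather than by hand. A minimal sketch, assuming the same images/ layout as the sample repo (the root path is a placeholder):

```python
import os

# Placeholder root path mirroring the sample repo layout -- adjust it.
images_root = "/path/to/darknet-2-class/images"

with open("training_list.txt", "w") as out:
    for dirpath, _, filenames in os.walk(images_root):
        for name in sorted(filenames):
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                out.write(os.path.join(dirpath, name) + "\n")
```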
(1) In [src/yolo.c](https://github.com/Yaffa1607/Pandas-Python/blob/master/darknet-2-class/src/yolo.c), change the class numbers and class names. Change the paths to the training data and the annotations, i.e., the list we obtained from step 2. If we want to train new classes, then in order to display the correct label files, we also need to modify and run data/labels/make_labels.
(2) In [src/yolo_kernels.cu](https://github.com/Yaffa1607/Pandas-Python/blob/master/darknet-2-class/src/yolo_kernels.cu), change the class numbers.
(3) Now we are able to train with new classes, but there is one more thing to deal with. In YOLO, the number of parameters of the second-to-last layer is not arbitrary; it is defined by other parameters, including the number of classes and the side (the number of cells the image is split into per dimension). Please read the paper for a detailed explanation.
The size is (5 x 2 + number_of_classes) x 7 x 7, assuming no other parameters are modified; for the two classes here, that is (5 x 2 + 2) x 7 x 7 = 588.
Therefore, in [cfg/yolo.cfg](https://github.com/Yaffa1607/Pandas-Python/blob/master/darknet-2-class/cfg/yolo.cfg), change the “output” in line 218 and “classes” in line 222.
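As a quick sanity check, the expected output size can be computed directly from the formula above. A minimal sketch, using the defaults of 2 boxes per cell and a 7 x 7 grid:

```python
def yolo_output_size(num_classes, boxes_per_cell=2, side=7):
    """Parameters of the second-to-last layer: (5*B + C) * S * S."""
    return (5 * boxes_per_cell + num_classes) * side * side

print(yolo_output_size(2))   # 588  -> the "output" value for the 2-class model here
print(yolo_output_size(20))  # 1470 -> the original 20-class VOC configuration
```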
(4) Now we are good to go. If we need to change the number of layers and experiment with various parameters, just modify the cfg file. For the original YOLO configuration, we have pre-trained weights to start from; for an arbitrary configuration, I'm afraid we have to generate the pre-trained model ourselves.
Navigate to your Darknet folder in the terminal and try:
`./darknet yolo train cfg/yolo.cfg extraction.conv.weights`
For any further questions, you can join the Google Group; there are many brilliant people answering questions there:
https://groups.google.com/forum/#!forum/darknet
To test the trained model, run:
`./darknet yolo test [cfg_file] [weight_file] [img_name]`
For example, with the weights downloaded earlier (the image name here is a placeholder for your own test image): `./darknet yolo test cfg/yolo.cfg yolo_2class_box11_3000.weights test_image.jpg`
A few notes and common pitfalls:
- The images and training folders should be two folders with the same parent directory, as in the sample darknet folder.
- The training list should be just one file with path names for all images.
- The labels folder should have the data for each image as a separate txt file.
- You can change the number of classes as shown above; be sure to create a label for each class.
- Add ~500-1000 images for each class.
- A common error I found upon training was "cannot load image". Check your training txt file; there is a difference between how Python 2 and Python 3 treat spaces in text files (see the sketch after this list).
- If you have more than 2 classes and do not have pre-trained weights to begin the training with, you will have to train from scratch, which takes considerably more time.
- OpenCV is recommended, since it makes visualization easier and more interactive, but if you do not have it, go to the Makefile in the darknet-2-class folder and change OPENCV=1 to OPENCV=0. Then navigate to the darknet folder in your terminal and type `make clean` and then `make`.
- If you don't have CUDA, change to GPU=0.
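For the "cannot load image" issue mentioned above, one quick fix is to sanitize the training list by stripping stray whitespace and blank lines, which is a common cause regardless of which Python version wrote the file. A minimal sketch:

```python
# Sketch: strip stray whitespace and blank lines from the training list,
# one common cause of "cannot load image" at training time.
with open("training_list.txt") as f:
    paths = [line.strip() for line in f if line.strip()]

with open("training_list.txt", "w") as f:
    f.write("\n".join(paths) + "\n")
```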
More images can be found here: Multimedia Technology and Telecommunications Laboratory, University of Padova, [Hand Gesture Dataset](http://lttm.dei.unipd.it/downloads/gesture/senz3d/data/senz3d_dataset.zip)