
Raspberry Pi YOLO Training #289

Closed
WTeichert opened this issue Dec 2, 2017 · 36 comments

Labels
Solved The problem is solved using the correct settings

@WTeichert commented Dec 2, 2017

Greetings everyone,

I am in the middle of my student research project. For it, I am creating an object detection and classification pipeline that fits on a Pi. I am using YAD2K running on the Pi because it has lower computational demands. I plan to train my network on VOC with different training cfg's.

I am asking you for any advice, tips or tricks I can use.

So far I will change:

  • number of convolutional and pooling layers + filters per layer (like yolo-voc -> yolo-voc-tiny)
  • learning rate, steps, scales
  • max batches
  • height and width (to a minimum of 224 - possible? What should I do with the anchors, divide by 2? Do I have to add "resize_network(nets + i, nets[i].w, nets[i].h);" in detector.c, lines 40-41?)
  • greyscale pictures with channels=1
  • random on or off
  • max pooling size
  • stride of 2 <- much more speed, less accuracy

I also have a few questions:
What does activation: leaky or linear do?
saturation/exposure are always the same - what do they do?

Thank you for all inspiration! :)

@AlexeyAB (Owner)

Hi,

height and width (to a minimum of 224 - possible? What should I do with the anchors, divide by 2? Do I have to add "resize_network(nets + i, nets[i].w, nets[i].h);" in detector.c, lines 40-41?)

  • You shouldn't add resize_network(). Just set width=224 height=224 in your cfg-file.
  • Yes, just divide the anchors by 2.
  • If you use random=1 then you should change these two lines in darknet/src/detector.c (lines 98 to 99 in 75c39f5):

```
int dim = (rand() % 10 + 10) * 32;
if (get_current_batch(net)+100 > net.max_batches) dim = 544;
```

to these, for a resolution of ~224x224 (see the note after this list):

```
int dim = (rand() % 5 + 5) * 32;
if (get_current_batch(net)+100 > net.max_batches) dim = 224;
```

  • random=1 gives you about +1% mAP.
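
For reference, (rand() % 5 + 5) * 32 picks one of 160, 192, 224, 256, 288 (multiples of 32 around 224), whereas the original (rand() % 10 + 10) * 32 picks multiples of 32 from 320 to 608, matching the larger default input resolutions.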

What does activation: leaky or linear do?

  • linear: y = x
  • leaky (ReLU): if(x>0) { y = x; } else { y = x/10; }
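
For illustration, a minimal C sketch of these two activations (the function names are illustrative, not necessarily darknet's exact symbols):

```c
/* linear: identity; leaky ReLU: pass positives through, damp negatives by 10x */
static inline float linear_activate(float x) { return x; }
static inline float leaky_activate(float x)  { return x > 0 ? x : x / 10; }
```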

saturation/exposure are always the same, what do they do?

saturation, exposure and hue values are the ranges for random colour changes applied to images during training (parameters for data augmentation), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV
The larger the value, the more invariant the neural network becomes to changes in lighting and colour of the objects. More: #279 (comment)
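
As a rough sketch of how such a parameter is used (assuming a darknet-style helper; the real code may differ in detail), saturation = 1.5 means each training image's saturation is multiplied by a random factor drawn from [1/1.5, 1.5]:

```c
#include <stdlib.h>

/* Sketch, not darknet's exact code: draw a random jitter scale in [1/s, s],
 * as used for saturation/exposure augmentation with e.g. s = 1.5. */
static float rand_uniform(float lo, float hi)
{
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

static float rand_scale(float s)
{
    float scale = rand_uniform(1.0f, s); /* factor in [1, s]       */
    if (rand() % 2) return scale;        /* scale up half the time */
    return 1.0f / scale;                 /* otherwise scale down   */
}
```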

@WTeichert (Author) commented Dec 16, 2017

Thank you a lot!

Of course I first watched the training. There I came to the point that Darknet19 448x448 should be used.
You've written that "This model performs significantly better but is slower since the whole image is larger.".
Since I need to speed up and tighten the whole algorithm, I want to use darknet19 in its basic configuration.

Now my question: where can I get these darknet19.conv.xx for training?
Could I use yolo-voc-tiny.weights as my base, like this backup training? (but my cfg changed in a few lines)

And one more question:
I got access to a computation centre where I have 4 CPUs and 2 GPUs I can use. As I've read on your page, YOLOv2 is not made for multi-CPU. Have there been any changes? Does it help that TensorFlow is configured for multi-CPU?

@AlexeyAB (Owner) commented Dec 16, 2017

@WTeichert (Author)

Thank you again ^^
This partial training sounds interesting!
I take a trained weight and make it my pre-trained base?
Where does the 13 come from (= number of layers - last "class" layer)?
Can I use my own cfg, or do I need to use tiny-yolo-voc.cfg? Wouldn't I have problems when they are different?

A little misunderstanding:

I am looking for darknet19, not trained on 448x448, so the previous version of it!
I want to use it all, so 4 CPUs + 2 GPUs for training.

For detection I have to look ahead to get the best out of a Raspberry Pi.

@AlexeyAB (Owner) commented Dec 16, 2017

  • Tiny-yolo has 16 layers, where the last 2 layers are the detection layer and a conv layer that depends on the number of classes. So for partial you can use any number of the first layers from 1 to 14; see the example command after this list.
  • To do partial you should use the cfg that corresponds to the weights file - i.e. tiny-yolo-voc.cfg for tiny-yolo-voc.weights.
  • Just use multi-GPU; each GPU is ~100x faster than each CPU, so there isn't any reason to use the CPU: https://github.com/AlexeyAB/darknet#how-to-train-with-multi-gpu
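
For example, extracting the first 13 layers as a pre-trained base would look something like this (the output filename tiny-yolo-voc.conv.13 is just a convention):

darknet.exe partial cfg/tiny-yolo-voc.cfg tiny-yolo-voc.weights tiny-yolo-voc.conv.13 13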

@WTeichert (Author)

Hey, first of all, thank you for your time.
I am now done with the trainings (I had some other things to do), but it doesn't work out like I thought.

I tried to train on Pascal VOC and followed your instructions; it all went fine.
Not sure if it matters, but I chose the pre-trained model darknet19_448.conv.23 instead of darknet53.conv.74 (I think this was changed by you?).
My cfg1 you can see below. With 45 000 iterations it mostly detects chairs, no matter if it is a person or a dog or whatever.
For cfg4 I just changed width+height to 608 and multiplied the anchors by 4 -> there is no detection at all; IOU and Recall are also 0 when I try to validate the weights.

Did I miss something, or is it just a network conflict, in that the parameters don't fit the dataset?
An overview of all cfg's I trained:

cfg_overview.pdf

```
[net]
batch=64
subdivisions=64
width=224
height=224
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=.1,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=125
activation=linear

[region]
anchors = 0.54,0.60, 1.71,2.2, 3.32,5.69, 4.71,2.55, 8.31,5.26
bias_match=1
classes=20
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0
```
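
For reference, the filters=125 in the last convolutional layer follows from the region layer's num*(classes + coords + 1) = 5*(20 + 4 + 1); the same formula is why the 11-class experiment later in this thread needs 5*(11 + 4 + 1) = 80 filters.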

@AlexeyAB (Owner)

@WTeichert

  1. darknet53.conv.74 should be used only if your cfg-file is based on yolov3.cfg (only for Yolo v3). But if your cfg-file is based on tiny-yolo-voc.cfg, yolov2-tiny-voc.cfg, yolo-voc.2.0.cfg or yolov2-voc.cfg (Yolo v2), then you should use darknet19_448.conv.23.

  2. On what cfg-file did you base your cfg-file?

  3. Did you try to train yolo on the CPU?

  4. Can you get any good results, or are the results of all the trainings bad?

  5. How many iterations did you train?

@WTeichert (Author)

  1. Then I was right.
  2. Based on tiny-yolo-voc.cfg.
  3. Trained on GPU; the CPU was way too slow (50 min for 1k iterations).
  4. No, all are trash.
  5. 45k-60k, see max batches.
    + I trained from Windows.

@AlexeyAB (Owner)

  • What was the average loss?
  • And what mAP can you get for one of your weights files? darknet.exe detector map voc.data your.cfg your_40000.weights
  • Can you compress your files (.data, .cfg, .names, train.txt, the cmd-file for training) and attach the archive here in a message?

@WTeichert (Author) commented Mar 29, 2018

  • I don't know.

  • mAP doesn't work, I get this error...
    File "...\YOLOv2\darknet-master\build\darknet\x64\voc_eval_py3.py", line 157, in voc_eval
    R = class_recs[image_ids[d]]
    KeyError: '003028'

  • training_data.zip

From tomorrow I have no more access to this data, so further data can be sent on Monday.

@AlexeyAB (Owner)

Change this line to use your files (data, cfg, weights):
darknet.exe detector map data/obj.data cfg/yolo_obj.cfg yolo-obj.weights

And run it.

Also, what command do you use for training?

@WTeichert (Author)

I've tried both ways: first with my data and cfg, and second with your tiny-voc and voc. It didn't work.

The command for training is written in train.cmd:
darknet.exe detector train data/cfg1.data cfg/cfg1.cfg darknet19_448.conv.23

@AlexeyAB (Owner)

@WTeichert This command:
darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_40000.weights
can't give that error, because there is nothing from Python in it.

What error does this command give?

@WTeichert (Author)

Ahh, a little misunderstanding. I tried using map, but nothing happened (as far as I can see).
So I chose calc_mAP_voc_py.cmd and changed line 8 to my files and line 9 to my VOC dir.

voc_eval_py3.py, line 157 was the error.

Could it be that I chose the learning rate too small, so the network doesn't learn anything new from the new input size?
Or could the change in anchors have led to this mistake?

@AlexeyAB (Owner)

Attach a screenshot of the "nothing happened" that occurs after this command; you sometimes have to wait ~10 minutes while the mAP is calculated:
darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_40000.weights

I don't know if there is any mistake. I can't say anything without the mAP.
What repo did you use for training?

@WTeichert (Author)

"Nothing happens" means nothing I can see directly.
I am not sure which repo I am using (repo?), but since the introduction says "If you use another GitHub repository, then use darknet.exe detector recall... instead of darknet.exe detector map", I tried both; map has not given me any visual output. The cmd just ended and the console waited for the next command, nothing happened (that's what I meant; there is no screenshot for it), so I chose recall to check IOU and recall. For IOU I got a maximum of 28%.
I cloned the GitHub repo we're talking on and followed the instructions for Windows, so I thought it should be this repo...
When I use darknet.exe detector valid, it creates a lot of blank class files in results. With yolo-voc, by contrast, they were full of notes and detections; that's all I can say about mAP.

The problem is I am not into C programming, only a little Python, so the detector.c compiling and the C functions I understand only at a surface level.

I am not at the office these days, so I can try once more on Tuesday, but I don't think it will change the results.

@WTeichert (Author)

[image]
The first command was with recall instead of map; the second doesn't give any result as far as I can see.

[image]
The first command was with valid instead of recall and created the files in results.

@AlexeyAB (Owner) commented Apr 3, 2018

@WTeichert Try to update your code from this repo.

@WTeichert (Author)

Done, but same error.
Was something changed in the detector? I cannot update the darknet version, since my MSVS license expired.

@AlexeyAB (Owner) commented Apr 6, 2018

You should recompile the code in MSVS after your repo is updated.
Yes: Yolo v3 was added, along with fused batch_norm (+7% speedup), anchor calculation and mAP, AVX on CPU (+20% speedup) and many other things...
You can install the free MSVS2015 Community that I use: https://go.microsoft.com/fwlink/?LinkId=532606&clcid=0x409

@WTeichert (Author)

Finally map works! It needed a few tries with CUDA 9.1, 9.0, 8.0 and their cuDNN libraries because this error occurred:
[errorcuda81 image]
I solved it by creating a new repo instead of updating.

[image]
This was the cfg which only gave me chairs as output.

[image]
This was the cfg with 0 IOU and Recall and no detections.

The difference between them: height/width in the first cfg = 224 and in the second = 608.

@WTeichert (Author)

I just checked again the differences between my cfg and tiny-yolo-voc.
I changed anchors, width and height, deleted comments,
and there are these lines:
steps=-1,100,20000,30000
scales=.1,10,.1,.1
I chose instead:
steps=100,25000,35000
scales=.1,.1,.1

because I did not understand the -1.

@AlexeyAB (Owner) commented Apr 7, 2018

@WTeichert That's a bad mAP result. Check your dataset using Yolo_mark.
And use these lines:

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

@WTeichert (Author) commented Apr 7, 2018

Ah, found it. But I used the Pascal VOC dataset; do I need to mark bounding boxes?

@AlexeyAB I've done the check. The labels are not correctly assigned: persons are chairs, cats are boats. This should be connected to the voc.names list, am I right?

But the bounding boxes are all right!

And that doesn't explain why detection doesn't work.
Should I train a new network with these lines?
Should I train new network with these lines?
learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

That would be sad... not knowing why it doesn't work and just trying again...

@WTeichert (Author) commented Apr 8, 2018

[image]
These are the results of the yolov2-tiny-voc weights... there should be an error somewhere else.
How does mAP depend on the names list? Could the order of names cause this error?

If I compare the labels folder of the VOC labels with the voc.names file, there are differences:
5 and 8 should be dog and person, while in voc.names those are bus and chair.

But when I validate tiny-voc with some example pictures, it is pretty good.

@AlexeyAB (Owner) commented Apr 8, 2018

  • Show your file obj.data
  • What command do you use for training?
  • What command do you use to calculate mAP?

@WTeichert (Author) commented Apr 8, 2018

@AlexeyAB

[image]

  • darknet.exe detector train data/cfg1.data cfg/cfg1.cfg darknet19_448.conv.23
  • darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_final.weights

Btw, why do I get completely different IOU and recall with the commands ... detector map ... versus ... detector ... recall?
[image]

@WTeichert (Author)

@AlexeyAB
I tried to train with the changed learning rate and the anchors set back to standard. But the result is again 0 detections; mAP is 0 too.
Did you use the Windows method to train tiny-yolo-voc.cfg? Is your version different from the repo? How many iterations did you train? Which command did you use for training?

@AlexeyAB (Owner)

@WTeichert I have trained many models on both Windows and Linux using this repo. It works fine.

@WTeichert (Author)

@AlexeyAB
I found a mistake in my train.txt. Now I get 56.21% mAP for yolov2-tiny-voc.

I tried to train with 11 classes of VOC, so I shortened the class list in voc_label.py and voc.names. I also set the number of classes to 11 in voc.data and the cfg, and the last filter count to 80. Again I get 0 mAP.

What was your average loss in training? I am always around 0.5, which seems pretty high.

@AlexeyAB (Owner)

@WTeichert About ~0.5

@WTeichert (Author)

Ok, I found the problems. It was some mess with the voc.data and label.txt files.

But I am still wondering why the cfg of tiny-yolo-voc starts steps with -1. If you could explain that to me, I won't ask anything anymore :D

@AlexeyAB (Owner)

A step of -1 means that the 1st scale 0.1 is applied immediately.
It was left in just for some experiments.

This:

learning_rate=0.001
max_batches = 40200
policy=steps
steps=-1,100,20000,30000
scales=.1,10,.1,.1

is the same as a reduced learning_rate with the 1st step/scale removed:

learning_rate=0.0001
max_batches = 40200
policy=steps
steps=100,20000,30000
scales=10,.1,.1

The loop only returns early while net.steps[i] > batch_num; since -1 > 0 is false, the 1st scale is applied immediately:

darknet/src/network.c, lines 94 to 101 in 5e3dcb6:

```
case STEPS:
    rate = net.learning_rate;
    for (i = 0; i < net.num_steps; ++i) {
        if (net.steps[i] > batch_num) return rate;
        rate *= net.scales[i];
        //if(net.steps[i] > batch_num - 1 && net.scales[i] > 1) reset_momentum(net);
    }
    return rate;
```
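
Worked through, both configs produce the same schedule: the effective learning rate is 0.0001 up to iteration 100 (0.001 * 0.1 via the -1 step), 0.001 from 100 to 20000 (* 10), 0.0001 from 20000 to 30000 (* 0.1), and 0.00001 after 30000 (* 0.1).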

@WTeichert (Author)

Thank you so much for your help! My research is done and it all went well.

Manipulating the number of filters per layer and reducing the resolution brings the best performance on a Pi!
Models, classes and greyscale are not that easy to manipulate; it depends on the dataset.
Random should be selected; the performance decrease is minimal.

@AlexeyAB (Owner)

@WTeichert Can you attach your resulting cfg-file?

@WTeichert (Author)

Sorry, I lost the originals in a system reset and have only the converted h5 files...

  • use the VOC dataset
  • set the resolution to a minimum of 224
  • halve the number of filters per layer
  • reduce the number of layers by one
  • set random=1

Those were the results I found.
Performance on the Pi improved from 4 s to 1 s per picture with a pretty good mAP.
The main focus of my work was reducing the processing time per frame.
I hope this information can help.

Here is the h5 file with a changed number of layers, based on COCO:
COCOh5.zip

Here is the h5 file with a changed number of filters per layer, based on COCO:
COCOh5_2.zip

@AlexeyAB added the Solved label Jun 20, 2018