The network size, image size and object size #3215

Open
trannhutle opened this issue May 23, 2019 · 31 comments

@trannhutle

Hi @AlexeyAB

My training and testing images are 320 x 240 px. Because of the limited compute of my processor (Atom E3845, quad-core, 1.91 GHz), I have to reduce the network size to 160 x 160 to speed up detection. I use the tiny-yolo configuration for my network; would this affect the accuracy of the trained model?

I am new to YOLO, so if you need more information about this question, please leave a comment.

Thank you so much!

@trannhutle
Author

Hi @AlexeyAB!
Could you please show me how to reduce the detection time while maintaining the accuracy of the model?
Thank you so much!

@AlexeyAB
Owner

@trannhutle Hi,

You can use width=320 height=224 in yolov3-tiny.cfg to achieve high speed without an accuracy drop.
If you use random=1 in the cfg-file, then you should use only this repository for training; any repository can be used for detection.

If you use width=160 height=160, it will lead to a slight loss of accuracy.
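
For reference, a minimal sketch of how the network resolution is set in the [net] section of yolov3-tiny.cfg, assuming the rest of the file is left unchanged (width and height must each be a multiple of 32):

```
[net]
# network resolution: every input image is resized to this size
# before inference; larger values are slower but more accurate
width=320
height=224
```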

@trannhutle
Author

@AlexeyAB Hi Alexey,

Thank you so much for your answer. It improved the accuracy of the model a lot!!!

  1. Because of the processor's limitations, the largest network resolution I can set for detection is width=192 and height=192. Could you give me some advice on configuring training and detection without a drop in detection accuracy?

  2. Does the image resolution for training have to be bigger than, or the same as, the network resolution?

  3. Do we have to keep the same image resolution for both training and testing?

Thank you so much for your help!!!

@trannhutle
Author

@AlexeyAB Hi Alexey,
To speed up recognition, I set AVX=1 in the Makefile when building libdarknet.so, but it does not work. Do you know how to fix it?

Thank you so much!

@trannhutle
Author

@AlexeyAB Hi Alexey,

I used your modified cfg file (tiny model with 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg). The results are amazing, but it takes over 5 seconds to detect objects. Could you please show me how to change the cfg file to reduce the computation time while avoiding a loss of accuracy? Thank you so much, Alexey!!!

@AlexeyAB
Owner

@trannhutle Hi,

  1. To speed up detection on the CPU, set OPENMP=1 or OPENMP=1 AVX=1 in the Makefile (see the sketch below).

  2. Try to train with width=192 and height=160.

To speed up recognition, I set AVX=1 in the Makefile when building libdarknet.so, but it does not work. Do you know how to fix it?

Can you show a screenshot?
What CPU do you use?
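
For reference, a rough sketch of the relevant build flags at the top of the darknet Makefile; the values here are only illustrative, and LIBSO=1 is the flag that produces libdarknet.so:

```make
# darknet Makefile build flags (illustrative values)
GPU=0      # CPU-only build
CUDNN=0
OPENCV=0
OPENMP=1   # parallelize CPU inference across cores
AVX=1      # use AVX intrinsics; only safe on CPUs that support AVX
LIBSO=1    # also build libdarknet.so
```

After changing flags, rebuild from scratch with make clean && make.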

@trannhutle
Author

trannhutle commented Jun 1, 2019

@AlexeyAB Hi Alexey,

  1. I tried training with width=192 and height=160 and it works really well. Thank you so much.

  2. For detection, I think it is better to set the network resolution to width=224 and height=192.

  3. About "To speed up detection on the CPU, set OPENMP=1 or OPENMP=1 AVX=1 in the Makefile": the build succeeds with OPENMP=1, and the .so library also builds without errors with OPENMP=1 AVX=1. However, when I initialize the network it shows:
     Try to load cfg: ./config/cfg/test_so.cfg, weights: ./config/weights/test_so.weights, clear = 0 Illegal instruction

  4. About "What CPU do you use?": my CPU is an Atom E3845 (quad-core, 1.91 GHz).

Thank you so much!!!

@trannhutle
Author

Hi @AlexeyAB,

I do not actually understand what "network resolution" means. Could you please point me to some documentation about it? Thank you so much!!!

@AlexeyAB
Owner

AlexeyAB commented Jun 1, 2019

@trannhutle Hi,
width= and height= in the cfg-file are the network resolution.

The Atom E3845 doesn't have AVX (or AVX2), since it is an old CPU: https://ark.intel.com/content/www/ru/ru/ark/products/78475/intel-atom-processor-e3845-2m-cache-1-91-ghz.html

So you should compile with OPENMP=1 AVX=0

@trannhutle
Author

Hi @AlexeyAB,

I have changed the configuration and it works really well.

I have another question: does the background of the training images (everything outside the bounding boxes) affect YOLO's learning, or is learning affected only by the regions inside the bounding boxes?
Also, about overexposed and underexposed detection images: how should we train the model (including how we capture the training images) to deal with overexposure and underexposure?

And what if the network has to learn objects that are all the same color, like apple, cucumber, avocado, green capsicum, ...? How can we deal with those kinds of problems?

Thank you so much for your strong support!!!

@trannhutle
Author

Hi @AlexeyAB,

Why is it that when I train tiny YOLO on 4 objects, with around 160 images per object, the accuracy is very low, while training with the same configuration on 14 objects works better?

What factors affect the trained model here?

@trannhutle
Author

Hi @AlexeyAB,

About your comment in #3001 (comment): does the background affect training even when it does not include the objects?

@AlexeyAB
Owner

Does the background of the training images (everything outside the bounding boxes) affect YOLO's learning, or is learning affected only by the regions inside the bounding boxes?

The background of the images affects the learning of Yolo.

About overexposed and underexposed detection images: how should we train the model (including how we capture the training images) to deal with overexposure and underexposure?

Use data augmentation: set exposure=3.0 in the cfg: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-%5Bnet%5D-section

What if the network has to learn objects that are all the same color, like apple, cucumber, avocado, green capsicum, ...? How can we deal with those kinds of problems?

What is the problem?

About your comment in #3001 (comment): does the background affect training even when it does not include the objects?

Yes.
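
For reference, a minimal sketch of the color-augmentation fields in the [net] section; the exact ranges are documented on the wiki page linked above, and the values here are only an example:

```
[net]
# random HSV data augmentation applied during training
saturation = 1.5   # saturation scaled by a random factor up to 1.5x
exposure = 3.0     # brightness scaled by a random factor up to 3x
hue = .1           # hue shifted by a random amount up to +/-0.1
```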

@trannhutle
Author

Hi @AlexeyAB,

What I would like to do next is capture background images, crop the objects at different angles and positions, and then paste those cropped objects onto the different backgrounds. Would this help to improve the accuracy of the trained network?

Thanks for your quick response!!!

@trannhutle
Author

@AlexeyAB,

Although I know that color reflected from the background affects the object, does pasting in cropped objects improve training and detection?
Someone says it would not help the network learn more features of the object. Could you please give me some ideas about that?

Thank you so much, Alexey!

@AlexeyAB
Owner

Although I know that color reflected from the background affects the object, does pasting in cropped objects improve training and detection?

No (in this case).

@AlexeyAB
Owner

Next I would paste those cropped objects onto different backgrounds. Would this help to improve the accuracy of the trained network?

It can improve accuracy.

@trannhutle
Copy link
Author

Although I know that the reflection of the color from the background affect the object, does applying the cropped objects increase the training and detecting?

No (in this case).

In this case you mean that increasing the training time and detecting time or what ? I do not very much understand ?

Thank you so much Alexey!!!

@trannhutle
Author

Hi @AlexeyAB,

About this function in image.c:
void draw_detections(image im, int num, float thresh, box *boxes, float **probs, char **names, image **alphabet, int classes)
How can I use it from Python? Right now, when I get the detection results, the bounding boxes I draw myself look bad.

If it can be used from Python, how do I call it, and what parameters do I have to pass?

Thank you so much @AlexeyAB

@AlexeyAB
Owner

@trannhutle

Use this in Python:

darknet/darknet_video.py

Lines 18 to 33 in c9129c2

```python
def cvDrawBoxes(detections, img):
    for detection in detections:
        x, y, w, h = detection[2][0],\
            detection[2][1],\
            detection[2][2],\
            detection[2][3]
        xmin, ymin, xmax, ymax = convertBack(
            float(x), float(y), float(w), float(h))
        pt1 = (xmin, ymin)
        pt2 = (xmax, ymax)
        cv2.rectangle(img, pt1, pt2, (0, 255, 0), 1)
        cv2.putText(img,
                    detection[0].decode() +
                    " [" + str(round(detection[1] * 100, 2)) + "]",
                    (pt1[0], pt1[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    [0, 255, 0], 2)
```

Or this:

darknet/darknet.py

Lines 413 to 424 in c9129c2

```python
# Wiggle it around to make a 3px border
rr, cc = draw.polygon_perimeter([x[1] for x in boundingBox], [x[0] for x in boundingBox], shape=shape)
rr2, cc2 = draw.polygon_perimeter([x[1] + 1 for x in boundingBox], [x[0] for x in boundingBox], shape=shape)
rr3, cc3 = draw.polygon_perimeter([x[1] - 1 for x in boundingBox], [x[0] for x in boundingBox], shape=shape)
rr4, cc4 = draw.polygon_perimeter([x[1] for x in boundingBox], [x[0] + 1 for x in boundingBox], shape=shape)
rr5, cc5 = draw.polygon_perimeter([x[1] for x in boundingBox], [x[0] - 1 for x in boundingBox], shape=shape)
boxColor = (int(255 * (1 - (confidence ** 2))), int(255 * (confidence ** 2)), 0)
draw.set_color(image, (rr, cc), boxColor, alpha=0.8)
draw.set_color(image, (rr2, cc2), boxColor, alpha=0.8)
draw.set_color(image, (rr3, cc3), boxColor, alpha=0.8)
draw.set_color(image, (rr4, cc4), boxColor, alpha=0.8)
draw.set_color(image, (rr5, cc5), boxColor, alpha=0.8)
```
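
For reference, a minimal sketch of feeding darknet.py detections into cvDrawBoxes above; the cfg/weights/data paths are hypothetical, and convertBack is the small helper defined just above the quoted lines in darknet_video.py:

```python
import cv2
import darknet

def convertBack(x, y, w, h):
    # convert a center-based (x, y, w, h) box to corner coordinates
    xmin = int(round(x - (w / 2)))
    xmax = int(round(x + (w / 2)))
    ymin = int(round(y - (h / 2)))
    ymax = int(round(y + (h / 2)))
    return xmin, ymin, xmax, ymax

# hypothetical paths
net = darknet.load_net(b"./config/cfg/test_so.cfg",
                       b"./config/weights/test_so.weights", 0)
meta = darknet.load_meta(b"./config/cfg/test_so.data")

# detect() returns [(label, confidence, (x, y, w, h)), ...]
detections = darknet.detect(net, meta, b"test.jpg")

img = cv2.imread("test.jpg")
cvDrawBoxes(detections, img)   # cvDrawBoxes as quoted above
cv2.imwrite("predictions.jpg", img)
```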

@isgursoy

Next I would paste those cropped objects onto different backgrounds. Would this help to improve the accuracy of the trained network?

It can improve accuracy.

@AlexeyAB
So cropping a positive rectangle and pasting it randomly onto a different background does not hurt accuracy?
There will be strong borders, and the region inside the box will be totally different from everything outside it. It would let us reduce labeling errors, but I am not sure it is beneficial.
What if there are many annotations? Or what if I leave some padding inside the box before moving it to a new background?

For example, we use a pseudo-labeler to detect detectable objects and paste them onto a random background, or onto their own clean background, and some people on the team claim this hurts accuracy.

@AlexeyAB
Owner

@isgursoy

Can you show examples?

Cropped objects inserted into another image can increase accuracy; this is known as CutMix: https://arxiv.org/pdf/1905.04899v2.pdf

[CutMix example image]

Also read:
#4264
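
For illustration, a rough numpy sketch (not darknet code) of the naive crop-and-paste idea being discussed: cut an annotated box out of a source image and paste it at a random position on a background, returning the new box coordinates:

```python
import random

def paste_crop(src_img, box, background):
    """src_img/background are numpy arrays of shape (H, W, 3);
    box is (xmin, ymin, xmax, ymax) in src_img pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    crop = src_img[ymin:ymax, xmin:xmax]
    h, w = crop.shape[:2]
    bh, bw = background.shape[:2]
    # random top-left corner that keeps the crop fully inside the background
    x0 = random.randint(0, bw - w)
    y0 = random.randint(0, bh - h)
    out = background.copy()
    out[y0:y0 + h, x0:x0 + w] = crop   # hard paste: the borders stay sharp
    return out, (x0, y0, x0 + w, y0 + h)
```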

@isgursoy

We will be back with examples from our case in a few hours. Thanks for your time.

@ekarabulut

ekarabulut commented Nov 25, 2019

In addition to isgursoy's post:

Does putting a cropped object onto a different background improve the model? By "cropping an object" we mean taking the object out of its original background by its bounding box; we don't mean a technique like CutMix. In the case of a human detection problem, we mean cropping the entire human and putting it onto a different background. My question is about three cases:

  1. Does it improve the model to put the cropped human onto a completely different background?
     [example image]

  2. We automatically detected humans and labelled them in a pseudo way. Then we cropped them and placed the detected boxes back onto a specified general background that is slightly different from the original background. Does it affect accuracy?
     [example image]

  3. Original image sizes may differ from the network size. For example, the image size can be 512x512 (square) while the network size is 416x416 (square), so they are proportional. What if the image size is rectangular and the network size is square, or vice versa? Does it affect the accuracy?

@AlexeyAB
Owner

@ekarabulut

  1. If we believe the results of the paper https://arxiv.org/pdf/1905.04899v2.pdf, yes, it increases accuracy.

  2. Yes, it increases accuracy.

  3. If the network size is 416x416 and the image size was 640x480 during both training and detection, then this is normal.
     It is bad when objects end up with different aspect ratios during training and detection after the image is resized to the network size, for example a 1000x100 training image versus a 100x1000 detection image.
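
To make point 3 concrete, a small sketch (assuming darknet's default stretch-resize rather than letterboxing) of how the same square object ends up with very different aspect ratios once the two images are resized to a 416x416 network:

```python
def stretched_aspect(obj_w, obj_h, img_w, img_h, net_w=416, net_h=416):
    # aspect ratio (w/h) of the object after the image is stretched to net size
    sx, sy = net_w / img_w, net_h / img_h
    return (obj_w * sx) / (obj_h * sy)

# a 50x50 object in a 1000x100 training image vs. a 100x1000 detection image
print(stretched_aspect(50, 50, 1000, 100))   # 0.1: appears 10x taller than wide
print(stretched_aspect(50, 50, 100, 1000))   # 10.0: appears 10x wider than tall
# the detector never saw 10:1 objects during training, so accuracy suffers
```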

@isgursoy

isgursoy commented Nov 25, 2019

In addition to @ekarabulut's post:
1-2) @AlexeyAB The strong gradient makes me think: could the model learn the borders and use that trick? In my opinion, a small amount of padding around the positive box feels safer.
3) Our images come in varying sizes and many aspect ratios.

  • a) Square model: distort the 16:9 validation set by resizing it to a square, and use the knowledge the model learned by distorting many 16:9 and 4:3 images to a square. The training set mostly contains 16:9.
  • b) 16:9 model: see the 16:9 samples as they are, and distort the 4:3 samples (the minority) to 16:9 during training. The validation set is 16:9.

Option (b) looks better, but with the 16:9 model we can only use ~300 px for the training height, instead of ~500 px with the square model, and we can only use small batches in training because of GPU memory.

@ekarabulut

ekarabulut commented Nov 25, 2019

@AlexeyAB First off, thanks for the quick reply.

In CutMix, a part of a bounding box (e.g. a human's leg) is inserted into another bounding box. In the example image (1) above, the whole bounding box is put onto another background (i.e. the box's context is removed or replaced). Is your comment still valid for this situation?

@AlexeyAB
Owner

@ekarabulut

It depends on your task.

In general it improves accuracy, just as any variety improves accuracy.

But to be more precise, in your training dataset:

  • there should be images as similar as possible to the ones you will use for Detection
  • and there should not be images unlike anything you will use for Detection

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

for each object which you want to detect - there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. It is desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train 2000*classes iterations or more

@AlexeyAB
Owner

@isgursoy @ekarabulut

1-2) @AlexeyAB The strong gradient makes me think: could the model learn the borders and use that trick? In my opinion, a small amount of padding around the positive box feels safer.
3) Our images come in varying sizes and many aspect ratios.

Yes, a model can simply be overfitted to the boundaries (the strong gradient); in the end it will just look for sharp boundaries instead of the objects themselves, and that will degrade accuracy.

Maybe later I will add something like this, with blending by using pyramids (if OPENCV=1): https://docs.opencv.org/master/dc/dff/tutorial_py_pyramids.html

[orange/apple pyramid-blending example image]

I added this issue: #4378

On different aspect ratios, there are pros and cons to the different resize approaches: #232 (comment)
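
For illustration, a rough sketch of the Laplacian-pyramid blending idea from the OpenCV tutorial linked above; this is not darknet code, and it assumes the foreground and background are already aligned and the same size. A soft mask hides the sharp paste border:

```python
import cv2
import numpy as np

def pyramid_blend(fg, bg, mask, levels=4):
    """fg, bg: float32 images of identical shape (H, W, 3);
    mask: float32 in [0, 1], same shape, 1 where fg should show."""
    gf, gb, gm = [fg], [bg], [mask]
    for _ in range(levels):
        gf.append(cv2.pyrDown(gf[-1]))
        gb.append(cv2.pyrDown(gb[-1]))
        gm.append(cv2.pyrDown(gm[-1]))
    out = None
    for i in range(levels, -1, -1):
        if i == levels:
            # coarsest level: blend the Gaussian images directly
            lf, lb = gf[i], gb[i]
        else:
            # finer levels: blend the Laplacian (detail) bands
            size = (gf[i].shape[1], gf[i].shape[0])
            lf = gf[i] - cv2.pyrUp(gf[i + 1], dstsize=size)
            lb = gb[i] - cv2.pyrUp(gb[i + 1], dstsize=size)
        layer = gm[i] * lf + (1.0 - gm[i]) * lb
        if out is None:
            out = layer
        else:
            size = (layer.shape[1], layer.shape[0])
            out = cv2.pyrUp(out, dstsize=size) + layer
    return np.clip(out, 0, 255).astype(np.uint8)
```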

@isgursoy

isgursoy commented Nov 25, 2019

Yes, a model can simply be overfitted to the boundaries (the strong gradient); in the end it will just look for sharp boundaries instead of the objects themselves, and that will degrade accuracy.

What do you think about leaving some padding around a positive box when cropping and moving it? What would change in that case, in your opinion?

@isgursoy

Thanks.
