
Deeper Training Questions/Theory pt. 2 #380

Open
SJRogue opened this issue Feb 7, 2018 · 3 comments

SJRogue commented Feb 7, 2018

[BEFORE READING: I hope these questions are not stupid in your eyes. They are not very technical; this is more a high-level theory discussion than low-level coding. If there is a pt. 3 I will go more low-level. I still think everybody can benefit from these questions on a more abstract level.]

Hello Alexey, TheMikeyR, and anybody else involved enough.

I have not had the time (between work and college) to implement or solve the problems from pt. 1 of this topic, so I cannot give feedback within a day. For the next 3-4 months I will probably keep asking questions before I can verify the answers to my previous topics (but I will verify them). After that I can work through everything step by step and give you feedback; for now I will ask, ask, ask, and try to contribute by asking. : )

So, in the last topic I said I would ask, so now I'm asking:

I saw this was asked before but it was left unanswered (I lost the topic title, I'm sorry). I'll try to formulate the questions briefly and clearly:

Okay, for this episode:
Capture resolution, marking resolution (the Yolo_mark app found on Alexey's GitHub) and the .cfg resolution.
How do they relate? I've seen the long mathematical explanations; is there a high-level, less mathematical explanation? I have the capacity, but not the time, to get into the math in those posts.

So 👍 ..


Question 1 - CAPTURE STAGE:

----> If possible, keep in mind everything from the 1st post with the same title (same model, same environment).

Is there a problem if I capture video at 3840x2160 and use the technique explained in the Yolo_mark repo to slice it into images? What resolution will the sliced images have? Should I rather capture at 1920x1080, and what resolution would the sliced images be then? Should I capture at a different resolution? Should I film square, or doesn't it matter what shape the capture resolution has?


Question 2 - MARKING STAGE:

----> If possible, keep in mind everything from the 1st post with the same title (same model, same environment).

When we start the Yolo_mark app, we can choose the resolution for the marking app (I forgot where exactly, but it doesn't matter). How does this resolution relate to the resolution from Question 1? Do I use the same resolution, or is the same proportion/shape enough, and should I go for a lower resolution so that training goes faster?


Question 3 - TRAINING STAGE:

----> If possible, keep in mind everything from the 1st post with the same title (same model, same environment).

So, after that, the resolutions specified at the top of the .cfg file for training: how do they relate to everything before this? Do I use the resolution used for marking? Do I go back to the resolution used in the CAPTURE STAGE (the video resolution, or the resolution of the generated sliced images, if these differ)? Or do I go for something entirely different, because there is no relation between the optimal detection resolution, the optimal capture resolution, and the optimal marking resolution?


Question 4

I will not start question 4 here, or this post will become too long. I'll pose it in pt. 3 of this topic. It's about capturing the detection output (that will be more low-level coding stuff).


AlexeyAB (Owner) commented Feb 7, 2018

If we take into account the conditions from the first post (#377), then:

  1. Yolo_mark will slice the video into frames with the same resolution as the video. When you label objects on these frames, the labels will be floating-point values (0.0 - 1.0), where x and width are relative to the image_width, and y and height are relative to the image_height. So after any resizing of the image, the relative values remain the same (see the sketch after this list): https://github.com/AlexeyAB/Yolo_mark#how-to-get-frames-from-videofile

  2. The resolution of your training and detection dataset can be any value, but it should be the same for both (under these conditions).

  3. Your network resolution (at the top of the cfg-file) can be any value that is a multiple of 32; it is very desirable for it to be square, but it may differ from the resolution of the training/detection dataset. For example: training/detection dataset 1920x1080, and network resolution 832x832.
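
To make the relative label format concrete, here is a minimal Python sketch (my own illustration, not code from Yolo_mark or darknet; the function name is made up). It converts a pixel bounding box into the relative `<class> <x_center> <y_center> <width> <height>` values that the label files store, and shows that the same object marked on a 3840x2160 frame and on that frame downscaled to 1920x1080 gives identical label lines:

```python
def to_yolo_label(class_id, box_px, img_w, img_h):
    """Convert a pixel box (left, top, right, bottom) into a YOLO label line.

    All four coordinates are expressed relative to the image size (0.0 - 1.0),
    so they stay valid no matter how the image is resized later.
    """
    left, top, right, bottom = box_px
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# The same object marked on a 3840x2160 frame ...
print(to_yolo_label(0, (1900, 1000, 2100, 1200), 3840, 2160))
# ... and on that frame downscaled to 1920x1080 gives identical values.
print(to_yolo_label(0, (950, 500, 1050, 600), 1920, 1080))
```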

@TheMikeyR

I will add some extra details to @AlexeyAB's answer on question 3.

From my understanding (please correct me if I'm wrong), the network resolution (top of the cfg-file) is what your dataset will be scaled to. When you train on an image, it is scaled to the resolution in the cfg before it is fed into the network, and then trained on. This is also the reason why the detections are formatted as ratios, as explained in @AlexeyAB's answer to question 1: no matter what your network resolution is, the detections will be correct (the network resolution should still be a multiple of 32 and preferably square).
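
A quick illustrative sketch of why that ratio format decouples the network resolution from the dataset resolution (Python with made-up variable names; darknet does this internally in C): a detection that comes out of the network as relative coordinates maps back onto the original frame with a plain multiplication, whatever network resolution the cfg uses.

```python
# Illustrative only; the numbers reuse the example label from the sketch above.
net_w, net_h = 832, 832        # network resolution from the top of the cfg file
img_w, img_h = 1920, 1080      # original dataset/frame resolution

# A detection from the network, already in relative coordinates (0.0 - 1.0).
x_rel, y_rel, w_rel, h_rel = 0.520833, 0.509259, 0.052083, 0.092593

# Because the values are ratios, mapping back to the original frame only needs
# the original width/height -- net_w and net_h do not appear at all.
box_px = (
    round((x_rel - w_rel / 2) * img_w),   # left
    round((y_rel - h_rel / 2) * img_h),   # top
    round((x_rel + w_rel / 2) * img_w),   # right
    round((y_rel + h_rel / 2) * img_h),   # bottom
)
print(box_px)  # (950, 500, 1050, 600) on the 1920x1080 frame
```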

Furthermore, the original darknet (pjreddie's) will keep the aspect ratio (introducing black borders), while this fork (AlexeyAB's) will not preserve the aspect ratio of the image. You can see a nice example and explanation here: #232 (comment)
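
And a small OpenCV sketch of the two resizing behaviours mentioned above. This is only an approximation written in Python for illustration; the real resizing happens inside darknet's C code, and the exact border handling may differ.

```python
import cv2

def stretch_to_network(img, net_w=416, net_h=416):
    """Roughly what this fork does: resize without keeping the aspect ratio,
    so a 1920x1080 frame is squashed into net_w x net_h."""
    return cv2.resize(img, (net_w, net_h))

def letterbox_to_network(img, net_w=416, net_h=416):
    """Roughly what pjreddie's original darknet does: keep the aspect ratio
    and pad the remaining area with constant-colour borders."""
    h, w = img.shape[:2]
    scale = min(net_w / w, net_h / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(img, (new_w, new_h))
    pad_x, pad_y = net_w - new_w, net_h - new_h
    return cv2.copyMakeBorder(
        resized,
        pad_y // 2, pad_y - pad_y // 2,   # top, bottom
        pad_x // 2, pad_x - pad_x // 2,   # left, right
        cv2.BORDER_CONSTANT, value=(0, 0, 0),
    )
```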

SJRogue (Author) commented Feb 8, 2018

Thank you both for clearing this up. Very helpful.
