Deeper Training Questions/Theory pt. 2 #380
If we take into account the conditions from the first post, then: #377
I will add some extra details to @AlexeyAB's answer on question 3. From my understanding (please correct me if I'm wrong), the network resolution (at the top of the cfg file) is what your dataset will be scaled to. When you train on an image, it is scaled to the network resolution (from the cfg) before being fed into the network, and training proceeds on that. This is also why the annotations must be formatted as ratios, as explained in @AlexeyAB's answer to question 1: no matter what your network resolution is, the detections remain correct (the network resolution should still be a multiple of 32, and preferably square). Furthermore, the original darknet (pjreddie's) keeps the aspect ratio (introducing black borders), while this fork (AlexeyAB's) does not preserve the aspect ratio of the image. You can see a nice example and explanation here: #232 (comment)
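To make the ratio argument above concrete, here is a minimal sketch (not darknet's actual code; the function names and the 416x416 network size are my own illustration). Under a plain stretch resize, ratio-format boxes need no adjustment at all; under a pjreddie-style letterbox resize that pads with borders, the ratios have to be remapped onto the padded canvas:

```python
def stretch_resize_box(box, src_w, src_h):
    """YOLO-format box (cx, cy, w, h), all as ratios of the image size.
    Under a plain stretch to any network resolution, the ratios are
    unchanged -- they are resolution-independent by construction."""
    return box

def letterbox_box(box, src_w, src_h, net_w, net_h):
    """Remap a ratio-format box for a letterbox resize that preserves
    aspect ratio by padding the short side with borders."""
    cx, cy, w, h = box
    scale = min(net_w / src_w, net_h / src_h)    # uniform scale factor
    new_w, new_h = src_w * scale, src_h * scale  # image size inside the canvas
    pad_x = (net_w - new_w) / 2                  # border width (pixels)
    pad_y = (net_h - new_h) / 2                  # border height (pixels)
    # re-express the box as ratios of the padded net_w x net_h canvas
    return ((cx * new_w + pad_x) / net_w,
            (cy * new_h + pad_y) / net_h,
            w * new_w / net_w,
            h * new_h / net_h)

# A box covering the center half of a 1920x1080 frame:
box = (0.5, 0.5, 0.5, 0.5)
print(stretch_resize_box(box, 1920, 1080))       # unchanged ratios
print(letterbox_box(box, 1920, 1080, 416, 416))  # height ratio shrinks,
                                                 # since borders pad the top/bottom
```

The centered box stays centered in both cases, but in the letterbox case its height ratio drops (to roughly 0.281 here), because the 416x416 canvas now includes black borders that the box does not cover.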
Thank you both for clearing this up. Very helpful.
[BEFORE READING: I hope these questions are not stupid in your eyes. These are not very technical discussions, just a bit; this is more a high-level theory discussion than low-level pro-coding/pro-tech. If there is a pt. 3 I will go more low-level. I still think everybody can benefit from these questions on a more abstract level.]
Hello Alexey, TheMikeyR, and anybody else involved enough.
I have not had the time (between work and college) to implement or solve the problems in pt. 1 of this topic, so I cannot give feedback within a day. Given my timeframe, for the next 3-4 months I will probably keep asking questions before I can verify the answers from my previous topics (but I will). After that I can work through everything step by step and give you feedback; for now I ask, ask, ask, and try to contribute by asking. : )
So.. last topic I said I would ask, so I'm asking:
I saw this was asked before but left unanswered (I lost the topic title, sorry). I'll try to formulate the questions briefly and clearly:
Okay.. for this episode:
Capture resolution, marking resolution (Yolo_mark app found on Alexey's git) & .cfg-resolution.
How do they relate? I've seen the huge mathematical explanations; is there a high-level, less mathematical explanation? I have the capacity, but not the time, to get into the math in these posts.
So 👍 ..
Question 1 - CAPTURE STAGE:
----> If possible, keep everything from the 1st post with the same title in mind (same model, same environment).
Is there a problem if I capture video in 3840x2160 and use the technique explained in the Yolo_mark repo to slice it into images? What resolution will the sliced images have? Should I rather capture in 1920x1080, and what resolution will the sliced images be then? Should I capture in a different resolution? Should I film square, or doesn't it matter what shape the capture resolution has?
Question 2 - MARKING STAGE:
----> If possible, keep everything from the 1st post with the same title in mind (same model, same environment).
When we start the Yolo_mark app, we can choose a resolution for the marking app (I forgot where exactly, but it doesn't matter). How does this resolution relate to the resolution from Question 1? Do I take the same resolution, or is the same proportion/shape enough, so that I can go for a lower resolution and the training can go faster?
Question 3 - TRAINING STAGE:
----> If possible, keep everything from the 1st post with the same title in mind (same model, same environment).
So, after that, the resolution specified at the top of the cfg file for training: how does it relate to everything before this? Do I go for the resolution used for marking? Do I go back to the resolution used in the CAPTURE STAGE (the video resolution, or the resolution of the generated sliced images, if these differ)? Or do I go for something different, because there is no relation between the optimal detection, capture, and marking resolutions?
Question 4
I will not start question 4, or this post will get too elaborate. I'll pose it in pt. 3 of this topic. It's about capturing the detection output (this will be more low-level coding stuff).