Some basic questions about the paper #51
Comments
Looking at the code, I think I've figured out a few things; please tell me if I'm wrong somewhere.
1- We use only the annotated frames, e.g., frames 6 and 11: frame 6 becomes the previous frame and frame 11 the current frame.
2- For augmentation, we shift and scale the ground-truth box in the current frame and crop twice the area around the shifted box, purely to define a new synthetic search region. When computing the labels (coordinates relative to this new search region), we still use the unshifted ground-truth box coordinates.
Are these correct? I'm still not clear about question 3. Thanks |
1-2) That is correct
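For anyone implementing this, here is a minimal sketch of the augmentation confirmed above. The (center-x, center-y, width, height) box format, the helper names, and the shift/scale magnitudes are my own assumptions, not values taken from the GOTURN code:

```python
import random

def augment_search_region(gt_box, max_shift=0.4, max_scale=0.4):
    """Sample a synthetic search region around a shifted/scaled GT box."""
    cx, cy, w, h = gt_box
    # Randomly shift the center and scale the size of the current-frame GT box.
    shifted_cx = cx + random.uniform(-max_shift, max_shift) * w
    shifted_cy = cy + random.uniform(-max_shift, max_shift) * h
    scaled_w = w * (1.0 + random.uniform(-max_scale, max_scale))
    scaled_h = h * (1.0 + random.uniform(-max_scale, max_scale))
    # The search region covers twice the shifted/scaled box in each dimension.
    return (shifted_cx, shifted_cy, 2.0 * scaled_w, 2.0 * scaled_h)

def label_relative_to_region(gt_box, region):
    """Express the *unshifted* GT box in the search region's coordinates."""
    cx, cy, w, h = gt_box
    rcx, rcy, rw, rh = region
    rx1, ry1 = rcx - rw / 2, rcy - rh / 2  # top-left corner of the region
    # Corner coordinates normalized to the region (in [0, 1] when fully inside).
    return ((cx - w / 2 - rx1) / rw, (cy - h / 2 - ry1) / rh,
            (cx + w / 2 - rx1) / rw, (cy + h / 2 - ry1) / rh)
```

Only the search region moves; the label is still computed from the original ground-truth box, which is what teaches the network to re-center the target.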
|
Thanks for the quick answer.
|
We do it by resizing both crops to 227x227. You can try the padding approach if you wish. The ground-truth bounding box coordinates are resized accordingly. |
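To make the coordinate handling concrete, here is a minimal sketch of the resize step, assuming OpenCV for the image operation (any library with a resize works) and a (x1, y1, x2, y2) box format of my own choosing:

```python
import cv2  # assumption: OpenCV; the original code is Caffe/C++, this is illustrative

NET_INPUT = 227  # fixed input size expected by the network

def resize_crop_with_box(crop, box_in_crop):
    """Resize a search-region crop to 227x227 and rescale its GT box.

    crop: HxWx3 image array; box_in_crop: (x1, y1, x2, y2) in crop pixels.
    """
    h, w = crop.shape[:2]
    patch = cv2.resize(crop, (NET_INPUT, NET_INPUT))
    sx, sy = NET_INPUT / w, NET_INPUT / h
    x1, y1, x2, y2 = box_in_crop
    # The box stretches by the same factors as the image, so any aspect-ratio
    # distortion in the crop is mirrored in the label.
    return patch, (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```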
Hi again David,
To test my implementation, I want to use accuracy and robustness errors instead of ranking. I couldn't find a direct reference to these metrics in your paper, but I have some idea, so please let me know if I'm wrong somewhere. |
See Table 1 in the paper: http://davheld.github.io/GOTURN/GOTURN.pdf
For the other questions, I'm not sure, sorry. You'd have to check with VOT or look at their code.
Best of luck!
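For context, VOT's accuracy measure is essentially the mean intersection-over-union over frames where tracking succeeds, and robustness counts tracking failures. A minimal sketch of that idea, with my own function names and omitting VOT's full protocol (reinitialization after failure, burn-in frames, per-sequence averaging):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def accuracy_and_failures(pred_boxes, gt_boxes):
    """Mean overlap on successful frames, plus a failure count.

    Here a failure is simply a frame with zero overlap; the real VOT
    toolkit also reinitializes the tracker and skips burn-in frames.
    """
    overlaps, failures = [], 0
    for p, g in zip(pred_boxes, gt_boxes):
        o = iou(p, g)
        if o == 0.0:
            failures += 1  # tracker lost the target on this frame
        else:
            overlaps.append(o)
    accuracy = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return accuracy, failures
```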
|
Hi David,
I'm a master's student wanting to work on deep networks for tracking, and I'm trying to implement your paper in Julia. However, I didn't understand some parts:
1- For ALOV300+, we have the ground-truth bounding box for frames 1, 6, 11, and so on. We need two annotated frames, so suppose we are using frames 6 and 11. What I assumed was: for a single forward pass through the network, we crop twice the area around the GT box of frame 6, in both frames 6 and 7 (these crops form the network's inputs), and predict the bounding box location in frame 7 (the output). Treating this prediction as a new ground truth, we repeat for frames 7 and 8, all the way to frame 11, and compute the loss between the predicted and ground-truth boxes for frame 11. Is this correct? (See the pair-construction sketch at the end of this thread.)
2- How about data augmentation? On which frames do we take random crops, and what are the ground-truth labels for these random crops? This is the part where I'm really lost. The same goes for the ImageNet dataset.
3- How does the network deal with varying input sizes (the ground-truth bounding boxes aren't a constant size) when the fully connected layer sizes are fixed?
Appreciate your help.
Serkan
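As confirmed earlier in the thread, training does not roll predictions forward from frame 6 to frame 11; it simply pairs consecutive annotated frames. A minimal sketch of building such pairs from ALOV-style annotations (the function name and data layout are my own):

```python
def make_training_pairs(annotated_frames):
    """Pair each annotated frame with the next one, e.g. (1, 6), (6, 11).

    annotated_frames: list of (frame_index, gt_box) sorted by frame index,
    as in ALOV300+ where every fifth frame is labeled.
    """
    pairs = []
    for (prev_idx, prev_box), (curr_idx, curr_box) in zip(
            annotated_frames, annotated_frames[1:]):
        # The previous frame supplies the target crop (centered on its GT box);
        # the current frame supplies the search region and the regression label.
        pairs.append(((prev_idx, prev_box), (curr_idx, curr_box)))
    return pairs

# Example with boxes as (x1, y1, x2, y2): frames 1, 6, 11 -> pairs (1,6), (6,11).
frames = [(1, (10, 10, 60, 60)), (6, (12, 11, 62, 61)), (11, (15, 13, 65, 63))]
print(make_training_pairs(frames))
```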