Some basic questions about the paper #51
Comments
Looking at the code, I think I've figured out a few things; please tell me if I'm wrong somewhere.
1- We use only the annotated frames, e.g., frames 6 and 11: frame 6 becomes the previous frame and frame 11 the current frame.
2- For augmentation, we shift and scale the ground-truth box in the current frame and crop twice the area around the shifted box, purely to define a new synthetic search region. When computing the labels (coordinates relative to this new search region), we still use the unshifted ground-truth box coordinates.
Are these correct? I'm still not clear about question 3. Thanks |
1-2) That is correct
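For anyone implementing this, here is a minimal sketch of the augmentation confirmed above. The (center-x, center-y, width, height) box format, the helper names, and the shift/scale magnitudes are my own assumptions, not values taken from the GOTURN code:

```python
import random

def augment_search_region(gt_box, max_shift=0.4, max_scale=0.4):
    """Sample a synthetic search region around a shifted/scaled GT box."""
    cx, cy, w, h = gt_box
    # Randomly shift the center and scale the size of the current-frame GT box.
    shifted_cx = cx + random.uniform(-max_shift, max_shift) * w
    shifted_cy = cy + random.uniform(-max_shift, max_shift) * h
    scaled_w = w * (1.0 + random.uniform(-max_scale, max_scale))
    scaled_h = h * (1.0 + random.uniform(-max_scale, max_scale))
    # The search region covers twice the shifted/scaled box in each dimension.
    return (shifted_cx, shifted_cy, 2.0 * scaled_w, 2.0 * scaled_h)

def label_relative_to_region(gt_box, region):
    """Express the *unshifted* GT box in the search region's coordinates."""
    cx, cy, w, h = gt_box
    rcx, rcy, rw, rh = region
    rx1, ry1 = rcx - rw / 2, rcy - rh / 2  # top-left corner of the region
    # Corner coordinates normalized to the region (in [0, 1] when fully inside).
    return ((cx - w / 2 - rx1) / rw, (cy - h / 2 - ry1) / rh,
            (cx + w / 2 - rx1) / rw, (cy + h / 2 - ry1) / rh)
```

Only the search region moves; the label is still computed from the original ground-truth box, which is what teaches the network to re-center the target.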
|
Thanks for the quick answer.
|
We do it by resizing both crops to 227x227. You can try the padding approach if you wish. The ground-truth bounding box coordinates are resized accordingly. |
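To make the coordinate handling concrete, here is a minimal sketch of the resize step, assuming OpenCV for the image operation (any library with a resize works) and a (x1, y1, x2, y2) box format of my own choosing:

```python
import cv2  # assumption: OpenCV; the original code is Caffe/C++, this is illustrative

NET_INPUT = 227  # fixed input size expected by the network

def resize_crop_with_box(crop, box_in_crop):
    """Resize a search-region crop to 227x227 and rescale its GT box.

    crop: HxWx3 image array; box_in_crop: (x1, y1, x2, y2) in crop pixels.
    """
    h, w = crop.shape[:2]
    patch = cv2.resize(crop, (NET_INPUT, NET_INPUT))
    sx, sy = NET_INPUT / w, NET_INPUT / h
    x1, y1, x2, y2 = box_in_crop
    # The box stretches by the same factors as the image, so any aspect-ratio
    # distortion in the crop is mirrored in the label.
    return patch, (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```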
Hi again David,
To test my implementation, I want to use accuracy and robustness errors instead of ranking. I couldn't find a direct reference to these metrics in your paper, but I have some idea, so please let me know if I'm wrong somewhere. |
See Table 1 in the paper: http://davheld.github.io/GOTURN/GOTURN.pdf
For the other questions, I'm not sure, sorry. You'd have to check with VOT or look at their code.
Best of luck!
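For context, VOT's accuracy measure is essentially the mean intersection-over-union over frames where tracking succeeds, and robustness counts tracking failures. A minimal sketch of that idea, with my own function names and omitting VOT's full protocol (reinitialization after failure, burn-in frames, per-sequence averaging):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def accuracy_and_failures(pred_boxes, gt_boxes):
    """Mean overlap on successful frames, plus a failure count.

    Here a failure is simply a frame with zero overlap; the real VOT
    toolkit also reinitializes the tracker and skips burn-in frames.
    """
    overlaps, failures = [], 0
    for p, g in zip(pred_boxes, gt_boxes):
        o = iou(p, g)
        if o == 0.0:
            failures += 1  # tracker lost the target on this frame
        else:
            overlaps.append(o)
    accuracy = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return accuracy, failures
```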
|
Hi David,
I'm a master's student wanting to work on deep networks for tracking, and I'm trying to implement your paper in Julia. However, I didn't understand some parts:
1- For ALOV300+, we have the ground-truth bounding box for frames 1, 6, 11, and so on. We need two annotated frames, so suppose we are using frames 6 and 11. What I assumed was: for a single forward pass through the network, we crop twice the area around the GT box of frame 6, in both frames 6 and 7 (these crops form the network's inputs), and predict the bounding box location in frame 7 (the output). Treating this prediction as a new ground truth, we repeat for frames 7 and 8, all the way to frame 11, and compute the loss between the predicted and ground-truth boxes for frame 11. Is this correct? (See the pair-construction sketch at the end of this thread.)
2- How about data augmentation? On which frames do we take random crops, and what are the ground-truth labels for these random crops? This is the part where I'm really lost. The same goes for the ImageNet dataset.
3- How does the network deal with varying input sizes (the ground-truth bounding boxes aren't a constant size) when the fully connected layer sizes are fixed?
Appreciate your help.
Serkan
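As confirmed earlier in the thread, training does not roll predictions forward from frame 6 to frame 11; it simply pairs consecutive annotated frames. A minimal sketch of building such pairs from ALOV-style annotations (the function name and data layout are my own):

```python
def make_training_pairs(annotated_frames):
    """Pair each annotated frame with the next one, e.g. (1, 6), (6, 11).

    annotated_frames: list of (frame_index, gt_box) sorted by frame index,
    as in ALOV300+ where every fifth frame is labeled.
    """
    pairs = []
    for (prev_idx, prev_box), (curr_idx, curr_box) in zip(
            annotated_frames, annotated_frames[1:]):
        # The previous frame supplies the target crop (centered on its GT box);
        # the current frame supplies the search region and the regression label.
        pairs.append(((prev_idx, prev_box), (curr_idx, curr_box)))
    return pairs

# Example with boxes as (x1, y1, x2, y2): frames 1, 6, 11 -> pairs (1,6), (6,11).
frames = [(1, (10, 10, 60, 60)), (6, (12, 11, 62, 61)), (11, (15, 13, 65, 63))]
print(make_training_pairs(frames))
```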