
Some basic questions about the paper #51

Closed
serkansulun opened this issue May 13, 2017 · 6 comments
Comments

@serkansulun

Hi David,
I'm a master's student who wants to work on deep networks for tracking, and I'm trying to implement your paper in the Julia language. However, I didn't understand some parts:

1- For ALOV300+, we have ground-truth bounding boxes for frames 1, 6, 11, and so on. We need two annotated frames, so suppose we are using frames 6 and 11. What I assumed was: for a single forward pass through the network, we crop twice the area around the ground-truth box of frame 6, in both frames 6 and 7 (these crops form the inputs of the network), and predict the bounding-box location in frame 7 (the output). Using this prediction as a new ground truth, we repeat for frames 7 and 8, all the way to frame 11, and compute the loss between the predicted and ground-truth boxes for frame 11. Is this correct?

2- How about data augmentation? On which frames do we take random crops, and what are the ground-truth labels for these random crops? This is the part where I'm really lost. The same goes for the ImageNet dataset.

3- How does the network deal with varying input sizes (since the ground-truth bounding-box sizes aren't constant), given that the fully connected layer sizes are fixed?

Appreciate your help.
Serkan

@serkansulun
Author

I might have figured some things out by looking at the code; please tell me if I'm wrong somewhere.

1- We only use annotated frames, e.g. only frames 6 and 11. Frame 6 becomes the previous frame, and frame 11 becomes the current frame.

2- During augmentation, we shift and scale the ground-truth box in the current frame and crop twice the area around the shifted box, only to define a new synthetic search region. When computing the labels (coordinates relative to the new search region), we still use the unshifted ground-truth box coordinates.
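For reference, point 2 could be sketched like this in Python (an illustrative sketch, not the actual GOTURN code: the helper name, box conventions, and shift/scale ranges are made up here; the paper itself samples motion from Laplace distributions):

```python
import random

def augment_example(gt_box, rand=random.uniform):
    """Create one synthetic training example from a ground-truth box.

    gt_box is (cx, cy, w, h): center x/y, width, height.
    `rand` is injectable so the sampling can be made deterministic.
    """
    cx, cy, w, h = gt_box

    # Randomly shift the box center and scale its size
    # (illustrative uniform ranges, standing in for Laplace sampling).
    new_cx = cx + w * rand(-0.4, 0.4)
    new_cy = cy + h * rand(-0.4, 0.4)
    new_w = w * rand(0.8, 1.2)
    new_h = h * rand(0.8, 1.2)

    # The synthetic search region is twice the size of the *shifted*
    # box: (x, y, width, height).
    search = (new_cx - new_w, new_cy - new_h, 2 * new_w, 2 * new_h)

    # The label is the *unshifted* ground-truth box expressed in
    # coordinates relative to the search region.
    sx, sy, sw, sh = search
    label = ((cx - w / 2 - sx) / sw, (cy - h / 2 - sy) / sh,
             (cx + w / 2 - sx) / sw, (cy + h / 2 - sy) / sh)
    return search, label
```

With zero shift and unit scale, a box centered in its own 2× search region maps to the label (0.25, 0.25, 0.75, 0.75), which is a quick sanity check for the coordinate convention.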

Are these correct?

I'm still not clear about question 3.

Thanks

@davheld
Owner

davheld commented May 14, 2017

1-2) That is correct.

3) After figuring out what part of the image we want to crop, we crop that region, and then we resize the cropped region to the network's fixed input size (227 x 227).

@davheld davheld closed this as completed May 14, 2017
@serkansulun
Author

Thanks for the quick answer.

3) Do we do it by matching the shorter side to 227 and zero-padding the remaining area? Do the ground-truth bounding-box coordinates stay the same?
Thanks

@davheld
Owner

davheld commented May 15, 2017

We do it by resizing both sides to 227x227 (no padding, so the aspect ratio may change). You can try the padding approach if you wish. The ground-truth bounding-box coordinates are rescaled accordingly.
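The coordinate rescaling described here could be sketched as follows (a hypothetical helper for illustration, not part of the GOTURN code; it only transforms coordinates and leaves the actual image resizing to the image library of your choice):

```python
def resize_crop_coords(search, gt_box, out_size=227):
    """Map a ground-truth box into the coordinate frame of the
    fixed-size network input, after the search region is cropped and
    both of its sides are resized to out_size independently.

    search is (x, y, w, h); gt_box is (x1, y1, x2, y2) in original
    image coordinates.  Returns the box in input-image coordinates.
    """
    sx, sy, sw, sh = search
    # Each axis gets its own scale factor, since width and height are
    # resized to out_size independently (aspect ratio is not kept).
    scale_x = out_size / sw
    scale_y = out_size / sh
    x1, y1, x2, y2 = gt_box
    return ((x1 - sx) * scale_x, (y1 - sy) * scale_y,
            (x2 - sx) * scale_x, (y2 - sy) * scale_y)
```

For example, a 454x454 search region is scaled by 0.5 on both axes, so a box occupying its bottom-right quadrant maps to (113.5, 113.5, 227.0, 227.0) in the network input.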

@serkansulun
Author

Hi again David,
To test my implementation, I want to use the accuracy and robustness measures instead of ranking. I couldn't find a direct reference to these metrics in your paper, but I have some idea, so please let me know if I'm wrong somewhere.
1- Accuracy is simply the intersection-over-union (IoU) value between the ground-truth and predicted boxes.
2- When the accuracy becomes zero (no intersection), the tracker is reinitialized, meaning the ground-truth box is provided for the next frame. Or is it reinitialized n frames later?
3- Robustness error is the number of reinitializations divided by the number of frames.
4- Are these two measures averaged first over frames (within a single sequence) and then over all sequences, or over all frames in all sequences at once?
Thanks in advance
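Points 1 and 3, as stated above, could be sketched like this in Python (an illustrative sketch of the asker's interpretation, not an official evaluation script; `robustness_error` in particular just counts zero-IoU frames per point 3):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def robustness_error(ious):
    """Number of failures (IoU == 0, i.e. reinitializations per
    point 2 above) divided by the number of frames (point 3)."""
    failures = sum(1 for v in ious if v == 0.0)
    return failures / len(ious)
```

For instance, two unit-overlap 2x2 boxes give an IoU of 1/7, and a sequence with failures on 2 of 4 frames gives a robustness error of 0.5.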

@davheld
Owner

davheld commented Aug 11, 2017 via email
