About multi-camera Multi-shot test #12

Closed
sunxia233 opened this issue Jan 4, 2021 · 6 comments

@sunxia233

Thank you for your great work!
In Figure 3, you use multi-camera multi-shot feature fusion for testing. I would like to ask about this multi-camera feature fusion: do you exclude from the gallery all the cameras of the same ID that participate in the fusion, or do you remove just one camera?

@angpo
Collaborator

angpo commented Jan 4, 2021

Hi,

As we mentioned in the main paper, we do not leverage multi-camera information during the test phase. Instead, as done by recent works, each example of the query and the gallery set is represented by eight frames sampled from the corresponding tracklet.
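As an illustration only, here is a minimal sketch of how such a tracklet-based representation could be obtained (the even-spacing strategy and the function name are assumptions for the example, not necessarily the exact sampling used in the paper):

```python
import numpy as np

def sample_tracklet_frames(tracklet_length: int, num_samples: int = 8) -> np.ndarray:
    """Pick `num_samples` frame indices spread evenly along one tracklet.

    If the tracklet is shorter than `num_samples`, some indices repeat.
    """
    idx = np.linspace(0, tracklet_length - 1, num_samples)
    return idx.round().astype(int)

# Example: an evaluation tracklet of 57 frames is reduced to 8 representative frames.
print(sample_tracklet_frames(57))  # -> [ 0  8 16 24 32 40 48 56]
```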

So, Figure 3 refers to the procedure we adopt during training. Here, each example is obtained by sampling several frames depicting the same subject but from different cameras.
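Purely as a sketch of that idea (the data layout and the names below are hypothetical), a training example for one identity could be assembled like this:

```python
import random
from collections import defaultdict

def build_multicamera_example(dataset, identity, num_frames=8):
    """Sample `num_frames` frames of `identity`, spread over its cameras.

    `dataset` is a list of (identity, camera, frame_path) tuples.
    """
    by_camera = defaultdict(list)
    for pid, cam, path in dataset:
        if pid == identity:
            by_camera[cam].append(path)
    cameras = sorted(by_camera)
    # Round-robin over the available cameras so the example mixes viewpoints.
    example = []
    while len(example) < num_frames:
        cam = cameras[len(example) % len(cameras)]
        example.append(random.choice(by_camera[cam]))
    return example
```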

@sunxia233
Author

Let me make my understanding as clear as possible. I know of some multi-shot methods in person re-ID that use a camera tracklet as a multi-query. My understanding is that you used the student network in the final test, with the single-shot method and without using camera information. I thought Figure 3 was meant to verify the impact of multi-camera multi-query input on the test, but you are saying that Figure 3 shows the input during training, not during testing. So is the result in Figure 3(d) always a single-shot retrieval? In that case, why not directly use the teacher network for the final test result?

@angpo
Collaborator

angpo commented Jan 4, 2021

I'm sorry, there was a misunderstanding before. I thought you meant another Figure (precisely Fig. 2, which depicts the procedure we adopted during training).

Yes, we always used the student net in the final test. To be clear, its input is "multi-shot" during training, while being single-shot and tracklet-based during evaluation. This was done to be fair and in line with the standard evaluation protocol.

In Figure 3(d) we compare the performance of the student net (orange lines) and the teacher net (blue lines), again assuming a single-shot scenario during evaluation. The figure shows that the way we trained the student net (namely, by distilling multi-camera information) leads to large improvements in performance (and this holds for different architectures).

I hope this is clearer for you now (if not, please feel free to write again).

@sunxia233
Author

Thank you for your patience. I understand the standard single-shot test protocol. For example, for a query image a(1)(1), the first index represents the category (identity) and the second represents the camera, so this is category 1, camera 1. The test method for scheme A is to remove the category-1 samples captured by the same camera (camera 1) from the gallery and to look for targets under different cameras. But for schemes B and C, I really don't know how to exclude cameras of the same category in the gallery. For example, if a(1)(1), b(1)(2), c(1)(3), d(1)(4) all participate in the feature fusion, are all category-1 samples from cameras 1, 2, 3, and 4 removed from the gallery?

@angpo
Collaborator

angpo commented Jan 5, 2021

Thank you for your patience. I understand the standard single-shot test protocol. For example, for a query image a(1)(1), the first index represents the category (identity) and the second represents the camera, so this is category 1, camera 1. The test method for scheme A is to remove the category-1 samples captured by the same camera (camera 1) from the gallery and to look for targets under different cameras.

Yes, this is the one and only protocol we follow during evaluation, both for the teacher and the student. Even though the latter has been trained with multi-camera input, we switch to single-camera input at test time (i.e., a subset of frames from the same tracklet).
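For reference, this is the usual cross-camera filtering in code form (a sketch in the spirit of standard re-ID evaluation scripts; the variable names are mine):

```python
import numpy as np

def valid_gallery_mask(query_pid, query_cam, gallery_pids, gallery_cams):
    """Standard single-shot protocol: gallery samples sharing BOTH the identity
    and the camera of the query are discarded, so correct matches can only
    come from other cameras."""
    gallery_pids = np.asarray(gallery_pids)
    gallery_cams = np.asarray(gallery_cams)
    junk = (gallery_pids == query_pid) & (gallery_cams == query_cam)
    return ~junk

# Query a(1)(1): identity 1 seen by camera 1.
mask = valid_gallery_mask(1, 1, gallery_pids=[1, 1, 2, 1], gallery_cams=[1, 3, 1, 2])
print(mask)  # [False  True  True  True] -> only the same-id, same-camera entry is removed
```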

But for schemes B and C, I really don't know how to exclude cameras of the same category in the gallery,

What do you mean by "schemes B and C"?

For example, if a(1)(1), b(1)(2), c(1)(3), d(1)(4) all participate in the feature fusion, are all category-1 samples from cameras 1, 2, 3, and 4 removed from the gallery?

Let me remind you that our work deals with video re-identification. Therefore, we apply feature fusion to a single (video) example at a time, precisely to merge the representations of its frames into a single one. So, we never apply feature fusion to different examples of the gallery set (nor of the query set).
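To make this concrete, here is a small sketch of what feature fusion means in this context (mean pooling is just an illustrative choice): per-frame features of one tracklet are merged into a single descriptor, and nothing is ever fused across different query or gallery tracklets.

```python
import numpy as np

def fuse_tracklet_features(frame_features: np.ndarray) -> np.ndarray:
    """Merge the (T, D) per-frame features of ONE video example into a single
    L2-normalised descriptor. Different tracklets are never fused together."""
    fused = frame_features.mean(axis=0)
    return fused / (np.linalg.norm(fused) + 1e-12)

# Each query/gallery tracklet gets its own descriptor, computed independently.
descriptors = [fuse_tracklet_features(feats) for feats in
               (np.random.rand(8, 2048), np.random.rand(8, 2048))]
```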

angpo closed this as completed Jan 7, 2021
@sunxia233
Author

I'm very sorry: my browser did not display the figures, and I did not look at the title of your paper carefully, which led me to think that you were the authors of "Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification". So please forget my questions; it was a misunderstanding.
