About multi-camera Multi-shot test #12

Closed
sunxia233 opened this issue Jan 4, 2021 · 6 comments

@sunxia233

Thank you for your great work!
In Figure 3, you use multi-camera multi-shot feature fusion for testing. I would like to ask about this multi-camera feature fusion: do you exclude from the gallery all the cameras of the same ID that participate in the fusion, or do you remove just one camera?

@angpo
Collaborator

angpo commented Jan 4, 2021

Hi,

As we mentioned in the main paper, we do not leverage multi-camera information during the test phase. Instead, as done by recent works, each example of the query and the gallery set is represented by eight frames sampled from the corresponding tracklet.
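As an illustration only, here is a minimal sketch of how such a tracklet-based representation could be obtained (the even-spacing strategy and the function name are assumptions for the example, not necessarily the exact sampling used in the paper):

```python
import numpy as np

def sample_tracklet_frames(tracklet_length: int, num_samples: int = 8) -> np.ndarray:
    """Pick `num_samples` frame indices spread evenly along one tracklet.

    If the tracklet is shorter than `num_samples`, some indices repeat.
    """
    idx = np.linspace(0, tracklet_length - 1, num_samples)
    return idx.round().astype(int)

# Example: an evaluation tracklet of 57 frames is reduced to 8 representative frames.
print(sample_tracklet_frames(57))  # -> [ 0  8 16 24 32 40 48 56]
```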

So, Figure 3 refers to the procedure we adopt during training. Here, each example is obtained by sampling several frames depicting the same subject but from different cameras.
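Purely as a sketch of that idea (the data layout and the names below are hypothetical), a training example for one identity could be assembled like this:

```python
import random
from collections import defaultdict

def build_multicamera_example(dataset, identity, num_frames=8):
    """Sample `num_frames` frames of `identity`, spread over its cameras.

    `dataset` is a list of (identity, camera, frame_path) tuples.
    """
    by_camera = defaultdict(list)
    for pid, cam, path in dataset:
        if pid == identity:
            by_camera[cam].append(path)
    cameras = sorted(by_camera)
    # Round-robin over the available cameras so the example mixes viewpoints.
    example = []
    while len(example) < num_frames:
        cam = cameras[len(example) % len(cameras)]
        example.append(random.choice(by_camera[cam]))
    return example
```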

@sunxia233
Author

Let me make my understanding as clear as possible. I know of some multi-shot methods in person re-ID that use a camera tracklet as a multi-query. My understanding is that you used the student network in the final test, with the single-shot method and without using camera information. I thought Figure 3 was meant to verify the impact of multi-camera multi-query input on the test, but you are saying that Figure 3 shows the input during training, not during testing. So is the result in Figure 3(d) always a single-shot retrieval? In that case, why not directly use the teacher network for the final test result?

@angpo
Collaborator

angpo commented Jan 4, 2021

I'm sorry, there was a misunderstanding before. I thought you meant another Figure (precisely Fig. 2, which depicts the procedure we adopted during training).

Yes, we always used the student net in the final test. To be clear, its input is "multi-shot" during training, while being single-shot and tracklet-based during evaluation. This was done to be fair and in line with the standard evaluation protocol.

In Figure 3(d) we compare the performance of the student net (orange lines) and the teacher net (blue lines), again assuming a single-shot scenario during evaluation. The figure shows that the way we trained the student net (namely, by distilling multi-camera information) leads to large improvements in performance (and this holds for different architectures).

I hope this is clearer for you now (if not, please feel free to write again).

@sunxia233
Author

Thank you for your patience. I understand the standard single-shot test protocol. For example, for a query image a(1)(1), the first index represents the category (identity) and the second represents the camera, so this is category 1, camera 1. The test method for scheme A is to remove the category-1 samples captured by the same camera (camera 1) from the gallery and to look for targets under different cameras. But for schemes B and C, I really don't know how to exclude cameras of the same category in the gallery. For example, if a(1)(1), b(1)(2), c(1)(3), d(1)(4) all participate in the feature fusion, are all category-1 samples from cameras 1, 2, 3, and 4 removed from the gallery?

@angpo
Collaborator

angpo commented Jan 5, 2021

Thank you for your patience. I understand the standard single-shot test protocol. For example, for a query image a(1)(1), the first index represents the category (identity) and the second represents the camera, so this is category 1, camera 1. The test method for scheme A is to remove the category-1 samples captured by the same camera (camera 1) from the gallery and to look for targets under different cameras.

Yes, this is the one and only protocol we follow during evaluation, both for the teacher and the student. Even though the latter has been trained with multi-camera input, we switch to single-camera input at test time (i.e., a subset of frames from the same tracklet).
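For reference, this is the usual cross-camera filtering in code form (a sketch in the spirit of standard re-ID evaluation scripts; the variable names are mine):

```python
import numpy as np

def valid_gallery_mask(query_pid, query_cam, gallery_pids, gallery_cams):
    """Standard single-shot protocol: gallery samples sharing BOTH the identity
    and the camera of the query are discarded, so correct matches can only
    come from other cameras."""
    gallery_pids = np.asarray(gallery_pids)
    gallery_cams = np.asarray(gallery_cams)
    junk = (gallery_pids == query_pid) & (gallery_cams == query_cam)
    return ~junk

# Query a(1)(1): identity 1 seen by camera 1.
mask = valid_gallery_mask(1, 1, gallery_pids=[1, 1, 2, 1], gallery_cams=[1, 3, 1, 2])
print(mask)  # [False  True  True  True] -> only the same-id, same-camera entry is removed
```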

But for schemes B and C, I really don't know how to exclude cameras of the same category in the gallery,

What do you mean by "schemes B and C"?

For example, if a(1)(1), b(1)(2), c(1)(3), d(1)(4) all participate in the feature fusion, are all category-1 samples from cameras 1, 2, 3, and 4 removed from the gallery?

Let me remind you that our work deals with video re-identification. Therefore, we apply feature fusion to a single (video) example at a time, precisely to merge the representations of its frames into a single one. So, we never apply feature fusion to different examples of the gallery set (nor of the query set).
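To make this concrete, here is a small sketch of what feature fusion means in this context (mean pooling is just an illustrative choice): per-frame features of one tracklet are merged into a single descriptor, and nothing is ever fused across different query or gallery tracklets.

```python
import numpy as np

def fuse_tracklet_features(frame_features: np.ndarray) -> np.ndarray:
    """Merge the (T, D) per-frame features of ONE video example into a single
    L2-normalised descriptor. Different tracklets are never fused together."""
    fused = frame_features.mean(axis=0)
    return fused / (np.linalg.norm(fused) + 1e-12)

# Each query/gallery tracklet gets its own descriptor, computed independently.
descriptors = [fuse_tracklet_features(feats) for feats in
               (np.random.rand(8, 2048), np.random.rand(8, 2048))]
```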

angpo closed this as completed Jan 7, 2021
@sunxia233
Author

I'm very sorry: my browser did not display the figures, and I did not look at the title of your paper carefully, which led me to think that you were the authors of "Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification". So please forget my questions; it was a misunderstanding.
