Description
First of all, congratulations on this paper; it was a very interesting read. However, I think MinVIS cannot technically be considered an online method. You process each frame separately, but an online method must not use information from future frames when making decisions for the current frame. In this line
MinVIS/minvis/video_maskformer_model.py, line 308 in 3038871:
`out_logits = sum(out_logits)/len(out_logits)`
you compute the mean score for each query and class across the entire sequence. These scores are later used for the top-k selection of the final outputs. While your frame processing might be online, using information from all frames at once means your decision making is not. Please clarify what I might be misunderstanding, or share your point of view on the matter. Thank you!
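To make the concern concrete, here is a toy sketch in plain Python. It is not MinVIS code: the function names and the score values are made up for illustration, and each query has a single score per frame rather than per-class logits. It contrasts averaging over the whole clip (as the line above appears to do) with a causal running mean that only uses frames seen so far.

```python
def full_sequence_mean(per_frame_scores):
    """Average each query's score over ALL frames (needs the whole clip)."""
    num_frames = len(per_frame_scores)
    num_queries = len(per_frame_scores[0])
    return [
        sum(frame[q] for frame in per_frame_scores) / num_frames
        for q in range(num_queries)
    ]


def causal_running_means(per_frame_scores):
    """For each frame t, average each query's score over frames 0..t only."""
    num_queries = len(per_frame_scores[0])
    totals = [0.0] * num_queries
    means = []
    for t, frame in enumerate(per_frame_scores, start=1):
        totals = [total + score for total, score in zip(totals, frame)]
        means.append([total / t for total in totals])
    return means


def topk_indices(scores, k):
    """Indices of the k highest-scoring queries."""
    return sorted(range(len(scores)), key=lambda q: scores[q], reverse=True)[:k]


# Hypothetical scores: 3 frames (rows) x 3 queries (columns).
scores = [
    [0.9, 0.2, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.9, 0.1],
]

# Selection based on the mean over the entire sequence.
offline_pick = topk_indices(full_sequence_mean(scores), k=1)

# Selection available at each frame when only past frames are used.
online_picks = [topk_indices(m, k=1) for m in causal_running_means(scores)]
```

In this toy example the full-sequence mean selects query 1 for the whole clip, while a causal selection would still choose query 0 at the first two frames. The two only agree once the last frame has been seen, which is the sense in which the final top-k decision depends on future frames.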