Hi InternVideo team,
I would like to use a pretrained stage 1 InternVideo2 model for zero-shot or linear-probing video understanding tasks. Which of the model classes under https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/single_modality/models is the correct one to use?
Thank you!