Thank you for the excellent work on this model. I have a question regarding the training process for handling videos of varying lengths. Could you please clarify the distribution of video durations within your training dataset?
- Since the model needs to generate videos of different lengths, I'm interested in the proportion of training videos at each frame count.
- Additionally, was a uniform learning rate applied across all training samples, or was it adjusted based on the video length?
- Lastly, could you share the learning rate and the number of training steps?
Thanks in advance!