Error when fine-tuning stage 2 of InternVideo2 with num_frames 12 #271

When I try to fine-tune stage 2 of InternVideo2 with num_frames set to 12, I get the error below:
```python
[rank0]:   File "/root/nginx/multi_modality/tasks/shared_utils.py", line 192, in setup_model
[rank0]:     msg = model_without_ddp.load_state_dict(state_dict, strict=False)
[rank0]:   File "/usr/local/Python3.8.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
[rank0]:     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
[rank0]: RuntimeError: Error(s) in loading state_dict for InternVideo2_Stage2:
[rank0]:        size mismatch for vision_encoder.pos_embed: copying a param with shape torch.Size([1, 1025, 1408]) from checkpoint, the shape in current model is torch.Size([1, 3073, 1408]).
```
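My reading of the two shapes, which is an assumption and not something I have confirmed against the repo: 1025 = 1 + 4 × 256 and 3073 = 1 + 12 × 256, so the checkpoint appears to store one class token plus 256 patch tokens for each of 4 frames, while the current model expects the same layout for 12 frames. If that is right, one possible workaround is to resample the temporal axis of the checkpoint tensor before calling `load_state_dict`. The sketch below uses a hypothetical helper, `resize_temporal_pos_embed`, and assumes the tokens are stored frame by frame; it is not taken from the InternVideo2 code.

```python
import torch
import torch.nn.functional as F


def resize_temporal_pos_embed(pos_embed, old_frames, new_frames, num_prefix_tokens=1):
    """Linearly resample the temporal axis of a position embedding.

    Hypothetical helper, not part of InternVideo2.
    Assumes pos_embed has shape (1, num_prefix_tokens + old_frames * P, C)
    and that patch tokens are ordered frame by frame (all P tokens of
    frame 0, then frame 1, and so on).
    """
    prefix = pos_embed[:, :num_prefix_tokens]   # e.g. the class token
    grid = pos_embed[:, num_prefix_tokens:]     # (1, old_frames * P, C)
    _, n_tokens, channels = grid.shape
    if n_tokens % old_frames != 0:
        raise ValueError("token count is not divisible by old_frames")
    patches = n_tokens // old_frames

    # (1, T * P, C) -> (T, P, C) -> (P, C, T): put time last for 1D interpolation
    grid = grid.reshape(old_frames, patches, channels).permute(1, 2, 0)
    grid = F.interpolate(grid, size=new_frames, mode="linear", align_corners=False)
    # (P, C, T') -> (T', P, C) -> (1, T' * P, C)
    grid = grid.permute(2, 0, 1).reshape(1, new_frames * patches, channels)

    return torch.cat([prefix, grid], dim=1)


# Shapes taken from the error message above; values are random placeholders.
ckpt_pos_embed = torch.randn(1, 1025, 1408)
resized = resize_temporal_pos_embed(ckpt_pos_embed, old_frames=4, new_frames=12)
print(resized.shape)
```

If the layout assumption holds, the resized tensor could replace `state_dict["vision_encoder.pos_embed"]` before the `load_state_dict` call in `setup_model`. I am not sure whether the repo already provides its own interpolation for this case.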


How can I solve this? Looking forward to your reply. Thanks.

