Questions about Pipeline for Frame Captioning and Visual Tokenization #10

qzhb · 2023-05-12T04:00:48Z

When I run run_frame_captioning_and_visual_tokenization.sh to extract visual tokens and frame captions for my own dataset, I get the following error in run_video_CapFilt.py:

File "/extract_frame_concepts/models/med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (12) must match the size of tensor b (36) at non-singleton dimension 0

Is this because I did something wrong?
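One possible reading of the numbers (a hypothesis, not confirmed anywhere in this thread): 36 is exactly 3 × 12. That would fit the keys at this matmul (for example, image features used in cross-attention) having been expanded along the batch dimension by a factor of 3 one more time than the queries, as can happen when inputs are repeated once per beam during beam search. The pure-Python sketch below only tracks tensor shapes to illustrate that idea. The helper names, batch_size=4, num_beams=3, the 577-token image length and the 768 hidden size are all hypothetical, not taken from run_video_CapFilt.py or med.py.

```python
# Shape-only sketch of the "expanded twice" hypothesis.
# All values and helper names here are hypothetical illustrations.

def repeat_interleave_batch(shape, repeats):
    """Return the shape that tensor.repeat_interleave(repeats, dim=0) would produce."""
    return (shape[0] * repeats,) + tuple(shape[1:])


def check_matmul_batch(query_shape, key_shape):
    """Mimic the batch-dimension check that fails in the traceback above."""
    qb, kb = query_shape[0], key_shape[0]
    # Leading (batch) dims must match unless one of them is 1 (broadcastable).
    if qb != kb and 1 not in (qb, kb):
        raise RuntimeError(
            f"The size of tensor a ({qb}) must match the size of tensor b ({kb}) "
            "at non-singleton dimension 0"
        )


batch_size, num_beams = 4, 3  # hypothetical: 4 * 3 = 12

# Text queries expanded once per beam: (12, 1, 768)
text_shape = repeat_interleave_batch((batch_size, 1, 768), num_beams)

# Image features expanded once (12, 577, 768) and, by mistake, twice (36, 577, 768)
image_shape = (batch_size, 577, 768)
image_once = repeat_interleave_batch(image_shape, num_beams)
image_twice = repeat_interleave_batch(image_once, num_beams)

check_matmul_batch(text_shape, image_once)  # batch dims agree, no error

message = None
try:
    check_matmul_batch(text_shape, image_twice)
except RuntimeError as err:
    message = str(err)
print(message)
```

Printing query_layer.shape and key_layer.shape just before line 178 of med.py would show whether the real tensors follow this pattern (a first dimension of 36 on the key side and 12 on the query side).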
