Issue while training on Vocaset #40

Closed
ujjawalcse opened this issue Sep 23, 2022 · 5 comments
@ujjawalcse

Hey @EvelynFan,
Thanks for this awesome repo.
I'm just trying out training on the vocaset data, so I followed the data preparation steps and ran training with the following command:

python main.py --dataset vocaset --vertice_dim 15069 --feature_dim 64 --period 30 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --val_subjects "FaceTalk_170811_03275_TA FaceTalk_170908_03277_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA"

I'm getting the following error:

Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model parameters:  92215197
Loading data...
100%|█████████████████████████████████████████| 475/475 [03:05<00:00,  2.55it/s]
314 40 39
  0%|                                                   | 0/314 [00:00<?, ?it/s]vertice shape: torch.Size([1, 117, 15069])
vertice_input shape: torch.Size([1, 1, 64])
vertice_input shape: torch.Size([1, 1, 64])
tgt_mask: tensor([[[0.]],

        [[0.]],

        [[0.]],

        [[0.]]], device='cuda:0')
memory_mask: tensor([[False,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True]], device='cuda:0')
  0%|                                                   | 0/314 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 151, in <module>
    main()
  File "main.py", line 146, in main
    model = trainer(args, dataset["train"], dataset["valid"],model, optimizer, criterion, epoch=args.max_epoch)
  File "main.py", line 34, in trainer
    loss = model(audio, template,  vertice, one_hot, criterion,teacher_forcing=False)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/my_work/object_recon/FaceFormer/faceformer.py", line 135, in forward
    vertice_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 233, in forward
    memory_key_padding_mask=memory_key_padding_mask)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 369, in forward
    key_padding_mask=memory_key_padding_mask)[0]
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/activation.py", line 845, in forward
    attn_mask=attn_mask)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/functional.py", line 3873, in multi_head_attention_forward
    raise RuntimeError('The size of the 2D attn_mask is not correct.')

If anyone has run into this type of error while training, please suggest how to resolve it.

@brbernardo90 commented Sep 24, 2022

Hey @ujjawalcse, take a look at my Colab, the training is running: https://colab.research.google.com/drive/1BjSd3RGkm8LSZDnxjOCVEB5f4g4PIURy?usp=sharing

I copied from yours and commented out a few things.

@ujjawalcse (Author)

Thanks @brbernardo90.
It needs access permission; I've sent you an access request.
Please check it when you can.

@brbernardo90

@ujjawalcse Oops, done! Thank you, your Colab helped me a lot to get started.

@ujjawalcse (Author)

Yeah @brbernardo90, it's running fine in Google Colab, but not on my local PC.
This is my PC's configuration:
Ubuntu 18.04
torch 1.5.1+cu101 (also tried torch 1.9 but got the same error)
transformers 4.6.1
GPU: 8 GB RTX 2070 Super
RAM: 32 GB

@ujjawalcse (Author)

Got the training working properly now.
On my local PC I was using an old clone of the FaceFormer repository from when it was first released, so I was missing the batch_first=True parameter in the decoder_layer at line no. 81 of faceformer.py.
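For anyone hitting the same RuntimeError, this is roughly what the fixed decoder construction looks like. The nhead, dim_feedforward, and num_layers values below are illustrative guesses, so check the current faceformer.py for the exact ones; note that batch_first on the transformer modules requires PyTorch >= 1.9.

import torch.nn as nn

feature_dim = 64  # matches --feature_dim 64 in the training command above

# batch_first=True tells the decoder its inputs are (batch, seq, feature),
# which is the layout FaceFormer passes in. Without it the tensors are read
# as (seq, batch, feature), the memory_mask no longer matches the sequence
# lengths, and multi_head_attention_forward raises
# "The size of the 2D attn_mask is not correct."
# nhead / dim_feedforward / num_layers here are illustrative values.
decoder_layer = nn.TransformerDecoderLayer(
    d_model=feature_dim,
    nhead=4,
    dim_feedforward=2 * feature_dim,
    batch_first=True,  # added to the transformer modules in PyTorch 1.9
)
transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)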

Thanks.
