
beam size problem #3

Closed · maryawwm opened this issue May 26, 2022 · 5 comments

@maryawwm

Hi,

I trained the model with a beam size of 1 and it worked well. Now I want to try other values, but when I set the beam size to 3 in the train script I got this error:

iter 2999 (epoch 0), train_loss = 0.770, time/batch = 0.202
250.90925693511963 ms needed to decode one sentence under batch size 10 and beam size 3
Traceback (most recent call last):
File "train.py", line 325, in
train(opt)
File "train.py", line 273, in train
dp_model, lw_model.crit, loader, eval_kwargs)
File "/mnt/f/satic/eval_utils.py", line 138, in eval_split
sents_list = [utils.decode_sequence(loader.get_vocab(), _['seq'].unsqueeze(0))[0] for _ in model.done_beams[i]]
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 772, in getattr
type(self).name, name))
torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'done_beams'

Can you help me fix this? (You reported results with different beam sizes in your paper, so I assume the code should support them.)

@YuanEZhou (Owner)

Hi @maryawwm, you can try replacing "model.done_beams" with "model.module.done_beams".
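For context, a minimal sketch of why the `.module` indirection is needed, assuming stock PyTorch `nn.DataParallel` behavior (the `Linear` layer below is just a stand-in for the captioning model):

```python
import torch.nn as nn

# nn.DataParallel wraps the original model, so custom Python attributes
# such as `done_beams` live on the wrapped module, not on the wrapper.
inner = nn.Linear(4, 2)        # stand-in for the captioning model
inner.done_beams = [[], []]    # attribute set during beam search

dp = nn.DataParallel(inner)

# dp.done_beams                # raises AttributeError: no attribute 'done_beams'
print(dp.module.done_beams)    # works: `.module` is the wrapped original model
```

A defensive variant that works whether or not the model is wrapped is `getattr(model, 'module', model)`.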

@maryawwm (Author)

Thanks @YuanEZhou! It works.

maryawwm reopened this May 28, 2022
@maryawwm (Author) commented May 28, 2022

Hi again,

After changing the beam size, when I run the second training stage I get a new error:

iter 330103 (epoch 29), avg_reward = 0.000, time/batch = 0.975
Read data: 0.4967498779296875
Save ckpt on exception ...
model saved to save/nsc-sat-2-from-nsc-seqkd/model.pth
Save ckpt done.
Traceback (most recent call last):
File "train.py", line 213, in train
model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/misc/loss_wrapper.py", line 45, in forward
reward = get_self_critical_reward(self.model, fc_feats, att_feats, att_masks, gts, gen_result, self.opt)
File "/mnt/f/satic/misc/rewards.py", line 42, in get_self_critical_reward
greedy_res, _ = model(fc_feats, att_feats, att_masks=att_masks, mode='sample')
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/CaptionModel.py", line 33, in forward
return getattr(self, '_'+mode)(*args, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 396, in _sample
p_fc_feats, p_att_feats, pp_att_feats, p_att_masks = self._prepare_feature(fc_feats, att_feats, att_masks)
File "/mnt/f/satic/models/SAT.py", line 310, in _prepare_feature
memory = self.model.encode(att_feats, att_masks)
File "/mnt/f/satic/models/SAT.py", line 45, in encode
return self.encoder(self.src_embed(src), src_mask)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 86, in forward
x = layer(x, mask)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 128, in forward
return self.sublayer[1](x, self.feed_forward)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 114, in forward
return x + self.dropout(sublayer(self.norm(x)))
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 219, in forward
return self.w_2(self.dropout(F.relu(self.w_1(x))))
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/functional.py", line 1119, in relu
result = torch.relu(input)
RuntimeError: CUDA error: unknown error

Terminating BlobFetcher
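(A generic way to localize asynchronous CUDA failures like this one, offered as an assumed debugging step rather than anything from this repo: kernel launches are asynchronous, so the line in the traceback is often not the real fault site. Forcing synchronous launches makes the error surface at the actual failing op.)

```python
import os

# Must be set before torch initializes CUDA for it to take effect.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch  # import after setting the flag
```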

@YuanEZhou (Owner)

Hi @maryawwm, we usually set the beam size to 1 during training and to 3 during testing. This setting works well, so it is not really necessary to use a beam size of 3 during training. Because of that, I may not have written the code to support a beam size greater than 1 in the second training stage.
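A rough sketch of why beam size never enters this stage: the self-critical (SCST) reward path visible in the traceback (`misc/rewards.py`) scores sampled captions against a greedy baseline, so decoding here is greedy sampling rather than beam search. Names and signatures below are illustrative assumptions, not the repo's exact code:

```python
import torch

def self_critical_reward(model, feats, gts, gen_result, scorer):
    # Greedy baseline captions; no beam search is involved in this stage.
    model.eval()
    with torch.no_grad():
        greedy_res, _ = model(feats, mode='sample')
    model.train()
    sample_score = scorer(gen_result, gts)  # e.g. CIDEr of sampled captions
    greedy_score = scorer(greedy_res, gts)  # e.g. CIDEr of greedy baseline
    return sample_score - greedy_score      # advantage used as the reward
```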

@maryawwm (Author)

That's right. Thank you!
