
beam size problem #3

Closed · maryawwm opened this issue May 26, 2022 · 5 comments

@maryawwm

Hi,

I trained the model with a beam size of 1 and it worked well. Now I want to try other values, but when I set the beam size to 3 in the train script I got this error:

iter 2999 (epoch 0), train_loss = 0.770, time/batch = 0.202
250.90925693511963 ms needed to decode one sentence under batch size 10 and beam size 3
Traceback (most recent call last):
File "train.py", line 325, in
train(opt)
File "train.py", line 273, in train
dp_model, lw_model.crit, loader, eval_kwargs)
File "/mnt/f/satic/eval_utils.py", line 138, in eval_split
sents_list = [utils.decode_sequence(loader.get_vocab(), _['seq'].unsqueeze(0))[0] for _ in model.done_beams[i]]
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 772, in getattr
type(self).name, name))
torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'done_beams'

Can you help me fix this? (You reported results with different beam sizes in your paper, so I assume the code should support them.)

@YuanEZhou (Owner)

Hi @maryawwm, you can try replacing "model.done_beams" with "model.module.done_beams".
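For context, a minimal sketch of why the `.module` indirection is needed, assuming stock PyTorch `nn.DataParallel` behavior (the `Linear` layer below is just a stand-in for the captioning model):

```python
import torch.nn as nn

# nn.DataParallel wraps the original model, so custom Python attributes
# such as `done_beams` live on the wrapped module, not on the wrapper.
inner = nn.Linear(4, 2)        # stand-in for the captioning model
inner.done_beams = [[], []]    # attribute set during beam search

dp = nn.DataParallel(inner)

# dp.done_beams                # raises AttributeError: no attribute 'done_beams'
print(dp.module.done_beams)    # works: `.module` is the wrapped original model
```

A defensive variant that works whether or not the model is wrapped is `getattr(model, 'module', model)`.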

@maryawwm (Author)

Thanks @YuanEZhou! It works.

maryawwm reopened this May 28, 2022
@maryawwm (Author) commented May 28, 2022

Hi again,

After changing the beam size, when I run the second training stage I get a new error:

iter 330103 (epoch 29), avg_reward = 0.000, time/batch = 0.975
Read data: 0.4967498779296875
Save ckpt on exception ...
model saved to save/nsc-sat-2-from-nsc-seqkd/model.pth
Save ckpt done.
Traceback (most recent call last):
File "train.py", line 213, in train
model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/misc/loss_wrapper.py", line 45, in forward
reward = get_self_critical_reward(self.model, fc_feats, att_feats, att_masks, gts, gen_result, self.opt)
File "/mnt/f/satic/misc/rewards.py", line 42, in get_self_critical_reward
greedy_res, _ = model(fc_feats, att_feats, att_masks=att_masks, mode='sample')
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/CaptionModel.py", line 33, in forward
return getattr(self, '_'+mode)(*args, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 396, in _sample
p_fc_feats, p_att_feats, pp_att_feats, p_att_masks = self._prepare_feature(fc_feats, att_feats, att_masks)
File "/mnt/f/satic/models/SAT.py", line 310, in _prepare_feature
memory = self.model.encode(att_feats, att_masks)
File "/mnt/f/satic/models/SAT.py", line 45, in encode
return self.encoder(self.src_embed(src), src_mask)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 86, in forward
x = layer(x, mask)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 128, in forward
return self.sublayer[1](x, self.feed_forward)
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 114, in forward
return x + self.dropout(sublayer(self.norm(x)))
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/f/satic/models/SAT.py", line 219, in forward
return self.w_2(self.dropout(F.relu(self.w_1(x))))
File "/home/maryam/anaconda3/envs/satic/lib/python3.7/site-packages/torch/nn/functional.py", line 1119, in relu
result = torch.relu(input)
RuntimeError: CUDA error: unknown error

Terminating BlobFetcher
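(A generic way to localize asynchronous CUDA failures like this one, offered as an assumed debugging step rather than anything from this repo: kernel launches are asynchronous, so the line in the traceback is often not the real fault site. Forcing synchronous launches makes the error surface at the actual failing op.)

```python
import os

# Must be set before torch initializes CUDA for it to take effect.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch  # import after setting the flag
```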

@YuanEZhou (Owner)

Hi @maryawwm, we usually set the beam size to 1 during training and to 3 during testing. This setting works well, so it is not really necessary to use a beam size of 3 during training. Because of that, I may not have written the code to support a beam size greater than 1 in the second training stage.
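A rough sketch of why beam size never enters this stage: the self-critical (SCST) reward path visible in the traceback (`misc/rewards.py`) scores sampled captions against a greedy baseline, so decoding here is greedy sampling rather than beam search. Names and signatures below are illustrative assumptions, not the repo's exact code:

```python
import torch

def self_critical_reward(model, feats, gts, gen_result, scorer):
    # Greedy baseline captions; no beam search is involved in this stage.
    model.eval()
    with torch.no_grad():
        greedy_res, _ = model(feats, mode='sample')
    model.train()
    sample_score = scorer(gen_result, gts)  # e.g. CIDEr of sampled captions
    greedy_score = scorer(greedy_res, gts)  # e.g. CIDEr of greedy baseline
    return sample_score - greedy_score      # advantage used as the reward
```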

@maryawwm (Author)

That's right. Thank you!
