
the code does not convert IntTensor to LongTensor #2

Open

maryawwm opened this issue Aug 7, 2021 · 4 comments

@maryawwm
maryawwm commented Aug 7, 2021

I'm going to train this code with the same environment requirements:
python 3.6
pytorch 1.6

But when I run the first training stage, I get this error:

DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/mscoco/cocobu_fc data/mscoco/cocobu_att data/mscoco/cocobu_box data/cocotalk_seq-kd-from-nsc-transformer-baseline-b5_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
Read data: 0.046845197677612305
Save ckpt on exception ...
model saved to save/sat-2-from-nsc-seqkd\model.pth
Save ckpt done.
Traceback (most recent call last):
File "train.py", line 213, in train
model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag).to(device).long()
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision\satic\misc\loss_wrapper.py", line 30, in forward
student_output = self.model(fc_feats, att_feats, labels, att_masks)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision\satic\models\CaptionModel.py", line 33, in forward
return getattr(self, '_'+mode)(*args, **kwargs)
File "C:\Users\vision\satic\models\SAT.py", line 347, in _forward
out = self.model(att_feats, seq, att_masks, seq_mask)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision\satic\models\SAT.py", line 42, in forward
tgt, tgt_mask)
File "C:\Users\vision\satic\models\SAT.py", line 48, in decode
return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
input = module(input)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision\satic\models\SAT.py", line 228, in forward
return self.lut(x) * math.sqrt(self.d_model)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\sparse.py", line 126, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\functional.py", line 1814, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

@YuanEZhou
Owner

Hi @maryawwm, my env is the Linux OS; it seems that you are using Windows. Anyway, this RuntimeError is caused by the unexpected dtype of the torch.embedding input. You can try changing this line to out = self.model(att_feats, seq.long(), att_masks, seq_mask).
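For anyone hitting the same dtype mismatch, a minimal standalone sketch of the fix (the layer sizes and tensor names here are illustrative, not from the repo):

```python
import torch
import torch.nn as nn

# In PyTorch 1.6, nn.Embedding required int64 (LongTensor) indices; feeding
# int32 (IntTensor) raised "Expected tensor for argument #1 'indices' to have
# scalar type Long". (Newer releases are more permissive, but .long() is safe.)
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

seq = torch.tensor([[1, 2, 3]], dtype=torch.int32)  # int32, as in the traceback

# Casting with .long() (equivalent to .to(torch.int64)) satisfies embedding.
out = emb(seq.long())
print(out.shape)  # torch.Size([1, 3, 4])
```

The cast is cheap relative to the forward pass, so doing it at the call site, as suggested above, is a reasonable fix.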

@maryawwm
Author

maryawwm commented Aug 8, 2021

Hi, thanks for your response.

Yes, I'm using Windows, and your solution fixed that error, but after that I ran into two new errors:

C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\rnn.py:60: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
File "train.py", line 337, in <module>
train(opt)
File "train.py", line 225, in train
model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\_utils.py", line 395, in reraise
raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.


Original Traceback (most recent call last):
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision\satic\misc\loss_wrapper.py", line 32, in forward
teacher_output = self.teacher(fc_feats, att_feats, labels, att_masks)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\_utils.py", line 395, in reraise
raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "C:\Users\vision.conda\envs\caption\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\vision\satic\models\CaptionModel.py", line 33, in forward
return getattr(self, '_'+mode)(*args, **kwargs)
File "C:\Users\vision\satic\models\ShowTellModel.py", line 52, in _forward
state = self.init_hidden(batch_size)
File "C:\Users\vision\satic\models\ShowTellModel.py", line 43, in init_hidden
weight = next(self.parameters()).data
StopIteration

@YuanEZhou
Owner

Hi @maryawwm, first, check whether you are actually using PyTorch 1.6. Second, if you have multiple GPUs on your machine, you can temporarily set CUDA_VISIBLE_DEVICES=0 and try again.
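As a rough sketch of that second suggestion (the GPU id and launch pattern are assumptions; adapt them to your setup), the visibility restriction has to happen before CUDA is initialized:

```python
import os

# Restrict the process to a single GPU *before* torch initializes CUDA.
# The StopIteration above comes from next(self.parameters()) inside a model
# replicated by nn.DataParallel (replicas in PyTorch >= 1.5 do not expose
# parameters()); with only one visible device, no replication happens.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.__version__)          # verify you are on the expected 1.6.x
print(torch.cuda.device_count())  # at most one device is now visible
```

Setting the variable in the shell before launching train.py has the same effect and avoids editing the script.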

@YuanEZhou
Owner

You can also refer to this related issue: Multi-GPU Error.
