
Creating pull request of hacks I needed to run sample.py on cuda #178

Closed
wants to merge 1 commit into from

Conversation

DavidLKing

When I run even the basic sample.py script under examples on a CUDA-enabled machine, I still get errors that not all the tensors are CUDA tensors (some are still on the CPU). You can ignore my edits in sample.py, but I did annotate each place where I needed to add vector = vector.cuda() to make sample.py run. This occurs even when torch.device('cuda') is called, which should not happen in PyTorch 0.4.0+.

Thank you, and feel free to follow up with any questions.
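[Editor's note: the pattern behind these errors can be sketched in a few lines. This is not the repo's actual code; the `embedding`/`indices` names are illustrative. Creating `torch.device('cuda')` does not move anything by itself; every module and every per-batch tensor must be moved explicitly with `.to(device)` or `.cuda()`.]

```python
import torch
import torch.nn as nn

# torch.device only names a device; it does not relocate tensors.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)
indices = torch.tensor([1, 2, 3])  # created on CPU by default

# If device is CUDA and indices stay on the CPU, the forward pass raises:
#   RuntimeError: Expected object of type torch.cuda.LongTensor
#   but found type torch.LongTensor for argument #3 'index'
indices = indices.to(device)  # the fix: move per-batch inputs too
out = embedding(indices)
print(out.shape)  # torch.Size([3, 4])
```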

@pskrunner14

Hi @DavidLKing, did you try the develop branch? The necessary changes for moving tensors to CUDA have already been made there; the master branch has yet to be updated.

@DavidLKing
Author

Hi @pskrunner14! I did, but it still fails when I try to run sample.py with the same error:

(pytorch-seq2seq) david@Arjuna:~/bin/git/pytorch-seq2seq$ python examples/sample.py --train_path data/toy_reverse/train/data.txt --dev_path data/toy_reverse/dev/data.txt 
/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
  warnings.warn(warning.format(ret))
2018-11-11 15:23:53,307 root         INFO     Namespace(dev_path='data/toy_reverse/dev/data.txt', expt_dir='./experiment', load_checkpoint=None, log_level='info', resume=False, train_path='data/toy_reverse/train/data.txt')
/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
2018-11-11 15:23:56,082 seq2seq.trainer.supervised_trainer INFO     Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
), Scheduler: None
Traceback (most recent call last):
  File "examples/sample.py", line 129, in <module>
    resume=opt.resume)
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/seq2seq-0.1.6-py3.6.egg/seq2seq/trainer/supervised_trainer.py", line 186, in train
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/seq2seq-0.1.6-py3.6.egg/seq2seq/trainer/supervised_trainer.py", line 103, in _train_epoches
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/seq2seq-0.1.6-py3.6.egg/seq2seq/trainer/supervised_trainer.py", line 55, in _train_batch
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/seq2seq-0.1.6-py3.6.egg/seq2seq/models/seq2seq.py", line 48, in forward
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/seq2seq-0.1.6-py3.6.egg/seq2seq/models/EncoderRNN.py", line 68, in forward
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 110, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/david/miniconda2/envs/pytorch-seq2seq/lib/python3.6/site-packages/torch/nn/functional.py", line 1110, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'index'

I'm fine with this not being a full merge. Would an issue make more sense?
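[Editor's note: when chasing a "Expected torch.cuda.LongTensor but found torch.LongTensor" error like the one above, a quick check is to print the device of the model's parameters and of the batch. The `model`/`batch` names here are illustrative, not from the repo.]

```python
import torch
import torch.nn as nn

model = nn.Embedding(10, 4)
batch = torch.tensor([1, 2, 3])

# Both must report the same device before model(batch) will work.
print(next(model.parameters()).device)  # e.g. cpu or cuda:0
print(batch.device)
```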

@pskrunner14

Hey @DavidLKing are you sure you have the latest changes? It seems to be working fine on my side:

psk@ubuntu:~/projects/pytorch-seq2seq$ python examples/sample.py $TRAIN_SRC $TRAIN_TGT $DEV_SRC $DEV_TGT
2018-11-12 02:19:04,671:root:INFO: train_source: data/toy_reverse/train/src.txt
2018-11-12 02:19:04,671:root:INFO: train_target: data/toy_reverse/train/tgt.txt
2018-11-12 02:19:04,671:root:INFO: dev_source: data/toy_reverse/dev/src.txt
2018-11-12 02:19:04,671:root:INFO: dev_target: data/toy_reverse/dev/tgt.txt
2018-11-12 02:19:04,671:root:INFO: experiment_directory: ./experiment
2018-11-12 02:19:06,860:seq2seq.trainer.supervised_trainer:INFO: Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    initial_lr: 0.001
    lr: 0.001
    weight_decay: 0
), Scheduler: <torch.optim.lr_scheduler.StepLR object at 0x7fd4f1c03c88>
Train Perplexity: 8.6506: 100%|██████████████████████████████████████| 157/157 [00:03<00:00, 45.63it/s]
2018-11-12 02:19:10,454:seq2seq.trainer.supervised_trainer:INFO: Finished epoch 1: Train Perplexity: 6.4669, Dev Perplexity: 4.7727, Accuracy: 0.6091
Train Perplexity: 1.1459: 100%|██████████████████████████████████████| 157/157 [00:03<00:00, 47.59it/s]
2018-11-12 02:19:13,900:seq2seq.trainer.supervised_trainer:INFO: Finished epoch 2: Train Perplexity: 2.4010, Dev Perplexity: 4.4814, Accuracy: 0.7656
Train Perplexity: 1.0013: 100%|██████████████████████████████████████| 157/157 [00:03<00:00, 46.45it/s]
2018-11-12 02:19:17,427:seq2seq.trainer.supervised_trainer:INFO: Finished epoch 3: Train Perplexity: 1.2013, Dev Perplexity: 1.0010, Accuracy: 1.0000
['3', '2', '1', '<eos>']

@DavidLKing
Author

Okay, it seems there have been more substantial changes than I realized; I was using an older version of sample.py. The newer version runs fine with the develop branch. I'll need to update my own project, but I think we're good to go. Thanks for the time! When's the next release scheduled?

@pskrunner14

Hi @DavidLKing, yes, there have been quite a few changes in the core API. Hopefully it will be ready by next month; there are still a few bugs in the new features.
