
ConditionalRandomFields doesn't train on the GPU #5313

Closed
alle-pawols opened this issue Jul 16, 2021 · 0 comments · Fixed by #5315
alle-pawols commented Jul 16, 2021

Checklist

  • I have verified that the issue exists against the main branch of AllenNLP.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • [ ] I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

When training ConditionalRandomField as an additional layer in a model implemented in the PyTorch Lightning framework and run on the GPU, I get an error about inconsistent devices.

Python traceback: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 812, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 280, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/marinero/architectures/models/sequence_taggers/lstm_crf_tagger.py", line 103, in training_step
    loss_value = self(input_tokens_ids, unrolled_target_tokens)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/marinero/architectures/models/sequence_taggers/lstm_crf_tagger.py", line 82, in forward
    log_likelihood = self.crf_tagger(logits, targets)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/allennlp/modules/conditional_random_field.py", line 331, in forward
    log_denominator = self._input_likelihood(inputs, mask)
  File "/root/.local/lib/python3.7/site-packages/allennlp/modules/conditional_random_field.py", line 251, in _input_likelihood
    alpha = util.logsumexp(inner, 1) * mask[i].view(batch_size, 1) + alpha * (
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
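
For context, and not necessarily the root cause addressed by #5315: a RuntimeError like this typically shows up when a module keeps a tensor as a plain attribute instead of registering it as a Parameter or buffer, so Lightning's move to the GPU leaves it on the CPU. A minimal sketch of that general pattern, with hypothetical names:

```python
import torch
from torch import nn

class Tagger(nn.Module):
    def __init__(self, num_tags: int):
        super().__init__()
        # Registered parameter: moved by .to("cuda")
        self.transitions = nn.Parameter(torch.randn(num_tags, num_tags))
        # Plain tensor attribute: NOT moved by .to("cuda"), stays on the CPU.
        # self.register_buffer("allowed", ...) would make it follow the module's device.
        self.allowed = torch.ones(num_tags, num_tags)

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Mixing the CUDA-resident transitions with the CPU-resident `allowed`
        # raises "Expected all tensors to be on the same device"
        return logits @ (self.transitions * self.allowed)

model = Tagger(num_tags=5).to("cuda")
model(torch.randn(2, 5, device="cuda"))  # RuntimeError: cuda:0 and cpu
```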

Related issues or possible duplicates

  • None

Environment

OS: Linux

Python version: 3.7

Output of pip freeze:

pytorch-lightning==1.3.3
torch==1.7.1
torchmetrics==0.3.2
allennlp==2.5.0

Steps to reproduce

Puttin

Example source:

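For what it's worth, here is a minimal sketch of the setup described above (hypothetical code with invented names and sizes, not the original example source): an LSTM encoder feeding allennlp's ConditionalRandomField inside a PyTorch Lightning module, trained on a GPU.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from allennlp.modules.conditional_random_field import ConditionalRandomField

NUM_TAGS, VOCAB, SEQ_LEN = 5, 100, 12

class LstmCrfTagger(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.lstm = nn.LSTM(32, 16, batch_first=True)
        self.proj = nn.Linear(16, NUM_TAGS)
        self.crf = ConditionalRandomField(NUM_TAGS)

    def forward(self, tokens, tags):
        logits = self.proj(self.lstm(self.embed(tokens))[0])
        mask = torch.ones_like(tokens, dtype=torch.bool)
        # Negative log-likelihood from the CRF layer as the training loss
        return -self.crf(logits, tags, mask)

    def training_step(self, batch, batch_idx):
        tokens, tags = batch
        return self(tokens, tags)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

tokens = torch.randint(0, VOCAB, (8, SEQ_LEN))
tags = torch.randint(0, NUM_TAGS, (8, SEQ_LEN))
loader = DataLoader(TensorDataset(tokens, tags), batch_size=4)

# With allennlp 2.5.0, training on the GPU reportedly raises the
# device-mismatch RuntimeError shown in the traceback above.
pl.Trainer(gpus=1, max_epochs=1).fit(LstmCrfTagger(), loader)
```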
alle-pawols added a commit to alle-pawols/allennlp that referenced this issue Jul 16, 2021
alle-pawols added a commit to alle-pawols/allennlp that referenced this issue Jul 16, 2021
alle-pawols added a commit to alle-pawols/allennlp that referenced this issue Jul 19, 2021
epwalsh added a commit that referenced this issue Jul 19, 2021