
ConditionalRandomFields doesn't train on the GPU #5313

Closed
alle-pawols opened this issue Jul 16, 2021 · 0 comments · Fixed by #5315
alle-pawols commented Jul 16, 2021

Checklist

  • I have verified that the issue exists against the main branch of AllenNLP.
  • I have read the relevant section in the contribution guide on reporting bugs.
  • I have checked the issues list for similar or identical bug reports.
  • I have checked the pull requests list for existing proposed fixes.
  • I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • I have included in the "Environment" section below the output of pip freeze.
  • [ ] I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

When training ConditionalRandomField as an additional layer in a model implemented in the PyTorch Lightning framework and run on the GPU, I get an error about inconsistent devices.

Python traceback: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Traceback (most recent call last):
  [...]
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 812, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 280, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/root/.local/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 155, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/marinero/architectures/models/sequence_taggers/lstm_crf_tagger.py", line 103, in training_step
    loss_value = self(input_tokens_ids, unrolled_target_tokens)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/marinero/architectures/models/sequence_taggers/lstm_crf_tagger.py", line 82, in forward
    log_likelihood = self.crf_tagger(logits, targets)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/allennlp/modules/conditional_random_field.py", line 331, in forward
    log_denominator = self._input_likelihood(inputs, mask)
  File "/root/.local/lib/python3.7/site-packages/allennlp/modules/conditional_random_field.py", line 251, in _input_likelihood
    alpha = util.logsumexp(inner, 1) * mask[i].view(batch_size, 1) + alpha * (
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
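
For context, and not necessarily the root cause addressed by #5315: a RuntimeError like this typically shows up when a module keeps a tensor as a plain attribute instead of registering it as a Parameter or buffer, so Lightning's move to the GPU leaves it on the CPU. A minimal sketch of that general pattern, with hypothetical names:

```python
import torch
from torch import nn

class Tagger(nn.Module):
    def __init__(self, num_tags: int):
        super().__init__()
        # Registered parameter: moved by .to("cuda")
        self.transitions = nn.Parameter(torch.randn(num_tags, num_tags))
        # Plain tensor attribute: NOT moved by .to("cuda"), stays on the CPU.
        # self.register_buffer("allowed", ...) would make it follow the module's device.
        self.allowed = torch.ones(num_tags, num_tags)

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Mixing the CUDA-resident transitions with the CPU-resident `allowed`
        # raises "Expected all tensors to be on the same device"
        return logits @ (self.transitions * self.allowed)

model = Tagger(num_tags=5).to("cuda")
model(torch.randn(2, 5, device="cuda"))  # RuntimeError: cuda:0 and cpu
```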

Related issues or possible duplicates

  • None

Environment

OS: Linux

Python version: 3.7

Output of pip freeze:

pytorch-lightning==1.3.3
torch==1.7.1
torchmetrics==0.3.2
allennlp==2.5.0

Steps to reproduce

Puttin

Example source:

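For what it's worth, here is a minimal sketch of the setup described above (hypothetical code with invented names and sizes, not the original example source): an LSTM encoder feeding allennlp's ConditionalRandomField inside a PyTorch Lightning module, trained on a GPU.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from allennlp.modules.conditional_random_field import ConditionalRandomField

NUM_TAGS, VOCAB, SEQ_LEN = 5, 100, 12

class LstmCrfTagger(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.lstm = nn.LSTM(32, 16, batch_first=True)
        self.proj = nn.Linear(16, NUM_TAGS)
        self.crf = ConditionalRandomField(NUM_TAGS)

    def forward(self, tokens, tags):
        logits = self.proj(self.lstm(self.embed(tokens))[0])
        mask = torch.ones_like(tokens, dtype=torch.bool)
        # Negative log-likelihood from the CRF layer as the training loss
        return -self.crf(logits, tags, mask)

    def training_step(self, batch, batch_idx):
        tokens, tags = batch
        return self(tokens, tags)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

tokens = torch.randint(0, VOCAB, (8, SEQ_LEN))
tags = torch.randint(0, NUM_TAGS, (8, SEQ_LEN))
loader = DataLoader(TensorDataset(tokens, tags), batch_size=4)

# With allennlp 2.5.0, training on the GPU reportedly raises the
# device-mismatch RuntimeError shown in the traceback above.
pl.Trainer(gpus=1, max_epochs=1).fit(LstmCrfTagger(), loader)
```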
alle-pawols added a commit to alle-pawols/allennlp that referenced this issue Jul 16, 2021
alle-pawols added a commit to alle-pawols/allennlp that referenced this issue Jul 16, 2021
alle-pawols added a commit to alle-pawols/allennlp that referenced this issue Jul 19, 2021
epwalsh added a commit that referenced this issue Jul 19, 2021