run without CUDA #7

varunnrao · 2018-01-04T06:26:54Z

Is there a way to convert preprocess-images.py to a version that doesn't require CUDA?

Cyanogenoid · 2018-01-04T18:43:38Z

Simply removing the two calls to .cuda in preprocess-images.py should work.
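
A minimal sketch of making those calls conditional instead of deleting them, so the same script runs with or without a GPU. `maybe_cuda` is a hypothetical helper, not part of the repository, and the thread does not show which objects the two calls are applied to; the network and each image batch are an assumption here.

```python
def maybe_cuda(obj, use_cuda):
    """Move obj to the GPU only when use_cuda is true; otherwise return it unchanged.

    Works for anything exposing a .cuda() method (modules, tensors, Variables).
    """
    return obj.cuda() if use_cuda else obj


# Intended use inside the script (assumed call sites, shown as comments):
#   use_cuda = torch.cuda.is_available()
#   net = maybe_cuda(net, use_cuda)
#   imgs = maybe_cuda(imgs, use_cuda)
```

On a machine without an NVIDIA driver, `torch.cuda.is_available()` returns False, so the objects stay on the CPU.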

varunnrao · 2018-01-05T03:03:41Z

That does not work. We did try that.
There is an issue with this part of the code which expects CUDA.

torch.utils.data.DataLoader(
    dataset,
    batch_size=config.preprocess_batch_size,
    num_workers=config.data_workers,
    shuffle=False,
    pin_memory=True,
)

We get an error saying no NVIDIA device found.
So we tried setting pin_memory=False. However, this did not work either.

out = net(imgs) failed with a size mismatch error.

We would like to replicate your results. Could you commit working CPU-only versions of preprocess-images.py and train.py?

varunnrao · 2018-01-05T03:19:01Z

With pin_memory=True and the .cuda calls removed, this was the error log:


Traceback (most recent call last):
  File "preprocess-images.py", line 73, in <module>
    main()
  File "preprocess-images.py", line 62, in main
    for ids, imgs in loader:
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 62, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 117, in pin_memory_batch
    return batch.pin_memory()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 82, in pin_memory
    return type(self)().set_(storage.pin_memory()).view_as(self)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/storage.py", line 83, in pin_memory
    allocator = torch.cuda._host_allocator()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 220, in _host_allocator
    _lazy_init()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
    _check_driver()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
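
The traceback shows pin_memory reaching into torch.cuda, which is why the driver check fires even with the .cuda calls removed. A minimal sketch of tying the flag to CUDA availability rather than hard-coding it; `loader_options` is a hypothetical helper, and the try/except around the import is only there so the sketch also runs where PyTorch is not installed.

```python
try:
    import torch
    cuda_available = torch.cuda.is_available()
except ImportError:
    # Assumption for this sketch: no PyTorch means no CUDA.
    cuda_available = False


def loader_options(batch_size, num_workers, use_cuda=cuda_available):
    """Build DataLoader keyword arguments; pin host memory only when a GPU is usable."""
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "shuffle": False,
        "pin_memory": use_cuda,
    }


# Intended use in the script (config names taken from the snippet earlier in the thread):
#   loader = torch.utils.data.DataLoader(
#       dataset,
#       **loader_options(config.preprocess_batch_size, config.data_workers)
#   )
```

Pinned memory only speeds up host-to-GPU copies, so turning it off on a CPU-only machine costs nothing.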

varunnrao · 2018-01-05T03:19:55Z

With pin_memory=False, this was the error log:

Traceback (most recent call last):
  File "preprocess-images.py", line 73, in <module>
    main()
  File "preprocess-images.py", line 64, in main
    out = net(imgs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "preprocess-images.py", line 25, in forward
    self.model(x)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/resnet.py", line 151, in forward
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 53, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 553, in linear
    return torch.addmm(bias, input, weight.t())
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 924, in addmm
    return cls._blas(Addmm, args, False)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 920, in _blas
    return cls.apply(*(tensors + (alpha, beta, inplace)))
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/blas.py", line 26, in forward
    matrix1, matrix2, out=output)
RuntimeError: size mismatch, m1: [64 x 8192], m2: [2048 x 1000] at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/TH/generic/THTensorMath.c:1293

varunnrao · 2018-01-05T03:21:54Z

Please note that we imported the following model for ResNet, since the statement on line 12 of your script did not work:
import torchvision.models.resnet as caffe_resnet

Cyanogenoid · 2018-01-05T13:19:07Z

The torchvision net is not quite a drop-in replacement. Fix the git submodule for the Caffe ResNet and try the pin_memory=False version. Either way, I don't recommend running this on a CPU-only machine; it will take ages to train.
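
To illustrate why the torchvision net fails here: the [64 x 8192] versus [2048 x 1000] shapes in the error are consistent with a fixed 7x7 average pool being applied to a feature map larger than 7x7. The sketch below is arithmetic only and rests on assumptions not stated in the thread: 448x448 input images, a ResNet with 2048 output channels and a total stride of 32, and a 7x7 average pool with stride 7 before the final linear layer.

```python
def pooled_side(side, kernel, stride):
    """Output side length of an unpadded pooling layer (floor mode)."""
    return (side - kernel) // stride + 1


def flattened_features(image_size, channels=2048, total_stride=32, pool_kernel=7):
    """Number of features per image that reach the final linear layer."""
    side = image_size // total_stride                      # spatial size after the last conv block
    pooled = pooled_side(side, pool_kernel, pool_kernel)   # fixed-size average pool
    return channels * pooled * pooled


print(flattened_features(224))  # 7x7 map pools to 1x1, giving 2048 features
print(flattened_features(448))  # 14x14 map pools to 2x2, giving 8192 features
```

Under these assumptions a 224x224 input matches the 2048 inputs the linear layer expects, while a 448x448 input produces 8192, which is the m1 width in the traceback.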

varunnrao · 2018-01-05T14:47:19Z

Okay. Thanks


varunnrao closed this as completed Jan 7, 2018
