run without CUDA #7

Closed
varunnrao opened this issue Jan 4, 2018 · 7 comments

Comments

@varunnrao

Is there a way to convert preprocess-images.py to a version that doesn't require CUDA?

@Cyanogenoid
Owner

Simply removing the two calls to .cuda in preprocess-images.py should work.
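
A minimal sketch of an alternative to deleting the calls outright: a hypothetical helper, `maybe_cuda` (not part of the repository), that moves an object to the GPU only when asked. It assumes nothing beyond the object exposing a `.cuda()` method, as PyTorch modules and tensors do; the `use_cuda` flag would typically come from `torch.cuda.is_available()`.

```python
def maybe_cuda(obj, use_cuda):
    """Return obj moved to the GPU if use_cuda is true, else obj unchanged.

    Hypothetical helper, not part of preprocess-images.py. Works with any
    object that has a .cuda() method returning the moved object.
    """
    return obj.cuda() if use_cuda else obj
```

Wrapping each of the two `.cuda` call sites this way (for example the network and each image batch, which is an assumption about where those calls sit) would keep the script usable on both CPU-only and GPU machines.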

@varunnrao
Author

That does not work. We did try that.
There is an issue with this part of the code which expects CUDA.

torch.utils.data.DataLoader(
    dataset,
    batch_size=config.preprocess_batch_size,
    num_workers=config.data_workers,
    shuffle=False,
    pin_memory=True,
)

We get an error saying that no NVIDIA driver was found.
So we tried setting pin_memory=False. However, this did not work either.

out = net(imgs) failed since there is a mismatch in image sizes.

We would like to replicate your results. Would it be possible for you to commit new working versions of preprocess-images.py and train.py?

@varunnrao
Author

With pin_memory=True and after removing .cuda, this was the error log:


Traceback (most recent call last):
  File "preprocess-images.py", line 73, in <module>
    main()
  File "preprocess-images.py", line 62, in main
    for ids, imgs in loader:
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 62, in _pin_memory_loop
    batch = pin_memory_batch(batch)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in pin_memory_batch
    return [pin_memory_batch(sample) for sample in batch]
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 123, in <listcomp>
    return [pin_memory_batch(sample) for sample in batch]
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 117, in pin_memory_batch
    return batch.pin_memory()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 82, in pin_memory
    return type(self)().set_(storage.pin_memory()).view_as(self)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/storage.py", line 83, in pin_memory
    allocator = torch.cuda._host_allocator()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 220, in _host_allocator
    _lazy_init()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
    _check_driver()
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
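
The traceback above shows that `pin_memory=True` ends up in `torch.cuda._host_allocator()`, which needs an NVIDIA driver. A minimal sketch of one way around this is to derive the flag from GPU availability instead of hard-coding it. `loader_kwargs` is a hypothetical helper, not something in the repository; in the script its last argument would be `torch.cuda.is_available()`.

```python
def loader_kwargs(batch_size, num_workers, cuda_available):
    """Build keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper: pinned host memory is requested only when a GPU
    is usable, because pinning goes through the CUDA host allocator.
    """
    return {
        'batch_size': batch_size,
        'num_workers': num_workers,
        'shuffle': False,
        'pin_memory': bool(cuda_available),
    }


# Intended use inside the script (assumes torch is imported there):
#   loader = torch.utils.data.DataLoader(
#       dataset,
#       **loader_kwargs(config.preprocess_batch_size,
#                       config.data_workers,
#                       torch.cuda.is_available()),
#   )
```

This only addresses the driver assertion; it does not affect the separate size mismatch raised by `net(imgs)`.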

@varunnrao
Author

With pin_memory=False, this was the error log:

Traceback (most recent call last):
  File "preprocess-images.py", line 73, in <module>
    main()
  File "preprocess-images.py", line 64, in main
    out = net(imgs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "preprocess-images.py", line 25, in forward
    self.model(x)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/resnet.py", line 151, in forward
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 53, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 553, in linear
    return torch.addmm(bias, input, weight.t())
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 924, in addmm
    return cls._blas(Addmm, args, False)
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 920, in _blas
    return cls.apply(*(tensors + (alpha, beta, inplace)))
  File "/home/vqaproject2018/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/blas.py", line 26, in forward
    matrix1, matrix2, out=output)
RuntimeError: size mismatch, m1: [64 x 8192], m2: [2048 x 1000] at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/TH/generic/THTensorMath.c:1293

@varunnrao
Author

Please note that we imported the following ResNet module instead, since your import on line 12 did not work:
import torchvision.models.resnet as caffe_resnet

@Cyanogenoid
Owner

The torchvision net is not quite a drop-in replacement. Get the git submodule for the caffe resnet fixed and try the pin_memory=False version. Either way, I don't recommend running this CPU-only; it will take ages to train.
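
The shapes in the `size mismatch` error are consistent with the torchvision ResNet flattening a 2 x 2 pooled feature map into its final linear layer. A rough worked example follows, assuming 448 x 448 input images, a total network stride of 32, and a 7 x 7 average pool with stride 7. None of these values is stated in the thread; they are chosen because they reproduce the reported numbers.

```python
def pooled_size(size, kernel, stride):
    """Output length of a pooling window along one dimension (no padding)."""
    return (size - kernel) // stride + 1


image_size = 448     # assumed input resolution
total_stride = 32    # assumed downsampling factor of the ResNet trunk
channels = 2048      # number of rows of m2 in the error message

feature_map = image_size // total_stride                 # 14
pooled = pooled_size(feature_map, kernel=7, stride=7)    # 2
flattened = channels * pooled * pooled                   # 8192

print(flattened)  # 8192, the second dimension of m1 in the error
```

Under the same assumptions, 224 x 224 inputs give a 1 x 1 pooled map and 2048 features, which is what the linear layer expects. That would be one reason the torchvision classifier cannot be used unchanged on larger images.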

@varunnrao
Author

varunnrao commented Jan 5, 2018 via email
