Can't get it to run with multi-GPU #3

Tylersuard · 2022-11-06T04:29:33Z

Here is my code:

import os
import time
import datetime

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

import argparse
from tensorboardX import SummaryWriter
from gconv_standalone import GConv  # assumed import: module name taken from the traceback below

gpu_devices = '0,1,2,3'
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_devices


device = 'cuda' if torch.cuda.is_available() else 'cpu'

net = GConv(
    d_model=256,
    d_state=64,
    l_max=1_000_000,
    bidirectional=True,
    kernel_dim=32,
    n_scales=None,
    decay_min=2,
    decay_max=2,
)

net = nn.DataParallel(net)
net = net.to(device)
num_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print('The number of parameters of model is', num_params)
                
x = torch.randn(1, 256, 1_000_000)
x = x.to(device)

y, k = net(x, return_kernel=True)

And here is the error I am getting:

IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/SageMaker/SGConv/gconv_standalone.py", line 416, in forward
    self.kernel_list[i],
  File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 462, in __getitem__
    idx = self._get_abs_string_index(idx)
  File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 445, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 0 is out of range

ctlllll · 2022-11-06T05:27:29Z

Could you try updating PyTorch? This may be related to the known incompatibility between nn.DataParallel and nn.ParameterList (e.g., pytorch/pytorch#36035). Also, please use x = torch.randn(4, 256, 1_000_000) or a larger batch: with four GPUs and a single sample, some GPUs will not receive any input. In general, we recommend nn.parallel.DistributedDataParallel over nn.DataParallel, as the PyTorch documentation suggests (https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead).
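
To illustrate the batch-size point, here is a minimal pure-Python sketch of how a batch is split along dimension 0 across devices. It assumes the split follows the documented behavior of torch.chunk (equal chunks of size ceil(batch / devices), possibly fewer chunks than devices); chunk_sizes is a hypothetical helper written for this example, not part of PyTorch or this repository.

```python
import math
from typing import List


def chunk_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Approximate the per-device sample counts for a batch split along dim 0.

    Assumption: mirrors torch.chunk semantics, where every chunk has
    ceil(batch_size / num_devices) samples except possibly the last,
    and fewer chunks than devices may be produced.
    """
    if batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch_size and num_devices must be positive")
    per_chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    for batch_size in (1, 4, 8):
        sizes = chunk_sizes(batch_size, num_devices=4)
        idle = 4 - len(sizes)
        print(f"batch={batch_size}: chunks={sizes}, devices without input={idle}")
```

Under this assumption, a batch of 1 on four GPUs yields a single chunk, so three devices get nothing, while a batch of 4 gives each device one sample. This only covers the input split; it does not address the nn.ParameterList issue linked above.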

ghost mentioned this issue Feb 6, 2023

Complex Tensors #8

Closed
