TSAGCN test failed on GPU #186

Closed

BlueSkyLT opened this issue Aug 18, 2022 · 2 comments

@BlueSkyLT
When running the test suite, all tests pass except for the following one:

test/attention_test.py:715:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py:1102: in _call_impl
    return forward_call(*input, **kwargs)
torch_geometric_temporal/nn/attention/tsagcn.py:339: in forward
    y = self.relu(self.tcn1(self.gcn1(x)) + self.residual(x))
../../../anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py:1102: in _call_impl
    return forward_call(*input, **kwargs)
torch_geometric_temporal/nn/attention/tsagcn.py:262: in forward
    y = self._non_adaptive_forward(x, y)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = UnitGCN(
  (conv_d): ModuleList(
    (0): Conv2d(100, 10, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(100, 10, ...ack_running_stats=True)
  (soft): Softmax(dim=-2)
  (tan): Tanh()
  (sigmoid): Sigmoid()
  (relu): ReLU(inplace=True)
)
x = tensor([[[[ 0.3476,  0.1290,  0.4463,  ...,  0.4613,  0.2014,  0.0761],
              [ 1.3476,  1.1290,  1.4463,  ...,  1...,  2.4257,  2.1628],
              [ 3.6332,  4.1489,  4.0730,  ...,  4.4859,  3.4257,  3.1628]]]],
           device='cuda:0')
y = None

    def _non_adaptive_forward(self, x, y):
        N, C, T, V = x.size()
        for i in range(self.num_subset):
            A1 = self.A[i]
            A2 = x.view(N, C * T, V)
>           z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V))
E           RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)

torch_geometric_temporal/nn/attention/tsagcn.py:251: RuntimeError

Similar to #46, if I force everything onto the CPU, the test passes:

    # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    device = torch.device("cpu")
@benedekrozemberczki (Owner)

I suppose there is a tensor that is not on GPU, can you guess which one?

@cshjin commented Oct 13, 2023

The issue still exists.

    def _non_adaptive_forward(self, x, y):
        N, C, T, V = x.size()
        for i in range(self.num_subset):
            A1 = self.A[i]
            A2 = x.view(N, C * T, V)
>           z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V))
# A1 is on CPU, A2 is on GPU (same as `x`)

A temporary fix could be to move A1 onto the same device as x inside the method:

    def _non_adaptive_forward(self, x, y):
        _device = x.device
        N, C, T, V = x.size()
        for i in range(self.num_subset):
            # Move the adjacency matrix to the input's device before the matmul
            A1 = self.A[i].to(_device)
            A2 = x.view(N, C * T, V)
            z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V))
            # Accumulate the per-subset outputs and return, as in the original method
            y = z + y if y is not None else z
        return y
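
A cleaner alternative (a sketch, assuming A is currently stored as a plain tensor attribute of UnitGCN rather than a registered buffer) would be to register the adjacency stack with register_buffer in __init__, so that model.to(device) moves it together with the parameters and no per-forward .to() call is needed:

    import torch
    import torch.nn as nn

    class UnitGCNSketch(nn.Module):
        # Minimal sketch, not the library class: shows why register_buffer
        # avoids the device mismatch.
        def __init__(self, A: torch.Tensor):
            super().__init__()
            # Buffers are moved by .to()/.cuda() and saved in the state_dict,
            # but are not trainable parameters.
            self.register_buffer("A", A)

    m = UnitGCNSketch(torch.eye(3))
    m = m.to("cuda" if torch.cuda.is_available() else "cpu")
    print(m.A.device)  # follows the module's device, so the matmul operands agree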
