Do I have to install the checkpoint model first before execution? Do you have a link? #32

GabbySuwichaya · 2021-03-06T15:54:47Z

I get an error when running the demo.

I used the following command:

python3 SuperGAT/main.py --dataset-class Planetoid --dataset-name Cora --custom-key EV13NSO8-ES --num-gpus-total 2

The errors are:

Cannot load model, [Errno 2] No such file or directory: '../checkpoints/GAT-Cora-EV13NSO8-ES/d3d4807'
Then I get: Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

Questions:

Do I have to install the checkpoint model first before execution? Do you have a link?
Also, because my NVIDIA driver is newer than what the specified PyTorch version was built for, could you provide general commands for installing the other dependencies besides PyTorch Geometric?

I think I have installed all of them, but I may be missing some packages, which could be causing this error.

Full error output:

Args PPRINT: GAT-Cora-EV13NSO8-ES
        - att_lambda: 11.346574532931719
        - attention_type: prob_mask_only
        - batch_size: 128
        - checkpoint_dir: ../checkpoints
        - custom_key: EV13NSO8-ES
        - data_num_splits: 1
        - data_root: ~/graph-data
        - data_sampler: None
        - data_sampling_num_hops: None
        - data_sampling_size: None
        - dataset_class: Planetoid
        - dataset_name: Cora
        - dropout: 0.6
        - early_stop_patience: 485
        - early_stop_queue_length: 484
        - early_stop_threshold_loss: 0.009317052513589488
        - early_stop_threshold_perf: 0.0011587124922279313
        - edge_sampling_ratio: 0.8
        - epochs: 490
        - gpu_deny_list: [1, 2, 3]
        - heads: 8
        - is_cgat_full: False
        - is_cgat_ssnc: False
        - is_link_gnn: False
        - is_super_gat: True
        - l1_lambda: 0.0
        - l2_lambda: 0.008228864972965771
        - link_lambda: 0.0
        - loss: None
        - lr: 0.005
        - m: 
        - model_name: GAT
        - neg_sample_ratio: 0.5
        - num_gpus_to_use: 1
        - num_gpus_total: 2
        - num_hidden_features: 8
        - num_layers: 2
        - out_heads: 8
        - perf_task_for_val: Node
        - perf_type: accuracy
        - pool_name: None
        - pretraining_noise_ratio: 0.0
        - save_model: False
        - save_plot: False
        - scaling_factor: None
        - seed: 42
        - start_epoch: 0
        - super_gat_criterion: None
        - task_type: Node_Transductive
        - to_undirected: False
        - to_undirected_at_neg: False
        - total_pretraining_epoch: 0
        - use_bn: False
        - use_early_stop: True
        - use_pretraining: False
        - val_interval: 1
        - verbose: 2
Use GPU the ID of which is [0]
## TRIAL 0 ##
Now loading dataset: Planetoid / Cora
SuperGATNet(
  (conv1): SuperGAT(1433, 8, heads=8, concat=True, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
  (conv2): SuperGAT(64, 7, heads=8, concat=False, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
)
Cannot load model, [Errno 2] No such file or directory: '../checkpoints/GAT-Cora-EV13NSO8-ES/d3d4807'
  0%|                                                                                                        | 0/490 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "SuperGAT/main.py", line 471, in <module>
    many_seeds_result = run_with_many_seeds(main_args, num_total_runs, gpu_id=alloc_gpu[0])
  File "SuperGAT/main.py", line 403, in run_with_many_seeds
    ret = run(_args, gpu_id=gpu_id, **kwargs)
  File "SuperGAT/main.py", line 307, in run
    train_loss = train_model(running_device, net, train_d, loss_func, adam_optim, epoch=epoch, _args=args)
  File "SuperGAT/main.py", line 97, in train_model
    attention_edge_index=getattr(batch, "train_edge_index", None))
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/model.py", line 89, in forward
    x = self.conv1(x, edge_index, batch=batch, **kwargs)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 135, in forward
    propagated = self.propagate(edge_index, size=size, x=x)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 237, in propagate
    out = self.message(**msg_kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 206, in message
    alpha = self._get_attention(edge_index_i, x_i, x_j, size_i)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 276, in _get_attention
    alpha = softmax(alpha, edge_index_i, size_i)
RuntimeError: softmax() Expected a value of type 'Optional[Tensor]' for argument 'ptr' but instead found type 'int'.
Position: 2
Value: 2708
Declaration: softmax(Tensor src, Tensor? index, Tensor? ptr=None, int? num_nodes=None) -> (Tensor)
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)
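The last lines of this traceback point at a positional-argument problem: in the installed torch-geometric, the third parameter of softmax is ptr, not the node count, so the int size_i (2708, the number of Cora nodes) lands in ptr. The sketch below is a hypothetical pure-Python stand-in that copies only the parameter order from the Declaration line above; its body is a plain grouped softmax written for illustration, not torch-geometric's implementation, and it supports only the index path.

```python
import math
from collections import defaultdict
from typing import List, Optional, Sequence


def softmax(src: Sequence[float],
            index: Optional[Sequence[int]] = None,
            ptr: Optional[Sequence[int]] = None,
            num_nodes: Optional[int] = None) -> List[float]:
    """Hypothetical stand-in: softmax of `src` computed separately per group in `index`.

    Only the parameter order mirrors the Declaration in the traceback.
    """
    if ptr is not None:
        # This sketch implements only the `index` path. In the traceback, the
        # int `size_i` ended up here because it was passed positionally.
        raise TypeError(f"'ptr' got {type(ptr).__name__}; only the index path is supported here")
    if index is None:
        index = [0] * len(src)  # treat everything as one group
    if num_nodes is not None and any(i >= num_nodes for i in index):
        raise ValueError("index contains a node id >= num_nodes")
    # Subtract each group's max before exponentiating, for numerical stability.
    group_max = defaultdict(lambda: -math.inf)
    for value, i in zip(src, index):
        group_max[i] = max(group_max[i], value)
    exps = [math.exp(value - group_max[i]) for value, i in zip(src, index)]
    group_sum = defaultdict(float)
    for e, i in zip(exps, index):
        group_sum[i] += e
    return [e / group_sum[i] for e, i in zip(exps, index)]


alpha = [1.0, 2.0, 3.0, 0.5]   # one attention score per edge
edge_index_i = [0, 0, 1, 1]    # target node of each edge
size_i = 2                     # number of target nodes

try:
    softmax(alpha, edge_index_i, size_i)  # positional: size_i binds to ptr
except TypeError as err:
    print("positional call rejected:", err)

scores = softmax(alpha, edge_index_i, num_nodes=size_i)  # keyword: binds to num_nodes
print(scores)  # scores for edges into the same node sum to 1
```

Passing the node count by keyword is what makes the call independent of where ptr sits in the parameter list.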


GabbySuwichaya · 2021-03-06T17:39:01Z

Also, even after I fixed the softmax arguments in your script

from

softmax(alpha, edge_index_i,  size_i)

to

softmax(alpha, edge_index_i, num_nodes=size_i)

I still get the following error. Can you please help?

(SuperGAT) gabby-suwichaya@gabby-suwichaya:/mnt/HDD4TB3/SuperGAT$ ./run_main.sh 
Args PPRINT: GAT-Cora-EV13NSO8-ES
        - att_lambda: 11.346574532931719
        - attention_type: prob_mask_only
        - batch_size: 128
        - checkpoint_dir: ../checkpoints
        - custom_key: EV13NSO8-ES
        - data_num_splits: 1
        - data_root: ~/graph-data
        - data_sampler: None
        - data_sampling_num_hops: None
        - data_sampling_size: None
        - dataset_class: Planetoid
        - dataset_name: Cora
        - dropout: 0.6
        - early_stop_patience: 485
        - early_stop_queue_length: 484
        - early_stop_threshold_loss: 0.009317052513589488
        - early_stop_threshold_perf: 0.0011587124922279313
        - edge_sampling_ratio: 0.8
        - epochs: 490
        - gpu_deny_list: [1, 2, 3]
        - heads: 8
        - is_cgat_full: False
        - is_cgat_ssnc: False
        - is_link_gnn: False
        - is_super_gat: True
        - l1_lambda: 0.0
        - l2_lambda: 0.008228864972965771
        - link_lambda: 0.0
        - loss: None
        - lr: 0.005
        - m: 
        - model_name: GAT
        - neg_sample_ratio: 0.5
        - num_gpus_to_use: 1
        - num_gpus_total: 2
        - num_hidden_features: 8
        - num_layers: 2
        - out_heads: 8
        - perf_task_for_val: Node
        - perf_type: accuracy
        - pool_name: None
        - pretraining_noise_ratio: 0.0
        - save_model: False
        - save_plot: False
        - scaling_factor: None
        - seed: 42
        - start_epoch: 0
        - super_gat_criterion: None
        - task_type: Node_Transductive
        - to_undirected: False
        - to_undirected_at_neg: False
        - total_pretraining_epoch: 0
        - use_bn: False
        - use_early_stop: True
        - use_pretraining: False
        - val_interval: 1
        - verbose: 2
Use GPU the ID of which is [0]
## TRIAL 0 ##
Now loading dataset: Planetoid / Cora
SuperGATNet(
  (conv1): SuperGAT(1433, 8, heads=8, concat=True, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
  (conv2): SuperGAT(64, 7, heads=8, concat=False, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
)
Cannot load model, [Errno 2] No such file or directory: '../checkpoints/GAT-Cora-EV13NSO8-ES/d3d4807'
  0%|                                                                                                                | 0/490 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "SuperGAT/main.py", line 473, in <module>
    many_seeds_result = run_with_many_seeds(main_args, num_total_runs, gpu_id=alloc_gpu[0])
  File "SuperGAT/main.py", line 405, in run_with_many_seeds
    ret = run(_args, gpu_id=gpu_id, **kwargs)
  File "SuperGAT/main.py", line 309, in run
    train_loss = train_model(running_device, net, train_d, loss_func, adam_optim, epoch=epoch, _args=args)
  File "SuperGAT/main.py", line 98, in train_model
    attention_edge_index=getattr(batch, "train_edge_index", None))
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/model.py", line 89, in forward
    x = self.conv1(x, edge_index, batch=batch, **kwargs)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 135, in forward
    propagated = self.propagate(edge_index, size=size, x=x)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 253, in propagate
    out = self.aggregate(out, **aggr_kwargs)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 288, in aggregate
    reduce=self.aggr)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_scatter/scatter.py", line 153, in scatter
    """
    if reduce == 'sum' or reduce == 'add': 
        return scatter_sum(src, index, dim, out, dim_size)
               ~~~~~~~~~~~ <--- HERE
    elif reduce == 'mean':
        return scatter_mean(src, index, dim, out, dim_size)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_scatter/scatter.py", line 13, in scatter_sum
                out: Optional[torch.Tensor] = None,
                dim_size: Optional[int] = None) -> torch.Tensor: 
    index = broadcast(index, src, dim)
            ~~~~~~~~~ <--- HERE
    if out is None:
        size = list(src.size())
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_scatter/utils.py", line 13, in broadcast
    for _ in range(src.dim(), other.dim()):
        src = src.unsqueeze(-1)
    src = src.expand_as(other)
          ~~~~~~~~~~~~~ <--- HERE
    return src
RuntimeError: The expanded size of the tensor (8) must match the existing size (13264) at non-singleton dimension 1.  Target sizes: [13264, 8, 8].  Tensor sizes: [1, 13264, 1]
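The final RuntimeError can be reproduced on shapes alone. The sketch below re-creates the index broadcasting step in plain Python, using two hypothetical helpers, unsqueezed_index_shape and can_expand. It assumes two things not shown in the traceback: that a negative aggregation dim is first normalized against the message tensor's rank and singleton dims are prepended before the trailing-unsqueeze loop shown above (my reading of torch_scatter's broadcast helper), and that the dim comes from MessagePassing's node_dim, which I assume defaults to -2 in the installed torch-geometric. Under those assumptions, dim=-2 gives exactly the [1, 13264, 1] index shape from the error, while dim=0 gives a shape that expands cleanly to the [13264, 8, 8] messages.

```python
from typing import List, Sequence, Tuple


def unsqueezed_index_shape(index_len: int, other_shape: Sequence[int], dim: int) -> Tuple[int, ...]:
    """Shape of a 1-D index after the (assumed) broadcast unsqueezes, before expand_as."""
    if dim < 0:
        dim = len(other_shape) + dim       # -2 on a 3-D tensor becomes 1
    shape: List[int] = [index_len]
    for _ in range(dim):                    # prepend singleton dims up to `dim`
        shape.insert(0, 1)
    while len(shape) < len(other_shape):    # append trailing singleton dims
        shape.append(1)
    return tuple(shape)


def can_expand(shape: Sequence[int], target: Sequence[int]) -> bool:
    """expand_as only succeeds if every existing size equals the target or is 1."""
    return len(shape) == len(target) and all(s == t or s == 1 for s, t in zip(shape, target))


messages = (13264, 8, 8)   # [num_edges, heads, out_channels], from the error message
for dim in (-2, 0):
    shape = unsqueezed_index_shape(13264, messages, dim)
    print(f"dim={dim}: index shape {shape}, expandable: {can_expand(shape, messages)}")
```

With dim=-2 this prints (1, 13264, 1) and False; with dim=0 it prints (13264, 1, 1) and True, which is consistent with node_dim=0 being the relevant setting for [E, H, C] messages.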

dongkwan-kim · 2021-03-08T03:37:44Z

Hi, Gabby.

First, here are answers to the questions in your first comment.

Do I have to install the checkpoint model first before execution? Do you have a link?

No, you do not have to. In fact, I implemented the save/load feature but have never used it.

Also, because my based Nvidia driver is newer than the given PyTorch version, could you provide the general commands to install other dependencies aside from the Pytorch geometry?

I used the Docker image nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 and ran bash install.sh.
Maybe https://github.com/dongkwan-kim/SuperGAT#installation can help you.

Second, here are some solutions that came to mind, though I have not tested them.

Many of these errors probably come from a torch-geometric version mismatch. This repository uses torch-geometric==1.4.3, and torch-geometric has changed a lot since then, including the softmax API (which you have already fixed). Either use torch-geometric==1.4.3, or use the example in the torch-geometric repository: https://github.com/rusty1s/pytorch_geometric/blob/master/examples/super_gat.py
We do not support TorchScript for now. Please turn it off when using our model.
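For the first suggestion, a quick way to confirm which torch-geometric is actually installed before running main.py is to read the package metadata. This is a minimal sketch, not part of the repository; the helper names installed_version and matches_pin are hypothetical. It uses importlib.metadata, which needs Python 3.8+ (the environment in the traceback is Python 3.6, where the importlib_metadata backport from PyPI would be needed instead), and it assumes the distribution is registered under the name torch-geometric.

```python
from importlib import metadata
from typing import Optional

PINNED = "1.4.3"  # torch-geometric version this repository targets


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(found: Optional[str], pinned: str = PINNED) -> bool:
    # Exact string comparison: a local or dev suffix also counts as a mismatch.
    return found == pinned


if __name__ == "__main__":
    found = installed_version("torch-geometric")
    if not matches_pin(found):
        print(f"torch-geometric {found or 'not installed'} != {PINNED}; "
              "expect API differences such as the softmax signature change")
```

An exact match is deliberately strict here, since the thread shows that even the softmax signature differs between releases.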

Thank you!

GabbySuwichaya · 2021-03-09T06:19:15Z

@dongkwan-kim, thank you for the answer.

Your second answer pretty much solved it.

Here is a record for anyone interested in using CUDA 11.1 + PyG + PyTorch 1.8 with your work.
So far, I have only tested it on the example.

By the way, I didn't disable TorchScript. For some reason, after I made the following changes, everything runs fine.

These are the modifications that make it work:

In SuperGAT/layer.py,

change super(SuperGAT, self).__init__(aggr='add', **kwargs) to:

Line 35: super(SuperGAT, self).__init__(aggr='add', node_dim=0, **kwargs)

and at Line 276, change softmax(alpha, edge_index_i, size_i) to:

Line 276: softmax(alpha, edge_index_i, num_nodes=size_i)

dongkwan-kim · 2021-03-09T07:29:55Z

Thank you. I will update the parts you mentioned.

GabbySuwichaya · 2021-03-09T16:36:47Z

By the way, I just realized that I forgot two changes that come before Line 276, softmax(alpha, edge_index_i, num_nodes=size_i):

add N, H, C = x.size(0), self.heads, self.out_channels

and change x = torch.matmul(x, self.weight) to

x = torch.matmul(x, self.weight).view(-1, H, C)
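The .view(-1, H, C) change gives the projected features an explicit head axis. Assuming self.weight maps to H*C outputs (as the conv2 input size of 64 = 8 heads * 8 channels in the printed model suggests), x has shape [N, H*C] after the matmul, and view reinterprets each row as H consecutive chunks of C values. The function below, view_heads, is a hypothetical pure-Python illustration of that reshape on nested lists; it is not code from the repository.

```python
from typing import List

Matrix = List[List[float]]


def view_heads(x: Matrix, heads: int, out_channels: int) -> List[Matrix]:
    """Analogue of x.view(-1, H, C) for a row-major [N, H*C] nested list."""
    width = heads * out_channels
    result = []
    for row in x:
        # Stricter than torch's view: require each row to hold exactly H*C values.
        if len(row) != width:
            raise ValueError(f"row has {len(row)} values, expected H*C = {width}")
        # Element (n, h, c) comes from flat position h*C + c of row n.
        result.append([row[h * out_channels:(h + 1) * out_channels] for h in range(heads)])
    return result


x = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]]   # N=2 nodes, H=2 heads, C=3 channels
print(view_heads(x, heads=2, out_channels=3))
```

This prints [[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], [[6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]]: the node count is preserved and each node now carries one C-sized vector per head.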

dongkwan-kim · 2021-03-10T02:33:57Z

Thank you again, Gabby.
All the changes have been applied and tested with torch-geometric==1.4.3.

GabbySuwichaya closed this as completed Mar 9, 2021
