Do I have to install the checkpoint model first before execution? Do you have a link? #32

Closed
GabbySuwichaya opened this issue Mar 6, 2021 · 6 comments

@GabbySuwichaya

GabbySuwichaya commented Mar 6, 2021

I get an error when running the demo.

I used the following command:

python3 SuperGAT/main.py --dataset-class Planetoid --dataset-name Cora --custom-key EV13NSO8-ES --num-gpus-total 2

The errors are:

  1. Cannot load model, [Errno 2] No such file or directory: '../checkpoints/GAT-Cora-EV13NSO8-ES/d3d4807'
  2. Then, I get Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

Questions:

  • Do I have to install the checkpoint model first before execution? Do you have a link?

  • Also, because my NVIDIA driver is newer than what the given PyTorch version expects, could you provide general commands for installing the other dependencies besides PyTorch Geometric?

I think I have installed all of them, but I may have missed some packages, which could be why I get this error.

Full displayed error:

Args PPRINT: GAT-Cora-EV13NSO8-ES
        - att_lambda: 11.346574532931719
        - attention_type: prob_mask_only
        - batch_size: 128
        - checkpoint_dir: ../checkpoints
        - custom_key: EV13NSO8-ES
        - data_num_splits: 1
        - data_root: ~/graph-data
        - data_sampler: None
        - data_sampling_num_hops: None
        - data_sampling_size: None
        - dataset_class: Planetoid
        - dataset_name: Cora
        - dropout: 0.6
        - early_stop_patience: 485
        - early_stop_queue_length: 484
        - early_stop_threshold_loss: 0.009317052513589488
        - early_stop_threshold_perf: 0.0011587124922279313
        - edge_sampling_ratio: 0.8
        - epochs: 490
        - gpu_deny_list: [1, 2, 3]
        - heads: 8
        - is_cgat_full: False
        - is_cgat_ssnc: False
        - is_link_gnn: False
        - is_super_gat: True
        - l1_lambda: 0.0
        - l2_lambda: 0.008228864972965771
        - link_lambda: 0.0
        - loss: None
        - lr: 0.005
        - m: 
        - model_name: GAT
        - neg_sample_ratio: 0.5
        - num_gpus_to_use: 1
        - num_gpus_total: 2
        - num_hidden_features: 8
        - num_layers: 2
        - out_heads: 8
        - perf_task_for_val: Node
        - perf_type: accuracy
        - pool_name: None
        - pretraining_noise_ratio: 0.0
        - save_model: False
        - save_plot: False
        - scaling_factor: None
        - seed: 42
        - start_epoch: 0
        - super_gat_criterion: None
        - task_type: Node_Transductive
        - to_undirected: False
        - to_undirected_at_neg: False
        - total_pretraining_epoch: 0
        - use_bn: False
        - use_early_stop: True
        - use_pretraining: False
        - val_interval: 1
        - verbose: 2
Use GPU the ID of which is [0]
## TRIAL 0 ##
Now loading dataset: Planetoid / Cora
SuperGATNet(
  (conv1): SuperGAT(1433, 8, heads=8, concat=True, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
  (conv2): SuperGAT(64, 7, heads=8, concat=False, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
)
Cannot load model, [Errno 2] No such file or directory: '../checkpoints/GAT-Cora-EV13NSO8-ES/d3d4807'
  0%|                                                                                                        | 0/490 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "SuperGAT/main.py", line 471, in <module>
    many_seeds_result = run_with_many_seeds(main_args, num_total_runs, gpu_id=alloc_gpu[0])
  File "SuperGAT/main.py", line 403, in run_with_many_seeds
    ret = run(_args, gpu_id=gpu_id, **kwargs)
  File "SuperGAT/main.py", line 307, in run
    train_loss = train_model(running_device, net, train_d, loss_func, adam_optim, epoch=epoch, _args=args)
  File "SuperGAT/main.py", line 97, in train_model
    attention_edge_index=getattr(batch, "train_edge_index", None))
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/model.py", line 89, in forward
    x = self.conv1(x, edge_index, batch=batch, **kwargs)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 135, in forward
    propagated = self.propagate(edge_index, size=size, x=x)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 237, in propagate
    out = self.message(**msg_kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 206, in message
    alpha = self._get_attention(edge_index_i, x_i, x_j, size_i)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 276, in _get_attention
    alpha = softmax(alpha, edge_index_i, size_i)
RuntimeError: softmax() Expected a value of type 'Optional[Tensor]' for argument 'ptr' but instead found type 'int'.
Position: 2
Value: 2708
Declaration: softmax(Tensor src, Tensor? index, Tensor? ptr=None, int? num_nodes=None) -> (Tensor)
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)
@GabbySuwichaya
Author

Also, even after I changed the softmax call in your script

from

softmax(alpha, edge_index_i,  size_i) 

to

softmax(alpha, edge_index_i, num_nodes=size_i) 

I still get the following error. Can you please help?

(SuperGAT) gabby-suwichaya@gabby-suwichaya:/mnt/HDD4TB3/SuperGAT$ ./run_main.sh 
Args PPRINT: GAT-Cora-EV13NSO8-ES
        - att_lambda: 11.346574532931719
        - attention_type: prob_mask_only
        - batch_size: 128
        - checkpoint_dir: ../checkpoints
        - custom_key: EV13NSO8-ES
        - data_num_splits: 1
        - data_root: ~/graph-data
        - data_sampler: None
        - data_sampling_num_hops: None
        - data_sampling_size: None
        - dataset_class: Planetoid
        - dataset_name: Cora
        - dropout: 0.6
        - early_stop_patience: 485
        - early_stop_queue_length: 484
        - early_stop_threshold_loss: 0.009317052513589488
        - early_stop_threshold_perf: 0.0011587124922279313
        - edge_sampling_ratio: 0.8
        - epochs: 490
        - gpu_deny_list: [1, 2, 3]
        - heads: 8
        - is_cgat_full: False
        - is_cgat_ssnc: False
        - is_link_gnn: False
        - is_super_gat: True
        - l1_lambda: 0.0
        - l2_lambda: 0.008228864972965771
        - link_lambda: 0.0
        - loss: None
        - lr: 0.005
        - m: 
        - model_name: GAT
        - neg_sample_ratio: 0.5
        - num_gpus_to_use: 1
        - num_gpus_total: 2
        - num_hidden_features: 8
        - num_layers: 2
        - out_heads: 8
        - perf_task_for_val: Node
        - perf_type: accuracy
        - pool_name: None
        - pretraining_noise_ratio: 0.0
        - save_model: False
        - save_plot: False
        - scaling_factor: None
        - seed: 42
        - start_epoch: 0
        - super_gat_criterion: None
        - task_type: Node_Transductive
        - to_undirected: False
        - to_undirected_at_neg: False
        - total_pretraining_epoch: 0
        - use_bn: False
        - use_early_stop: True
        - use_pretraining: False
        - val_interval: 1
        - verbose: 2
Use GPU the ID of which is [0]
## TRIAL 0 ##
Now loading dataset: Planetoid / Cora
SuperGATNet(
  (conv1): SuperGAT(1433, 8, heads=8, concat=True, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
  (conv2): SuperGAT(64, 7, heads=8, concat=False, att_type=prob_mask_only, nsr=0.5, pnr=0.0)
)
Cannot load model, [Errno 2] No such file or directory: '../checkpoints/GAT-Cora-EV13NSO8-ES/d3d4807'
  0%|                                                                                                                | 0/490 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "SuperGAT/main.py", line 473, in <module>
    many_seeds_result = run_with_many_seeds(main_args, num_total_runs, gpu_id=alloc_gpu[0])
  File "SuperGAT/main.py", line 405, in run_with_many_seeds
    ret = run(_args, gpu_id=gpu_id, **kwargs)
  File "SuperGAT/main.py", line 309, in run
    train_loss = train_model(running_device, net, train_d, loss_func, adam_optim, epoch=epoch, _args=args)
  File "SuperGAT/main.py", line 98, in train_model
    attention_edge_index=getattr(batch, "train_edge_index", None))
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/model.py", line 89, in forward
    x = self.conv1(x, edge_index, batch=batch, **kwargs)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/HDD4TB3/SuperGAT/SuperGAT/layer.py", line 135, in forward
    propagated = self.propagate(edge_index, size=size, x=x)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 253, in propagate
    out = self.aggregate(out, **aggr_kwargs)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 288, in aggregate
    reduce=self.aggr)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_scatter/scatter.py", line 153, in scatter
    """
    if reduce == 'sum' or reduce == 'add': 
        return scatter_sum(src, index, dim, out, dim_size)
               ~~~~~~~~~~~ <--- HERE
    elif reduce == 'mean':
        return scatter_mean(src, index, dim, out, dim_size)
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_scatter/scatter.py", line 13, in scatter_sum
                out: Optional[torch.Tensor] = None,
                dim_size: Optional[int] = None) -> torch.Tensor: 
    index = broadcast(index, src, dim)
            ~~~~~~~~~ <--- HERE
    if out is None:
        size = list(src.size())
  File "/home/gabby-suwichaya/anaconda3/envs/SuperGAT/lib/python3.6/site-packages/torch_scatter/utils.py", line 13, in broadcast
    for _ in range(src.dim(), other.dim()):
        src = src.unsqueeze(-1)
    src = src.expand_as(other)
          ~~~~~~~~~~~~~ <--- HERE
    return src
RuntimeError: The expanded size of the tensor (8) must match the existing size (13264) at non-singleton dimension 1.  Target sizes: [13264, 8, 8].  Tensor sizes: [1, 13264, 1]

@dongkwan-kim
Owner

dongkwan-kim commented Mar 8, 2021

Hi, Gabby.

First, answers to the questions in your first comment.

Do I have to install the checkpoint model first before execution? Do you have a link?

No, you do not have to. I implemented the save/load feature but never actually used it.
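
The log above is consistent with this: the "Cannot load model" line is printed, and the training progress bar starts right after it. A minimal sketch of that load-or-continue pattern follows. The helper name try_load_checkpoint and the raw byte read are assumptions made for illustration, not the repository's actual code.

```python
import os
import tempfile


def try_load_checkpoint(path):
    """Hypothetical helper: return the checkpoint bytes, or None if loading fails."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as e:
        # Report the problem and let the caller fall back to a fresh model.
        print("Cannot load model, {}".format(e))
        return None


# Point at a checkpoint path that does not exist, as in the issue.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "checkpoints", "GAT-Cora-EV13NSO8-ES", "d3d4807")
    state = try_load_checkpoint(missing)

# No checkpoint was found, so training would simply start from scratch.
trained_from_scratch = state is None
```

Under this pattern a missing checkpoint is only a notice; the failure that stops the run comes from the later traceback.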

Also, because my NVIDIA driver is newer than what the given PyTorch version expects, could you provide general commands for installing the other dependencies besides PyTorch Geometric?

I used the Docker image nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 and ran bash install.sh.
The installation section at https://github.com/dongkwan-kim/SuperGAT#installation may help.

Second, some possible solutions that came to mind, though I have not tested them.

  • Many errors can come from a torch-geometric version mismatch. This repository uses torch-geometric==1.4.3, and torch-geometric has changed a lot since then, including the softmax API (which you have already fixed). Either use torch-geometric==1.4.3, or use the example in the torch-geometric repository: https://github.com/rusty1s/pytorch_geometric/blob/master/examples/super_gat.py
  • We do not support TorchScript for now. Please turn it off when using our model.
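
The softmax API change mentioned above can be illustrated without torch. The sketch below is a pure-Python stand-in that copies only the parameter order from the declaration printed in the traceback (src, index, ptr, num_nodes); the function body is an assumption made for illustration and is not torch-geometric's implementation. It shows why a third positional argument ends up in ptr, while the keyword form reaches num_nodes.

```python
import math


def softmax(src, index, ptr=None, num_nodes=None):
    """Stand-in with the parameter order from the traceback's declaration.

    Normalizes `src` within each group given by `index`. `ptr` is accepted
    only to mirror the signature; this sketch does not implement it.
    """
    if ptr is not None and not isinstance(ptr, (list, tuple)):
        raise TypeError(
            "ptr must be a sequence or None, got {}".format(type(ptr).__name__))
    if num_nodes is None:
        num_nodes = max(index) + 1
    # Subtract the per-group maximum for numerical stability.
    group_max = [-math.inf] * num_nodes
    for value, i in zip(src, index):
        group_max[i] = max(group_max[i], value)
    exps = [math.exp(value - group_max[i]) for value, i in zip(src, index)]
    # Sum the exponentials per group, then normalize each entry.
    group_sum = [0.0] * num_nodes
    for e, i in zip(exps, index):
        group_sum[i] += e
    return [e / group_sum[i] for e, i in zip(exps, index)]


alpha = [1.0, 2.0, 3.0, 0.5]
edge_index_i = [0, 0, 1, 1]  # target node of each edge
size_i = 2                   # number of target nodes

try:
    softmax(alpha, edge_index_i, size_i)  # old call style: size_i lands in `ptr`
    old_call_failed = False
except TypeError:
    old_call_failed = True

# The keyword form binds size_i to num_nodes as intended.
out = softmax(alpha, edge_index_i, num_nodes=size_i)
```

With the keyword form, the scores of the edges pointing at each target node sum to 1.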

Thank you!

@GabbySuwichaya
Author

GabbySuwichaya commented Mar 9, 2021

@dongkwan-kim, thank you for the answer.

Your second point was pretty much the answer.

  • This is mainly a record for anyone interested in using CUDA 11.1 + PyG + PyTorch 1.8 with your work.

  • So far, I have only tested it on the example.

By the way, I did not disable TorchScript. For some reason, everything runs fine after the modifications below.

In SuperGAT/layer.py,

change from super(SuperGAT, self).__init__(aggr='add', **kwargs) to:

Line 35: super(SuperGAT, self).__init__(aggr='add', node_dim=0,  **kwargs)

and at Line 276 from softmax(alpha, edge_index_i, size_i) to

Line 276: softmax(alpha, edge_index_i, num_nodes=size_i) 
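
One plausible reading of why node_dim=0 helps, sketched on shapes only. The function below re-creates the broadcast helper from the torch_scatter traceback; the lines of that helper not shown in the traceback (the negative-dim handling and the leading unsqueeze) are reconstructed from memory and should be treated as assumptions. The value dim=-2 is also an assumption about the default node_dim in the newer torch-geometric release, chosen because it reproduces the tensor sizes in the error message.

```python
def broadcast_shape(index_shape, other_shape, dim):
    """Shape-only re-creation of the `broadcast` helper seen in the traceback."""
    if dim < 0:
        dim = len(other_shape) + dim
    shape = list(index_shape)
    if len(shape) == 1:
        # Pad leading singleton axes so the index sits on axis `dim`.
        shape = [1] * dim + shape
    # Pad trailing singleton axes up to the rank of `other`.
    shape = shape + [1] * (len(other_shape) - len(shape))
    # expand_as: every non-singleton axis must already match the target.
    for axis, (have, want) in enumerate(zip(shape, other_shape)):
        if have != 1 and have != want:
            raise ValueError(
                "cannot expand {} to {} (mismatch at axis {})".format(
                    shape, list(other_shape), axis))
    return list(other_shape)


messages = (13264, 8, 8)  # sizes from the error message: edges, heads, channels
index = (13264,)          # one target-node id per edge

try:
    broadcast_shape(index, messages, dim=-2)  # index becomes [1, 13264, 1]
    failed_with_minus_two = False
except ValueError:
    failed_with_minus_two = True

# With dim=0 the index becomes [13264, 1, 1], which expands cleanly.
ok_shape = broadcast_shape(index, messages, dim=0)
```

In this sketch the edge axis of the messages is axis 0, so the scatter index has to be aligned with axis 0 as well, which is what node_dim=0 requests.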

@dongkwan-kim
Owner

Thank you. I will update the part that you mentioned.

@GabbySuwichaya
Author

By the way, I just realized that I forgot to mention two more changes, both before Line 276 softmax(alpha, edge_index_i, num_nodes=size_i). Add

N, H, C = x.size(0), self.heads, self.out_channels

and change x = torch.matmul(x, self.weight) to

x = torch.matmul(x, self.weight).view(-1, H, C)
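
As a rough illustration of what .view(-1, H, C) does to the layout, here is a pure-Python equivalent on nested lists. The helper name split_heads and the small sizes are made up for the example; the assumption is that the projected features of each node are stored as H consecutive blocks of C values.

```python
def split_heads(row, heads, channels):
    """Split one node's flat feature row into `heads` blocks of `channels` values."""
    if len(row) != heads * channels:
        raise ValueError("row length must equal heads * channels")
    return [row[h * channels:(h + 1) * channels] for h in range(heads)]


N, H, C = 2, 2, 3
# Flat projected features, shape [N, H * C].
x = [[0, 1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10, 11]]

# Per-head features, shape [N, H, C], mirroring x.view(-1, H, C).
x_heads = [split_heads(row, H, C) for row in x]
```

After the split, each node carries a separate C-dimensional vector per attention head, which matches the three-dimensional message shape seen in the error message.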

@dongkwan-kim
Owner

Thank you again, Gabby.
All the changes have been applied and tested with torch-geometric==1.4.3.
