
Wrong test result on Cora #2

Closed
llooFlashooll opened this issue Jan 20, 2022 · 8 comments


@llooFlashooll

When I modified the shell file for the Cora dataset and ran the command bash experiments/gcn_exp.sh Cora, the test results were only around 47–48. We all know that GCN, as a classical model, gets results around 81 on Cora.

I tried hard to debug and modify the code to find out what was wrong, but I am still confused. Could you please give me an answer or a solution?

@cptq
Collaborator

cptq commented Jan 20, 2022

Hi, what do you mean that you "modified the shell file"? Could you share your versions of PyTorch and PyTorch Geometric?

Running bash experiments/gcn_exp.sh Cora on our repo, I get 82.06 test accuracy on the first hyperparameter setting.

@llooFlashooll
Author

llooFlashooll commented Jan 21, 2022

So if you don't modify the shell script gcn_exp.sh, I don't think you are testing on the Cora dataset. I modified gcn_exp.sh like this:

dataset_lst=("Cora")
sub_dataset=${2:-"None"}

And actually, if we just clone the repo without any modification, running bash experiments/gcn_exp.sh Cora gives the following error:

Traceback (most recent call last):
  File "main.py", line 46, in <module>
    split_idx_lst = load_fixed_splits(args.dataset, args.sub_dataset)
  File "..Non-Homophily-Large-Scale\data_utils.py", line 229, in load_fixed_splits
    assert dataset in splits_drive_url.keys()
AssertionError

So I modified main.py at line 42 like this:

if args.rand_split or args.dataset in ['ogbn-proteins', 'wiki', 'Cora']:

Then running bash experiments/gcn_exp.sh Cora gives the wrong test results; my environment is torch=1.9.1+cu111, torch-geometric=2.0.3.
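The guard described above can be sketched in isolation. This is a hypothetical, standalone illustration of the fixed-vs-random split selection being discussed (the function name choose_split and the tuple of dataset names are assumptions for this sketch, not the repo's actual code):

```python
def choose_split(dataset: str, rand_split: bool,
                 datasets_without_fixed_splits=("ogbn-proteins", "wiki", "Cora")):
    """Return 'random' for datasets without a fixed split file, else 'fixed'.

    Mirrors the modified guard in main.py: a dataset falls back to random
    splits either when --rand_split is passed or when no fixed split exists.
    """
    if rand_split or dataset in datasets_without_fixed_splits:
        return "random"
    return "fixed"

print(choose_split("Cora", rand_split=False))          # -> random
print(choose_split("snap-patents", rand_split=False))  # -> fixed
```

With Cora added to the tuple, main.py would skip load_fixed_splits and avoid the AssertionError shown in the first traceback.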

Also, if I run in your required environment (torch=1.7.1+cu110, torch-geometric=1.6.3), I get the following error:

Traceback (most recent call last):
  File "main.py", line 36, in <module>
    dataset = load_nc_dataset(args.dataset, args.sub_dataset)
  File "..\Non-Homophily-Large-Scale\dataset.py", line 108, in load_nc_dataset
    dataset = load_planetoid_dataset(dataname)
  File "..\Non-Homophily-Large-Scale\dataset.py", line 310, in load_planetoid_dataset
    torch_dataset = Planetoid(root=f'{DATAPATH}/Planetoid',
  File "D:\Softwares\Anaconda\envs\non-hom\lib\site-packages\torch_geometric\datasets\planetoid.py", line 56, in __init__
    self.data, self.slices = torch.load(self.processed_paths[0])
  File "D:\Softwares\Anaconda\envs\non-hom\lib\site-packages\torch\serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "D:\Softwares\Anaconda\envs\non-hom\lib\site-packages\torch\serialization.py", line 853, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'torch_geometric.data.storage'

@Xiuyu-Li
Member

Xiuyu-Li commented Jan 21, 2022

Hi, can you try commenting out

dataset.get_idx_split = planetoid_orig_split

and running python main.py --dataset Cora --method gcn --num_layers 3 --hidden_channels 32 --lr 0.01 --rand_split --train_prop 0.48 --valid_prop 0.32 --runs 5 --weight_decay 5e-4 --no_bn? This should reproduce our GCN results in C.4.
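The --rand_split path splits nodes by proportion rather than using the fixed Planetoid masks. A minimal stdlib-only sketch of such a proportion-based split (the function name rand_train_test_idx echoes the repo's data utilities, but this is an independent illustration, not its implementation):

```python
import random

def rand_train_test_idx(num_nodes: int, train_prop: float, valid_prop: float, seed=0):
    """Shuffle node indices and cut them into train/valid/test by proportion."""
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = int(num_nodes * train_prop)
    n_valid = int(num_nodes * valid_prop)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

# Cora has 2708 nodes; with --train_prop 0.48 --valid_prop 0.32 this yields
# 1299 / 866 / 543 nodes for train / valid / test.
train_idx, valid_idx, test_idx = rand_train_test_idx(2708, 0.48, 0.32)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Note how different this is from the original semi-supervised Planetoid split (140/500/1000), which explains why accuracies from the two protocols are not directly comparable.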

@llooFlashooll
Author

llooFlashooll commented Jan 21, 2022

I think running the command you provided does reproduce your GCN results.

But my question remains: your work gets wrong test results on the original GCN data splits. Suppose we do not comment out the line:

dataset.get_idx_split = planetoid_orig_split

Then the Cora dataset is split into train/valid/test = 140/500/1000, which is the original Cora split for the semi-supervised task, and the test results look wrong. So I am wondering whether something is wrong with the repo's model, etc.?

@cptq
Collaborator

cptq commented Jan 21, 2022

I see, my fault, I had some typos earlier. @llooFlashooll, the reason you get low performance on the first run is that the first run uses a width-4 hidden layer. If you instead use a higher width, say 64, you will do better (I get 78% on the Planetoid splits this way). Playing around with the hyperparameters some more should get it up to 81% or higher.

Also, you can use bash experiments/gcn_single_exp.sh Cora, with the aforementioned fix for the fixed splits in main.py, to run on Cora.

@llooFlashooll
Author

Really, really thankful for your timely reply!
I tried what you suggested, but when I change the hidden_channels size to 128, 256, 512, etc., it gets approximately 77%, not higher than 80%.

In classical references such as the "Learning Methods on Graphs" section of https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#learning-methods-on-graphs, or https://github.com/pyg-team/pytorch_geometric/blob/master/examples/gcn.py, they build a tiny network and can easily get up to an 81% result.
I am sorry, but I still wonder if there is something wrong with the model.

@cptq
Collaborator

cptq commented Jan 22, 2022

I get 81.50 ± 0.62 test accuracy using bash gcn_single_exp.sh Cora if you take the learning rate to be .01 (--lr .01), pass --no_bn to disable batch norm, and take --hidden_channels 128. We tend to keep batch norm enabled since it increases performance on large graphs, but Cora is quite a small graph.

It seems to be the hyperparameter choices that give low performance, which shows the necessity of testing these methods over the full hyperparameter grid!
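A full grid of the knobs discussed in this thread can be sketched as follows. The specific value lists here are assumptions chosen to match the settings mentioned above; the actual grids live in the repo's experiments/*.sh scripts:

```python
from itertools import product

# Hypothetical hyperparameter grid over the settings discussed in this thread.
lrs = [0.01, 0.001]
hidden_channels = [4, 32, 64, 128]
use_bn = [True, False]

grid = list(product(lrs, hidden_channels, use_bn))
print(len(grid))  # 2 * 4 * 2 = 16 settings

# Turn each setting into a command line like the ones used above.
for lr, width, bn in grid[:2]:
    flags = f"--lr {lr} --hidden_channels {width}" + ("" if bn else " --no_bn")
    print(f"python main.py --dataset Cora --method gcn {flags}")
```

The width-4 setting that produced the 47–48% result is one cell of such a grid, which is why a single run can look much worse than the best achievable accuracy.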

@llooFlashooll
Author

Verified with our own test; very useful! Thanks for your guidance.
