
Wrong test result on Cora #2

Closed
llooFlashooll opened this issue Jan 20, 2022 · 8 comments


@llooFlashooll

When I modified the shell file for the Cora dataset and ran the command bash experiments/gcn_exp.sh Cora, the test results were only around 47–48. We all know that GCN, as a classical model, gets results around 81 on Cora.

I tried hard to debug and modify the code to find out what was wrong, but I am still confused. Could you please give me an answer or a solution?

@cptq
Collaborator

cptq commented Jan 20, 2022

Hi, what do you mean that you "modified the shell file"? Could you share your versions of PyTorch and PyTorch Geometric?

Running bash experiments/gcn_exp.sh Cora on our repo, I get 82.06 test accuracy on the first hyperparameter setting.

@llooFlashooll
Author

llooFlashooll commented Jan 21, 2022

So if you don't modify the shell script gcn_exp.sh, I don't think you are testing on the Cora dataset. I modified gcn_exp.sh like this:

dataset_lst=("Cora")
sub_dataset=${2:-"None"}

And actually, if we just clone the repo without any modification, running bash experiments/gcn_exp.sh Cora gives the following error:

Traceback (most recent call last):
  File "main.py", line 46, in <module>
    split_idx_lst = load_fixed_splits(args.dataset, args.sub_dataset)
  File "..Non-Homophily-Large-Scale\data_utils.py", line 229, in load_fixed_splits
    assert dataset in splits_drive_url.keys()
AssertionError

So I modified main.py at line 42 like this:

if args.rand_split or args.dataset in ['ogbn-proteins', 'wiki', 'Cora']:

Then running bash experiments/gcn_exp.sh Cora gives the wrong test results; my environment is torch=1.9.1+cu111, torch-geometric=2.0.3.
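The guard described above can be sketched in isolation. This is a hypothetical, standalone illustration of the fixed-vs-random split selection being discussed (the function name choose_split and the tuple of dataset names are assumptions for this sketch, not the repo's actual code):

```python
def choose_split(dataset: str, rand_split: bool,
                 datasets_without_fixed_splits=("ogbn-proteins", "wiki", "Cora")):
    """Return 'random' for datasets without a fixed split file, else 'fixed'.

    Mirrors the modified guard in main.py: a dataset falls back to random
    splits either when --rand_split is passed or when no fixed split exists.
    """
    if rand_split or dataset in datasets_without_fixed_splits:
        return "random"
    return "fixed"

print(choose_split("Cora", rand_split=False))          # -> random
print(choose_split("snap-patents", rand_split=False))  # -> fixed
```

With Cora added to the tuple, main.py would skip load_fixed_splits and avoid the AssertionError shown in the first traceback.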

Also, if I run in your required environment (torch=1.7.1+cu110, torch-geometric=1.6.3), I get the following error:

Traceback (most recent call last):
  File "main.py", line 36, in <module>
    dataset = load_nc_dataset(args.dataset, args.sub_dataset)
  File "..\Non-Homophily-Large-Scale\dataset.py", line 108, in load_nc_dataset
    dataset = load_planetoid_dataset(dataname)
  File "..\Non-Homophily-Large-Scale\dataset.py", line 310, in load_planetoid_dataset
    torch_dataset = Planetoid(root=f'{DATAPATH}/Planetoid',
  File "D:\Softwares\Anaconda\envs\non-hom\lib\site-packages\torch_geometric\datasets\planetoid.py", line 56, in __init__
    self.data, self.slices = torch.load(self.processed_paths[0])
  File "D:\Softwares\Anaconda\envs\non-hom\lib\site-packages\torch\serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "D:\Softwares\Anaconda\envs\non-hom\lib\site-packages\torch\serialization.py", line 853, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'torch_geometric.data.storage'

@Xiuyu-Li
Member

Xiuyu-Li commented Jan 21, 2022

Hi, can you try commenting out

dataset.get_idx_split = planetoid_orig_split

and running python main.py --dataset Cora --method gcn --num_layers 3 --hidden_channels 32 --lr 0.01 --rand_split --train_prop 0.48 --valid_prop 0.32 --runs 5 --weight_decay 5e-4 --no_bn? This should reproduce our GCN results in C.4.
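The --rand_split path splits nodes by proportion rather than using the fixed Planetoid masks. A minimal stdlib-only sketch of such a proportion-based split (the function name rand_train_test_idx echoes the repo's data utilities, but this is an independent illustration, not its implementation):

```python
import random

def rand_train_test_idx(num_nodes: int, train_prop: float, valid_prop: float, seed=0):
    """Shuffle node indices and cut them into train/valid/test by proportion."""
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = int(num_nodes * train_prop)
    n_valid = int(num_nodes * valid_prop)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

# Cora has 2708 nodes; with --train_prop 0.48 --valid_prop 0.32 this yields
# 1299 / 866 / 543 nodes for train / valid / test.
train_idx, valid_idx, test_idx = rand_train_test_idx(2708, 0.48, 0.32)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Note how different this is from the original semi-supervised Planetoid split (140/500/1000), which explains why accuracies from the two protocols are not directly comparable.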

@llooFlashooll
Author

llooFlashooll commented Jan 21, 2022

I think running the command you provided does reproduce your GCN results.

But my question remains: your work gets wrong test results on the original GCN data splits. Suppose we do not comment out the line:

dataset.get_idx_split = planetoid_orig_split

Then the Cora dataset is split into train/valid/test = 140/500/1000, which is the original Cora split for the semi-supervised task, and the test results look wrong. So I am wondering whether something is wrong with the repo's model, etc.?

@cptq
Collaborator

cptq commented Jan 21, 2022

I see, my fault, I had some typos earlier. @llooFlashooll, the reason you get low performance on the first run is that the first run uses a width-4 hidden layer. If you instead use a higher width, say 64, you will do better (I get 78% on the Planetoid splits this way). Playing around with the hyperparameters some more should get it up to 81% or higher.

Also, you can use bash experiments/gcn_single_exp.sh Cora, with the aforementioned fix for the fixed splits in main.py, to run on Cora.

@llooFlashooll
Author

Really, really thankful for your timely reply!
I tried what you suggested, but when I change the hidden_channels size to 128, 256, 512, etc., it gets approximately 77%, not higher than 80%.

In classical references such as the "Learning Methods on Graphs" section of https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#learning-methods-on-graphs, or https://github.com/pyg-team/pytorch_geometric/blob/master/examples/gcn.py, they build a tiny network and can easily get up to an 81% result.
I am sorry, but I still wonder if there is something wrong with the model.

@cptq
Collaborator

cptq commented Jan 22, 2022

I get 81.50 ± 0.62 test accuracy using bash gcn_single_exp.sh Cora if you take the learning rate to be .01 (--lr .01), pass --no_bn to disable batch norm, and take --hidden_channels 128. We tend to keep batch norm enabled since it increases performance on large graphs, but Cora is quite a small graph.

It seems to be the hyperparameter choices that give low performance, which shows the necessity of testing these methods over the full hyperparameter grid!
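A full grid of the knobs discussed in this thread can be sketched as follows. The specific value lists here are assumptions chosen to match the settings mentioned above; the actual grids live in the repo's experiments/*.sh scripts:

```python
from itertools import product

# Hypothetical hyperparameter grid over the settings discussed in this thread.
lrs = [0.01, 0.001]
hidden_channels = [4, 32, 64, 128]
use_bn = [True, False]

grid = list(product(lrs, hidden_channels, use_bn))
print(len(grid))  # 2 * 4 * 2 = 16 settings

# Turn each setting into a command line like the ones used above.
for lr, width, bn in grid[:2]:
    flags = f"--lr {lr} --hidden_channels {width}" + ("" if bn else " --no_bn")
    print(f"python main.py --dataset Cora --method gcn {flags}")
```

The width-4 setting that produced the 47–48% result is one cell of such a grid, which is why a single run can look much worse than the best achievable accuracy.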

@llooFlashooll
Author

Verified with our own test; very useful! Thanks for your guidance.
