Custom Data #2

Roh15 · 2023-05-11T21:44:33Z

Hello,

I wanted to run NCNC on my own dataset. Can you guide me as to how to do so?

Thank you!

Xi-yuanWang · 2023-05-16T14:32:27Z

Hi,

Please add a branch in ogbdataset.py (lines 30-41) as follows:

if name in ["YourDatasetName"]:
        dataset = ...  # some PyG dataset object
        # Randomly split the dataset like this, or build the split_edge dict yourself as in the comments below.
        split_edge = randomsplit(dataset)
        # split_edge['train']['edge'] = ...      # tensor of shape [*, 2]
        # split_edge['valid']['edge'] = ...      # tensor of shape [*, 2]
        # split_edge['valid']['edge_neg'] = ...  # tensor of shape [*, 2]
        # split_edge['test']['edge'] = ...       # tensor of shape [*, 2]
        # split_edge['test']['edge_neg'] = ...   # tensor of shape [*, 2]
        data = dataset[0]
        # Only the training edges form the input graph.
        data.edge_index = to_undirected(split_edge["train"]["edge"].t())
        edge_index = data.edge_index
        data.num_nodes = ...  # number of nodes in the graph

Then use the --dataset YourDatasetName option when running NeighborOverlap.py.
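For readers who want to build the split_edge dict by hand rather than call randomsplit, here is a minimal sketch of the layout described above. It is not the repository's randomsplit. The function name make_split_edge, the split fractions, and the rejection-sampling of negatives are all assumptions made for illustration. It uses plain Python lists so it runs without PyTorch; wrapping each list in torch.tensor(...) should give the [*, 2] tensors the branch expects.

```python
import random


def make_split_edge(edges, num_nodes, valid_frac=0.1, test_frac=0.1, seed=0):
    """Hypothetical helper: split undirected edges and sample negative pairs.

    edges: list of (u, v) pairs, each undirected edge listed once.
    Returns a dict of lists of [u, v] pairs in the split_edge layout.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    rng.shuffle(edges)

    n_valid = int(len(edges) * valid_frac)
    n_test = int(len(edges) * test_frac)
    valid_pos = edges[:n_valid]
    test_pos = edges[n_valid:n_valid + n_test]
    train_pos = edges[n_valid + n_test:]

    # Treat edges as unordered pairs so (u, v) and (v, u) count as the same edge.
    existing = {frozenset(e) for e in edges}

    def sample_negatives(k):
        # Rejection-sample distinct node pairs that are neither edges nor self-loops.
        max_neg = num_nodes * (num_nodes - 1) // 2 - len(existing)
        if k > max_neg:
            raise ValueError("not enough non-edges to sample from")
        seen, out = set(), []
        while len(out) < k:
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            pair = frozenset((u, v))
            if u == v or pair in existing or pair in seen:
                continue
            seen.add(pair)
            out.append([u, v])
        return out

    # Sample once and slice, so valid and test negatives do not overlap.
    neg = sample_negatives(n_valid + n_test)
    return {
        "train": {"edge": [list(e) for e in train_pos]},
        "valid": {"edge": [list(e) for e in valid_pos], "edge_neg": neg[:n_valid]},
        "test": {"edge": [list(e) for e in test_pos], "edge_neg": neg[n_valid:]},
    }


# Toy example: a ring graph with 20 nodes and 20 edges.
ring_edges = [(i, (i + 1) % 20) for i in range(20)]
split_edge = make_split_edge(ring_edges, num_nodes=20)
```

For a bipartite graph you would probably want to restrict negative sampling to pairs that cross the two partitions; the sketch above does not do that.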

We will also try to provide a more convenient method next week.

Sincerely,
Xiyuan Wang

Roh15 · 2023-05-17T23:50:29Z

Can the model work with a graph where the nodes don't have any features?
I would imagine that the algorithm should work since you are coming up with your own features for each node, if I'm understanding it right.

The code references data.x at a bunch of places, but what if my nodes don't have any features?
A torch geometric graph does not need node features.

Thank you!

Xi-yuanWang · 2023-05-18T02:03:10Z

Hi,

Yes, our model can work with a graph where the nodes don't have any features. In fact, ogbl-ddi, one of our benchmarks, also has no node features.

If the data has no node features, data.x is None.

Sincerely,
Xiyuan Wang

Roh15 · 2023-05-18T03:21:50Z

I greatly appreciate your swift responses.
I have been trying to modify the code to make it work on my dataset. It is simply a bipartite graph with edge weights. I am unable to do so. I await your more convenient method eagerly.
Thank you!

Xi-yuanWang · 2023-05-18T03:50:47Z

Could you please provide more details on why it does not work? Or please give me some demo data. It will help us design a better way to incorporate a new dataset.

Roh15 · 2023-05-18T07:13:57Z

Here is a sample from the dataset I am using.
NetworkX graph
Torch Geometric Data Object

I use the following to run your code:
Namespace(use_valedges_as_input=False, epochs=40, runs=3, dataset='Sample', batch_size=8192, testbs=8192, maskinput=False, mplayers=1, nnlayers=3, hiddim=32, ln=False, lnnn=False, res=False, jk=False, gnndp=0.3, xdp=0.3, tdp=0.3, gnnedp=0.3, predp=0.3, preedp=0.3, gnnlr=0.0003, prelr=0.0003, beta=1, alpha=1, use_xlin=False, tailact=False, twolayerlin=False, increasealpha=False, splitsize=131072, probscale=5, proboffset=3, pt=0.5, learnpt=False, trndeg=-1, tstdeg=-1, cndeg=-1, predictor='incn1cn1', depth=2, model='puregcn', save_gemb=True, load=None, loadmod=False, savemod=True, savex=True, loadx=False, cnprob=0)

The error:
RuntimeError: result type Float can't be cast to the desired output type Long
at line 65 in NeighborOverlap.py

To the original code I added the following in loaddataset() in ogbdataset.py

elif name in ["Sample"]:
        with open('sample_data_nx_graph.pkl', 'rb') as f:
            G = pickle.load(f)
        pyg_graph = from_networkx(G)
        dataset = [pyg_graph]
        split_edge = randomsplit(dataset)  # random split  dataset like this. Or create a split_edge dict as follows.
        data = dataset[0]
        data.edge_index = to_undirected(split_edge["train"]["edge"].t())
        edge_index = data.edge_index
        # copied from branch elif name == "ddi": 
        data.x = torch.arange(data.num_nodes)
        data.max_x = data.num_nodes

Xi-yuanWang · 2023-05-28T17:19:37Z

Hello. Thank you for your demo data. We have pushed a new branch, refactor. With this branch, you can put the network data directly in the NeuralCommonNeighbor directory and run a command like:

python NeighborOverlap.py --xdp 0.7 --tdp 0.3 --pt 0.75 --gnnedp 0.0 --preedp 0.4 --predp 0.05 --gnndp 0.05 --probscale 4.3 --proboffset 2.8 --alpha 1.0 --gnnlr 0.0043 --prelr 0.0024 --batch_size 65536 --ln --lnnn --predictor cn1 --dataset Sample --epochs 100 --runs 10 --model puregcn --hiddim 256 --mplayers 1 --testbs 8192 --maskinput --jk --use_xlin --tailact

Roh15 · 2023-05-30T04:26:15Z

Hi!
Thank you so much, it seems to be running now.
Our deadline is fast approaching, so I am asking these questions in advance:
Is it possible to get a completed graph (after link prediction) using the model after it has trained? And is there a way to do binary link prediction or a thresholded weighted link prediction on a graph?

Again thank you for your help. It is immensely valuable.

Xi-yuanWang · 2023-05-30T06:30:41Z

Please pull our update in the refactor branch.

Is it possible to get a completed graph (after link prediction) using the model after it has trained?

Yes, but it takes $O(N^2)$ time, where $N$ is the number of nodes. On the demo data with a 4090 GPU, it takes 5 minutes. You can use the --predictfullgraph option to save an $N \times N$ matrix to adj.pt. Each element $x_{ij}$ of the matrix is the prediction score for the link between nodes $i$ and $j$, and $1/(1+e^{-x_{ij}})$ is the probability that the link exists.
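As a rough illustration of turning those scores into a completed binary graph, the sketch below applies the sigmoid above to each $x_{ij}$ and thresholds the result. The function name logits_to_binary, the default threshold of 0.5, and the choice to ignore the diagonal are assumptions for this example, not part of the repository. It works on nested Python lists so it runs on its own; on the saved matrix you would presumably load adj.pt with torch.load and use torch.sigmoid instead.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function: avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_binary(logits, threshold=0.5):
    """Hypothetical helper: map an N x N score matrix to 0/1 link predictions.

    A link (i, j) is predicted when sigmoid(x_ij) >= threshold.
    Diagonal entries (self-loops) are always set to 0 (an assumption).
    """
    n = len(logits)
    return [
        [1 if i != j and sigmoid(logits[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy 3 x 3 score matrix standing in for the contents of adj.pt.
scores = [
    [0.0, 2.0, -3.0],
    [2.0, 0.0, 0.5],
    [-3.0, 0.5, 0.0],
]
pred = logits_to_binary(scores, threshold=0.6)
```

Raising the threshold trades recall for precision; a sensible value would typically be chosen on the validation edges.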

To save the trained model, use the --savemod option. To load a saved model and only generate the full adjacency matrix, use the --loadmod option and set epochs to 0 and runs to 1.

Is there a way to do binary link prediction or a thresholded weighted link prediction on a graph?

Currently, only binary link prediction is available. To predict edge weights, you must change the loss to either a squared loss (set the target y to the edge weight and treat negative links as having edge weight 0) or NLLLoss (discretize the edge weights into multiple classes).
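To make the squared-loss option concrete, here is a minimal sketch of the target construction described above: positive links regress onto their edge weight and negative links onto 0. The function name weighted_link_mse and its argument names are hypothetical, and this is not code from the repository. In the actual training loop the same quantity could presumably be computed on tensors with torch.nn.functional.mse_loss.

```python
def weighted_link_mse(pos_preds, pos_weights, neg_preds):
    """Hypothetical helper: mean squared error for weighted link prediction.

    pos_preds:   model outputs for existing (positive) links
    pos_weights: true edge weights of those links (the regression targets)
    neg_preds:   model outputs for sampled non-links, whose target is 0
    """
    if len(pos_preds) != len(pos_weights):
        raise ValueError("pos_preds and pos_weights must have the same length")
    errors = [(p - w) ** 2 for p, w in zip(pos_preds, pos_weights)]
    errors += [p ** 2 for p in neg_preds]  # (p - 0)^2 for negative links
    if not errors:
        raise ValueError("no predictions given")
    return sum(errors) / len(errors)


# Two positive links with weights 1.0 and 2.0, and one negative link.
loss = weighted_link_mse([0.9, 2.1], [1.0, 2.0], [0.2])
```

With this loss the model output should be read as a predicted weight rather than a logit, so the sigmoid conversion to a probability no longer applies.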
