Custom Data #2

Open
Roh15 opened this issue May 11, 2023 · 9 comments

@Roh15

Roh15 commented May 11, 2023

Hello,

I wanted to run NCNC on my own dataset. Can you guide me as to how to do so?

Thank you!

@Xi-yuanWang
Contributor

Xi-yuanWang commented May 16, 2023

Hi,

Please add a branch in ogbdataset.py (lines 30-41) as follows:

if name in ["YourDatasetName"]:
    dataset = ...  # some PyG dataset object
    # Randomly split the dataset like this, or build the split_edge dict yourself (see below).
    split_edge = randomsplit(dataset)
    # Manual alternative; each entry is a tensor of shape [*, 2]:
    # split_edge['train']['edge'] = ...
    # split_edge['valid']['edge'] = ...
    # split_edge['valid']['edge_neg'] = ...
    # split_edge['test']['edge'] = ...
    # split_edge['test']['edge_neg'] = ...
    data = dataset[0]
    data.edge_index = to_undirected(split_edge["train"]["edge"].t())
    edge_index = data.edge_index
    data.num_nodes = ...  # number of nodes in the graph

Then pass the --dataset YourDatasetName option when running NeighborOverlap.py.
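For readers who want to build the split_edge dict by hand instead of calling randomsplit, here is a minimal sketch of one way to do it. It is not the repository's implementation: the function name make_split_edge and its parameters (valid_frac, test_frac, seed) are hypothetical, negatives are sampled uniformly from non-edges, and each undirected edge is assumed to be stored once (which seems consistent with the to_undirected call above). It uses plain Python lists so it runs without PyTorch; each list of pairs would still need to be wrapped with torch.tensor(...) to get the [*, 2] tensors described above.

```python
import random


def make_split_edge(edges, num_nodes, valid_frac=0.1, test_frac=0.1, seed=0):
    """Split undirected edges into train/valid/test and sample negative pairs.

    `edges` is a list of (u, v) pairs. The result mirrors the split_edge
    layout from the thread, but holds lists of [u, v] pairs, not tensors.
    """
    rng = random.Random(seed)
    # Store each undirected edge once as (min, max); drop duplicates and self-loops.
    unique = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(unique)

    n_valid = int(len(unique) * valid_frac)
    n_test = int(len(unique) * test_frac)
    valid = unique[:n_valid]
    test = unique[n_valid:n_valid + n_test]
    train = unique[n_valid + n_test:]

    # Negatives are node pairs that are not edges anywhere in the graph.
    existing = set(unique)
    needed = n_valid + n_test
    max_pairs = num_nodes * (num_nodes - 1) // 2
    if needed > max_pairs - len(existing):
        raise ValueError("not enough non-edges to sample negatives from")
    negatives = set()
    while len(negatives) < needed:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in existing:
            negatives.add(pair)
    negatives = sorted(negatives)
    rng.shuffle(negatives)

    def as_pairs(items):
        return [list(p) for p in items]

    return {
        "train": {"edge": as_pairs(train)},
        "valid": {"edge": as_pairs(valid),
                  "edge_neg": as_pairs(negatives[:n_valid])},
        "test": {"edge": as_pairs(test),
                 "edge_neg": as_pairs(negatives[n_valid:])},
    }


# Toy graph: a ring of 20 nodes.
toy_edges = [(i, (i + 1) % 20) for i in range(20)]
split_edge = make_split_edge(toy_edges, num_nodes=20, valid_frac=0.1, test_frac=0.2)
```

For a bipartite graph, uniform sampling over all node pairs may produce negatives within one partition, so the sampling step would probably need to be restricted to cross-partition pairs.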

We will also try to provide a more convenient method next week.

Sincerely,
Xiyuan Wang

@Roh15
Author

Roh15 commented May 17, 2023

Can the model work with a graph where the nodes don't have any features?
I would imagine that the algorithm should work since you are coming up with your own features for each node, if I'm understanding it right.

The code references data.x at a bunch of places, but what if my nodes don't have any features?
A torch geometric graph does not need node features.

Thank you!

@Xi-yuanWang
Contributor

Hi,

Yes, our model can work with a graph whose nodes have no features. In fact, ogbl-ddi, one of our benchmarks, also has no node features.

If the data has no node features, data.x is None.

Sincerely,
Xiyuan Wang

@Roh15
Author

Roh15 commented May 18, 2023

I greatly appreciate your swift responses.
I have been trying to modify the code to make it work on my dataset. It is simply a bipartite graph with edge weights. I am unable to do so. I await your more convenient method eagerly.
Thank you!

@Xi-yuanWang
Contributor

Could you please provide more details on why it does not work, or share some demo data? That will help us design a better way to incorporate a new dataset.

@Roh15
Author

Roh15 commented May 18, 2023

Here is a sample from the dataset I am using.
NetworkX graph
Torch Geometric Data Object

I use the following to run your code:
Namespace(use_valedges_as_input=False, epochs=40, runs=3, dataset='Sample', batch_size=8192, testbs=8192, maskinput=False, mplayers=1, nnlayers=3, hiddim=32, ln=False, lnnn=False, res=False, jk=False, gnndp=0.3, xdp=0.3, tdp=0.3, gnnedp=0.3, predp=0.3, preedp=0.3, gnnlr=0.0003, prelr=0.0003, beta=1, alpha=1, use_xlin=False, tailact=False, twolayerlin=False, increasealpha=False, splitsize=131072, probscale=5, proboffset=3, pt=0.5, learnpt=False, trndeg=-1, tstdeg=-1, cndeg=-1, predictor='incn1cn1', depth=2, model='puregcn', save_gemb=True, load=None, loadmod=False, savemod=True, savex=True, loadx=False, cnprob=0)

The error:
RuntimeError: result type Float can't be cast to the desired output type Long
at line 65 in NeighborOverlap.py

To the original code I added the following in loaddataset() in ogbdataset.py

elif name in ["Sample"]:
        with open('sample_data_nx_graph.pkl', 'rb') as f:
            G = pickle.load(f)
        pyg_graph = from_networkx(G)
        dataset = [pyg_graph]
        split_edge = randomsplit(dataset)  # random split  dataset like this. Or create a split_edge dict as follows.
        data = dataset[0]
        data.edge_index = to_undirected(split_edge["train"]["edge"].t())
        edge_index = data.edge_index
        # copied from branch elif name == "ddi": 
        data.x = torch.arange(data.num_nodes)
        data.max_x = data.num_nodes

@Xi-yuanWang
Contributor

Hello. Thank you for the demo data. We have pushed a new branch, refactor. With this branch, you can put the network data directly in the NeuralCommonNeighbor directory and run a command like:

python NeighborOverlap.py --xdp 0.7 --tdp 0.3 --pt 0.75 --gnnedp 0.0 --preedp 0.4 --predp 0.05 --gnndp 0.05 --probscale 4.3 --proboffset 2.8 --alpha 1.0 --gnnlr 0.0043 --prelr 0.0024 --batch_size 65536 --ln --lnnn --predictor cn1 --dataset Sample --epochs 100 --runs 10 --model puregcn --hiddim 256 --mplayers 1 --testbs 8192 --maskinput --jk --use_xlin --tailact

@Roh15
Author

Roh15 commented May 30, 2023

Hi!
Thank you so much, it seems to be running now.
Our deadline is fast approaching so I am asking the question in advance -
Is it possible to get a completed graph (after link prediction) using the model after it has trained? And is there a way to do binary link prediction or a thresholded weighted link prediction on a graph?

Again thank you for your help. It is immensely valuable.

@Xi-yuanWang
Contributor

Please pull our update in the refactor branch.

  • Is it possible to get a completed graph (after link prediction) using the model after it has trained?

Yes, but it takes $O(N^2)$ time, where $N$ is the number of nodes. On the demo data with a 4090 GPU, it takes 5 minutes. Use the --predictfullgraph option to save an $N \times N$ matrix to adj.pt. Each element $x_{ij}$ of the matrix is the prediction for the link between nodes $i$ and $j$, and $1/(1+e^{-x_{ij}})$ is the probability that the link exists.

To save the trained model, use the --savemod option. To load a saved model and only generate the full adjacency matrix, use the --loadmod option and set epochs to 0 and runs to 1.
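As an illustration of how the saved scores could be turned into a completed binary graph, the sketch below applies the probability formula $1/(1+e^{-x_{ij}})$ and a cutoff. It is only a sketch under assumptions: the function names link_probability and binarize, the threshold value 0.6, and the choice to zero out the diagonal are all hypothetical, and a small nested list stands in for the matrix in adj.pt (which would presumably be loaded with torch.load).

```python
import math


def link_probability(score):
    """Map a raw prediction x_ij to the probability 1 / (1 + exp(-x_ij))."""
    # Branch on the sign so exp() never overflows for large |score|.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def binarize(scores, threshold=0.5):
    """Return a 0/1 matrix with 1 where the link probability exceeds `threshold`.

    The diagonal is set to 0 so that no self-loops are predicted.
    """
    n = len(scores)
    return [
        [1 if i != j and link_probability(scores[i][j]) > threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


# Stand-in for the N x N score matrix saved by --predictfullgraph.
scores = [
    [0.0, 2.0, -3.0],
    [2.0, 0.0, 0.5],
    [-3.0, 0.5, 0.0],
]
predicted = binarize(scores, threshold=0.6)
```

Raising the threshold keeps only the most confident links; a threshold of 0.5 corresponds to keeping every pair with a positive raw score.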

  • Is there a way to do binary link prediction or a thresholded weighted link prediction on a graph?

Currently, binary link prediction is available. To predict edge weights, you must change the loss to either a squared loss (with y set to the edge weight and negative links treated as having edge weight 0) or NLLLoss (with edge weights discretized into multiple classes).
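The two target constructions described above can be sketched as follows. This is not code from the repository: the helper names squared_loss and weight_to_class, the example weights, and the bin boundaries are hypothetical, and placing negative links in the lowest class is one possible choice rather than something the thread specifies. In a real training loop the targets would be tensors fed to a squared-error loss or to torch.nn.NLLLoss.

```python
import bisect


def squared_loss(predictions, targets):
    """Mean squared error between predicted and target edge weights."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def weight_to_class(weight, boundaries):
    """Discretize a weight into a class index, given sorted bin boundaries."""
    return bisect.bisect_right(boundaries, weight)


# Option 1: regression targets. Positive links keep their edge weight,
# negative links are treated as having edge weight 0.
pos_weights = [0.8, 2.5, 4.0]
num_negatives = 2
regression_targets = pos_weights + [0.0] * num_negatives

# Option 2: classification targets for an NLL-style loss.
# class 0: w < 1, class 1: 1 <= w < 3, class 2: w >= 3
boundaries = [1.0, 3.0]
class_targets = [weight_to_class(w, boundaries) for w in regression_targets]

# Example: loss of some made-up predictions against the regression targets.
example_loss = squared_loss([1.0, 2.0, 4.0, 0.5, 0.0], regression_targets)
```

With these boundaries, negative links share class 0 with low-weight positive links; reserving a separate "no link" class would be an alternative design.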
