New user questions #112

Closed
fmellomascarenhas opened this issue Jun 17, 2022 · 3 comments

@fmellomascarenhas

Hi all, thanks for making this library available. I am trying to use it for my benchmarks, but I am having a bit of trouble.

I want to evaluate my own dataset for recommendation. On the website, there is only an example for node classification. I started digging into the git repository and found an example for link_prediction under examples/customization.

I decided to settle on link_prediction, because I don't know what the equivalent of AsLinkPredictionDataset would be for recommendation.

I want to compute hits@k, but it is not clear where to change the metric: I couldn't find it as an input of AsLinkPredictionDataset, config.ini, or OpenHGNN, so I have no idea how to change it.

In the OGB benchmarks, hits@k is computed by providing a negative and a positive edge set and comparing scores_pos > scores_neg. Maybe this could be part of the link_prediction pipeline to support hits@k?
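For reference, here is a minimal sketch of that computation (the function name and tensor shapes are my own; OGB's Evaluator consumes positive and negative score arrays in the same spirit):

import torch

def hits_at_k(pos_scores, neg_scores, k):
    # OGB-style hits@k: fraction of positive edges scored strictly higher
    # than the k-th highest negative score.
    if neg_scores.numel() < k:
        return 1.0  # fewer than k negatives: every positive trivially hits
    kth_neg_score = torch.topk(neg_scores, k).values[-1]
    return (pos_scores > kth_neg_score).float().mean().item()

pos = torch.tensor([0.9, 0.4, 0.7])
neg = torch.tensor([0.8, 0.3, 0.5, 0.2])
print(hits_at_k(pos, neg, k=2))  # 0.667: 0.9 and 0.7 beat the 2nd-highest negative (0.5)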

I could also calculate the metric on my own if I could save the predictions, but it is not clear how to run inference or access the model after it is trained. I couldn't find this in the tutorials or examples.

In summary, it would be nice to have:

  1. A tutorial for recommendation systems;
  2. Documentation on how to change metrics;
  3. A way to calculate hits@k in OGB style, comparing scores against a given negative set (as in the sketch above);
  4. A way to save/load the trained model and run inference.

Thanks very much!!
Felipe

My code:

import torch as th
from openhgnn.dataset import AsLinkPredictionDataset, generate_random_hg
from dgl import transforms as T
from dgl import DGLHeteroGraph
from dgl.data import DGLDataset
from dgl.dataloading.negative_sampler import GlobalUniform
import os
import numpy as np
meta_paths_dict = {}  # e.g. {'APA': [('author', 'author-paper', 'paper'), ('paper', 'rev_author-paper', 'author')]}
target_link = [('DRUG', 'DRUG_DIS', 'DIS')]

class MySplitLPDatasetWithNegEdges(DGLDataset):
    def __init__(self):
        super().__init__(name='my-split-lp-dataset-with-neg-edges',
                         force_reload=True)

    def process(self):
        # load the heterogeneous graph and the precomputed negative edges
        hg, neg_edges = np.load('pathtomydataset.npy', allow_pickle=True)
        self._neg_val_edges, self._neg_test_edges = neg_edges['valid'], neg_edges['test']
        self._g = hg

    @property
    def neg_val_edges(self):
        return self._neg_val_edges

    @property
    def neg_test_edges(self):
        return self._neg_test_edges

    @property
    def meta_paths_dict(self):
        return meta_paths_dict

    def __getitem__(self, idx):
        return self._g

    def __len__(self):
        return 1


def train_with_custom_lp_dataset(dataset):
    from openhgnn.config import Config
    from openhgnn.start import OpenHGNN
    config_file = ["../../openhgnn/config.ini"]
    config = Config(file_path=config_file, model='RGCN', dataset=dataset, task='link_prediction', gpu=-1)
    OpenHGNN(args=config)

if __name__ == '__main__':
    mySplitLPDatasetWithNegEdges = AsLinkPredictionDataset(MySplitLPDatasetWithNegEdges(), target_link=target_link,
                                                           target_link_r=None,
                                                           force_reload=True)
    train_with_custom_lp_dataset(mySplitLPDatasetWithNegEdges)
dddg617 (Collaborator) commented Jun 24, 2022

Thank you for your comments.

  1. So far, we have only one model, KGCN, that supports recommendation systems. We will consider a tutorial after the number of relevant models increases.
  2. We have not yet designed an interface for this. If you do want to change the metric, you can modify the trainerflow: self.task.get_evaluator() is where the metric is chosen (a rough sketch follows this comment).
  3. We do not have the hits@k metric yet, but we have already implemented some knowledge graph models that include a hits@10 metric; I think these can serve as a reference. The models are here.
  4. The trained models are saved in /openhgnn/output/(model name), but our system does not support directly loading models and running inference. Users may load models and perform downstream tasks themselves.

All right, we may consider these as our future plans for OpenHGNN. Thank you again.
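To illustrate point 2, here is a rough sketch of where such a change would go. Everything except self.task.get_evaluator() is an assumption about our internals, so please check it against the trainerflow source:

# Inside the link-prediction trainerflow, the metric is obtained roughly as
#     self.evaluator = self.task.get_evaluator()
# Binding a custom callable at that point would route evaluation through it.

def hits_at_10(pos_scores, neg_scores):
    # reuses the hits_at_k sketch from the first comment in this thread
    return hits_at_k(pos_scores, neg_scores, k=10)

# Hypothetical patch point -- verify the attribute name in your local copy:
# self.evaluator = hits_at_10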

@fmellomascarenhas (Author)

Thanks for your reply @dddg617!

Regarding your last point, I am afraid that your current pipeline doesn't save the models, at least not using the script under examples/customization. It only saves the logs.

I searched your code for calls like torch.save and checkpoint, and apparently they are only made when early stopping happens. But I only skimmed the code, so I might be wrong.

dddg617 (Collaborator) commented Jun 26, 2022

All right. For the last point: currently we do not support saving models in examples/customization, but we do support it in openhgnn/trainerflow. If you run the script the previous way, you will get a .pt file in openhgnn/output/{model name}. We will add the same function to examples/customization.
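As a rough sketch of what loading such a checkpoint and running inference could look like, something along these lines. The file name, the assumption that the .pt file holds a state_dict, and the forward signature are all assumptions, not a documented OpenHGNN API:

import torch

ckpt_path = 'openhgnn/output/RGCN/RGCN.pt'  # hypothetical file name under the output directory

# If the .pt file holds a state_dict (assumption), rebuild the model with the
# same hyperparameters used for training before loading the weights:
model = build_model()  # hypothetical constructor, not an OpenHGNN call
model.load_state_dict(torch.load(ckpt_path, map_location='cpu'))

# If instead the whole model object was saved via torch.save(model, path):
# model = torch.load(ckpt_path, map_location='cpu')

model.eval()
with torch.no_grad():
    scores = model(hg)  # forward signature is an assumption; adapt it to the model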
