HIN2vec_pytorch

A pytorch implementation of HIN2vec

07/13/2021 add binary activation function

HIN2Vec learns distributed representations of nodes in heterogeneous information networks (HINs) by capturing the distiguishing metapath relationships between nodes. Please refer the paper here.

Please find more details in the original repo

Requirements

python3.6+
pytorch
numpy
pandas
networkx

Some differences with the original implementation

use Adam instead of SGD to optimize weights

I find Adam much more effective in training.

some tricks not implemented
- sampling the same type of nodes
- learning rate decay
may be more robust comparing to the original implementation

Maybe there is something wrong about my parameter setting, but when I do some experiments using the original code, the program won't continue with window set to more than 4. This pytorch code works fine.

this implementation is slower (It takes about 10 times of the time)

But is surly more easier to understand and modify. (And I would try to use a bigger window size to get comparable or better results.)

Usage

create your own edge.csv referring to demo_data.csv

The input graph is assumed to be undirected by default.

create a main.py in the project folder, copy the following code and modify it.

    import torch
    import pandas as pd
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader
    
    from walker import load_a_HIN_from_pandas
    from model import NSTrainSet, HIN2vec, train

    # set method parameters
    window = 4
    walk = 10
    walk_length = 300
    embed_size = 100
    neg = 5
    sigmoid_reg = True 
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(f'device = {device}')

    # set dataset [PLEASE USE YOUR OWN DATASET TO REPLACE THIS]
    demo_edge = pd.read_csv('./demo_data.csv', index_col=0)

    edges = [demo_edge]

    print('finish loading edges')

    # init HIN
    hin = load_a_HIN_from_pandas(edges)
    hin.window = window

    dataset = NSTrainSet(hin.sample(walk_length, walk), hin.node_size, neg=neg)

    hin2vec = HIN2vec(hin.node_size, hin.path_size, embed_size, sigmoid_reg)

    # load model
    # hin2vec.load_state_dict(torch.load('hin2vec.pt'))

    # set training parameters
    n_epoch = 10
    batch_size = 20
    log_interval = 200

    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = optim.AdamW(hin2vec.parameters())  # 原作者使用的是SGD？ 这里使用AdamW
    loss_function = nn.BCELoss()

    for epoch in range(n_epoch):
        train(log_interval, hin2vec, device, data_loader, optimizer, loss_function, epoch)

    torch.save(hin2vec, 'hin2vec.pt')

    # set output parameters [the output file is a bit different from the original code.]
    node_vec_fname = 'node_vec.txt'
    # path_vec_fname = 'meta_path_vec.txt'
    path_vec_fname = None
    
    print(f'saving node embedding vectors to {node_vec_fname}...')
    node_embeds = pd.DataFrame(hin2vec.start_embeds.weight.data.numpy())
    node_embeds.rename(hin.id2node).to_csv(node_vec_fname, sep=' ')

    if path_vec_fname:
        print(f'saving meta path embedding vectors to {path_vec_fname}...')
        path_embeds = pd.DataFrame(hin2vec.path_embeds.weight.data.numpy())
        path_embeds.rename(hin.id2path).to_csv(path_vec_fname, sep=' ')
       
    # save model
    # torch.save(hin2vec.state_dict(), 'hin2vec.pt')

About `data_demo.csv`

In this repository, the Heterogeneous Information Network is considered as an undirected weighted simple graph with multiple node types and multiple edge types. And each edge type is presented as "<node_type_A>-<node_type_B>" (e.g. "User-Item"), so there is at most one type of edges for each pair of node types. Besides, isolated nodes are not preferable and the only way to add a node into the graph is to add an edge with this node on one side.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo_data.csv		demo_data.csv
model.py		model.py
walker.py		walker.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HIN2vec_pytorch

Requirements

Some differences with the original implementation

Usage

About `data_demo.csv`

About

Releases

Packages

Languages

License

e-yi/hin2vec_pytorch

Folders and files

Latest commit

History

Repository files navigation

HIN2vec_pytorch

Requirements

Some differences with the original implementation

Usage

About data_demo.csv

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

About `data_demo.csv`

Packages