The following code implements HDGL, which relies on HD-computing operations. PyTorch does not directly support operations such as bitwise majority (used for bundling). This is why we translate 0/1 vectors to 1/-1 (bipolar) vectors: in $\{1,-1\}^{\beta}$ space the bundle operator becomes signed addition (addition followed by taking the sign). Similarly, the binding operator, which is the XOR operation in $\{0,1\}^{\beta}$ space, becomes elementwise multiplication in $\{1,-1\}^{\beta}$ space.



To summarize, the binary operations and their bipolar counterparts are:


*    Space: $\{0,1\}^{\beta} \longleftrightarrow \{1,-1\}^{\beta}$

*    Bundle: Bitwise Majority $\longleftrightarrow$ Signed Addition

*    Binding: XOR $\longleftrightarrow$ Multiplication
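
The correspondence above can be checked numerically. The sketch below is only an illustration on toy vectors; the helper `to_bipolar` and the convention 0 → +1, 1 → -1 are assumptions made for this example. Note that `convert_binary_to_bipolar` later in this notebook uses the opposite convention (0 → -1, 1 → +1), under which the product corresponds to the complement of XOR; that flips every bit consistently, so Hamming distances between bound vectors are unchanged.

```python
import torch

torch.manual_seed(0)
beta = 16  # toy dimension; the notebook uses 50,000

def to_bipolar(bits):
    # Assumed convention for this sketch: 0 -> +1, 1 -> -1
    return 1 - 2 * bits

a, b, c = (torch.randint(0, 2, (beta,)) for _ in range(3))

# Binding: XOR on bits equals the elementwise product of the bipolar vectors
xor_bits = torch.bitwise_xor(a, b)
bind_matches = torch.equal(to_bipolar(xor_bits), to_bipolar(a) * to_bipolar(b))

# Bundling: bitwise majority over an odd number of vectors equals
# the sign of the sum of their bipolar versions
majority_bits = ((a + b + c) >= 2).long()
bundle = torch.sign(to_bipolar(a) + to_bipolar(b) + to_bipolar(c))
bundle_matches = torch.equal(to_bipolar(majority_bits), bundle)

print(bind_matches, bundle_matches)
```

Bundling an odd number of vectors avoids ties, since a sum of an odd number of ±1 values is never zero.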

We now describe the pseudo-code to make things easier.

Given:
*   feat (features of N nodes)
*   G (graph with adjacency information for N nodes)
*   Train_val_nodes (ids of train and val nodes)
*   Test_nodes (ids of test nodes)
*   Number_of_Labels
*   Labels_train_val (labels of the train and val nodes, taking values from 0 to Number_of_Labels-1)



Pseudo-code:
Let β=50k
1.   Create bipolar HD-vectors for each label; stored as Labels_HD_Vectors (size = (Number_of_Labels, 50k))
2.   Project feat to $\{0,1\}^{50k}$ space using RHPT to obtain feat_hashed.
3.   Convert feat_hashed to bipolar vectors.
4.   For i in train_and_validation nodes:-
       *   Sample 11 1-hop Neighbors of i
       *   Sample 21 2-hop Neighbors of i
       *   r_i = feat_hashed[i],
       *   R_1hop = Bundle(feat_hashed[1-hop_neighbors])
       *   R_2hop = Bundle(feat_hashed[2-hop_neighbors])
       *   z_i = Bind(r_i, \pi(R_1hop), \pi\pi(R_2hop))
       *   y_i = Labels_train_val[i] # get the label of train node
       *   Labels_HD_Vectors[y_i] = Labels_HD_Vectors[y_i] + z_i (addition part of Bundling)
5.    Labels_HD_Vectors = sign(Labels_HD_Vectors) # Bundle is complete
-------------------Train Part Ends Here-----------------
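
As a rough illustration of step 4, the sketch below builds z_i for a single node from random toy vectors. The helpers `random_bipolar`, `bundle`, and `permute` are hypothetical names introduced only for this example, and the permutation π is assumed to be a cyclic shift (the cells further down use `torch.roll` for it).

```python
import torch

torch.manual_seed(0)
beta = 8  # toy dimension; the notebook uses 50,000

def random_bipolar(*shape):
    # Hypothetical helper: uniform random vectors in {-1, +1}
    return 2 * torch.randint(0, 2, shape) - 1

def bundle(vectors):
    # Signed addition followed by sign; an odd number of rows avoids ties
    return torch.sign(vectors.sum(dim=0))

def permute(vector, times):
    # Cyclic shift standing in for pi applied `times` times
    return torch.roll(vector, -times)

r_i = random_bipolar(beta)
one_hop = random_bipolar(11, beta)  # 11 sampled 1-hop neighbors
two_hop = random_bipolar(21, beta)  # 21 sampled 2-hop neighbors

# Bind the node vector with the permuted neighborhood bundles
z_i = r_i * permute(bundle(one_hop), 1) * permute(bundle(two_hop), 2)
print(z_i)
```

Because binding is elementwise multiplication of ±1 values, it is its own inverse: multiplying z_i by r_i again recovers the product of the two permuted bundles.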


---


Now for testing:
1.   For i in test nodes:-
       *   Sample 11 1-hop Neighbors of i
       *   Sample 21 2-hop Neighbors of i
       *   r_i = feat_hashed[i],
       *   R_1hop = Bundle(feat_hashed[1-hop_neighbors])
       *   R_2hop = Bundle(feat_hashed[2-hop_neighbors])
       *   z_i = Bind(r_i, \pi(R_1hop), \pi\pi(R_2hop))
       *   y_prediction_i = Find_index_nearest_neighbor of z_i ( Labels_HD_Vectors[0], Labels_HD_Vectors[1], ... )
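
The nearest-neighbor step can be sketched as follows, assuming (as the prediction cell later does) that vectors are compared in binary form with the L1 distance. On 0/1 vectors the L1 distance is the Hamming distance. The label vectors and the query `z` below are hand-picked toy values, not data from the notebook.

```python
import torch

# Toy binary label vectors, one row per class
label_vectors = torch.tensor([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]).float()

# Toy query: differs from class 1 in a single position
z = torch.tensor([1, 1, 1, 0, 0, 0, 0, 0]).float()

# L1 distance from z to every label vector
l1 = torch.cdist(z.unsqueeze(0), label_vectors, p=1).squeeze(0)

# Hamming distance computed directly, for comparison
hamming = (z.unsqueeze(0) != label_vectors).sum(dim=1).float()

prediction = torch.argmin(l1).item()
print(l1, hamming, prediction)
```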
       


In [1]:
%time
!nvcc --version

CPU times: user 3 µs, sys: 2 µs, total: 5 µs
Wall time: 7.15 µs
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [2]:
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found


In [3]:
!pip uninstall torch -y
!pip uninstall torchvision -y
!pip uninstall torchaudio -y
!pip uninstall torchtext -y

Found existing installation: torch 2.5.0+cu121
Uninstalling torch-2.5.0+cu121:
  Successfully uninstalled torch-2.5.0+cu121
Found existing installation: torchvision 0.20.0+cu121
Uninstalling torchvision-0.20.0+cu121:
  Successfully uninstalled torchvision-0.20.0+cu121
Found existing installation: torchaudio 2.5.0+cu121
Uninstalling torchaudio-2.5.0+cu121:
  Successfully uninstalled torchaudio-2.5.0+cu121

In [4]:
!pip install torch==2.4

Collecting torch==2.4
  Downloading torch-2.4.0-cp310-cp310-manylinux1_x86_64.whl.metadata (26 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.4)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.4)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.4)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch==2.4)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.4)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.4)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.

In [5]:
!pip install  dgl -f https://data.dgl.ai/wheels/torch-2.4/cu121/repo.html
import os
os.environ["DGLBACKEND"] = "pytorch"
import dgl
import time
import numpy as np

Looking in links: https://data.dgl.ai/wheels/torch-2.4/cu121/repo.html
Collecting dgl
  Downloading https://data.dgl.ai/wheels/torch-2.4/cu121/dgl-2.4.0%2Bcu121-cp310-cp310-manylinux1_x86_64.whl (355.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m355.2/355.2 MB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: dgl
Successfully installed dgl-2.4.0+cu121


In [24]:
import sys
import torch
import scipy
from scipy.sparse import csr_matrix
from sklearn.metrics import pairwise_distances
from scipy.sparse import coo_matrix

In [25]:
class HDGL_utils_functions():

  def __init__(self, features_dimension, hash_length):
    self.random_A = torch.randn(features_dimension, hash_length)
    low = -2
    high = 2
    self.lmbda = (high - low) * torch.rand(hash_length) + low

    print("Here")

  def get_ids_labels(self, train_nodes_mask, val_nodes_mask, test_nodes_mask, labels_for_nodes):

    train_node_ids = torch.nonzero(train_nodes_mask).flatten()
    val_node_ids = torch.nonzero(val_nodes_mask).flatten()
    test_node_ids = torch.nonzero(test_nodes_mask).flatten()

    train_node_labels = labels_for_nodes[train_node_ids]
    val_node_labels = labels_for_nodes[val_node_ids]
    test_node_labels= labels_for_nodes[test_node_ids]

    return train_node_ids, train_node_labels, val_node_ids, val_node_labels, test_node_ids, test_node_labels

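  def create_hash(self, features):
    # Project the (sparse) features onto random directions, then add the random offsets
    r = torch.sparse.mm(features, self.random_A)
    r = r + self.lmbda
    # Threshold at zero to get a binary hash, then map it to {-1, +1}
    r = (r > 0).float()
    r = self.convert_binary_to_bipolar(r)
    return r

  def convert_binary_to_bipolar(self, HD_vecs):
    # Maps 0 -> -1 and 1 -> +1
    return (2 * HD_vecs) - 1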
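
To see what `create_hash` produces, here is a minimal standalone sketch of the same projection on toy data. It assumes dense features and a plain matrix product instead of `torch.sparse.mm`; the sizes and the random input are made up for illustration.

```python
import torch

torch.manual_seed(0)
num_nodes, feature_dim, hash_length = 4, 10, 64  # toy sizes

features = torch.rand(num_nodes, feature_dim)
random_A = torch.randn(feature_dim, hash_length)  # random projection directions
lmbda = 4 * torch.rand(hash_length) - 2           # offsets drawn uniformly from [-2, 2)

# Project, shift, threshold at zero, then map {0, 1} to {-1, +1}
projected = features @ random_A + lmbda
hashed = 2 * (projected > 0).float() - 1

print(hashed.shape)
```

Each node ends up with one ±1 entry per random direction, so the output has shape (num_nodes, hash_length).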

To run experiments on a different dataset, change the dataset constructor in the cell below.

In [30]:
from dgl.data import CoraGraphDataset, CiteseerGraphDataset
dataset = CoraGraphDataset() # change here
num_classes = dataset.num_classes
print(num_classes)
g = dataset[0]

# get data split
train_mask = g.ndata['train_mask']
val_mask = g.ndata['val_mask']
test_mask = g.ndata['test_mask']

# get labels
labels = g.ndata['label']
feat = g.ndata['feat']



#---------------row normalize
row_sum = torch.sum(feat, dim=1, keepdim=True)

# Avoid division by zero by adding a small epsilon
epsilon = 1e-8
row_sum = torch.where(row_sum == 0, torch.tensor(epsilon, dtype=row_sum.dtype, device=row_sum.device), row_sum)

# Normalize each row by dividing by its sum
normalized_features = feat / row_sum
feat = normalized_features

#---------------row normalize end

print("Number of classes:-", torch.unique(labels))
print("Features dimension:-", feat.size()[1])

HDC_helper = HDGL_utils_functions(features_dimension =  feat.size()[1], hash_length=50000)
Labels_HD_Vectors = torch.randint(0, 2, size=(num_classes,50000))
Labels_HD_Vectors = HDC_helper.convert_binary_to_bipolar(Labels_HD_Vectors)

mask_Labels =  torch.randint(0, 2, size=Labels_HD_Vectors.size())
mask_Labels = HDC_helper.convert_binary_to_bipolar(mask_Labels)
mask_Labels = mask_Labels * 0.1

train_node_ids, train_node_labels, val_node_ids, val_node_labels, test_node_ids, test_node_labels = HDC_helper.get_ids_labels(train_mask, val_mask, test_mask, labels)

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
7
Number of classes:- tensor([0, 1, 2, 3, 4, 5, 6])
Features dimension:- 1433
Here




---


Learning begins here


---



In [31]:
###########################
start_time = time.time()
###########################
feat = HDC_helper.create_hash(feat.to_sparse()) # Mapping features to HD-space using RHPT
###################
print("Mapping raw features to HD space done")
#####################
print("Starting to calculate HD latent representaion for nodes in the train/val set and creating Label HD Vectors")
###########################
sampled_neighbors = {}
g_2hop = dgl.transforms.khop_graph(g, 2)
train_val_nodes = torch.cat((train_node_ids, val_node_ids))
train_val_labels = torch.cat((train_node_labels, val_node_labels))
for node, node_label_1 in zip(train_val_nodes,train_val_labels):
    # Get 1-hop neighbors
    one_hop_neighbors = g.successors(node).numpy()

    if len(one_hop_neighbors) == 0:
      continue

    # Sample 11 1-hop neighbors
    sampled_one_hop = np.random.choice(one_hop_neighbors, size=11, replace=True)

    # Get 2-hop neighbors
    two_hop_neighbors = g_2hop.successors(node).numpy()

    if len(two_hop_neighbors) == 0:
      continue

    # Sample 21 2-hop neighbors
    sampled_two_hop = np.random.choice(two_hop_neighbors, size=21, replace=True)

    N_1hop = sampled_one_hop.tolist()

    N_2hop = sampled_two_hop.tolist()

    r_i = torch.sum((torch.unsqueeze(feat[node],0)), axis=0)

    R_1hop = torch.sum((feat[N_1hop]),axis=0)

    R_2hop = torch.sum((feat[N_2hop]),axis=0)

    R_1hop = torch.sign(R_1hop)
    R_2hop = torch.sign(R_2hop)

    R_1hop = torch.roll(R_1hop,-1) #rotate once
    R_2hop = torch.roll(R_2hop,-2) #rotate twice

    z_i = r_i * R_1hop * R_2hop

    y_i = node_label_1
    Labels_HD_Vectors[y_i.item()] = Labels_HD_Vectors[y_i.item()] + z_i


Labels_HD_Vectors = Labels_HD_Vectors + mask_Labels
Labels_HD_Vectors = torch.sign(Labels_HD_Vectors)
Labels_HD_Vectors = torch.where(Labels_HD_Vectors == -1, torch.tensor(0.0), torch.tensor(1.0)) # convert to binary

###################
end_time = time.time()
elapsed_time_seconds = end_time - start_time
print("Time Taken in Seconds", elapsed_time_seconds)
# Convert elapsed time to minutes
elapsed_time_minutes = elapsed_time_seconds / 60
print("Time Taken in Minutes", elapsed_time_minutes)
#####################

Mapping raw features to HD space done
Starting to calculate HD latent representaion for nodes in the train/val set and creating Label HD Vectors
Time Taken in Seconds 3.6730549335479736
Time Taken in Minutes 0.06121758222579956
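
In the cell above, `mask_Labels` (random ±0.1 values) is added to the accumulated label vectors just before `torch.sign`. This appears to act as a random tie-breaker: `torch.sign` returns 0 where the accumulated sum is exactly zero, and the subsequent `torch.where` would then map every such position to 1. The toy values below are made up to illustrate the effect.

```python
import torch

accumulated = torch.tensor([3.0, 0.0, -2.0, 0.0])  # zeros represent ties
mask = torch.tensor([0.1, -0.1, 0.1, -0.1])        # stands in for mask_Labels

without_mask = torch.sign(accumulated)       # ties stay at 0
with_mask = torch.sign(accumulated + mask)   # ties follow the sign of the mask

print(without_mask, with_mask)
```

Since the accumulated sums are integers and the mask has magnitude 0.1, the mask only changes the result at tied positions.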




---


Learning Phase Ends


---



Calculating latent HD representation for test nodes and predicting labels

In [32]:
# Compute neighbors for test nodes

Test_nodes_label_preds = []
for node in test_node_ids:
    # Get 1-hop neighbors
    one_hop_neighbors = g.successors(node).numpy()

    if len(one_hop_neighbors) == 0: # a test node with no 1-hop neighbors gets the default label 0
      Test_nodes_label_preds.append(0)
      continue

    # Sample 11 1-hop neighbors
    sampled_one_hop = np.random.choice(one_hop_neighbors, size=11, replace=True)

    # Get 2-hop neighbors
    two_hop_neighbors = g_2hop.successors(node).numpy()

    # Remove duplicate 2-hop neighbors (note: this does not exclude the node itself)
    two_hop_neighbors = list(set(two_hop_neighbors))

    # Sample 21 2-hop neighbors
    sampled_two_hop = np.random.choice(two_hop_neighbors, size=21, replace=True)


    r_i = torch.sum((torch.unsqueeze(feat[node],0)), axis=0)

    N_1hop = sampled_one_hop.tolist()
    R_1hop = torch.sum((feat[N_1hop]),axis=0)

    N_2hop = sampled_two_hop.tolist()
    R_2hop = torch.sum((feat[N_2hop]),axis=0)

    R_1hop = torch.sign(R_1hop)
    R_2hop = torch.sign(R_2hop)

    R_1hop = torch.roll(R_1hop,-1) #rotate once
    R_2hop = torch.roll(R_2hop,-2) #rotate twice

    z_i = r_i * R_1hop * R_2hop

    z_i = torch.where(z_i == -1, torch.tensor(0.0), torch.tensor(1.0)) # convert to binary

    Test_labels_pred_distances = torch.cdist(torch.unsqueeze(z_i,0), Labels_HD_Vectors, p=1)

    y_i_pred = torch.argmin(Test_labels_pred_distances, dim=1)

    Test_nodes_label_preds.append(y_i_pred.item())

Compute Accuracy of the predictions

In [33]:
from sklearn.metrics import accuracy_score
accuracy_score(test_node_labels.numpy(), Test_nodes_label_preds)

0.797