# TigerGraph Data Science Library 101 - Topological Link Prediction Algorithms
This notebook shows how to use the most common topological link prediction algorithms in the TigerGraph Graph Data Science Library. More detailed explanations of these algorithms can be found in the official documentation (https://docs.tigergraph.com/graph-ml/current/link-prediction/).


## Step 1: Setting things up
- Connect and load data
- Visualize the graph schema
- Get basic stats, e.g., counts of nodes and edges

### Create connection

In [1]:
import json
import pandas as pd
from pyTigerGraph import TigerGraphConnection

# Read in DB configs
with open('../config.json', "r") as config_file:
    config = json.load(config_file)

conn = TigerGraphConnection(
    host=config["host"],
    username=config["username"],
    password=config["password"],
)

### Download social dataset

In [2]:
from pyTigerGraph.datasets import Datasets

dataset_social = Datasets("social")


### Ingest data

In [3]:
conn.ingestDataset(dataset_social, getToken=config["getToken"])

---- Checking database ----
A graph with name social already exists in the database. Please drop it first before ingesting.


### Visualize schema

In [4]:
from pyTigerGraph.visualization import drawSchema

drawSchema(conn.getSchema(force=True))


### Print graph stats

In [5]:
vertices = conn.getVertexTypes()
total_count = 0
for vertex in vertices:
    vertex_cnt = conn.getVertexCount(vertex)
    total_count += vertex_cnt
    print("Node count: ({} : {}) ".format(vertex, vertex_cnt))
print("Total node count: ", total_count)

Node count: (Person : 12) 
Total node count:  12


In [6]:
import pprint
edge_count = conn.getEdgeCount()
print("Edges count: total ", sum(edge_count.values()))
pprint.pprint(edge_count) 

Edges count: total  39
{'Coworker': 11, 'Friend': 14, 'reverse_Friend': 14}
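The total of 39 includes both `Friend` and `reverse_Friend`. Assuming, as the `reverse_` prefix suggests, that `reverse_Friend` is the reverse edge type TigerGraph maintains for the directed `Friend` edge, each friendship is counted twice. A minimal sketch, under that assumption, that counts distinct relationships from the dictionary printed above:

```python
# Edge counts copied from the conn.getEdgeCount() output above
edge_count = {'Coworker': 11, 'Friend': 14, 'reverse_Friend': 14}

# Assumption: types prefixed with "reverse_" mirror a directed edge type,
# so they add no new relationships and are skipped.
distinct_edges = sum(
    count for etype, count in edge_count.items()
    if not etype.startswith("reverse_")
)
print("Distinct relationships:", distinct_edges)  # → Distinct relationships: 25
```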


## Step 2: Leveraging pyTigerGraph’s featurizer to run Topological Link Prediction algorithms

pyTigerGraph provides a full suite of data science capabilities. In this tutorial, we use the featurizer to list all Topological Link Prediction algorithms available in the GDS library and then run a few popular ones as examples.

In [7]:
feat = conn.gds.featurizer()

In [8]:
feat.listAlgorithms("Topological Link Prediction")

Available algorithms for Topological Link Prediction:
  adamic_adar:
    01. name: tg_adamic_adar
  common_neighbors:
    02. name: tg_common_neighbors
  preferential_attachment:
    03. name: tg_preferential_attachment
  resource_allocation:
    04. name: tg_resource_allocation
  same_community:
    05. name: tg_same_community
  total_neighbors:
    06. name: tg_total_neighbors
Call runAlgorithm() with the algorithm name to execute it


## tg_adamic_adar

The Adamic/Adar index measures the closeness of two vertices based on the neighbors they share. It is defined as the sum of the inverse logarithmic degree centrality of each neighbor shared by the two vertices, so shared neighbors with few connections contribute more than highly connected ones. This algorithm ignores edge weights. (https://docs.tigergraph.com/graph-ml/current/link-prediction/adamic-adar)
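The definition can be sketched in plain Python on a small in-memory graph. This is an illustration of the formula, not the `tg_adamic_adar` GSQL query. The toy graph is hypothetical (not the edges of the social dataset) and stands in for the edges of the types in `e_type_set`; it was chosen so that its scores line up with the outputs shown in this notebook. The base-10 logarithm is an assumption inferred from the printed score, since 1 / log10(2) = 3.32193.

```python
import math

# Hypothetical toy graph, NOT the edges of the social dataset.
# Undirected adjacency sets: Alex and Bob share one neighbor, "C".
graph = {
    "Alex": {"C", "D"},
    "Bob": {"C", "E"},
    "C": {"Alex", "Bob"},
    "D": {"Alex"},
    "E": {"Bob"},
}


def adamic_adar(adj, x, y, log_base=10.0):
    """Sum of 1 / log(degree(u)) over every neighbor u shared by x and y."""
    shared = adj[x] & adj[y]
    # For distinct x and y in a simple graph, every shared neighbor touches
    # both of them, so its degree is >= 2 and the logarithm is never zero.
    return sum(1.0 / math.log(len(adj[u]), log_base) for u in shared)


score = adamic_adar(graph, "Alex", "Bob")
print(round(score, 5))  # → 3.32193
```

The log base only rescales every score by the same constant factor, so it does not change how candidate pairs rank. Note that this sketch simply returns 0 when the two vertices have no shared neighbors.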


## Input Parameters

* `VERTEX v_source`: The first vertex to compare, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `VERTEX v_target`: The second vertex to compare with the first, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `SET<STRING> e_type_set`: Edge types to traverse
* `BOOL print_results`: If True, print the result (True by default)
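Every algorithm in this notebook takes the same four input parameters, so the `params` dictionary can be built by a small helper. `link_prediction_params` below is a hypothetical convenience function, not part of pyTigerGraph; its keys simply mirror the parameter list above, and it assumes both vertices have the same type.

```python
def link_prediction_params(source_id, target_id, edge_types,
                           vertex_type="Person", print_results=True):
    """Hypothetical helper: build the params dict passed to runAlgorithm().

    Assumes v_source and v_target share one vertex type.
    """
    return {
        "v_source": {"id": source_id, "type": vertex_type},
        "v_target": {"id": target_id, "type": vertex_type},
        "e_type_set": list(edge_types),
        "print_results": print_results,
    }


params = link_prediction_params("Alex", "Bob", ["Coworker"])
print(params)
```

With a live connection, the result could then be passed as `feat.runAlgorithm("tg_adamic_adar", params=params)`, exactly as in the cells below.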

In [9]:
params = {
    "v_source": {"id": "Alex", "type": "Person"},
    "v_target": {"id": "Bob", "type": "Person"},
    "e_type_set": ["Coworker"],
    "print_results": True
}

results = feat.runAlgorithm("tg_adamic_adar", params=params)

## Results

Returns the Adamic/Adar index between the two given vertices. If the two vertices have no common neighbors, the algorithm returns a division-by-zero error.

In [10]:
df_adamic_adar = pd.json_normalize(results)

display(df_adamic_adar)

|   | @@sum_closeness |
|---|---|
| 0 | 3.32193 |


## tg_common_neighbors

A vertex 𝐴 that is connected to vertices 𝐵 and 𝐶 is considered to be a "common neighbor" of 𝐵 and 𝐶. The common neighbors algorithm counts the number of common neighbors between two vertices. This algorithm ignores edge weights. (https://docs.tigergraph.com/graph-ml/current/link-prediction/common-neighbors)
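A plain-Python sketch of the count, offered as an illustration of the definition rather than the `tg_common_neighbors` query itself. The toy graph is hypothetical (not the social dataset's edges) and was chosen so that its result matches the output shown in this notebook.

```python
# Hypothetical toy graph, NOT the edges of the social dataset.
# Undirected adjacency sets: Alex and Bob share one neighbor, "C".
graph = {
    "Alex": {"C", "D"},
    "Bob": {"C", "E"},
    "C": {"Alex", "Bob"},
    "D": {"Alex"},
    "E": {"Bob"},
}


def common_neighbors(adj, x, y):
    """Count the vertices adjacent to both x and y."""
    return len(adj[x] & adj[y])


print(common_neighbors(graph, "Alex", "Bob"))  # → 1
```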


## Input Parameters

* `VERTEX v_source`: The first vertex to compare, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `VERTEX v_target`: The second vertex to compare with the first, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `SET<STRING> e_type_set`: Edge types to traverse
* `BOOL print_results`: If True, print the result (True by default)

In [11]:
params = {
    "v_source": {"id": "Alex", "type": "Person"},
    "v_target": {"id": "Bob", "type": "Person"},
    "e_type_set": ["Coworker"],
    "print_results": True
}

results = feat.runAlgorithm("tg_common_neighbors", params=params)

## Results

Returns the number of common neighbors of the two vertices, expressed as a closeness value.

In [12]:
df_common_neighbors = pd.json_normalize(results)

display(df_common_neighbors)

|   | closeness |
|---|---|
| 0 | 1 |


## tg_preferential_attachment

Preferential Attachment measures the closeness of two vertices based on how many neighbors each one has. The algorithm returns the product of the number of neighbors of the first vertex and the number of neighbors of the second vertex. (https://docs.tigergraph.com/graph-ml/current/link-prediction/preferential-attachment)
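The product of neighbor counts is easy to sketch in plain Python. This illustrates the definition, not the `tg_preferential_attachment` query; the toy graph is hypothetical (not the social dataset's edges) and was chosen so that its result matches the output shown in this notebook.

```python
# Hypothetical toy graph, NOT the edges of the social dataset.
# Undirected adjacency sets: Alex and Bob each have two neighbors.
graph = {
    "Alex": {"C", "D"},
    "Bob": {"C", "E"},
    "C": {"Alex", "Bob"},
    "D": {"Alex"},
    "E": {"Bob"},
}


def preferential_attachment(adj, x, y):
    """Product of the neighbor counts (degrees) of x and y."""
    return len(adj[x]) * len(adj[y])


print(preferential_attachment(graph, "Alex", "Bob"))  # → 4
```

Unlike the other scores, this one does not require shared neighbors: two well-connected vertices score highly even if their neighborhoods are disjoint.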

## Input Parameters

* `VERTEX v_source`: The first vertex to compare, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `VERTEX v_target`: The second vertex to compare with the first, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `SET<STRING> e_type_set`: Edge types to traverse
* `BOOL print_results`: If True, print the result (True by default)

In [13]:
params = {
    "v_source": {"id": "Alex", "type": "Person"},
    "v_target": {"id": "Bob", "type": "Person"},
    "e_type_set": ["Coworker"],
    "print_results": True
}

results = feat.runAlgorithm("tg_preferential_attachment", params=params)

## Results

Returns the product of the numbers of neighbors of the two vertices, expressed as a closeness value.

In [14]:
df_preferential_attachment = pd.json_normalize(results)

display(df_preferential_attachment)

|   | closeness |
|---|---|
| 0 | 4 |


## tg_resource_allocation

Resource Allocation computes the closeness of two vertices based on their shared neighbors: it sums, over every neighbor shared by the two vertices, the inverse of that neighbor's degree. (https://docs.tigergraph.com/graph-ml/current/link-prediction/resource-allocation)
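A plain-Python sketch of the sum of inverse degrees over shared neighbors, illustrating the definition rather than the `tg_resource_allocation` query. The toy graph is hypothetical (not the social dataset's edges) and was chosen so that its result matches the output shown in this notebook.

```python
# Hypothetical toy graph, NOT the edges of the social dataset.
# Undirected adjacency sets: the only shared neighbor, "C", has degree 2.
graph = {
    "Alex": {"C", "D"},
    "Bob": {"C", "E"},
    "C": {"Alex", "Bob"},
    "D": {"Alex"},
    "E": {"Bob"},
}


def resource_allocation(adj, x, y):
    """Sum of 1 / degree(u) over every neighbor u shared by x and y."""
    return sum(1.0 / len(adj[u]) for u in adj[x] & adj[y])


print(resource_allocation(graph, "Alex", "Bob"))  # → 0.5
```

Compared with Adamic/Adar, dividing by the degree itself rather than its logarithm penalizes highly connected shared neighbors more strongly.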

## Input Parameters

* `VERTEX v_source`: The first vertex to compare, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `VERTEX v_target`: The second vertex to compare with the first, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `SET<STRING> e_type_set`: Edge types to traverse
* `BOOL print_results`: If True, print the result (True by default)

In [15]:
params = {
    "v_source": {"id": "Alex", "type": "Person"},
    "v_target": {"id": "Bob", "type": "Person"},
    "e_type_set": ["Coworker"],
    "print_results": True
}

results = feat.runAlgorithm("tg_resource_allocation", params=params)

## Results

Returns the resource allocation score of the two input vertices, expressed as a closeness value.

In [16]:
df_resource_allocation = pd.json_normalize(results)

display(df_resource_allocation)

|   | @@sum_closeness |
|---|---|
| 0 | 0.5 |


## tg_total_neighbors

The Total Neighbors algorithm counts the total number of neighbors (vertices one hop away) of two vertices, that is, the size of the union of their neighborhoods, so a shared neighbor is counted once. (https://docs.tigergraph.com/graph-ml/current/link-prediction/total-neighbors)
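A plain-Python sketch of the union size, illustrating the definition rather than the `tg_total_neighbors` query. The toy graph is hypothetical (not the social dataset's edges) and was chosen so that its result matches the output shown in this notebook.

```python
# Hypothetical toy graph, NOT the edges of the social dataset.
# Undirected adjacency sets: Alex and Bob share "C" and each has one private neighbor.
graph = {
    "Alex": {"C", "D"},
    "Bob": {"C", "E"},
    "C": {"Alex", "Bob"},
    "D": {"Alex"},
    "E": {"Bob"},
}


def total_neighbors(adj, x, y):
    """Number of distinct vertices adjacent to x or y (size of the union)."""
    return len(adj[x] | adj[y])


print(total_neighbors(graph, "Alex", "Bob"))  # → 3
```

Equivalently, the union size is degree(x) + degree(y) minus the number of common neighbors, here 2 + 2 - 1 = 3.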

## Input Parameters

* `VERTEX v_source`: The first vertex to compare, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `VERTEX v_target`: The second vertex to compare with the first, given as `{"id": "vertex_id", "type": "vertex_type"}`
* `SET<STRING> e_type_set`: Edge types to traverse
* `BOOL print_results`: If True, print the result (True by default)

In [17]:
params = {
    "v_source": {"id": "Alex", "type": "Person"},
    "v_target": {"id": "Bob", "type": "Person"},
    "e_type_set": ["Coworker"],
    "print_results": True
}

results = feat.runAlgorithm("tg_total_neighbors", params=params)

In [None]:
print(results)

## Results

Returns the total number of neighbors of the two vertices, expressed as a closeness value.

In [18]:
df_total_neighbors = pd.json_normalize(results)

display(df_total_neighbors)

|   | closeness |
|---|---|
| 0 | 3 |
