<span>
<b>Author:</b> <a href="http://about.giuliorossetti.net">Giulio Rossetti</a>, <a href="https://andreafailla.github.io/">Andrea Failla</a><br>
<b>Python version:</b>  >=3.6<br/>
<b>LinkPred version:</b>  0.4.1<br/>
<b>Last update:</b> 04/07/2025
</span>

<a id='top'></a>
# Link Prediction

``linkpred`` is a Python library designed to support *unsupervised* link prediction analysis.

This notebook introduces some of the main features of the library and gives an overview of its functionalities.

**Note:** 
- This notebook is purposely not comprehensive: it only covers the basics you need to get started.
- LinkPred is developed and maintained by Raf Guns (University of Antwerp).

## Table of Contents

1. [Installing LinkPred](#install)
2. [Prediction Workflow](#workflow)
    1. [Loading the network from file](#graph)
    2. [Computing the desired unsupervised predictors](#pred)
    3. [Evaluation and comparison of different predictors](#eval)
3. [Exercises](#Exercises)

<a id='install'></a>
## 1. Installing LinkPred ([to top](#top))

As a first step, we need to make sure that ``linkpred`` is installed and working.

The library can be installed using ``pip``:

    pip install linkpred

To check whether ``linkpred`` has been installed correctly, just try to import it:

In [None]:
#!pip install linkpred
import linkpred

<a id='workflow'></a>
## 2. Prediction Workflow ([to top](#top))

``linkpred`` offers complete support to all stages of the Link Prediction workflow:

1. Network Loading
2. Predictor selection and application
3. Results evaluation 

<a id='graph'></a>
### 2.A Loading the network from file

To get started, we need to read the graph from a file. 
In our example we'll use the Game of Thrones Season 6 edge data.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx

In [None]:
#!wget https://andreafailla.github.io/uploads/data/got-s6-edges.csv

In [None]:
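def read_net_w(filename):
    """Read a weighted, undirected graph from a CSV edge list (source, target, weight, ...)."""
    g = nx.Graph()
    with open(filename) as f:
        f.readline()  # skip the header row
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            fields = line.strip().split(",")
            g.add_edge(fields[0], fields[1], weight=int(fields[2]))
    return g

# Game of Thrones data
g = read_net_w('got-s6-edges.csv')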

<a id='pred'></a>
### 2.B Computing the desired unsupervised predictors

``linkpred`` offers a wide range of unsupervised predictors, organized into four families:
- **Neighborhood:**
    - *AdamicAdar*, AssociationStrength, *CommonNeighbours*, Cosine, DegreeProduct, *Jaccard*, MaxOverlap, MinOverlap, NMeasure, Pearson, ResourceAllocation
- **Paths:**
    - GraphDistance, *Katz*
- **Ranking:**
    - *SimRank*, RootedPageRank
- **Miscellanea:**
    - Community, Copy, *Random*
    
In the following we'll test only a few of them.

For the sake of simplicity, we'll show only the top-5 forecasts for each selected predictor.

#### 2.B.1 Neighborhood

The first family of predictors relates the neighborhoods of a node pair to the likelihood of observing a tie between them. <br>
The question such predictors try to answer is: *How many friends do we have to share in order to become friends?*

##### **Common Neighbors**

The more friends we share, the more likely we will become friends.

In [None]:
cn = linkpred.predictors.CommonNeighbours(g, excluded=g.edges()) # We aim to predict only new links, thus we exclude existing ones
cn_results = cn.predict()

top = cn_results.top(5)
for edge, score in top.items():
    print(edge, score)

##### **Jaccard**
The more similar our circles of friends are, the more likely we will become friends.

In [None]:
jc = linkpred.predictors.Jaccard(g, excluded=g.edges())
jc_results = jc.predict()

top = jc_results.top(5)
for edge, score in top.items():
    print(edge, score)
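
To make the two scores above concrete, here is a minimal sketch that computes the common-neighbour count and the Jaccard coefficient by hand on a toy graph. The adjacency dict `toy_graph` and the helpers `common_neighbours` and `jaccard` are hypothetical names introduced only for illustration; the sketch ignores edge weights and is not meant to reproduce how ``linkpred`` implements these predictors.

```python
# Toy undirected graph as an adjacency dict (made-up data, for illustration only)
toy_graph = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}

def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

# "a" and "b" are not connected, but they share "c" and "d"
print(common_neighbours(toy_graph, "a", "b"))
print(jaccard(toy_graph, "a", "b"))
```

Here `a` and `b` share two of their three distinct neighbours, so the Jaccard coefficient normalises the raw count of 2 to 2/3.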

##### **Adamic Adar**
The more selective our mutual friends are, the more likely we will become friends.

In [None]:
aa = linkpred.predictors.AdamicAdar(g, excluded=g.edges())
aa_results = aa.predict()

top = aa_results.top(5)
for edge, score in top.items():
    print(edge, score)
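
As a rough illustration of what "selective" means here, the sketch below computes the classic Adamic-Adar score, the sum of `1 / log(degree)` over the shared neighbours, on a toy graph. The data and the helper name `adamic_adar` are assumptions made for this example; it is unweighted and not necessarily identical to the ``linkpred`` implementation.

```python
import math

# Toy undirected graph (made-up data): "hub" has many friends, "x" has few
toy_graph = {
    "a": {"hub", "x"},
    "b": {"hub", "x"},
    "hub": {"a", "b", "c", "d"},
    "x": {"a", "b"},
    "c": {"hub"},
    "d": {"hub"},
}

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the neighbours shared by u and v."""
    # A shared neighbour is linked to both u and v, so its degree is at least 2
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

score = adamic_adar(toy_graph, "a", "b")
print(score)
```

The low-degree neighbour `x` contributes `1 / log(2)`, twice as much as the high-degree `hub` with `1 / log(4)`: a mutual friend with few ties is stronger evidence than one who is connected to everybody.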

#### 2.B.2 Paths

The second family of predictors relates the distance between a pair of nodes to the likelihood of observing a tie between them in the future. <br>
The question such predictors try to answer is: *How distant are we?*


##### **Katz**
Katz computes a weighted sum over all the paths between two nodes, in which shorter paths count more than longer ones.

In [None]:
kz = linkpred.predictors.Katz(g, excluded=g.edges())
kz_results = kz.predict()

top = kz_results.top(5)
for edge, score in top.items():
    print(edge, score)
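
The weighted sum behind Katz can be sketched as follows: count the paths of each length between two nodes and damp each count by `beta ** length`. This is a minimal, unweighted sketch that truncates the sum at a maximum length; the function `katz_scores` and its parameters `beta` and `max_len` are names chosen for this example and should not be read as the ``linkpred`` API.

```python
def katz_scores(nodes, edges, beta=0.1, max_len=4):
    """Truncated Katz score: sum over l = 1..max_len of beta**l * (number of length-l paths)."""
    idx = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    # Adjacency matrix of the undirected graph
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[idx[u]][idx[v]] = adj[idx[v]][idx[u]] = 1

    def matmul(x, y):
        return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    scores = [[0.0] * n for _ in range(n)]
    power = adj  # adj**l holds the number of length-l paths (walks) between nodes
    for length in range(1, max_len + 1):
        for i in range(n):
            for j in range(n):
                scores[i][j] += (beta ** length) * power[i][j]
        power = matmul(power, adj)
    return {(u, v): scores[idx[u]][idx[v]] for u in nodes for v in nodes if idx[u] < idx[v]}

# Toy path graph a - b - c - d (made-up data)
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
katz = katz_scores(toy_nodes, toy_edges, beta=0.1, max_len=3)
print(katz[("a", "c")], katz[("a", "d")])
```

In this toy graph `a` reaches `c` through one path of length 2 and `d` through one path of length 3, so the pair `(a, c)` scores ten times higher than `(a, d)` with `beta=0.1`.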

##### **Graph Distance**
Graph Distance computes the (negated) length of the shortest path between two nodes.

In [None]:
gd = linkpred.predictors.GraphDistance(g, excluded=g.edges())
gd_results = gd.predict()

top = gd_results.top(5)
for edge, score in top.items():
    print(edge, score)
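
Following the description above, a shortest-path score can be sketched with a plain breadth-first search. The helper `graph_distance_score` and the toy data are assumptions made for illustration; the sketch counts hops and ignores edge weights, so the values ``linkpred`` reports on a weighted network may differ.

```python
from collections import deque

def graph_distance_score(adj, source, target):
    """Negated number of hops on a shortest path, or None if target is unreachable."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return -distance[node]
        for neighbour in adj[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None

# Toy path graph a - b - c - d plus an isolated node e (made-up data)
toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}

print(graph_distance_score(toy_graph, "a", "c"))
print(graph_distance_score(toy_graph, "a", "d"))
print(graph_distance_score(toy_graph, "a", "e"))
```

Negating the distance means that closer pairs receive higher scores, so the ranking can be read in the same way as for the other predictors.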

#### 2.B.3 Ranking

The third family of predictors relates the position of a pair of nodes in the graph to the likelihood of observing a tie between them in the future. <br>
The question such predictors try to answer is: *How similar are we?*

##### **SimRank**
Two nodes are similar to the extent that their neighborhoods are similar.

In [None]:
simrank = linkpred.predictors.SimRank(g, excluded=g.edges())
simrank_results = simrank.predict(c=0.5)

top = simrank_results.top(5)
for edge, score in top.items():
    print(edge, score)
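
The recursive idea behind SimRank can be sketched with a simple fixed-point iteration: the similarity of two distinct nodes is the average similarity of their neighbours, damped by a constant `c`. The function below is a naive, unweighted sketch written for this notebook; its name and the `num_iterations` parameter are assumptions, and it is not the ``linkpred`` implementation.

```python
def simrank(adj, c=0.5, num_iterations=10):
    """Naive SimRank: s(u, u) = 1; s(u, v) = c * mean of s(i, j) over neighbour pairs."""
    nodes = list(adj)
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(num_iterations):
        new_sim = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new_sim[u][v] = 1.0
                elif not adj[u] or not adj[v]:
                    new_sim[u][v] = 0.0  # isolated nodes are similar to nobody
                else:
                    total = sum(sim[i][j] for i in adj[u] for j in adj[v])
                    new_sim[u][v] = c * total / (len(adj[u]) * len(adj[v]))
        sim = new_sim
    return sim

# Toy star graph: "hub" is linked to both "a" and "b" (made-up data)
toy_graph = {"hub": {"a", "b"}, "a": {"hub"}, "b": {"hub"}}
sim = simrank(toy_graph, c=0.5)
print(sim["a"]["b"], sim["a"]["hub"])
```

The two leaves `a` and `b` have exactly the same neighbourhood, so their similarity equals the damping constant `c`, while a leaf and the hub share no similar neighbours and score 0.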

#### 2.B.4 Miscellanea

This family collects alternative definitions of link predictors. <br>
``linkpred`` groups here approaches that are commonly used as baselines.

##### **Random**

Random guessing.

In [None]:
rnd = linkpred.predictors.Random(g, excluded=g.edges())
rnd_results = rnd.predict()

top = rnd_results.top(5)
for edge, score in top.items():
    print(edge, score)

<a id='eval'></a>
### 2.C Evaluation and comparison of different predictors

To evaluate a link predictor we have to separate the network used for training from the one used for testing purposes.

In [None]:
import itertools
from linkpred.evaluation import Pair

#!wget https://andreafailla.github.io/uploads/data/got-s7-edges.csv

# Building the test network (Season 7)
test = read_net_w('got-s7-edges.csv')

# Exclude test network from learning phase
training = g.copy()

# Node set: every node that appears in either network, without duplicates
nodes = set(g.nodes()) | set(test.nodes())

# Compute the test set and the universe set (all pairs of distinct nodes)
test = [Pair(i) for i in test.edges()]
universe = {Pair(i) for i in itertools.product(nodes, nodes) if i[0] != i[1]}

After that, we can apply the predictors to the training network.

In [None]:
cn = linkpred.predictors.CommonNeighbours(training, excluded=training.edges())
cn_results = cn.predict()

aa = linkpred.predictors.AdamicAdar(training, excluded=training.edges())
aa_results = aa.predict()

jc = linkpred.predictors.Jaccard(training, excluded=training.edges())
jc_results = jc.predict()

We can then evaluate the obtained predictions against the test set.

In [None]:
cn_evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test, universe)
aa_evaluation = linkpred.evaluation.EvaluationSheet(aa_results, test, universe)
jc_evaluation = linkpred.evaluation.EvaluationSheet(jc_results, test, universe)

The results can be easily compared using a ROC plot.

In [None]:
plt.plot(cn_evaluation.fallout(), cn_evaluation.recall(), label="Common Neighbors")
plt.plot(aa_evaluation.fallout(), aa_evaluation.recall(), label="Adamic Adar")
plt.plot(jc_evaluation.fallout(), jc_evaluation.recall(), label="Jaccard")
plt.ylabel("TPR")
plt.xlabel("FPR")
plt.legend()
plt.show()

A simple way to summarize the results offered by the ROC curve is through its AUC.

In [None]:
from sklearn.metrics import auc

print("Area Under ROC Curve (AUROC)")
print(f"Common Neigh.: \t {auc(cn_evaluation.fallout(), cn_evaluation.recall())}")
print(f"Adamic Adar: \t {auc(aa_evaluation.fallout(), aa_evaluation.recall())}")
print(f"Jaccard: \t {auc(jc_evaluation.fallout(), jc_evaluation.recall())}")
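
To clarify what the fallout/recall curve and its area represent, here is a minimal sketch that builds the ROC points by walking down a ranked list of predicted pairs and then integrates them with the trapezoidal rule. The helpers `roc_points` and `trapezoid_auc` and all the toy data are assumptions made for this example; they do not reproduce the internals of `EvaluationSheet` or of `sklearn.metrics.auc`.

```python
def roc_points(ranked, positives, universe):
    """Walk down a ranked prediction list and return the (FPR, TPR) points."""
    negatives_total = len(universe) - len(positives)
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    for pair in ranked:
        if pair in positives:
            tp += 1  # predicted pair is in the test set
        else:
            fp += 1  # predicted pair never appears in the test set
        fpr.append(fp / negatives_total)
        tpr.append(tp / len(positives))
    return fpr, tpr

def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))

# Toy data: 6 candidate pairs, 2 of which are true future links (made-up data)
toy_universe = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("a", "d"),
                                       ("b", "c"), ("b", "d"), ("c", "d")]}
toy_positives = {frozenset(("a", "b")), frozenset(("c", "d"))}
# Predictions sorted from highest to lowest score; two pairs received no score at all
toy_ranked = [frozenset(p) for p in [("a", "b"), ("a", "c"), ("c", "d"), ("b", "d")]]

toy_fpr, toy_tpr = roc_points(toy_ranked, toy_positives, toy_universe)
toy_auc = trapezoid_auc(toy_fpr, toy_tpr)
print(toy_fpr, toy_tpr, toy_auc)
```

In this sketch the curve stops where the prediction list ends (FPR = 0.5), so the area only covers the part of the FPR axis that was actually reached. Keep this in mind when comparing predictors that score different numbers of pairs.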

<a id="Exercises"></a>
## 3. Exercises ([to top](#top))


### Unsupervised Link prediction
- Download data about co-acting relations during season 6 of Game of Thrones

In [None]:
!wget https://andreafailla.github.io/uploads/data/got-s6-edges.csv

Explore the network’s basic properties: number of nodes, edges, density, and clustering coefficient.

Select at least 3 methods and apply them to your dataset. Then rank the predicted links by score and display the top 10 predicted edges.

We will test the predictors on data from the 7th season of GoT. You can download the data with:

In [None]:
!wget https://andreafailla.github.io/uploads/data/got-s7-edges.csv


Using the Pair object from linkpred, compute the universe and test sets.

Apply the predictors to the training network. Then use the EvaluationSheet object (from linkpred.evaluation import EvaluationSheet) to evaluate the obtained predictions against the test set.


### Link prediction and community structure

Use the cdlib library to detect communities in the network (e.g., Louvain method).

Compare the accuracy of link prediction within communities versus across communities. Are links more likely to form within the same community?