# Differentially Private Egocentric Betweenness Centrality
Reference implementations of techniques described in the paper [Differentially-Private Two-Party Egocentric Betweenness Centrality](https://arxiv.org/pdf/1901.05562.pdf) by Leyla Roohi, Benjamin I.P. Rubinstein, and Vanessa Teague.

In the paper, the authors describe an algorithm to compute a differentially private estimate of the egocentric betweenness centrality (EBC) of a node in a graph. Their algorithm is based on a non-private algorithm that splits the required computations between two players. To make this algorithm differentially private, they use standard differentially private release mechanisms to perturb each message that the players send to one another.

Below, we provide an implementation of both the non-private and the differentially private version of the EBC algorithm the authors describe.

Here we assume that two players, $X$ and $Y$, have partial knowledge of a graph.  Every node in the graph belongs to one of two vertex sets, $V_X$ and $V_Y$. Both players have full knowledge of both vertex sets.

Every edge in the graph belongs to one of three edge sets: $E_X$, $E_Y$, and $E_{XY}$.

  - The edges in $E_X$ connect two nodes in $V_X$. Only player $X$ knows $E_X$.
  - The edges in $E_Y$ connect two nodes in $V_Y$. Only player $Y$ knows $E_Y$.
  - The edges in $E_{XY}$ connect a node in $V_X$ to a node in $V_Y$. Both players know $E_{XY}$.
  
Using their partial information about the underlying graph, the two players will collaborate to compute the egocentric betweenness centrality of a node $a \in V_X$.
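
The partition described above can be illustrated on a tiny graph. The following is a minimal sketch; the toy graph and the `toy_`-prefixed names are hypothetical and chosen only for illustration.

```python
import networkx as nx

# Hypothetical toy graph: the path 0-1-2-3 plus the chord 0-2.
toy = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 2)])
toy_Vx = {0, 1}  # nodes held by player X
toy_Vy = {2, 3}  # nodes held by player Y

# Edges with both endpoints on the same side are private to that player.
toy_Ex = set(toy.subgraph(toy_Vx).edges())
toy_Ey = set(toy.subgraph(toy_Vy).edges())

# Cross edges have exactly one endpoint in V_X and are known to both players.
toy_Exy = {(u, v) for u, v in toy.edges() if (u in toy_Vx) != (v in toy_Vx)}

print("E_X: ", toy_Ex)
print("E_Y: ", toy_Ey)
print("E_XY:", toy_Exy)
```

Every edge of the toy graph lands in exactly one of the three sets, which is the property the protocol relies on.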

## Library Imports
We will use the graph algorithms provided by the networkx package to perform the required graph manipulations.

In [1]:
import random
import numpy as np
import networkx as nx

from itertools import combinations, product
from collections import Counter

## Ground Truth Algorithm

This function is a quick way to compute the exact EBC of a node. It restricts the graph to the node and its neighbors (the node's ego network) and uses networkx to compute the unnormalized betweenness centrality of the node within that subgraph. We use it as a reference to verify that our implementation of the non-private EBC protocol is correct.

In [2]:
def egocentric_betweenness_centrality(a, g):
    neighbors = g.neighbors(a)
    g2 = g.subgraph(list(neighbors) + [a])
    ebc = nx.betweenness_centrality(g2, normalized=False)
    return ebc[a]
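
As a sanity check on the definition, the EBC can be worked out by hand for very small graphs. The sketch below uses `nx.ego_graph` instead of an explicit subgraph; the helper name `ebc_via_ego_graph` and both example graphs are hypothetical and serve only as illustrations.

```python
import networkx as nx

def ebc_via_ego_graph(node, graph):
    # The ego network contains the node, its neighbors, and all edges among them.
    ego = nx.ego_graph(graph, node, radius=1)
    return nx.betweenness_centrality(ego, normalized=False)[node]

# Star with center 0 and four leaves: none of the 6 leaf pairs are adjacent,
# and the only shortest path between any two leaves passes through the center.
star = nx.star_graph(4)
print(ebc_via_ego_graph(0, star))

# Center 0 with neighbors 1, 2, 3. Pairs (1, 3) and (2, 3) are adjacent and
# contribute nothing. Pair (1, 2) has two shortest paths (via 0 and via 3),
# so the center receives half of that pair's credit.
shared = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)])
print(ebc_via_ego_graph(0, shared))
```

The first call should give 6 and the second 0.5, matching the hand computation in the comments.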

## Generate a Random Graph 
Here we generate a random graph to use both for testing purposes and to evaluate the utility that the differentially private version of the algorithm can provide.

In [3]:
n = 10000
p = 0.001
g = nx.gnp_random_graph(n, p)

In [4]:
# These are the sets of vertices in each player's network.
Vx = set(np.random.choice(np.arange(n), replace=False, size=n//2))
Vy = set(np.arange(n)).difference(Vx)

In [5]:
# a is the node for which we want to compute the EBC.
a = random.choice(list(Vx)) # We require that a is in X.

In [6]:
# These are the sets of edges in the combined graph.
Ex = set(g.subgraph(Vx).edges())
Ey = set(g.subgraph(Vy).edges())
Exy = set(g.edges()).difference(Ex.union(Ey))

## Warm-Up: A Non-Private Protocol

The non-private protocol described in this section can be used to compute the exact EBC for a node in the graph. Each player uses the information available to them to compute their contribution to the final computation.
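
The quantity being split between the players can be checked on a single machine first. Within the ego network of $a$, two non-adjacent neighbors $i$ and $j$ are at distance two, and a fraction $1/(1 + c_{ij})$ of their shortest paths passes through $a$, where $c_{ij}$ is the number of other neighbors of $a$ adjacent to both. The sketch below sums these fractions over all such pairs and compares the total with networkx. The function name `ebc_from_pair_counts` and the seeded test graph are hypothetical, not taken from the paper.

```python
from itertools import combinations

import networkx as nx

def ebc_from_pair_counts(node, graph):
    nbrs = set(graph.neighbors(node))
    total = 0.0
    for i, j in combinations(nbrs, 2):
        if graph.has_edge(i, j):
            continue  # adjacent neighbors never route through the ego node
        # Count the other neighbors k that give a two-step path i-k-j.
        common = sum(
            1 for k in nbrs - {i, j}
            if graph.has_edge(i, k) and graph.has_edge(k, j)
        )
        # The extra 1 accounts for the path i-node-j.
        total += 1.0 / (1 + common)
    return total

check_graph = nx.gnp_random_graph(30, 0.3, seed=0)
check_node = max(check_graph.nodes(), key=check_graph.degree)
ego = nx.ego_graph(check_graph, check_node)
reference = nx.betweenness_centrality(ego, normalized=False)[check_node]
print(ebc_from_pair_counts(check_node, check_graph), reference)
```

In the two-party setting, the same sum is divided by where the pair $(i, j)$ lives: both in $V_X$, both in $V_Y$, or one in each.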

In [7]:
def compute_R_star(a, Vx, Vy, Ex, Exy):
    # Player X computes the set of nodes in V_X that are neighbors of a.
    
    h = nx.Graph(Ex.union(Exy))
    neighbors = set(h.neighbors(a))
    R_star = neighbors.intersection(Vx)
    
    return R_star

In [8]:
def compute_Ty(a, Vx, Vy, Ey, Exy, R):
    # Player Y computes their contribution to the quantity Sxy.
    # To do so, they count the number of two-step paths between (i,j) where i is in Vx, j is in Vy
    # and the intermediate point k is in Vy.

    temp = nx.Graph(Exy)
    Ny = {j for j in Vy if ((a,j) in temp.edges())}
    neighbors = R.union(Ny)

    h = nx.Graph(Ey.union(Exy))
    h2 = h.subgraph(neighbors)
    Ty = dict()
    for i,j in product(Vx.intersection(neighbors), Vy.intersection(neighbors)):
        if (i,j) not in h2.edges():
            Ty[(i,j)] = 0                
            for k in Vy.difference({j}).intersection(neighbors):
                if ((i,k) in h2.edges()) and ((k,j) in h2.edges()):
                    Ty[(i,j)] += 1
    
    return Ty

In [9]:
def compute_Sy(a, Vx, Vy, Ey, Exy, R):
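    # Player Y computes the contribution to the EBC from pairs of (reported) neighbors of a that both lie in Vy.
    # Each non-adjacent pair contributes 1/T, where T counts the two-step paths between them, including the one through a.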

    temp = nx.Graph(Exy)
    Ny = {j for j in Vy if ((a,j) in temp.edges())}
    neighbors = R.union(Ny)

    h = nx.Graph(Ey.union(Exy))
    h2 = h.subgraph(neighbors)
    T = dict()
    for i,j in combinations(Vy.intersection(neighbors), 2):
        if (i,j) not in h2.edges():
            T[(i,j)] = 1
            for k in neighbors.difference({i, j}):
                if ((i,k) in h2.edges()) and ((k,j) in h2.edges()):
                    T[(i,j)] += 1

    values = np.array(list(T.values()))
    Sy = np.sum(1/values)
    
    return Sy

In [10]:
def compute_Sxy(a, Vx, Vy, Ex, Exy, Ty):
    # Player X computes their contribution to Sxy and combines it with Player Y's contribution.
    # To do so, they count the number of two-step paths between (i,j) where i is in Vx, j is in Vy
    # and the intermediate point k is in Vx.


    h = nx.Graph(Ex.union(Exy))
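    # Player X sees every edge incident to a (a is in Vx), so h suffices; do not read the global graph g.
    neighbors = set(h.neighbors(a))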
    
    h2 = h.subgraph(neighbors)
    Tx = dict()
    for i,j in product(Vx.intersection(neighbors), Vy.intersection(neighbors)):
        if (i,j) not in h2.edges():
            Tx[(i,j)] = 1
            for k in Vx.difference({i}).intersection(neighbors):
                if ((i,k) in h2.edges()) and ((k,j) in h2.edges()):
                    Tx[(i,j)] += 1
    
    # Recall that Player X knows all of the neighbors of a, but Player Y does not.
    # So, Ty may have some keys for spurious pairs (i,j).
    # Player X knows that they do not need to use these in their computations.
    # This is the optimization mentioned at the end of section V in the original paper.
    
    for k in Tx.keys():
        if k in Ty:
            Tx[k] += Ty[k]
    values = np.array(list(Tx.values()))
    
    Sxy = np.sum(1/values)
    
    return Sxy

In [11]:
def compute_Sx(a, Vx, Vy, Ex, Exy):
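    # Player X computes the contribution to the EBC from pairs of neighbors of a that both lie in Vx.
    # Each non-adjacent pair contributes 1/T, where T counts the two-step paths between them, including the one through a.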

    h = nx.Graph(Ex.union(Exy))
    neighbors = set(h.neighbors(a))
 
    h2 = h.subgraph(neighbors)
    T = dict()
    for i,j in combinations(Vx.intersection(neighbors), 2):
        if (i,j) not in h2.edges():
            T[(i,j)] = 1
            for k in neighbors.difference({i, j}):
                if ((i,k) in h2.edges()) and ((k,j) in h2.edges()):
                    T[(i,j)] += 1

    values = np.array(list(T.values()))
    Sx = np.sum(1/values)
    
    return Sx

In [12]:
def nonprivate_protocol(a, Vx, Vy, Ex, Ey, Exy):
    R_star = compute_R_star(a, Vx, Vy, Ex, Exy) # This is computed by Player X
    
    Ty = compute_Ty(a, Vx, Vy, Ey, Exy, R_star) # This is computed by Player Y
    Sy = compute_Sy(a, Vx, Vy, Ey, Exy, R_star) # This is computed by Player Y
    
    Sxy = compute_Sxy(a, Vx, Vy, Ex, Exy, Ty)   # This is computed by Player X
    Sx = compute_Sx(a, Vx, Vy, Ex, Exy)         # This is computed by Player X
    
    nonprivate_ebc = Sx + Sy + Sxy          # This is computed by Player X
    
    return nonprivate_ebc

In [13]:
# Player X computes the EBC using the nonprivate protocol

nonprivate_ebc = nonprivate_protocol(a, Vx, Vy, Ex, Ey, Exy)
print(nonprivate_ebc)

120.0


In [14]:
# Compute the ebc in a simpler way to check our work.

egocentric_betweenness_centrality(a,g)

120.0

## A Privacy-Preserving Protocol

Here we augment the ingredients that we used to compute the exact EBC with differentially private release mechanisms. Perturbing each message exchanged between the players yields a protocol that produces differentially private estimates of the EBC.

We will use the implementation of the Laplace mechanism provided in the package [relm](https://github.com/anusii/RelM).
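
For readers unfamiliar with the Laplace mechanism, the following is a minimal NumPy sketch of the idea: add independent Laplace noise with scale `sensitivity / epsilon` to each released value. It is an illustration only and not relm's implementation. The function name `laplace_release_sketch` and its parameters are hypothetical, and naive floating-point sampling like this is not recommended for real deployments.

```python
import numpy as np

def laplace_release_sketch(values, epsilon, sensitivity, rng=None):
    # Hypothetical helper: perturb each entry with Laplace(0, sensitivity / epsilon) noise.
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=np.float64)
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)

counts = np.array([3.0, 0.0, 7.0])
noisy_counts = laplace_release_sketch(
    counts, epsilon=1.0, sensitivity=2.0, rng=np.random.default_rng(0)
)
print(noisy_counts)
```

A smaller `epsilon` or a larger `sensitivity` widens the noise, which is the trade-off between privacy and utility that the protocol below inherits.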

We also use a version of the exponential mechanism described by the authors in the paper. We cannot easily use the standard exponential mechanism because we need to sample from a large state space, and the naive mechanism cannot do so efficiently. Instead, we implement the authors' suggested stratified sampling strategy to sample from this state space.

In [None]:
import relm

In [None]:
epsilon = 2**0

In [None]:
def exponential_mechanism_with_stratified_sampling(a, Vx, Vy, Ex, Exy, R_star):
    # Player X uses this to efficiently use the exponential mechanism to perturb the set
    # of neighbors of a that he will send to Player Y.
    sensitivity = 1
    Nx = len(Vx) - 1

    # Inverse Transform Sampler
    log_probs = np.zeros(Nx+1)
    log_probs[0] = -Nx * np.logaddexp(0, epsilon/(2*sensitivity))
    for i in range(1, Nx+1):
        log_probs[i] = np.log(Nx - i + 1) - np.log(i) + log_probs[i-1] + (epsilon / (2*sensitivity))

    log_cdf = np.logaddexp.accumulate(log_probs)

    U = np.random.random()
    I = np.argmax(log_cdf >= np.log(U))

    # Pick and Flip Sampler
    # Copy so that the caller's R_star is not modified in place.
    R = set(R_star)

    vs = np.random.choice(list(Vx.difference({a})), size=Nx - I, replace=False)
    for v in vs:
        if v in R:
            R.difference_update({v})
        else:
            R.update({v})
            
    return R
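
The inverse transform step above can be examined in isolation. On our reading of the recurrence (an interpretation of this code, not a statement from the paper), the index `I` is the number of candidate nodes left unflipped, and its distribution is Binomial($N_x$, $q$) with $q = 1/(1 + e^{-\epsilon/2})$ when the sensitivity is 1. The sketch below rebuilds the log-probabilities for a small, hypothetical `n_toy` and checks both the normalization and the binomial form; all names in it are illustrative.

```python
from math import comb

import numpy as np

def agreement_count_log_probs(num_candidates, epsilon, sensitivity=1.0):
    # Same recurrence as in the sampler above, written as a standalone function.
    t = epsilon / (2 * sensitivity)
    log_probs = np.zeros(num_candidates + 1)
    log_probs[0] = -num_candidates * np.logaddexp(0, t)
    for i in range(1, num_candidates + 1):
        log_probs[i] = (
            np.log(num_candidates - i + 1) - np.log(i) + log_probs[i - 1] + t
        )
    return log_probs

n_toy, eps_toy = 12, 1.0
probs = np.exp(agreement_count_log_probs(n_toy, eps_toy))

# Binomial(n_toy, q) probabilities for comparison.
q = 1.0 / (1.0 + np.exp(-eps_toy / 2))
binomial_probs = np.array(
    [comb(n_toy, i) * q**i * (1 - q) ** (n_toy - i) for i in range(n_toy + 1)]
)

print(np.isclose(probs.sum(), 1.0), np.allclose(probs, binomial_probs))
```

If this reading holds, the index could presumably also be drawn directly with a binomial sampler, though the log-space version above avoids overflow when $N_x$ is large.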

In [None]:
def private_protocol(a, Vx, Vy, Ex, Ey, Exy):
    # The two players collaborate to compute a differentially private estimate of the EBC of a.
    # Note that at each step, one player performs a computation similar to that required in the
    # exact (non-private) algorithm and then uses a differentially private release mechanism
    # to perturb the result.
    
    # First Player X computes the forward message
    R_star = compute_R_star(a, Vx, Vy, Ex, Exy)
    # Player X uses the exponential mechanism to perturb the set of neighbors R_star
    R = exponential_mechanism_with_stratified_sampling(a, Vx, Vy, Ex, Exy, R_star)
    
    # Next, Player Y computes the backwards messages
    ## Player Y computes a nonprivate contribution to T
    Ty_star = compute_Ty(a, Vx, Vy, Ey, Exy, R)          
    ## Player Y uses the Laplace mechanism to perturb the Ty_star counts
    laplace_mechanism = relm.mechanisms.LaplaceMechanism(epsilon / 2, sensitivity=2*len(R))
    key_list = sorted(list(Ty_star.keys()))
    value_array = np.array([Ty_star[k] for k in key_list], dtype=np.float64)
    perturbed_value_array = laplace_mechanism.release(value_array)
    Ty = {k: perturbed_value_array[i] for i,k in enumerate(key_list)}
    
    ## Player Y computes a nonprivate version of Sy
    Sy_star = compute_Sy(a, Vx, Vy, Ey, Exy, R)
    ## Player Y uses the Laplace mechanism to perturb Sy_star
    temp = nx.Graph(Exy)
    Ny = {j for j in Vy if ((a,j) in temp.edges())}
    laplace_mechanism = relm.mechanisms.LaplaceMechanism(epsilon / 2, sensitivity = len(Ny) - 1)
    Sy = laplace_mechanism.release(np.array([Sy_star]))[0]
    
    # Finally, Player X completes the computation of the private_ebc value for a.
    Sxy = compute_Sxy(a, Vx, Vy, Ex, Exy, Ty)
    Sx = compute_Sx(a, Vx, Vy, Ex, Exy)
    private_ebc = Sx + Sy + Sxy
    
    return private_ebc

In [None]:
# Use the differentially private algorithm to compute an estimate of the EBC of a.

private_ebc = private_protocol(a, Vx, Vy, Ex, Ey, Exy)
print(private_ebc)

In [None]:
# Compute the exact ebc in a simpler way to compare the two results.

egocentric_betweenness_centrality(a,g)