# Edge Privacy
Reference implementations of techniques described in the paper [Smooth Sensitivity and Sampling in Private Data Analysis](https://cs-people.bu.edu/ads22/pubs/NRS07/NRS07-full-draft-v1.pdf) by Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith.

## Library Imports
We use [networkx](https://networkx.org) to perform the required graph computations and the implementation of the Cauchy mechanism provided by [RelM](https://github.com/anusii/RelM) to release the differentially private query response.

In [1]:
import numpy as np
import networkx as nx
from relm.mechanisms import CauchyMechanism

## Triangle Counts
We construct a differentially private release mechanism for the number of triangles in a graph. This process consists of three steps:
  1. Compute the exact query response,
  2. Compute the smooth sensitivity of the query,
  3. Add noise scaled according to the smooth sensitivity.

### Compute the exact query response

#### Generate a random graph

In [2]:
n = 2**13
p = 0.01
g = nx.random_graphs.gnp_random_graph(n, p)

#### Compute the exact triangle count

In [3]:
triangle_count = np.array([sum(nx.triangles(g).values()) / 3.0])
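
As a sanity check on the counting convention, the sketch below counts triangles in a small hand-built graph in two ways: by brute force over vertex triples, and via the identity that a simple undirected graph with adjacency matrix $X$ has $\mathrm{tr}(X^3)/6$ triangles. The toy graph and all variable names here are illustrative assumptions and are not part of the notebook.

```python
import itertools
import numpy as np

# Hypothetical toy graph: a 4-cycle 0-1-2-3 plus the chord 0-2 (two triangles).
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_toy = 4
X_toy = np.zeros((n_toy, n_toy), dtype=int)
for i, j in toy_edges:
    X_toy[i, j] = X_toy[j, i] = 1

# Brute force: test every unordered vertex triple.
brute_force_count = sum(
    1
    for i, j, k in itertools.combinations(range(n_toy), 3)
    if X_toy[i, j] and X_toy[j, k] and X_toy[i, k]
)

# Each triangle contributes 6 closed walks of length 3 to the trace.
trace_count = int(np.trace(X_toy @ X_toy @ X_toy)) // 6

print(brute_force_count, trace_count)
```

Both counts agree with the per-node convention used above, where summing `nx.triangles` over all nodes counts every triangle three times.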

### Compute the smooth sensitivity of the query

#### Compute the partial count matrices
The algorithm that we use to compute the smooth sensitivity of the query is given in terms of the adjacency matrix $X = \{x_{ij}\}$ where $x_{ij} = \mathbf{1}((i,j) \in E)$. Let the matrix $A$ count the number of triangles that involve each potential edge.  That is, $A = \{a_{ij}\}$ where $a_{ij} = \sum_{k \in [n]} x_{ik} \cdot x_{kj}$. Let the matrix $B$ count the number of half-built triangles that involve each potential edge. That is $B = \{b_{ij}\}$ where $b_{ij} = \sum_{k \in [n]} x_{ik} \oplus x_{kj}$.
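
To make these definitions concrete, the following sketch evaluates $a_{ij}$ and $b_{ij}$ entry by entry on a small graph and compares the result with the matrix form. It relies on the fact that $x \oplus y = x + y - 2xy$ for 0/1 values, so $b_{ij} = \deg(i) + \deg(j) - 2a_{ij}$. The toy graph and the names `X_small`, `A_small`, `B_small`, `A_fast`, and `B_fast` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy graph: a 4-cycle 0-1-2-3 plus the chord 0-2.
small_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
m = 4
X_small = np.zeros((m, m), dtype=int)
for i, j in small_edges:
    X_small[i, j] = X_small[j, i] = 1

# Direct evaluation of the two definitions, one entry at a time.
A_small = np.zeros((m, m), dtype=int)
B_small = np.zeros((m, m), dtype=int)
for i in range(m):
    for j in range(m):
        A_small[i, j] = np.sum(X_small[i, :] & X_small[:, j])  # common neighbours
        B_small[i, j] = np.sum(X_small[i, :] ^ X_small[:, j])  # exactly one of the two edges

# Matrix form: b_ij = deg(i) + deg(j) - 2 * a_ij.
deg_small = X_small.sum(axis=1)
A_fast = X_small @ X_small
B_fast = deg_small[:, None] + deg_small[None, :] - 2 * A_fast

print(np.array_equal(A_small, A_fast), np.array_equal(B_small, B_fast))
```

For the pair $(0, 2)$ in this graph, both endpoints are adjacent to nodes 1 and 3, so $a_{02} = 2$.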

In [4]:
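# Compute the adjacency matrix
X = nx.adjacency_matrix(g).astype(int)

# Compute A and B using matrix operations.
# Since x XOR y = x + y - 2xy for 0/1 values, b_ij = deg(i) + deg(j) - 2 * a_ij.
A = (X @ X).toarray()
deg = np.asarray(X.sum(axis=1)).ravel()
B = deg[:, None] + deg[None, :] - 2 * A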

# Zero out the main diagonal because we are
# interested only in indices (i,j) with i != j
np.fill_diagonal(A, 0)
np.fill_diagonal(B, 0)

#### Compute the local sensitivity at distance $s$ for $0 \leq s \leq n$
We recall from the paper that if $S_{f,\epsilon}^*(G)$ is the smooth sensitivity of a query at an input $G$, $LS_f(H)$ is the local sensitivity of $f$ at an input $H$, and $$A^{(s)}(G) = \max_{H: d(G,H) \leq s} LS_f(H)$$
is the sensitivity of $f$ at distance $s$, then we have
$$S_{f,\epsilon}^*(G) = \max_{0 \leq s \leq n} e^{-\epsilon s} A^{(s)}(G).$$

Furthermore, Nissim et al. show that if $f$ counts the number of triangles in a graph then we have
$$A^{(s)}(G) = \max_{i \neq j;\ i,j \in [n]} c_{ij}(s) \quad \text{where} \quad c_{ij}(s) = \min \left(a_{ij} + \left\lfloor\frac{s + \min(s,b_{ij})}{2}\right\rfloor, n-2 \right).$$

They then describe an $O(n^2 \log n)$ algorithm for computing $A^{(s)}(G)$ for $0 \leq s \leq n$ which works by efficiently identifying the pairs $(a_{ij}, b_{ij})$ needed to compute $c_{ij}(s)$ for all $s$. 
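
Before turning to the faster algorithm, it can help to have a direct evaluation of the formula for $c_{ij}(s)$ to test against. The sketch below is such a brute-force reference, costing $O(n^2)$ work per value of $s$; it is not the algorithm from the paper. The function name `sensitivity_at_distance_bruteforce` and the 6-node path graph are assumptions made for illustration.

```python
import numpy as np

def sensitivity_at_distance_bruteforce(A, B, s, n):
    """Evaluate max over i != j of c_ij(s) directly from the formula."""
    A = np.asarray(A)
    B = np.asarray(B)
    c = np.minimum(A + (s + np.minimum(s, B)) // 2, n - 2)
    off_diagonal = ~np.eye(n, dtype=bool)  # ignore entries with i == j
    return int(c[off_diagonal].max())

# Hypothetical toy input: the path graph 0-1-2-3-4-5.
n_path = 6
X_path = np.zeros((n_path, n_path), dtype=int)
for i in range(n_path - 1):
    X_path[i, i + 1] = X_path[i + 1, i] = 1

deg_path = X_path.sum(axis=1)
A_path = X_path @ X_path
B_path = deg_path[:, None] + deg_path[None, :] - 2 * A_path

brute_force_values = [
    sensitivity_at_distance_bruteforce(A_path, B_path, s, n_path)
    for s in range(n_path + 1)
]
print(brute_force_values)  # [1, 2, 3, 3, 4, 4, 4]
```

The values are non-decreasing in $s$ and saturate at the cap $n - 2 = 4$, as the formula requires.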

In [5]:
def local_sensitivity_at_all_distances(A, B, n):
    """ Compute the local sensitivity at distance s for 0 <= s <= n. """
    survivors = find_survivors(A, B, n)
    return np.array([local_sensitivity_at_distance(s, survivors) for s in range(n+1)])

def find_survivors(A, B, n):
    """ Find (i,j) used to compute local sensitivity at distance 0 <= s <= n.
    
        First identify all maximal pairs (A[i,j], B[i,j]).
        Then find the maximal pairs that will produce the correct value for the
        local sensitivity at distance s computations.
    """
    S, T = find_maximal_pairs(A, B, n)
    prev_survivor = (S[0], T[0])
    break_points = {prev_survivor: n+1}
    for survivor in zip(S[1:], T[1:]):
        a0, b0 = prev_survivor
        a1, b1 = survivor
        intersection = 2*(a0 - a1) + b0
        if b0 <= intersection <= b1:
            break_points[prev_survivor] = intersection
            break_points[survivor] = n+1
            prev_survivor = survivor
    survivors = sorted(break_points.items(), key=lambda item: item[1])
    return survivors

def find_maximal_pairs(A, B, n):
    """ Find maximal pairs A[i,j], B[i,j].

        For each distinct value set(A), keep only the pair
        (A[i,j], B[i,j]) with the largest value of B[i,j].
    """
    W = np.array(A).flatten()
    X = np.array(B).flatten()
    idx = np.argsort(W)
    Y = W[idx]
    Z = X[idx]
    delta = np.concatenate((np.zeros(1), Y[1:] != Y[:-1]))
    new_val_idxs = np.concatenate((np.array([0]),
                                   np.where(delta)[0],
                                   np.array([len(Y)])))
    S = np.empty((len(new_val_idxs)-1), dtype=int)
    T = np.empty((len(new_val_idxs)-1), dtype=int)
    for i in range(len(new_val_idxs)-1):
        S[i] = Y[new_val_idxs[i]]
        T[i] = np.max(Z[new_val_idxs[i]:new_val_idxs[i+1]])
    return S[::-1], T[::-1]

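def local_sensitivity_at_distance(s, break_list):
    """ Compute the local sensitivity at distance s.

        Relies on the module-level n (number of nodes) for the n - 2 cap.
    """
    # Pick the first surviving pair whose break point is at or beyond s
    a, b = next((k for k,v in break_list if s <= v))
    return np.minimum(a + np.floor((s + np.minimum(s, b)) / 2.0), n - 2)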

In [6]:
# Compute the local sensitivity at distance s for 0 <= s <= n
lsd = local_sensitivity_at_all_distances(A, B, n)

# Compute the smooth sensitivity
epsilon = 1.0
smooth_scaling = np.exp(-epsilon * np.arange(n + 1))
smooth_sensitivity = np.max(lsd * smooth_scaling)
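
The effect of the exponential discount in this maximisation can be seen on a handful of made-up values. In the sketch below, `toy_lsd` is a hypothetical sequence of sensitivities at distances $s = 0, \dots, 6$ and the helper `smooth_max` is an assumed name; neither comes from the notebook. With a large $\epsilon$ the discount dominates and the maximum is attained at $s = 0$, while a small $\epsilon$ lets more distant neighbours determine the result.

```python
import numpy as np

# Hypothetical sensitivities at distance s = 0, 1, ..., 6.
toy_lsd = np.array([1, 2, 3, 3, 4, 4, 4], dtype=float)

def smooth_max(lsd_values, eps):
    """Return (max_s exp(-eps * s) * lsd_values[s], the maximising s)."""
    weights = np.exp(-eps * np.arange(len(lsd_values)))
    discounted = lsd_values * weights
    return float(discounted.max()), int(discounted.argmax())

value_large_eps, s_large_eps = smooth_max(toy_lsd, 1.0)
value_small_eps, s_small_eps = smooth_max(toy_lsd, 0.1)

print(s_large_eps, value_large_eps)  # maximum at s = 0
print(s_small_eps, value_small_eps)  # maximum at s = 4
```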

### Add noise scaled according to the smooth sensitivity

In [7]:
# Create a differentially private release mechanism
mechanism = CauchyMechanism(epsilon=epsilon)

# Compute the differentially private query response
dp_triangle_count = mechanism.release(triangle_count, smooth_sensitivity)

#### Display results

In [8]:
# Both values are one-element arrays, so index into them before formatting
print("Exact triangle count = %i" % int(triangle_count[0]))
print("Differentially private triangle count = %f" % dp_triangle_count[0])

Exact triangle count = 91717
Differentially private triangle count = 91716.161393
