### CS4423 - Networks
Angela Carnevale  
School of Mathematical and Statistical Sciences  
University of Galway

#### 6. Directed Networks

# Week 11, lecture 2:  Hubs and Authorities

## Importance of nodes in directed networks

### Adjacency matrix of a directed network

Let $M = (m_{ij})$ be the **adjacency matrix** of the directed graph
$G = (X, E)$, that is,

$$m_{ij} = \begin{cases} 1 & \text{if } x_j \to x_i,  \\
 0 & \text{ otherwise}\end{cases},$$
where $X = \{x_1, \dots, x_n\}$.
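As a small illustration of this convention, the following sketch builds $M$ with NumPy for a three-node graph. The edge list and the helper name `adjacency_matrix` are assumptions made up for this example. Note that rows index the target of an arrow and columns index its source.

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Return M with M[i, j] = 1 if there is an arrow x_j -> x_i (0-based indices)."""
    M = np.zeros((n, n), dtype=int)
    for source, target in edges:
        M[target, source] = 1   # row = target, column = source
    return M

# hypothetical example graph: x_0 -> x_1, x_0 -> x_2, x_1 -> x_2
edges = [(0, 1), (0, 2), (1, 2)]
M = adjacency_matrix(3, edges)
print(M)
```

With this convention, row $i$ of $M$ lists the nodes that point to $x_i$, and column $j$ lists the nodes that $x_j$ points to.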

### In-Degree vs. Out-Degree

In a directed network, one can distinguish between **in-degree centrality** $c_i^{D^{\text{in}}}$ and **out-degree centrality** $c_i^{D^{\text{out}}}$
based on the number of arrows that point **into** a node, and the number of arrows pointing
**out** of a node:
$$
c_i^{D^{\text{in}}} = k_i^{\text{in}} = \sum_{j=1}^n m_{ij},
\quad
c_i^{D^{\text{out}}} = k_i^{\text{out}} = \sum_{j=1}^n m_{ji},
$$
where $M = (m_{ij})$ is the adjacency matrix of a directed graph
$G = (X, E)$.
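A minimal sketch of these two sums, assuming the adjacency matrix is stored as a NumPy array with the convention above (the three-node matrix is a made-up example): in-degrees are row sums and out-degrees are column sums.

```python
import numpy as np

# hypothetical example graph: x_0 -> x_1, x_0 -> x_2, x_1 -> x_2
# M[i, j] = 1 if x_j -> x_i
M = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0]])

k_in = M.sum(axis=1)    # row sums: number of arrows pointing into each node
k_out = M.sum(axis=0)   # column sums: number of arrows pointing out of each node
print(k_in)    # [0 1 2]
print(k_out)   # [2 1 0]
```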

Similarly, one can define and study the corresponding **eigenvector centralities** $c^{E^{\text{in}}}$ and $c^{E^{\text{out}}}$:
$$
M c^{E^{\text{in}}} = \lambda c^{E^{\text{in}}},
\quad
M^{T} c^{E^{\text{out}}} = \lambda c^{E^{\text{out}}}.
$$

With a view towards our main example of a directed network (the WWW), we take a slightly different approach here.

##  Hub Centrality and Authority Centrality

In a network of nodes connected by directed edges, each node can
play two different roles:

* one as a _receiver_ of links, and 

* one as a _sender_ of links.  

A first measure of importance, or recognition, of
a node in this network might be the number of
links it receives, i.e., its **in-degree** in the underlying graph.
If in a collection of web pages relating to a search query on the
subject of "networks", say, a particular page receives a high number
of links, this page might count as an **authority** on that subject,
with **authority score** measured by its in-degree.

In turn, the web pages linking to an authority in some sense know
where to find valuable information and thus serve as good "lists" for
the subject.
A high-value list is called a **hub** for this query.
It makes sense to measure the value of a page as a list in
terms of the values of the pages it points at, by assigning to its
**hub score** the sum of the authority scores of the pages it points
at.

![hubs](images/hubs.png)
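This first pass can be sketched in plain Python as follows. The page names and the link structure in `links` are hypothetical. Authority scores are in-degrees, and each hub score is the sum of the authority scores of the pages it links to.

```python
# hypothetical link structure: page -> list of pages it links to
links = {
    "p1": ["a1"],
    "p2": ["a1", "a2"],
    "p3": ["a2"],
    "a1": [],
    "a2": [],
}

# authority score = in-degree (number of links received)
authority = {page: 0 for page in links}
for page, targets in links.items():
    for t in targets:
        authority[t] += 1

# hub score = sum of the authority scores of the pages pointed at
hub = {page: sum(authority[t] for t in targets) for page, targets in links.items()}

print(authority)   # {'p1': 0, 'p2': 0, 'p3': 0, 'a1': 2, 'a2': 2}
print(hub)         # {'p1': 2, 'p2': 4, 'p3': 2, 'a1': 0, 'a2': 0}
```

Here `p2` comes out as the best list, since it links to both authorities.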

Now
the authority score of a page  could take the hub scores
of the list pages into account, by using the sum of the hub scores
of the pages that point at it as an updated authority score...

Then again, applying the **Principle of Repeated Improvement**,
the hub scores can be updated on the basis of the new authority scores,
and so on.

This suggests a ranking procedure that tries to estimate, for each page $p$,
its value as an authority and its value as a hub in the form
of numerical scores, $a(p)$ and $h(p)$.

Starting off with values all equal to $1$, the estimates are updated
by applying the following two rules in an alternating fashion.

**Authority Update Rule:**
For each page $p$, update $a(p)$
to be the sum of the hub scores of all the pages pointing to it.

    
**Hub Update Rule:**
For each page $p$,
update $h(p)$
to be the sum of the authority
scores of all the pages
that it points to.


To keep the numbers from growing too large,
the score vectors should be **normalised** after each step:
$h$ is replaced by a scalar multiple $\hat{h} = sh$
whose entries add up to $100$, say,
so that they represent relative percentage values,
and similarly for $a$.

After a number of iterations, the values $a(p)$ and
$h(p)$ stabilise, in the sense that further applications of
the update rules do not yield essentially better relative estimates.

**Example.**
Continuing the example above ...

In [None]:
import networkx as nx

In [None]:
nodes = list(range(1,10)) + ["A%s" % (i+1) for i in range(7)]
print(nodes)

In [None]:
edges = [
    (1,"A1"),(2,"A1"),(3,"A1"),(3,"A2"),(4,"A2"),(5,"A3"),
    (5,"A5"),(6,"A2"),(6,"A4"),(7,"A4"),(7,"A5"),(8,"A4"),
    (8,"A5"),(8,"A6"),(8,"A7"),(9,"A5"),(9,"A6"),(9,"A7")
]

In [None]:
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)

In [None]:
pos = nx.circular_layout(G)
for i in [1,2,3,4]:
    j = 10 - i
    pos[i], pos[j] = pos[j], pos[i]
colors = 9 * ['y'] + 7 * ['w']

In [None]:
nx.draw(G, with_labels=True, node_color=colors, pos=pos)

Let's use dictionaries, with nodes as keys and hub or authority scores as values.
Here's a way to normalize such a record.

In [None]:
def normalized(d):
    """Rescale the values of the dictionary d so that they add up to 100."""
    s = sum(d.values())
    return { k: 100/s*v for k, v in d.items() }

Initially, all scores are set to $1$ (and then normalized).

In [None]:
hubs = normalized({ x : 1 for x in G })
auth = normalized({ x : 1 for x in G })
hubs

The update rules can then be implemented as follows.

In [None]:
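def HubsUpdate(G, auth):
    """Hub Update Rule: sum the authority scores of the pages x points to."""
    h = { x: 0 for x in G }
    for x in G:
        for y in G.successors(x):
            h[x] += auth[y]
    return normalized(h)

def AuthUpdate(G, hubs):
    """Authority Update Rule: sum the hub scores of the pages pointing to y."""
    a = { x: 0 for x in G }
    for x in G:
        for y in G.successors(x):   # the edge x -> y contributes hubs[x] to a[y]
            a[y] += hubs[x]
    return normalized(a)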

Now we can apply the rules, alternating between the two, say 20 times, and observe how the scores stabilize.

In [None]:
for k in range(20):
    auth = AuthUpdate(G, hubs)
    print("auth score A3 = ", auth['A3'])
    hubs = HubsUpdate(G, auth)
    print("hubs score 1 = ", hubs[1])

All in one Python function:

In [None]:
def HubsAuth(G, k):
    """Apply both update rules k times, starting from equal scores."""
    hubs = normalized({ x : 1 for x in G })
    auth = normalized({ x : 1 for x in G })
    for i in range(k):
        auth = AuthUpdate(G, hubs)
        hubs = HubsUpdate(G, auth)

    return hubs, auth

In [None]:
hubs, auth = HubsAuth(G, 10)
 

In [None]:
hubs
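Instead of fixing the number of rounds in advance, one could stop as soon as the scores no longer change noticeably. The following is a minimal self-contained sketch of that idea, not part of the procedure above. The function name `hubs_auth_until_stable`, the tolerance `tol`, the cap `max_iter` and the small test graph are all assumptions, and the graph is given as a plain dictionary of successor lists rather than a `networkx` graph.

```python
def normalised(scores, total=100.0):
    """Rescale the values so that they add up to total (assumes a nonzero sum)."""
    s = sum(scores.values())
    return {k: total * v / s for k, v in scores.items()}

def hubs_auth_until_stable(succ, tol=1e-9, max_iter=1000):
    """Alternate the two update rules until no score changes by more than tol.

    succ maps each node to the list of nodes it points to; the graph is
    assumed to have at least one edge. Returns (hubs, auth, rounds_used).
    """
    hubs = normalised({x: 1 for x in succ})
    auth = normalised({x: 1 for x in succ})
    for it in range(1, max_iter + 1):
        # authority update: sum of hub scores of the pages pointing in
        new_auth = {x: 0.0 for x in succ}
        for x in succ:
            for y in succ[x]:
                new_auth[y] += hubs[x]
        new_auth = normalised(new_auth)
        # hub update: sum of authority scores of the pages pointed at
        new_hubs = normalised({x: sum(new_auth[y] for y in succ[x]) for x in succ})
        change = max(
            max(abs(new_auth[x] - auth[x]) for x in succ),
            max(abs(new_hubs[x] - hubs[x]) for x in succ),
        )
        hubs, auth = new_hubs, new_auth
        if change < tol:
            break
    return hubs, auth, it

# hypothetical graph: 0 -> 2, 0 -> 3, 1 -> 3
succ_demo = {0: [2, 3], 1: [3], 2: [], 3: []}
h_demo, a_demo, rounds = hubs_auth_until_stable(succ_demo)
print(rounds, h_demo, a_demo)
```

The cap `max_iter` guards against a loop that never meets the tolerance; the returned round count shows how quickly the scores settle on a given graph.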

Finally, let's apply this to a random directed graph.

In [None]:
n, m = 80, 120
G = nx.gnm_random_graph(n, m, directed=True)

In [None]:
hubs, auth = HubsAuth(G, 50)

Let's inspect the top and the bottom 10 scores.

In [None]:
[(k,auth[k]) for k in sorted(auth, key=auth.get, reverse=True)][:10]

In [None]:
[(k,auth[k]) for k in sorted(auth, key=auth.get)][:10]

In [None]:
[(k, hubs[k]) for k in sorted(hubs, key=hubs.get, reverse=True)][:10]

In [None]:
[(k, hubs[k]) for k in sorted(hubs, key=hubs.get)][:10]

In terms of matrix algebra this process can be described as follows.

###  Spectral Analysis of Hubs and Authorities

Let $M = (m_{ij})$ be the **adjacency matrix** of the directed graph
$G = (X, E)$
that is $m_{ij} = 1$ if $x_j \to x_i$ and $m_{ij} = 0$ otherwise,
where $X = \{x_1, \dots, x_n\}$.

We write $h = (h_1, \dots, h_n)$ for a list of hub scores, with $h_i = h(x_i)$,
the hub score of node $x_i$.  Similarly, we write $a = (a_1, \dots, a_n)$ for
a list of authority scores.

The **hub update rule** can now be expressed as
a **matrix multiplication**, using the transpose of the matrix $M$:
$$
h \gets M^T a
$$
and similarly, the **authority update rule**, using the matrix $M$ itself:
$$
a \gets M h
$$
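The two matrix rules can be checked on a small case with NumPy. In this sketch the four-node graph is a made-up example, and $M$ follows the convention $m_{ij} = 1$ if $x_j \to x_i$; normalisation is left out so that the sums are easy to read.

```python
import numpy as np

# hypothetical graph: x_0 -> x_2, x_0 -> x_3, x_1 -> x_3
M = np.zeros((4, 4))
for source, target in [(0, 2), (0, 3), (1, 3)]:
    M[target, source] = 1   # m_ij = 1 if x_j -> x_i

h = np.ones(4)   # initial hub scores, all equal to 1
a = M @ h        # authority update: sum of hub scores of the pages pointing in
h = M.T @ a      # hub update: sum of authority scores of the pages pointed at
print(a)   # [0. 0. 1. 2.]
print(h)   # [3. 2. 0. 0.]
```

Node $x_3$ receives two links and gets authority score $2$; node $x_0$ points at both authorities and collects hub score $1 + 2 = 3$.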

Applying two steps of the procedure at once yields update rules
$$
  h \gets M^T M h
$$
and
$$
  a \gets M M^T \, a
$$
for $h$ and $a$, respectively.  

**In the limit**, one expects
to get vectors $h^{\ast}$ and $a^{\ast}$ whose directions do not change
under the latter rules, i.e.,
$$
  (M^T M) h^{\ast} = c h^{\ast}
$$
and
$$
  (M M^T) a^{\ast} = d a^{\ast}
$$
for certain constants $c$ and $d$, meaning that $h^{\ast}$ and $a^{\ast}$
are **eigenvectors** for the matrices $M^T M$ and $M M^T$,
respectively.

Using the fact that $M^T M$ and $M M^T$ are **symmetric** matrices
(for example, $(M^T M)^T = M^T (M^T)^T = M^T M$),
it can indeed be shown that any sequence of hub score vectors
$h$ under repeated application of the above update rule
converges to a real-valued eigenvector $h^{\ast}$ of $M^T M$ for the real eigenvalue $c$.

The argument uses the [Spectral Theorem](https://en.wikipedia.org/wiki/Spectral_theorem)
for [real symmetric matrices](https://en.wikipedia.org/wiki/Symmetric_matrix#Real_symmetric_matrices).


A similar result holds for any sequence of authority score vectors $a$, which converges to an eigenvector $a^{\ast}$ of $M M^T$.
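As a numerical sanity check rather than a proof, the following sketch compares the result of the repeated hub update with an eigenvector computed directly by `numpy.linalg.eigh`. The four-node graph and the choice of 100 iterations are assumptions made for this example, and both vectors are normalised so that their entries add up to $1$.

```python
import numpy as np

# hypothetical graph: x_0 -> x_2, x_0 -> x_3, x_1 -> x_3
M = np.zeros((4, 4))
for source, target in [(0, 2), (0, 3), (1, 3)]:
    M[target, source] = 1   # m_ij = 1 if x_j -> x_i

# repeated two-step hub update h <- M^T M h, normalised after each step
h = np.ones(4)
for _ in range(100):
    h = M.T @ M @ h
    h = h / h.sum()

# eigenvector of the symmetric matrix M^T M for its largest eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(M.T @ M)
v = np.abs(eigenvectors[:, -1])   # eigh returns eigenvalues in ascending order
v = v / v.sum()

print(eigenvalues[-1])      # the constant c for this graph
print(np.allclose(h, v))    # do the two vectors agree?
```

For this graph the largest eigenvalue of $M^T M$ is $(3 + \sqrt{5})/2$ and it is simple, which is why the iteration settles on a single direction.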