# Question 1

Compute the Topic-Specific PageRank for the following link topology. Assume that the teleport set consists of nodes 1 and 2, and that within the teleport set the weight assigned to node 1 is twice that of node 2. Assume further that the teleport probability, (1 - beta), is 0.3. Which of the following statements is correct?

<img style="float:left" src="https://d396qusza40orc.cloudfront.net/mmds/images/otc_pagerank4.gif"/>
<br clear="all"/>

<ol>
<li>TSPR(1) = .2455</li>
<li>TSPR(2) = .2252</li>
<li>TSPR(3) = .1092</li>
<li>TSPR(2) = .8998</li>
</ol>
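
For reference, the quantity being asked for can be written out explicitly. The symbols $M$ (the column-stochastic link matrix), $r$ (the rank vector) and $v$ (the teleport distribution) are notation assumed here for illustration; they do not appear in the question itself.

```latex
\begin{align*}
r &= \beta M r + (1 - \beta)\, v, \qquad \beta = 0.7,\quad 1 - \beta = 0.3 \\
v_1 &= 2 v_2,\quad v_1 + v_2 = 1
  \;\Longrightarrow\;
  v = \left(\tfrac{2}{3},\ \tfrac{1}{3},\ 0,\ 0\right)^{\mathsf T}
\end{align*}
```

Nodes 3 and 4 are outside the teleport set, so their entries in $v$ are zero and they can only receive rank through links.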

In [54]:
import numpy as np

# Link topology as a column-stochastic matrix:
# column j holds the out-link probabilities of node j+1
G = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0]
    ])

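# Teleport matrix: every column is the teleport distribution,
# with node 1 weighted 2/3 and node 2 weighted 1/3
N = np.tile(np.array([[2/3, 1/3, 0, 0]]).T, (1, 4))

# Rank vector, initialised to the uniform distribution
r = np.ones((G.shape[0], 1)) / G.shape[0]

# Probability of following a link (beta); the teleport probability is 1 - b = 0.3
b = 0.7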

# Power iteration: 10 fixed steps of r <- b*G*r + (1 - b)*N*r
for _ in range(10):
    r = b * np.dot(G, r) + (1 - b) * np.dot(N, r)

# Round the results
r = np.around(r.flatten(), 4)

print("Topic-Specific Page Rank")
print(r)
print()

print("1:", r[0] == 0.2455)
print("2:", r[1] == 0.2252)
print("3:", r[2] == 0.1092)
# Option 4 is stated for node 2, so compare against r[1]
print("4:", r[1] == 0.8998)

Topic-Specific Page Rank
[ 0.3575  0.2252  0.2462  0.1711]

1: False
2: True
3: False
4: False
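
The loop above stops after a fixed 10 iterations, so some entries may not have fully settled to four decimals. As a cross-check, the fixed point can also be obtained by solving the linear system directly. The following is a minimal sketch, assuming the same link matrix and teleport weights as in the cell above; the names `v`, `beta` and `r_exact` are introduced here for illustration.

```python
import numpy as np

# Column-stochastic link matrix, copied from the cell above:
# column j holds the out-link probabilities of node j + 1.
G = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Teleport distribution: node 1 carries twice the weight of node 2.
v = np.array([2 / 3, 1 / 3, 0.0, 0.0])

# Probability of following a link; 1 - beta = 0.3 is the teleport probability.
beta = 0.7

# The fixed point of r = beta * G r + (1 - beta) * v satisfies
# (I - beta * G) r = (1 - beta) * v, which is a plain linear system.
r_exact = np.linalg.solve(np.eye(4) - beta * G, (1 - beta) * v)

print("Topic-Specific PageRank (direct solve)")
print(np.around(r_exact, 4))
print("TSPR(2) rounds to 0.2252:", bool(round(float(r_exact[1]), 4) == 0.2252))
```

Only the second entry matters for this question, and the direct solve agrees with the 0.2252 found by iteration. Raising the iteration count in the loop would be an alternative way to tighten the remaining entries.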


# Question 2

The spam-farm architecture described in Section 5.4.1 suffers from the problem that the target page has many links, one to each supporting page. To avoid that problem, the spammer could use the architecture shown below:

<img style="float:left" src="https://d396qusza40orc.cloudfront.net/mmds/images/otc_spamfarm1.gif"/>
<br clear="all"/>

There, k "second-tier" nodes act as intermediaries. The target page t has only to link to the k second-tier pages, and each of those pages links to m/k of the m supporting pages. Each of the supporting pages links only to t (although most of these links are not shown). Suppose the taxation parameter is β = 0.85, and x is the amount of PageRank supplied from outside to the target page. Let n be the total number of pages in the Web. Finally, let y be the PageRank of target page t. If we compute the formula for y in terms of k, m, and n, we get a formula with the form

- y = ax + bm/n + ck/n

Note: To arrive at this form, it is necessary at the last step to drop a low-order term that is a fraction of 1/n. Determine coefficients a, b, and c, remembering that β is fixed at 0.85. Then, identify the value, correct to two decimal places, for one of these coefficients.

1. c = 0.46
2. b = 0.33
3. b = 0.21
4. c = 0.13
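
The coefficients can be derived by following the PageRank once around the farm. This is a sketch under the assumptions that every page receives a teleport share of $(1-\beta)/n$ and that each second-tier page splits its rank evenly among its $m/k$ supporting pages. The symbols $s$ and $u$, for the PageRank of a single second-tier page and a single supporting page, are introduced here for illustration.

```latex
\begin{align*}
s &= \frac{\beta y}{k} + \frac{1-\beta}{n} \\
u &= \frac{\beta s}{m/k} + \frac{1-\beta}{n}
   = \frac{\beta^{2} y}{m} + \frac{\beta(1-\beta)\,k}{m\,n} + \frac{1-\beta}{n} \\
y &= x + \beta m u + \frac{1-\beta}{n}
   = x + \beta^{3} y + \beta(1-\beta)\frac{m}{n} + \beta^{2}(1-\beta)\frac{k}{n} + \frac{1-\beta}{n}
\end{align*}
```

Dropping the final $(1-\beta)/n$ term, solving for $y$, and using $1-\beta^{3} = (1-\beta)(1+\beta+\beta^{2})$ gives

```latex
\begin{align*}
a = \frac{1}{1-\beta^{3}}, \qquad
b = \frac{\beta}{1+\beta+\beta^{2}}, \qquad
c = \frac{\beta^{2}}{1+\beta+\beta^{2}}
\end{align*}
```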

In [58]:
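# Taxation parameter (np is imported in the first cell)
β = 0.85

# Coefficients of y = a*x + b*m/n + c*k/n.
# b and c are written in simplified form, using 1 - β^3 = (1 - β)(1 + β + β^2).
a = 1.0 / (1 - np.power(β, 3))
b = β / (1.0 + β + np.power(β, 2))
c = np.power(β, 2) / (1.0 + β + np.power(β, 2))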

# Round results to 2 decimals
a, b, c = round(a, 2), round(b, 2), round(c, 2)

print("1:", c == 0.46)
print("2:", b == 0.33)
print("3:", b == 0.21)
print("4:", c == 0.13)

1: False
2: True
3: False
4: False
