In [1]:
import numpy as np

## Question 1

Compute the Topic-Specific PageRank for the following link topology. Assume that pages selected for the teleport set are nodes 1 and 2 and that in the teleport set, the weight assigned for node 1 is twice that of node 2. Assume further that the teleport probability, (1 - beta), is 0.3. Which of the following statements is correct?
<img src="images/otc_pagerank4 (1).gif">

TSPR(1) = .3576<br>
TSPR(1) = .4236<br>
TSPR(2) = .7535<br>
TSPR(4) = .4787

<b>PageRank</b>: v' = βMv + (1 - β)(e/n) <br>
(see hw1)

<img src="images/tspg_teleport.JPG">

<img src="images/text_5.3.2.JPG">
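
For reference, the topic-specific variant changes only the teleport term of the PageRank formula above. The following is a sketch in that formula's notation, where S is the teleport set and e_S has 1 in the components in S and 0 elsewhere. The weighted form with a distribution t is an assumption introduced here to cover the unequal weights in this question; it is not taken from the course material.

```latex
% Uniform teleport over the set S
v' = \beta M v + (1-\beta)\,\frac{e_S}{|S|}

% Assumed generalization: teleport according to a distribution t supported on S
v' = \beta M v + (1-\beta)\,t, \qquad \sum_i t_i = 1, \quad t_i = 0 \ \text{for } i \notin S

% This question: node 1 weighted twice node 2, and 1-\beta = 0.3
t = \left(\tfrac{2}{3},\ \tfrac{1}{3},\ 0,\ 0\right)^{T}
\quad\Rightarrow\quad (1-\beta)\,t = (0.2,\ 0.1,\ 0,\ 0)^{T}
```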

In [2]:
M1 = np.matrix([
        # 1, 2, 3, 4
        [0., 1., 0., 0.], #1
        [.5, 0., 0., 0.], #2
        [.5, 0., 0., 1.], #3
        [.0, 0., 1., 0.]  #4
    ])
M1

matrix([[ 0. ,  1. ,  0. ,  0. ],
        [ 0.5,  0. ,  0. ,  0. ],
        [ 0.5,  0. ,  0. ,  1. ],
        [ 0. ,  0. ,  1. ,  0. ]])
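
As a sanity check on the hand-typed matrix, the same M can be rebuilt from an out-link list. This is a minimal sketch: the `out_links` dict and the `transition_matrix` helper are hypothetical names, and the links are read off the columns of M1 above rather than from the figure.

```python
import numpy as np

# Out-links read off M1: column j holds the links leaving node j.
out_links = {1: [2, 3], 2: [1], 3: [4], 4: [3]}  # hypothetical name

def transition_matrix(out_links):
    """Column-stochastic matrix: entry (i, j) is 1/out_degree(j) if j links to i."""
    n = len(out_links)
    M = np.zeros((n, n))
    for src, dests in out_links.items():
        for dst in dests:
            M[dst - 1, src - 1] = 1.0 / len(dests)
    return M

M_check = transition_matrix(out_links)
# Each column should sum to 1 when the graph has no dead ends.
column_sums = M_check.sum(axis=0)
print(M_check)
print(column_sums)
```

If every column sums to 1, no rank leaks through dead ends, so any leak seen during iteration comes from the teleport term alone.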

## Let's work through the video example first

In [3]:
S1 = {1}
beta1 = 0.8
v0 = np.ones((4,1)) * 1./4.

# 'e' is a vector that has 1 in the components in S and 0 in other components
e = np.array([1., 0., 0., 0.]).reshape((4,1))
e

array([[ 1.],
       [ 0.],
       [ 0.],
       [ 0.]])

In [4]:
PR = []
PR.append(v0)
PR

[array([[ 0.25],
        [ 0.25],
        [ 0.25],
        [ 0.25]])]

In [5]:
e = np.ones((4,1))

for j in range(4):
    PR = []
    PR.append(v0)
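    for i in range(40):
        v = PR[i]
        # NOTE: len(S1) is 1, so this teleport term sums to (1 - beta1) only when e has
        # a single 1. For larger sets the excess is subtracted evenly from all four
        # pages by the 'leaked' step below, which differs from teleporting to e/|S|
        # unless e is all ones.
        v_prime = beta1 * np.dot(M1, v) + (1. - beta1)*e/len(S1)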
        leaked = 1. - sum(v_prime)
        re_insert = np.ones((4,1)) * leaked/4.
        v_prime = np.add(v_prime, re_insert)
        PR.append(v_prime)
    
    print(np.transpose(e), "beta=" + str(beta1))
    print(PR[40])
    idx = j + 1
    e[-idx] = 0

[[ 1.  1.  1.  1.]] beta=0.8
[[ 0.13235294]
 [ 0.10294118]
 [ 0.39705882]
 [ 0.36764706]]
[[ 1.  1.  1.  0.]] beta=0.8
[[ 0.26470588]
 [ 0.20588235]
 [ 0.34966582]
 [ 0.17974595]]
[[ 1.  1.  0.  0.]] beta=0.8
[[ 0.39705882]
 [ 0.30882353]
 [ 0.19117647]
 [ 0.10294118]]
[[ 1.  0.  0.  0.]] beta=0.8
[[ 0.29411765]
 [ 0.11764706]
 [ 0.32680477]
 [ 0.26143052]]
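
The loop above divides the teleport vector by `len(S1)`, which is 1, so for the three-page and two-page sets the teleport mass is not spread as e/|S|. The sketch below is one way to cross-check those cases: it divides by the actual set size and, instead of iterating, solves the fixed-point equation directly with a linear solve. The helper name `tspr_exact` is hypothetical, and the approach assumes M has no dead ends and beta < 1.

```python
import numpy as np

M = np.array([[0., 1., 0., 0.],
              [.5, 0., 0., 0.],
              [.5, 0., 0., 1.],
              [0., 0., 1., 0.]])

def tspr_exact(M, teleport_set, beta):
    """Solve v = beta*M*v + (1 - beta)*e_S/|S| directly (hypothetical helper)."""
    n = M.shape[0]
    t = np.zeros(n)
    for node in teleport_set:
        t[node - 1] = 1.0 / len(teleport_set)  # uniform over the teleport set
    # Rearranged fixed point: (I - beta*M) v = (1 - beta) t
    return np.linalg.solve(np.eye(n) - beta * M, (1.0 - beta) * t)

results = {}
for S in [(1, 2, 3, 4), (1, 2, 3), (1, 2), (1,)]:
    results[S] = tspr_exact(M, S, beta=0.8)
    print(S, np.round(results[S], 4))
```

For S = {1, 2} this gives roughly (0.2647, 0.2059, 0.2941, 0.2353), while the all-pages and single-page cases agree with the iterated values printed above.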


<img src="images/tspr_eg.JPG">

In [91]:
e = np.array([1., 0., 0., 0.]).reshape((4,1))
beta_list = [.9, .8, .7]

for b in beta_list:
    PR = []
    PR.append(v0)
    for i in range(20):
        v = PR[i]
        v_prime = b * np.dot(M1, v) + (1. - b)*e/len(S1)
        leaked = 1. - sum(v_prime)
        re_insert = np.ones((4,1)) * leaked/4.
        v_prime = np.add(v_prime, re_insert)
        PR.append(v_prime)
        
    

    print(np.transpose(e), "beta=" + str(b))
    print(PR[20])

[[ 1.  0.  0.  0.]] beta=0.9
[[ 0.16807695]
 [ 0.07565095]
 [ 0.40123264]
 [ 0.35503945]]
[[ 1.  0.  0.  0.]] beta=0.8
[[ 0.29411715]
 [ 0.11764855]
 [ 0.32743641]
 [ 0.26079789]]
[[ 1.  0.  0.  0.]] beta=0.7
[[ 0.39735088]
 [ 0.13907293]
 [ 0.27276218]
 [ 0.19081401]]
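
Nodes 1 and 2 receive no rank from nodes 3 and 4 in M1, so with teleport set {1} the value for node 1 can be checked in closed form. The sketch below assumes the two equations v1 = beta*v2 + (1 - beta) and v2 = beta*v1/2, read off the first two rows of M1; the function name `tspr_node1` is hypothetical.

```python
def tspr_node1(beta):
    """TSPR of node 1 for teleport set {1}.

    Substituting v2 = beta*v1/2 into v1 = beta*v2 + (1 - beta)
    gives v1 = (1 - beta) / (1 - beta**2 / 2).
    """
    return (1.0 - beta) / (1.0 - beta**2 / 2.0)

for b in [0.9, 0.8, 0.7]:
    print(b, round(tspr_node1(b), 6))
```

The closed-form values should agree with the first component of each 20-iteration result above to about four decimal places, which suggests 20 iterations are enough for node 1 at these beta values.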


## Now let's target the homework assignment

In [14]:
S1 = {1, 2}
beta1 = .7
v0 = np.ones((4,1)) * 1./4.

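# Node 1's teleport weight is twice node 2's, so the teleport distribution is [2/3, 1/3, 0, 0].
# The update below divides e by len(S1) == 2, so e is pre-scaled to [4/3, 2/3, 0, 0].
# With (1 - beta1) = 0.3, node 1 then receives 0.2 and node 2 receives 0.1 of the teleport mass.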
e = np.array([4./3, 2./3, 0., 0.]).reshape((4,1))
e

array([[ 1.33333333],
       [ 0.66666667],
       [ 0.        ],
       [ 0.        ]])

In [15]:
PR = []
PR.append(v0)

for i in range(400):
    v = PR[i]
    v_prime = beta1 * np.dot(M1, v) + (1. - beta1)*e/len(S1)
    leaked = 1. - sum(v_prime)
    re_insert = np.ones((4,1)) * leaked/4.
    v_prime = np.add(v_prime, re_insert)
    PR.append(v_prime)
PR[400]

matrix([[ 0.35761589],
        [ 0.22516556],
        [ 0.24542267],
        [ 0.17179587]])

## answer: TSPR(1) = .3576
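
The iterated result can be cross-checked by hand. This is a sketch of the fixed-point equations, assuming beta = 0.7, teleport contributions of 0.2 and 0.1 for nodes 1 and 2, and the rows of M1 as given above:

```latex
\begin{aligned}
v_1 &= 0.7\,v_2 + 0.2, \qquad v_2 = 0.7\cdot\tfrac{1}{2}\,v_1 + 0.1 = 0.35\,v_1 + 0.1 \\
v_1 &= 0.7\,(0.35\,v_1 + 0.1) + 0.2 = 0.245\,v_1 + 0.27
  \;\Rightarrow\; v_1 = \frac{0.27}{0.755} \approx 0.3576 \\
v_2 &= 0.35\,v_1 + 0.1 \approx 0.2252 \\
v_3 &= 0.7\left(\tfrac{1}{2}\,v_1 + v_4\right), \quad v_4 = 0.7\,v_3
  \;\Rightarrow\; v_3 = \frac{0.35\,v_1}{0.51} \approx 0.2454, \quad v_4 \approx 0.1718
\end{aligned}
```

These values match the 400-iteration output and rule out the other three options, since TSPR(2) is about 0.2252 and TSPR(4) is about 0.1718.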

## Question 2

The spam-farm architecture described in Section 5.4.1 suffers from the problem that the target page has many links --- one to each supporting page. To avoid that problem, the spammer could use the architecture shown below:
<img src="images/otc_spamfarm1.gif">
There, k "second-tier" nodes act as intermediaries. The target page t has only to link to the k second-tier pages, and each of those pages links to m/k of the m supporting pages. Each of the supporting pages links only to t (although most of these links are not shown). Suppose the taxation parameter is β = 0.85, and x is the amount of PageRank supplied from outside to the target page. Let n be the total number of pages in the Web. Finally, let y be the PageRank of target page t. If we compute the formula for y in terms of k, m, and n, we get a formula with the form

<center>y = ax + bm/n + ck/n</center>

<b>Note</b>: To arrive at this form, it is necessary at the last step to drop a low-order term that is a fraction of 1/n. Determine coefficients a, b, and c, remembering that β is fixed at 0.85. Then, identify the value, correct to two decimal places, for one of these coefficients.

c = 0.28<br>
c = 0.13<br>
a = 3.07<br>
b = 0.28

<img src="images/spamfarm.JPG">
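
One way to derive the requested form is to follow the PageRank through the two tiers. The sketch below makes several assumptions: s and p are symbols introduced here for the PageRank of one second-tier page and one supporting page, teleport is uniform over all n pages, and each supporting page is linked from exactly one second-tier page.

```latex
\begin{aligned}
s &= \frac{\beta y}{k} + \frac{1-\beta}{n}
  && \text{(each second-tier page gets } 1/k \text{ of } t\text{'s taxed rank)} \\
p &= \frac{\beta s}{m/k} + \frac{1-\beta}{n} = \frac{\beta k s}{m} + \frac{1-\beta}{n}
  && \text{(each supporting page gets } k/m \text{ of its parent's taxed rank)} \\
y &= x + \beta m p + \frac{1-\beta}{n}
  && \text{(all } m \text{ supporting pages link only to } t\text{)} \\
  &\approx x + \beta^{3} y + \beta^{2}(1-\beta)\,\frac{k}{n} + \beta(1-\beta)\,\frac{m}{n}
  && \text{(dropping the low-order } (1-\beta)/n \text{ term)} \\
y &= \frac{x}{1-\beta^{3}} + \frac{\beta}{1+\beta+\beta^{2}}\,\frac{m}{n}
   + \frac{\beta^{2}}{1+\beta+\beta^{2}}\,\frac{k}{n}
  && \text{(using } 1-\beta^{3} = (1-\beta)(1+\beta+\beta^{2})\text{)}
\end{aligned}
```

Under these assumptions a = 1/(1 - β³), b = β/(1 + β + β²), and c = β²/(1 + β + β²).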

In [17]:
beta2 = 0.85
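# 1/(1 - beta^2) is the coefficient of x for the basic spam farm (no second tier);
# the two-tier farm adds one more hop, so its coefficient is 1/(1 - beta2**3).
a = 1/(1 - beta2**2)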
a

3.6036036036036028

In [2]:
# beta/(1 + beta) is the m/n coefficient for the basic spam farm, not the two-tier c.
c = beta2/(1 + beta2)
c

0.45945945945945943

In [4]:
# Two-tier farm: b = beta*(1 - beta)/(1 - beta^3) = beta/(1 + beta + beta^2).
b = beta2/(1 + beta2 + beta2**2)
b

0.3304178814382896

In [5]:
# c = beta * b = beta^2/(1 + beta + beta^2) for the two-tier farm.
b * beta2

0.28085519922254615

## answer: c = 0.28

<img src="images/explain7b_2.JPG">
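
As a final cross-check, all three coefficients can be evaluated together and compared with the answer options. This sketch assumes the two-tier relation y(1 - β³) = x + β(1 - β)m/n + β²(1 - β)k/n, which is consistent with the b and b * beta2 cells above; the variable names are illustrative.

```python
beta = 0.85  # taxation parameter from the question

denom = 1.0 + beta + beta**2
a = 1.0 / (1.0 - beta**3)   # coefficient of x
b = beta / denom            # coefficient of m/n
c = beta**2 / denom         # coefficient of k/n

print(round(a, 2), round(b, 2), round(c, 2))
```

Of the four options, only c = 0.28 agrees with these rounded values.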