## PageRank Simulation
### Charley Zhao

i. Randomly select a page, p, to start with, and call this the current page.

ii. Generate a random number, r, from [0, 1). (a) If r is less than or equal to β, simulate a
click by randomly selecting a page from among all the neighbors of the current page and
make it the current page. (b) If r is greater than β, simulate a jump by
randomly selecting a page from among all the pages in the graph and make it the current page.
Here β is the damping factor (`beta` in the code below, 0.85 by default).

iii. Repeat step ii walk_len times. Whichever page you are on at the end, increment a counter
for that page.

To simulate PageRank itself, repeat the above three steps N times. Finally, divide the counter associated with
each page by N to estimate its PageRank. Report the PageRank of the nodes in alphabetical
order.
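When walk_len is large, the starting page no longer matters, and the fraction of walks ending on page $p$ approximates the stationary distribution of the click/jump process. Writing $n$ for the number of pages and $\mathcal{N}(q)$ for the neighbor list of page $q$, that distribution is the standard PageRank solution (assuming every page has at least one neighbor):

$$
PR(p) = \frac{1-\beta}{n} + \beta \sum_{q \,:\, p \in \mathcal{N}(q)} \frac{PR(q)}{|\mathcal{N}(q)|}
$$

The first term comes from the jump branch (probability $1-\beta$, spread evenly over all $n$ pages); the sum comes from the click branch (probability $\beta$, spread evenly over the current page's neighbors).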

```python
import random
```

```python
def read_graph(fname):
    """Read the incidence-vector representation of a graph from fname.

    Each non-blank line holds a node name followed by the names of the
    nodes it links to, separated by whitespace (e.g. "B C D E").
    Returns a dict mapping each node to the list of its neighbors.
    """
    graph = dict()
    with open(fname) as f:
        for line in f:
            fields = line.split()
            if not fields:  # skip blank lines instead of failing to unpack
                continue
            curr, *neighbors = fields
            graph[curr] = neighbors
    return graph
```

```python
read_graph('graph-2.txt')
```

```
{'A': ['B', 'C'],
 'B': ['C', 'D', 'E'],
 'C': ['A'],
 'D': ['C', 'E'],
 'E': ['A']}
```
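The graph files themselves are not shown here; judging from read_graph, each line lists a node followed by its neighbors. The sketch below assumes exactly that format: a hypothetical `write_graph` helper writes the graph-2 adjacency printed above to a temporary file, and a parser mirroring read_graph reads it back unchanged.

```python
import os
import tempfile


def write_graph(graph, fname):
    """Hypothetical helper: one line per node, the node name followed by its
    neighbors, separated by spaces (the format read_graph appears to expect)."""
    with open(fname, "w") as f:
        for node, neighbors in graph.items():
            f.write(" ".join([node, *neighbors]) + "\n")


def read_graph(fname):
    """Mirrors read_graph: first token is the node, the rest are its neighbors."""
    graph = {}
    with open(fname) as f:
        for line in f:
            fields = line.split()
            if fields:
                curr, *neighbors = fields
                graph[curr] = neighbors
    return graph


graph2 = {"A": ["B", "C"], "B": ["C", "D", "E"], "C": ["A"],
          "D": ["C", "E"], "E": ["A"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph-2.txt")
    write_graph(graph2, path)
    with open(path) as f:
        file_text = f.read()
    round_trip = read_graph(path)

print(file_text)
print(round_trip == graph2)
```

Under this assumption the file for graph-2 would contain five lines, starting with `A B C` and `B C D E`.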

```python
def random_walk(graph, walk_len=1000, beta=0.85):
    """Perform the random walk described in steps i to iii above and
    return the node it ends on.

    i.   Start at a page chosen uniformly at random.
    ii.  Draw r from [0, 1). If r <= beta, follow a link chosen uniformly
         from the current page's neighbors (a click); otherwise jump to a
         page chosen uniformly from the whole graph.
    iii. Repeat step ii walk_len times.
    """
    nodes = list(graph.keys())
    curr = random.choice(nodes)       # step i
    for _ in range(walk_len):         # step iii
        r = random.random()           # step ii
        if r <= beta and graph[curr]:
            # (a) click: move to one of the current page's neighbors
            curr = random.choice(graph[curr])
        else:
            # (b) jump: move to any page in the graph; also used when the
            # current page has no neighbors, where random.choice would fail
            curr = random.choice(nodes)
    return curr
```

```python
graph = read_graph('graph-2.txt')
random_walk(graph)
```

```
'A'
```
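Step ii fixes the probability of moving from one page to the next: from page q, each neighbor gets β/|N(q)| from the click branch, and every page (q included) gets (1 − β)/n from the jump branch. A small sketch using a hypothetical helper `transition_probs` on the graph-2 adjacency printed above, assuming the current page has at least one neighbor:

```python
def transition_probs(graph, curr, beta=0.85):
    """Hypothetical helper: probability that one pass of step ii moves from
    `curr` to each page, for an adjacency dict like the one read_graph returns.
    Assumes `curr` has at least one neighbor."""
    neighbors = graph[curr]
    if not neighbors:
        raise ValueError(f"page {curr!r} has no neighbors")
    n = len(graph)
    # (b) jump branch: probability 1 - beta, spread evenly over all n pages
    probs = {p: (1 - beta) / n for p in graph}
    for p in neighbors:
        # (a) click branch: probability beta, spread evenly over the neighbor list
        probs[p] += beta / len(neighbors)
    return probs


graph2 = {"A": ["B", "C"], "B": ["C", "D", "E"], "C": ["A"],
          "D": ["C", "E"], "E": ["A"]}

probs = transition_probs(graph2, "A")
for page in sorted(probs):
    print(page, round(probs[page], 3))
```

From A, the walk reaches B or C with probability 0.425 + 0.03 = 0.455 each, and A, D or E with probability 0.03 each (only via a jump).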

```python
def simulate_pagerank(fname, walk_len=1000, N=1000, beta=0.85):
    """Main driver: read the graph with read_graph, run N random walks with
    random_walk, and print the relative frequency with which the walks end
    on each node (its estimated PageRank), in alphabetical order."""
    random.seed(1)  # fixed seed so the results below are reproducible
    graph = read_graph(fname)
    counter = {k: 0 for k in graph}
    for _ in range(N):
        # forward walk_len and beta so the caller's settings take effect
        node = random_walk(graph, walk_len=walk_len, beta=beta)
        counter[node] += 1
    for node in sorted(graph):
        print(node, counter[node] / N)
```

```python
simulate_pagerank("graph-1.txt", walk_len=1000, N=1000, beta=0.85)
```

```
A 0.379
B 0.206
C 0.37
D 0.045
```


```python
simulate_pagerank("graph-2.txt", walk_len=1000, N=1000, beta=0.85)
```

```
A 0.362
B 0.169
C 0.27
D 0.071
E 0.128
```
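As a sanity check on the simulated values, the same PageRank can be computed deterministically by power iteration: start from a uniform distribution and repeatedly apply one click/jump step to the whole distribution. This is a swapped-in technique, not part of the simulation above; `pagerank_power` is a hypothetical helper, and pages with no neighbors are assumed to jump uniformly.

```python
def pagerank_power(graph, beta=0.85, iters=100):
    """Hypothetical helper: PageRank by power iteration on an adjacency dict.
    Pages with no neighbors are assumed to jump to any page uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # jump branch: every page receives (1 - beta) / n
        new = {p: (1.0 - beta) / n for p in nodes}
        for q in nodes:
            out = graph[q]
            if out:
                # click branch: beta times q's rank, split over its neighbors
                share = beta * rank[q] / len(out)
                for p in out:
                    new[p] += share
            else:
                # page with no neighbors: its beta share goes to all pages
                for p in nodes:
                    new[p] += beta * rank[q] / n
        rank = new
    return rank


graph2 = {"A": ["B", "C"], "B": ["C", "D", "E"], "C": ["A"],
          "D": ["C", "E"], "E": ["A"]}

pr = pagerank_power(graph2)
for page in sorted(pr):
    print(page, round(pr[page], 3))
```

For graph-2 this gives roughly A 0.355, B 0.181, C 0.267, D 0.081 and E 0.116, each within about 0.015 of the simulated frequencies.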


```python
simulate_pagerank("wikipedia-example.txt", walk_len=1000, N=1000, beta=0.85)
```

```
A 0.033
B 0.387
C 0.331
D 0.037
E 0.094
F 0.042
G 0.024
H 0.01
I 0.013
J 0.016
K 0.013
```
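Each reported value is a sample proportion over N independent walks, so its sampling error can be estimated with the binomial standard error √(p(1 − p)/N). A short sketch, with `standard_error` as a hypothetical helper, applied to a few of the wikipedia-example values above:

```python
import math


def standard_error(p_hat, n_walks):
    """Hypothetical helper: binomial standard error of a proportion p_hat
    estimated from n_walks independent walks."""
    return math.sqrt(p_hat * (1 - p_hat) / n_walks)


# Estimates reported above for wikipedia-example.txt with N = 1000
estimates = {"B": 0.387, "C": 0.331, "E": 0.094, "H": 0.01}
for page, p_hat in estimates.items():
    print(page, p_hat, "+/-", round(standard_error(p_hat, 1000), 4))
```

With N = 1000, the larger values are only accurate to roughly ±0.015 (one standard error) and small ones such as H's to about ±0.003; since the error scales as 1/√N, a hundredfold increase in N shrinks it tenfold.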
