---

# PageRank Algorithm Implementation

## Problem Statement

Implement the PageRank algorithm. (Use Python or Beautiful Soup for the implementation.)


## Background

PageRank is a link analysis algorithm that was developed by Larry Page and Sergey Brin, the founders of Google. It is designed to rank web pages in a way that reflects their importance or influence based on the structure of the web, particularly the number and quality of links pointing to a page.

## Problem Description

The code provided implements the PageRank algorithm on a small 4-page web graph using the following steps:

1. **Web Graph Representation**: The web graph is represented as an adjacency matrix, where rows represent source pages, columns represent target pages, and a link from page i to page j is represented by a 1 in the matrix at (i, j).

2. **Calculating Out-Degree**: The out-degree of each page is calculated, which is the number of links originating from that page.

3. **Initializing PageRank**: Every page starts with the same PageRank value, 1/N, where N is the number of pages (0.25 for this 4-page graph).

4. **PageRank Damping Factor**: The damping factor is the probability that a web surfer follows a link from the current page; with the remaining probability the surfer jumps to a random page. In this example, the damping factor is set to 0.85.

5. **Iterative PageRank Calculation**: The PageRank values are updated iteratively, here for a fixed 10 iterations. At each iteration, a page's new value combines the contributions of the pages linking to it (each linking page's PageRank divided by its out-degree) with the random-jump term set by the damping factor.

6. **Final PageRank Values**: The final PageRank values are printed for each page.
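
Step 5 can be written compactly. As a sketch in notation assumed here (N pages, damping factor d, out-degree L(j) of page j, and iteration index t, none of which are named in the original text), the update applied by the code is:

$$
PR_{t+1}(i) = \frac{1 - d}{N} + d \sum_{j \,:\, j \to i} \frac{PR_t(j)}{L(j)}
$$

With N = 4 and d = 0.85, the constant term is 0.15 / 4 = 0.0375, and the sum runs over the pages j that link to page i.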

### Input

The input for this problem is a web graph represented as an adjacency matrix. Each cell in the matrix indicates whether there is a link from one page to another.
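
Link data is often available as a list of (source, target) pairs rather than a ready-made matrix. The following is a minimal sketch of the conversion, assuming 0-based page indices; the helper name `adjacency_from_edges` and the `example_edges` list are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

def adjacency_from_edges(edges, n_pages):
    """Build a 0/1 adjacency matrix from (source, target) pairs (0-based indices)."""
    matrix = np.zeros((n_pages, n_pages), dtype=int)
    for source, target in edges:
        matrix[source, target] = 1  # row = source page, column = target page
    return matrix

# Links of the 4-page example graph, written as (source, target) pairs
example_edges = [(0, 1), (0, 2), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
example_matrix = adjacency_from_edges(example_edges, n_pages=4)
print(example_matrix)
```

The resulting matrix matches the one defined in the first code cell.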

### Output

The output is a set of PageRank values for each page in the web graph. These values indicate the importance and influence of each page within the graph.

### Potential Improvements

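In practice, PageRank is applied to much larger web graphs. This implementation stores the graph as a dense matrix, runs a fixed number of iterations, and assumes every page has at least one outgoing link. For real-world datasets it could use a sparse representation, stop once the values change by less than a chosen tolerance, and handle pages with no outgoing links, which would otherwise lose rank at every iteration. Additionally, link quality and other factors can be incorporated into PageRank calculations for more accurate results.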
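
As one possible direction, the sketch below replaces the nested loops with a matrix-vector product and stops when the values stabilize. It is not the notebook's method: the function name `pagerank_power_iteration`, the `tol` and `max_iter` defaults, and the choice to spread the rank of pages without outgoing links uniformly over all pages are assumptions made for illustration.

```python
import numpy as np

def pagerank_power_iteration(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Vectorized PageRank sketch; `tol` and `max_iter` are illustrative defaults."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1)
    dangling = out_degree == 0  # pages with no outgoing links

    # Row-normalize so each page splits its rank evenly among its out-links.
    # Rows of dangling pages stay all zeros here.
    safe_degree = np.where(dangling, 1.0, out_degree)
    transition = adjacency / safe_degree[:, None]

    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Assumption: rank held by dangling pages is spread uniformly over all pages.
        dangling_mass = rank[dangling].sum() / n
        new_rank = (1 - damping) / n + damping * (transition.T @ rank + dangling_mass)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

example = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
ranks = pagerank_power_iteration(example)
print(np.round(ranks, 4))
```

On the 4-page example this converges to values close to the 10-iteration output shown later, with Page 3 still ranked highest. For very large graphs, a sparse matrix type would be the natural replacement for the dense array.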

---


In [1]:
import numpy as np
# Example web graph as an adjacency matrix.
# Rows represent source pages, columns represent target pages.
# A link from page i to page j is represented by a 1 in the matrix at (i, j).
adjacency_matrix = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0]
])

In [2]:
# Calculating out-degree of each page
out_degree = np.sum(adjacency_matrix, axis=1)

n_pages = len(adjacency_matrix)
initial_pagerank = np.ones(n_pages) / n_pages
damping_factor = 0.85

In [3]:
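pagerank = initial_pagerank.copy()
n_iterations = 10  # fixed iteration count; no convergence check is performed
for _ in range(n_iterations):
    new_pagerank = np.zeros(n_pages)
    for i in range(n_pages):        # target page
        for j in range(n_pages):    # candidate source page
            if adjacency_matrix[j, i] == 1:
                # Page j splits its rank evenly among its outgoing links
                new_pagerank[i] += pagerank[j] / out_degree[j]
    # Random-jump term (1 - d) / N plus the damped link contributions
    pagerank = (1 - damping_factor) * initial_pagerank + damping_factor * new_pagerank

for i, pr in enumerate(pagerank):
    print(f"Page {i + 1}: {pr:.4f}")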

Page 1: 0.1483
Page 2: 0.2674
Page 3: 0.3817
Page 4: 0.2026


1. **Page 1: 0.1483**
   - Page 1 has the lowest PageRank score.
   - Its only incoming link is from Page 2, which splits its rank between two outgoing links.

2. **Page 2: 0.2674**
   - Page 2 has a higher PageRank score than Page 1 and Page 4, indicating that it is considered more important than either.
   - Page 2 has incoming links from Page 1 and Page 3.

3. **Page 3: 0.3817**
   - Page 3 has the highest PageRank score among all the pages, making it the most important or influential page in the web graph.
   - Page 3 has incoming links from Page 1, Page 2, and Page 4.
   - Page 4 links only to Page 3, so it passes its entire link contribution to Page 3.

4. **Page 4: 0.2026**
   - Page 4's score is lower than those of Page 2 and Page 3 but higher than that of Page 1.
   - Page 4 has a single incoming link, from Page 3, which is the highest-ranked page.
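
The incoming links of each page can be read directly off the columns of the adjacency matrix. A small sketch, reusing the example matrix (the `incoming` dictionary is a name introduced here for illustration):

```python
import numpy as np

adjacency_matrix = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Column i holds the links pointing at page i, so its nonzero rows are the sources.
incoming = {
    target + 1: [int(source) + 1 for source in np.flatnonzero(adjacency_matrix[:, target])]
    for target in range(len(adjacency_matrix))
}
for page, sources in incoming.items():
    print(f"Page {page} <- {sources}")
# Page 1 <- [2]
# Page 2 <- [1, 3]
# Page 3 <- [1, 2, 4]
# Page 4 <- [3]
```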

In the context of the PageRank algorithm:

- Pages with higher PageRank scores are considered more important or influential in the web graph.
- The PageRank algorithm takes into account both the number of incoming links to a page and the importance of the linking pages. A page with high-quality incoming links will have a higher PageRank.
- The damping factor (0.85 in this example) models a surfer who follows a link with probability 0.85 and otherwise jumps to a uniformly random page. The jump term guarantees every page a baseline score of (1 - 0.85) / 4 = 0.0375, even if it has no incoming links.
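
To illustrate the last point, the sketch below runs the same update rule on a hypothetical 3-page graph (not the example above) in which page 0 has no incoming links. The graph, the iteration count of 50, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 3-page graph: page 0 has no incoming links.
adjacency = np.array([
    [0, 1, 0],  # page 0 -> page 1
    [0, 0, 1],  # page 1 -> page 2
    [0, 1, 0],  # page 2 -> page 1
])
damping = 0.85
n = len(adjacency)
out_degree = adjacency.sum(axis=1)

rank = np.full(n, 1.0 / n)
for _ in range(50):
    # Same update as the notebook: uniform jump term plus damped link contributions.
    rank = (1 - damping) / n + damping * (adjacency.T @ (rank / out_degree))

print(rank[0], (1 - damping) / n)
```

Page 0 ends up with exactly the jump term, (1 - d) / N, because no link contributes to it; with a damping factor of 1 its score would drop to zero.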

The four scores sum to 1, so they can be read as the probability of finding a random surfer on each page. Under this link structure, Page 3 is the most influential page in this example.