#  **PageRank**:  _The Math Behind Google Search_

---

# **Introduction**

## **Abstract**

**How does Google decide which web pages appear first in your search results?**

The answer is the **PageRank** algorithm, which Google Search uses to rank web pages in its search results. It uses ideas from graph theory, linear algebra, and probability to model the importance of web pages based on their links. Originally developed by Larry Page and Sergey Brin at Stanford, **PageRank** was the foundation of Google's early success.

This project explores and explains the **mathematics**, **code**, and **general idea** behind the algorithm, implements it step by step in **Python**, and visualizes how different web structures influence the final rankings. Along the way, I highlight the elegance of applying **eigenvectors**, **stochastic matrices**, and the **random surfer model** to real-world problems. By demystifying PageRank, we gain insight into the intersection of **math**, **data**, and the **internet**.


---

## **Why Is PageRank Important?**

PageRank was one of the first real-world algorithms to show how *mathematics and graph theory* can be used to navigate the complex structure of the web. It also demonstrates how pure math can have massive practical impact — affecting **billions of users** every day. Understanding **PageRank** gives us a deeper appreciation for applied **mathematics** in **data science**, **search engines**, **recommendation systems**, and **network analysis**.

---

## **The Problem of Web Search Before PageRank**

Before Google and the introduction of PageRank, web search engines struggled to deliver high-quality, relevant results. As the internet rapidly expanded during the 1990s, the number of web pages grew into the millions, then billions — but search technology failed to keep up with that growth in a meaningful way.

Most early search engines (like AltaVista, Lycos, and Excite) relied heavily on simple keyword matching to retrieve results. They would look for pages that contained the same words as the user's query, and rank them based on factors like:

- Frequency of the keyword
- Location of the keyword (e.g., title or body)
- Basic metadata

While this approach was easy to implement, it had major weaknesses:

- It was easy to manipulate (via keyword stuffing).
- It treated all pages as equally trustworthy or important.
- It didn't consider how humans signal the value of content: through references and links.

There was no good way to determine which pages were actually important or trustworthy. A personal blog and a university website could rank equally if they both mentioned the same keywords.

This meant that relevance and quality often suffered.
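To make the weakness concrete, here is a minimal sketch of ranking by raw keyword frequency. It is not how any particular search engine was implemented; the page names, page texts, and the helper functions `keyword_score` and `rank_by_keyword` are all made up for illustration.

```python
# Toy corpus: page names and texts are hypothetical.
pages = {
    "university.edu": "python tutorial covering loops functions and classes",
    "spam-blog.com": "python python python python buy cheap stuff python",
}


def keyword_score(text: str, query: str) -> int:
    """Count how many times the query word appears in the text."""
    return text.lower().split().count(query.lower())


def rank_by_keyword(pages: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Sort pages by raw keyword frequency, highest first."""
    scores = {name: keyword_score(text, query) for name, text in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_keyword(pages, "python")
print(ranking)
```

In this toy corpus the keyword-stuffed page comes out first simply because it repeats the query word more often, which is exactly the kind of manipulation described above.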

Search engines needed a way to:

- Filter spammy or low-quality pages
- Highlight content that was trusted or referenced by others
- Deliver more objective and useful results for users

This set the stage for a fundamentally new idea: using link structure to infer page importance — the idea behind PageRank.

---

## **History of PageRank**

**RankDex**, a search engine from IDD Information Services designed by **Robin Li**, was launched in 1996 with a strategy for site-scoring and page-ranking. It was one of the first search engines to use **link analysis**, ranking the popularity of a web site by how many other sites linked to it, so it considered hyperlink structure and not just keywords. **Li** filed a patent for the technology in **RankDex** in 1997, and it was granted in 1999. He later used it when he founded Baidu in China in 2000. Google founder **Larry Page** cited **Li's work** in some of his U.S. patents for **PageRank**.

**Larry Page** and **Sergey Brin** developed **PageRank** at Stanford University in 1996 as part of a research project about a new kind of search engine. An interview with Héctor García-Molina, Stanford Computer Science professor and advisor to Sergey, provides background on the development of the algorithm. Sergey Brin had the idea that information on the web could be ordered in a hierarchy by **link popularity**: a page ranks higher the more links point to it. The system was developed with the help of Scott Hassan and Alan Steremberg, both of whom Page and Brin cited as critical to the development of Google. Rajeev Motwani and Terry Winograd co-authored with Page and Brin the first paper about the project, published in 1998, which described **PageRank** and the initial prototype of the [Google search engine](https://en.wikipedia.org/wiki/Google_Search).
> 📜 [*"The Anatomy of a Large-Scale Hypertextual Web Search Engine"*](http://infolab.stanford.edu/pub/papers/google.pdf)

Shortly after, Page and Brin founded **Google**, the company behind the Google search engine. While just one of many factors that determine the ranking of Google search results, **PageRank** continues to provide the basis for all of Google's web-search tools. The name **PageRank** plays on the name of developer **Larry Page** as well as on the concept of a web page. The word is a trademark of Google, but the patent on the **PageRank** process is assigned to **Stanford University**, not to Google. Google holds exclusive license rights on the patent from **Stanford University**. The university received **1.8 million** shares of Google in exchange for use of the patent and sold them in 2005 for **$336 million**.

## 📜 Timeline: Evolution of Web Search and the Birth of PageRank

The history of web search is a story of innovation, trial and error, and breakthroughs. Below is a simplified timeline showing the key developments that led to the invention of the PageRank algorithm.

---

### 🕰️ Web Search Evolution Timeline

| Year | Milestone | Description |
|------|-----------|-------------|
| **1995** | **AltaVista** | One of the first popular full-text web search engines. Used keyword-based ranking. Powerful, but easily manipulated and not very relevant for many queries. |
| **1996** | **RankDex (Robin Li)** | Introduced the idea of ranking pages based on link analysis. RankDex used the link structure of the web as a signal of relevance—an idea that directly influenced PageRank. |
| **1998** | **PageRank (Page & Brin)** | Developed at Stanford, PageRank formalized a mathematical model using a random surfer and eigenvector centrality to rank web pages based on link authority. |
| **1998** | **Google Founded** | Built on the foundation of PageRank, Google launched with the promise of more relevant and authoritative search results. This approach soon revolutionized the web. |


---

## 🧾 Key Insights from “The Anatomy of a Large-Scale Hypertextual Web Search Engine” (1998)

As mentioned earlier, in 1998 Larry Page and Sergey Brin published a paper titled "**The Anatomy of a Large-Scale Hypertextual Web Search Engine.**" This work introduced **Google** to the world and laid the foundation for modern web search engines. At the heart of their prototype was a novel ranking algorithm called **PageRank**, which changed the way search results were ordered on the web.

**This section highlights the most important points from that foundational paper.**

In part of their abstract, they write:

> *“Apart from the problems of scaling
traditional search techniques to data of this magnitude, there are new technical challenges involved
with using the additional information present in hypertext to produce better search results. This
paper addresses this question of how to build a practical large-scale system which can exploit the
additional information present in hypertext. Also we look at the problem of how to effectively deal
with uncontrolled hypertext collections where anyone can publish anything they want.”*

This is how they introduce the project:
> *“
The web creates new challenges for information retrieval. The amount of information on the web is
growing rapidly, as well as the number of new users inexperienced in the art of web research. People are
likely to surf the web using its link graph, often starting with high quality human maintained indices
such as Yahoo! or with search engines. Human maintained lists cover popular topics effectively but are
subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics.
Automated search engines that rely on keyword matching usually return too many low quality matches.
To make matters worse, some advertisers attempt to gain people’s attention by taking measures meant to
mislead automated search engines. We have built a large-scale search engine which addresses many of
the problems of existing systems. It makes especially heavy use of the additional structure present in
hypertext to provide much higher quality search results. We chose our system name, Google, because it
is a common spelling of googol, or $10^{100}$ and fits well with our goal of building very large-scale search
engines.”*

On the challenges of scaling a search engine with the web, they write:
> *“Creating a search engine which scales even to today’s web presents many challenges. Fast crawling
technology is needed to gather the web documents and keep them up to date. Storage space must be used
efficiently to store indices and, optionally, the documents themselves. The indexing system must process
hundreds of gigabytes of data efficiently. Queries must be handled quickly, at a rate of hundreds to
thousands per second.
These tasks are becoming increasingly difficult as the Web grows. However, hardware performance and
cost have improved dramatically to partially offset the difficulty. There are, however, several notable
exceptions to this progress such as disk seek time and operating system robustness. In designing Google,
we have considered both the rate of growth of the Web and technological changes. Google is designed to
scale well to extremely large data sets. It makes efficient use of storage space to store the index. Its data
structures are optimized for fast and efficient access (see section 4.2). Further, we expect that the cost to
index and store text or HTML will eventually decline relative to the amount that will be available (see
Appendix B). This will result in favorable scaling properties for centralized systems like Google.”*

Here you can see their **Design Goals**:
> *“Our main goal is to improve the quality of web search engines. In 1994, some people believed that a
complete search index would make it possible to find anything easily. According to Best of the Web
1994 -- Navigators, "The best navigation service should make it easy to find almost anything on the
Web (once all the data is entered)." However, the Web of 1997 is quite different. Anyone who has used
a search engine recently, can readily testify that the completeness of the index is not the only factor in
the quality of search results. "Junk results" often wash out any results that a user is interested in. In fact,
as of November 1997, only one of the top four commercial search engines finds itself (returns its own
search page in response to its name in the top ten results). One of the main causes of this problem is that
the number of documents in the indices has been increasing by many orders of magnitude, but the user’s
ability to look at documents has not. People are still only willing to look at the first few tens of results.
Because of this, as the collection size grows, we need tools that have very high precision (number of
relevant documents returned, say in the top tens of results). Indeed, we want our notion of "relevant" to
only include the very best documents since there may be tens of thousands of slightly relevant
documents. This very high precision is important even at the expense of recall (the total number of
relevant documents the system is able to return). There is quite a bit of recent optimism that the use of
more hypertextual information can help improve search and other applications \[Marchiori 97\] \[Spertus
97\] \[Weiss 96\] \[Kleinberg 98\]. In particular, link structure \[Page 98\] and link text provide a lot of
information for making relevance judgments and quality filtering. Google makes use of both link
structure and anchor text (see Sections 2.1 and 2.2).
Another important design goal was to build systems that reasonable numbers of people can actually use.
Usage was important to us because we think some of the most interesting research will involve
leveraging the vast amount of usage data that is available from modern web systems. For example, there
are many tens of millions of searches performed every day. However, it is very difficult to get this data,
mainly because it is considered commercially valuable.
Our final design goal was to build an architecture that can support novel research activities on
large-scale web data. To support novel research uses, Google stores all of the actual documents it crawls
in compressed form. One of our main goals in designing Google was to set up an environment where
other researchers can come in quickly, process large chunks of the web, and produce interesting results
that would have been very difficult to produce otherwise. In the short time the system has been up, there
have already been several papers using databases generated by Google, and many others are underway.
Another goal we have is to set up a Spacelab-like environment where researchers or even students can
propose and do interesting experiments on our large-scale web data. 
> ”*





## **Why Was PageRank Revolutionary?**


As mentioned before, most search engines focused only on matching words, not on evaluating the importance of pages. PageRank introduced a completely new way of thinking about web pages. Instead of treating every page as equal, it looked at how other pages linked to it:

> A page is important if important pages link to it.
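
This recursive idea can be written as a formula. The version below is a simplified sketch that leaves out the damping term used by the full algorithm, and the symbols $PR$, $B_p$, and $L(q)$ are notation chosen here for illustration:

$$
PR(p) = \sum_{q \in B_p} \frac{PR(q)}{L(q)}
$$

Here $B_p$ is the set of pages that link to page $p$, and $L(q)$ is the number of outgoing links on page $q$. Each page splits its own score evenly among the pages it links to, so a link from a high-scoring page with few outgoing links is worth more than a link from a low-scoring page with many.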



The PageRank algorithm is based on the idea of a "random surfer", someone who starts on a web page and moves around the web at random:

- Sometimes they click a link on the page.
- Sometimes they get bored and jump to a random new page.

By simulating this behavior across the web, PageRank calculates the steady-state probability of landing on any given page. That value becomes the page’s ranking score.
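
The steady-state idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Google's implementation: the four-page graph `toy_web` is hypothetical, the function name `pagerank` and its parameters are chosen here, the probability of following a link is set to the commonly used value 0.85, and a page with no outgoing links is assumed to send the surfer to a random page.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Approximate the random surfer's steady-state distribution by repeated updates."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # the surfer starts on a uniformly random page

    for _ in range(max_iter):
        # Random-jump share: with probability (1 - damping) the surfer lands anywhere.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Link-following share is split evenly among the outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: from a page with no links, the surfer jumps to a random page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the scores have settled
            break
    return rank


# Hypothetical four-page web: A, B and D all link to C; C links back to A.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph, page C ends up with the highest score because three pages link to it, and page D, which nobody links to, keeps only the small share it receives from random jumps. The scores always sum to 1, since they describe where the surfer is likely to be.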

## 🌍 **Impact of the PageRank Algorithm**

The introduction of the PageRank algorithm by Larry Page and Sergey Brin was a turning point in the history of the internet. It didn’t just improve search — it transformed how we interact with information, and laid the foundation for many fields in modern computing.

**Points of impact**:


### 1. Revolutionized Search Engines
- **Authority-based ranking** 
- **Harder to manipulate**
- **Better user experience**

---

### 2. Academic & Scientific Influence
- It sparked research in **graph theory**, **stochastic processes**, and **network science**.
- It’s now a standard part of **information retrieval**, **machine learning**, and **data science** curricula.
- It inspired related algorithms based on **eigenvector centrality**, **random walks**, and **spectral graph theory**.

Its mathematical elegance and practical power made it a landmark algorithm.

---

### 3. Birth of the Attention Economy
- Webmasters and creators began optimizing their pages for **links from authoritative sites**.
- **Search engine optimization (SEO)** became a major industry, focused on visibility and link-building.
- This contributed to the rise of the **attention economy**, where visibility equates to value.

---

###  4. Foundation for Ranking Systems Everywhere
PageRank’s influence spread far beyond web search:

- **Recommendation systems**: Early **YouTube** and **Netflix** systems used similar link- or graph-based models.
- **Scientific publishing**: Metrics like **Eigenfactor** and **Article Influence Score** borrow from PageRank.
- **Social media**: Ranking tweets, posts, or influencers often involves centrality and link-based algorithms.
- **E-commerce**: Product rankings, seller reputations, and trust scores all use variants of this idea.

Any domain with a network structure benefits from PageRank-style reasoning.

---

###  5. The Rise of Google as a Tech Giant

At the business level, PageRank changed everything:

- It turned Google from a **PhD research project** into a **global company**.
- It gave Google a technological advantage that allowed it to scale and monetize effectively.
- Its commitment to **relevance over paid placement** earned it widespread trust and adoption.