README - first commit April 15, 2012
Search_engine.py - first commit April 15, 2012
README
Hello Everyone, I am @dileep98490

This is a simple search engine that I implemented in Python. Thanks to Udacity, which helped me build it. I blogged about my experiences building it here:
http://buildsearchengine.blogspot.in/

This is a Python console program. When it prompts you, provide the following inputs:

Seed Page - This is the page from which the program starts crawling the web. Give the URL of a good seed page with plenty of links on it, so that the crawler can follow them to other pages and then crawl onward from those pages.

eg:http://opencvuser.blogspot.in

Search term - The term you want to search for. I will soon add support for multi-word queries, but for now, give a single word.

eg:is

Maximum depth - This is the number of links to crawl completely. It takes about 30 seconds for the first link and 60 seconds for the second, and the time keeps doubling. So a maximum of 10 links is more than enough, I would say.

eg:10
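
To illustrate how a single-word search term can be answered, here is a minimal sketch of an inverted index in Python 3. The names add_page_to_index and look_up, and the example.com URLs, are assumptions made for this illustration; they are not taken from Search_engine.py.

```python
import re

def add_page_to_index(index, url, content):
    """Record url under every distinct word found in content."""
    # Lowercase the text so that lookups are case-insensitive.
    for word in set(re.findall(r"\w+", content.lower())):
        index.setdefault(word, []).append(url)

def look_up(index, keyword):
    """Return the URLs whose text contained the single word keyword."""
    return index.get(keyword.lower(), [])

# Hypothetical pages, used only to exercise the two functions.
index = {}
add_page_to_index(index, "http://example.com/a", "This is a page")
add_page_to_index(index, "http://example.com/b", "Nothing here")
print(look_up(index, "is"))
```

Because each word maps directly to a list of URLs, a one-word query is a single dictionary lookup. Multi-word queries would need extra logic on top of this.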

After you give the above three inputs, the program starts running. Crawling may take a long time, depending on the depth you specified. The depth number is printed and decrements each time a link is completely crawled, so when it reaches 0 you know the crawling has ended.
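
The crawl and its countdown can be sketched roughly as follows in Python 3. This is only an illustration under stated assumptions: the names LinkCollector, crawl, fetch and FAKE_WEB are hypothetical, and a small in-memory dictionary stands in for real HTTP requests so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed, max_pages, fetch):
    """Crawl breadth-first from seed until max_pages pages are done."""
    to_crawl, crawled, graph = [seed], [], {}
    remaining = max_pages
    while to_crawl and remaining > 0:
        url = to_crawl.pop(0)
        if url in crawled:
            continue  # already visited, does not use up the budget
        parser = LinkCollector(url)
        parser.feed(fetch(url))
        graph[url] = parser.links      # outgoing links, useful for ranking
        crawled.append(url)
        to_crawl.extend(parser.links)
        remaining -= 1
        print(remaining)               # countdown: 0 means crawling ended
    return crawled, graph

# Hypothetical three-page "web" used instead of real network requests.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">home</a>',
}

def fake_fetch(url):
    return FAKE_WEB.get(url, "")

crawled, graph = crawl("http://example.com/", 2, fake_fetch)
```

Passing fetch in as a parameter keeps the sketch testable. A real crawler would replace fake_fetch with a function that downloads the page.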

I also used the PageRank algorithm (the compute_ranks module), which is what Google used in its early days. The page ranks are displayed alongside the links after the search results are shown. The program then sorts the results by rank and presents the sorted list. I print all of this for the sake of visibility, but you can comment out the print statements in the Look_up_new module to remove them.
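
For readers unfamiliar with the ranking step, here is a minimal Python 3 sketch of an iterative PageRank computation followed by sorting. It is not the code from Search_engine.py: the damping factor of 0.8, the 10 iterations, the helper name ranked_results and the three-page graph are all assumptions chosen for the illustration.

```python
def compute_ranks(graph, damping=0.8, num_loops=10):
    """Iteratively estimate a rank for each page in graph.

    graph maps each page to the list of pages it links to.
    damping and num_loops are assumed values for this sketch.
    """
    npages = len(graph)
    ranks = {page: 1.0 / npages for page in graph}
    for _ in range(num_loops):
        new_ranks = {}
        for page in graph:
            # Base share every page gets, regardless of incoming links.
            new_rank = (1 - damping) / npages
            # Add a share from each page that links to this one.
            for node, outlinks in graph.items():
                if page in outlinks:
                    new_rank += damping * ranks[node] / len(outlinks)
            new_ranks[page] = new_rank
        ranks = new_ranks
    return ranks

def ranked_results(urls, ranks):
    """Sort search results so the highest-ranked page comes first."""
    return sorted(urls, key=lambda url: ranks.get(url, 0.0), reverse=True)

# Hypothetical link graph: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = compute_ranks(graph)
ranked = ranked_results(["a", "b", "c"], ranks)
for url in ranked:
    print(url, round(ranks[url], 4))
```

In this small graph, page b has only one incoming link, which carries half of a's rank, so it ends up last in the sorted results.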