
Search engine with all features of the famous search engines


EngPeterAtef/SearchEngine


Noodle

Description

Crawler-based search engine that demonstrates the main features of a search engine (web crawling, indexing and ranking) and the interaction between them.

How does a search engine work?

  • Web Crawler (AKA Spider, Robot)
      • A software program that traverses web pages, downloads them, and follows the hyperlinks referenced on those pages.
  • Indexer
      • Processes the HTML documents downloaded by the crawler and builds a data structure (an inverted file) that stores the words contained in each document along with their importance.
  • Ranker
      • Sorts documents by their popularity and their relevance to the search query.
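
The indexer and ranker roles described above can be illustrated with a minimal sketch. It assumes an in-memory inverted file and ranks by raw term frequency only; the class `InvertedIndexSketch` and its methods are hypothetical and are not the classes used in this repository, whose ranker also takes popularity into account.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the repository's Indexer or Ranker.
class InvertedIndexSketch {
    // word -> (document id -> number of occurrences of the word in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Indexer role: tokenize the text and record each word under its document id.
    void addDocument(String docId, String text) {
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token when the text starts with a separator
            }
            postings.computeIfAbsent(token, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Ranker role (simplified): documents containing the word, most occurrences first.
    // Ties are broken alphabetically by document id so the order is deterministic.
    List<String> search(String word) {
        Map<String, Integer> hits =
                postings.getOrDefault(word.toLowerCase(), Collections.emptyMap());
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ids;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("a", "Barcelona team wins; Barcelona fans cheer");
        index.addDocument("b", "The team from Barcelona");
        System.out.println(index.search("barcelona"));
    }
}
```

Looking up a word is a single map access, which is the reason for storing words as keys rather than scanning every document at query time.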

Features

  • Voice recognition search
  • Suggestion mechanism that autocompletes your search query
  • Single-word and multi-word searching
  • Phrase searching
  • Results appear with text snippets containing the query words
  • Pagination of results
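
As a rough sketch of the last two features, the code below cuts a snippet around the first occurrence of a query word and slices a result list into pages. The class `ResultPresentationSketch`, its method names, and the `radius` and `pageSize` parameters are assumptions made for illustration; the website's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the website's actual code.
class ResultPresentationSketch {
    // Returns the text around the first case-insensitive occurrence of word,
    // keeping up to `radius` characters on each side, or "" if the word is absent.
    static String snippet(String text, String word, int radius) {
        if (word.isEmpty()) {
            return "";
        }
        for (int i = 0; i + word.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, word, 0, word.length())) {
                int start = Math.max(0, i - radius);
                int end = Math.min(text.length(), i + word.length() + radius);
                return text.substring(start, end);
            }
        }
        return "";
    }

    // Returns the results for a 1-based page number, or an empty list when out of range.
    static <T> List<T> page(List<T> results, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            return Collections.emptyList();
        }
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= results.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, results.size());
        return new ArrayList<>(results.subList((int) from, to));
    }

    public static void main(String[] args) {
        System.out.println(snippet("The quick brown fox jumps", "BROWN", 6));
        System.out.println(page(Arrays.asList("r1", "r2", "r3", "r4", "r5"), 2, 2));
    }
}
```

Because the snippet is cut from the original text rather than from a lowercased copy, it keeps the original casing.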

Implementation

The main search engine modules (Crawler, Indexer, Ranker) are implemented entirely in Java and built with Gradle.
The website is implemented with the Java Spring framework and the Thymeleaf template engine, so the whole project is written in Java.
The repo also contains another version of the website written in NodeJS, but it has not been updated to our latest ranking algorithms due to time limitations. This version was not required; we were just practising NodeJS :)

How To Run

Crawler / Indexer:
  1. Open "Engine" as a project folder using your IDE.
  2. Open build.gradle and click "load gradle changes".
  3. To run the Crawler, run the main function in Crawler.java.
  4. To run the Indexer, run the main function in Indexer.java.
Website:
  1. Open Website/Spring as a project folder using your IDE.
  2. Run the main function in Application.java
  3. Go to localhost:8080/ and start searching!

Limitations

  • The order of words in the query doesn't affect the result, e.g. (barcelona team) is equal to (team barcelona).
  • Search is slow when the query contains multiple words and all the query words exist in a large number of pages.
  • All snippets are shown in lowercase.
  • Snippets for multiple-word queries may not work properly.
  • Snippets may not contain the query word.
  • Complex phrase searching is not supported, e.g. ("Mark Zuckerberg" "Twilight Saga") is treated as ("Mark Zuckerberg Twilight Saga").
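
Two of these limitations concern how the query is parsed, and the sketch below illustrates them. It assumes a query is reduced to a set of words (which is why word order has no effect), and shows one possible way to split a query into separate quoted phrases. The class `QueryParsingSketch` and its methods are hypothetical; the phrase splitting is not implemented in this repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the repository's query processing code.
class QueryParsingSketch {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Treating a query as a set of words discards word order,
    // so "barcelona team" and "team barcelona" become the same query.
    static Set<String> asWordSet(String query) {
        Set<String> words = new TreeSet<>(Arrays.asList(query.trim().toLowerCase().split("\\s+")));
        words.remove(""); // an empty query would otherwise yield one empty word
        return words;
    }

    // Extracts each double-quoted phrase separately, in the order it appears.
    static List<String> splitPhrases(String query) {
        List<String> phrases = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(query);
        while (matcher.find()) {
            phrases.add(matcher.group(1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        System.out.println(asWordSet("barcelona team").equals(asWordSet("team barcelona")));
        System.out.println(splitPhrases("\"Mark Zuckerberg\" \"Twilight Saga\""));
    }
}
```

With phrases kept separate, each one could be matched on its own and the results intersected, instead of searching for the four words as a single phrase.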

Authors

  • Bemoi Erian
  • Mark Yasser
  • Peter Atef
  • Doaa Ashraf
