alaouimehdi1995/simplified-search-engine

Simplified Searching Engine

A simplified search engine that crawls and scrapes web pages, indexes their data, and stores it in a database.

The program is written in Python. It uses regular expressions to parse HTML and multithreading to speed up crawling. Storage is handled by MongoDB. The project contains 4 files:

PersonnalParser.py:

- Contains the PersonnalParser class, which fetches a page's HTML content, parses it, stores it, and starts a new PersonnalParser thread for each link found in the page.
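The crawl step described above can be sketched roughly as follows, assuming Python 3. This is an illustration, not the project's code: `LinkCrawler`, `extract_links`, `LINK_RE`, and the `fetch`/`store` callables are hypothetical names, and the regular expression only handles quoted `href` attributes.

```python
import re
import threading
from urllib.parse import urljoin

# Hypothetical pattern: captures href values in single or double quotes.
LINK_RE = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute URLs for every quoted <a href=...> found in html."""
    return [urljoin(base_url, href) for href in LINK_RE.findall(html)]


class LinkCrawler(threading.Thread):
    """Illustrative stand-in for PersonnalParser (assumed design)."""

    def __init__(self, url, depth, fetch, store, seen, lock):
        super().__init__()
        self.url = url
        self.depth = depth
        self.fetch = fetch  # callable: url -> HTML string
        self.store = store  # callable: (url, html) -> None
        self.seen = seen    # shared set of URLs already claimed
        self.lock = lock    # guards access to seen

    def run(self):
        html = self.fetch(self.url)
        self.store(self.url, html)
        if self.depth <= 0:
            return
        children = []
        for link in extract_links(html, self.url):
            with self.lock:
                if link in self.seen:
                    continue  # another thread already took this URL
                self.seen.add(link)
            child = LinkCrawler(link, self.depth - 1, self.fetch,
                                self.store, self.seen, self.lock)
            child.start()
            children.append(child)
        for child in children:
            child.join()
```

One thread per link is simple but can spawn a very large number of threads on link-heavy pages; a bounded worker pool would be a common alternative.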

DBManager.py:

- Contains the DBManager class, which handles the connection to the database and the insert and find operations.
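A minimal sketch of such a wrapper, assuming the `pymongo` driver and a document shape of `url`, `title`, and `text`. The class name `PageStore`, the `connect` helper, and the database and collection names are hypothetical; the project's actual schema may differ.

```python
import re


class PageStore:
    """Illustrative stand-in for DBManager; wraps a pymongo-style collection."""

    def __init__(self, collection):
        self.collection = collection

    def save_page(self, url, title, text):
        # Upsert so that re-crawling a URL overwrites rather than duplicates.
        self.collection.update_one(
            {"url": url},
            {"$set": {"title": title, "text": text}},
            upsert=True,
        )

    def find_pages(self, word):
        # Case-insensitive substring match on the stored page text.
        query = {"text": {"$regex": re.escape(word), "$options": "i"}}
        return list(self.collection.find(query))


def connect(uri="mongodb://localhost:27017", db="search_engine", coll="pages"):
    """Build a PageStore on a real MongoDB collection (names are assumed)."""
    # Imported lazily so PageStore can be exercised without pymongo installed.
    from pymongo import MongoClient
    return PageStore(MongoClient(uri)[db][coll])
```

Passing the collection into the constructor keeps the class easy to test with an in-memory fake.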

fill_database.py:

- Contains the general settings, such as the start URL, the proxy settings, and the search depth. The first crawl thread starts here.
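For illustration, settings of this kind might look like the sketch below, which assumes Python 3 and the standard `urllib.request` module. The constant names, the proxy address, and the `make_opener`/`fetch` helpers are placeholders, not the values or functions used by the project.

```python
import urllib.request

# Hypothetical settings; names and values are placeholders.
START_URL = "http://example.com/"
MAX_DEPTH = 2
PROXIES = {"http": "http://127.0.0.1:3128", "https": "http://127.0.0.1:3128"}


def make_opener(proxies):
    """Return a urllib opener; an empty dict disables proxying."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch(opener, url, timeout=10):
    """Download url and decode the body leniently as UTF-8."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

The first crawl thread would then be started with `START_URL` and `MAX_DEPTH`, using `fetch` to download each page.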

main.py:

- Contains the code that reads the user's search query, fetches the database content, and sorts the results by relevance.
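The README does not say how relevance is computed. As one possible sketch, the example below scores each page by how many times the query terms occur in its text and sorts in descending order. The `score` and `rank` functions and the `text` field are assumptions made for illustration.

```python
import re


def score(document_text, query):
    """Count how often each query term occurs as a whole word in the text."""
    words = re.findall(r"\w+", document_text.lower())
    terms = re.findall(r"\w+", query.lower())
    return sum(words.count(term) for term in terms)


def rank(pages, query):
    """Return pages with at least one match, most relevant first."""
    scored = [(score(page["text"], query), page) for page in pages]
    matches = [(s, page) for s, page in scored if s > 0]
    # sort() is stable, so pages with equal scores keep their input order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [page for _, page in matches]
```

A raw term count favors long pages; weighting schemes such as TF-IDF are a common refinement.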