Textbook example of a search engine running on a simple server.

MrPekar98/Simple-Search-Engine

Simple-search-engine

Instructions

Clone the repository with

git clone https://github.com/MrPekar98/Simple-Search-Engine.git

Prerequisites

Install Curl

sudo apt install curl -y && sudo apt install libcurl4-gnutls-dev

Install Microsoft C++ REST SDK

sudo apt-get install libcpprest-dev

To install on a non-Debian platform, see the Getting Started section of the C++ REST SDK README.

Run the following command to install necessary build tools

sudo apt-get install g++ make cmake git libboost-atomic-dev libboost-thread-dev libboost-system-dev libboost-date-time-dev libboost-regex-dev libboost-filesystem-dev libboost-random-dev libboost-chrono-dev libboost-serialization-dev libwebsocketpp-dev openssl libssl-dev ninja-build

Clone the C++ REST SDK repository

git clone https://github.com/Microsoft/cpprestsdk.git casablanca

Run the following commands to build and install the SDK (pass -DCMAKE_BUILD_TYPE=Release instead to build a release version)

cd casablanca
mkdir build.debug
cd build.debug
cmake -G Ninja .. -DCMAKE_BUILD_TYPE=Debug
ninja
sudo ninja install

Compile

To compile the project, simply run the command

make

This builds an executable named search in the project root.

Note that you can specify the crawler seed set in src/config.hpp. Choosing a set of web pages with a high out-degree is recommended.
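Out-degree here means the number of outgoing links on a page. One rough way to compare candidate seed pages is to count the distinct absolute links in each page's HTML. The following is only a sketch: the function name count_outlinks and the example.com URL are illustrative and not part of this project, and the pattern only recognizes double-quoted href attributes that start with http:// or https://.

```shell
# count_outlinks: read HTML on stdin and print the number of distinct
# absolute http(s) links found in double-quoted href attributes.
# Rough estimate only: relative links and single-quoted hrefs are ignored.
count_outlinks() {
  grep -oE 'href="https?://[^"]+"' | sort -u | wc -l
}

# Usage (example.com is a placeholder candidate seed):
#   curl --silent --location https://example.com/ | count_outlinks
```

Pages that report a larger count give the crawler more links to follow from the start.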

Docker

Alternatively, build the Docker image

docker build -t search .

Run the Docker container

docker run --name search -p <PORT>:<PORT> search

Set <PORT> to the port number specified in src/config.hpp. Add the flag -d to detach from the process.

Communicating With the Search Engine

A simple web page with a search bar is provided. Alternatively, send a POST request with the search keyword in the request body. The search result is returned as plain text listing the relevant document titles and their URLs.
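A POST request like the one described above could be sent with curl. This is a minimal sketch under several assumptions that the README does not confirm: the host is localhost, the request path is /, the Content-Type is text/plain, and 8080 is only a placeholder for the port configured in src/config.hpp. The function name search and the SEARCH_HOST and SEARCH_PORT variables are illustrative.

```shell
# Assumed values (not from the README): host, port, and request path.
# Set SEARCH_PORT to the port configured in src/config.hpp.
SEARCH_HOST="${SEARCH_HOST:-localhost}"
SEARCH_PORT="${SEARCH_PORT:-8080}"   # 8080 is a placeholder

# search KEYWORD: POST the keyword as the raw request body and print
# the plain-text response returned by the server.
search() {
  curl --silent --show-error \
       --header 'Content-Type: text/plain' \
       --data-raw "$1" \
       "http://${SEARCH_HOST}:${SEARCH_PORT}/"
}
```

With the server running, a call such as search "graph algorithms" should print the matching titles and URLs. The --data-raw option is used so that a keyword starting with @ is sent literally rather than read from a file.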