This project allows users to search technical blogs!
There are three core modules:
- crawler.py: a web crawler that uses the requests module to crawl entire websites, starting from a seed file in db/seed.yaml
- indexer.py: an indexer that provides functions to store websites in an SQLite database with efficient indexing
- query.py: a query module that, given a search term, retrieves and ranks websites by three heuristics (frequency, location, distance)
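The three ranking heuristics can be illustrated with a minimal, self-contained sketch. This is not the code in query.py: the function names, the equal weighting of the three scores, the `(1 + best) / (1 + value)` normalization, and the example.com URLs are all assumptions made for illustration. It works on in-memory strings rather than the SQLite index.

```python
from itertools import product


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def raw_scores(tokens, terms):
    """Return (frequency, location, distance) for one document.

    Returns None when any query term is missing from the document.
    """
    hits = [[i for i, tok in enumerate(tokens) if tok == t] for t in terms]
    if not all(hits):
        return None
    frequency = sum(len(h) for h in hits)  # more occurrences is better
    location = sum(h[0] for h in hits)     # earlier first occurrence is better
    # Smallest total gap between consecutive query terms (lower is better).
    # product() is exponential in the number of terms; fine for a sketch.
    distance = min(
        sum(abs(b - a) for a, b in zip(combo, combo[1:]))
        for combo in product(*hits)
    )
    return frequency, location, distance


def normalize(values, small_is_better=False):
    """Scale raw scores so the best document gets 1.0."""
    if small_is_better:
        best = min(values)
        return [(1 + best) / (1 + v) for v in values]
    best = max(values) or 1
    return [v / best for v in values]


def rank(docs, query):
    """Return the URLs of matching documents, best match first."""
    terms = tokenize(query)
    scored = {}
    for url, text in docs.items():
        raw = raw_scores(tokenize(text), terms)
        if raw is not None:
            scored[url] = raw
    if not scored:
        return []
    urls = list(scored)
    freq = normalize([scored[u][0] for u in urls])
    loc = normalize([scored[u][1] for u in urls], small_is_better=True)
    dist = normalize([scored[u][2] for u in urls], small_is_better=True)
    # Assumption: the three heuristics are weighted equally.
    totals = {u: f + l + d for u, f, l, d in zip(urls, freq, loc, dist)}
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical documents keyed by URL.
docs = {
    "https://example.com/a": "sqlite indexing tips for fast sqlite indexing",
    "https://example.com/b": "a long intro before sqlite and much later some notes on indexing",
    "https://example.com/c": "unrelated post about css",
}
print(rank(docs, "sqlite indexing"))
```

In this sketch, page `a` outranks page `b` because the query terms appear more often, earlier, and closer together; page `c` is dropped because it contains neither term.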
A Flask application in server.py serves a React application that allows users to interact with the project.
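As a rough sketch of how a Flask app can serve a compiled React bundle alongside a search endpoint, consider the following. It is not the contents of server.py: the `create_app` helper, the `/api/search` route, the `q` parameter, and the build directory are hypothetical, and the search handler returns a placeholder instead of calling query.py.

```python
import tempfile
from pathlib import Path

from flask import Flask, jsonify, request, send_from_directory


def create_app(build_dir):
    """build_dir: hypothetical folder holding the compiled React bundle."""
    # Disable Flask's default /static route; files are served explicitly below.
    app = Flask(__name__, static_folder=None)

    @app.route("/api/search")
    def search():
        term = request.args.get("q", "")
        # Placeholder: a real handler would look the term up via query.py.
        return jsonify({"query": term, "results": []})

    @app.route("/")
    def index():
        # Serve the single-page app's entry point.
        return send_from_directory(build_dir, "index.html")

    return app


# Demo with a throwaway build directory so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "index.html").write_text("<div id='root'></div>")
app = create_app(demo_dir)
client = app.test_client()
print(client.get("/api/search?q=sqlite").get_json())
```

Keeping the API under a separate prefix such as `/api` is one common way to stop backend routes from colliding with paths handled by the React front end.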
Setup:
sudo ufw enable
sudo ufw allow 8080
sudo apt update
sudo apt install gunicorn
sudo apt install python3-pip
pip3 install -r requirements.txt
bash setup.sh
Run:
bash start.sh