Jspa2/netquery

Netquery

A search engine and web crawler written in Python.

Features

  • Automatic web crawler
  • SQLite index
  • Automatic keyword extraction
  • Automatic description snippet generation
  • Keyword lemmatisation
  • Frontend search engine
  • PageRank algorithm
  • robots.txt caching and compliance
  • Robots meta tag compliance
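As an illustration of the robots.txt caching and compliance feature listed above, here is a minimal sketch using the standard library's `urllib.robotparser`. It is not taken from Netquery's source: the class name `RobotsCache`, the `fetch` callable and the `ExampleBot` user agent are all assumptions made for this example.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # hypothetical user agent name


class RobotsCache:
    """Keeps one parsed robots.txt per host so it is fetched only once."""

    def __init__(self, fetch):
        # `fetch` is a hypothetical callable: robots.txt URL -> file body as text
        self._fetch = fetch
        self._parsers = {}

    def allowed(self, url, user_agent=USER_AGENT):
        parts = urlsplit(url)
        host = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(host)
        if parser is None:
            # First visit to this host: download and parse its robots.txt
            parser = RobotFileParser()
            parser.parse(self._fetch(host + "/robots.txt").splitlines())
            self._parsers[host] = parser
        return parser.can_fetch(user_agent, url)
```

A crawler built this way would call `allowed(url)` before every request and skip any URL for which it returns `False`. Passing the download function in as `fetch` keeps the cache independent of any particular HTTP library.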

Running

Install the requirements: pip install -r requirements.txt

Netquery consists of two components: the crawler and the search engine. For the search engine to work properly, you must first run the crawler for a few hours to generate the index. This may require some manual fine-tuning of the constants defined in crawler/crawler.py.

Additionally, ensure that you have authorisation from your network administrator and enough free disk space: the PageRank index can take up a few gigabytes.
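For background on the PageRank index mentioned above, the sketch below shows a generic power-iteration PageRank over a small in-memory link graph. It is not Netquery's implementation: the function name `pagerank`, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration. Since a score is kept for every page and a link list for every crawled page, storage grows with the size of the crawl.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank per page; `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages
        dangling = sum(ranks[page] for page in pages if not links.get(page))
        new_ranks = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        for page, targets in links.items():
            if targets:
                # Each page splits its damped rank evenly among its links
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks
```

Called as `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})`, it returns scores that sum to 1, with `c` ranked above `b` because `c` receives links from two pages.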

The crawler can run at the same time as the search engine.

To run the search engine: python app.py

To run the crawler: python crawler/crawler.py

Licence

This project is licensed under the GNU Affero General Public License.
