searchEngine

AUTHORS:

Aliya Valieva 

Haoyuan Bai

LANGUAGE:

Python 3.9

DESCRIPTION:

The project consists of two parts: Indexer.py and SearchEngine.py.
Once you run SearchEngine.py, the constructed inverted index, DOCINDEX.db, should appear.
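As a rough illustration of what an inverted index stored in SQLite can look like, here is a minimal sketch. The `postings` table, its columns, and the `build_index` and `lookup` helpers are assumptions made for this example and are not taken from Indexer.py; tokenization is a plain whitespace split.

```python
import sqlite3
from collections import Counter


def build_index(docs, db_path=":memory:"):
    """Store per-document term counts in a hypothetical `postings` table.

    `docs` maps a URL to its plain text. The schema is an assumption,
    not the actual layout of DOCINDEX.db.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "term TEXT NOT NULL, url TEXT NOT NULL, tf INTEGER NOT NULL, "
        "PRIMARY KEY (term, url))"
    )
    for url, text in docs.items():
        # Whitespace tokenization stands in for real lemmatization here.
        counts = Counter(text.lower().split())
        conn.executemany(
            "INSERT OR REPLACE INTO postings (term, url, tf) VALUES (?, ?, ?)",
            [(term, url, tf) for term, tf in counts.items()],
        )
    conn.commit()
    return conn


def lookup(conn, term):
    """Return (url, term count) pairs for every document containing `term`."""
    rows = conn.execute(
        "SELECT url, tf FROM postings WHERE term = ? ORDER BY url",
        (term.lower(),),
    )
    return rows.fetchall()
```

Passing a file name such as "DOCINDEX.db" as `db_path` would write the table to disk, where it can be opened in a SQLite viewer.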


Indexer.py lemmatizes words, calculates TF-IDF scores, and handles HTML parsing. SearchEngine.py is the main module to run; it connects to the index database and retrieves 20 URLs for each query.
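To make the scoring and retrieval steps concrete, the sketch below computes TF-IDF weights and returns the top 20 URLs for a query. It is an assumption-laden illustration, not the project's code: it uses whitespace tokenization instead of spaCy lemmatization, one common weighting (`tf * log(N / df)`), in-memory dictionaries instead of the database, and the hypothetical names `tf_idf_index` and `search`.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplification: lowercase whitespace split instead of lemmatization.
    return text.lower().split()


def tf_idf_index(docs):
    """Return {term: {url: score}} with score = tf * log(N / df)."""
    term_counts = {url: Counter(tokenize(text)) for url, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    index = {}
    for url, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[term])
            index.setdefault(term, {})[url] = tf * idf
    return index


def search(index, query, limit=20):
    """Sum the TF-IDF scores of the query terms and return the best URLs."""
    scores = Counter()
    for term in tokenize(query):
        for url, score in index.get(term, {}).items():
            scores[url] += score
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:limit]]
```

The `limit=20` default mirrors the 20 URLs returned per query; a term that occurs in every document gets a weight of zero under this formula.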

Statistics are collected with SQL through SQLite.
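The README does not say which statistics are collected, so the following is only a guess at the kind of query involved. It assumes a hypothetical `postings(term, url, tf)` table and counts documents, unique terms, and postings with plain SQL aggregates.

```python
import sqlite3


def index_stats(conn):
    """Summarize a hypothetical postings(term, url, tf) table."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT url), COUNT(DISTINCT term), COUNT(*) "
        "FROM postings"
    ).fetchone()
    return {"documents": row[0], "unique_terms": row[1], "postings": row[2]}


def demo_connection():
    """Build a tiny in-memory table so the query can be tried out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postings (term TEXT, url TEXT, tf INTEGER)")
    conn.executemany(
        "INSERT INTO postings VALUES (?, ?, ?)",
        [
            ("search", "http://a.example/", 2),
            ("engine", "http://a.example/", 1),
            ("engine", "http://b.example/", 1),
            ("index", "http://b.example/", 1),
        ],
    )
    return conn
```

The same SELECT statement can be run directly in a SQLite viewer against a table with these columns.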

LIBRARIES:

pip install spacy parser bs4 lemmatizer Flask
python -m spacy download en_core_web_sm

sqlite3 is part of the Python standard library and does not need to be installed with pip.

Please download DB Browser for SQLite for a more user-friendly way to view DOCINDEX.db.

Running on: http://127.0.0.1:5000/
	(sample interface with 20 fetched URLs)

[Screenshot of the sample interface]
