A simple search engine that has been implemented for the Information Retrieval course

KooroshRH/Search-Engine

Search Engine

This project comprises three parts: Indexer, Tokenizer, and QueryProcessor. It also uses a helper named FileWorker to load the dataset and to save checkpoints for the indexer and tokenizer.

Method

On the QueryProcessor side, we use TF-IDF weighting to process each user query. To measure the similarity between the query and each document's representation, we use cosine similarity in the vector space.
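The ranking idea described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name TfIdfSketch, its method names, the raw-count term frequency, and the log(N / df) form of IDF are all assumptions made for the example, and the toy documents are pre-tokenized English words rather than Persian text.

```java
import java.util.*;

// Hypothetical sketch of TF-IDF weighting plus cosine similarity.
// None of these names come from the repository.
class TfIdfSketch {

    // Inverse document frequency: log(N / df) for every term in the corpus.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    // Sparse TF-IDF vector: (raw term count) * idf, built by adding idf once per occurrence.
    // Terms unknown to the corpus are ignored.
    static Map<String, Double> tfIdf(List<String> tokens, Map<String, Double> idf) {
        Map<String, Double> vector = new HashMap<>();
        for (String term : tokens) {
            Double weight = idf.get(term);
            if (weight != null) {
                vector.merge(term, weight, Double::sum);
            }
        }
        return vector;
    }

    // Cosine similarity between two sparse vectors; 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy corpus, already tokenized.
        List<List<String>> docs = List.of(
                List.of("search", "engine", "index"),
                List.of("persian", "text", "tokenizer"),
                List.of("search", "query", "ranking"));
        Map<String, Double> idf = idf(docs);
        Map<String, Double> query = tfIdf(List.of("search", "engine"), idf);

        // Score every document against the query.
        for (int i = 0; i < docs.size(); i++) {
            double score = cosine(query, tfIdf(docs.get(i), idf));
            System.out.println("doc " + i + " score = " + score);
        }
    }
}
```

In this sketch the first document, which shares both query terms, scores highest, and the second, which shares none, scores zero. A real query processor would sort documents by score and return the top results.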

NOTE: This project's data preprocessing and augmentation parts are designed for the Persian language.

How it works

To run this search engine, run the main file. First, the tokenizer and indexer instances are created. Then, once the fileWorker instance has been initialized, the dataset can be loaded with either the fileIndex or the labeledFileIndex function of the fileWorker class.
Finally, after some preprocessing, the queryProcessor instance is created by passing the indexer and the tokenizer to its constructor. Calling the startListening function then lets us type queries in the terminal.

Resources

Dependencies (JAR format)

Datasets