Search Engine

This project comprises three parts: Indexer, Tokenizer, and QueryProcessor. It also uses a helper class named FileWorker to load the dataset and to save checkpoints for the Indexer and Tokenizer.

Method

On the QueryProcessor side, we use the TF-IDF weighting scheme to process each user query. To measure the similarity between the query and each document's representation, we use cosine similarity in the vector space.
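The following is a minimal, self-contained sketch of TF-IDF weighting combined with cosine similarity. It is not the project's QueryProcessor code. The class name TfIdfCosine and its methods are hypothetical, and it assumes raw term counts for tf and idf = log(N / df). The project may use a different TF-IDF variant, such as log-scaled tf.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfCosine {

    // Builds a TF-IDF vector for one token list (hypothetical helper).
    // tf = raw count of the term in the token list (an assumption)
    // idf = log(N / df), where df = number of documents containing the term
    static Map<String, Double> tfIdfVector(List<String> tokens, Map<String, Integer> docFreq, int numDocs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Integer df = docFreq.get(e.getKey());
            if (df == null || df == 0) continue; // term never seen in the collection: no weight
            double idf = Math.log((double) numDocs / df);
            vector.put(e.getKey(), e.getValue() * idf);
        }
        return vector;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) sum += w * w;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("search", "engine", "index"),
                List.of("query", "processor", "search"),
                List.of("cosine", "similarity", "vector"));
        // Document frequency: number of documents in which each term appears.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs) {
            for (String t : new HashSet<>(d)) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> query = tfIdfVector(List.of("search", "index"), df, docs.size());
        for (int i = 0; i < docs.size(); i++) {
            double score = cosine(query, tfIdfVector(docs.get(i), df, docs.size()));
            System.out.printf("doc %d: %.4f%n", i, score);
        }
    }
}
```

With these sample documents, doc 0 scores highest because it contains both query terms. Doc 2 shares no terms with the query and scores 0.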

NOTE: This project's data preprocessing and augmentation parts are designed for the Persian language.
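Persian preprocessing commonly begins with character normalization. The sketch below is an illustration of that kind of step, not the project's own code. It assumes that the preprocessing does something similar. It replaces the Arabic Yeh and Alef Maksura with the Persian (Farsi) Yeh and the Arabic Kaf with Keheh, and it drops Arabic diacritics in the range U+064B..U+0652. The class name PersianNormalizer is hypothetical.

```java
public class PersianNormalizer {

    // Hypothetical normalizer: maps Arabic letter variants to their Persian forms
    // and removes Arabic diacritics so that differently typed words match.
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u064A': // ARABIC LETTER YEH
                case '\u0649': // ARABIC LETTER ALEF MAKSURA
                    sb.append('\u06CC'); // ARABIC LETTER FARSI YEH
                    break;
                case '\u0643': // ARABIC LETTER KAF
                    sb.append('\u06A9'); // ARABIC LETTER KEHEH (Persian kaf)
                    break;
                default:
                    // Drop harakat U+064B..U+0652 (tanwin, short vowels, shadda, sukun); keep everything else.
                    if (c < '\u064B' || c > '\u0652') {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A word typed with the Arabic kaf, followed by a word typed with the Arabic yeh.
        String raw = "\u0643\u062A\u0627\u0628 \u0639\u0644\u064A";
        System.out.println(normalize(raw));
    }
}
```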

How it works?

To run this search engine, we run the main file. It first creates the Tokenizer and Indexer instances. It then initializes a FileWorker instance and loads the dataset with either the fileIndex or the labeledFileIndex method of the FileWorker class.
Finally, after some preprocessing, we create the QueryProcessor instance by passing the indexer and the tokenizer to its constructor. After calling the startListening method, we can type queries in the terminal.
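The sketch below follows the same flow as the steps above: tokenize documents, index them, and then read queries line by line from the terminal. It is not the project's code. The names tokenize, InvertedIndex, and listen are hypothetical stand-ins for the Tokenizer, Indexer, and startListening. It also assumes that the Indexer keeps an inverted index (term to per-document term frequency), which supplies the document frequencies that TF-IDF needs. To keep the sketch short, ranking is omitted: the loop only prints the candidate documents that share at least one term with the query. The project's QueryProcessor scores candidates with TF-IDF and cosine similarity instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class PipelineSketch {

    // Hypothetical tokenizer: lower-cases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Hypothetical indexer: inverted index mapping term -> (docId -> term frequency).
    static class InvertedIndex {
        final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        int numDocs = 0; // N in idf = log(N / df)

        void addDocument(int docId, List<String> tokens) {
            for (String t : tokens) {
                postings.computeIfAbsent(t, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
            }
            numDocs++;
        }

        // Document frequency is the length of the term's posting list.
        int docFreq(String term) {
            Map<Integer, Integer> p = postings.get(term);
            return p == null ? 0 : p.size();
        }

        // Candidate documents: every document containing at least one query term.
        TreeSet<Integer> candidates(List<String> queryTokens) {
            TreeSet<Integer> docs = new TreeSet<>();
            for (String t : queryTokens) {
                Map<Integer, Integer> p = postings.get(t);
                if (p != null) docs.addAll(p.keySet());
            }
            return docs;
        }
    }

    // Hypothetical query loop in the spirit of startListening: one query per line; "exit" or EOF stops it.
    static void listen(BufferedReader in, PrintStream out, InvertedIndex index) throws IOException {
        String line;
        while ((line = in.readLine()) != null && !line.trim().equals("exit")) {
            out.println("candidates: " + index.candidates(tokenize(line)));
        }
    }

    public static void main(String[] args) throws IOException {
        InvertedIndex index = new InvertedIndex();
        String[] docs = {"Search engine index", "Query processor for search", "Cosine similarity"};
        for (int i = 0; i < docs.length; i++) {
            index.addDocument(i, tokenize(docs[i]));
        }
        listen(new BufferedReader(new InputStreamReader(System.in)), System.out, index);
    }
}
```

For example, typing `search` prints `candidates: [0, 1]`, because only the first two sample documents contain that term.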

KooroshRH/Search-Engine

Folders and files

Latest commit

History

Repository files navigation

Search Engine

Method

How it works?

Resources

Dependencies (JAR format)

Datasets

About

Topics

Resources

Stars

Watchers

Forks

Languages