This repository has been archived by the owner on Jun 6, 2020. It is now read-only.

A simple search engine for documents


SimoneStefani/simple-search-engine


Royal Institute of Technology KTH - Stockholm

Simple Search Engine


A simple search engine that indexes a corpus of documents and searches for words using specific query parameters. This project is part of the course ID1020 Algorithms and Data Structures.

This repository contains code written during the fall semester of 2016 by Simone Stefani.

Structure

[Structure diagram]

Description

  • Index: a HashMap that holds every indexed word as a key and its list of postings as the value.
  • ResultDocument: an object that links a word (or a set of words) to a document that contains it. It refers to a specific document and carries properties related to the words, such as hits, popularity and relevance (as tf-idf).
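The two structures above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the class names `Posting` and `IndexSketch`, their fields and the `insert`/`lookup` methods are hypothetical, and the sketch finds an existing posting with a linear scan to stay short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ResultDocument: one document that contains a word.
class Posting {
    final String documentName;
    int hits;                 // occurrences of the word in this document
    final double popularity;  // document-level popularity score
    double relevance;         // e.g. a tf-idf score, filled in at query time

    Posting(String documentName, double popularity) {
        this.documentName = documentName;
        this.popularity = popularity;
    }
}

// Hypothetical index: word -> list of postings.
class IndexSketch {
    private final Map<String, List<Posting>> index = new HashMap<>();

    // Records one occurrence of `word` in `documentName`.
    void insert(String word, String documentName, double popularity) {
        List<Posting> postings = index.computeIfAbsent(word, k -> new ArrayList<>());
        for (Posting p : postings) {
            if (p.documentName.equals(documentName)) {
                p.hits++;  // word already seen in this document
                return;
            }
        }
        Posting p = new Posting(documentName, popularity);
        p.hits = 1;
        postings.add(p);
    }

    // Returns the postings for `word`, or an empty list if it was never indexed.
    List<Posting> lookup(String word) {
        return index.getOrDefault(word, new ArrayList<>());
    }
}
```

Keeping one posting per (word, document) pair means that repeated occurrences only increment `hits`, so the posting list grows with the number of documents rather than the number of words.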

The search engine contains two other HashMaps:

  • DocumentsLength: keeps track of the length of each processed document.
  • Cache: contains cached queries.
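As an illustration of how these two maps might be used, the sketch below feeds document lengths into a tf-idf score and memoises query results by query string. The class and method names are hypothetical, and the tf-idf variant (term frequency normalised by document length, base-10 logarithm for idf) is an assumption; the README does not say which variant the project uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RelevanceSketch {
    // document name -> number of words processed in that document
    private final Map<String, Integer> documentsLength = new HashMap<>();

    // Call once for every word read from `documentName`.
    void countWord(String documentName) {
        documentsLength.merge(documentName, 1, Integer::sum);
    }

    // Assumed variant: tf = hits / document length, idf = log10(N / df).
    double tfIdf(String documentName, int hits, int documentsWithWord) {
        int length = documentsLength.getOrDefault(documentName, 0);
        if (length == 0 || documentsWithWord == 0) {
            return 0.0;  // unknown document or word: no relevance
        }
        double tf = (double) hits / length;
        double idf = Math.log10((double) documentsLength.size() / documentsWithWord);
        return tf * idf;
    }
}

class QueryCacheSketch {
    // raw query string -> result (document names stand in for full results)
    private final Map<String, List<String>> cache = new HashMap<>();
    int misses = 0;  // how many times the underlying search actually ran

    List<String> search(String query, Function<String, List<String>> engine) {
        return cache.computeIfAbsent(query, q -> {
            misses++;
            return engine.apply(q);
        });
    }
}
```

A cache like this is only valid while the index is unchanged; a real implementation would presumably clear it whenever new documents are indexed.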

The postings (resultDocuments) for each word are kept sorted as they are inserted. Consequently, they can be retrieved through binary search.
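A minimal sketch of sorted insertion backed by binary search is shown below. It assumes postings are ordered by document name and represents each posting by that name alone; the actual sort key and the class `SortedPostingsSketch` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedPostingsSketch {
    // Always kept in ascending order of document name.
    private final List<String> postings = new ArrayList<>();

    // Inserts the document at its sorted position; returns false if already present.
    boolean insert(String documentName) {
        int pos = Collections.binarySearch(postings, documentName);
        if (pos >= 0) {
            return false;  // found: nothing to insert
        }
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        postings.add(-(pos + 1), documentName);
        return true;
    }

    // Logarithmic-time membership test, valid because the list is sorted.
    boolean contains(String documentName) {
        return Collections.binarySearch(postings, documentName) >= 0;
    }

    List<String> asList() {
        return Collections.unmodifiableList(postings);
    }
}
```

The same binary search locates both existing postings and the position for a new one, so the list never needs a separate sorting pass.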

When the user's query string is processed, a parsedQuery is returned in the form of nested sub-query objects. Consequently, a complex query can be evaluated recursively: the parsedQuery is traversed down to its fundamental queries, whose results are then combined with operators.
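The recursive evaluation could look roughly like the sketch below. The types `Query`, `WordQuery` and `BinaryQuery` are hypothetical, as is the choice of operators (intersection, union, difference); results are plain sets of document names rather than full ResultDocument objects.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A parsed query is either a single word or two sub-queries joined by an operator.
interface Query {
    Set<String> evaluate(Map<String, Set<String>> index);
}

// Fundamental query: the documents that contain one word.
class WordQuery implements Query {
    private final String word;

    WordQuery(String word) {
        this.word = word;
    }

    public Set<String> evaluate(Map<String, Set<String>> index) {
        // Copy so that callers can modify the result without touching the index.
        return new TreeSet<>(index.getOrDefault(word, new TreeSet<>()));
    }
}

// Composite query: evaluates both sides recursively, then combines them.
class BinaryQuery implements Query {
    enum Operator { INTERSECTION, UNION, DIFFERENCE }

    private final Query left;
    private final Operator operator;
    private final Query right;

    BinaryQuery(Query left, Operator operator, Query right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    public Set<String> evaluate(Map<String, Set<String>> index) {
        Set<String> result = left.evaluate(index);
        Set<String> other = right.evaluate(index);
        switch (operator) {
            case INTERSECTION: result.retainAll(other); break;
            case UNION:        result.addAll(other);    break;
            case DIFFERENCE:   result.removeAll(other); break;
        }
        return result;
    }
}
```

Because every node implements the same `evaluate` method, arbitrarily deep nesting needs no special handling: a composite node simply asks its children for their results before applying its own operator.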
