
cosmo-kramer/search_util

search_util

An intelligent search utility for large e-libraries, based on a ranking mechanism.

Run shas.py to search through the files in the test_data folder. You must first index all files by running page_break.py once on the dataset.

Indexing Mechanism: Each word is stemmed using the Porter stemmer algorithm, and the count of each stemmed word is stored per file. Importance is assigned based on line length: a short line is more likely to be a heading, so the words on it are given more importance.
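The indexing idea above can be sketched roughly as follows. This is a minimal illustration and not the code in page_break.py: the names index_text, line_weight and crude_stem, the 40-character cutoff, and the heading weight of 3.0 are all assumptions made for the example. The crude_stem function is only a stand-in that strips a few suffixes; it is not the Porter algorithm, and a real Porter stemmer can be passed in through the stem parameter.

```python
import re
from collections import defaultdict


def crude_stem(word):
    # Stand-in only: NOT the Porter algorithm. Strips a few common suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def line_weight(line, short_line_limit=40, heading_weight=3.0):
    # Assumed rule: a line shorter than the limit is treated as a likely heading.
    return heading_weight if len(line.strip()) < short_line_limit else 1.0


def index_text(text, stem=crude_stem):
    # Map each stemmed word to its accumulated importance in this text.
    importance = defaultdict(float)
    for line in text.splitlines():
        weight = line_weight(line)
        for word in re.findall(r"[a-z]+", line.lower()):
            importance[stem(word)] += weight
    return dict(importance)


sample = (
    "Searching Algorithms\n"
    "This chapter describes how ranking of searched pages works "
    "in a large library of files."
)
index = index_text(sample)
```

Here "Searching" on the short first line contributes 3.0 and "searched" on the long second line contributes 1.0, so both forms accumulate under the same stem.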

Searching Mechanism: Each page is scored with the formula product_over_all_i (1 + lambda*ai), where ai is the importance of the ith search word in the page and lambda is a constant. When the product is expanded as a polynomial in lambda, only a file containing all the words (that is, no ai = 0) has a nonzero coefficient on the highest power of lambda, so such files receive relatively higher weight.
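A small sketch of this scoring formula is shown below. It is not the code in shas.py: the function names score and rank, the sample pages, and the value lam=1000.0 are hypothetical, chosen only to show why a large lambda favours pages that contain every search word.

```python
def score(query_stems, page_importance, lam=1000.0):
    # Product over all search words of (1 + lam * a_i); a missing word has a_i = 0.
    result = 1.0
    for stem in query_stems:
        result *= 1.0 + lam * page_importance.get(stem, 0.0)
    return result


def rank(query_stems, pages, lam=1000.0):
    # Return page names ordered from highest score to lowest.
    return sorted(
        pages,
        key=lambda name: score(query_stems, pages[name], lam),
        reverse=True,
    )


# Hypothetical per-page importance tables, as an index might produce.
pages = {
    "a.txt": {"search": 1.0, "rank": 1.0},  # contains both search words
    "b.txt": {"search": 9.0},               # contains only one, but strongly
}

query = ["search", "rank"]
ordering = rank(query, pages)
```

With lam=1000.0, a.txt scores (1 + 1000)(1 + 1000) while b.txt scores only (1 + 9000)(1 + 0), so the page matching both words ranks first. With a small value such as lam=0.1 the order flips, which is why lambda needs to be large relative to the importance values.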

About

An intelligent search utility for big e-libraries based on ranking mechanism similar to tf/idf
