search_util

An intelligent search utility for large e-libraries, based on a ranking mechanism.

To search through the files in the test_data folder, run shas.py. Before the first search, index all files by running page_break.py once on the dataset.

Indexing Mechanism: Each word is stemmed with the Porter stemmer algorithm, and the count of each stemmed word in each file is stored. Words are also weighted by the length of the line they appear on: short lines are more likely to be headings, so words on them are given more importance.
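The indexing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in page_break.py: it uses NLTK's `PorterStemmer` when available (falling back to plain lowercasing, which is *not* Porter stemming), and the short-line threshold `SHORT_LINE_WORDS`, the `HEADING_BONUS` weight and the `index_file` helper are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict

try:
    # NLTK ships an implementation of the Porter stemmer.
    from nltk.stem import PorterStemmer
    _stem = PorterStemmer().stem
except ImportError:
    # Fallback stand-in (NOT Porter stemming): just lowercase the word.
    def _stem(word: str) -> str:
        return word.lower()

SHORT_LINE_WORDS = 5  # hypothetical: lines with this many words or fewer count as likely headings
HEADING_BONUS = 2.0   # hypothetical: extra weight added for words on such short lines


def index_file(text: str) -> dict[str, float]:
    """Return a mapping of stem -> importance score for one file (hypothetical helper)."""
    scores: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        words = re.findall(r"[A-Za-z]+", line)
        if not words:
            continue
        # Every occurrence counts 1; occurrences on short lines get a bonus on top.
        weight = 1.0 + (HEADING_BONUS if len(words) <= SHORT_LINE_WORDS else 0.0)
        for word in words:
            scores[_stem(word.lower())] += weight
    return dict(scores)
```

For example, `index_file("Cat\nthe dog saw a cat and a dog and a cat today")` scores "cat" as 5.0 (3.0 from the one-word heading line plus 1.0 for each of its two body occurrences) and "dog" as 2.0.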

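Searching Mechanism: Each page is scored with the formula $\prod_i (1 + \lambda a_i)$, where $a_i$ is the importance of the $i$-th query word in the page and $\lambda$ is a constant. When the product is expanded as a polynomial in $\lambda$, the highest power $\lambda^n$ (for $n$ query words) has coefficient $a_1 a_2 \cdots a_n$, which is nonzero only for a file that contains all the words (no $a_i = 0$). For a large enough $\lambda$, such files therefore receive relatively higher weight.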
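A minimal sketch of the ranking formula, assuming each page's index is a mapping from stem to importance $a_i$ (as in an index like the one described above) and that query words have already been stemmed with the same stemmer. The names `score`, `rank` and the value of `LAMBDA` are hypothetical; the README only says that lambda is a constant. For two words the product expands to $1 + \lambda(a_1 + a_2) + \lambda^2 a_1 a_2$, so the $\lambda^2$ term survives only when both words occur.

```python
import math

LAMBDA = 10.0  # hypothetical value; the README only states that lambda is a constant


def score(page_scores: dict[str, float], query_stems: list[str], lam: float = LAMBDA) -> float:
    """Compute prod_i (1 + lam * a_i), where a_i is the importance of the i-th query stem."""
    # A stem missing from the page contributes a_i = 0, i.e. a factor of exactly 1.
    return math.prod(1.0 + lam * page_scores.get(stem, 0.0) for stem in query_stems)


def rank(index: dict[str, dict[str, float]], query_stems: list[str], lam: float = LAMBDA) -> list[str]:
    """Return page names sorted from highest to lowest score."""
    return sorted(index, key=lambda page: score(index[page], query_stems, lam), reverse=True)


index = {
    "a.txt": {"cat": 5.0},             # only one query word, but with high importance
    "b.txt": {"cat": 1.0, "dog": 1.0},  # both query words, each with low importance
}
print(rank(index, ["cat", "dog"]))       # ['b.txt', 'a.txt']  (121.0 vs 51.0)
print(rank(index, ["cat", "dog"], 0.1))  # ['a.txt', 'b.txt']  (1.5 vs 1.21)
```

The two calls show why the choice of $\lambda$ matters: with a large $\lambda$ the page containing every query word wins, while with a small $\lambda$ the sum of importances dominates instead.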

cosmo-kramer/search_util

Folders and files

Latest commit

History

Repository files navigation

search_util

About

Resources

Stars

Watchers

Forks

Languages