Built an inverted index over a provided set of HTML files from web pages within UC Irvine's domain. Developed a search component that retrieves documents for a query and ranks them by tf-idf score. Integrated the two components into a complete search engine. Tested it on a set of 20 queries, which returned relevant HTML files according to the calculated tf-idf scores.
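To illustrate the idea, here is a minimal sketch of an in-memory inverted index with tf-idf ranking. It is not the project's actual code: the names `build_index` and `search`, the whitespace tokenizer, and the log-scaled weighting `(1 + log tf) * log(N / df)` are all assumptions chosen for illustration, and the real indexer (which parses HTML and saves the index to disk) may differ.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency}, given a dict of doc_id -> text.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query, top_k=5):
    """Rank documents by the summed tf-idf weight of the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Assumed weighting: log-scaled tf times log(N / df).
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With this sketch, a term that occurs in every document gets an idf of zero and so adds nothing to any score, which is the usual reason for including the idf factor.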
To use the search engine:
- Create an inverted index on disk by running invertedIndexer.py. First set the saved_folder variable to a path on your PC where the index should be saved. Then enter the path to the DEV folder so that the inverted index can be created.
- After the inverted index has been created successfully, run the SearchEngine module.
- It will ask whether you want to search. Enter ‘y’ or ‘yes’.
- Input the query you want to search for.
- URLs relevant to your search query will appear in the console after you press ‘Enter’.
- Input ‘q’ or ‘quit’ to exit the search.
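
The interactive session described in the steps above could look roughly like the following sketch. The control flow of the real SearchEngine module is not shown here, so the function name `run_prompt`, the prompt wording, and the injected `search_fn`, `read`, and `write` parameters are all hypothetical.

```python
def run_prompt(search_fn, read=input, write=print):
    """Ask whether to search, then answer queries until 'q' or 'quit' is entered.

    search_fn is assumed to take a query string and return an iterable of URLs.
    """
    answer = read("Do you want to search? (y/yes): ").strip().lower()
    if answer not in ("y", "yes"):
        return
    while True:
        query = read("Enter a query (q/quit to exit): ").strip()
        if query.lower() in ("q", "quit"):
            break
        for url in search_fn(query):
            write(url)
```

Passing `read` and `write` as parameters is only a convenience for testing the loop without a console; calling `run_prompt(my_search)` with a search function of your own uses the normal `input` and `print`.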