This is the Search Engine Project for CS 121, Group 20: a search engine over a selection of subdomains belonging to UC Irvine's School of ICS. It returns top-ranking results from more than 55,000 webpages, usually within 150 ms.
Group 20 members:
The project has several segments:
- text extraction from HTML format
- tokenization and stemming
- index creation
- ranking documents' relevance to a given query using cosine similarity
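The tokenization, indexing, and ranking steps listed above can be sketched in Python (the language the project appears to use, judging by `main.py`). This is a minimal sketch under stated assumptions, not the project's code: every function name is hypothetical, the suffix stripping is a crude stand-in for a real stemmer such as Porter's, and documents are weighted with log-scaled tf-idf, which is one common weighting for cosine similarity but not necessarily the one this project uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens, then crudely stem them.

    The suffix stripping is a hypothetical stand-in for a real stemmer
    (e.g. Porter's); it is not the project's actual stemming step.
    """
    stems = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        for suffix in ("ing", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        stems.append(token)
    return stems


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def weight(tf, df, n_docs):
    """tf-idf weight with log-scaled term frequency."""
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


def doc_norms(index, n_docs):
    """Euclidean length of every document's tf-idf vector."""
    squares = defaultdict(float)
    for postings in index.values():
        df = len(postings)
        for doc_id, tf in postings.items():
            squares[doc_id] += weight(tf, df, n_docs) ** 2
    return {doc_id: math.sqrt(s) for doc_id, s in squares.items()}


def rank(query, index, norms, n_docs, k=10):
    """Return up to k (doc_id, cosine similarity) pairs, best first."""
    q_weights = {
        term: weight(tf, len(index[term]), n_docs)
        for term, tf in Counter(tokenize(query)).items()
        if term in index
    }
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    if q_norm == 0:
        return []
    # Accumulate dot products using only the postings of the query terms.
    dots = defaultdict(float)
    for term, q_w in q_weights.items():
        df = len(index[term])
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_w * weight(tf, df, n_docs)
    results = [
        (doc_id, dot / (q_norm * norms[doc_id]))
        for doc_id, dot in dots.items()
        if norms[doc_id] > 0
    ]
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:k]


if __name__ == "__main__":
    docs = {
        "a": "VR gaming headsets and games",
        "b": "machine learning lecture notes",
        "c": "gaming with machine learning agents",
    }
    index = build_index(docs)
    norms = doc_norms(index, len(docs))
    print(rank("VR gaming", index, norms, len(docs)))
```

On these three toy pages, `VR gaming` ranks page `a` first and `c` second, and `b` is absent because it shares no query terms. Scoring only the documents that appear in the query terms' postings, rather than every document, is one way to keep query time low on a large corpus.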
Visit this search engine website and simply type in your query and search. For example, try searching `VR gaming`, `machine learning`, the name of your favorite professor, and more...
- Download and unzip the directory containing all the scraped webpages.
- Run the `M1` function inside `main.py` to create the full index file. This process may take some time.
- a) Search in Terminal
  - Run the `M2n3` function inside `main.py` and interact with the prompt.
- b) Host Webpage and Its Backend
  - Run the command `python3 flask_server.py` in a terminal. You can change the port inside `flask_server.py` as you wish.
  - Visit the host machine's `IP_address:port_number` in a browser to access the web search interface.
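For the indexing step (running `M1` in `main.py`), the following is a hedged, standard-library-only sketch of extracting text from the scraped HTML and writing an inverted index to disk. The page layout (one JSON file per page with `url` and `content` fields), the `build_index_file` name, and the output format are assumptions for illustration, not the project's actual code; stemming is omitted for brevity.

```python
import json
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def build_index_file(corpus_dir, out_path):
    """Index every *.json page under corpus_dir and write the result to out_path.

    Hypothetical page format: {"url": "...", "content": "<raw HTML>"}.
    Keep out_path outside corpus_dir so a rerun does not index the index file.
    Returns the number of pages indexed.
    """
    index = defaultdict(dict)  # term -> {doc_id: term frequency}
    urls = {}                  # doc_id -> page URL
    for doc_id, path in enumerate(sorted(Path(corpus_dir).rglob("*.json"))):
        page = json.loads(path.read_text(encoding="utf-8"))
        urls[doc_id] = page["url"]
        tokens = re.findall(r"[a-z0-9]+", html_to_text(page["content"]).lower())
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    # JSON object keys must be strings, so integer doc_ids are written as "0", "1", ...
    payload = {"urls": urls, "index": index}
    Path(out_path).write_text(json.dumps(payload), encoding="utf-8")
    return len(urls)
```

Loading the file back with `json.load` yields string document IDs, so convert them with `int(...)` if later code expects integers. For 55,000+ pages, a real implementation would likely write partial indexes and merge them rather than hold everything in memory at once.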