Search Engine over Wikipedia articles about Algorithms

Project Intro/Objective

This is my final project for the Text Mining course at Harbour.Space University, 2022.

Only a few specific categories of algorithms are covered.

Technologies

Libraries used: spacy, nltk, numpy, sklearn, streamlit

Technical Description

Uses a TF-IDF vectorizer to vectorize the documents.
Uses k-d trees to index the vectors and execute queries faster.
Uses Levenshtein distance and longest common prefix to find the closest words to the words given in the input.

Getting Started

Clone this repo (for help see this tutorial).
Unzip the wiki_data.zip file.
Make sure to install streamlit in your environment.
From your terminal, run python -m spacy download en_core_web_sm.
Run streamlit run app.py to start the app. It should open a tab in your browser automatically, where you can search for different algorithms and receive Wikipedia links.

Contact

Email: anier.velasco@gmail.com
Telegram: https://t.me/aniervs
GitHub: https://github.com/aniervs
LinkedIn: https://www.linkedin.com/in/aniervs

