This repository contains a Python implementation of the Cosine Similarity algorithm without using any machine learning libraries. The implementation ranks documents based on their similarity to a given query using TF-IDF (Term Frequency - Inverse Document Frequency) and cosine similarity.
- Splits documents into tokens and calculates term frequencies.
- Computes inverse document frequency (IDF) for each term.
- Calculates TF-IDF weights for documents and the query.
- Computes cosine similarity scores to rank documents based on relevance.
- Visualizes the similarity scores using a bar chart.
- Clone the repository:
git clone https://github.com/shaheerAlam1/Cosine-Similarity-Algorithm-with-python.git cd Cosine-Similarity-Algorithm-with-python
- Install dependencies (if not already installed):
pip install pandas matplotlib
- Run the notebook or script to see the ranking results.
"stock exchange pakistan"
- "Market of stock exchange is affected by brokers."
- "Pakistan stock market is very popular."
- "Stock exchange Pakistan is in loss nowadays."
The algorithm ranks the documents based on their relevance to the query using cosine similarity, with the highest-ranking document being the most relevant.
- Python 3
pandas
matplotlib