Vector_Space_Model

In this project, I developed a search engine that uses the Vector Space Model (VSM) for information retrieval. The VSM represents text documents as vectors in a high-dimensional space whose dimensions correspond to the distinct terms of the collection's vocabulary. The similarity between a query and a document (or between two documents) is then measured as the cosine similarity of their vectors, that is, the cosine of the angle between them.
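For reference, the standard cosine similarity between a query vector q and a document vector d over a vocabulary of V terms is shown below. The symbols q_t and d_t (the weights of term t in each vector) are introduced here for illustration.

```latex
\cos(\vec{q}, \vec{d})
  = \frac{\vec{q} \cdot \vec{d}}{\lVert \vec{q} \rVert \, \lVert \vec{d} \rVert}
  = \frac{\sum_{t=1}^{V} q_t \, d_t}
         {\sqrt{\sum_{t=1}^{V} q_t^{2}} \, \sqrt{\sum_{t=1}^{V} d_t^{2}}}
```

Because the weights are non-negative, the score lies between 0 (no shared terms) and 1 (vectors pointing in the same direction).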

To implement the VSM, I first preprocessed the text by removing stop words and stemming the remaining words to their root forms. I then constructed a document-term matrix in which each row represents a document, each column represents a term, and each entry is the frequency of that term in that document.
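A minimal sketch of these two steps in Python, assuming simple alphabetic tokenization, the tiny hypothetical `STOP_WORDS` set below, and a naive suffix-stripping `naive_stem` function standing in for a real stemmer such as Porter's. The project's actual stop list and stemmer are not shown here, and all function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; a real run would load a
# complete list (for example, from a file with one stop word per line).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters remain, to avoid mangling short words.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in docs[i]."""
    token_lists = [preprocess(d) for d in docs]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Counter returns 0 for terms absent from this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocab, matrix = document_term_matrix(["The cats are sleeping", "A cat sleeps in the sun"])
print(vocab)   # ['cat', 'sleep', 'sun']
print(matrix)  # [[1, 1, 0], [1, 1, 1]]
```

Stop words are filtered before stemming, so "cats"/"cat" and "sleeping"/"sleeps" collapse onto the same columns.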

Next, I transformed the document-term matrix into a term frequency-inverse document frequency (TF-IDF) matrix, which weights each term by its frequency within a document and its rarity across the whole collection. This down-weights terms that appear in many documents and boosts rare, more discriminative terms.
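The weighting can be sketched as follows, assuming the common variant w = tf × log10(N / df), where N is the number of documents and df is the number of documents containing the term. The README does not say which TF-IDF variant the project uses, and `tf_idf` is a hypothetical helper that takes a raw-count matrix like the one described above.

```python
import math


def tf_idf(matrix):
    """Turn a raw-count document-term matrix into TF-IDF weights.

    Assumes w[i][j] = tf[i][j] * log10(N / df[j]), one common TF-IDF variant.
    Returns (weights, idf) so the same idf values can be reused for a query.
    """
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # df[j]: number of documents in which term j occurs at least once.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    # A term found in every document gets idf = log10(1) = 0.
    idf = [math.log10(n_docs / d) if d else 0.0 for d in df]
    weights = [[tf * idf[j] for j, tf in enumerate(row)] for row in matrix]
    return weights, idf


weights, idf = tf_idf([[1, 1, 0], [1, 1, 1]])
print(idf)
print(weights)
```

With these two-document counts, the first two terms occur in both documents and receive an idf of 0, so only the third term keeps a nonzero weight. This is exactly the suppression of common terms described above.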

Finally, I used the TF-IDF matrix to calculate the cosine similarity between the query and each document in the corpus, and ranked the documents based on their similarity scores.
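A minimal sketch of the scoring and ranking step, assuming the query has already been turned into a vector over the same terms as the documents. How the project builds the query vector is not described here; applying the same preprocessing and idf weights to the query is a common choice. `cosine_similarity`, `rank_documents`, and its `top_k` parameter are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, top_k=None):
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    # sort() is stable, so documents with equal scores keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k] if top_k is not None else scores


ranking = rank_documents([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([i for i, _ in ranking])  # [0, 2, 1]
```

Returning the zero-norm case as 0.0 avoids a division by zero when a query shares no terms with the vocabulary.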

The search engine was evaluated using a test dataset and achieved high precision and recall scores, demonstrating the effectiveness of the VSM for information retrieval.
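The README does not describe the evaluation protocol or report the scores. As an illustration only, per-query precision (relevant retrieved / all retrieved) and recall (relevant retrieved / all relevant) over sets of document IDs can be computed as below. `precision_recall` is a hypothetical helper and the IDs in the example are made up.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty denominators yield 0.0.
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p)  # 0.5
```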

This project demonstrates my proficiency in natural language processing, information retrieval, and machine learning techniques.

Repository files:
Dataset
A2.py
CS4051- IR-A2-Spring 2023.pdf
README.md
Stopword-List.txt
Your_img.jpg

MehwishSameer/Vector_Space_Model
