Lab Assignment 1: Implementing the Vector Space Model for Information Retrieval
•	Implement a simple search engine using the TF-IDF vectorization method.
•	Use a small dataset of documents and allow the user to input a query.
•	Compute cosine similarity to retrieve the most relevant documents.
•	Use Scikit-learn’s TfidfVectorizer for vectorization.
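
For reference, this is what scikit-learn's TfidfVectorizer computes with its default settings (smooth_idf=True, sublinear_tf=False, norm="l2"). Here n is the number of documents, tf(t, d) is the raw count of term t in document d, and df(t) is the number of documents containing t. Cosine similarity then compares the resulting vectors:

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
w_{t,d} = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t), \qquad
\hat{\mathbf{w}}_d = \frac{\mathbf{w}_d}{\lVert \mathbf{w}_d \rVert_2}

\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q}\cdot\mathbf{d}}{\lVert\mathbf{q}\rVert_2\,\lVert\mathbf{d}\rVert_2} = \hat{\mathbf{w}}_q\cdot\hat{\mathbf{w}}_d
```

For example, in the eight-document dataset used below, "sky" occurs in three documents, so idf(sky) = ln(9/4) + 1 ≈ 1.811. The more common word "and" occurs in six documents and gets the lower weight ln(9/7) + 1 ≈ 1.251.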

# Step 1: Import required libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 2: Define your small dataset of documents
documents = [
    "The sky is blue and beautiful.",
    "Love this blue and bright sky!",
    "The quick brown fox jumps over the lazy dog.",
    "A king's breakfast has sausages, ham, bacon, eggs, toast and beans",
    "I love green eggs, ham, sausages and bacon!",
    "The brown fox is quick and the blue dog is lazy!",
    "The sky is very blue and the sky is very beautiful today",
    "The dog is lazy but the brown fox is quick."
]

# Step 3: Preprocessing (optional: TfidfVectorizer already lowercases text by default)
processed_docs = [doc.lower() for doc in documents]

# Step 4: Vectorize using TF-IDF
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(processed_docs)

# Step 5: Accept user query
query = input("Enter your search query: ").lower()

# Step 6: Vectorize the query using the same vectorizer
query_vector = vectorizer.transform([query])

# Step 7: Compute cosine similarity
cos_similarities = cosine_similarity(query_vector, tfidf_matrix).flatten()

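# Step 8: Rank documents by similarity score, highest first
top_doc_indices = cos_similarities.argsort()[::-1]

# Step 9: Display results (a score of 0 means the document shares no vocabulary
# terms with the query; since scores are sorted, every later document also scores 0)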
print("\nTop Relevant Documents:")
for idx in top_doc_indices:
    score = cos_similarities[idx]
    if score > 0:
        print(f"Score: {score:.4f} | Document: {documents[idx]}")
    else:
        print("No more relevant documents.")
        break

Enter your search query: How is the sky

Top Relevant Documents:
Score: 0.6914 | Document: The sky is blue and beautiful.
Score: 0.6621 | Document: The sky is very blue and the sky is very beautiful today
Score: 0.5199 | Document: The brown fox is quick and the blue dog is lazy!
Score: 0.5058 | Document: The dog is lazy but the brown fox is quick.
Score: 0.2385 | Document: Love this blue and bright sky!
Score: 0.2334 | Document: The quick brown fox jumps over the lazy dog.
No more relevant documents.
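
To make Steps 4 to 7 less of a black box, the sketch below recomputes the TF-IDF weights by hand with NumPy and checks them against TfidfVectorizer. It assumes the default settings (smooth_idf=True, sublinear_tf=False, norm="l2"), and the variable names are illustrative rather than part of the assignment. It also confirms that, because every row is L2-normalized, cosine similarity reduces to a plain dot product.

```python
# Sketch: recompute TfidfVectorizer's default weighting by hand and compare.
# Assumes defaults: smooth_idf=True, sublinear_tf=False, norm="l2".
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The sky is blue and beautiful.",
    "Love this blue and bright sky!",
    "The quick brown fox jumps over the lazy dog.",
    "A king's breakfast has sausages, ham, bacon, eggs, toast and beans",
    "I love green eggs, ham, sausages and bacon!",
    "The brown fox is quick and the blue dog is lazy!",
    "The sky is very blue and the sky is very beautiful today",
    "The dog is lazy but the brown fox is quick.",
]

# Raw term counts tf(t, d); CountVectorizer uses the same tokenizer and
# (sorted) vocabulary order as TfidfVectorizer
tf = CountVectorizer().fit_transform(documents).toarray().astype(float)

# Smoothed IDF: ln((1 + n) / (1 + df(t))) + 1
n_docs = tf.shape[0]
df = (tf > 0).sum(axis=0)
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Weight each count by its IDF, then L2-normalize every document row
weights = tf * idf
manual = weights / np.linalg.norm(weights, axis=1, keepdims=True)

# The same matrix computed by scikit-learn
vectorizer = TfidfVectorizer()
sk = vectorizer.fit_transform(documents).toarray()
print(np.allclose(manual, sk))  # expected: True

# With unit-length vectors, cosine similarity is just a dot product
query_vec = vectorizer.transform(["How is the sky"]).toarray()[0]
scores_dot = sk @ query_vec
scores_sk = cosine_similarity(query_vec.reshape(1, -1), sk).flatten()
print(np.allclose(scores_dot, scores_sk))  # expected: True
```

Working with dense arrays is fine for eight documents; for a real corpus the sparse matrix returned by fit_transform should be kept as-is.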

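In the run above, the fox-and-dog documents score above "Love this blue and bright sky!" only because they contain "the" and "is". One possible refinement, not part of the original assignment, is scikit-learn's built-in English stop-word list (stop_words="english"). The sketch below compares both settings; the helper name `rank` is hypothetical.

```python
# Sketch: compare rankings with and without scikit-learn's English stop words.
# The helper `rank` is a hypothetical name used only for this illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The sky is blue and beautiful.",
    "Love this blue and bright sky!",
    "The quick brown fox jumps over the lazy dog.",
    "A king's breakfast has sausages, ham, bacon, eggs, toast and beans",
    "I love green eggs, ham, sausages and bacon!",
    "The brown fox is quick and the blue dog is lazy!",
    "The sky is very blue and the sky is very beautiful today",
    "The dog is lazy but the brown fox is quick.",
]

def rank(query, stop_words=None):
    """Return (score, document) pairs with a non-zero score, best first."""
    vectorizer = TfidfVectorizer(stop_words=stop_words)
    matrix = vectorizer.fit_transform(documents)
    scores = cosine_similarity(vectorizer.transform([query]), matrix).flatten()
    order = scores.argsort()[::-1]
    return [(scores[i], documents[i]) for i in order if scores[i] > 0]

for label, stop in [("No stop-word removal", None), ("English stop words removed", "english")]:
    print(f"\n{label}:")
    results = rank("How is the sky", stop_words=stop)
    if not results:
        print("No relevant documents found.")
    for score, doc in results:
        print(f"Score: {score:.4f} | Document: {doc}")
```

With stop words removed, the query reduces to "sky", so only the three documents containing that word receive a non-zero score. The scikit-learn documentation notes known issues with the built-in "english" list, so a custom list may suit other corpora better.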