## Participative Learning / Search Engine
### Movie Review Search & Recommendation Engine

#### Description:

Build a search engine where users enter keywords (e.g., "best comedy", "time travel") and the system retrieves the most relevant movie reviews from a dataset.
Additionally, reviews similar to a given one can be recommended using cosine similarity, as a proxy for similar movies.

#### Why It Fits (Information Retrieval & Extraction):

- Uses TF-IDF vectorization for indexing
- Applies cosine similarity for ranking
- Demonstrates document retrieval on a real corpus of 50k movie reviews
- Easy to explain during a presentation
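
The indexing and ranking steps listed above can be sketched on a toy corpus before running them on the full dataset. This is only an illustration: the three sentences in `docs` and the names `toy_vectorizer`, `doc_matrix`, and `best` are made up for this example and are not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini corpus standing in for the review texts
docs = [
    "a hilarious comedy with great jokes",
    "a slow drama about family loss",
    "time travel adventure with a clever plot",
]

# Indexing: learn the vocabulary and build one TF-IDF row per document
toy_vectorizer = TfidfVectorizer()
doc_matrix = toy_vectorizer.fit_transform(docs)

# Ranking: project the query into the same space and compare by cosine similarity
query_vec = toy_vectorizer.transform(["funny comedy"])
scores = cosine_similarity(query_vec, doc_matrix).flatten()

# Indices sorted from most to least similar
ranking = scores.argsort()[::-1]
best = int(ranking[0])
print(docs[best], scores[best])
```

Only the first document contains a query term ("comedy"), so it is the only one with a non-zero score. The word "funny" never occurs in the corpus and is ignored by `transform`.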

#### Dataset

IMDB Movie Reviews Dataset:
https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import stopwords
import nltk

In [3]:
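df = pd.read_csv("IMDB Dataset.csv")  # Ensure the file is in the working directory
df = df.dropna().reset_index(drop=True)  # Drop incomplete rows; keep row labels aligned with positions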

In [5]:
print(df.head())

                                              review sentiment
0  One of the other reviewers has mentioned that ...  positive
1  A wonderful little production. <br /><br />The...  positive
2  I thought this was a wonderful way to spend ti...  positive
3  Basically there's a family where a little boy ...  negative
4  Petter Mattei's "Love in the Time of Money" is...  positive


In [9]:
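nltk.download('stopwords', quiet=True)  # Fetch the stopword list if it is not installed yet
stop_words = set(stopwords.words('english'))

def preprocess(text):
    """Lowercase, split on whitespace, and keep alphabetic non-stopword tokens."""
    tokens = text.lower().split()
    # Note: isalpha() also drops tokens with attached punctuation, e.g. "comedy,"
    tokens = [word for word in tokens if word.isalpha() and word not in stop_words]
    return " ".join(tokens)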

df['cleaned_review'] = df['review'].apply(preprocess)
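
Because `preprocess` splits on whitespace and keeps only tokens that pass `isalpha()`, words with attached punctuation ("comedy,") or HTML remnants ("/>The") are discarded entirely. A possible alternative, shown here only as a sketch, strips tags and extracts alphabetic runs with regular expressions. The name `preprocess_regex` and the small `DEMO_STOP_WORDS` set are assumptions for this example; in the notebook the NLTK `stop_words` set would be passed instead.

```python
import re

# Small stand-in stopword set so the sketch runs without NLTK data (assumption)
DEMO_STOP_WORDS = {"a", "the", "is", "was", "and", "of", "it", "this"}

TAG_RE = re.compile(r"<[^>]+>")   # HTML tags such as <br />
WORD_RE = re.compile(r"[a-z]+")   # runs of lowercase letters

def preprocess_regex(text, stop_words=DEMO_STOP_WORDS):
    """Lowercase, drop HTML tags, and keep alphabetic non-stopword tokens."""
    text = TAG_RE.sub(" ", text.lower())
    tokens = WORD_RE.findall(text)
    return " ".join(t for t in tokens if t not in stop_words)

cleaned = preprocess_regex(
    "A wonderful little production. <br /><br />The comedy, sadly, was flat!"
)
print(cleaned)
```

One trade-off of this approach is that contractions are split at the apostrophe ("it's" becomes "it" and "s"), so a stricter token pattern may be preferable depending on the queries expected.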


In [11]:
# Vectorization (TF-IDF)
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df['cleaned_review'])

print("TF-IDF indexing completed.")

TF-IDF indexing completed.
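
The default `TfidfVectorizer()` indexes single words only, so a query such as "time travel" is matched as two independent terms. One option, sketched below on made-up sentences, is to index word pairs as well through the `ngram_range` parameter. The names `sample_reviews`, `bigram_vectorizer`, and `has_phrase` are illustrative, and whether bigrams improve results on the full dataset has not been measured here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sentences used only to show the effect of bigrams
sample_reviews = [
    "a time travel story that wastes no time",
    "travel documentary about mountain villages",
]

# ngram_range=(1, 2) indexes single words and adjacent word pairs;
# stop_words="english" applies scikit-learn's built-in stopword list
bigram_vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
bigram_matrix = bigram_vectorizer.fit_transform(sample_reviews)

# The phrase is now a feature of its own, in addition to "time" and "travel"
has_phrase = "time travel" in bigram_vectorizer.vocabulary_
print(has_phrase, bigram_matrix.shape)
```

Bigrams enlarge the vocabulary considerably, so on 50k reviews it may be worth bounding it with `max_features` or `min_df`.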


In [13]:
# Function: Search and Recommend
def search_and_recommend(query):
    query = preprocess(query)
    query_vector = vectorizer.transform([query])
    
    similarity_scores = cosine_similarity(query_vector, tfidf_matrix).flatten()
    top_indices = similarity_scores.argsort()[-5:][::-1]  # Top 5 results

    print("\n🔍 Top Search Results:\n")
    for index in top_indices:
        print(f"Review: {df.iloc[index]['review'][:200]}...")
        print(f"Sentiment: {df.iloc[index]['sentiment']}")
        print("-" * 80)
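
The function above ranks reviews against a keyword query. The recommendation idea from the description (finding items similar to a given one) could reuse the same matrix by comparing one review's row against all others. The sketch below assumes a hypothetical helper `recommend_similar` and a tiny `demo_reviews` corpus. In the notebook, `tfidf_matrix` would take the place of `demo_matrix`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_similar(review_index, matrix, top_n=3):
    """Return (index, score) pairs for the reviews most similar to one review."""
    scores = cosine_similarity(matrix[review_index], matrix).flatten()
    scores[review_index] = -1.0  # Exclude the review itself from the ranking
    top = scores.argsort()[::-1][:top_n]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical reviews: two comedies and two war dramas
demo_reviews = [
    "funny comedy with hilarious jokes",
    "hilarious comedy and funny cast",
    "dark war drama with tragic ending",
    "tragic drama about war and loss",
]
demo_matrix = TfidfVectorizer().fit_transform(demo_reviews)

similar_to_first = recommend_similar(0, demo_matrix, top_n=2)
print(similar_to_first)
```

The other comedy review ranks first because it shares three terms with the seed review. Since the dataset contains review text and sentiment but no movie titles, this recommends similar reviews rather than movies directly.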

In [None]:
# User Input
while True:
    user_query = input("\nEnter keywords to search (or type 'exit' to quit): ")
    if user_query.lower() == "exit":
        break
    search_and_recommend(user_query)


Enter keywords to search (or type 'exit' to quit):  best comedy



🔍 Top Search Results:

Review: Down To Earth is the best movie!!! It is SO funny, and it's really sweet too. It has a good plot and it's unique. It isn't like those movies that are all the same with the similar story lines, and it'...
Sentiment: positive
--------------------------------------------------------------------------------
Review: My wife and kids was and still is the best comedy series on TV ever made.I really enjoyed it and everyone in the u.k still watch the recaps.The Wayans bros. should all somehow be featuring together in...
Sentiment: positive
--------------------------------------------------------------------------------
Review: I find it hard to believe that this movie has such a low rating. It is arguably one of the best comedies ever made, and surely the best Bollywood comedy of the 90s. The film did not do too well on the...
Sentiment: positive
--------------------------------------------------------------------------------
Review: This is the worst exercise