# **Movie Recommendation System**
## 📌 Overview

The Movie Recommendation System suggests movies that match a title the user types. It combines TF-IDF vectorization of movie titles, cosine similarity to rank candidates against the query, and an interactive ipywidgets search box. The ratings file contains about 25 million rows, which are aggregated into one average rating per movie and shown next to each suggestion.

![alt text](<image/Screenshot 2025-01-28 at 23.03.03.png>)

### **Data Collection** - [Movie Data](https://www.kaggle.com/datasets/parasharmanas/movie-recommendation-system/data)

### **Importing the libraries**

In [12]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import ipywidgets as widgets
from IPython.display import display, clear_output

### **Load Data**

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("parasharmanas/movie-recommendation-system")

print("Path to dataset files:", path)

In [14]:
import os
print(os.listdir(path))  # Lists all files in the directory

['ratings.csv', 'movies.csv']


In [15]:
# Build the paths to the two dataset files
movies_path = os.path.join(path, 'movies.csv')
ratings_path = os.path.join(path, 'ratings.csv')

# Note: skiprows=[1] skips the first data row of each file (line index 1), not the header
movies = pd.read_csv(movies_path, skiprows=[1])
ratings = pd.read_csv(ratings_path, skiprows=[1])
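
One detail of the `read_csv` calls above is worth checking. When `skiprows` is given a list, it counts file lines from zero, so `skiprows=[1]` keeps the header (line 0) and drops the first data row. That would be consistent with `movies.head()` starting at `movieId` 2. The sketch below illustrates the behavior on a small made-up CSV; the titles and the names `csv_text`, `with_skip`, and `without_skip` are assumptions for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for movies.csv (not the real file).
csv_text = (
    "movieId,title,genres\n"
    "1,Movie A (2000),Comedy\n"
    "2,Movie B (2001),Drama\n"
    "3,Movie C (2002),Action|Thriller\n"
)

# skiprows=[1] drops file line 1, i.e. the first row after the header.
with_skip = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# Without skiprows, every data row is kept.
without_skip = pd.read_csv(io.StringIO(csv_text))

print(with_skip["movieId"].tolist())
print(without_skip["movieId"].tolist())
```

If the first row of each file should be kept, dropping the `skiprows` argument is probably the simplest option.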

In [16]:
# View first rows and columns
movies.head()

|   | movieId | title | genres |
|---|---------|-------|--------|
| 0 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 1 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 2 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 3 | 5 | Father of the Bride Part II (1995) | Comedy |
| 4 | 6 | Heat (1995) | Action\|Crime\|Thriller |


In [17]:
# View first rows and columns
ratings.head()

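|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|-----------|
| 0 | 1 | 306 | 3.5 | 1147868817 |
| 1 | 1 | 307 | 5.0 | 1147868828 |
| 2 | 1 | 665 | 5.0 | 1147878820 |
| 3 | 1 | 899 | 3.5 | 1147868510 |
| 4 | 1 | 1088 | 4.0 | 1147868495 |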


In [18]:
# Checking Shape of data
print("Shape of the movie data:", movies.shape)
print("Shape of the ratings data:", ratings.shape)

Shape of the movie data: (62422, 3)
Shape of the ratings data: (25000094, 4)


In [19]:
print("Missing values for movie data:", movies.isnull().sum().sum())
print("Missing values for ratings data:", ratings.isnull().sum().sum())

Missing values for movie data: 0
Missing values for ratings data: 0


In [20]:
print("Duplicate values for movie data:", movies.duplicated().sum())
print("Duplicate values for ratings data:", ratings.duplicated().sum())

Duplicate values for movie data: 0
Duplicate values for ratings data: 0


In [21]:
# Compute average rating for each movie
movie_ratings = ratings.groupby("movieId")["rating"].mean().round(1).reset_index()
movies = movies.merge(movie_ratings, on="movieId", how="left")
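
The aggregation and left merge above can be checked on a toy example. This is a minimal sketch with invented frames (`movies_demo`, `ratings_demo`, and their values are assumptions, not dataset rows). It shows that a movie with no ratings survives the merge with `NaN` in the `rating` column.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
movies_demo = pd.DataFrame(
    {"movieId": [1, 2, 3], "title": ["Movie A (2000)", "Movie B (2001)", "Movie C (2002)"]}
)
ratings_demo = pd.DataFrame(
    {"userId": [1, 2, 1], "movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.0]}
)

# Same steps as in the notebook: mean rating per movie, rounded to one decimal.
avg = ratings_demo.groupby("movieId")["rating"].mean().round(1).reset_index()

# how="left" keeps every movie, including those that nobody rated.
merged = movies_demo.merge(avg, on="movieId", how="left")
print(merged)
```

Because the merge is a left join on a key that is unique in `avg`, the row count and order of the movie table stay unchanged, which matters later when rows are looked up by position.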

**TF-IDF vectorization from scikit-learn is used to build the search engine**

In [22]:
# Initialize a TF-IDF Vectorizer on the 'title' column
tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf_matrix = tfidf_vectorizer.fit_transform(movies["title"])
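
To see what `ngram_range=(1, 2)` produces, the following sketch fits the same kind of vectorizer on three titles. The list `titles_demo` and the variable names are assumptions for illustration. With scikit-learn's default settings, text is lowercased and single-character tokens are dropped, so the "2" in a sequel title does not become a feature.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Small illustrative sample; the real notebook fits on the full 'title' column.
titles_demo = ["Toy Story (1995)", "Toy Story 2 (1999)", "Heat (1995)"]

vec = TfidfVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
matrix = vec.fit_transform(titles_demo)

# One row per title, one column per distinct unigram or bigram.
print(matrix.shape)

# The learned vocabulary, e.g. 'toy', 'story', 'toy story', 'story 1995', ...
print(sorted(vec.vocabulary_))
```

Bigrams such as `toy story` let the search reward titles where the query words appear next to each other, not just anywhere in the title.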

**Cosine similarity is used to rank titles against the query and determine the output**

In [23]:
# Function to find similar movies
def search(title):
    query_vector = tfidf_vectorizer.transform([title])  # Transform input title
    similarity = cosine_similarity(query_vector, tfidf_matrix).flatten()  # Compute similarity
    indices = np.argpartition(similarity, -5)[-5:]  # Indices of the top 5 matches (argpartition leaves them unordered)
    indices = indices[np.argsort(similarity[indices])[::-1]]  # Sort those 5 by descending similarity
    results = movies.iloc[indices]  # Retrieve movie details in that order
    return results[["movieId", "title", "rating", "genres"]]  # Include genres and ratings in the output
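
The ranking step can be tried in isolation. The sketch below is a self-contained illustration under stated assumptions: `titles_demo` is a made-up sample and `top_k` is a hypothetical helper, not a function from the notebook. It selects the `k` best matches with `np.argpartition`, which does not sort its output, and then orders only those `k` entries by score.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative sample of titles (assumption, not the full dataset).
titles_demo = ["Toy Story (1995)", "Toy Story 2 (1999)", "Heat (1995)", "Jumanji (1995)"]

vec = TfidfVectorizer(ngram_range=(1, 2))
mat = vec.fit_transform(titles_demo)


def top_k(query, k=2):
    """Return (title, score) pairs for the k most similar titles, best first."""
    sims = cosine_similarity(vec.transform([query]), mat).flatten()
    k = min(k, sims.size)                    # guard against k larger than the corpus
    top = np.argpartition(sims, -k)[-k:]     # k best indices, in arbitrary order
    top = top[np.argsort(sims[top])[::-1]]   # order those k by descending score
    return [(titles_demo[i], float(sims[i])) for i in top]


results = top_k("Toy Story", k=2)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

Partitioning first and sorting only the selected entries avoids a full sort of every similarity score, which is useful when the title matrix has tens of thousands of rows.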


**Finally, the search function is exposed through an interactive ipywidgets text box**

In [24]:
# Movie search input widget
movie_input = widgets.Text(
    description='Movie Title:',
    disabled=False
)
movie_list = widgets.Output()

# Callback function for dynamic search
def on_type(data):
    with movie_list:
        clear_output(wait=True)
        title = data["new"]
        if len(title) > 2:  # Ensure input is meaningful
            print(f"Searching for: {title}")
            results = search(title)
            display(results)

# Attach event listener to the movie input widget
movie_input.observe(on_type, names='value')

# Display the search interface
display(movie_input, movie_list)

Text(value='', description='Movie Title:')

Output()