# **Movie Recommendation System**

## <b> 1. Business Understanding </b>


Streaming platforms offer users a vast catalogue of films and TV shows. The sheer volume of content can overwhelm users and make it hard to find titles that match their preferences. A movie recommendation system helps users discover films that suit their tastes, which in turn can improve satisfaction, increase viewing time, and support retention, benefiting both users and the platform.

---
## <b>  Problem Statement </b>

The problem is to help users navigate a large catalogue of movies and TV shows through personalized recommendations. The goal is to predict a user's preferences from past interactions (such as ratings) and recommend movies they are likely to enjoy. This involves building a collaborative filtering system that uses user-item rating data to predict ratings for unseen movies and surface accurate, relevant suggestions.
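
As a rough illustration of the idea, the sketch below predicts one missing rating with user-based collaborative filtering on a toy ratings table. The toy data, the `cosine_similarity` and `predict_rating` helpers, and the choice of cosine similarity are all assumptions made for illustration; they are not necessarily the method used later in this notebook.

```python
import numpy as np
import pandas as pd

# Toy ratings (hypothetical data); column names mirror the dataset loaded later
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 20],
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 5.0, 4.0],
})

# User-item matrix: rows are users, columns are movies, NaN means "not rated"
user_item = ratings.pivot(index="userId", columns="movieId", values="rating")

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    mask = a.notna() & b.notna()
    if not mask.any():
        return 0.0
    va, vb = a[mask].to_numpy(), b[mask].to_numpy()
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def predict_rating(user_item, user_id, movie_id):
    """Similarity-weighted average of other users' ratings for movie_id."""
    target = user_item.loc[user_id]
    num, den = 0.0, 0.0
    for other_id, other in user_item.iterrows():
        # Skip the target user and anyone who has not rated the movie
        if other_id == user_id or np.isnan(other.loc[movie_id]):
            continue
        sim = cosine_similarity(target, other)
        num += sim * other.loc[movie_id]
        den += abs(sim)
    return num / den if den > 0 else np.nan

# User 3 has not rated movie 30; estimate it from users 1 and 2
predicted = predict_rating(user_item, user_id=3, movie_id=30)
print(f"Predicted rating of user 3 for movie 30: {predicted:.2f}")
```

Users 1 and 2 both rated movie 30 low and both resemble user 3 on the movies they share, so the estimate falls between their two ratings.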

## <b>  Objectives of the Project </b>

1. Develop a Collaborative Filtering System: Build a system that uses collaborative filtering techniques (user-based or item-based) to recommend movies to users based on their previous ratings and the behavior of similar users.


2. Predict Movie Ratings: Use past user ratings to predict ratings for unrated movies. This prediction will be used to recommend movies that a user is likely to rate highly.


3. Provide Personalized Movie Recommendations: Create a personalized movie recommendation list for each user, helping them find movies they are most likely to enjoy, based on their preferences and similarities to other users.


4. Improve User Retention and Satisfaction: By offering more relevant and personalized movie recommendations, the system aims to enhance user experience, encouraging longer viewing sessions and higher satisfaction levels.


5. Evaluate Model Performance: Measure the performance of the recommendation system using metrics like Root Mean Square Error (RMSE) or Mean Absolute Error (MAE) for predicted ratings versus actual user ratings.


6. Recommend Popular Movies Among Similar Users: Extend the system to suggest movies that are trending or popular among users with similar profiles, complementing the personalized rating predictions.
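
The evaluation metrics named in objective 5 can be sketched as follows. The `actual` and `predicted` arrays are hypothetical values chosen for illustration, not results from this project.

```python
import numpy as np

# Hypothetical actual and predicted ratings for five user-movie pairs
actual = np.array([4.0, 3.5, 5.0, 2.0, 4.5])
predicted = np.array([3.5, 3.0, 4.5, 3.0, 4.0])

errors = predicted - actual
rmse = np.sqrt(np.mean(errors ** 2))  # squares errors, so large misses weigh more
mae = np.mean(np.abs(errors))         # average size of the error, in rating points
print(f"RMSE: {rmse:.3f}, MAE: {mae:.3f}")
```

RMSE is never smaller than MAE on the same data; a wide gap between the two suggests that a few predictions are far off.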

## Table of Contents
* [Overview](#Overview)
* [Business Problem](#Business_Problem)
* [Data Understanding](#Data_Understanding)
* [Data Exploration](#Data_Exploration)
* [Data Modeling](#Data_Modeling)    
    * [Binary Predictor Modeling](#Binary)
        * [Baseline Models](#Binary_Baseline)
        * [Tuned Models](#Binary_Tuned)
    * [Multiclass Predictor Modeling](#Multiclass)
        * [Baseline Models](#Multiclass_Baseline)
        * [Tuned Models](#Multiclass_Tuned)
    
* [Final Model](#Final_Model)
* [Results & Evaluation](#Results)
* [Recommendations](#Recommendations)
* [Next Steps](#Next_Steps)
* [Contact Us](#Contact)

# **Import libraries**

In [2]:
# Importing the necessary libraries

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

#  **Loading Data**

In [17]:
df = pd.read_csv("merged_movie_data.csv")
df.head()

| | userId | movieId | rating | timestamp_x | title | genres | tag | timestamp_y | imdbId | tmdbId |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | NaN | NaN | 114709 | 862.0 |
| 1 | 1 | 3 | 4.0 | 964981247 | Grumpier Old Men (1995) | Comedy\|Romance | NaN | NaN | 113228 | 15602.0 |
| 2 | 1 | 6 | 4.0 | 964982224 | Heat (1995) | Action\|Crime\|Thriller | NaN | NaN | 113277 | 949.0 |
| 3 | 1 | 47 | 5.0 | 964983815 | Seven (a.k.a. Se7en) (1995) | Mystery\|Thriller | NaN | NaN | 114369 | 807.0 |
| 4 | 1 | 50 | 5.0 | 964982931 | Usual Suspects, The (1995) | Crime\|Mystery\|Thriller | NaN | NaN | 114814 | 629.0 |


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 102677 entries, 0 to 102676
Data columns (total 10 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   userId       102677 non-null  int64  
 1   movieId      102677 non-null  int64  
 2   rating       102677 non-null  float64
 3   timestamp_x  102677 non-null  int64  
 4   title        102677 non-null  object 
 5   genres       102677 non-null  object 
 6   tag          3476 non-null    object 
 7   timestamp_y  3476 non-null    float64
 8   imdbId       102677 non-null  int64  
 9   tmdbId       102664 non-null  float64
dtypes: float64(3), int64(4), object(3)
memory usage: 7.8+ MB


In [20]:
# Shape of the dataframe
print("The number of rows: {}".format(df.shape[0]))

print("The number of columns: {}".format(df.shape[1]))

The number of rows: 102677
The number of columns: 10


In [21]:
df.describe()


| | userId | movieId | rating | timestamp_x | timestamp_y | imdbId | tmdbId |
|---|---|---|---|---|---|---|---|
| count | 102677.0 | 102677.0 | 102677.0 | 102677.0 | 3476.0 | 102677.0 | 102664.0 |
| mean | 327.761933 | 19742.712623 | 3.514813 | 1209495000.0 | 1323525000.0 | 356499.4 | 20476.871289 |
| std | 183.211289 | 35884.40099 | 1.043133 | 217011700.0 | 173155400.0 | 629571.7 | 54097.633332 |
| min | 1.0 | 1.0 | 0.5 | 828124600.0 | 1137179000.0 | 417.0 | 2.0 |
| 25% | 177.0 | 1199.0 | 3.0 | 1019138000.0 | 1138032000.0 | 99710.0 | 710.0 |
| 50% | 328.0 | 3005.0 | 3.5 | 1186590000.0 | 1279956000.0 | 118842.0 | 6950.0 |
| 75% | 477.0 | 8366.0 | 4.0 | 1439916000.0 | 1498457000.0 | 317248.0 | 11673.0 |
| max | 610.0 | 193609.0 | 5.0 | 1537799000.0 | 1537099000.0 | 8391976.0 | 525662.0 |
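
The `timestamp_x` and `timestamp_y` values in the summary above look like Unix epoch seconds. This is an assumption based on their magnitude; the notebook does not state it. Under that assumption, a minimal sketch for turning them into readable dates is shown below, using three values copied from the first rows of the data. The `rated_at` column name is hypothetical.

```python
import pandas as pd

# Values copied from the first rows of df.head(); "rated_at" is a hypothetical name
sample = pd.DataFrame({"timestamp_x": [964982703, 964981247, 964982224]})

# Assumption: the integers are seconds since 1970-01-01 (Unix epoch)
sample["rated_at"] = pd.to_datetime(sample["timestamp_x"], unit="s")
print(sample)
```

Note also that the mean and standard deviation reported for identifier columns such as `userId`, `movieId`, `imdbId`, and `tmdbId` carry no analytical meaning; only `rating` is a true numeric measure in this table.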


In [23]:
# This function will check the datatypes within the dataframe
def check_data_types(dataframe):
    data_types = dataframe.dtypes
    print(data_types)

# Run the function
check_data_types(df)

userId           int64
movieId          int64
rating         float64
timestamp_x      int64
title           object
genres          object
tag             object
timestamp_y    float64
imdbId           int64
tmdbId         float64
dtype: object


##  Data Cleaning

In [24]:
# Checking for missing (null) values
print("There are", df.isnull().values.sum(), "missing values in the dataset")

There are 198415 missing values in the dataset
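
The single total hides where the gaps are. According to the `df.info()` output, `tag` and `timestamp_y` each have 3,476 non-null values out of 102,677 rows (99,201 missing each), and `tmdbId` has 13 missing, which together account for the 198,415 reported. A per-column breakdown can be sketched as follows; the `toy` frame is a hypothetical stand-in for `df` so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df (hypothetical values) with the same kind of gaps
toy = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "rating": [4.0, 5.0, 3.0, 2.5],
    "tag": ["funny", None, None, None],
    "tmdbId": [862.0, 949.0, np.nan, 807.0],
})

# Missing values per column, as a count and as a share of all rows
missing = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(toy) * 100).round(1),
})

# Show only the columns that actually have gaps
print(summary[summary["missing"] > 0])
```

On the real data, the same pattern would be `df.isnull().sum()`.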


In [28]:
# Helper for duplicate values

# Counts the rows whose value in the given column(s) repeats an earlier row
def count_duplicates(df, column_name):
    duplicate_count = df.duplicated(subset=column_name).sum()
    return duplicate_count
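
The cell above defines `count_duplicates`, but no call to it is shown. A minimal usage sketch on toy data follows; the `toy` frame and its values are hypothetical. Because the merged file carries tag columns alongside ratings, the same user-movie pair may appear on more than one row, so checking the `["userId", "movieId"]` pair is likely more informative than checking a single column.

```python
import pandas as pd

def count_duplicates(df, column_name):
    """Count rows whose value(s) in column_name repeat an earlier row."""
    return df.duplicated(subset=column_name).sum()

# Toy data (hypothetical): user 1 appears twice for movie 10
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 1],
    "movieId": [10, 20, 10, 10],
    "rating":  [4.0, 3.0, 5.0, 4.0],
})

dup_users = count_duplicates(toy, "userId")               # repeats within one column
dup_pairs = count_duplicates(toy, ["userId", "movieId"])  # repeated user-movie pairs
dup_rows = toy.duplicated().sum()                         # fully identical rows
print(dup_users, dup_pairs, dup_rows)
```

A high count for `userId` alone is expected, since every user rates many movies; it is the pair-level and row-level counts that would point to genuine duplication.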



##   Exploratory Data Analysis

## Univariate Analysis

The exploration starts with univariate analysis: each variable is examined on its own to understand its distribution and basic characteristics. This gives a baseline picture of the dataset that informs hypothesis formulation and the multivariate analyses that follow.
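
As one example of what this looks like in practice, the sketch below summarizes a single variable, the rating. The values are hypothetical toy data; in the notebook the series would be `df["rating"]`.

```python
import pandas as pd

# Toy ratings (hypothetical); in the notebook this would be df["rating"]
ratings = pd.Series([4.0, 4.0, 3.5, 5.0, 3.0, 4.0, 2.0, 3.5, 5.0, 4.0], name="rating")

# Summary statistics: count, mean, spread, and quartiles
print(ratings.describe())

# Frequency and share of each rating value, ordered by rating
counts = ratings.value_counts().sort_index()
shares = (counts / counts.sum()).round(2)
print(pd.DataFrame({"count": counts, "share": shares}))
```

The frequency table is the numeric counterpart of a bar chart of ratings and can be passed directly to a plotting call such as `counts.plot(kind="bar")`.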