TITLE : Movie Recommendation System
  *   AUTHOR: Arman Shaikh
  *   DOMAIN: DATA SCIENCE
  *   AIM   : To build a model to predict the rating of a movie and estimte the rating accurately

**Install and Import Libraries**
* Install the Surprise library , and then import the necessary Python libraries:

In [1]:
pip install scikit-surprise

Collecting scikit-surprise
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/154.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m153.6/154.4 kB[0m [31m6.0 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m154.4/154.4 kB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... [?25l[?25hdone
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp310-cp310-linux_x86_64.whl size=2357280 sha256=b560886bb22a995124321470fa8e981217d852fa03a64e3b9f3db0f04485afaf
  Stored in directory: /root/.cache/pip/wheels/4b/3f/df/6acbf0a

In [2]:
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy


 **Load and Prepare the Dataset**
 * Load the dataset using pandas and prepare it for the Surprise library:

In [7]:
# Load the dataset
movie = pd.read_csv('/content/IMDb Movies India.csv', encoding='latin-1')

# Display the first few rows of the dataset
print(movie.head())

# Display basic information about the dataset
print(movie.info())


                                 Name    Year Duration            Genre  \
0                                         NaN      NaN            Drama   
1  #Gadhvi (He thought he was Gandhi)  (2019)  109 min            Drama   
2                         #Homecoming  (2021)   90 min   Drama, Musical   
3                             #Yaaram  (2019)  110 min  Comedy, Romance   
4                   ...And Once Again  (2010)  105 min            Drama   

   Rating Votes            Director       Actor 1             Actor 2  \
0     NaN   NaN       J.S. Randhawa      Manmauji              Birbal   
1     7.0     8       Gaurav Bakshi  Rasika Dugal      Vivek Ghamande   
2     NaN   NaN  Soumyajit Majumdar  Sayani Gupta   Plabita Borthakur   
3     4.4    35          Ovais Khan       Prateik          Ishita Raj   
4     NaN   NaN        Amol Palekar  Rajat Kapoor  Rituparna Sengupta   

           Actor 3  
0  Rajendra Bhatia  
1    Arvind Jangid  
2       Roy Angana  
3  Siddhant Kapoor  
4    

In [18]:
# Display the first few rows of the dataset
print(movie.head())

                                 Name    Year Duration            Genre  \
0                                         NaN      NaN            Drama   
1  #Gadhvi (He thought he was Gandhi)  (2019)  109 min            Drama   
2                         #Homecoming  (2021)   90 min   Drama, Musical   
3                             #Yaaram  (2019)  110 min  Comedy, Romance   
4                   ...And Once Again  (2010)  105 min            Drama   

   Rating Votes            Director       Actor 1             Actor 2  \
0     NaN   NaN       J.S. Randhawa      Manmauji              Birbal   
1     7.0     8       Gaurav Bakshi  Rasika Dugal      Vivek Ghamande   
2     NaN   NaN  Soumyajit Majumdar  Sayani Gupta   Plabita Borthakur   
3     4.4    35          Ovais Khan       Prateik          Ishita Raj   
4     NaN   NaN        Amol Palekar  Rajat Kapoor  Rituparna Sengupta   

           Actor 3  
0  Rajendra Bhatia  
1    Arvind Jangid  
2       Roy Angana  
3  Siddhant Kapoor  
4    

**Prepare the Data for the Surprise Library**
* Convert the data into the format required by the Surprise library

In [11]:
# Define the format for reading the data
reader = Reader(rating_scale=(1, 5))  # Assuming ratings are on a scale of 1 to 5

# Load the dataset into Surprise's data structure
data = Dataset.load_from_df(movie[['Name', 'Votes', 'Rating']], reader)


 **Split the Dataset**
* Split the dataset into training and testing sets

In [12]:
# Split the data into training and test sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)


**Train the Model**
* Use the Singular Value Decomposition (SVD) algorithm to train the model. SVD is a popular collaborative filtering algorithm for rating prediction

In [13]:
# Initialize the SVD algorithm
model = SVD()

# Train the model on the training set
model.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f600f55d4e0>

**Evaluate the Model**
* Evaluate the model’s performance using the test set:

In [14]:
# Make predictions on the test set
predictions = model.test(testset)

# Calculate the Root Mean Squared Error (RMSE)
rmse = accuracy.rmse(predictions)

print(f"Root Mean Squared Error: {rmse}")


RMSE: nan
Root Mean Squared Error: nan


**Make Movie Rating Predictions**
* To predict the rating that a user would give to a specific movie:

In [15]:
# Example: Predict the rating for a specific user and movie
user_id = 1
movie_id = 10

predicted_rating = model.predict(user_id, movie_id)

print(f"Predicted Rating for User {user_id} and Movie {movie_id}: {predicted_rating.est}")


Predicted Rating for User 1 and Movie 10: 5


#Explanation
* Install and Import Libraries: We install the Surprise library and import necessary libraries for data handling and model creation.

* Load and Prepare the Dataset: Load the dataset using pandas and examine its structure. This dataset contains user ratings for various movies.

* Prepare the Data for the Surprise Library: Convert the data into a format that the Surprise library can use. This involves specifying the rating scale and loading the data into the Surprise data structure.

* Split the Dataset: Split the dataset into training and test sets. This helps evaluate the model's performance on unseen data.

* Train the Model: We use the Singular Value Decomposition (SVD) algorithm to train the model on the training set. SVD is effective for collaborative filtering tasks.

* Evaluate the Model: After training, we evaluate the model using the test set. We use the Root Mean Squared Error (RMSE) to measure how well the predicted ratings match the actual ratings.

* Make Movie Rating Predictions: The predict() function is used to estimate the rating a user would give to a particular movie.
