# Movie Rating Prediction

---

## Author Information

- **Author**: Rahul Kumar
- **Batch**: March - April
- **Domain**: Data Science

---

## Task Information

- **Task**: Movie Rating Prediction
- **Description**: Predict the ratings of movies using the IMDb dataset.

---

## Introduction

Movie rating prediction aims to build a model that predicts a movie's rating from features such as genre, director, and actors. This project trains several regression models on the IMDb Movies India dataset and compares their prediction error.

---

In [5]:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

# Loading the dataset
df = pd.read_csv("IMDb Movies India.csv", encoding='latin1')
df.head()

|   | Name | Year | Duration | Genre | Rating | Votes | Director | Actor 1 | Actor 2 | Actor 3 |
|---|------|------|----------|-------|--------|-------|----------|---------|---------|---------|
| 0 |  |  |  | Drama |  |  | J.S. Randhawa | Manmauji | Birbal | Rajendra Bhatia |
| 1 | #Gadhvi (He thought he was Gandhi) | (2019) | 109 min | Drama | 7.0 | 8.0 | Gaurav Bakshi | Rasika Dugal | Vivek Ghamande | Arvind Jangid |
| 2 | #Homecoming | (2021) | 90 min | Drama, Musical |  |  | Soumyajit Majumdar | Sayani Gupta | Plabita Borthakur | Roy Angana |
| 3 | #Yaaram | (2019) | 110 min | Comedy, Romance | 4.4 | 35.0 | Ovais Khan | Prateik | Ishita Raj | Siddhant Kapoor |
| 4 | ...And Once Again | (2010) | 105 min | Drama |  |  | Amol Palekar | Rajat Kapoor | Rituparna Sengupta | Antara Mali |
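
The preview shows `Year` and `Duration` stored as strings such as `(2019)` and `109 min`. The notebook drops both columns later on. If they were to be kept as numeric features instead, one possible approach is sketched below on a toy frame; the `sample` frame and the `Year_num` and `Duration_min` column names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy frame mimicking the string formats seen in the preview (not the real dataset)
sample = pd.DataFrame({
    "Year": ["(2019)", "(2021)", None],
    "Duration": ["109 min", "90 min", None],
})

# Extract the digits from each string; rows with missing values stay missing
year_digits = sample["Year"].str.extract(r"\((\d{4})\)", expand=False)
duration_digits = sample["Duration"].str.extract(r"(\d+)", expand=False)

# Convert the extracted digit strings to numbers
sample["Year_num"] = pd.to_numeric(year_digits)
sample["Duration_min"] = pd.to_numeric(duration_digits)

print(sample[["Year_num", "Duration_min"]])
```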


## Data Preprocessing


In [6]:
# Checking for null values (print is needed: a notebook cell only displays its last expression)
print(df.isnull().sum())

# Dropping rows with null values
df.dropna(inplace=True)

# Encoding categorical features as integer codes.
# A separate LabelEncoder per column keeps each column's label mapping available.
encoders = {}
for col in ['Genre', 'Director', 'Actor 1', 'Actor 2', 'Actor 3']:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

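# Separating features (X) and target variable (y).
# Name, Year, Duration and Votes are left out, so X holds only the encoded
# Genre, Director and Actor columns.
X = df.drop(['Name', 'Rating', 'Year', 'Duration', 'Votes'], axis='columns')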
y = df['Rating']
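
To make the encoding step concrete, here is a minimal sketch of what `LabelEncoder` does to a single column. The `genres` list is made-up toy data, not taken from the dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Toy genre values (illustrative only)
genres = ["Drama", "Comedy", "Drama", "Action"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genres)

# classes_ holds the sorted unique labels; each code is an index into it
print(encoder.classes_.tolist())  # ['Action', 'Comedy', 'Drama']
print(codes.tolist())             # [2, 1, 2, 0]

# inverse_transform maps the integer codes back to the original labels
restored = encoder.inverse_transform(codes).tolist()
print(restored)                   # ['Drama', 'Comedy', 'Drama', 'Action']
```

The codes follow alphabetical order and carry no real ranking. Tree-based models can usually work with such codes, while distance-based models such as `KNeighborsRegressor` and `SVR` treat the gap between two codes as a numeric distance. The scikit-learn documentation describes `LabelEncoder` as intended for target labels, so an encoder designed for feature columns may be worth considering.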


## Train Test Split


In [7]:
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
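
As a small illustration of the split, the sketch below applies the same `test_size=0.2` and `random_state=42` to synthetic arrays. `X_demo` and `y_demo` are stand-ins invented for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature matrix and target: 10 rows, 2 feature columns
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# 20% of 10 rows go to the test set, the remaining 80% to the training set
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# Fixing random_state makes the split reproducible across runs
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(bool((X_te == X_te2).all()))  # True
```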


## Hyperparameter Tuning and Best Model Selection


In [8]:
# Model parameters
model_params = {
    'svm': {
        'model': SVR(),
        'params': {'C': [1, 10, 20], 'kernel': ['linear', 'rbf']}
    },
    'Random_forest': {
        'model': RandomForestRegressor(),
        'params': {'n_estimators': [1, 5, 10, 20, 40]}
    },
    'K_neighbors': {
        'model': KNeighborsRegressor(),
        'params': {'n_neighbors': [5, 10, 15]}
    },
    'Decision_tree': {
        'model': DecisionTreeRegressor(),
        # 'mse' and 'mae' were renamed in scikit-learn 1.0 and removed in 1.2
        'params': {'criterion': ['squared_error', 'absolute_error']}
    }
}
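
To get a feel for how much work these grids imply, the following sketch counts the parameter combinations per model with `ParameterGrid`. The `grids` dictionary restates the grid sizes from `model_params` without the estimators (criterion names as spelled in scikit-learn 1.0+), and `cv_folds = 3` assumes 3-fold cross-validation.

```python
from sklearn.model_selection import ParameterGrid

# Parameter grids only, without the estimator objects
grids = {
    "svm": {"C": [1, 10, 20], "kernel": ["linear", "rbf"]},
    "Random_forest": {"n_estimators": [1, 5, 10, 20, 40]},
    "K_neighbors": {"n_neighbors": [5, 10, 15]},
    "Decision_tree": {"criterion": ["squared_error", "absolute_error"]},
}
cv_folds = 3  # assumed number of cross-validation folds

# Each grid expands to the Cartesian product of its value lists
candidates = {name: len(ParameterGrid(grid)) for name, grid in grids.items()}
print(candidates)  # {'svm': 6, 'Random_forest': 5, 'K_neighbors': 3, 'Decision_tree': 2}

# Every candidate is fitted once per fold (the final refit of each best model is extra)
total_fits = sum(n * cv_folds for n in candidates.values())
print(total_fits)  # 48
```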


## Model Training and Evaluation


In [None]:
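scores = []

for model_name, mp in model_params.items():
    # Grid search with 3-fold cross-validation on the training set only;
    # the best parameter combination is then refit on all of X_train
    search = GridSearchCV(mp['model'], mp['params'], cv=3, scoring='neg_mean_squared_error')
    search.fit(X_train, y_train)

    # Score the refit best estimator on the held-out test set
    y_pred = search.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)

    scores.append({
        'model': model_name,
        'mse': mse,
        'best_params': search.best_params_
    })

# Lower MSE is better, so list the best model first
df_score = pd.DataFrame(scores, columns=['model', 'mse', 'best_params'])
df_score.sort_values('mse')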
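
The `mse` column is expressed in squared rating units, which can be hard to interpret. A minimal sketch of how MSE relates to the root mean squared error (RMSE) follows; the rating values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted ratings (illustrative only)
y_true = np.array([7.0, 4.4, 6.0, 5.5])
y_hat = np.array([6.5, 5.0, 6.0, 4.5])

# MSE: mean of the squared differences, (0.25 + 0.36 + 0.0 + 1.0) / 4
mse = float(mean_squared_error(y_true, y_hat))

# RMSE: square root of MSE, back in the same units as the ratings
rmse = float(np.sqrt(mse))

print(round(mse, 4), round(rmse, 3))  # 0.4025 0.634
```

With `scoring='neg_mean_squared_error'`, the `best_score_` attribute of a fitted `GridSearchCV` is the negated cross-validated MSE, so flipping its sign gives a value comparable to the test-set `mse`.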
