<a href="https://colab.research.google.com/github/Nandini-Myakala/Nandini/blob/main/Movie_Rating_Prediction_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install pandas scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd



In [None]:
# Load dataset
df = pd.read_csv("/content/IMDb Movies India.csv", encoding="ISO-8859-1")

In [None]:
# Convert 'Duration' column to numeric, removing 'min' and handling errors
df['Duration'] = pd.to_numeric(df['Duration'].str.replace(' min', ''), errors='coerce')
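A quick illustration of the `str.replace` + `pd.to_numeric(..., errors="coerce")` combination, using a few made-up strings rather than rows from the dataset. Well-formed durations become numbers. Anything that cannot be parsed, including missing entries, becomes `NaN` instead of raising an error:

```python
import pandas as pd

# Toy values for illustration only; not taken from the IMDb file
raw = pd.Series(["109 min", "90 min", None, "2h 10m"])

# Strip the literal " min" suffix, then parse; errors="coerce" maps failures to NaN
parsed = pd.to_numeric(raw.str.replace(" min", "", regex=False), errors="coerce")
print(parsed.tolist())
```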

In [None]:
# Fill missing durations with the median
df["Duration"] = df["Duration"].fillna(df["Duration"].median())

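Calling `fillna(..., inplace=True)` on `df["col"]` is chained assignment. The method runs on an intermediate Series, and pandas warns that from pandas 3.0 that Series always behaves as a copy, so `df` would not be updated. The two warning-free forms pandas suggests are sketched below on a toy frame with made-up values:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration; values are made up
df = pd.DataFrame({"Duration": [100.0, np.nan, 120.0], "Year": [2001.0, 2005.0, np.nan]})

# Form 1: assign the filled Series back to the column
df["Duration"] = df["Duration"].fillna(df["Duration"].median())

# Form 2: call fillna on the DataFrame with a {column: value} mapping
df = df.fillna({"Year": df["Year"].median()})

print(df)
```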

In [None]:
# Convert 'Year' column to numeric, removing parentheses and handling errors
df['Year'] = pd.to_numeric(df['Year'].str.replace(r'[()]', '', regex=True), errors='coerce')
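The pattern `[()]` is a regular-expression character class that matches either parenthesis, so `regex=True` matters here. Since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default, and a literal search for `"[()]"` finds nothing. A small comparison on made-up values:

```python
import pandas as pd

years = pd.Series(["(2019)", "(1997)", None])  # toy values

# As a regex, [()] removes every "(" and ")"
as_regex = years.str.replace(r"[()]", "", regex=True)

# With regex=False the same pattern is searched for literally and never matches
as_literal = years.str.replace(r"[()]", "", regex=False)

print(as_regex.tolist(), as_literal.tolist())
print(pd.to_numeric(as_regex, errors="coerce").tolist())
```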

In [None]:
# Fill missing values for Year with the median
df['Year'] = df['Year'].fillna(df['Year'].median())


In [None]:
# Convert 'Votes' column to numeric, removing commas and handling errors
df['Votes'] = pd.to_numeric(df['Votes'].str.replace(',', ''), errors='coerce')

In [None]:
# Fill missing values for Votes with the median
df['Votes'] = df['Votes'].fillna(df['Votes'].median())

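The Duration, Year and Votes cleanups all follow one pattern: strip some characters, coerce to numbers, and fill gaps with the median. The sketch below folds that pattern into a single function. `to_numeric_filled` is a hypothetical helper, not part of pandas or the original notebook. It assumes the raw columns hold strings or missing values, and it returns a new Series instead of filling in place:

```python
import pandas as pd


def to_numeric_filled(series: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical helper: strip `pattern` (a regex), coerce to numbers, fill NaN with the median.

    Assumes `series` holds strings or missing values, as the raw CSV columns do.
    """
    cleaned = series.str.replace(pattern, "", regex=True)
    numeric = pd.to_numeric(cleaned, errors="coerce")
    # Return a new Series rather than filling in place, avoiding chained assignment
    return numeric.fillna(numeric.median())


# Toy data for illustration only
df = pd.DataFrame({
    "Duration": ["100 min", None, "140 min"],
    "Year": ["(2001)", "(2003)", None],
    "Votes": ["1,200", "800", "bad"],
})
df["Duration"] = to_numeric_filled(df["Duration"], r" min")
df["Year"] = to_numeric_filled(df["Year"], r"[()]")
df["Votes"] = to_numeric_filled(df["Votes"], r",")
print(df)
```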

In [None]:
# Fill categorical columns with "Unknown"
categorical_cols = ["Genre", "Director", "Actor 1", "Actor 2", "Actor 3"]
df[categorical_cols] = df[categorical_cols].fillna("Unknown")

In [None]:
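# Encode each categorical column as integer codes; unlike one-hot encoding this keeps one numeric column per feature, which stays compact for high-cardinality columns such as Director and the actor names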
encoder = OrdinalEncoder()
df[categorical_cols] = encoder.fit_transform(df[categorical_cols])
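`OrdinalEncoder` learns the sorted unique values of each column and replaces every value with its position in that list. The resulting integers impose an arbitrary order, which tree ensembles like the random forest used here tend to handle better than linear models do. The sketch below uses made-up names. It also shows `handle_unknown="use_encoded_value"` (scikit-learn 0.24 and later), which matters if new movies with unseen directors or actors are encoded after fitting:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up values, not rows from the dataset
train = pd.DataFrame({"Genre": ["Drama", "Action", "Drama"], "Director": ["B", "A", "C"]})
test = pd.DataFrame({"Genre": ["Comedy"], "Director": ["A"]})

# Categories unseen during fit are mapped to -1 instead of raising an error
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = encoder.fit_transform(train)
test_codes = encoder.transform(test)

print(encoder.categories_)  # sorted categories per column
print(train_codes)
print(test_codes)
```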


In [None]:
# Define feature set
numerical_cols = ["Year", "Duration", "Votes"]
X = df[numerical_cols + categorical_cols]
y = df["Rating"]


In [None]:
# Remove rows with missing target values
X = X[y.notna()]
y = y[y.notna()]
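An alternative, sketched here rather than taken from the notebook, is to drop unlabelled rows from the DataFrame before selecting features. With that approach, `X` and `y` come from the same filtered rows in one step. Column names mirror the notebook; the values are made up:

```python
import numpy as np
import pandas as pd

# Toy frame; values are made up
df = pd.DataFrame({"Votes": [10.0, 20.0, 30.0], "Rating": [6.5, np.nan, 7.1]})

# Keep only rows that have a target value
labelled = df.dropna(subset=["Rating"])
X = labelled[["Votes"]]
y = labelled["Rating"]
print(X.shape, y.tolist())
```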

In [None]:
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
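With a float `test_size` such as 0.2, scikit-learn places `ceil(0.2 * n)` rows in the test set and the rest in training. A fixed `random_state` makes the shuffle reproducible, and rows stay aligned between `X` and `y`. The check below uses 10 dummy rows, where row `i` of `X` is `[2i, 2i+1]` and `y[i] == i`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 dummy rows, 2 features
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))

# Same random_state gives the same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test_again))
```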

In [None]:
# Train a RandomForestRegressor model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
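After fitting, a `RandomForestRegressor` exposes `feature_importances_`. These are impurity-based importances, averaged over the trees and normalised to sum to 1. Pairing them with the column names gives a quick check on which features the model relies on. scikit-learn's documentation notes that impurity-based importances can overstate features with many unique values, which may apply to ordinal-encoded Director or actor columns. The sketch below uses synthetic data in which only the first feature carries signal:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({"signal": rng.uniform(0, 10, 200), "noise": rng.uniform(0, 10, 200)})
y = 0.5 * X["signal"] + rng.normal(0, 0.1, 200)  # synthetic target depends only on "signal"

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X, y)

# Importances indexed by column name, largest first
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```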


In [None]:
# Predict and evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse}")

Root Mean Squared Error: 1.097585802310846
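RMSE is the square root of the mean squared difference between true and predicted ratings, so it is expressed in rating points. An RMSE of about 1.1 therefore means errors on the order of one point on the rating scale, with large misses weighted more heavily than small ones. Whether that is good depends on a baseline, such as always predicting the mean training rating. The pure-Python sketch below uses made-up numbers. Recent scikit-learn releases (1.4+) also provide `sklearn.metrics.root_mean_squared_error`, which computes the same quantity directly:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up ratings and predictions, for illustration only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]
print(rmse(y_true, y_pred))  # sqrt((0.25 + 0 + 1) / 3)

# Baseline: predict the mean of the (made-up) training ratings for every row
train_ratings = [4.0, 6.0]
baseline = [sum(train_ratings) / len(train_ratings)] * len(y_true)
print(rmse(y_true, baseline))
```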
