# 🎬 Task 2: Movie Rating Prediction

### 👤 Dev Jain  
**Internship Domain:** Data Science  
**Company:** CodSoft  
**Task:** Build a machine learning regression model to predict IMDb movie ratings based on features like genre, votes, and runtime.


### 📥 Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score


### 📊 Step 2: Load and Preview Dataset

In [None]:
df = pd.read_csv("IMDb Movies India.csv")
df.head()

### 🧹 Step 3: Data Cleaning

In [None]:
# Check for missing values
df.isnull().sum()

In [None]:
# Drop rows with missing target or critical data
df = df.dropna(subset=['Rating', 'Runtime', 'Votes'])

# Drop exact duplicate rows (reassigning avoids in-place mutation)
df = df.drop_duplicates()
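
The model in the later steps needs numeric inputs. If `Runtime` and `Votes` are stored as text (for example `"109 min"` or `"1,086"`, which is an assumption about this CSV and not something shown above), they could be converted roughly as follows. The toy frame and the `to_number` helper are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy data standing in for the real CSV; the string formats are assumed.
toy = pd.DataFrame({
    "Runtime": ["109 min", "90 min", None],
    "Votes": ["1,086", "35", "n/a"],
    "Rating": [7.0, 4.4, None],
})

def to_number(series: pd.Series) -> pd.Series:
    """Keep only digits and dots, then parse as float (NaN where parsing fails)."""
    cleaned = series.astype(str).str.replace(r"[^0-9.]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

toy["Runtime"] = to_number(toy["Runtime"])
toy["Votes"] = to_number(toy["Votes"])

# Drop rows where conversion failed or the target is missing.
toy = toy.dropna(subset=["Rating", "Runtime", "Votes"])
print(toy)
```

Running the conversion before `dropna` means rows with unparseable values are removed along with the genuinely missing ones.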

### 🧠 Step 4: Feature Engineering

In [None]:
# Convert categorical features like 'Genre' to numeric using one-hot encoding
if 'Genre' in df.columns:
    df = pd.get_dummies(df, columns=['Genre'], drop_first=True)

In [None]:
# Select features (update this to match the actual column names).
# Only Runtime and Votes are used here; the Genre dummy columns created above are not included.
X = df[['Runtime', 'Votes']]
y = df['Rating']
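
The one-hot encoding above creates genre columns, but the feature selection does not use them. A minimal sketch of how they could be added to the feature matrix, assuming the dummy columns carry the default `Genre_` prefix that `pd.get_dummies` generates. The toy frame and the names `genre_cols`, `X_toy`, and `y_toy` are illustrative only.

```python
import pandas as pd

# Toy data with the same column names as the notebook.
toy = pd.DataFrame({
    "Runtime": [109, 90, 120],
    "Votes": [1086, 35, 400],
    "Genre": ["Drama", "Comedy", "Drama"],
    "Rating": [7.0, 4.4, 6.1],
})
toy = pd.get_dummies(toy, columns=["Genre"], drop_first=True)

# Collect every generated dummy column by its prefix.
genre_cols = [c for c in toy.columns if c.startswith("Genre_")]

X_toy = toy[["Runtime", "Votes"] + genre_cols]
y_toy = toy["Rating"]
print(X_toy.columns.tolist())
```

Because `drop_first=True` drops the first category, one genre is represented implicitly by all dummy columns being zero.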


### 🔀 Step 5: Train-Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### ✅ Step 6: Model Training (Random Forest)

In [None]:
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
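
Once a forest is fitted, its `feature_importances_` attribute gives a rough view of which inputs the trees relied on. The sketch below uses synthetic data in which only `Votes` drives the target, so it shows the mechanics and is not a result from the IMDb dataset. The names `X_demo`, `y_demo`, and `demo_model` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features; the target depends on Votes plus a little noise.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Runtime": rng.uniform(60, 180, size=200),
    "Votes": rng.uniform(0, 10000, size=200),
})
y_demo = 4 + X_demo["Votes"] / 2500 + rng.normal(0, 0.1, size=200)

demo_model = RandomForestRegressor(n_estimators=100, random_state=42)
demo_model.fit(X_demo, y_demo)

# Impurity-based importances, one value per feature, summing to 1.
importances = pd.Series(demo_model.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```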

### 📈 Step 7: Model Evaluation

In [None]:
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("MAE:", mae)
print("RMSE:", rmse)
print("R² Score:", r2)
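
These metrics are easier to interpret next to a trivial baseline. One option, sketched here on synthetic data, is scikit-learn's `DummyRegressor`, which always predicts the training mean. The data and the variable names below are made up for illustration and say nothing about the actual IMDb results.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem: the target depends on the first feature.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(300, 2))
y_demo = 5 + 3 * X_demo[:, 0] + rng.normal(0, 0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Baseline that ignores the features and predicts the training mean.
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

baseline_mae = mean_absolute_error(y_te, baseline.predict(X_te))
forest_mae = mean_absolute_error(y_te, forest.predict(X_te))
print("Baseline MAE:", baseline_mae)
print("Forest MAE:", forest_mae)
```

A model is only adding value if its error is clearly below the baseline's on the same test split.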


### 📌 Conclusion

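The Random Forest model was trained to predict IMDb ratings from two features, runtime and votes. Its performance should be judged from the MAE, RMSE, and R² values printed in Step 7. Additional feature engineering, such as adding the encoded genres or actor information to the feature set, may further improve accuracy.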
