<a href="https://colab.research.google.com/github/ShauryaDamathia/Video_Games_Sales/blob/main/Video_Games_Sales.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Overview**

We use the Video Game Sales dataset to build a Ridge Regression model that predicts a game's `Global_Sales` from features such as genre, platform, and critic and user scores. We clean the data, engineer and encode features, train the model, and evaluate its fit with the R² score and Mean Squared Error (MSE).

# **Importing Libraries**

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# **Loading the Dataset**

In [4]:
df = pd.read_csv("Video_Games_Sales_as_at_22_Dec_2016.csv")
print(df.head())

                       Name Platform  Year_of_Release         Genre Publisher  \
0                Wii Sports      Wii           2006.0        Sports  Nintendo   
1         Super Mario Bros.      NES           1985.0      Platform  Nintendo   
2            Mario Kart Wii      Wii           2008.0        Racing  Nintendo   
3         Wii Sports Resort      Wii           2009.0        Sports  Nintendo   
4  Pokemon Red/Pokemon Blue       GB           1996.0  Role-Playing  Nintendo   

   NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales  Critic_Score  \
0     41.36     28.96      3.77         8.45         82.53          76.0   
1     29.08      3.58      6.81         0.77         40.24           NaN   
2     15.68     12.76      3.79         3.29         35.52          82.0   
3     15.61     10.93      3.28         2.95         32.77          80.0   
4     11.27      8.89     10.22         1.00         31.37           NaN   

   Critic_Count User_Score  User_Count Developer Rating 

# **Dropping Irrelevant Columns**

In [5]:
df = df.drop(['Name', 'Publisher', 'Developer'], axis=1)

# **Handling Missing and Corrupted Values**

In [6]:
# 'tbd' marks a user score that has not been determined yet; treat it as missing.
df['User_Score'] = df['User_Score'].replace('tbd', np.nan)
df['User_Score'] = pd.to_numeric(df['User_Score'], errors='coerce')

# Rows without a target value cannot be used for training or evaluation.
df = df.dropna(subset=['Global_Sales'])

# Impute numeric gaps with the column mean, the release year with its mode,
# and give missing ratings an explicit 'Unknown' label.
df['Critic_Score'] = df['Critic_Score'].fillna(df['Critic_Score'].mean())
df['Critic_Count'] = df['Critic_Count'].fillna(df['Critic_Count'].mean())
df['User_Score'] = df['User_Score'].fillna(df['User_Score'].mean())
df['User_Count'] = df['User_Count'].fillna(df['User_Count'].mean())
df['Year_of_Release'] = df['Year_of_Release'].fillna(df['Year_of_Release'].mode()[0])
df['Rating'] = df['Rating'].fillna('Unknown')
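
The cleaning steps above can be sketched on a tiny frame to see what each one does. The values below are invented for illustration and are not taken from the dataset; only the column names match.

```python
import pandas as pd

# Toy frame with invented values; only the column names mirror the dataset.
toy = pd.DataFrame({
    "User_Score": ["8.0", "tbd", "6.0", None],
    "Rating": ["E", None, "T", "E"],
})

# Strings that are not numbers ('tbd') and None both become NaN.
toy["User_Score"] = pd.to_numeric(toy["User_Score"], errors="coerce")

# Mean imputation: NaNs take the mean of the observed values (8.0 and 6.0).
toy["User_Score"] = toy["User_Score"].fillna(toy["User_Score"].mean())

# Missing categories get an explicit label instead of being dropped.
toy["Rating"] = toy["Rating"].fillna("Unknown")

print(toy)
```

Mean imputation keeps every row but pulls the imputed scores toward the centre of the distribution, so it is worth checking how many values were filled before trusting coefficients on those columns.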

# **Feature Engineering**

In [7]:
# Age in years relative to a hard-coded reference year (2025).
df['Game_Age'] = 2025 - df['Year_of_Release']
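
Because `Game_Age` is a fixed offset minus `Year_of_Release`, the two columns carry the same information for a linear model. The sketch below checks this on the five release years shown in the `df.head()` output; `REFERENCE_YEAR` is a name introduced here for illustration.

```python
import pandas as pd

# Release years of the first five rows printed by df.head().
toy = pd.DataFrame({"Year_of_Release": [2006.0, 1985.0, 2008.0, 2009.0, 1996.0]})

REFERENCE_YEAR = 2025  # illustrative name for the constant used in the notebook
toy["Game_Age"] = REFERENCE_YEAR - toy["Year_of_Release"]

# A pure offset-and-sign-flip gives a Pearson correlation of -1.
corr = toy["Year_of_Release"].corr(toy["Game_Age"])
print(corr)
```

With both columns in the feature matrix, Ridge tends to split the weight between them rather than learn anything new, so keeping only one of the two is usually enough.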

# **Encoding Categorical Variables**

In [8]:
df = pd.get_dummies(df, columns=['Platform', 'Genre', 'Rating'], drop_first=True)
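
To see what `drop_first=True` does, here is a minimal sketch on the `Genre` and `Global_Sales` values of the first four rows shown earlier. One indicator column per category is created, and the first category in sorted order is dropped to serve as the baseline.

```python
import pandas as pd

# Genre and Global_Sales of the first four rows printed by df.head().
toy = pd.DataFrame({
    "Genre": ["Sports", "Platform", "Racing", "Sports"],
    "Global_Sales": [82.53, 40.24, 35.52, 32.77],
})

# Categories are sorted (Platform, Racing, Sports); drop_first removes 'Platform'.
encoded = pd.get_dummies(toy, columns=["Genre"], drop_first=True)
print(encoded.columns.tolist())
```

A row with zeros in every `Genre_*` column therefore belongs to the dropped baseline category.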

# **Preparing Features and Target**

In [9]:
X = df.drop(['Global_Sales'], axis=1)
y = df['Global_Sales']
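
One thing worth checking at this step is whether any feature is a component of the target. The sketch below uses the five rows printed by `df.head()` and compares the sum of the regional sales columns with `Global_Sales`. The second half shows one possible way to exclude those columns; `X_no_leak` is a hypothetical name, and this is an alternative to the notebook's feature set, not what the notebook does.

```python
import pandas as pd

# Sales and critic scores of the first five rows printed by df.head().
head = pd.DataFrame({
    "NA_Sales": [41.36, 29.08, 15.68, 15.61, 11.27],
    "EU_Sales": [28.96, 3.58, 12.76, 10.93, 8.89],
    "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22],
    "Other_Sales": [8.45, 0.77, 3.29, 2.95, 1.00],
    "Global_Sales": [82.53, 40.24, 35.52, 32.77, 31.37],
    "Critic_Score": [76.0, float("nan"), 82.0, 80.0, float("nan")],
})

regional = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Largest gap between the regional total and the reported global figure.
max_gap = (head[regional].sum(axis=1) - head["Global_Sales"]).abs().max()
print(f"largest gap: {max_gap:.2f}")

# Hypothetical alternative: keep only features that are not parts of the target.
X_no_leak = head.drop(columns=regional + ["Global_Sales"])
print(X_no_leak.columns.tolist())
```

If the gap stays at rounding level across the full dataset, a model that sees the regional columns is mostly adding them up rather than predicting sales from game attributes.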

# **Feature Scaling**

In [10]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# **Train-Test Split**

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
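
The notebook scales the full feature matrix and then splits it, which lets the test rows influence the scaler's mean and standard deviation. A common alternative, sketched here on synthetic data, is to split first and fit the scaler on the training rows only. The arrays and the `_demo`, `_tr`, `_te` names are invented for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target.
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = rng.normal(size=200)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both parts.
scaler_demo = StandardScaler().fit(X_tr)
X_tr_s = scaler_demo.transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

print(X_tr_s.mean(axis=0).round(3))  # zero by construction
print(X_te_s.mean(axis=0).round(3))  # close to zero, but not exactly
```

The test set is then standardized with training statistics, which is what would happen for genuinely new games at prediction time.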

# **Applying Ridge Regression**

In [12]:
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
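
scikit-learn's `Ridge` minimizes the squared error plus `alpha` times the squared L2 norm of the coefficients, so a larger `alpha` shrinks the coefficients more strongly. The sketch below illustrates this on synthetic data; the data, the true weights, and the grid of `alpha` values are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem with known weights.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
true_w = np.array([3.0, -1.0, 0.0, 2.0])
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=100)

# Fit the same model with increasing penalty strength.
norms = {}
for alpha in (0.01, 1.0, 100.0, 10000.0):
    coef = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_
    norms[alpha] = float(np.linalg.norm(coef))
    print(f"alpha={alpha:>8}: ||coef|| = {norms[alpha]:.4f}")
```

The value `alpha=1.0` used above is the library default; a cross-validated search such as `RidgeCV` is one way to choose it from the data instead.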

# **Model Prediction**

In [13]:
y_pred = ridge_model.predict(X_test)

# **Evaluation Metrics**

In [14]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Ridge Regression Results:")
print(f"Mean Squared Error: {mse}")
print(f"R² Score: {r2}")

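Ridge Regression Results:
Mean Squared Error: 2.777418361225939e-05
R² Score: 0.999993271684629

These near-perfect scores should be read with caution. The feature matrix still contains `NA_Sales`, `EU_Sales`, `JP_Sales`, and `Other_Sales`, and in the rows printed earlier those four columns add up to `Global_Sales` to within 0.01. A linear model can therefore reconstruct the target from its own components, so this R² says little about how well genre, platform, or review scores predict sales.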
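
For reference, both metrics can be computed by hand. The sketch below uses four invented target and prediction values and checks the manual formulas against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Invented values, chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 1.0, 2.0, 4.0])
y_hat = np.array([2.5, 1.0, 2.0, 4.5])

# MSE: mean of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R²: one minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

MSE is expressed in squared target units (here, squared millions of copies for the sales data), while R² is unitless and compares the model against always predicting the mean.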
