## 🧠 EEG-Based Game Rating Classification

This notebook focuses on predicting age-based game ratings using features derived from EEG signals. Two rating systems are targeted:

- **PEGI** (Pan-European Game Information): Numeric label (3, 7, 12, 16, 18)
- **ESRB** (Entertainment Software Rating Board): Categorical label (e.g., E, T, M)
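
The notebook later maps ESRB to integer codes with scikit-learn's `LabelEncoder`. A minimal sketch of that mapping, using hypothetical labels built from the three examples above (not values read from the dataset). Note that the codes follow the alphabetical order of the labels, not the age severity of the ratings:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical ESRB labels, for illustration only (not taken from the dataset)
esrb_labels = ["T", "E", "M", "E", "T"]

encoder = LabelEncoder()
codes = encoder.fit_transform(esrb_labels)

# classes_ holds the sorted unique labels; a label's index is its integer code
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Because the encoded integers carry no ordering, they are suitable as class identifiers for a classifier but should not be read as a severity scale.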

The task involves handling missing data, selecting the most relevant EEG features, and building classification models for both targets.


In [None]:
# === Import Libraries ===
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import KNNImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## 🔍 Data Exploration

The dataset has 900 rows and 2268 EEG features recorded during gameplay, plus the two target columns. Three of the features contain missing values. We inspect the structure and statistics of both the EEG features and the target variables.
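
Beyond raw missing-value counts, it can help to look at the missing fraction per column and at the class counts of each target. The sketch below does this on a tiny hypothetical frame; the column names `feat_a` and `feat_b` and all values are placeholders, not the real EEG data:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; column names and values are placeholders
toy = pd.DataFrame({
    "feat_a": [0.1, np.nan, 0.3, 0.4],
    "feat_b": [1.0, 2.0, np.nan, np.nan],
    "PEGI": [3, 7, 7, 18],
    "ESRB": ["E", "E", "T", "M"],
})

# Fraction of missing entries per column, keeping only columns with gaps
missing_frac = toy.isnull().mean()
missing_frac = missing_frac[missing_frac > 0]
print(missing_frac)

# Class counts per target reveal imbalance before any split is made
pegi_counts = toy["PEGI"].value_counts().sort_index()
esrb_counts = toy["ESRB"].value_counts()
print(pegi_counts)
print(esrb_counts)
```

The same calls should work unchanged on `df`, assuming its target columns are named `PEGI` and `ESRB` as in the loading cell.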


In [None]:
# === Load Dataset ===
print("Loading EEG dataset...")
df = pd.read_excel("04-EEG-Based Game Rating Classification (PEGI & ESRB).xlsx")
print(f"Initial shape: {df.shape}")

# === Check for missing values ===
missing_info = df.isnull().sum()
missing_cols = missing_info[missing_info > 0]
print("\nColumns with missing values:\n", missing_cols)

# === Separate Features & Targets ===
X = df.drop(columns=["PEGI", "ESRB"])
y_pegi = df["PEGI"]
y_esrb = df["ESRB"]

## 🧼 Preprocessing and Missing Value Handling

Because dropping rows or columns is not allowed for this task, we use **KNN imputation**: each missing value is replaced by the mean of that feature over the k most similar samples (k = 5 here). Every row and feature column is kept, at the cost of introducing estimated values.
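
To make the mechanics concrete, here is a small sketch of `KNNImputer` on a toy matrix. The values are hypothetical and `n_neighbors=2` is used only to keep the example readable:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix (hypothetical values): row 1 is missing its second feature
X_toy = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [50.0, 500.0],
])

# With k=2, the gap is filled with the mean of that feature over the two
# rows closest to row 1, measured on the features both rows have observed
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X_toy)
print(X_filled)
```

Here the two nearest rows are the first and the third, so the gap is filled with the mean of 10 and 30, and the distant fourth row has no influence.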


In [None]:
# === Impute Missing Values ===
print("\nImputing missing values using KNNImputer (k=5)...")
imputer = KNNImputer(n_neighbors=5)
X_imputed = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

## ✂️ Feature Selection

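To reduce complexity, we fit a Gradient Boosting model and rank features by its importance scores. Features whose importance is at or above the median are retained for training. The ranking model is fit on the PEGI target, and the same selected features are reused for ESRB.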
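
One thing worth checking with a median threshold: tree ensembles assign an importance of exactly zero to features they never split on. `SelectFromModel` keeps features whose importance is greater than or equal to the threshold, so if more than half of the importances are zero, the median is zero and every feature passes. The sketch below reproduces that rule with plain NumPy on hypothetical importances; the stricter variant at the end is an assumption for illustration, not something the notebook does:

```python
import numpy as np

# Hypothetical importances: 6 of 10 features were never used in a split
importances = np.array([0.40, 0.30, 0.20, 0.10, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

threshold = np.median(importances)
# Same rule SelectFromModel documents: keep importance >= threshold
keep_mask = importances >= threshold
print(f"median = {threshold}, kept {keep_mask.sum()} of {importances.size}")

# A stricter variant (an assumption, not part of the notebook): keep only
# strictly positive importances that also reach the median
strict_mask = (importances > 0) & (importances >= threshold)
print(f"strict rule keeps {strict_mask.sum()} features")
```

Comparing `X_selected.shape[1]` with the original column count is a quick way to see whether the selection step actually removed anything.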


In [None]:
# === Feature Selection via GradientBoostingClassifier ===
print("Performing feature selection based on importance...")
selector_model = GradientBoostingClassifier(random_state=13)
selector_model.fit(X_imputed, y_pegi)
feature_selector = SelectFromModel(selector_model, prefit=True, threshold="median")
X_selected = feature_selector.transform(X_imputed)
print(f"Selected features shape: {X_selected.shape}")

## 🧪 Model Training and Evaluation

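We train a separate **Gradient Boosting Classifier** for each target, PEGI and ESRB. Accuracy is the primary evaluation metric, measured on a 70/30 train-test split that is stratified so both parts keep the class proportions of the full dataset. Note that imputation and feature selection above were fit on all rows before splitting, so the reported accuracies may be somewhat optimistic.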
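
One way to keep preprocessing from seeing the test rows is to wrap imputation, selection, and the classifier in a scikit-learn `Pipeline` and fit it on the training part only. This is a sketch under stated assumptions, not the notebook's method: the data is synthetic (`make_classification`), and `n_estimators=20` is chosen only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import KNNImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
import numpy as np

# Synthetic stand-in for the EEG table (hypothetical data, 3 classes)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=5,
    n_classes=3, random_state=13)
# Knock out a few entries so the imputer has work to do
rng = np.random.default_rng(13)
X_demo[rng.random(X_demo.shape) < 0.02] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=13, stratify=y_demo)

# Each step is fit on the training rows only when pipe.fit is called
pipe = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),
    ("select", SelectFromModel(
        GradientBoostingClassifier(n_estimators=20, random_state=13),
        threshold="median")),
    ("clf", GradientBoostingClassifier(n_estimators=20, random_state=13)),
])
pipe.fit(X_tr, y_tr)
demo_acc = accuracy_score(y_te, pipe.predict(X_te))
print(f"Held-out accuracy on synthetic data: {demo_acc:.4f}")
```

With this layout, feature selection is also refit per target, so the ESRB model would no longer depend on features ranked against PEGI.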


In [None]:
# === Encode ESRB target (PEGI is already numeric) ===
encoder_esrb = LabelEncoder()
y_esrb_encoded = encoder_esrb.fit_transform(y_esrb)

# === Train-test splits ===
X_train_p, X_test_p, y_train_p, y_test_p = train_test_split(
    X_selected, y_pegi, test_size=0.3, random_state=13, stratify=y_pegi)

X_train_e, X_test_e, y_train_e, y_test_e = train_test_split(
    X_selected, y_esrb_encoded, test_size=0.3, random_state=13, stratify=y_esrb_encoded)

# === Train and evaluate function ===
def evaluate_model(X_train, X_test, y_train, y_test, target_name):
    print(f"\nTraining GradientBoostingClassifier for {target_name}...")
    model = GradientBoostingClassifier(random_state=13)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"{target_name} Accuracy: {acc:.4f}")
    return acc

# === Run both tasks ===
acc_pegi = evaluate_model(X_train_p, X_test_p, y_train_p, y_test_p, "PEGI")
acc_esrb = evaluate_model(X_train_e, X_test_e, y_train_e, y_test_e, "ESRB")

## 🧮 Final Scoring

As required, we multiply the individual classification accuracies to obtain a final score:

**Final Score = Accuracy(PEGI) × Accuracy(ESRB)**
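
A short worked example of the arithmetic, with hypothetical accuracies chosen only for illustration. It also shows why a product is harsher than an average when one target lags behind:

```python
# Hypothetical accuracies, chosen only to illustrate the arithmetic
example_acc_pegi = 0.80
example_acc_esrb = 0.50

product_score = example_acc_pegi * example_acc_esrb
mean_score = (example_acc_pegi + example_acc_esrb) / 2

# The product is pulled down by the weaker target more than the mean is
print(f"product = {product_score:.2f}, mean = {mean_score:.2f}")
```

Because both accuracies lie between 0 and 1, the product can never exceed the smaller of the two, so improving the weaker target matters most.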


In [None]:
final_score = acc_pegi * acc_esrb
print(f"\n🎯 Final Score (PEGI × ESRB): {final_score:.4f}")

## 📈 Accuracy Visualization

The chart below compares the classification accuracy for the two targets. Both use the same model type (Gradient Boosting), trained separately per target.


In [None]:
# === Visualize Results ===
fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(["PEGI", "ESRB"], [acc_pegi, acc_esrb], color=["royalblue", "coral"])
ax.set_ylim(0, 1)
ax.set_ylabel("Accuracy")
# Plain-text title: the default Matplotlib font may lack emoji glyphs
ax.set_title("Classification Accuracy for PEGI and ESRB")
for bar in bars:
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
            f"{bar.get_height():.2f}", ha='center')
ax.grid(True, linestyle='--', alpha=0.5, axis='y')
plt.tight_layout()
plt.show()

Loading EEG dataset...
Initial shape: (900, 2270)

Columns with missing values:
 MAX_200_POW.O2.Theta    18
MIN_200_POW.O2.Theta    41
MA_200_POW.O2.Theta     12
dtype: int64

Imputing missing values using KNNImputer (k=5)...
Performing feature selection based on importance...
