## MLDC Mapping

1. **Problem Definition** → Emotion detection from EEG
2. **Data Collection** → Load the EEG CSV from `dataset/features_raw.csv`
3. **Data Processing** → Missing values + scaling
4. **EDA** → Shape, samples, statistics
5. **Feature Engineering** → Raw EEG channels (baseline)
6. **Model Selection** → Linear + Logistic Regression
7. **Deployment** → Exported to web UI in this project


### Dataset Used (`dataset/features_raw.csv`)
- Each row is one time sample with 32 EEG channel values (8064 rows in total).
- No real labels included → we generate **synthetic** valence/arousal/dominance.


## 1) Problem Definition
We want to predict emotion dimensions from EEG: **valence**, **arousal**, **dominance**.


## 2) Imports


In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, accuracy_score


## 3) Data Loading
Load the EEG data file with channel columns.


In [2]:
data = pd.read_csv("../dataset/features_raw.csv")
# Drop empty column if it exists
if 'Unnamed: 32' in data.columns:
    data = data.drop(columns=['Unnamed: 32'])
data.head()


Unnamed: 0,Fp1,AF3,F3,F7,FC5,FC1,C3,T7,CP5,CP1,...,FC2,Cz,C4,T8,CP6,CP2,P4,P8,PO4,O2
0,0.057813,-1.335266,4.64048,0.219573,7.473817,2.314842,1.918097,-9.257533,9.089943,-7.104519,...,-30.579542,-2.24148,1.415335,2.406646,12.864059,4.021099,-2.828598,-2.588735,2.637905,-5.226618
1,1.367408,10.259654,3.345409,7.897852,-2.446051,-1.655035,-6.301423,-7.290317,-3.546453,-5.705187,...,-1.290516,-2.568397,-5.651418,-0.09673,-4.930759,-1.722504,-6.111309,0.094893,-3.521353,1.887093
2,-1.783132,4.133553,-0.95168,-1.624803,-1.827309,-2.280364,-2.279225,9.151344,-0.239575,-0.057604,...,11.424923,-2.132823,-0.521117,8.605298,-4.499946,-3.232839,-4.249645,-3.687167,-7.383004,-4.489537
3,-3.690217,-0.814,2.295469,0.901445,8.323679,1.127906,6.356886,11.642082,9.354154,-1.662478,...,-14.721411,-0.506117,-1.154866,-3.940251,7.390881,2.129897,-0.794675,-1.959021,2.77453,-6.32306
4,2.137114,6.420466,6.12223,10.015321,3.106394,3.183129,3.658535,4.571793,4.917712,-2.32594,...,-13.81509,1.813907,-6.444635,-27.68088,0.641364,1.996658,-0.445779,2.614021,6.161845,3.308816
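The loading cell drops one specific stray column by name. A slightly more general sketch, assuming the stray column comes from a trailing comma on each CSV row, removes every column whose name starts with `Unnamed`. The three-channel CSV text and the `demo` names below are hypothetical stand-ins, not the project's file.

```python
import io
import pandas as pd

# Hypothetical 3-channel CSV with a trailing comma on every row;
# pandas reads the empty last field as an all-NaN "Unnamed: 3" column.
csv_text = "Fp1,AF3,F3,\n0.1,-1.3,4.6,\n1.4,10.3,3.3,\n"
demo = pd.read_csv(io.StringIO(csv_text))

# Keep only the columns whose name does not start with "Unnamed"
demo_clean = demo.loc[:, ~demo.columns.str.startswith("Unnamed")]
print(list(demo_clean.columns))  # → ['Fp1', 'AF3', 'F3']
```

This avoids hard-coding the column position, at the cost of also dropping any legitimately unnamed column.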


## 4) EDA (Basic Exploration)


In [3]:
data.shape


(8064, 32)

In [4]:
data.describe().loc[["mean", "std"]].head()


Unnamed: 0,Fp1,AF3,F3,F7,FC5,FC1,C3,T7,CP5,CP1,...,FC2,Cz,C4,T8,CP6,CP2,P4,P8,PO4,O2
mean,-0.03011,0.049626,-0.000615,0.012063,-0.072324,-0.005855,-0.051846,0.080661,-0.12337,0.048696,...,0.428362,0.022278,0.005247,0.089107,-0.143788,-0.06325,-0.009944,0.085996,-0.150934,-0.025165
std,4.30387,19.05058,4.949803,19.530056,14.974316,5.223861,14.165469,20.447003,24.640017,12.291358,...,87.646383,3.967493,11.566574,23.461405,37.118809,21.967885,13.279541,12.283903,37.205713,9.35117
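The standard deviations above differ by more than an order of magnitude between channels, which is the motivation for scaling in the next step. Two quick checks that fit at this stage are a per-channel count of missing values and the ratio between the largest and smallest channel spread. The small `demo` frame below is a made-up stand-in for `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the EEG frame: 3 channels, one missing value
demo = pd.DataFrame({
    "Fp1": [0.1, np.nan, -1.8, 2.1],
    "AF3": [-1.3, 10.3, 4.1, 6.4],
    "F3": [4.6, 3.3, -1.0, 6.1],
})

# Number of missing entries in each channel
missing_per_channel = demo.isna().sum()
print(missing_per_channel)

# How much wider the most variable channel is than the least variable one
std_ratio = demo.std().max() / demo.std().min()
print("largest / smallest std:", std_ratio)
```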


## 5) Preprocessing
- Fill missing values
- Scale features for ML


In [5]:
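# Replace missing values with the mean of their channel (column)
data = data.fillna(data.mean())
X = data.values
# Standardize each channel to zero mean and unit variance.
# Note: the scaler is fitted on all rows, before the train/test split.
scaler = StandardScaler()
X = scaler.fit_transform(X)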
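The cell above computes the fill values and the scaling statistics from every row, so the later test rows influence the preprocessing. One common alternative, sketched here on random stand-in data rather than the EEG file, is to split first and wrap imputation and scaling in a scikit-learn `Pipeline` so they are fitted on the training rows only. The names `X_demo`, `y_demo`, and `model` are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data with a few missing values in the first column
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
X_demo[::25, 0] = np.nan
y_demo = rng.normal(size=200)

# Split first, so the test rows never touch the preprocessing statistics
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Imputer and scaler are fitted inside fit(), on the training rows only
model = make_pipeline(
    SimpleImputer(strategy="mean"), StandardScaler(), LinearRegression()
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)  # test rows are transformed with training statistics
```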


## 6) Feature Selection / Creation
We use **raw scaled EEG channels** as baseline features.


## 7) Create Synthetic Valence/Arousal/Dominance Labels
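Because the dataset has no real emotion labels, we derive **pseudo-labels** from simple heuristics on the EEG channels. These labels are functions of the same channels that serve as features, so the scores below measure how well each model recovers the heuristic, not how well it detects emotion.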


In [6]:
# Helper: scale any signal to 1–9 range
def scale_1_9(x):
    return (x - x.min()) / (x.max() - x.min()) * 8 + 1

# Valence: frontal asymmetry (right - left)
valence_raw = (data['F4'] + data['Fp2']) - (data['F3'] + data['Fp1'])

# Arousal: overall absolute activity
arousal_raw = data.abs().mean(axis=1)

# Dominance: central + parietal activity (simple heuristic)
dom_channels = ['C3','C4','P3','P4','Pz']
dominance_raw = data[dom_channels].abs().mean(axis=1)

# Scale to 1–9
y_val = scale_1_9(valence_raw)
y_ar = scale_1_9(arousal_raw)
y_dom = scale_1_9(dominance_raw)

# Binary labels (High vs Low)
y_val_bin = (y_val > y_val.median()).astype(int)
y_ar_bin = (y_ar > y_ar.median()).astype(int)
y_dom_bin = (y_dom > y_dom.median()).astype(int)
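As a small worked example of the two labeling steps, the snippet below applies the same `scale_1_9` helper and median threshold to four made-up values. The smallest value maps to 1, the largest to 9, and values strictly above the median of the scaled series become class 1.

```python
import pandas as pd

def scale_1_9(x):
    # Min-max scale to [0, 1], then stretch to [1, 9]
    return (x - x.min()) / (x.max() - x.min()) * 8 + 1

raw = pd.Series([0.0, 5.0, 10.0, 2.5])  # made-up raw scores
scaled = scale_1_9(raw)
print(scaled.tolist())  # → [1.0, 5.0, 9.0, 3.0]

# Median of the scaled values is 4.0; strictly greater counts as "high"
binary = (scaled > scaled.median()).astype(int)
print(binary.tolist())  # → [0, 1, 1, 0]
```

Note that the helper divides by `x.max() - x.min()`, so a constant input would produce NaN values.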


## 8) Train/Test Split


In [7]:
# Continuous splits
X_train, X_test, yv_train, yv_test = train_test_split(X, y_val, test_size=0.2, random_state=42)
_, _, ya_train, ya_test = train_test_split(X, y_ar, test_size=0.2, random_state=42)
_, _, yd_train, yd_test = train_test_split(X, y_dom, test_size=0.2, random_state=42)

# Binary splits
X_train_b, X_test_b, yv_train_b, yv_test_b = train_test_split(X, y_val_bin, test_size=0.2, random_state=42)
_, _, ya_train_b, ya_test_b = train_test_split(X, y_ar_bin, test_size=0.2, random_state=42)
_, _, yd_train_b, yd_test_b = train_test_split(X, y_dom_bin, test_size=0.2, random_state=42)
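Splitting each target in a separate call keeps the rows aligned only because every call uses the same `random_state`, `test_size`, and number of rows, which yields the same shuffle each time. The sketch below checks this on toy arrays and shows that passing all arrays to a single `train_test_split` call gives the same result. All names here are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(10, 2)
y_a = np.arange(10)        # first toy target
y_b = np.arange(10) * 10   # second toy target, row-aligned with y_a

# Two separate calls with the same seed
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X_demo, y_a, test_size=0.2, random_state=42)
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X_demo, y_b, test_size=0.2, random_state=42)

# Both calls select the same rows, so the targets stay aligned
same_rows = np.array_equal(Xa_tr, Xb_tr) and np.array_equal(Xa_te, Xb_te)
aligned = np.array_equal(yb_tr, ya_tr * 10)

# One call with every array produces the same split
X_tr, X_te, a_tr, a_te, b_tr, b_te = train_test_split(
    X_demo, y_a, y_b, test_size=0.2, random_state=42
)
single_call_matches = np.array_equal(X_tr, Xa_tr) and np.array_equal(b_te, yb_te)
print(same_rows, aligned, single_call_matches)
```

This also means `X_train_b` and `X_test_b` hold the same rows as `X_train` and `X_test`.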


## 9) Linear Regression (Intensity Scores)


In [8]:
lin = LinearRegression()

# Valence
lin.fit(X_train, yv_train)
pred_v = lin.predict(X_test)
print('Valence MSE:', mean_squared_error(yv_test, pred_v))

# Arousal
lin.fit(X_train, ya_train)
pred_a = lin.predict(X_test)
print('Arousal MSE:', mean_squared_error(ya_test, pred_a))

# Dominance
lin.fit(X_train, yd_train)
pred_d = lin.predict(X_test)
print('Dominance MSE:', mean_squared_error(yd_test, pred_d))


Valence MSE: 1.0136165715295246e-30
Arousal MSE: 0.45040063607787345
Dominance MSE: 0.4768492922299268
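The near-zero valence MSE is expected rather than impressive: the valence label is a sum and difference of four channels that are also columns of `X`, and both min-max scaling and standardization are affine, so the label is an exact linear function of the features. The arousal and dominance labels use absolute values, which a linear model cannot express. The sketch below reproduces this contrast on random stand-in data, evaluated in-sample for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

def scale_1_9(x):
    return (x - x.min()) / (x.max() - x.min()) * 8 + 1

# Random stand-in for six EEG channels
rng = np.random.default_rng(0)
raw = rng.normal(scale=5.0, size=(500, 6))
X_demo = StandardScaler().fit_transform(raw)

# Label 1: signed combination of channels (like the valence heuristic)
y_linear = scale_1_9((raw[:, 0] + raw[:, 1]) - (raw[:, 2] + raw[:, 3]))
# Label 2: mean absolute activity (like the arousal heuristic)
y_abs = scale_1_9(np.abs(raw).mean(axis=1))

mse_linear = mean_squared_error(
    y_linear, LinearRegression().fit(X_demo, y_linear).predict(X_demo)
)
mse_abs = mean_squared_error(
    y_abs, LinearRegression().fit(X_demo, y_abs).predict(X_demo)
)
print("linear label MSE:", mse_linear)    # at floating-point noise level
print("absolute label MSE:", mse_abs)     # clearly larger
```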




## 10) Logistic Regression (High vs Low)


In [9]:
log = LogisticRegression(max_iter=1000)

# Valence High/Low
log.fit(X_train_b, yv_train_b)
pred_vb = log.predict(X_test_b)
print('Valence Accuracy:', accuracy_score(yv_test_b, pred_vb))

# Arousal High/Low
log.fit(X_train_b, ya_train_b)
pred_ab = log.predict(X_test_b)
print('Arousal Accuracy:', accuracy_score(ya_test_b, pred_ab))

# Dominance High/Low
log.fit(X_train_b, yd_train_b)
pred_db = log.predict(X_test_b)
print('Dominance Accuracy:', accuracy_score(yd_test_b, pred_db))


Valence Accuracy: 0.9764414135151891
Arousal Accuracy: 0.48915065096094235
Dominance Accuracy: 0.4941103533787973
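Because the binary labels are median splits, the classes are roughly balanced and about 0.5 is chance level. Arousal and dominance accuracies near 0.49 therefore suggest the model learns nothing useful for those targets. A likely reason is that both labels depend on absolute channel values, which are symmetric around zero and cannot be separated by a linear boundary on the signed features. The sketch below illustrates this on random stand-in data by comparing signed and absolute-value features; it is an illustration of the effect, not a proposed change to the notebook's features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Random stand-in for eight EEG channels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4000, 8))

# High/low label from mean absolute activity, split at the median
activity = np.abs(X_demo).mean(axis=1)
y_demo = (activity > np.median(activity)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Signed features: the label is symmetric in each channel's sign
acc_signed = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Absolute-value features: the label becomes a linear threshold
acc_abs = (
    LogisticRegression(max_iter=1000)
    .fit(np.abs(X_tr), y_tr)
    .score(np.abs(X_te), y_te)
)
print("signed features:", acc_signed)
print("absolute features:", acc_abs)
```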

