# Naive Bayes Classifier for Stress Detection and Management

This Jupyter Notebook applies the **Naive Bayes Classifier** to a dataset of physiological and behavioral features such as heart rate, skin conductance, EEG, temperature, and pupil diameter. The goal is to predict the **Engagement Level** of individuals, a task with applications in **stress detection** and **management systems**.

## Naive Bayes Model

Naive Bayes is a simple yet effective probabilistic classification algorithm based on Bayes' Theorem. It assumes that features are conditionally independent given the class label, making it computationally efficient and suitable for high-dimensional data.

## The "Naive" Assumption

A key aspect of Naive Bayes is the assumption that all features are independent of each other within each class. While this assumption is rarely true in real-world scenarios, the algorithm often performs well in practice, especially with text and categorical data.
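
In symbols, the independence assumption lets the class posterior factor into one term per feature. The following is the standard formulation, with $y$ denoting the class label and $x_1, \dots, x_n$ the feature values (notation introduced here for illustration):

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```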

## Gaussian Naive Bayes:

- Assumes that each continuous feature follows a normal (Gaussian) distribution within each class.
- Effective for datasets with numerical features.
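
To make the Gaussian assumption concrete, each per-feature likelihood is a normal density with a class-specific mean and variance. The sketch below computes the resulting class scores with the standard library only. The helper names `gaussian_log_likelihood` and `log_posterior` are illustrative, and the parameters are made up for the example rather than estimated from the dataset.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_posterior(sample, prior, means, variances):
    """Unnormalized log-posterior: log prior plus summed per-feature log-likelihoods."""
    total = math.log(prior)
    for x, mean, var in zip(sample, means, variances):
        total += gaussian_log_likelihood(x, mean, var)
    return total

# Hypothetical per-class parameters for two features (not from the dataset).
params = {
    "class_a": {"prior": 0.5, "means": [60.0, 10.0], "variances": [25.0, 4.0]},
    "class_b": {"prior": 0.5, "means": [110.0, 1.0], "variances": [25.0, 4.0]},
}

sample = [65.0, 9.0]
scores = {label: log_posterior(sample, **p) for label, p in params.items()}

# The predicted class is the one with the highest log-posterior.
predicted = max(scores, key=scores.get)
print(predicted)
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied. scikit-learn's `GaussianNB` follows the same idea and additionally adds a small smoothing term to the variances.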

## Advantages
- Computationally efficient for large datasets.
- Performs well even with small amounts of training data.
- Handles high-dimensional data effectively.

## Objectives of This Notebook

- Feature Encoding: Preprocess the dataset by encoding categorical features for compatibility with the model.
- Model Training: Train a Naive Bayes model on the dataset to classify engagement levels.
- Model Evaluation: Assess the model's accuracy and present a classification report.

The goal is to explore the Naive Bayes algorithm's ability to classify engagement levels, highlighting its efficiency and simplicity while evaluating its predictive performance on the dataset.

In [29]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, accuracy_score

# Load the dataset

In [None]:

data_path = '/Users/software/Desktop/Sem1_stevens/Knowledge_discovery_and_Data_mining/final_project/CS513-A/Naive Bayes/NB_Dataset.csv'
df = pd.read_csv(data_path)

# Display the first few rows of the dataset

In [None]:

print("Dataset preview:")
print(df.head())

Dataset preview:
   HeartRate  SkinConductance        EEG  Temperature  PupilDiameter  \
0         61         8.937204  11.794946    36.501723       3.330181   
1         60        12.635397  19.151412    36.618910       3.428995   
2         81         3.660028   6.226098    36.176898       2.819286   
3        119         0.563070   4.542968    37.205293       2.192961   
4        118         0.477378   0.996209    37.248118       2.450139   

   SmileIntensity  FrownIntensity  CortisolLevel  ActivityLevel  \
0        0.689238        0.189024       0.603035            136   
1        0.561056        0.091367       0.566671            155   
2        0.417951        0.227355       1.422475             55   
3        0.140186        0.502965       1.669045             39   
4        0.064471        0.695604       1.854076             10   

   AmbientNoiseLevel  LightingLevel     EmotionalState  ES_disengaged  \
0                 59            394            engaged              0   
1

# Define the target column and features

In [None]:

target_column = 'EngagementLevel'

# Encode categorical variables

In [None]:

label_encoders = {}
for column in df.columns:
    if df[column].dtype == 'object' or column == target_column:
        le = LabelEncoder()
        df[column] = le.fit_transform(df[column])
        label_encoders[column] = le
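
Because the loop also encodes the target column, the labels the model sees can differ from the values stored in the CSV. The sketch below shows how `LabelEncoder` assigns integer codes, using made-up labels rather than the dataset's columns:

```python
from sklearn.preprocessing import LabelEncoder

# Made-up labels for illustration; the real columns come from the CSV.
raw = ["engaged", "disengaged", "engaged", "partial"]

le = LabelEncoder()
encoded = le.fit_transform(raw)

# classes_ holds the sorted unique values; each value's position is its integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original values from the codes.
decoded = le.inverse_transform(encoded)
print(list(decoded))
```

Codes follow sorted order, so they carry no meaning beyond identity. Keeping each fitted encoder in a dictionary, as the loop above does, is what makes decoding possible later.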

# Split the dataset into features (X) and target (y)

In [None]:

X = df.drop(columns=[target_column])
y = df[target_column]

# Split into training and testing sets

In [None]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
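
The call above splits the rows at random. If class proportions should be preserved in both subsets, `train_test_split` also accepts a `stratify` argument. The sketch below shows this on toy data; it is an optional variation, not what the notebook ran:

```python
from sklearn.model_selection import train_test_split

# Toy data: 10 samples, two balanced classes.
X_toy = [[i] for i in range(10)]
y_toy = [0] * 5 + [1] * 5

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratification, the 2 test samples contain one sample per class.
print(sorted(y_te))
```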

# Initialize and train the Naive Bayes classifier

In [None]:

nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Make predictions

In [None]:

y_pred = nb_classifier.predict(X_test)


# Evaluate the model

In [None]:

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.57
Classification Report:
              precision    recall  f1-score   support

           0       0.52      0.71      0.60        91
           1       0.65      0.45      0.53       109

    accuracy                           0.57       200
   macro avg       0.59      0.58      0.57       200
weighted avg       0.59      0.57      0.56       200
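
The report above summarizes per-class metrics. The underlying counts can be inspected with `sklearn.metrics.confusion_matrix`. A minimal sketch on toy labels follows; in the notebook, `y_test` and `y_pred` would take the place of the toy lists:

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only.
y_true_toy = [0, 0, 1, 1, 1]
y_pred_toy = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)
```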



# Example: Decoding predictions back to original labels if needed

In [None]:

if target_column in label_encoders:
    original_labels = label_encoders[target_column].inverse_transform(y_pred)
    print("Decoded Predictions:", original_labels)

Decoded Predictions: [1 1 1 2 1 2 2 2 1 1 1 1 1 1 2 1 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1
 1 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2 1 1 2 2 2 2 2 1 2 1 1
 1 1 1 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 1 1 1 2 1 1 1 1 1 2 1 1 1 2 1 1 1 1
 1 1 1 1 1 2 2 2 2 1 2 1 1 1 1 2 1 1 2 1 1 2 1 1 2 1 1 1 2 2 1 1 1 2 1 1 2
 1 1 2 2 1 1 1 1 1 2 1 2 2 1 2 2 1 1 1 2 2 1 1 1 2 1 1 2 1 2 1 1 2 1 1 2 1
 1 1 2 1 2 2 1 1 1 2 1 1 2 2 2]


## Conclusion:
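The Gaussian Naive Bayes classifier achieved an accuracy of 57% on the test set, meaning it predicted the engagement level correctly for 57% of the 200 test samples. This is only slightly above the 54.5% that always predicting the majority class would achieve (109 of the 200 test samples belong to class 1). Classes 0 and 1 in the report are the label-encoded forms of the original engagement levels 1 and 2. The classification report gives the following per-class breakdown: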

### Class 0 Performance:
- **Precision**: 52%, indicating that when the model predicts class 0, it is correct 52% of the time.
- **Recall**: 71%, showing that the model successfully identifies 71% of the actual class 0 instances.
- **F1-Score**: 60%, the harmonic mean of precision and recall, lifted by the high recall despite the low precision.

### Class 1 Performance:
- **Precision**: 65%, suggesting that when the model predicts class 1, it is correct 65% of the time.
- **Recall**: 45%, highlighting that the model identifies only 45% of the actual class 1 instances.
- **F1-Score**: 53%, lower than for class 0 because the low recall outweighs the higher precision.
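
These figures can be cross-checked by hand. The counts below are not printed by the notebook. They are reconstructed on the assumption that they are the integers consistent with the rounded report (support 91 and 109, recall 0.71 and 0.45), and the sketch recomputes each metric from them:

```python
# Assumed counts, reconstructed from the rounded classification report.
# Rows: true class 0, 1. Columns: predicted class 0, 1.
cm = [[65, 26],
      [60, 49]]

def metrics(cm, k):
    """Precision, recall, and F1 for class k of a square confusion matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum
    actual_k = sum(cm[k])                    # row sum (support)
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
print(f"accuracy={accuracy:.2f}")
for k in (0, 1):
    p, r, f = metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under these assumed counts the model labels 125 of the 200 test samples as class 0, which is consistent with the high recall and low precision reported for that class.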
