## Stress Detection Classifiers

This notebook trains and evaluates the following classifiers:

* Dummy Classifier
* kNN Classifier (k = 10 neighbors)
* Naive Bayes Classifier
* Decision Tree Classifier

Each classifier is evaluated with k-fold cross-validation (k = 10 folds).
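
As a rough illustration of what 10-fold cross-validation does with the row indices, here is a minimal sketch in plain Python. It is not the `mysklearn.myevaluation` implementation; the function name `kfold_indices` and the `seed` parameter are assumptions made for this example.

```python
import random


def kfold_indices(n_samples, n_splits=10, seed=0):
    """Return n_splits (train, test) pairs of row indices.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = kfold_indices(25, n_splits=10)
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Every row lands in exactly one test fold, so each instance is predicted once by a model that never saw it during training.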

In [98]:
import importlib
from tabulate import tabulate
import random

import mysklearn.myutils
importlib.reload(mysklearn.myutils)
import mysklearn.myutils as myutils

import mysklearn.mypytable
importlib.reload(mysklearn.mypytable)
from mysklearn.mypytable import MyPyTable 

import mysklearn.myclassifiers
importlib.reload(mysklearn.myclassifiers)
from mysklearn.myclassifiers import MyKNeighborsClassifier, MyDummyClassifier, MyNaiveBayesClassifier, MyDecisionTreeClassifier

import mysklearn.myevaluation
importlib.reload(mysklearn.myevaluation)
import mysklearn.myevaluation as myevaluation

stress_data = MyPyTable()
stress_data.load_from_file("cleaned_data.csv")

undiscretized_data = MyPyTable()
undiscretized_data.load_from_file("stress_detection.csv")

<mysklearn.mypytable.MyPyTable at 0x7fb4543cc7a0>

## Handling Class Imbalances


Here we randomly drop instances of the over-represented classes ("high" and "low") so that the three stress levels end up with roughly equal counts. This removes most of the class imbalance that the dataset originally contained.

In [99]:
# Randomly drop 650 "high" rows and 275 "low" rows (column 2 is PSS_score)
# so that the three classes have roughly equal counts.

rows_with_high = [row for row in stress_data.data if row[2] == "high"]
rows_to_drop = random.sample(rows_with_high, k=650)
stress_data.data = [row for row in stress_data.data if row not in rows_to_drop]
rows_with_low = [row for row in stress_data.data if row[2] == "low"]
rows_to_drop = random.sample(rows_with_low, k=275)
stress_data.data = [row for row in stress_data.data if row not in rows_to_drop]
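
The cell above filters with `row not in rows_to_drop`, which compares rows by value: if two rows were identical, dropping one would drop both. It also gives a different sample on every run. A possible alternative, sketched below, samples by row position and uses a fixed seed. The helper name `undersample`, its parameters, and the toy rows are assumptions for illustration, not part of `mysklearn`.

```python
import random


def undersample(rows, label_index, drop_counts, seed=0):
    """Drop a fixed number of randomly chosen rows per class label.

    drop_counts maps a label to the number of rows with that label to remove.
    Rows are chosen by position, so duplicate rows are not removed together.
    """
    rng = random.Random(seed)
    drop = set()
    for label, count in drop_counts.items():
        positions = [i for i, row in enumerate(rows) if row[label_index] == label]
        drop.update(rng.sample(positions, k=count))
    return [row for i, row in enumerate(rows) if i not in drop]


# Toy data (made up): column 2 holds the class label, as in the cell above.
toy = (
    [[i, 0, "high"] for i in range(6)]
    + [[i, 0, "low"] for i in range(4)]
    + [[i, 0, "moderate"] for i in range(3)]
)
balanced = undersample(toy, label_index=2, drop_counts={"high": 3, "low": 1})
print(len(balanced))  # 9
```

On the real table the call would look like `undersample(stress_data.data, 2, {"high": 650, "low": 275})`.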

## Summary Statistics of the Stress Detection Data Before Discretization
* We used these summary statistics to decide how to discretize the data.

In [100]:
summary_statistics = undiscretized_data.compute_summary_statistics(undiscretized_data.column_names)
print("Summary Statistics: ")
summary_statistics.pretty_print()

Summary Statistics: 
attribute                  min        max        mid        avg     median
-----------------  -----------  ---------  ---------  ---------  ---------
participant_id      1           100        50.5       50.5       50.5
day                 1            30        15.5       15.5       15.5
PSS_score          10            39        24.5       24.701     25
Openness            1.005         4.9974    3.0012     3.02066    3.05012
Conscientiousness   1.00098       4.99914   3.00006    3.00788    3.02206
Extraversion        1.00058       4.99764   2.99911    3.0021     2.98555
Agreeableness       1.00221       4.99988   3.00104    3.04766    3.09178
Neuroticism         1.00017       4.99641   2.99829    2.96359    2.94095
sleep_time          5.00329       8.99995   7.00162    7.00214    6.97822
wake_time           5.00193       8.99837   7.00015    6.99057    6.98226
sleep_duration      6.00056       8.99906   7.49981    7.47795    7.46342
PSQI_score          1        
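
The `mid` column in the table is consistent with the midpoint of the range, (min + max) / 2; for example, PSS_score gives (10 + 39) / 2 = 24.5. A minimal sketch of the five statistics for a single numeric column follows. It is not the `compute_summary_statistics` implementation, and the sample values are made up.

```python
from statistics import mean, median


def summarize(values):
    """Return min, max, mid (midpoint of the range), avg, and median."""
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "mid": (lo + hi) / 2,
        "avg": mean(values),
        "median": median(values),
    }


# Made-up scores, chosen only so the range matches PSS_score (10 to 39).
stats = summarize([10, 25, 39, 24, 27])
print(stats["mid"])  # 24.5
```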

## Evaluate Classifier Performance Using:
1. Accuracy and error rate
2. Precision, recall, and F1 measure
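
For reference, the measures listed above can be computed from paired actual and predicted labels as in the sketch below. Precision, recall, and F1 are computed for one positive label. Which label the `myutils` helpers treat as positive is not shown in this cell, so the choice of "high" here is an assumption, as are the function name `binary_metrics` and the toy labels.

```python
def binary_metrics(y_true, y_pred, pos_label):
    """Accuracy and error rate over all labels; precision, recall, and F1 for pos_label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (made up for the example).
y_true = ["high", "high", "low", "moderate", "low", "high"]
y_pred = ["high", "low", "low", "high", "low", "moderate"]
metrics = binary_metrics(y_true, y_pred, pos_label="high")
print({name: round(value, 2) for name, value in metrics.items()})
```

Returning 0.0 when a denominator is zero is one common convention; a different choice would also be defensible.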

In [101]:
X = [
    [
        row[15],  # screen_on_time
        row[14],  # num_sms
        row[13],  # num_calls
        row[8]    # sleep_time
    ]
    for row in stress_data.data
]
y = stress_data.get_column("PSS_score")
header = ["screen_on_time", "num_sms", "num_calls", "sleep_time"]
header_map = ["att0", "att1", "att2", "att3"]

(knn_avg_acc, knn_error_rate, nb_avg_acc, nb_error_rate,
 knn_y_actual, knn_y_pred, nb_y_actual, nb_y_pred,
 knn_binary_ps, nb_binary_ps, knn_recall, nb_recall, knn_f1, nb_f1,
 dummy_avg_acc, dummy_error_rate, dummy_binary_ps, dummy_recall, dummy_f1,
 dummy_y_actual, dummy_y_pred) = myutils.knn_nb_classifiers(X, y)
(tree_avg_acc, tree_error_rate, tree_binary_ps, tree_recall, tree_f1,
 tree_y_actual, tree_y_pred) = myutils.tree_classifier(X, y, header_map)

print("10-Fold Cross Validation")
print("________________________")
print()
print(f"Naive Bayes Classifier: accuracy = {nb_avg_acc:.2f}, error rate = {nb_error_rate:.2f}, precision = {nb_binary_ps:.2f}, recall = {nb_recall:.2f}, F1 = {nb_f1:.2f},")
print()
print(f"k Nearest Neighbors Classifier: accuracy = {knn_avg_acc:.2f}, error rate = {knn_error_rate:.2f}, precision = {knn_binary_ps:.2f}, recall = {knn_recall:.2f}, F1 = {knn_f1:.2f},")
print()
print(f"Dummy Classifier: accuracy = {dummy_avg_acc:.2f}, error rate = {dummy_error_rate:.2f}, precision = {dummy_binary_ps:.2f}, recall = {dummy_recall:.2f}, F1 = {dummy_f1:.2f},")
print()
print(f"Decision Tree Classifier: accuracy = {tree_avg_acc:.2f}, error rate = {tree_error_rate:.2f}, precision = {tree_binary_ps:.2f}, recall = {tree_recall:.2f}, F1 = {tree_f1:.2f},")

10-Fold Cross Validation
________________________

Naive Bayes Classifier: accuracy = 0.33, error rate = 0.67, precision = 0.33, recall = 0.48, F1 = 0.39,

k Nearest Neighbors Classifier: accuracy = 0.35, error rate = 0.65, precision = 0.32, recall = 0.32, F1 = 0.32,

Dummy Classifier: accuracy = 0.30, error rate = 0.70, precision = 0.29, recall = 0.26, F1 = 0.28,

Decision Tree Classifier: accuracy = 0.35, error rate = 0.65, precision = 0.32, recall = 0.38, F1 = 0.34,


## Confusion Matrices

In [102]:
print("============================================================")
print("STEP 4: Confusion Matrices")
print("============================================================")
print()
labels = sorted(set(knn_y_actual) | set(knn_y_pred))
labels_strings = list(map(str, labels))
kNN_matrix = myevaluation.confusion_matrix(knn_y_actual, knn_y_pred, labels)
print("kNN Classifier (10-fold Cross Validation Confusion Matrix)")
print()
print("PSS_score")
print(tabulate(kNN_matrix, headers = labels_strings, showindex = labels_strings))
print()
print("------------------------------------------------------------")
print()
print("Dummy Classifier (10-fold Cross Validation Confusion Matrix)")
print()
print("PSS_score")
dummy_matrix = myevaluation.confusion_matrix(dummy_y_actual, dummy_y_pred, labels)
print(tabulate(dummy_matrix, headers = labels_strings, showindex = labels_strings))
print("------------------------------------------------------------")
print()
print("Naive Bayes Classifier (10-fold Cross Validation Confusion Matrix)")
print()
print("PSS_score")
labels = sorted(set(nb_y_actual) | set(nb_y_pred))
labels_strings = list(map(str, labels))
nb_matrix = myevaluation.confusion_matrix(nb_y_actual, nb_y_pred, labels)
print(tabulate(nb_matrix, headers = labels_strings, showindex = labels_strings))
print("------------------------------------------------------------")
print("Decision Tree Classifier (10-fold Cross Validation Confusion Matrix)")
print()
print("PSS_score")
labels = sorted(set(tree_y_actual) | set(tree_y_pred))
labels_strings = list(map(str, labels))
tree_matrix = myevaluation.confusion_matrix(tree_y_actual, tree_y_pred, labels)
print(tabulate(tree_matrix, headers = labels_strings, showindex = labels_strings))
print("------------------------------------------------------------")

STEP 4: Confusion Matrices

kNN Classifier (10-fold Cross Validation Confusion Matrix)

PSS_score
            high    low    moderate
--------  ------  -----  ----------
high         224    222         246
low          231    231         230
moderate     235    195         261

------------------------------------------------------------

Dummy Classifier (10-fold Cross Validation Confusion Matrix)

PSS_score
            high    low    moderate
--------  ------  -----  ----------
high         182    140         370
low          205    130         357
moderate     235    146         310
------------------------------------------------------------

Naive Bayes Classifier (10-fold Cross Validation Confusion Matrix)

PSS_score
            high    low    moderate
--------  ------  -----  ----------
high         333     65         294
low          349     54         289
moderate     340     52         299
------------------------------------------------------------
Decision Tree Classifier (
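
The matrices above can be read back into the summary metrics. The sketch below assumes rows are actual labels and columns are predicted labels, which is the usual convention, though the notebook does not state it. The helper names are illustrative and are not part of `myevaluation`.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (actual, predicted) pairs: rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Share of instances on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def precision_recall(matrix, k):
    """Precision and recall for the class at row/column k."""
    tp = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column total
    actual_k = sum(matrix[k])                    # row total
    return tp / predicted_k, tp / actual_k


# kNN matrix copied from the output above; label order: high, low, moderate.
knn = [[224, 222, 246], [231, 231, 230], [235, 195, 261]]
precision, recall = precision_recall(knn, 0)  # index 0 is "high"
print(round(accuracy_from_matrix(knn), 2))    # 0.35
print(round(precision, 2), round(recall, 2))  # 0.32 0.32
```

With "high" as the positive label, these values agree with the kNN accuracy, precision, and recall printed by the evaluation cell. That suggests, but does not prove, that the helper functions use "high" as the positive class.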

## Random Forest Classifier

In [103]:
(forest_avg_acc, forest_error_rate, forest_binary_ps, forest_recall, forest_f1,
 forest_y_actual, forest_y_pred) = myutils.forest_classifier(X, y)
print()
print(f"Random Forest Classifier: accuracy = {forest_avg_acc:.2f}, error rate = {forest_error_rate:.2f}, precision = {forest_binary_ps:.2f}, recall = {forest_recall:.2f}, F1 = {forest_f1:.2f},")


Random Forest Classifier: accuracy = 0.12, error rate = 0.88, precision = 0.30, recall = 0.02, F1 = 0.04,
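
The internals of `myutils.forest_classifier` are not shown in this notebook. For orientation only, the sketch below illustrates two generic building blocks of random forests: bootstrap sampling of the training set and majority voting across trees. The function names, the tie-breaking rule, and the toy data are assumptions, not the project's implementation.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement; each tree trains on one such sample."""
    n = len(X)
    picks = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in picks], [y[i] for i in picks]


def majority_vote(predictions):
    """Return the most common label; ties go to the alphabetically first label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


rng = random.Random(0)
X_toy = [[1], [2], [3], [4]]  # made-up feature rows
y_toy = ["high", "low", "moderate", "high"]
X_boot, y_boot = bootstrap_sample(X_toy, y_toy, rng)

# Votes from three hypothetical trees for one test instance.
print(majority_vote(["high", "low", "high"]))  # high
```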
