# Assignment 1: Classification

### <span style='background :yellow' > Student Information </span>

**Name     : ALIA MARLIANA BINTI SHAIFUL BAHARI**

Depression is a common but serious mood disorder affecting an estimated of 121 million people worldwide. Emotion can possibly be an indicator of depression. The main objective of this assignment is to recommend the best classification model that can determine if an individual is showing signs of depression given the individual’s emotion pattern in the duration of two weeks. We have worked together to collect a data set containing self-report of the emotions you experienced in a period of two weeks. Each instance includes an individual’s gender and emotion pattern (based on 8 emotion categories: joy, sadness, anger, disgust, fear, surprise, contempt and neutral) and also a class label: YES (showing signs of depression) and NO (not showing signs of depression). 

# <span style='background:cyan'>For Freq-PHO-Binary Dataset</span> 

Total count of each emotion expressed by an individual in a period of 2 weeks. The total count of emotions differs for each individual depending on the number of emotions recorded in a day. An individual who recorded more than one emotion a day would produce a higher frequency number compared to an individual who only recorded one emotion a day. 

## Prepare and Explore Data

In [1]:
# Import pandas library
import pandas as pd

# Read csv data file
freq = pd.read_csv('Freq-PHO-Binary.csv')

In [2]:
# Find out the number of instances and number of attributes
freq.shape

(391, 10)

In [3]:
freq.isnull().sum().sum()

0

In [4]:
# View the first 5 rows
freq.head()

Unnamed: 0,Gender,Emotion_Joy,Emotion_Sadness,Emotion_Anger,Emotion_Disgust,Emotion_Fear,Emotion_Surprise,Emotion_Contempt,Emotion_Neutral,Depression
0,Female,4,3,2,1,0,2,2,1,NO
1,Female,8,0,2,0,1,0,0,4,NO
2,Male,5,0,0,0,14,2,0,15,NO
3,Male,7,0,3,0,0,5,0,0,NO
4,Male,3,2,1,0,2,1,0,6,YES


In [5]:
freq.dtypes

Gender              object
Emotion_Joy          int64
Emotion_Sadness      int64
Emotion_Anger        int64
Emotion_Disgust      int64
Emotion_Fear         int64
Emotion_Surprise     int64
Emotion_Contempt     int64
Emotion_Neutral      int64
Depression          object
dtype: object

In [6]:
print(freq['Depression'].value_counts())

Depression
NO     218
YES    173
Name: count, dtype: int64


In [7]:
ratio = freq['Depression'].value_counts() / freq.shape[0]

print(ratio)

Depression
NO     0.557545
YES    0.442455
Name: count, dtype: float64


In [2]:
# Import LabelEncoder
from sklearn import preprocessing

# Create LabelEncoder
le = preprocessing.LabelEncoder()

freq['Gender'] = le.fit_transform(freq['Gender'])

freq.head()

Unnamed: 0,Gender,Emotion_Joy,Emotion_Sadness,Emotion_Anger,Emotion_Disgust,Emotion_Fear,Emotion_Surprise,Emotion_Contempt,Emotion_Neutral,Depression
0,0,4,3,2,1,0,2,2,1,NO
1,0,8,0,2,0,1,0,0,4,NO
2,1,5,0,0,0,14,2,0,15,NO
3,1,7,0,3,0,0,5,0,0,NO
4,1,3,2,1,0,2,1,0,6,YES


In [3]:
# Indicate the target column
target = freq['Depression']

# Indicate the columns that will serve as features
features = freq.drop('Depression', axis = 1)

## Dummy Classifier

In [62]:
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_predict

# Initialize the Dummy Classifier
dummy_clf = DummyClassifier(strategy="uniform")

# Perform cross-validation and get predictions for each fold
dummy_val_predictions = cross_val_predict(dummy_clf, features, target, cv=10)

In [63]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, dummy_val_predictions)
precision_weighted = precision_score(target, dummy_val_predictions, average='weighted')
recall_weighted = recall_score(target, dummy_val_predictions, average='weighted')
f1_weighted = f1_score(target, dummy_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Dummy Classifier =', accuracy)
print('Validation Precision (Weighted Avg) for Dummy Classifier =', precision_weighted)
print('Validation Recall (Weighted Avg) for Dummy Classifier =', recall_weighted)
print('Validation F1 (Weighted Avg) for Dummy Classifier =', f1_weighted)

Validation Accuracy for Dummy Classifier = 0.4859335038363171
Validation Precision (Weighted Avg) for Dummy Classifier = 0.491552364336602
Validation Recall (Weighted Avg) for Dummy Classifier = 0.4859335038363171
Validation F1 (Weighted Avg) for Dummy Classifier = 0.4876019697152623


## KNN Classifier

In [10]:
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

knn = KNeighborsClassifier()

knn_param = {
    'weights': ['uniform', 'distance'], 
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'p': [1,2,3],
    'leaf_size': list(range(1,50))
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
}

grid_search = GridSearchCV(knn, knn_param, cv=10, scoring=scoring, refit='accuracy')

In [11]:
grid_search.fit(features, target)

In [12]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for KNN Classifier =", best_params)
print("Best Accuracy Score for KNN Classifier =", best_accuracy)
print("Best Weighted Precision Score for KNN Classifier =", best_weighted_precision)
print("Best Weighted Recall Score for KNN Classifier =", best_weighted_recall)
print("Best Weighted F1 Score for KNN Classifier =", best_weighted_f1)

Best parameter for KNN Classifier = {'algorithm': 'auto', 'leaf_size': 1, 'p': 2, 'weights': 'uniform'}
Best Accuracy Score for KNN Classifier = 0.5755128205128206
Best Weighted Precision Score for KNN Classifier = 0.5718320213749701
Best Weighted Recall Score for KNN Classifier = 0.5755128205128206
Best Weighted F1 Score for KNN Classifier = 0.5675465119987295


In [13]:
knn = KNeighborsClassifier(grid_search.best_estimator_)

knn_param = {
    'n_neighbors': list(range(1,40))
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),    
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(knn, knn_param, cv=10, scoring=scoring, refit='accuracy')

In [14]:
grid_search.fit(features, target)

In [15]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for KNN Classifier =", best_params)
print("Best Accuracy Score for KNN Classifier =", best_accuracy)
print("Best Weighted Precision Score for KNN Classifier =", best_weighted_precision)
print("Best Weighted Recall Score for KNN Classifier =", best_weighted_recall)
print("Best Weighted F1 Score for KNN Classifier =", best_weighted_f1)

Best parameter for KNN Classifier = {'n_neighbors': 39}
Best Accuracy Score for KNN Classifier = 0.6012179487179486
Best Weighted Precision Score for KNN Classifier = 0.6051822059967863
Best Weighted Recall Score for KNN Classifier = 0.6012179487179486
Best Weighted F1 Score for KNN Classifier = 0.5720967818437168


## Decision Tree Classifier

In [7]:
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

dt = DecisionTreeClassifier(random_state=0)

dt_param = {
     'max_depth': [3, 5, 7, 10, 15, None],
     'min_samples_leaf': [1, 3, 5, 10, 15, 20],
     'min_samples_split': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
     'criterion': ['gini', 'entropy', 'log_loss'],
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
    
}

grid_search = GridSearchCV(dt, dt_param, cv=10, scoring=scoring, refit='accuracy')

In [8]:
grid_search.fit(features, target)

In [9]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Decision Tree Classifier =", best_params)
print("Best Accuracy Score for Decision Tree Classifier =", best_accuracy)
print("Best Weighted Precision Score for Decision Tree Classifier =", best_weighted_precision)
print("Best Weighted Recall Score for Decision Tree Classifier =", best_weighted_recall)
print("Best Weighted F1 Score for Decision Tree Classifier =", best_weighted_f1)

Best parameter for Decision Tree Classifier = {'criterion': 'entropy', 'max_depth': 15, 'min_samples_leaf': 1, 'min_samples_split': 2}
Best Accuracy Score for Decision Tree Classifier = 0.6035897435897436
Best Weighted Precision Score for Decision Tree Classifier = 0.6030396053688901
Best Weighted Recall Score for Decision Tree Classifier = 0.6035897435897436
Best Weighted F1 Score for Decision Tree Classifier = 0.5991395461648829


## Naive Bayes Classifier

### Gaussian Naive Bayes

In [73]:
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

gnb = GaussianNB()

gnb_param = {
    'var_smoothing': [1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4]  
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
    
}

grid_search = GridSearchCV(gnb, gnb_param, cv=10, scoring=scoring, refit='accuracy')

In [74]:
grid_search.fit(features, target)

In [75]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Gaussian Naive Bayes =", best_params)
print("Best Accuracy Score for Gaussian Naive Bayes =", best_accuracy)
print("Best Weighted Precision Score for Gaussian Naive Bayes =", best_weighted_precision)
print("Best Weighted Recall Score for Gaussian Naive Bayes =", best_weighted_recall)
print("Best Weighted F1 Score for Gaussian Naive Bayes =", best_weighted_f1)

Best parameter for Gaussian Naive Bayes = {'var_smoothing': 1e-09}
Best Accuracy Score for Gaussian Naive Bayes = 0.5933333333333335
Best Weighted Precision Score for Gaussian Naive Bayes = 0.6298916290637484
Best Weighted Recall Score for Gaussian Naive Bayes = 0.5933333333333335
Best Weighted F1 Score for Gaussian Naive Bayes = 0.5245600411982516


### Bernoulli Naive Bayes

In [76]:
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

bnb = BernoulliNB()

bnb_param = {
    'binarize': [0.0, 0.5, 1.0],  
    'alpha': [0.5, 1.0, 2.0], 
    'force_alpha': [True,False],
    'fit_prior': [True,False]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0),

}

grid_search = GridSearchCV(bnb, bnb_param, cv=10, scoring=scoring, refit='accuracy')

In [77]:
grid_search.fit(features, target)

In [78]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Bernoulli Naive Bayes =", best_params)
print("Best Accuracy Score for Bernoulli Naive Bayes =", best_accuracy)
print("Best Weighted Precision Score for Bernoulli Naive Bayes =", best_weighted_precision)
print("Best Weighted Recall Score for Bernoulli Naive Bayes =", best_weighted_recall)
print("Best Weighted F1 Score for Bernoulli Naive Bayes =", best_weighted_f1)

Best parameter for Bernoulli Naive Bayes = {'alpha': 0.5, 'binarize': 0.0, 'fit_prior': True, 'force_alpha': True}
Best Accuracy Score for Bernoulli Naive Bayes = 0.5932692307692309
Best Weighted Precision Score for Bernoulli Naive Bayes = 0.5975401250372647
Best Weighted Recall Score for Bernoulli Naive Bayes = 0.5932692307692309
Best Weighted F1 Score for Bernoulli Naive Bayes = 0.5896412702055163


### Multinomial Naive Bayes

In [85]:
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

mnb = MultinomialNB()

mnb_param = {
    'alpha': [0.5, 1.0, 2.0, 3.0], 
    'force_alpha': [True,False],
    'fit_prior': [True,False]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)

}

grid_search = GridSearchCV(mnb, mnb_param, cv=10, scoring=scoring, refit='accuracy')

In [86]:
grid_search.fit(features, target)

In [87]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Multinomial Naive Bayes =", best_params)
print("Best Accuracy Score for Multinomial Naive Bayes =", best_accuracy)
print("Best Weighted Precision Score for Multinomial Naive Bayes =", best_weighted_precision)
print("Best Weighted Recall Score for Multinomial Naive Bayes =", best_weighted_recall)
print("Best Weighted F1 Score for Multinomial Naive Bayes =", best_weighted_f1)

Best parameter for Multinomial Naive Bayes = {'alpha': 2.0, 'fit_prior': True, 'force_alpha': True}
Best Accuracy Score for Multinomial Naive Bayes = 0.6240384615384615
Best Weighted Precision Score for Multinomial Naive Bayes = 0.6603815301728332
Best Weighted Recall Score for Multinomial Naive Bayes = 0.6240384615384615
Best Weighted F1 Score for Multinomial Naive Bayes = 0.611185508323137


## SVM Classifier

### Linear Kernel SVM

In [89]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

linear = SVC(kernel = 'linear')

linear_param = {
     'C': [1, 10, 100, 1000]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
}

grid_search = GridSearchCV(linear, linear_param, cv=10, scoring=scoring, refit='accuracy')

In [90]:
grid_search.fit(features, target)

In [91]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Linear Kernel SVM =", best_params)
print("Best Accuracy Score for Linear Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Linear Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Linear Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Linear Kernel SVM =", best_weighted_f1)

Best parameter for Linear Kernel SVM = {'C': 10}
Best Accuracy Score for Linear Kernel SVM = 0.629102564102564
Best Weighted Precision Score for Linear Kernel SVM = 0.6603815301728332
Best Weighted Recall Score for Linear Kernel SVM = 0.629102564102564
Best Weighted F1 Score for Linear Kernel SVM = 0.584706984213165


### RBF Kernel SVM

In [93]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

rbf = SVC(kernel = 'rbf')

rbf_param = {
     'C': [1,10,100,1000], 
     'gamma': [0.1,1,'scale', 'auto']
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(rbf, rbf_param, cv=10, scoring=scoring, refit='accuracy')

In [94]:
grid_search.fit(features, target)

In [95]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for RBF Kernel SVM =", best_params)
print("Best Accuracy Score for RBF Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for RBF Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for RBF Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for RBF Kernel SVM =", best_weighted_f1)

Best parameter for RBF Kernel SVM = {'C': 1000, 'gamma': 'scale'}
Best Accuracy Score for RBF Kernel SVM = 0.6192307692307694
Best Weighted Precision Score for RBF Kernel SVM = 0.6205976893100426
Best Weighted Recall Score for RBF Kernel SVM = 0.6192307692307694
Best Weighted F1 Score for RBF Kernel SVM = 0.6100169874514775


### Sigmoid Kernel SVM

In [96]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

sigmoid = SVC(kernel = 'sigmoid')

sigmoid_param = {
     'C': [1,10,100,1000], 
     'gamma': [0.1,1,'scale', 'auto'], 
     'coef0': [0, 1, 2]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(sigmoid, sigmoid_param, cv=10, scoring=scoring, refit='accuracy')

In [97]:
grid_search.fit(features, target)

In [98]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Sigmoid Kernel SVM =", best_params)
print("Best Accuracy Score for Sigmoid Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Sigmoid Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Sigmoid Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Sigmoid Kernel SVM =", best_weighted_f1)

Best parameter for Sigmoid Kernel SVM = {'C': 10, 'coef0': 2, 'gamma': 'scale'}
Best Accuracy Score for Sigmoid Kernel SVM = 0.5959615384615384
Best Weighted Precision Score for Sigmoid Kernel SVM = 0.6619354133761093
Best Weighted Recall Score for Sigmoid Kernel SVM = 0.5959615384615384
Best Weighted F1 Score for Sigmoid Kernel SVM = 0.5015432891121969


### Polynomial Kernel SVM

In [13]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', degree = 3)

poly_param = {
     'C': [1,10,100,1000], 
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [14]:
grid_search.fit(features, target)

In [15]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'C': 1000}
Best Accuracy Score for Polynomial Kernel SVM = 0.6140384615384615
Best Weighted Precision Score for Polynomial Kernel SVM = 0.641291632386122
Best Weighted Recall Score for Polynomial Kernel SVM = 0.6140384615384615
Best Weighted F1 Score for Polynomial Kernel SVM = 0.566799395245657


In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', degree = 3, C = 1000)

poly_param = {
     'coef0': [0, 1, 2]  
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [5]:
grid_search.fit(features, target)

In [6]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'coef0': 1}
Best Accuracy Score for Polynomial Kernel SVM = 0.6267307692307693
Best Weighted Precision Score for Polynomial Kernel SVM = 0.6313922360957955
Best Weighted Recall Score for Polynomial Kernel SVM = 0.6267307692307693
Best Weighted F1 Score for Polynomial Kernel SVM = 0.6116837249670092


In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', degree = 3, C = 1000, coef0 = 1)

poly_param = {
    'degree': [1, 2, 3, 4]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [5]:
grid_search.fit(features, target)

In [6]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'degree': 1}
Best Accuracy Score for Polynomial Kernel SVM = 0.629102564102564
Best Weighted Precision Score for Polynomial Kernel SVM = 0.6603815301728332
Best Weighted Recall Score for Polynomial Kernel SVM = 0.629102564102564
Best Weighted F1 Score for Polynomial Kernel SVM = 0.584706984213165


In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', C = 1000, coef0 = 1, degree = 1)

poly_param = {
    'gamma': [0.1, 1, 'scale', 'auto'],
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [5]:
grid_search.fit(features, target)

In [6]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'gamma': 0.1}
Best Accuracy Score for Polynomial Kernel SVM = 0.6317307692307692
Best Weighted Precision Score for Polynomial Kernel SVM = 0.6600658654454038
Best Weighted Recall Score for Polynomial Kernel SVM = 0.6317307692307692
Best Weighted F1 Score for Polynomial Kernel SVM = 0.5908947095838779


# <span style='background:cyan'>For Norm-PHO-Binary Dataset</span>


Emotion counts are normalized so that multiple emotions experienced in a day sums up to 1. Each individual recorded a different number of emotions a day. If the individual expressed two emotions (i.e., joy and sadness) on the same day but at different times of the day, the attribute Emotion_Joy is  assigned a value of 0.5 and the attribute Emotion_Sadness is assigned the value of 0.5. Regardless of the number of emotions recorded in a day, the sum of all emotions is standardized to 15 (from 8 April 2024 – 22 April 2024).

## Prepare and Explore Data

In [16]:
# Import pandas library
import pandas as pd

# Read csv data file
norm = pd.read_csv('Norm-PHO-Binary.csv')

In [11]:
# Find out the number of instances and number of attributes
norm.shape

(391, 10)

In [12]:
norm.isnull().sum().sum()

0

In [13]:
# View the first 5 rows
norm.head()

Unnamed: 0,Gender,Emotion_Joy,Emotion_Sadness,Emotion_Anger,Emotion_Disgust,Emotion_Fear,Emotion_Surprise,Emotion_Contempt,Emotion_Neutral,Depression
0,Female,4.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,NO
1,Female,8.0,0.0,2.0,0.0,1.0,0.0,0.0,4.0,NO
2,Male,1.67,0.0,0.0,0.0,6.17,0.67,0.0,6.5,NO
3,Male,7.0,0.0,3.0,0.0,0.0,5.0,0.0,0.0,NO
4,Male,3.0,2.0,1.0,0.0,2.0,1.0,0.0,6.0,YES


In [18]:
norm.dtypes

Gender               object
Emotion_Joy         float64
Emotion_Sadness     float64
Emotion_Anger       float64
Emotion_Disgust     float64
Emotion_Fear        float64
Emotion_Surprise    float64
Emotion_Contempt    float64
Emotion_Neutral     float64
Depression           object
dtype: object

In [19]:
# Import LabelEncoder
from sklearn import preprocessing

# Create LabelEncoder
le = preprocessing.LabelEncoder()

norm['Gender'] = le.fit_transform(norm['Gender'])

norm.head()

Unnamed: 0,Gender,Emotion_Joy,Emotion_Sadness,Emotion_Anger,Emotion_Disgust,Emotion_Fear,Emotion_Surprise,Emotion_Contempt,Emotion_Neutral,Depression
0,0,4.0,3.0,2.0,1.0,0.0,2.0,2.0,1.0,NO
1,0,8.0,0.0,2.0,0.0,1.0,0.0,0.0,4.0,NO
2,1,1.67,0.0,0.0,0.0,6.17,0.67,0.0,6.5,NO
3,1,7.0,0.0,3.0,0.0,0.0,5.0,0.0,0.0,NO
4,1,3.0,2.0,1.0,0.0,2.0,1.0,0.0,6.0,YES


In [20]:
# Indicate the target column
target = norm['Depression']

# Indicate the columns that will serve as features
features = norm.drop('Depression', axis = 1)

## Dummy Classifier

In [6]:
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_predict

# Initialize the Dummy Classifier
dummy_clf = DummyClassifier(strategy="uniform")

# Perform cross-validation and get predictions for each fold
dummy_val_predictions = cross_val_predict(dummy_clf, features, target, cv=10)

In [7]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, dummy_val_predictions)
precision_weighted = precision_score(target, dummy_val_predictions, average='weighted')
recall_weighted = recall_score(target, dummy_val_predictions, average='weighted')
f1_weighted = f1_score(target, dummy_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Dummy Classifier =', accuracy)
print('Validation Precision (Weighted Avg) for Dummy Classifier =', precision_weighted)
print('Validation Recall (Weighted Avg) for Dummy Classifier =', recall_weighted)
print('Validation F1 (Weighted Avg) for Dummy Classifier =', f1_weighted)

Validation Accuracy for Dummy Classifier = 0.5038363171355499
Validation Precision (Weighted Avg) for Dummy Classifier = 0.5097191921065872
Validation Recall (Weighted Avg) for Dummy Classifier = 0.5038363171355499
Validation F1 (Weighted Avg) for Dummy Classifier = 0.5054656903059485


## KNN Classifier

In [21]:
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

knn = KNeighborsClassifier()

knn_param = {
    'weights': ['uniform', 'distance'], 
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'p': [1,2,3],
    'leaf_size':list(range(1,50))
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
}

grid_search = GridSearchCV(knn, knn_param, cv=10, scoring=scoring, refit='accuracy')

In [22]:
grid_search.fit(features, target)

In [23]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for KNN Classifier =", best_params)
print("Best Accuracy Score for KNN Classifier =", best_accuracy)
print("Best Weighted Precision Score for KNN Classifier =", best_weighted_precision)
print("Best Weighted Recall Score for KNN Classifier =", best_weighted_recall)
print("Best Weighted F1 Score for KNN Classifier =", best_weighted_f1)

Best parameter for KNN Classifier = {'algorithm': 'auto', 'leaf_size': 11, 'p': 2, 'weights': 'uniform'}
Best Accuracy Score for KNN Classifier = 0.5830769230769229
Best Weighted Precision Score for KNN Classifier = 0.5806933166234637
Best Weighted Recall Score for KNN Classifier = 0.5830769230769229
Best Weighted F1 Score for KNN Classifier = 0.5790542433967885


In [24]:
knn = KNeighborsClassifier(grid_search.best_estimator_)

knn_param = {
    'n_neighbors': list(range(1,40))
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),    
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(knn, knn_param, cv=10, scoring=scoring, refit='accuracy')

In [25]:
grid_search.fit(features, target)

In [26]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for KNN Classifier =", best_params)
print("Best Accuracy Score for KNN Classifier =", best_accuracy)
print("Best Weighted Precision Score for KNN Classifier =", best_weighted_precision)
print("Best Weighted Recall Score for KNN Classifier =", best_weighted_recall)
print("Best Weighted F1 Score for KNN Classifier =", best_weighted_f1)

Best parameter for KNN Classifier = {'n_neighbors': 37}
Best Accuracy Score for KNN Classifier = 0.6139102564102565
Best Weighted Precision Score for KNN Classifier = 0.6218449784811341
Best Weighted Recall Score for KNN Classifier = 0.6139102564102565
Best Weighted F1 Score for KNN Classifier = 0.6075398584254749


## Decision Tree Classifier

In [13]:
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

dt = DecisionTreeClassifier(random_state=0)

dt_param = {
     'max_depth': [3, 5, 7, 10, 15, None],
     'min_samples_leaf': [1, 3, 5, 10, 15, 20],
     'min_samples_split': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
     'criterion': ['gini', 'entropy', 'log_loss'],
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
    
}

grid_search = GridSearchCV(dt, dt_param, cv=10, scoring=scoring, refit='accuracy')

In [14]:
grid_search.fit(features, target)

In [15]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Decision Tree Classifier =", best_params)
print("Best Accuracy Score for Decision Tree Classifier =", best_accuracy)
print("Best Weighted Precision Score for Decision Tree Classifier =", best_weighted_precision)
print("Best Weighted Recall Score for Decision Tree Classifier =", best_weighted_recall)
print("Best Weighted F1 Score for Decision Tree Classifier =", best_weighted_f1)

Best parameter for Decision Tree Classifier = {'criterion': 'gini', 'max_depth': 15, 'min_samples_leaf': 5, 'min_samples_split': 2}
Best Accuracy Score for Decision Tree Classifier = 0.6344871794871795
Best Weighted Precision Score for Decision Tree Classifier = 0.6331943989261435
Best Weighted Recall Score for Decision Tree Classifier = 0.6344871794871795
Best Weighted F1 Score for Decision Tree Classifier = 0.6248197360704765


## Naive Bayes Classifier

### Gaussian Naive Bayes

In [17]:
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

gnb = GaussianNB()

gnb_param = {
    'var_smoothing': [1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4]  
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
}

grid_search = GridSearchCV(gnb, gnb_param, cv=10, scoring=scoring, refit='accuracy')

In [18]:
grid_search.fit(features, target)

In [19]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Gaussian Naive Bayes =", best_params)
print("Best Accuracy Score for Gaussian Naive Bayes =", best_accuracy)
print("Best Weighted Precision Score for Gaussian Naive Bayes =", best_weighted_precision)
print("Best Weighted Recall Score for Gaussian Naive Bayes =", best_weighted_recall)
print("Best Weighted F1 Score for Gaussian Naive Bayes =", best_weighted_f1)

Best parameter for Gaussian Naive Bayes = {'var_smoothing': 1e-09}
Best Accuracy Score for Gaussian Naive Bayes = 0.6037179487179487
Best Weighted Precision Score for Gaussian Naive Bayes = 0.6072272327236828
Best Weighted Recall Score for Gaussian Naive Bayes = 0.6037179487179487
Best Weighted F1 Score for Gaussian Naive Bayes = 0.5939537704584154


### Bernoulli Naive Bayes

In [20]:
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

bnb = BernoulliNB()

bnb_param = {
    'binarize': [0.0, 0.5, 1.0],  
    'alpha': [0.5, 1.0, 2.0], 
    'force_alpha': [True,False],
    'fit_prior': [True,False]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0),
}

grid_search = GridSearchCV(bnb, bnb_param, cv=10, scoring=scoring, refit='accuracy')

In [21]:
grid_search.fit(features, target)

In [22]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Bernoulli Naive Bayes =", best_params)
print("Best Accuracy Score for Bernoulli Naive Bayes =", best_accuracy)
print("Best Weighted Precision Score for Bernoulli Naive Bayes =", best_weighted_precision)
print("Best Weighted Recall Score for Bernoulli Naive Bayes =", best_weighted_recall)
print("Best Weighted F1 Score for Bernoulli Naive Bayes =", best_weighted_f1)

Best parameter for Bernoulli Naive Bayes = {'alpha': 2.0, 'binarize': 0.0, 'fit_prior': False, 'force_alpha': True}
Best Accuracy Score for Bernoulli Naive Bayes = 0.5982692307692308
Best Weighted Precision Score for Bernoulli Naive Bayes = 0.6111874343979606
Best Weighted Recall Score for Bernoulli Naive Bayes = 0.5982692307692308
Best Weighted F1 Score for Bernoulli Naive Bayes = 0.5969725232269818


### Multinomial Naive Bayes

In [23]:
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

mnb = MultinomialNB()

mnb_param = {
    'alpha': [0.5, 1.0, 2.0, 3.0], 
    'force_alpha': [True,False],
    'fit_prior': [True,False]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(mnb, mnb_param, cv=10, scoring=scoring, refit='accuracy')

In [24]:
grid_search.fit(features, target)

In [25]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Multinomial Naive Bayes =", best_params)
print("Best Accuracy Score for Multinomial Naive Bayes =", best_accuracy)
print("Best Weighted Precision Score for Multinomial Naive Bayes =", best_weighted_precision)
print("Best Weighted Recall Score for Multinomial Naive Bayes =", best_weighted_recall)
print("Best Weighted F1 Score for Multinomial Naive Bayes =", best_weighted_f1)

Best parameter for Multinomial Naive Bayes = {'alpha': 0.5, 'fit_prior': True, 'force_alpha': True}
Best Accuracy Score for Multinomial Naive Bayes = 0.6217948717948719
Best Weighted Precision Score for Multinomial Naive Bayes = 0.6111874343979606
Best Weighted Recall Score for Multinomial Naive Bayes = 0.6217948717948719
Best Weighted F1 Score for Multinomial Naive Bayes = 0.6128609003632841


## SVM Classifier

### Linear Kernel SVM

In [26]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

linear = SVC(kernel = 'linear')

linear_param = {
     'C': [1, 10, 100, 1000]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted'),
    'weighted_recall': make_scorer(recall_score, average='weighted'),
    'weighted_f1': make_scorer(f1_score, average='weighted')
}

grid_search = GridSearchCV(linear, linear_param, cv=10, scoring=scoring, refit='accuracy')

In [27]:
grid_search.fit(features, target)

In [28]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Linear Kernel SVM =", best_params)
print("Best Accuracy Score for Linear Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Linear Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Linear Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Linear Kernel SVM =", best_weighted_f1)

Best parameter for Linear Kernel SVM = {'C': 10}
Best Accuracy Score for Linear Kernel SVM = 0.6140384615384615
Best Weighted Precision Score for Linear Kernel SVM = 0.6199661083267209
Best Weighted Recall Score for Linear Kernel SVM = 0.6140384615384615
Best Weighted F1 Score for Linear Kernel SVM = 0.6036694515349776


### RBF Kernel SVM

In [29]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

rbf = SVC(kernel = 'rbf')

rbf_param = {
     'C': [1,10,100,1000], 
     'gamma': [0.1,1,'scale', 'auto']
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(rbf, rbf_param, cv=10, scoring=scoring, refit='accuracy')

In [30]:
grid_search.fit(features, target)

In [31]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for RBF Kernel SVM =", best_params)
print("Best Accuracy Score for RBF Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for RBF Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for RBF Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for RBF Kernel SVM =", best_weighted_f1)

Best parameter for RBF Kernel SVM = {'C': 10, 'gamma': 'scale'}
Best Accuracy Score for RBF Kernel SVM = 0.634423076923077
Best Weighted Precision Score for RBF Kernel SVM = 0.6398840818089655
Best Weighted Recall Score for RBF Kernel SVM = 0.634423076923077
Best Weighted F1 Score for RBF Kernel SVM = 0.6308991786566579


### Sigmoid Kernel SVM

In [32]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

sigmoid = SVC(kernel = 'sigmoid')

sigmoid_param = {
     'C': [1,10,100,1000], 
     'gamma': [0.1,1,'scale', 'auto'], 
     'coef0': [0, 1, 2]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(sigmoid, sigmoid_param, cv=10, scoring=scoring, refit='accuracy')

In [33]:
grid_search.fit(features, target)

In [34]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Sigmoid Kernel SVM =", best_params)
print("Best Accuracy Score for Sigmoid Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Sigmoid Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Sigmoid Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Sigmoid Kernel SVM =", best_weighted_f1)

Best parameter for Sigmoid Kernel SVM = {'C': 10, 'coef0': 2, 'gamma': 'scale'}
Best Accuracy Score for Sigmoid Kernel SVM = 0.5908974358974359
Best Weighted Precision Score for Sigmoid Kernel SVM = 0.6887552836237048
Best Weighted Recall Score for Sigmoid Kernel SVM = 0.5908974358974359
Best Weighted F1 Score for Sigmoid Kernel SVM = 0.4885388932139062


### Polynomial Kernel SVM

In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', degree = 3)

poly_param = {
     'C': [1,10,100,1000], 
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [5]:
grid_search.fit(features, target)

In [6]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'C': 10}
Best Accuracy Score for Polynomial Kernel SVM = 0.6471794871794871
Best Weighted Precision Score for Polynomial Kernel SVM = 0.6554194134912836
Best Weighted Recall Score for Polynomial Kernel SVM = 0.6471794871794871
Best Weighted F1 Score for Polynomial Kernel SVM = 0.6442918824777149


In [7]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', degree = 3, C = 10)

poly_param = {
     'coef0': [0, 1, 2]  
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [8]:
grid_search.fit(features, target)

In [9]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'coef0': 0}
Best Accuracy Score for Polynomial Kernel SVM = 0.6471794871794871
Best Weighted Precision Score for Polynomial Kernel SVM = 0.6554194134912836
Best Weighted Recall Score for Polynomial Kernel SVM = 0.6471794871794871
Best Weighted F1 Score for Polynomial Kernel SVM = 0.6442918824777149


In [10]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', degree = 3, C = 10, coef0 = 0)

poly_param = {
    'degree': [1, 2, 3, 4]
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [11]:
grid_search.fit(features, target)

In [12]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

Best parameter for Polynomial Kernel SVM = {'degree': 3}
Best Accuracy Score for Polynomial Kernel SVM = 0.6471794871794871
Best Weighted Precision Score for Polynomial Kernel SVM = 0.6554194134912836
Best Weighted Recall Score for Polynomial Kernel SVM = 0.6471794871794871
Best Weighted F1 Score for Polynomial Kernel SVM = 0.6442918824777149


#### Manually search for best gamma value rather than using GridSearch as it takes longer time

##### Gamma = 0.1

In [4]:
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

poly = SVC(kernel = 'poly', C = 10, coef0 = 0, degree = 3, gamma = 0.1)

# Perform cross-validation and get predictions for each fold
poly_val_predictions = cross_val_predict(poly, features, target, cv=10)

In [5]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, poly_val_predictions)
precision_weighted = precision_score(target, poly_val_predictions, average='weighted')
recall_weighted = recall_score(target, poly_val_predictions, average='weighted')
f1_weighted = f1_score(target, poly_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Polynomial Kernel SVM =', accuracy)
print('Validation Precision (Weighted Avg) for Polynomial Kernel SVM =', precision_weighted)
print('Validation Recall (Weighted Avg) for Polynomial Kernel SVM =', recall_weighted)
print('Validation F1 (Weighted Avg) for Polynomial Kernel SVM =', f1_weighted)

Validation Accuracy for Polynomial Kernel SVM = 0.5703324808184144
Validation Precision (Weighted Avg) for Polynomial Kernel SVM = 0.5713831011367901
Validation Recall (Weighted Avg) for Polynomial Kernel SVM = 0.5703324808184144
Validation F1 (Weighted Avg) for Polynomial Kernel SVM = 0.5707985223074736


##### Gamma = 1

In [4]:
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

poly = SVC(kernel = 'poly', C = 10, coef0 = 0, degree = 3, gamma = 1)

# Perform cross-validation and get predictions for each fold
poly_val_predictions = cross_val_predict(poly, features, target, cv=10)

In [5]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, poly_val_predictions)
precision_weighted = precision_score(target, poly_val_predictions, average='weighted')
recall_weighted = recall_score(target, poly_val_predictions, average='weighted')
f1_weighted = f1_score(target, poly_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Polynomial Kernel SVM =', accuracy)
print('Validation Precision (Weighted Avg) for Polynomial Kernel SVM =', precision_weighted)
print('Validation Recall (Weighted Avg) for Polynomial Kernel SVM =', recall_weighted)
print('Validation F1 (Weighted Avg) for Polynomial Kernel SVM =', f1_weighted)

Validation Accuracy for Polynomial Kernel SVM = 0.5447570332480819
Validation Precision (Weighted Avg) for Polynomial Kernel SVM = 0.5469882404078614
Validation Recall (Weighted Avg) for Polynomial Kernel SVM = 0.5447570332480819
Validation F1 (Weighted Avg) for Polynomial Kernel SVM = 0.5456464144070564


##### Gamma = scale (default value)

In [6]:
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

poly = SVC(kernel = 'poly', C = 10, coef0 = 0, degree = 3, gamma = 'scale')

# Perform cross-validation and get predictions for each fold
poly_val_predictions = cross_val_predict(poly, features, target, cv=10)

In [7]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, poly_val_predictions)
precision_weighted = precision_score(target, poly_val_predictions, average='weighted')
recall_weighted = recall_score(target, poly_val_predictions, average='weighted')
f1_weighted = f1_score(target, poly_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Polynomial Kernel SVM =', accuracy)
print('Validation Precision (Weighted Avg) for Polynomial Kernel SVM =', precision_weighted)
print('Validation Recall (Weighted Avg) for Polynomial Kernel SVM =', recall_weighted)
print('Validation F1 (Weighted Avg) for Polynomial Kernel SVM =', f1_weighted)

Validation Accuracy for Polynomial Kernel SVM = 0.6470588235294118
Validation Precision (Weighted Avg) for Polynomial Kernel SVM = 0.6479595481468368
Validation Recall (Weighted Avg) for Polynomial Kernel SVM = 0.6470588235294118
Validation F1 (Weighted Avg) for Polynomial Kernel SVM = 0.6474416433239962


##### Gamma = auto

In [8]:
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

poly = SVC(kernel = 'poly', C = 10, coef0 = 0, degree = 3, gamma = 'auto')

# Perform cross-validation and get predictions for each fold
poly_val_predictions = cross_val_predict(poly, features, target, cv=10)

In [9]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, poly_val_predictions)
precision_weighted = precision_score(target, poly_val_predictions, average='weighted')
recall_weighted = recall_score(target, poly_val_predictions, average='weighted')
f1_weighted = f1_score(target, poly_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Polynomial Kernel SVM =', accuracy)
print('Validation Precision (Weighted Avg) for Polynomial Kernel SVM =', precision_weighted)
print('Validation Recall (Weighted Avg) for Polynomial Kernel SVM =', recall_weighted)
print('Validation F1 (Weighted Avg) for Polynomial Kernel SVM =', f1_weighted)

Validation Accuracy for Polynomial Kernel SVM = 0.5652173913043478
Validation Precision (Weighted Avg) for Polynomial Kernel SVM = 0.5668259998808438
Validation Recall (Weighted Avg) for Polynomial Kernel SVM = 0.5652173913043478
Validation F1 (Weighted Avg) for Polynomial Kernel SVM = 0.5658895578637742


#### Code to search for best gamma value using GridSearch (unable to run it)

In [18]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

poly = SVC(kernel = 'poly', C = 10, coef0 = 0, degree = 3)

poly_param = {
    'gamma': [0.1, 1, 'scale', 'auto'],
}

scoring = {
    'accuracy': 'accuracy',
    'weighted_precision': make_scorer(precision_score, average='weighted', zero_division=0),
    'weighted_recall': make_scorer(recall_score, average='weighted', zero_division=0),
    'weighted_f1': make_scorer(f1_score, average='weighted', zero_division=0)
}

grid_search = GridSearchCV(poly, poly_param, cv=10, scoring=scoring, refit='accuracy')

In [None]:
grid_search.fit(features, target)

In [None]:
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_
best_weighted_precision = grid_search.cv_results_['mean_test_weighted_precision'][grid_search.best_index_]
best_weighted_recall = grid_search.cv_results_['mean_test_weighted_recall'][grid_search.best_index_]
best_weighted_f1 = grid_search.cv_results_['mean_test_weighted_f1'][grid_search.best_index_]

print("Best parameter for Polynomial Kernel SVM =", best_params)
print("Best Accuracy Score for Polynomial Kernel SVM =", best_accuracy)
print("Best Weighted Precision Score for Polynomial Kernel SVM =", best_weighted_precision)
print("Best Weighted Recall Score for Polynomial Kernel SVM =", best_weighted_recall)
print("Best Weighted F1 Score for Polynomial Kernel SVM =", best_weighted_f1)

# Best Model 

SVM (kernel = poly) using Norm Dataset

In [24]:
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

poly = SVC(kernel = 'poly', C = 10, coef0 = 0, degree = 3, gamma = 'scale')

# Perform cross-validation and get predictions for each fold
poly_val_predictions = cross_val_predict(poly, features, target, cv=10)

In [25]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(target, poly_val_predictions)
precision_weighted = precision_score(target, poly_val_predictions, average='weighted')
recall_weighted = recall_score(target, poly_val_predictions, average='weighted')
f1_weighted = f1_score(target, poly_val_predictions, average='weighted')

# Print the weighted average evaluation metrics
print('Validation Accuracy for Polynomial Kernel SVM =', accuracy)
print('Validation Precision (Weighted Avg) for Polynomial Kernel SVM =', precision_weighted)
print('Validation Recall (Weighted Avg) for Polynomial Kernel SVM =', recall_weighted)
print('Validation F1 (Weighted Avg) for Polynomial Kernel SVM =', f1_weighted)

Validation Accuracy for Polynomial Kernel SVM = 0.6470588235294118
Validation Precision (Weighted Avg) for Polynomial Kernel SVM = 0.6479595481468368
Validation Recall (Weighted Avg) for Polynomial Kernel SVM = 0.6470588235294118
Validation F1 (Weighted Avg) for Polynomial Kernel SVM = 0.6474416433239962


In [26]:
# Train the final model on the entire dataset
poly.fit(features, target)

In [27]:
# Import pickle
import pickle

# Specify the file name to save the model
# Use filename='freq_model.sav' for Freq-PHO-Binary
# Use filename='norm_model.sav' for Norm-PHO-Binary
filename='norm_model.sav'
# Open the file name in write mode. Pass the filename and model.
# Replace modelname with the name of your model
pickle.dump(poly, open(filename, 'wb'))