In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
import pandas as pd

df = pd.read_csv(r'/content/drive/MyDrive/My/mxmh_survey_results.csv')
df.head()

Unnamed: 0,Timestamp,Age,Primary streaming service,Hours per day,While working,Instrumentalist,Composer,Fav genre,Exploratory,Foreign languages,...,Frequency [R&B],Frequency [Rap],Frequency [Rock],Frequency [Video game music],Anxiety,Depression,Insomnia,OCD,Music effects,Permissions
0,8/27/2022 19:29:02,18.0,Spotify,3.0,Yes,Yes,Yes,Latin,Yes,Yes,...,Sometimes,Very frequently,Never,Sometimes,3.0,0.0,1.0,0.0,,I understand.
1,8/27/2022 19:57:31,63.0,Pandora,1.5,Yes,No,No,Rock,Yes,No,...,Sometimes,Rarely,Very frequently,Rarely,7.0,2.0,2.0,1.0,,I understand.
2,8/27/2022 21:28:18,18.0,Spotify,4.0,No,No,No,Video game music,No,Yes,...,Never,Rarely,Rarely,Very frequently,7.0,7.0,10.0,2.0,No effect,I understand.
3,8/27/2022 21:40:40,61.0,YouTube Music,2.5,Yes,No,Yes,Jazz,Yes,Yes,...,Sometimes,Never,Never,Never,9.0,7.0,3.0,3.0,Improve,I understand.
4,8/27/2022 21:54:47,18.0,Spotify,4.0,Yes,No,No,R&B,Yes,No,...,Very frequently,Very frequently,Never,Rarely,7.0,2.0,5.0,9.0,Improve,I understand.


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 736 entries, 0 to 735
Data columns (total 33 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Timestamp                     736 non-null    object 
 1   Age                           735 non-null    float64
 2   Primary streaming service     735 non-null    object 
 3   Hours per day                 736 non-null    float64
 4   While working                 733 non-null    object 
 5   Instrumentalist               732 non-null    object 
 6   Composer                      735 non-null    object 
 7   Fav genre                     736 non-null    object 
 8   Exploratory                   736 non-null    object 
 9   Foreign languages             732 non-null    object 
 10  BPM                           629 non-null    float64
 11  Frequency [Classical]         736 non-null    object 
 12  Frequency [Country]           736 non-null    object 
 13  Frequ

# Cleaning

In [7]:
# round all values

df = df.round()

In [8]:
# drop out of bounds bpm

df = df[(df['BPM'].isnull()) | (df['BPM'] >= 20) & (df['BPM'] <= 500)]

In [9]:
df.isna().sum()

Timestamp                         0
Age                               1
Primary streaming service         1
Hours per day                     0
While working                     3
Instrumentalist                   4
Composer                          1
Fav genre                         0
Exploratory                       0
Foreign languages                 4
BPM                             107
Frequency [Classical]             0
Frequency [Country]               0
Frequency [EDM]                   0
Frequency [Folk]                  0
Frequency [Gospel]                0
Frequency [Hip hop]               0
Frequency [Jazz]                  0
Frequency [K pop]                 0
Frequency [Latin]                 0
Frequency [Lofi]                  0
Frequency [Metal]                 0
Frequency [Pop]                   0
Frequency [R&B]                   0
Frequency [Rap]                   0
Frequency [Rock]                  0
Frequency [Video game music]      0
Anxiety                     

We can use Simple Imputer to fix this later.

# Splitting data

In [10]:
df['Fav genre'].value_counts()

Rock                188
Pop                 114
Metal                86
Classical            53
Video game music     43
EDM                  36
Hip hop              35
R&B                  34
Folk                 29
K pop                26
Country              25
Rap                  22
Jazz                 20
Lofi                 10
Gospel                5
Latin                 3
Name: Fav genre, dtype: int64

We have *very* limited data for certain genres. We could create an "Other" category for all genres below a certain threshold (or even drop those instances) to improve performance. However, I have decided against this, solely because I want users to have a wide array of choices when using this model. 

(Online learning could also be set up to improve performance, as more data is acquired.)

For now, I'll be splitting the train and test sets by stratifying over the Favorite genre feature, such that they have equal proportions. This ensures "rarer" genres exist in both sets, so we do not have a dimension mismatch later on.

I also decide to use a 85/15 split (rather than 80/20) to maximize allotted training data.

In [11]:
import numpy as np
from sklearn.model_selection import train_test_split

X = df.drop(['Depression', 'Anxiety', 'Insomnia', 'OCD'], axis=1)
y1 = df['Anxiety']
y2 = df['Depression']
y3 = df['Insomnia']
y4 = df['OCD']

y = np.column_stack((y1, y2, y3, y4))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, stratify=df['Fav genre'], random_state=42)

Let's make sure the Favorite genre feature of both sets have the same number of values:

In [12]:
len(X_train['Fav genre'].value_counts())

16

In [13]:
len(X_test['Fav genre'].value_counts())

16

# Custom transformer

In [14]:
from sklearn.base import BaseEstimator, TransformerMixin

class Remover(BaseEstimator, TransformerMixin):

    def __init__(self, useless):
        self.useless = useless

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_copy = X.copy()

        X_copy = X_copy.drop(self.useless, axis=1)

        return X_copy

# Pipeline

In [15]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

In [16]:
useless = ["Timestamp", "Permissions"]

cat_stuff = X_train.columns[X_train.dtypes == 'object']
cat_stuff = cat_stuff.difference(useless)
cat_stuff = cat_stuff.tolist()

num_stuff = X_train.columns.difference(useless).difference(cat_stuff)
num_stuff = num_stuff.tolist()

In [17]:
num_pipeline = Pipeline ([
    ("imputer", SimpleImputer(strategy="mean")),
    ("std", StandardScaler())
])

cat_pipeline = Pipeline ([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("one_hot", OneHotEncoder())
])

core_pipeline = ColumnTransformer([
    ("cat", cat_pipeline, cat_stuff),
    ("num", num_pipeline, num_stuff)
])

full_pipeline = Pipeline ([
    ("remover", Remover(useless)),
    ("core_pipeline", core_pipeline)
])

In [18]:
X_train = full_pipeline.fit_transform(X_train)
X_test = full_pipeline.fit_transform(X_test)

In [19]:
X_train.shape, X_test.shape

((619, 102), (110, 102))

In [20]:
y_train.shape

(619, 4)

Dimensions look good!

# Model 1A: Complete Rankings, MultiOutput + AdaBoost + RFC 

In [21]:
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier

multi = MultiOutputClassifier(estimator=AdaBoostClassifier(base_estimator=RandomForestClassifier()))
multi.get_params().keys()

dict_keys(['estimator__algorithm', 'estimator__base_estimator__bootstrap', 'estimator__base_estimator__ccp_alpha', 'estimator__base_estimator__class_weight', 'estimator__base_estimator__criterion', 'estimator__base_estimator__max_depth', 'estimator__base_estimator__max_features', 'estimator__base_estimator__max_leaf_nodes', 'estimator__base_estimator__max_samples', 'estimator__base_estimator__min_impurity_decrease', 'estimator__base_estimator__min_samples_leaf', 'estimator__base_estimator__min_samples_split', 'estimator__base_estimator__min_weight_fraction_leaf', 'estimator__base_estimator__n_estimators', 'estimator__base_estimator__n_jobs', 'estimator__base_estimator__oob_score', 'estimator__base_estimator__random_state', 'estimator__base_estimator__verbose', 'estimator__base_estimator__warm_start', 'estimator__base_estimator', 'estimator__estimator', 'estimator__learning_rate', 'estimator__n_estimators', 'estimator__random_state', 'estimator', 'n_jobs'])

In [22]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
import warnings

warnings.filterwarnings("ignore")

model = MultiOutputClassifier(AdaBoostClassifier(base_estimator=RandomForestClassifier()))

param_distributions = {
    "estimator__learning_rate": uniform(0.01, 1.0),
    "estimator__n_estimators": [50, 100, 150],
    "estimator__base_estimator__n_estimators": [100, 200, 300, 400],
    "estimator__base_estimator__criterion": ['gini', 'entropy']
}

random_search = RandomizedSearchCV(model, param_distributions, cv=5, scoring="f1_weighted", return_train_score=True, n_iter=20)

random_search.fit(X_train, y_train)

print(random_search.best_params_)

best_model = random_search.best_estimator_

{'estimator__base_estimator__criterion': 'gini', 'estimator__base_estimator__n_estimators': 400, 'estimator__learning_rate': 0.9512214491460526, 'estimator__n_estimators': 150}


In [23]:
# recreate model from Version 1

best_model = MultiOutputClassifier(
    estimator=AdaBoostClassifier(
        base_estimator=RandomForestClassifier(
            criterion='gini',
            n_estimators=200
        ),
        learning_rate=0.48508192499490754,
        n_estimators=100
    )
)

In [24]:
best_model.fit(X_train, y_train)

y_pred = best_model.predict(X_test)

In [25]:
y_pred[:5]

array([[ 7.,  0.,  0.,  0.],
       [10.,  8.,  6.,  0.],
       [ 5.,  4.,  2.,  0.],
       [ 8.,  7.,  0.,  0.],
       [ 7.,  0.,  2.,  0.]])

In [26]:
y_test[:5]

array([[ 4.,  2.,  0.,  0.],
       [ 3.,  2.,  5.,  6.],
       [ 5.,  3.,  5.,  3.],
       [ 5.,  1.,  1.,  9.],
       [ 3., 10.,  4.,  1.]])

## Model 1A Evaluation

In [27]:
base_estimator = best_model.estimators_[0]
feature_importances = base_estimator.feature_importances_

feature_importances

array([0.00847484, 0.00923253, 0.01007318, 0.00968657, 0.00486262,
       0.00318199, 0.00409214, 0.00323188, 0.00097977, 0.00371406,
       0.0020656 , 0.00321917, 0.00039543, 0.00188886, 0.00711971,
       0.00840723, 0.00373841, 0.00324453, 0.01121512, 0.00416779,
       0.01127267, 0.01153866, 0.01021828, 0.01258935, 0.01053004,
       0.00772212, 0.01235939, 0.01093377, 0.00838463, 0.00500417,
       0.01256083, 0.011684  , 0.0096854 , 0.00838661, 0.01160581,
       0.01226493, 0.01062565, 0.00678067, 0.00997233, 0.00836466,
       0.00537689, 0.00144044, 0.00972803, 0.01229961, 0.01087242,
       0.00734305, 0.01301074, 0.01283556, 0.01040052, 0.00534634,
       0.01127243, 0.01061441, 0.00625961, 0.00538623, 0.01126544,
       0.01028605, 0.00619101, 0.00406673, 0.01162776, 0.01202205,
       0.01038951, 0.00712525, 0.01196106, 0.01107904, 0.01151774,
       0.00974576, 0.00592934, 0.01119266, 0.01270389, 0.01144423,
       0.01091374, 0.01251895, 0.01151378, 0.00779541, 0.01016

In [28]:
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

def multi_classification_report(y_test, y_pred):

    avg_precision = 0
    avg_recall = 0
    avg_f1 = 0
    support_list = []
    
    for i in range(y_test.shape[1]):
        precision = precision_score(y_test[:, i], y_pred[:, i], average='weighted')
        recall = recall_score(y_test[:, i], y_pred[:, i], average='weighted')
        f1 = f1_score(y_test[:, i], y_pred[:, i], average='weighted')
        
        avg_precision += precision
        avg_recall += recall
        avg_f1 += f1
        support_list.append(precision_score(y_test[:, i], y_pred[:, i], average='weighted', zero_division=1))
    
    avg_precision /= y_test.shape[1]
    avg_recall /= y_test.shape[1]
    avg_f1 /= y_test.shape[1]
    
    report = (f"Precision: {avg_precision:.5f}\n"
             f"Recall: {avg_recall:.5f}\n"
             f"F1 score: {avg_f1:.5f}\n"
             f"Support: {support_list}\n")
             
    return report

In [29]:
report = multi_classification_report(y_test, y_pred)
print(report)

Precision: 0.19395
Recall: 0.22727
F1 score: 0.16882
Support: [0.2559243275152366, 0.20582664884135476, 0.3841322314049586, 0.41174617461746177]



In [30]:
from sklearn.metrics import jaccard_score

def jscore(y_test, y_pred):

    scores = [jaccard_score(y_test[:, i], y_pred[:, i], average='weighted') for i in range(y_test.shape[1])]

    avg_score = np.mean(scores)

    print(f"Avg Jaccard similarity coefficient: {avg_score:.5f}")

In [31]:
jscore(y_test, y_pred)

Avg Jaccard similarity coefficient: 0.10277


As expected, this model has significantly low performance. Most trends I extracted in the EDA are the result of noisy data and do not scale well.

## Saving model 1A & pipeline

In [32]:
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(best_model, f)

In [33]:
with open("pipeline.pkl", "wb") as f:
    pickle.dump(full_pipeline, f)

# Model 1B: Binary Classification, MultiOutput + AdaBoost + RFC 
This model will follow a similar architecture to Model 1A. However, this time, we'll be converting 1-10 rankings to a binary classification task.

### Convert values

In [34]:
y

array([[ 3.,  0.,  1.,  0.],
       [ 7.,  2.,  2.,  1.],
       [ 7.,  7., 10.,  2.],
       ...,
       [ 2.,  2.,  2.,  2.],
       [ 2.,  3.,  2.,  1.],
       [ 2.,  2.,  2.,  5.]])

In [35]:
mask = y > 4
y_b = np.where(mask, 1, 0)
y_b

array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 1, 0],
       ...,
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])

In [36]:
x_X_train, x_X_test, y_b_train, y_b_test = train_test_split(X, y_b, test_size=0.15, stratify=df['Fav genre'], random_state=42)

### Model

In [37]:
model = MultiOutputClassifier(AdaBoostClassifier(base_estimator=RandomForestClassifier()))

param_distributions = {
    "estimator__learning_rate": uniform(0.01, 1.0),
    "estimator__n_estimators": [50, 100, 150],
    "estimator__base_estimator__n_estimators": [100, 200, 300, 400],
    "estimator__base_estimator__criterion": ['gini', 'entropy']
}

random_search = RandomizedSearchCV(model, param_distributions, cv=3, scoring="f1_weighted", return_train_score=True, n_iter=20)

random_search.fit(X_train, y_b_train)

print(random_search.best_params_)

best_model = random_search.best_estimator_

{'estimator__base_estimator__criterion': 'entropy', 'estimator__base_estimator__n_estimators': 200, 'estimator__learning_rate': 0.4030983796807208, 'estimator__n_estimators': 50}


In [38]:
best_model.fit(X_train, y_b_train)

y_pred = best_model.predict(X_test)

In [39]:
y_pred[:5]

array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]])

In [40]:
y_b_test[:5]

array([[0, 0, 0, 0],
       [0, 0, 1, 1],
       [1, 0, 1, 0],
       [1, 0, 0, 1],
       [0, 1, 0, 0]])

### Model 1B Evaluation

In [41]:
base_estimator = best_model.estimators_[0]
feature_importances = base_estimator.feature_importances_

feature_importances

array([0.0072744 , 0.00814782, 0.0097565 , 0.00931733, 0.00540352,
       0.00322558, 0.00558339, 0.00285817, 0.00122516, 0.00449271,
       0.00281799, 0.00339371, 0.00052625, 0.0012845 , 0.00647942,
       0.00845717, 0.00479324, 0.0053759 , 0.01091739, 0.00538113,
       0.00973429, 0.01076863, 0.01008647, 0.01279706, 0.00882464,
       0.00878655, 0.01207772, 0.01126529, 0.00755765, 0.00423196,
       0.01123441, 0.01048542, 0.00806958, 0.00725332, 0.01331842,
       0.01117993, 0.00820246, 0.00705966, 0.00856441, 0.00836253,
       0.00484738, 0.00139614, 0.00923928, 0.01125801, 0.00975201,
       0.00739508, 0.01113636, 0.01114686, 0.00955907, 0.00576669,
       0.01173485, 0.00930847, 0.00678303, 0.00611266, 0.01077923,
       0.00961587, 0.00637693, 0.00280262, 0.01019488, 0.01140357,
       0.00917941, 0.00792294, 0.01398104, 0.00958775, 0.01137474,
       0.0075531 , 0.00785316, 0.00990626, 0.01182117, 0.00993264,
       0.00946265, 0.01097923, 0.01116792, 0.00820795, 0.01191

In [42]:
report = multi_classification_report(y_test, y_pred)
print(report)

Precision: 0.06863
Recall: 0.18636
F1 score: 0.09610
Support: [0.9064935064935066, 0.8093502593502593, 0.671900826446281, 0.5958833619210978]



In [43]:
jscore(y_test, y_pred)

Avg Jaccard similarity coefficient: 0.06097


### Saving Model1B

In [44]:
with open('modelb.pkl', 'wb') as f:
    pickle.dump(best_model, f)

# Model 2: Complete Rankings, LazyPredict + XGBoost
Now, we will create 4 individual models for each MH category, before wrapping them in MultiOutputClassifier(). <br/>

In [45]:
!pip install lazypredict

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting lazypredict
  Downloading lazypredict-0.2.12-py2.py3-none-any.whl (12 kB)
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.12


In [46]:
def clean(y2):
    # separate labels and features
    X = df.drop(['Depression', 'Anxiety', 'Insomnia', 'OCD'], axis=1)
    
    # split train and test
    X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y2, test_size=0.15, stratify=df['Fav genre'], random_state=42)

    # pipeline    
    X_train2 = full_pipeline.fit_transform(X_train2)
    X_test2 = full_pipeline.fit_transform(X_test2)

    X_train2 = X_train2.toarray()
    X_test2 = X_test2.toarray()
    
    return X_train2, X_test2, y_train2, y_test2

In [47]:
from lazypredict.Supervised import LazyClassifier
clf = LazyClassifier(verbose=0,ignore_warnings=True, custom_metric=None)

def lazy_predict(X_train2, X_test2, y_train2, y_test2):
  
    models, predictions = clf.fit(X_train2, X_test2, y_train2, y_test2)
    
    results_df = pd.DataFrame(models)
    results_df = results_df[["F1 Score", "Balanced Accuracy", "Time Taken"]]
    results_df = results_df.sort_values(by='F1 Score', ascending=False)
    print(results_df[:5])

In [48]:
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

xgb_model = XGBClassifier()

def xg(X_train2, X_test2, y_train2, y_test2):
    
    xgb_model.fit(X_train2, y_train2)   
    print(classification_report(y_test2, xgb_model.predict(X_test2)))
       
    f1 = f1_score(y_test2, xgb_model.predict(X_test2), average='weighted')
    print(f"F1 score: {f1:.2f}")

### Anxiety models

In [49]:
lazy_predict(*clean(df['Anxiety']))

100%|██████████| 29/29 [00:36<00:00,  1.26s/it]

                        F1 Score  Balanced Accuracy  Time Taken
Model                                                          
LinearSVC                   0.13               0.12        2.08
LogisticRegression          0.13               0.12        0.47
LGBMClassifier              0.12               0.13        1.71
AdaBoostClassifier          0.12               0.13        0.26
DecisionTreeClassifier      0.12               0.11        0.03





In [50]:
xg(*clean(df['Anxiety']))

              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         8
         1.0       0.17      0.25      0.20         4
         2.0       0.25      0.09      0.13        11
         3.0       0.08      0.06      0.07        16
         4.0       0.00      0.00      0.00         8
         5.0       0.00      0.00      0.00         6
         6.0       0.08      0.06      0.07        17
         7.0       0.17      0.24      0.20        17
         8.0       0.21      0.50      0.29        10
         9.0       0.00      0.00      0.00         5
        10.0       0.11      0.12      0.12         8

    accuracy                           0.13       110
   macro avg       0.10      0.12      0.10       110
weighted avg       0.11      0.13      0.11       110

F1 score: 0.11


### Depression models

In [51]:
lazy_predict(*clean(df['Depression']))

'tuple' object has no attribute '__name__'
Invalid Classifier(s)


100%|██████████| 29/29 [00:13<00:00,  2.23it/s]

                        F1 Score  Balanced Accuracy  Time Taken
Model                                                          
DecisionTreeClassifier      0.16               0.14        0.02
NuSVC                       0.16               0.11        0.14
BaggingClassifier           0.15               0.12        0.10
ExtraTreeClassifier         0.14               0.15        0.02
SVC                         0.14               0.11        0.09





In [52]:
xg(*clean(df['Depression']))

              precision    recall  f1-score   support

         0.0       0.44      0.20      0.28        20
         1.0       0.00      0.00      0.00         7
         2.0       0.12      0.12      0.12        17
         3.0       0.10      0.08      0.09        13
         4.0       0.08      0.12      0.10         8
         5.0       0.00      0.00      0.00         7
         6.0       0.05      0.14      0.07         7
         7.0       0.17      0.18      0.17        11
         8.0       0.00      0.00      0.00         9
         9.0       0.00      0.00      0.00         5
        10.0       0.17      0.17      0.17         6

    accuracy                           0.11       110
   macro avg       0.10      0.09      0.09       110
weighted avg       0.15      0.11      0.12       110

F1 score: 0.12


### Insomnia model

In [53]:
lazy_predict(*clean(df['Insomnia']))

'tuple' object has no attribute '__name__'
Invalid Classifier(s)


100%|██████████| 29/29 [00:12<00:00,  2.28it/s]

                            F1 Score  Balanced Accuracy  Time Taken
Model                                                              
RandomForestClassifier          0.18               0.10        0.32
KNeighborsClassifier            0.16               0.12        0.02
DecisionTreeClassifier          0.15               0.12        0.02
LinearDiscriminantAnalysis      0.15               0.13        0.07
SGDClassifier                   0.14               0.15        0.14





In [54]:
xg(*clean(df['Insomnia']))

              precision    recall  f1-score   support

         0.0       0.24      0.32      0.27        28
         1.0       0.00      0.00      0.00        18
         2.0       0.28      0.33      0.30        15
         3.0       0.00      0.00      0.00         9
         4.0       0.00      0.00      0.00         5
         5.0       0.00      0.00      0.00         9
         6.0       0.00      0.00      0.00         5
         7.0       0.09      0.20      0.13         5
         8.0       0.00      0.00      0.00         8
         9.0       1.00      0.50      0.67         2
        10.0       0.00      0.00      0.00         6

    accuracy                           0.15       110
   macro avg       0.15      0.12      0.12       110
weighted avg       0.12      0.15      0.13       110

F1 score: 0.13


### OCD model

In [55]:
lazy_predict(*clean(df['OCD']))

'tuple' object has no attribute '__name__'
Invalid Classifier(s)


100%|██████████| 29/29 [00:13<00:00,  2.16it/s]

                      F1 Score  Balanced Accuracy  Time Taken
Model                                                        
RidgeClassifier           0.25               0.13        0.04
RidgeClassifierCV         0.23               0.11        0.04
KNeighborsClassifier      0.23               0.13        0.02
Perceptron                0.22               0.11        0.04
LinearSVC                 0.21               0.13        1.19





In [56]:
xg(*clean(df['OCD']))

              precision    recall  f1-score   support

         0.0       0.33      0.61      0.43        38
         1.0       0.15      0.11      0.12        19
         2.0       0.00      0.00      0.00        15
         3.0       0.00      0.00      0.00         8
         4.0       0.00      0.00      0.00         5
         5.0       0.14      0.25      0.18         4
         6.0       0.00      0.00      0.00         7
         7.0       0.00      0.00      0.00         3
         8.0       0.00      0.00      0.00         6
         9.0       0.00      0.00      0.00         2
        10.0       0.00      0.00      0.00         3

    accuracy                           0.24       110
   macro avg       0.06      0.09      0.07       110
weighted avg       0.15      0.24      0.18       110

F1 score: 0.18


# Reflection

**Model1**

This was my first experience building a model that was wrapped like this! It was also my first time building a multiclass multioutput model (outside of studying or following tutorials).

While I did expect poor performance, I was disappointed with the evaluation results for both Model1A and Model1B. This could suggest that no significant trend with high predictive power exists in the data. However, it is also possible that the model architecture I selected was not the optimal fit for this task. 

I also found it surprisingly that condensing the labels to a binary classification task (1B) did not lead to improvement in performance.


**Model2**

Next, I considered the case that a single model was not suitable in predicting all four MH categories. In Model2, I decided to use 4 different models for each MH category. I used Lazy Predict for the first time to quickly test many different models' performance. I also compared the performace with XGBoost. The optimal models varied for each MH category. These insights could inform what estimators to use for an ensemble learning approach.

However, like Model1, this approach did not yield promising F1 or Jaccard scores.

**Conclusion**

The most meaningful conclusions from this dataset can be derived from the EDA NB. This notebook suggests that we cannot reliably predict mental health rankings from music taste alone.

However, I am curious to see how neural networks might on this data anyway. I plan to run another notebook to check for a significant change in performance. After experimenting with a deep learning approach, my next steps will be comparing performance between these models, and deploying the highest scoring model.