<a href="https://www.kaggle.com/code/sebastinconcha/mental-health-categorical?scriptVersionId=186599817" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session




In [None]:

# Import libraries

# Suppress UserWarning messages
import warnings
warnings.filterwarnings('ignore', category=UserWarning)

import tensorflow as tf

from keras import Sequential, Input
from keras.layers import Dense

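# Note: recent Keras/TensorFlow releases (2.12 and later) no longer ship this module;
# the SciKeras package (scikeras.wrappers.KerasClassifier) is the suggested replacement there.
from keras.wrappers.scikit_learn import KerasClassifier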
from sklearn.model_selection import RandomizedSearchCV

from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, recall_score

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

In [None]:

# Pandas configuration: show every column when displaying a dataframe
pd.set_option('display.max_columns', None)

# Load the dataset into a dataframe called "df"
df = pd.read_csv("/kaggle/input/mental-health-dataset/Mental Health Dataset.csv")

# Quickly check the first rows of the dataframe
df.head(5)


In [None]:
# Inspect column types and non-null counts (df.info() prints its own report and returns None)
df.info()

In [None]:
# Drop rows with missing values: there are only 5202 of them, a tiny share of the dataset (1.78%)
df.dropna(inplace=True)
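
A figure like the 1.78% quoted above can be measured before dropping anything. The sketch below shows one way to do it on a small made-up frame; `toy` and its values are hypothetical stand-ins for `df`, so the percentage it prints is not the dataset's.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df; not the real dataset
toy = pd.DataFrame({
    "self_employed": ["No", np.nan, "Yes", "No", "No"],
    "treatment": ["Yes", "No", "No", "Yes", "No"],
})

# dropna() removes every row that has at least one missing value
rows_with_na = int(toy.isna().any(axis=1).sum())
na_share = rows_with_na / len(toy) * 100
print(f"{rows_with_na} of {len(toy)} rows contain NA ({na_share:.2f}%)")

toy_clean = toy.dropna()
```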

In [None]:
# Drop the Timestamp column: the time of recording (which spans only about two years) is not of interest here
df.drop(columns='Timestamp', inplace=True)

In [None]:
# Apply one-hot encoding (OHE) to the multi-valued categorical features
ohe_columns = [
    'Growing_Stress', 'care_options', 'Country', 'Gender', 'Occupation', 'Changes_Habits',
    'Mental_Health_History', 'Work_Interest', 'Mood_Swings', 'Social_Weakness',
    'mental_health_interview', 'Days_Indoors',
]
df_ohe = pd.get_dummies(df, columns=ohe_columns)
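
To illustrate what the one-hot encoding step does, here is a minimal sketch on a made-up frame. The frame `toy` and its values are hypothetical and stand in for `df`; only the column listed in `columns` is expanded, while the other column is passed through unchanged.

```python
import pandas as pd

# Hypothetical toy data standing in for df; not the real dataset
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "treatment": ["Yes", "No", "Yes"],
})

# Only the listed column is expanded into one indicator column per category;
# "treatment" is left as-is and still needs the Yes/No mapping afterwards
toy_ohe = pd.get_dummies(toy, columns=["Gender"])
print(toy_ohe)
```

Each category becomes its own `Gender_<value>` column, which is why the encoded dataframe ends up with many more columns than the original.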


In [None]:
# Convert the remaining values to integers with a boolean mapping: True or 'Yes' becomes 1, False or 'No' becomes 0.
# Any value that is not a key of the mapping becomes None, because dict.get returns None for missing keys.
boolean_mapping = {False: 0, True: 1, 'No': 0, 'Yes': 1}
df_BM = df_ohe.applymap(lambda x: boolean_mapping.get(x))
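
Because the mapping is applied with `dict.get`, a value outside the four keys silently turns into a missing value. The sketch below shows this on a made-up frame; `toy` and the `"Maybe"` answer are hypothetical, and the column-wise `Series.map` call is just one way to apply the same lookup to every cell.

```python
import pandas as pd

boolean_mapping = {False: 0, True: 1, "No": 0, "Yes": 1}

# Hypothetical toy data; "Maybe" is deliberately not a key of the mapping
toy = pd.DataFrame({
    "Gender_Male": [True, False, True],
    "family_history": ["Yes", "No", "Maybe"],
})

# Apply the same dict lookup to every cell, one column at a time
toy_mapped = toy.apply(lambda col: col.map(lambda v: boolean_mapping.get(v)))

# Count the cells that had no entry in the mapping
unmapped = int(toy_mapped.isna().sum().sum())
print(toy_mapped)
print(f"Unmapped cells: {unmapped}")
```

Checking the unmapped count after the conversion is a cheap way to confirm that every column really contained only the expected values.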

In [None]:
# Quick review of the dataframe so far
df_BM.head(10)

In [None]:
# Separate the features (X) from the target (y)
y = df_BM.pop('treatment')
X = df_BM

In [None]:
# Number of feature columns (used as the input size of the network)
n_cols = X.shape[1]

In [None]:
# Split the dataset: 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

**Before the analysis itself, we test which approach works best for this particular case, comparing a deep neural network with classical machine learning methods. We focus on the recall metric because false negatives matter more to us than false positives: a false negative here means the model does NOT recommend a visit to a mental health service when one IS advisable. In healthcare the consequences of that mistake can be worse, so we try to maximize this metric.**
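
As a small worked example of the metric, recall is TP / (TP + FN), so every false negative lowers it while false positives leave it unchanged. The labels below are made up for illustration only.

```python
# Hypothetical labels: 1 = treatment advisable, 0 = not advisable
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

recall_value = tp / (tp + fn)      # share of advisable cases that were caught
precision_value = tp / (tp + fp)   # share of flagged cases that were right

print(f"recall={recall_value:.3f}, precision={precision_value:.3f}")
```

Here the two missed cases pull recall down to one half, even though most of the flagged cases were correct.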

In [None]:
# Custom recall metric (the metric we care about more than precision)
def recall(y_true, y_pred):
    # label * prediction rounds to 1 only for positives predicted above 0.5
    true_positives = tf.reduce_sum(tf.round(tf.clip_by_value(y_true * y_pred, 0, 1)))
    possible_positives = tf.reduce_sum(tf.round(tf.clip_by_value(y_true, 0, 1)))
    # epsilon avoids a division by zero when a batch contains no positive labels
    return true_positives / (possible_positives + tf.keras.backend.epsilon())
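
To see what the custom metric computes without running TensorFlow, the same arithmetic can be mirrored in NumPy. This is only a sketch: `recall_np` is a hypothetical helper, the inputs are made up, and `eps=1e-7` is assumed to match the default Keras epsilon.

```python
import numpy as np

def recall_np(y_true, y_pred, eps=1e-7):
    """NumPy mirror of the custom metric: rounded products count true positives."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible_positives = np.sum(np.round(np.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + eps)

# Three positive labels; the one predicted at 0.4 rounds down and is missed
value = recall_np([1, 1, 0, 1], [0.9, 0.4, 0.8, 0.6])
print(f"recall = {value:.4f}")
```

The rounding step means the metric implicitly uses a 0.5 decision threshold on the sigmoid output.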


**First, we test a deep neural network.**

In [None]:
# Define the model; recall is tracked as a metric alongside accuracy
def sequential_testing(units1=16, units2=32, units3=64, optimizer='adam'):
    model = Sequential()
    model.add(Input(shape=(n_cols, )))
    model.add(Dense(units=units1, activation='relu'))
    model.add(Dense(units=units2, activation='relu'))
    model.add(Dense(units=units3, activation='relu'))
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy',recall])
    return model

In [None]:
# Hyperparameter values to sample from
parameters = {
    'units1': [8, 16, 32],
    'units2': [16, 32, 64],
    'units3': [32, 64, 128],
    'batch_size': [10, 50, 100, 250, 500],
    'epochs': [3, 5, 7, 9],
    'optimizer': ['adam', 'rmsprop', 'sgd']
}

In [None]:
# Find the best hyperparameters for the neural network

# Create a KerasClassifier from the sequential_testing function defined above
model = KerasClassifier(build_fn=sequential_testing, verbose=0)

# Pass the model and the parameter lists to a randomized search: 10 random combinations, each evaluated with 3-fold cross-validation.
# Note: no `scoring` argument is passed, so candidates are ranked by the classifier's default score (accuracy), not by recall.
random_search = RandomizedSearchCV(estimator=model, param_distributions=parameters, n_iter=10, cv=3, verbose=2, random_state=8, n_jobs=-1)
random_search_result = random_search.fit(X_train, y_train)

# Show the best cross-validation score and the parameters that produced it
print(f'Best Score: {random_search_result.best_score_}')
print(f'Best Params: {random_search_result.best_params_}')

# Show the final-epoch training loss and recall of the best model
best_model = random_search_result.best_estimator_.model
history = best_model.history.history
print(f"Training Loss: {history['loss'][-1]}")
print(f"Training Recall: {history['recall'][-1]}")
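
Since recall is the metric of interest, one option is to have the search itself rank candidates by recall through the `scoring` argument of `RandomizedSearchCV`. The sketch below shows the idea on synthetic data with a lightweight scikit-learn model instead of the Keras network; the data, the `LogisticRegression` estimator and the `C` grid are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary classification data, standing in for X_train / y_train
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=8)

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_distributions={"C": [0.01, 0.1, 1.0, 10.0]},
    n_iter=3,
    cv=3,
    scoring="recall",  # rank candidates by recall instead of the default score
    random_state=8,
)
search.fit(X_toy, y_toy)

print(f"Best recall (CV mean): {search.best_score_:.3f}")
print(f"Best params: {search.best_params_}")
```

With `scoring="recall"`, `best_score_` is the mean cross-validated recall of the winning candidate rather than its accuracy.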

**Next, we compare these results with classical machine learning methods.**

In [None]:
# Dictionary of the machine learning models to compare

model_list = {
    'Random Forest Classifier': RandomForestClassifier(),
    'AdaBoost Classifier': AdaBoostClassifier(),
    'Logistic Regression': LogisticRegression(),
    'XGBoost': XGBClassifier(),
    'Naive Bayes': GaussianNB(),
    'Support Vector Machines': SVC(),
}

In [None]:
# Fit each model in the dictionary and report its log loss and recall on the test set

def model_testing(model_list):
    results = {}
    for model_name, model in model_list.items():
        model.fit(X_train, y_train)
        prediction = model.predict(X_test)
        model_recall = recall_score(y_test, prediction)
        # Log loss is computed on hard class labels here, so it is only a coarse measure
        model_loss = log_loss(y_test, prediction)
        print(f'{model_name} loss: {model_loss}')
        print(f'{model_name} recall: {model_recall}')
        results[model_name] = {'loss': model_loss, 'recall': model_recall}
    return results

results = model_testing(model_list)
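
The log loss reported above is computed from hard 0/1 predictions, which makes every wrong prediction count as a maximally confident mistake. The sketch below contrasts that with log loss computed from probabilities, using a small hand-written `binary_log_loss` helper (hypothetical, not scikit-learn's function) and made-up numbers. Note that models such as `SVC()` only expose probabilities when created with `probability=True`.

```python
import numpy as np

def binary_log_loss(y_true, y_score, eps=1e-15):
    """Mean binary cross-entropy; scores are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_score, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = [1, 0, 1, 1]
proba = [0.9, 0.2, 0.6, 0.4]  # hypothetical predicted probabilities
hard = [1, 0, 1, 0]           # the same predictions thresholded at 0.5

loss_from_proba = binary_log_loss(y_true, proba)
loss_from_labels = binary_log_loss(y_true, hard)

print(f"log loss from probabilities: {loss_from_proba:.4f}")
print(f"log loss from hard labels:   {loss_from_labels:.4f}")
```

A single wrong hard label dominates the second value, so losses computed this way mostly reflect the error count rather than how well calibrated each model is.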

In [None]:
# Create an XGBoost model to rank the features by importance
importance_model = XGBClassifier()

# Train the model on the full dataset
importance_model.fit(X, y)

# Get the feature importances
feature_importance = importance_model.feature_importances_

# Build a dataframe that lists the features by decreasing importance
importance_df = pd.DataFrame({'Feature': X.columns, 'Importance': feature_importance})
importance_df = importance_df.sort_values(by='Importance', ascending=False)

print(importance_df.head(10))

The main idea is to feed new survey responses, answering the same questions as our features, into the model and let it decide whether the person is (or should be) receiving mental health treatment. I discussed this survey with practicing psychologists, and they agreed that the features ranked as most important here match well with what they ask in their own surveys.

This leaves us with two options: the deep neural network (with the hyperparameters found to be best) and XGBoost. The two can also be used together, which may give a better result.
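
One simple way to use both models together is to average their predicted probabilities before applying a decision threshold. The sketch below assumes each model can output a probability for the positive class; the arrays `proba_dnn` and `proba_xgb` and the thresholds are made-up values, not results from this notebook.

```python
import numpy as np

# Hypothetical positive-class probabilities from the two models
proba_dnn = np.array([0.80, 0.40, 0.55, 0.10])
proba_xgb = np.array([0.70, 0.65, 0.35, 0.20])

# Average the two probability estimates (simple soft voting)
proba_avg = (proba_dnn + proba_xgb) / 2

# Assumed thresholds: a lower one flags more people, which favors recall
combined = (proba_avg >= 0.5).astype(int)
combined_low = (proba_avg >= 0.4).astype(int)

print(combined.tolist())
print(combined_low.tolist())
```

Lowering the threshold trades more false positives for fewer false negatives, which fits the recall-first goal stated earlier.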