# AI Mental Fitness Tracker
-> A mental fitness tracker that uses AI to estimate your mental fitness as a score and offer tips to improve it.

-> This notebook covers data collection and model training, using different machine learning algorithms to predict the mental fitness score.

### STEP 1: IMPORT THE NECESSARY LIBRARIES

In [1]:
#import necessary libraries
import pandas as pd #data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np #linear algebra
import matplotlib.pyplot as plt #plotting

### About the Dataset:
-> The dataset used is a merge of two datasets:
 * mental-and-substance-use-as-share-of-disease
 * prevalence-by-mental-and-substance-use-disorder

### STEP 2: READ THE DATA FROM THE CSV FILES AND MERGE THEM

##### Load and prepare data

In [2]:
# read and load the dataset
data1=pd.read_csv("/content/mental-and-substance-use-as-share-of-disease.csv")
data2=pd.read_csv("/content/prevalence-by-mental-and-substance-use-disorder.csv")

##### Checking Dataset: mental-and-substance-use-as-share-of-disease

In [3]:
# print the first 8 rows of the dataset
data1.head(8)

Unnamed: 0,Entity,Code,Year,DALYs (Disability-Adjusted Life Years) - Mental disorders - Sex: Both - Age: All Ages (Percent)
0,Afghanistan,AFG,1990,1.69667
1,Afghanistan,AFG,1991,1.734281
2,Afghanistan,AFG,1992,1.791189
3,Afghanistan,AFG,1993,1.776779
4,Afghanistan,AFG,1994,1.712986
5,Afghanistan,AFG,1995,1.738272
6,Afghanistan,AFG,1996,1.778098
7,Afghanistan,AFG,1997,1.781815


##### Checking Dataset: prevalence-by-mental-and-substance-use-disorder

In [4]:
# print the first 5 rows of the dataset
data2.head()

Unnamed: 0,Entity,Code,Year,Prevalence - Schizophrenia - Sex: Both - Age: Age-standardized (Percent),Prevalence - Bipolar disorder - Sex: Both - Age: Age-standardized (Percent),Prevalence - Eating disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Anxiety disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Drug use disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Depressive disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Alcohol use disorders - Sex: Both - Age: Age-standardized (Percent)
0,Afghanistan,AFG,1990,0.228979,0.721207,0.131001,4.835127,0.454202,5.125291,0.444036
1,Afghanistan,AFG,1991,0.22812,0.719952,0.126395,4.821765,0.447112,5.116306,0.44425
2,Afghanistan,AFG,1992,0.227328,0.718418,0.121832,4.801434,0.44119,5.106558,0.445501
3,Afghanistan,AFG,1993,0.226468,0.717452,0.117942,4.789363,0.435581,5.100328,0.445958
4,Afghanistan,AFG,1994,0.225567,0.717012,0.114547,4.784923,0.431822,5.099424,0.445779


#### MERGING TWO DATASETS

In [5]:
data = pd.merge(data1, data2)
# print the shape (rows, columns) of the dataset
print("New Dataframe Shape (Rows, Columns)=",data.shape)
# print the first 5 rows of the dataset
data.head()

New Dataframe Shape (Rows, Columns)= (6840, 11)


Unnamed: 0,Entity,Code,Year,DALYs (Disability-Adjusted Life Years) - Mental disorders - Sex: Both - Age: All Ages (Percent),Prevalence - Schizophrenia - Sex: Both - Age: Age-standardized (Percent),Prevalence - Bipolar disorder - Sex: Both - Age: Age-standardized (Percent),Prevalence - Eating disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Anxiety disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Drug use disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Depressive disorders - Sex: Both - Age: Age-standardized (Percent),Prevalence - Alcohol use disorders - Sex: Both - Age: Age-standardized (Percent)
0,Afghanistan,AFG,1990,1.69667,0.228979,0.721207,0.131001,4.835127,0.454202,5.125291,0.444036
1,Afghanistan,AFG,1991,1.734281,0.22812,0.719952,0.126395,4.821765,0.447112,5.116306,0.44425
2,Afghanistan,AFG,1992,1.791189,0.227328,0.718418,0.121832,4.801434,0.44119,5.106558,0.445501
3,Afghanistan,AFG,1993,1.776779,0.226468,0.717452,0.117942,4.789363,0.435581,5.100328,0.445958
4,Afghanistan,AFG,1994,1.712986,0.225567,0.717012,0.114547,4.784923,0.431822,5.099424,0.445779
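
`pd.merge(data1, data2)` is called without `on=`, so pandas joins on every column name the two frames share, using its default inner join. The sketch below illustrates that behaviour on toy frames, assuming the shared keys are `Entity`, `Code`, and `Year` as the previews suggest. The toy values and the `validate` check are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for data1 and data2 (values are made up for illustration).
left = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Code": ["AAA", "AAA", "BBB"],
    "Year": [1990, 1991, 1990],
    "DALY": [1.7, 1.8, 5.2],
})
right = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Code": ["AAA", "AAA", "BBB"],
    "Year": [1990, 1991, 1991],
    "Anxiety": [4.8, 4.7, 3.9],
})

# With no `on=`, merge uses the intersection of column names as join keys.
implicit = pd.merge(left, right)

# Spelling out the keys documents the intent; validate="one_to_one"
# raises an error if any key combination appears more than once.
explicit = pd.merge(left, right, on=["Entity", "Code", "Year"],
                    how="inner", validate="one_to_one")

print(implicit.shape)             # only rows matching on all three keys survive
print(implicit.equals(explicit))
```

Naming the keys explicitly can make an accidental join on an unintended shared column easier to spot.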


In [6]:
# Set simplified column names
data = data.set_axis(['Country','Code','Year','DALY','Schizophrenia', 'Bipolar_disorder',
                  'Eating_disorder','Anxiety','Drug_usage','Depression','Alcohol'],
            axis='columns')
data.head()

Unnamed: 0,Country,Code,Year,DALY,Schizophrenia,Bipolar_disorder,Eating_disorder,Anxiety,Drug_usage,Depression,Alcohol
0,Afghanistan,AFG,1990,1.69667,0.228979,0.721207,0.131001,4.835127,0.454202,5.125291,0.444036
1,Afghanistan,AFG,1991,1.734281,0.22812,0.719952,0.126395,4.821765,0.447112,5.116306,0.44425
2,Afghanistan,AFG,1992,1.791189,0.227328,0.718418,0.121832,4.801434,0.44119,5.106558,0.445501
3,Afghanistan,AFG,1993,1.776779,0.226468,0.717452,0.117942,4.789363,0.435581,5.100328,0.445958
4,Afghanistan,AFG,1994,1.712986,0.225567,0.717012,0.114547,4.784923,0.431822,5.099424,0.445779


### STEP 3: DATA CLEANING
Check the merged dataset for null values and handle them.

In [7]:
data.isnull().sum()

Country               0
Code                690
Year                  0
DALY                  0
Schizophrenia         0
Bipolar_disorder      0
Eating_disorder       0
Anxiety               0
Drug_usage            0
Depression            0
Alcohol               0
dtype: int64

* As we can see, the Code column contains 690 null values. Since Code is not needed for the analysis, we drop the whole column rather than the affected rows.
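
Before dropping a column, it can help to see which entities account for its missing values. The sketch below uses a toy frame and assumes, for illustration only, that the missing codes belong to an aggregate region; the notebook itself does not state which rows are affected.

```python
import numpy as np
import pandas as pd

# Toy frame: one country with a code, one aggregate region without (made-up values).
toy = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "African Region (WHO)"],
    "Code": ["AFG", "AFG", np.nan],
    "Year": [1990, 1991, 1990],
    "DALY": [1.70, 1.73, 1.94],
})

# Count nulls per column, as data.isnull().sum() does in the notebook.
null_counts = toy.isnull().sum()

# List the entities responsible for the missing codes.
missing_code_entities = toy.loc[toy["Code"].isnull(), "Country"].unique().tolist()

# Dropping the column (rather than the rows) keeps every observation.
cleaned = toy.drop(columns="Code")

print(null_counts["Code"])
print(missing_code_entities)
print(cleaned.shape)
```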

In [8]:
# drop the Code Column from the dataset
data.drop('Code',axis=1,inplace=True)

In [9]:
# View the first 5 rows of the dataset
data.head()

Unnamed: 0,Country,Year,DALY,Schizophrenia,Bipolar_disorder,Eating_disorder,Anxiety,Drug_usage,Depression,Alcohol
0,Afghanistan,1990,1.69667,0.228979,0.721207,0.131001,4.835127,0.454202,5.125291,0.444036
1,Afghanistan,1991,1.734281,0.22812,0.719952,0.126395,4.821765,0.447112,5.116306,0.44425
2,Afghanistan,1992,1.791189,0.227328,0.718418,0.121832,4.801434,0.44119,5.106558,0.445501
3,Afghanistan,1993,1.776779,0.226468,0.717452,0.117942,4.789363,0.435581,5.100328,0.445958
4,Afghanistan,1994,1.712986,0.225567,0.717012,0.114547,4.784923,0.431822,5.099424,0.445779


##### Observe the data types of the columns

In [10]:
# Observe the data types of the columns
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6840 entries, 0 to 6839
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Country           6840 non-null   object 
 1   Year              6840 non-null   int64  
 2   DALY              6840 non-null   float64
 3   Schizophrenia     6840 non-null   float64
 4   Bipolar_disorder  6840 non-null   float64
 5   Eating_disorder   6840 non-null   float64
 6   Anxiety           6840 non-null   float64
 7   Drug_usage        6840 non-null   float64
 8   Depression        6840 non-null   float64
 9   Alcohol           6840 non-null   float64
dtypes: float64(8), int64(1), object(1)
memory usage: 587.8+ KB


In [11]:
# Mean DALY per country (or region) across all years
df1 = data['DALY'].groupby(data["Country"])
df1.mean()

Country
Afghanistan                       2.553085
African Region (WHO)              1.940398
Albania                           5.276702
Algeria                           6.451224
American Samoa                    4.529481
                                    ...   
World Bank Lower Middle Income    3.207812
World Bank Upper Middle Income    5.006917
Yemen                             3.470172
Zambia                            1.664278
Zimbabwe                          1.743918
Name: DALY, Length: 228, dtype: float64

### STEP 5: DATA PREPROCESSING

##### Make a copy of the dataset so that preprocessing for model training and testing leaves the original untouched.

In [12]:
df = data.copy()
df.head()

Unnamed: 0,Country,Year,DALY,Schizophrenia,Bipolar_disorder,Eating_disorder,Anxiety,Drug_usage,Depression,Alcohol
0,Afghanistan,1990,1.69667,0.228979,0.721207,0.131001,4.835127,0.454202,5.125291,0.444036
1,Afghanistan,1991,1.734281,0.22812,0.719952,0.126395,4.821765,0.447112,5.116306,0.44425
2,Afghanistan,1992,1.791189,0.227328,0.718418,0.121832,4.801434,0.44119,5.106558,0.445501
3,Afghanistan,1993,1.776779,0.226468,0.717452,0.117942,4.789363,0.435581,5.100328,0.445958
4,Afghanistan,1994,1.712986,0.225567,0.717012,0.114547,4.784923,0.431822,5.099424,0.445779


In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6840 entries, 0 to 6839
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Country           6840 non-null   object 
 1   Year              6840 non-null   int64  
 2   DALY              6840 non-null   float64
 3   Schizophrenia     6840 non-null   float64
 4   Bipolar_disorder  6840 non-null   float64
 5   Eating_disorder   6840 non-null   float64
 6   Anxiety           6840 non-null   float64
 7   Drug_usage        6840 non-null   float64
 8   Depression        6840 non-null   float64
 9   Alcohol           6840 non-null   float64
dtypes: float64(8), int64(1), object(1)
memory usage: 587.8+ KB


##### Label-encode the Country column (categorical values) into integer codes, since the model needs numeric inputs.

In [14]:
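from sklearn.preprocessing import LabelEncoder
l=LabelEncoder()
# Encode every object (string) column as integers; here only Country qualifies
for i in df.columns:
    if df[i].dtype == 'object':
        df[i]=l.fit_transform(df[i])

# Map each country name to its integer code for later lookups
country_dict = dict(zip(l.classes_, range(len(l.classes_))))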

##### Test the country-to-code lookup dictionary

In [15]:
l.classes_

array(['Afghanistan', 'African Region (WHO)', 'Albania', 'Algeria',
       'American Samoa', 'Andorra', 'Angola', 'Antigua and Barbuda',
       'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan',
       'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
       'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Cape Verde', 'Central African Republic', 'Chad',
       'Chile', 'China', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus',
       'Czechia', 'Democratic Republic of Congo', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'East Asia & Pacific (WB)',
       'Eastern Mediterranean Region (WHO)', 'Ecuador', 'Egypt',
       'El Salvador', 'England', 'Equatorial Guinea', 'Eritrea',
       'Estonia', 'Eswatini', 'Ethiopia', 'Eu

In [17]:
country_dict["India"]

88
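
The lookup works because `LabelEncoder` stores its classes in sorted order, so each name's code is its position in `classes_`. A minimal sketch on a small, hypothetical list of countries (the codes here differ from those in the full dataset):

```python
from sklearn.preprocessing import LabelEncoder

# Toy country list (a small, hypothetical subset of the real column).
countries = ["India", "Albania", "India", "Zambia", "Albania"]

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ is sorted, so each name's code is its index in that array.
name_to_code = {name: code for code, name in enumerate(encoder.classes_)}

# inverse_transform recovers the original names from the integer codes.
decoded = encoder.inverse_transform(codes)

print(codes.tolist())
print(name_to_code)
print(decoded.tolist())
```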

In [18]:
df.head(100)

Unnamed: 0,Country,Year,DALY,Schizophrenia,Bipolar_disorder,Eating_disorder,Anxiety,Drug_usage,Depression,Alcohol
0,0,1990,1.696670,0.228979,0.721207,0.131001,4.835127,0.454202,5.125291,0.444036
1,0,1991,1.734281,0.228120,0.719952,0.126395,4.821765,0.447112,5.116306,0.444250
2,0,1992,1.791189,0.227328,0.718418,0.121832,4.801434,0.441190,5.106558,0.445501
3,0,1993,1.776779,0.226468,0.717452,0.117942,4.789363,0.435581,5.100328,0.445958
4,0,1994,1.712986,0.225567,0.717012,0.114547,4.784923,0.431822,5.099424,0.445779
...,...,...,...,...,...,...,...,...,...,...
95,3,1995,5.192361,0.261108,0.793378,0.195046,4.857070,0.367362,4.398319,0.439044
96,3,1996,5.460333,0.261122,0.793524,0.195062,4.858773,0.371340,4.392500,0.439044
97,3,1997,5.505331,0.261182,0.793696,0.194193,4.861148,0.374676,4.389075,0.439293
98,3,1998,5.729461,0.261284,0.793916,0.193329,4.864047,0.378961,4.384181,0.439626


##### Split the dataset into features and target, then into training and testing sets

In [19]:
# Split the data into features and target
X = df.drop('DALY',axis=1)
y = df['DALY']

# Split the data into training and testing sets
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=2)
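
With `test_size=0.2`, the 6840 rows are divided into 5472 training rows and 1368 testing rows, and `random_state=2` makes the shuffle repeatable. The sketch below shows the same call on a toy array; the array contents and variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix (10 rows, 9 features) and target, for illustration only.
X_toy = np.arange(90, dtype=float).reshape(10, 9)
y_toy = np.arange(10, dtype=float)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
a_train, a_test, b_train, b_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=2)

# Repeating the call with the same random_state reproduces the same split.
a_train2, a_test2, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=2)

print(a_train.shape, a_test.shape)
print(np.array_equal(a_test, a_test2))
```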

## APPLYING MACHINE LEARNING ALGORITHMS

### STEP 6: Fit the Custom CNN Regressor Model and evaluate its performance.

TODO: to reshape 1-D tabular features for Conv1D, see https://stackoverflow.com/questions/59756806/tensorflow-how-do-i-use-make-a-convolutional-layer-for-tabular-1-d-features
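
The reshaping that the TODO refers to can be sketched in NumPy alone: `Conv1D` expects input of shape `(batch, steps, channels)`, so tabular rows need a trailing channel axis. The toy array below is an assumption for illustration and is not the notebook's data.

```python
import numpy as np

# Toy batch: 4 samples with 9 tabular features each (values are arbitrary).
X_tab = np.random.default_rng(0).normal(size=(4, 9)).astype(np.float32)

# Add a trailing channel axis; equivalent to np.expand_dims(X_tab, axis=-1).
X_seq = X_tab[..., np.newaxis]

# A width-3 convolution with stride 1 and "valid" padding
# produces steps - 3 + 1 output positions.
kernel_size = 3
out_steps = X_seq.shape[1] - kernel_size + 1

print(X_seq.shape)
print(out_steps)
```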

In [33]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Dropout, Lambda, Flatten
from tensorflow.keras import backend as B  # same Keras package as the layers above

model = Sequential()

# Project the 9 input features to 64 units, then add a channel axis for Conv1D
model.add(Dense(64, input_dim=9, kernel_initializer='normal', activation='relu'))
model.add(Lambda(lambda x: B.expand_dims(x, axis=-1)))
# Three width-3 convolutions, each followed by dropout
model.add(Conv1D(64, 3,kernel_initializer='normal', activation='relu'))
model.add(Dropout(0.4))
model.add(Conv1D(128, 3, kernel_initializer='normal', activation='tanh'))
model.add(Dropout(0.3))
model.add(Conv1D(128, 3, kernel_initializer='normal', activation='relu'))
model.add(Dropout(0.5))
# Per-position dense layer, then flatten and regress to a single DALY value
model.add(Dense(128, kernel_initializer='normal', activation='tanh'))
model.add(Flatten())
model.add(Dense(1, kernel_initializer='normal'))

# Mean squared error is the regression loss
model.compile(loss='mean_squared_error', optimizer='adam')

model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_6 (Dense)             (None, 64)                640       
                                                                 
 lambda_2 (Lambda)           (None, 64, 1)             0         
                                                                 
 conv1d_6 (Conv1D)           (None, 62, 64)            256       
                                                                 
 dropout_6 (Dropout)         (None, 62, 64)            0         
                                                                 
 conv1d_7 (Conv1D)           (None, 60, 128)           24704     
                                                                 
 dropout_7 (Dropout)         (None, 60, 128)           0         
                                                                 
 conv1d_8 (Conv1D)           (None, 58, 128)          
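
The output shapes and parameter counts listed in the summary can be checked by hand. A minimal sketch, assuming the Keras defaults of stride 1 and `'valid'` padding for `Conv1D`; the helper function names are illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights plus biases of a Conv1D layer."""
    return kernel * in_channels * filters + filters


def conv1d_steps(steps: int, kernel: int) -> int:
    """Output length for stride 1 and 'valid' padding."""
    return steps - kernel + 1


print(dense_params(9, 64))         # 640, the first Dense layer
print(conv1d_params(3, 1, 64))     # 256, the first Conv1D layer
print(conv1d_params(3, 64, 128))   # 24704, the second Conv1D layer

# Each width-3 convolution shortens the sequence by 2: 64 -> 62 -> 60 -> 58.
steps = 64
for _ in range(3):
    steps = conv1d_steps(steps, 3)
print(steps)                       # 58
```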

In [34]:
# Fitting the CNN to the Training set
history = model.fit(xtrain, ytrain, epochs = 1000, validation_split=0.2, verbose=1)

Epoch 1/1000
Epoch 2/1000
...
Epoch 72/1000
...

In [35]:
from datetime import datetime
now = datetime.now()
# Use a timestamp without spaces or colons so the filename is valid on every OS
dt_string = now.strftime("%d-%m-%Y_%H-%M-%S")
model.save("custom_cnn_"+dt_string+".h5")


In [36]:
model.evaluate(xtest,ytest,verbose=1)



1.3201212882995605

### Evaluate the model on the training and testing sets

In [37]:
from sklearn.metrics import mean_squared_error, r2_score
# model evaluation for training set
ytrain_pred = model.predict(xtrain)
mse = mean_squared_error(ytrain, ytrain_pred)
rmse = np.sqrt(mse)
r2 = r2_score(ytrain, ytrain_pred)

print("The model performance for training set")
print("--------------------------------------")
print('MSE is {}'.format(mse))
print('RMSE is {}'.format(rmse))
print('R2 score is {}'.format(r2))
print("\n")

# model evaluation for testing set
ytest_pred = model.predict(xtest)
mse = mean_squared_error(ytest, ytest_pred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ytest_pred)

print("The model performance for testing set")
print("--------------------------------------")
print('MSE is {}'.format(mse))
print('RMSE is {}'.format(rmse))
print('R2 score is {}'.format(r2))

The model performance for training set
--------------------------------------
MSE is 1.5348830076090596
RMSE is 1.2389039541502238
R2 score is 0.7143538753309504


The model performance for testing set
--------------------------------------
MSE is 1.3201212889705671
RMSE is 1.1489653123443575
R2 score is 0.7255709324619115


### STEP 10: PROGRAM to PREDICT the Share of Disability-Adjusted Life Years (DALYs) Due to Mental Disorders.

In [73]:
print("Welcome to Mental Fitness Tracker!\n",
      "Fill the details (in %) to check your mental fitness! \n")

# Take user inputs
country = input("Enter the Country: ").strip()
year = int(input("Enter Year : "))
Schizophrenia = float(input("Enter the Schizophrenia: "))
Bipolar_disorder = float(input("Enter the Bipolar_disorder: "))
Eating_disorder = float(input("Enter the Eating_disorder: "))
Anxiety = float(input("Enter the Anxiety: "))
Drug_usage = float(input("Enter the Drug_usage: "))
Depression = float(input("Enter the Depression: "))
Alcohol = float(input("Enter the Alcohol: "))

# Assemble the features in the same column order used for training
select = ["Country","Year","Schizophrenia","Bipolar_disorder","Eating_disorder","Anxiety","Drug_usage","Depression","Alcohol"]
# Case-insensitive lookup; str.title() would break names such as "Cote d'Ivoire"
country_code = {name.lower(): code for name, code in country_dict.items()}[country.lower()]
user_data = [country_code,year,Schizophrenia,Bipolar_disorder,Eating_disorder,Anxiety,Drug_usage,Depression,Alcohol]
user_data = pd.DataFrame([user_data], columns=select)
user_data = np.asarray(user_data).astype(np.float32)

Welcome to Mental Fitness Tracker!
 Fill the details (in %) to check your mental fitness! 

Enter the Country: India
Enter Year : 2019
Enter the Schizophrenia: 0.283639
Enter the Bipolar_disorder: 0.359677
Enter the Eating_disorder: 0.087173
Enter the Anxiety: 3.019089
Enter the Drug_usage: 0.439830
Enter the Depression: 4.036884
Enter the Alcohol: 1.618483
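
For testing without interactive prompts, the same feature row can be built by a small function. The sketch below is one possible arrangement: `build_feature_row`, `FEATURE_ORDER`, and `demo_codes` are hypothetical names, and the one-entry mapping stands in for the notebook's full `country_dict`.

```python
import numpy as np

FEATURE_ORDER = ["Country", "Year", "Schizophrenia", "Bipolar_disorder",
                 "Eating_disorder", "Anxiety", "Drug_usage", "Depression", "Alcohol"]


def build_feature_row(country, year, prevalences, country_codes):
    """Return a (1, 9) float32 array in the training column order.

    `prevalences` maps the seven disorder names to percentages and
    `country_codes` maps country names to their label-encoded integers.
    """
    # Match country names case-insensitively.
    lookup = {name.lower(): code for name, code in country_codes.items()}
    key = country.strip().lower()
    if key not in lookup:
        raise KeyError(f"Unknown country: {country!r}")
    row = [lookup[key], int(year)]
    row += [float(prevalences[name]) for name in FEATURE_ORDER[2:]]
    return np.asarray([row], dtype=np.float32)


# Hypothetical one-entry mapping standing in for the full country_dict.
demo_codes = {"India": 88}
demo = build_feature_row("india", 2019, {
    "Schizophrenia": 0.283639, "Bipolar_disorder": 0.359677,
    "Eating_disorder": 0.087173, "Anxiety": 3.019089,
    "Drug_usage": 0.439830, "Depression": 4.036884, "Alcohol": 1.618483,
}, demo_codes)
print(demo.shape)
```

The returned array has the shape that `model.predict` expects for a single sample.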


In [74]:
# Predict the target (DALY) for the user data with the trained model
y_pred = model.predict(user_data)
print("Your Mental Fitness Slack Score is {:.2f}".format(y_pred[0][0]))
# Test inputs:
# India 2019 0.283639 0.359677 0.087173 3.019089 0.439830 4.036884 1.618483
# Expected Mental Fitness Slack Score is 2.6

Your Mental Fitness Slack Score is 2.66


### Result:
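* The model predicts the DALY value (the share of Disability-Adjusted Life Years attributed to mental disorders, in percent) from the user inputs. For the India 2019 test inputs it returns 2.66, close to the expected 2.6.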