<h1 style='color: green;'><center>Obesity Classification DataSet</center></h1>

<em><p>The Obesity Classification project uses deep learning to identify obesity levels from input features such as Gender, Age, Weight, Height, and lifestyle habits. A neural network predicts the obesity category, and each prediction is paired with personalized health tips, such as a balanced diet, regular exercise, and portion control, to help reduce obesity.</p></em>

<h4>Here are brief explanations for each feature in the obesity classification model dataset:</h4>

- Gender: The biological sex of the individual (Male/Female).
- Age: The age of the individual in years.
- Height: The height of the individual in meters.
- Weight: The weight of the individual in kilograms.
- family_history_with_overweight: Indicates if the individual has a family history of overweight issues (Yes/No).
- FAVC: Whether the individual frequently consumes high-calorie food (Yes/No).
- FCVC: Frequency of vegetable consumption on a scale.
- NCP: Number of main meals consumed per day.
- CAEC: Frequency of food consumption between meals.
- SMOKE: Indicates if the individual smokes (Yes/No).
- CH2O: Daily water consumption in liters.
- SCC: Whether the individual monitors their calorie consumption (Yes/No).
- FAF: Frequency of physical activity per week.
- TUE: Time spent using electronic devices daily.
- CALC: Frequency of alcohol consumption.
- MTRANS: Primary mode of transportation (e.g., Walking, Public Transport).
- NObeyesdad: Obesity classification label of the individual.

In [3]:
## Analyzing the dataset

## Importing the required libraries

import sqlite3 as sql
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from keras.utils import to_categorical  # public path; keras.src is internal
import warnings
warnings.filterwarnings('ignore')

In [4]:
## Connecting to the database

conn = sql.connect('../DataBase/TrainingData.db')

query = "SELECT * FROM HealthData"
df = pd.read_sql_query(query, conn)
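
The connection opened above is never closed. A minimal sketch of the same read with guaranteed cleanup follows; it uses an in-memory database with two made-up rows as a stand-in for `TrainingData.db`, and `demo_conn` and `demo_df` are hypothetical names, not part of the notebook.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Build a tiny in-memory stand-in for TrainingData.db (hypothetical rows).
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute(
        "CREATE TABLE HealthData (Gender TEXT, Age REAL, NObeyesdad TEXT)"
    )
    demo_conn.executemany(
        "INSERT INTO HealthData VALUES (?, ?, ?)",
        [("Female", 21.0, "Normal_Weight"), ("Male", 23.0, "Overweight_Level_I")],
    )
    demo_conn.commit()
    # closing() releases the connection even if the query raises.
    demo_df = pd.read_sql_query("SELECT * FROM HealthData", demo_conn)

print(demo_df.shape)
```

`closing()` is used because a bare `with sqlite3.connect(...)` block only manages the transaction and does not close the connection.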


In [5]:
df.iloc[:,:-1]

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,CAEC,SMOKE,CH2O,SCC,FAF,TUE,CALC,MTRANS
0,Female,21.000000,1.620000,64.000000,yes,no,2.0,3.0,Sometimes,no,2.000000,no,0.000000,1.000000,no,Public_Transportation
1,Female,21.000000,1.520000,56.000000,yes,no,3.0,3.0,Sometimes,yes,3.000000,yes,3.000000,0.000000,Sometimes,Public_Transportation
2,Male,23.000000,1.800000,77.000000,yes,no,2.0,3.0,Sometimes,no,2.000000,no,2.000000,1.000000,Frequently,Public_Transportation
3,Male,27.000000,1.800000,87.000000,no,no,3.0,3.0,Sometimes,no,2.000000,no,2.000000,0.000000,Frequently,Walking
4,Male,22.000000,1.780000,89.800000,no,no,2.0,1.0,Sometimes,no,2.000000,no,0.000000,0.000000,Sometimes,Public_Transportation
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2106,Female,20.976842,1.710730,131.408528,yes,yes,3.0,3.0,Sometimes,no,1.728139,no,1.676269,0.906247,Sometimes,Public_Transportation
2107,Female,21.982942,1.748584,133.742943,yes,yes,3.0,3.0,Sometimes,no,2.005130,no,1.341390,0.599270,Sometimes,Public_Transportation
2108,Female,22.524036,1.752206,133.689352,yes,yes,3.0,3.0,Sometimes,no,2.054193,no,1.414209,0.646288,Sometimes,Public_Transportation
2109,Female,24.361936,1.739450,133.346641,yes,yes,3.0,3.0,Sometimes,no,2.852339,no,1.139107,0.586035,Sometimes,Public_Transportation


In [3]:
df.columns

Index(['Gender', 'Age', 'Height', 'Weight', 'family_history_with_overweight',
       'FAVC', 'FCVC', 'NCP', 'CAEC', 'SMOKE', 'CH2O', 'SCC', 'FAF', 'TUE',
       'CALC', 'MTRANS', 'NObeyesdad'],
      dtype='object')

In [4]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2111 entries, 0 to 2110
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Gender                          2111 non-null   object 
 1   Age                             2111 non-null   float64
 2   Height                          2111 non-null   float64
 3   Weight                          2111 non-null   float64
 4   family_history_with_overweight  2111 non-null   object 
 5   FAVC                            2111 non-null   object 
 6   FCVC                            2111 non-null   float64
 7   NCP                             2111 non-null   float64
 8   CAEC                            2111 non-null   object 
 9   SMOKE                           2111 non-null   object 
 10  CH2O                            2111 non-null   float64
 11  SCC                             2111 non-null   object 
 12  FAF                             21

In [5]:
## The target variable has 7 classes, so this is a multi-class classification problem.
df['NObeyesdad'].unique()

array(['Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II',
       'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II',
       'Obesity_Type_III'], dtype=object)

In [14]:
weight_category_rank = {
    'Normal_Weight': 0,
    'Insufficient_Weight': 1,
    'Overweight_Level_I': 2,
    'Overweight_Level_II': 3,
    'Obesity_Type_I': 4,
    'Obesity_Type_II': 5,
    'Obesity_Type_III': 6
}

y = df['NObeyesdad'].map(weight_category_rank)
y.shape

(2111,)

In [None]:
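## One-hot encode the labels. Run once: y is rebound, so each re-run encodes the previous result again and adds a dimension.
y = to_categorical(y, 7)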


array([[[[[[0., 1., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           ...,
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.]],

          [[1., 0., 0., ..., 0., 0., 0.],
           [0., 1., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           ...,
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.]],

          [[0., 1., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           ...,
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.]],

          ...,

          [[0., 1., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           [1., 0., 0., ..., 0., 0., 0.],
           ...,
           [1., 0., 0., ..., 0.,
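
A label vector of shape `(n,)` should encode to shape `(n, 7)`; the extra nesting in the output above suggests the cell was run more than once on its own result. As a sanity check, here is a minimal NumPy sketch of the same one-hot encoding. It is a stand-in for `to_categorical`, not the Keras implementation, and `labels` holds made-up values.

```python
import numpy as np

# Hypothetical integer labels in the range 0..6, standing in for y.
labels = np.array([0, 2, 6, 1])
num_classes = 7

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)           # (4, 7)
print(one_hot.argmax(axis=1))  # [0 2 6 1]
```

Taking `argmax` along the last axis recovers the original labels, which is a quick way to confirm the encoding was applied exactly once.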

In [None]:
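## Ranking the output feature to convert it into numeric values.
## Note: these 1-based ranks differ from the 0-based mapping above; to_categorical(y, 7) expects labels 0..6.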

weight_category_rank = {
    'Normal_Weight': 1,
    'Insufficient_Weight': 2,
    'Overweight_Level_I': 3,
    'Overweight_Level_II': 4,
    'Obesity_Type_I': 5,
    'Obesity_Type_II': 6,
    'Obesity_Type_III': 7
}

In [14]:
cat_col = [col for col in df.columns if df[col].dtype == 'object']

for col in cat_col:
    print(col)
    print('Unique values: -',df[col].unique())
    print('\n')

Gender
Unique values: - ['Female' 'Male']


family_history_with_overweight
Unique values: - ['yes' 'no']


FAVC
Unique values: - ['no' 'yes']


CAEC
Unique values: - ['Sometimes' 'Frequently' 'Always' 'no']


SMOKE
Unique values: - ['no' 'yes']


SCC
Unique values: - ['no' 'yes']


CALC
Unique values: - ['no' 'Sometimes' 'Frequently' 'Always']


MTRANS
Unique values: - ['Public_Transportation' 'Walking' 'Automobile' 'Motorbike' 'Bike']


NObeyesdad
Unique values: - ['Normal_Weight' 'Overweight_Level_I' 'Overweight_Level_II'
 'Obesity_Type_I' 'Insufficient_Weight' 'Obesity_Type_II'
 'Obesity_Type_III']




- The dataset has three types of features: numerical (float or int), nominal, and ordinal.
- We therefore transform it as follows: OrdinalEncoder for the ordinal features, OneHotEncoder for the nominal features, and LabelEncoder for the target variable.
- For any missing values we use SimpleImputer, with the median for numerical features and the most frequent value for categorical features.

In [16]:
## Defining the numerical, ordinal and nominal features.

num_cols = [col for col in df.columns if df[col].dtype != 'object']

nominal_cols = ['Gender', 'family_history_with_overweight', 'FAVC', 'SMOKE', 'SCC', 'MTRANS']

ordinal_cols = ['CALC', 'CAEC']

target_col = ['NObeyesdad']
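
The encoding and imputation plan described earlier can be sketched with scikit-learn's `ColumnTransformer`. This is one possible arrangement under stated assumptions, not necessarily the pipeline this project uses: `toy` is a made-up frame with one column of each kind, and `frequency_order` is an assumed ordering of the `CALC`/`CAEC` categories. Target encoding is left out.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy frame with one numerical, one nominal, and one ordinal column (hypothetical rows).
toy = pd.DataFrame({
    "Age": [21.0, None, 27.0, 22.0],
    "Gender": ["Female", "Male", "Male", "Female"],
    "CALC": ["no", "Sometimes", "Frequently", "Always"],
})

# Assumed order for the ordinal categories, lowest to highest frequency.
frequency_order = ["no", "Sometimes", "Frequently", "Always"]

preprocess = ColumnTransformer(
    [
        # Numerical: fill gaps with the median.
        ("num", SimpleImputer(strategy="median"), ["Age"]),
        # Nominal: fill gaps with the most frequent value, then one-hot encode.
        ("nom", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), ["Gender"]),
        # Ordinal: fill gaps, then map categories to 0..3 in the assumed order.
        ("ord", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OrdinalEncoder(categories=[frequency_order])),
        ]), ["CALC"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocess.fit_transform(toy)
print(X.shape)  # (4, 4): Age, Gender_Female, Gender_Male, CALC
```

In the notebook itself, the column lists `num_cols`, `nominal_cols`, and `ordinal_cols` would take the place of the single-column lists, with one `frequency_order` entry per ordinal column.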

In [6]:
import numpy as np

# Example softmax output
softmax_output = np.array([[8.5265714e-01, 1.0824224e-01, 3.9035656e-02, 6.4914253e-05,
                            2.5329070e-09, 3.2961884e-09, 2.6731266e-08]])

# Get the predicted class
predicted_class = np.argmax(softmax_output, axis=1)

print("Predicted Class:", predicted_class[0])

Predicted Class: 0
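
The predicted index is only useful once it is mapped back to a label. A minimal sketch, assuming the 0-based `weight_category_rank` mapping defined earlier is the one the model was trained with; `rank_to_category` is a hypothetical helper name and the probabilities are rounded, made-up values.

```python
import numpy as np

# Mapping copied from the notebook's 0-based ranking cell.
weight_category_rank = {
    'Normal_Weight': 0,
    'Insufficient_Weight': 1,
    'Overweight_Level_I': 2,
    'Overweight_Level_II': 3,
    'Obesity_Type_I': 4,
    'Obesity_Type_II': 5,
    'Obesity_Type_III': 6,
}

# Invert the mapping so a class index leads back to its label.
rank_to_category = {rank: name for name, rank in weight_category_rank.items()}

# Hypothetical softmax probabilities for a single sample.
softmax_output = np.array([[0.85, 0.11, 0.04, 0.0, 0.0, 0.0, 0.0]])
predicted_index = int(np.argmax(softmax_output, axis=1)[0])

print(rank_to_category[predicted_index])  # Normal_Weight
```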
