<h1 style='color: green;'><center>Obesity Classification DataSet</center></h1>

<em><p>The Obesity Classification project uses deep learning to predict an individual's obesity level from input features such as Gender, Age, Weight, Height, and lifestyle habits. Based on the predicted category, it suggests personalized health tips, such as a balanced diet, regular exercise, and portion control, to help reduce obesity.</p></em>

<h4>Here are brief explanations for each feature in the obesity classification model dataset:</h4>

- Gender: The biological sex of the individual (Male/Female).
- Age: The age of the individual in years.
- Height: The height of the individual in meters.
- Weight: The weight of the individual in kilograms.
- family_history_with_overweight: Indicates if the individual has a family history of overweight issues (Yes/No).
- FAVC: Whether the individual frequently consumes high-calorie food (Yes/No).
- FCVC: Frequency of vegetable consumption, on a scale from 1 to 3.
- NCP: Number of main meals consumed per day.
- CAEC: Frequency of food consumption between meals.
- SMOKE: Indicates if the individual smokes (Yes/No).
- CH2O: Daily water consumption level (values range from 1 to 3 in this dataset).
- SCC: Whether the individual monitors their calorie consumption (Yes/No).
- FAF: Frequency of physical activity per week.
- TUE: Time spent using electronic devices daily.
- CALC: Frequency of alcohol consumption.
- MTRANS: Primary mode of transportation (e.g., Walking, Public Transport).
- NObeyesdad: Obesity classification label of the individual.
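
Because Height is recorded in meters and Weight in kilograms, the two columns can be combined directly into a body mass index as a quick sanity check on the units. The sketch below is only an illustration: `BMI` is not a column in the dataset, and the two sample rows are copied by hand from the dataset's first rows rather than loaded from the CSV.

```python
import pandas as pd

# Two sample rows copied by hand (Height in meters, Weight in kilograms).
sample = pd.DataFrame({
    "Height": [1.62, 1.80],
    "Weight": [64.0, 87.0],
})

# BMI = weight (kg) / height (m) squared.
# "BMI" is an illustrative derived column, not part of the original dataset.
sample["BMI"] = sample["Weight"] / sample["Height"] ** 2

print(sample.round(2))
```

If the derived values fall far outside a plausible range, the units of Height or Weight are probably not what the feature list states.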

In [5]:
## Analyzing the dataset

## Importing the required libraries

import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns
import warnings 
warnings.filterwarnings('ignore')

In [16]:
df=pd.read_csv('ObesityDataSet_raw_and_data_sinthetic.csv')
df.head()

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,CAEC,SMOKE,CH2O,SCC,FAF,TUE,CALC,MTRANS,NObeyesdad
0,Female,21.0,1.62,64.0,yes,no,2.0,3.0,Sometimes,no,2.0,no,0.0,1.0,no,Public_Transportation,Normal_Weight
1,Female,21.0,1.52,56.0,yes,no,3.0,3.0,Sometimes,yes,3.0,yes,3.0,0.0,Sometimes,Public_Transportation,Normal_Weight
2,Male,23.0,1.8,77.0,yes,no,2.0,3.0,Sometimes,no,2.0,no,2.0,1.0,Frequently,Public_Transportation,Normal_Weight
3,Male,27.0,1.8,87.0,no,no,3.0,3.0,Sometimes,no,2.0,no,2.0,0.0,Frequently,Walking,Overweight_Level_I
4,Male,22.0,1.78,89.8,no,no,2.0,1.0,Sometimes,no,2.0,no,0.0,0.0,Sometimes,Public_Transportation,Overweight_Level_II


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2111 entries, 0 to 2110
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Gender                          2111 non-null   object 
 1   Age                             2111 non-null   float64
 2   Height                          2111 non-null   float64
 3   Weight                          2111 non-null   float64
 4   family_history_with_overweight  2111 non-null   object 
 5   FAVC                            2111 non-null   object 
 6   FCVC                            2111 non-null   float64
 7   NCP                             2111 non-null   float64
 8   CAEC                            2111 non-null   object 
 9   SMOKE                           2111 non-null   object 
 10  CH2O                            2111 non-null   float64
 11  SCC                             2111 non-null   object 
 12  FAF                             21

In [18]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Age,2111.0,24.3126,6.345968,14.0,19.947192,22.77789,26.0,61.0
Height,2111.0,1.701677,0.093305,1.45,1.63,1.700499,1.768464,1.98
Weight,2111.0,86.586058,26.191172,39.0,65.473343,83.0,107.430682,173.0
FCVC,2111.0,2.419043,0.533927,1.0,2.0,2.385502,3.0,3.0
NCP,2111.0,2.685628,0.778039,1.0,2.658738,3.0,3.0,4.0
CH2O,2111.0,2.008011,0.612953,1.0,1.584812,2.0,2.47742,3.0
FAF,2111.0,1.010298,0.850592,0.0,0.124505,1.0,1.666678,3.0
TUE,2111.0,0.657866,0.608927,0.0,0.0,0.62535,1.0,2.0


In [None]:
## The target variable has 7 classes, so this is a multi-class classification problem.
df['NObeyesdad'].unique()

array(['Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II',
       'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II',
       'Obesity_Type_III'], dtype=object)
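
Besides listing the distinct labels, it can help to count them and check how balanced the classes are. The following is a minimal sketch on a hand-built sample (the labels of the five preview rows), not on the full 2111-row column; on the real frame the same two calls would be applied to `df['NObeyesdad']`.

```python
import pandas as pd

# Labels of the five preview rows; the real column has 2111 entries.
labels = pd.Series(
    [
        "Normal_Weight",
        "Normal_Weight",
        "Normal_Weight",
        "Overweight_Level_I",
        "Overweight_Level_II",
    ],
    name="NObeyesdad",
)

n_classes = labels.nunique()          # number of distinct classes in the sample
class_counts = labels.value_counts()  # rows per class, most frequent first

print(n_classes)
print(class_counts)
```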

In [22]:
## Using scikit-learn's LabelEncoder to convert the object-typed columns into numerical columns.

from sklearn.preprocessing import LabelEncoder

## Selecting all columns whose dtype is object.
cat_col=[col for col in df.columns if df[col].dtype == 'object']

encoder=LabelEncoder()
for col in cat_col:
    ## fit_transform refits the encoder on each column, so after the loop it only holds the last column's classes.
    df[col]=encoder.fit_transform(df[col])

In [21]:
## Previewing the encoded dataset
df.head()

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,CAEC,SMOKE,CH2O,SCC,FAF,TUE,CALC,MTRANS,NObeyesdad
0,0,21.0,1.62,64.0,1,0,2.0,3.0,2,0,2.0,0,0.0,1.0,3,3,1
1,0,21.0,1.52,56.0,1,0,3.0,3.0,2,1,3.0,1,3.0,0.0,2,3,1
2,1,23.0,1.8,77.0,1,0,2.0,3.0,2,0,2.0,0,2.0,1.0,1,3,1
3,1,27.0,1.8,87.0,0,0,3.0,3.0,2,0,2.0,0,2.0,0.0,1,4,5
4,1,22.0,1.78,89.8,0,0,2.0,1.0,2,0,2.0,0,0.0,0.0,2,3,6


<h2>Encoder Dictionary Representation</h2>

<table>
    <thead>
        <tr>
            <th>Column</th>
            <th>Category</th>
            <th>Encoded Value</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="2">Gender</td>
            <td>Female</td>
            <td>0</td>
        </tr>
        <tr>
            <td>Male</td>
            <td>1</td>
        </tr>

        <tr>
            <td rowspan="2">family_history_with_overweight</td>
            <td>no</td>
            <td>0</td>
        </tr>
        <tr>
            <td>yes</td>
            <td>1</td>
        </tr>
        <tr>
            <td rowspan="2">FAVC</td>
            <td>no</td>
            <td>0</td>
        </tr>
        <tr>
            <td>yes</td>
            <td>1</td>
        </tr>

        <tr>
            <td rowspan="4">CAEC</td>
            <td>Always</td>
            <td>0</td>
        </tr>
        <tr>
            <td>Frequently</td>
            <td>1</td>
        </tr>
        <tr>
            <td>Sometimes</td>
            <td>2</td>
        </tr>
        <tr>
            <td>no</td>
            <td>3</td>
        </tr>

        <tr>
            <td rowspan="2">SMOKE</td>
            <td>no</td>
            <td>0</td>
        </tr>
        <tr>
            <td>yes</td>
            <td>1</td>
        </tr>

        <tr>
            <td rowspan="2">SCC</td>
            <td>no</td>
            <td>0</td>
        </tr>
        <tr>
            <td>yes</td>
            <td>1</td>
        </tr>

        <tr>
            <td rowspan="4">CALC</td>
            <td>Always</td>
            <td>0</td>
        </tr>
        <tr>
            <td>Frequently</td>
            <td>1</td>
        </tr>
        <tr>
            <td>Sometimes</td>
            <td>2</td>
        </tr>
        <tr>
            <td>no</td>
            <td>3</td>
        </tr>

        <tr>
            <td rowspan="5">MTRANS</td>
            <td>Automobile</td>
            <td>0</td>
        </tr>
        <tr>
            <td>Bike</td>
            <td>1</td>
        </tr>
        <tr>
            <td>Motorbike</td>
            <td>2</td>
        </tr>
        <tr>
            <td>Public_Transportation</td>
            <td>3</td>
        </tr>
        <tr>
            <td>Walking</td>
            <td>4</td>
        </tr>

        <tr>
            <td rowspan="7">NObeyesdad</td>
            <td>Insufficient_Weight</td>
            <td>0</td>
        </tr>
        <tr>
            <td>Normal_Weight</td>
            <td>1</td>
        </tr>
        <tr>
            <td>Obesity_Type_I</td>
            <td>2</td>
        </tr>
        <tr>
            <td>Obesity_Type_II</td>
            <td>3</td>
        </tr>
        <tr>
            <td>Obesity_Type_III</td>
            <td>4</td>
        </tr>
        <tr>
            <td>Overweight_Level_I</td>
            <td>5</td>
        </tr>
        <tr>
            <td>Overweight_Level_II</td>
            <td>6</td>
        </tr>
    </tbody>
</table>
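
The mapping in the table above does not have to be written by hand: `LabelEncoder` sorts the distinct labels and assigns each one its position in that sorted order, which is exposed through the fitted encoder's `classes_` attribute. The sketch below rebuilds the MTRANS rows this way. The input list is typed in by hand from the table, and `mtrans_values` and `mapping` are illustrative names, not variables from the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# Category values taken from the MTRANS rows of the table, in arbitrary order.
mtrans_values = ["Public_Transportation", "Walking", "Automobile", "Motorbike", "Bike"]

encoder = LabelEncoder()
encoder.fit(mtrans_values)

# classes_ holds the sorted labels; a label's position is its encoded value.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform maps integer codes back to the original labels.
decoded = [str(label) for label in encoder.inverse_transform([3, 4])]
print(decoded)
```

The sort order is by character code, so capitalized labels come before lowercase ones. That is why `no` receives the highest code in CAEC and CALC even though it represents the lowest frequency: the integers carry no ordinal meaning.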

In [25]:
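## Saving the cleaned dataset file
import os

file_name='Clean_Raw_Data'
data_folder='../Data/'

## Creating the output folder if it does not already exist.
os.makedirs(data_folder, exist_ok=True)

## Note: the file is written without a .csv extension, and to_csv also writes the row index as the first column by default.
df.to_csv(os.path.join(data_folder, file_name))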
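
One detail to keep in mind when the saved file is read back: by default `to_csv` writes the row index as an unnamed first column, which `read_csv` then loads as `Unnamed: 0`. The sketch below shows the difference on a small made-up frame written to a temporary directory. The frame contents and file names are illustrative, not the notebook's actual data or paths.

```python
import os
import tempfile

import pandas as pd

# Small illustrative frame standing in for the encoded dataset.
frame = pd.DataFrame({"Gender": [0, 1], "Age": [21.0, 23.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, "with_index.csv")
    without_index = os.path.join(tmp_dir, "without_index.csv")

    frame.to_csv(with_index)                  # default: the row index is written
    frame.to_csv(without_index, index=False)  # the row index is omitted

    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

Passing `index=False` when saving, or `index_col=0` when loading, avoids carrying the extra column into later steps.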