# Obesity Detection

## About Data

This dataset contains information about the obesity classification of individuals. The data was collected from a variety of sources, including medical records, surveys, and self-reported data. The goal is to analyze and classify individuals into different obesity categories using the provided data.

## Source

This data is available in Kaggele in the following link:
> https://www.kaggle.com/datasets/sujithmandala/obesity-classification-dataset/data


## Data Dictionary

* **ID**: A unique identifier for each individual. It contains numeric data.
* **Age**: The age of the individual. It contains numeric data.
* **Gender**: The gender of the individual. It contains categotical binary data.(Male, Female)
* **Height**: The height of the individual in centimeters(cm). It contains numeric data.
* **Weight**: The weight of the individual in kilograms(KG.). It contains numeric data.
* **BMI**: The body mass index of the individual, calculated as weight divided by height squared. It contains numeric data.
* **Label**: The obesity classification of the individual. This is the target variable. (Normal Weight, Overweight, Obese, Underweight)

## Problem Statement

1. **Model Training**: The objective of model training is to train a machine learning model with this dataset so that it can be used to predict the class of obesity of a person.
2. **Model Evaluation**: The objective of model evaluation is to evaluate the performance of the trained model using different metrics such as accuracy, precision, recall and F1 score.
3. **Model Optimization**: The objective of model optimization is to fined the optimal model by tuning the hyperparameter of the model.

### Load Libraries

In [54]:
# General
import pandas as pd
import numpy as np
import warnings
import os
import pickle

# Preprocessing
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# Models
from sklearn.tree import DecisionTreeClassifier

# Evaluation
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hyperparameter tuning
from sklearn.model_selection import GridSearchCV

### Settings

In [57]:
# Warning
warnings.filterwarnings("ignore")

# Data file path
data_path = "../data"
file_path = os.path.join(data_path, "obesity_classification_encoded.csv")

# Model path
model_path = "../models"

### Load Data

In [4]:
df = pd.read_csv(file_path)

In [5]:
# Check Data
df.head()

Unnamed: 0,Age,Gender,Height,Weight,BMI,Label
0,25,1,175,80,25.3,1
1,30,0,160,60,22.5,1
2,35,1,180,90,27.3,2
3,40,0,150,50,20.0,0
4,45,1,190,100,31.2,3


In [6]:
# Separate Input and Target Features
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

In [15]:
# Split Train and Test data
X_train,X_test, y_train,y_test = train_test_split(X, y, test_size = 0.3, random_state=42)

### Scaling the Data

In [16]:
# Scale the data to standardize it
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

In [49]:
# Train the model and evaluate using metrics
def train_evaluate(model):
    # Train the model
    model.fit(X_train_s, y_train)

    # Predict train data
    y_train_pred = model.predict(X_train_s)
    # predict test data
    y_test_pred = model.predict(X_test_s)

    # Print train evaluation metrics
    print("=" * 70)
    print("EDVALUATION METRICS ON TRAIN DATA")
    print("=" * 70)
    print(f"Accuracy Score: {accuracy_score(y_train, y_train_pred): .2f}")
    print(f"Precision Score: {precision_score(y_train, y_train_pred, average='macro'): .2f}")
    print(f"Recall Score: {recall_score(y_train, y_train_pred, average='macro'): .2f}")
    print(f"F1 Score: {f1_score(y_train, y_train_pred, average='macro'): .2f}")
    # Print test evaluation metrics
    print("=" * 70)
    print("EDVALUATION METRICS ON TEST DATA")
    print("=" * 70)
    print(f"Accuracy Score: {accuracy_score(y_test, y_test_pred): .2f}")
    print(f"Precision Score: {precision_score(y_test, y_test_pred, average='macro'): .2f}")
    print(f"Recall Score: {recall_score(y_test, y_test_pred, average='macro'): .2f}")
    print(f"F1 Score: {f1_score(y_test, y_test_pred, average='macro'): .2f}")
    

In [50]:
# Train the Decision Tree model with train dataset and evaluate the model
dt = DecisionTreeClassifier()
train_evaluate(dt)

EDVALUATION METRICS ON TRAIN DATA
Accuracy Score:  1.00
Precision Score:  1.00
Recall Score:  1.00
F1 Score:  1.00
EDVALUATION METRICS ON TEST DATA
Accuracy Score:  1.00
Precision Score:  1.00
Recall Score:  1.00
F1 Score:  1.00


In [53]:
test = {
        "Age": 46,
        "Gender": 1,
        "Height": 184,
        "Weight": 87,
        "BMI": 28.1
}
test_arr = np.array([[test["Age"], test["Gender"], test["Height"], test["Weight"], test["BMI"]]])
tscaled = scaler.transform(test_arr)
p = dt.predict(tscaled)
print(p)

[2]


### Conclusion

* All metrics of this model is 100%.

### Save the Model

In [59]:
model_file = os.path.join(model_path, "obesity_detector.pkl")
with open(model_file, "wb") as file_model:
    pickle.dump(dt, file_model)