# Obesity Detection

## About Data

This dataset contains information about the obesity classification of individuals. The data was collected from a variety of sources, including medical records, surveys, and self-reported data. The goal is to analyze and classify individuals into different obesity categories using the provided data.

## Source

This data is available on Kaggle at the following link:
> https://www.kaggle.com/datasets/sujithmandala/obesity-classification-dataset/data


## Data Dictionary

* **ID**: A unique identifier for each individual. It contains numeric data.
* **Age**: The age of the individual. It contains numeric data.
* **Gender**: The gender of the individual. It contains categorical binary data (Male, Female).
* **Height**: The height of the individual in centimeters (cm). It contains numeric data.
* **Weight**: The weight of the individual in kilograms (kg). It contains numeric data.
* **BMI**: The body mass index of the individual, calculated as weight in kilograms divided by the square of height in meters. It contains numeric data.
* **Label**: The obesity classification of the individual. This is the target variable. (Normal Weight, Overweight, Obese, Underweight)
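
As a quick illustration of the BMI definition above, the sketch below applies the standard formula to a weight in kilograms and a height in centimeters. The function name `compute_bmi` is hypothetical and not part of the dataset or this notebook, and the BMI values stored in the dataset may have been derived or rounded differently.

```python
def compute_bmi(weight_kg: float, height_cm: float) -> float:
    """Return body mass index in kg/m^2 (hypothetical helper for illustration)."""
    if height_cm <= 0:
        raise ValueError("height_cm must be positive")
    height_m = height_cm / 100.0  # the dataset stores height in centimeters
    return weight_kg / height_m ** 2


# Example: 80 kg at 175 cm
bmi = compute_bmi(80, 175)
print(round(bmi, 1))  # 26.1
```

Converting centimeters to meters first matters: dividing by the square of the raw centimeter value would give a result 10,000 times too small.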

## Problem Statement

1. **Feature Engineering**: The objective of feature engineering is to encode the categorical features as numerical values using an appropriate encoding technique.
2. **Feature Selection**: The objective of feature selection is to select the most significant features for detecting the level of obesity.

### Load Necessary Libraries

In [1]:
# General Libraries
import pandas as pd
import numpy as np
import warnings
import os

### Settings

In [4]:
# Warning setting
warnings.filterwarnings("ignore")

# Datapath setting
data_path = "../data"
file_path = os.path.join(data_path, "obesity_classification_cleaned.csv")

### Load Data

In [5]:
df = pd.read_csv(file_path)

In [6]:
# Check data
df.head()

Unnamed: 0,Age,Gender,Height,Weight,BMI,Label
0,25,Male,175,80,25.3,Normal Weight
1,30,Female,160,60,22.5,Normal Weight
2,35,Male,180,90,27.3,Overweight
3,40,Female,150,50,20.0,Underweight
4,45,Male,190,100,31.2,Obese
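
The `Unnamed: 0` column in the output is the name pandas assigns to a CSV column whose header is empty, which typically happens when a DataFrame was saved together with its index. The sketch below assumes that this column corresponds to the `ID` field in the data dictionary (the source does not confirm this) and renames it after loading. The inline CSV text is a stand-in for the real file and reuses the rows shown above.

```python
import io

import pandas as pd

# Stand-in for the CSV file: the first header is empty, as written by to_csv() with the index
csv_text = """,Age,Gender,Height,Weight,BMI,Label
0,25,Male,175,80,25.3,Normal Weight
1,30,Female,160,60,22.5,Normal Weight
2,35,Male,180,90,27.3,Overweight
3,40,Female,150,50,20.0,Underweight
4,45,Male,190,100,31.2,Obese
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns[0])  # pandas names the headerless column "Unnamed: 0"

# Assumption: the headerless column is the ID described in the data dictionary
named = raw.rename(columns={"Unnamed: 0": "ID"})
print(named.columns.tolist())
```

If the column turns out to be only a leftover row index, dropping it with `raw.drop(columns="Unnamed: 0")` would be the alternative.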


### Feature Engineering

In [7]:
# Encode the categorical features so that they can be used in training the model

# Encode Gender by replacing male with 1 and female with 0
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})

# Encode Label as ordinal integers: Underweight = 0, Normal Weight = 1, Overweight = 2, Obese = 3
# (map is used for consistency with the Gender encoding above)
df["Label"] = df["Label"].map({"Underweight": 0, "Normal Weight": 1, "Overweight": 2, "Obese": 3})
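
`Series.map` returns `NaN` for any value missing from the mapping dictionary, so a misspelled or unexpected category would silently become a missing value. The following is a minimal sketch of a guard against that, using a hypothetical helper `encode_strict` and a small illustrative Series rather than the real dataset.

```python
import pandas as pd


def encode_strict(series: pd.Series, mapping: dict) -> pd.Series:
    """Map categories to integers, raising if any value is not in the mapping.

    Hypothetical helper for illustration; not part of pandas.
    """
    unknown = set(series.dropna().unique()) - set(mapping)
    if unknown:
        raise ValueError(f"Unmapped categories: {sorted(unknown)}")
    return series.map(mapping)


label_map = {"Underweight": 0, "Normal Weight": 1, "Overweight": 2, "Obese": 3}
labels = pd.Series(["Normal Weight", "Overweight", "Underweight", "Obese"])

encoded = encode_strict(labels, label_map)
print(encoded.tolist())  # [1, 2, 0, 3]
```

Failing loudly here is a design choice: an error at encoding time is usually easier to trace than missing values that surface later during model training.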

In [8]:
# Sanity check
df.head()

Unnamed: 0,Age,Gender,Height,Weight,BMI,Label
0,25,1,175,80,25.3,1
1,30,0,160,60,22.5,1
2,35,1,180,90,27.3,2
3,40,0,150,50,20.0,0
4,45,1,190,100,31.2,3
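
For the feature selection objective in the problem statement, one simple starting point is to rank features by the absolute value of their correlation with the encoded label. The sketch below is only an illustration under stated assumptions: it uses the five encoded rows shown above, which are far too few for a meaningful ranking, and it treats the ordinal label as a numeric value, which is a rough approximation. The variable names `sample`, `features`, and `scores` are introduced here for illustration.

```python
import pandas as pd

# The five encoded rows shown above (the Unnamed: 0 column is left out)
sample = pd.DataFrame(
    {
        "Age": [25, 30, 35, 40, 45],
        "Gender": [1, 0, 1, 0, 1],
        "Height": [175, 160, 180, 150, 190],
        "Weight": [80, 60, 90, 50, 100],
        "BMI": [25.3, 22.5, 27.3, 20.0, 31.2],
        "Label": [1, 1, 2, 0, 3],
    }
)

features = sample.drop(columns="Label")

# Absolute Pearson correlation of each feature with the encoded target,
# sorted from strongest to weakest
scores = features.corrwith(sample["Label"]).abs().sort_values(ascending=False)
print(scores)
```

On the full dataset, the same call could be applied to `df` directly. Correlation-based ranking only captures linear relationships, so it is best treated as a first look rather than a final selection.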


In [9]:
# Save encoded data
file_path = os.path.join(data_path, "obesity_classification_encoded.csv")
df.to_csv(file_path, index=False)
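
Passing `index=False` keeps the row index out of the saved file, so no extra unnamed column appears when the file is read back. The sketch below checks this with a round trip through a temporary directory. The small `encoded` frame and the file name `encoded.csv` are illustrative stand-ins, not the notebook's actual data or output path.

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for the encoded DataFrame
encoded = pd.DataFrame({"Age": [25, 30], "Gender": [1, 0], "Label": [1, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "encoded.csv")
    encoded.to_csv(out_path, index=False)  # do not write the row index
    reloaded = pd.read_csv(out_path)

# The reloaded frame has exactly the original columns, with no index column added
print(reloaded.columns.tolist())
```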