Multi-Class Prediction of Obesity Risk

This project notebook leverages machine learning models to classify individuals based on obesity risk levels. The classification is built upon a Kaggle dataset containing various health and lifestyle attributes.

Dataset

Source: Obesity or CVD Risk - Classify/Regressor/Cluster Dataset: https://www.kaggle.com/datasets/aravindpcoder/obesity-or-cvd-risk-classifyregressorcluster
Attributes:
- Eating habits: Frequency of high-calorie food consumption (FAVC), vegetable consumption (FCVC), meal frequency (NCP), snacking habits (CAEC), water intake (CH20), alcohol consumption (CALC).
- Physical condition: Calories monitoring (SCC), physical activity frequency (FAF), technology use time (TUE), transportation type (MTRANS).
- Demographic data: Gender, age, height, and weight.
Target Variable: Obesity levels defined as follows: -Underweight (<18.5 BMI) -Normal (18.5–24.9 BMI) -Overweight (25.0–29.9 BMI) -Obesity I (30.0–34.9 BMI) -Obesity II (35.0–39.9 BMI) -Obesity III (>40 BMI)

Project Workflow

Data Exploration:
- Basic statistics and data insights (e.g., mean, median).
- Data quality checks, including handling missing values and data types.
Data Visualization:
- Visualization of key features to reveal potential patterns, correlations, and class imbalances.
- Use of histograms, scatter plots, and correlation heatmaps to illustrate relationships between health metrics and risk factors.
Data Manipulation:
- Data distribution and outliers detection.
- Data transformations manually to reduce ouliers and improve subsequent interpretation by the models.
Feature Engineering:
- Encoding categorical features and scaling numerical features using scikit-learn
- Feature importance analysis to understand the impact of each variable on the predictions (Importance permutation).
Modeling:
- Various machine learning models are evaluated, including:
  - XGBoost
  - LightGBM
  - Deep Learning Model with TensorFlow and Keras
- Hyperparameter tuning is performed with Optuna for optimization.
- Probability thresholding is applied to optimize classification outcomes, adjusting the threshold for each class to improve sensitivity or specificity depending on the model's predictions.
Model Evaluation:
- Evaluation metrics such as accuracy and AUC are used.
- ROC curve analysis to assess the model performance.

Requirements

Python (>= 3.7) Libraries: pandas, numpy, seaborn, matplotlib, scikit-learn, xgboost, lightgbm, tensorflow, optuna & keras_tuner

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
obesity-risk-notebook.ipynb		obesity-risk-notebook.ipynb
readme.md		readme.md
submission.csv		submission.csv
test.csv		test.csv
train.csv		train.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-Class Prediction of Obesity Risk

Dataset

Project Workflow

Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Multi-Class Prediction of Obesity Risk

Dataset

Project Workflow

Requirements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages