Obese-tree - Support Vector Machine (SVM) for Obesity Level Estimation

This README gives an overview of a project that applies a Support Vector Machine (SVM) classifier to estimate obesity levels from eating habits and physical condition. The dataset contains 16 independent variables (Gender, Age, Height, Weight, family_history_with_overweight, FAVC, FCVC, NCP, CAEC, Smoking, CH2O, SCC, FAF, TUE, CALC, and MTRANS) and one dependent variable, Obesity Category.

Dataset Description

The dataset includes the following columns:

  • Gender: Gender of the individuals (Categorical: 'Female' or 'Male').
  • Age: Age of the individuals.
  • Height: Height of the individuals.
  • Weight: Weight of the individuals.
  • family_history_with_overweight: Family history of overweight (Categorical: 'yes' or 'no').
  • FAVC: Frequent consumption of high-calorie food (Categorical: 'no' or 'yes').
  • FCVC: Frequency of consumption of vegetables.
  • NCP: Number of main meals.
  • CAEC: Consumption of food between meals (Categorical: 'Sometimes', 'Frequently', 'Always', or 'no').
  • Smoking: Smoking habits (Categorical: 'no' or 'yes').
  • CH2O: Daily water consumption.
  • SCC: Calorie consumption monitoring (Categorical: 'no' or 'yes').
  • FAF: Physical activity frequency.
  • TUE: Time using technology devices.
  • CALC: Consumption of alcohol (Categorical: 'no', 'Sometimes', 'Frequently', or 'Always').
  • MTRANS: Mode of transportation (Categorical: 'Public_Transportation', 'Walking', 'Automobile', 'Motorbike', 'Bike').
  • Obesity Category: Dependent variable with categories: 'Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II', 'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II', 'Obesity_Type_III'.

Data Preprocessing

  • The dataset was split into a training set and a test set.
  • Categorical data in columns such as 'Gender', 'family_history_with_overweight', 'FAVC', 'CAEC', 'Smoking', 'SCC', 'CALC', and 'MTRANS' were label encoded.
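The two preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the DataFrame is a tiny made-up sample with only a few of the columns, `test_size=0.25` is an assumed value, and encoding before splitting is an ordering chosen here for brevity.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Tiny made-up sample using a few of the README's column names (values are illustrative only).
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"],
    "Age": [21, 23, 27, 22, 31, 19, 40, 35],
    "FAVC": ["no", "yes", "yes", "no", "yes", "yes", "no", "yes"],
    "CAEC": ["Sometimes", "Frequently", "no", "Always",
             "Sometimes", "Sometimes", "no", "Frequently"],
    "Obesity Category": ["Normal_Weight", "Obesity_Type_I", "Overweight_Level_I",
                         "Normal_Weight", "Obesity_Type_I", "Insufficient_Weight",
                         "Overweight_Level_I", "Obesity_Type_I"],
})

categorical_columns = ["Gender", "FAVC", "CAEC"]

# Label-encode each categorical column: every distinct string becomes an integer code.
# One encoder per column is kept so the codes can be mapped back to strings later.
encoders = {}
for column in categorical_columns:
    encoders[column] = LabelEncoder()
    df[column] = encoders[column].fit_transform(df[column])

X = df.drop(columns=["Obesity Category"])
y = df["Obesity Category"]

# Hold out part of the rows for testing (test_size is an assumption, not from the README).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```

`LabelEncoder` assigns codes in sorted order of the distinct values, so `encoders["Gender"].classes_` lists which string each integer stands for.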

Feature Scaling

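  • The features were standardized with StandardScaler so that they are on comparable scales. This matters for SVMs because the margin depends on distances between feature vectors, so a feature with a large numeric range would otherwise dominate.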
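A minimal sketch of standardization with `StandardScaler`, using made-up numbers. Fitting the scaler on the training set only and reusing it for the test set is a common practice assumed here; the README does not say how the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numeric features on very different scales (illustrative values only).
X_train = np.array([[21.0, 1.62, 64.0],
                    [23.0, 1.80, 77.0],
                    [27.0, 1.80, 87.0],
                    [31.0, 1.78, 89.8]])
X_test = np.array([[22.0, 1.70, 70.0]])

scaler = StandardScaler()
# Learn each column's mean and standard deviation from the training set, then scale it.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the training-set statistics to the test set without refitting.
X_test_scaled = scaler.transform(X_test)
```

After scaling, every training column has mean 0 and standard deviation 1.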

Model Training

  • An SVM model was trained with the linear kernel.
  • The random state was set to 0 for reproducibility.
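As a rough sketch, training a linear-kernel SVM with scikit-learn's `SVC` looks like this. The data is a synthetic stand-in from `make_classification` (16 features, 7 classes, chosen only to mirror the shape of the problem), and the split size is an assumption. The synthetic features are already on comparable scales, so scaling is omitted here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 16 features and 7 classes (not the project's dataset).
X, y = make_classification(
    n_samples=700, n_features=16, n_informative=10,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear kernel and random_state=0, as described in the README.
classifier = SVC(kernel="linear", random_state=0)
classifier.fit(X_train, y_train)

# Predicted class labels for the held-out rows.
y_pred = classifier.predict(X_test)
```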

Model Evaluation

  • A confusion matrix was generated on the test set to assess model performance (rows are true classes, columns are predicted classes):
[[56  0  0  0  0  0  0]
 [ 5 53  0  0  0  4  0]
 [ 0  0 75  2  0  0  1]
 [ 0  0  1 57  0  0  0]
 [ 0  0  0  0 63  0  0]
 [ 0  2  0  0  0 52  2]
 [ 0  0  0  0  0  2 48]]
  • Accuracy score: 0.9551 (404 of 423 test samples classified correctly)
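For illustration, a confusion matrix and accuracy score of this kind can be produced with `sklearn.metrics`. The labels below are a made-up 3-class example, not the project's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for a 3-class problem.
y_test = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1, 2, 0]

# Entry [i][j] counts samples whose true class is i and predicted class is j.
cm = confusion_matrix(y_test, y_pred)

# Accuracy is the fraction of correct predictions ...
accuracy = accuracy_score(y_test, y_pred)
# ... which equals the diagonal sum of the matrix divided by the total count.
accuracy_from_matrix = cm.trace() / cm.sum()
```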

Classification Report

              precision    recall  f1-score   support

           0       0.92      1.00      0.96        56
           1       0.96      0.85      0.91        62
           2       0.99      0.96      0.97        78
           3       0.97      0.98      0.97        58
           4       1.00      1.00      1.00        63
           5       0.90      0.93      0.91        56
           6       0.94      0.96      0.95        50

    accuracy                           0.96       423
   macro avg       0.95      0.96      0.95       423
weighted avg       0.96      0.96      0.95       423
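The per-class figures in the report can be re-derived from the confusion matrix above. The sketch below copies that matrix, reads rows as true classes and columns as predicted classes, and uses plain Python; the variable names are illustrative.

```python
# Confusion matrix copied from the Model Evaluation section.
confusion = [
    [56, 0, 0, 0, 0, 0, 0],
    [5, 53, 0, 0, 0, 4, 0],
    [0, 0, 75, 2, 0, 0, 1],
    [0, 0, 1, 57, 0, 0, 0],
    [0, 0, 0, 0, 63, 0, 0],
    [0, 2, 0, 0, 0, 52, 2],
    [0, 0, 0, 0, 0, 2, 48],
]

n_classes = len(confusion)
report = []
for k in range(n_classes):
    true_positive = confusion[k][k]
    predicted_as_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
    support = sum(confusion[k])                                      # row sum
    precision = true_positive / predicted_as_k
    recall = true_positive / support
    f1 = 2 * precision * recall / (precision + recall)
    report.append((k, round(precision, 2), round(recall, 2), round(f1, 2), support))

for row in report:
    print(row)
```

For class 1, for example, 53 of the 55 samples predicted as class 1 are correct (precision 0.96) and 53 of the 62 true class 1 samples are recovered (recall 0.85).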

K-Fold Cross Validation

  • K-fold cross-validation was used, resulting in a mean accuracy of 94.20% and a standard deviation of 1.34%.
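A minimal sketch of k-fold cross-validation with `cross_val_score`. The README does not state the number of folds, so `cv=10` is an assumption, and the data is again a synthetic stand-in. Wrapping the scaler and classifier in a pipeline is a design choice made here so that scaling is refitted inside each fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the project's dataset).
X, y = make_classification(
    n_samples=350, n_features=16, n_informative=10,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# Scaling plus linear SVM, refitted from scratch on each training fold.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=0))

# One accuracy value per fold; cv=10 is an assumed fold count.
scores = cross_val_score(model, X, y, cv=10)

mean_accuracy = scores.mean()
std_accuracy = scores.std()
print(f"Accuracy: {mean_accuracy * 100:.2f} %")
print(f"Standard deviation: {std_accuracy * 100:.2f} %")
```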

Grid Search for Hyperparameter Tuning

  • Grid Search was performed to find the best hyperparameters, yielding the following result:
    • Best Accuracy: 94.20%
    • Best Parameters: {'C': 1, 'kernel': 'linear'}
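A grid search of this kind can be sketched with `GridSearchCV`. The parameter grid, the fold count, and the synthetic data below are all assumptions for illustration; the README reports only the winning combination, not the grid that was searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (not the project's dataset).
X, y = make_classification(
    n_samples=210, n_features=16, n_informative=10,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# Hypothetical search space: a linear kernel and an RBF kernel with a few settings each.
param_grid = [
    {"C": [0.25, 0.5, 0.75, 1], "kernel": ["linear"]},
    {"C": [0.25, 0.5, 0.75, 1], "kernel": ["rbf"], "gamma": [0.1, 0.5, 0.9]},
]

grid_search = GridSearchCV(
    estimator=SVC(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,  # assumed fold count
)
grid_search.fit(X, y)

best_accuracy = grid_search.best_score_    # mean cross-validated accuracy of the best setting
best_parameters = grid_search.best_params_  # e.g. a dict with 'C' and 'kernel'
print(f"Best Accuracy: {best_accuracy * 100:.2f} %")
print("Best Parameters:", best_parameters)
```

Since `C=1` is the default for `SVC`, a reported best setting of `{'C': 1, 'kernel': 'linear'}` describes the same model as an untuned linear-kernel SVM, which is consistent with the best accuracy matching the k-fold mean of 94.20%.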

This project demonstrates the application of an SVM to obesity level estimation, reaching about 95.5% accuracy on the test set and a mean accuracy of 94.20% under k-fold cross-validation, and providing insights into the factors influencing obesity.