
# Interactive Feature Importance by Category

This visualization shows the feature importance scores for the variables in our dataset, grouped into three categories: **Demographics** (e.g., Age, Gender), **Body Measurements** (e.g., Weight, Height), and **Lifestyle Factors** (e.g., eating habits, physical activity, technology usage). The interactive chart lets users explore the relative impact of each feature on the model’s predictions.

## Key Insights
- **Weight** is the most influential feature in the chart with a score of **0.25**, followed by **Age (0.15)**, **FCVC (0.12)**, and **Height (0.10)**.
- **Eating habits**, such as **FCVC (Frequency of Vegetable Consumption)** and **NCP (Number of Main Meals per Day)**, also rank highly, with scores of 0.12 and 0.07.
- **Other lifestyle factors**, such as **CH2O (Daily Water Intake)** and **CALC_sometimes (Occasional Alcohol Consumption)**, contribute to the model but have lower importance scores.
- **Physical activity (FAF, Physical Activity Frequency)** has a minor influence compared to the other factors.

## Features and Categories
- **Demographics (Blue):** Age, Gender
- **Body Measurements (Red):** Weight, Height
- **Lifestyle Factors (Green):** CH2O, TUE (Technology Use), CALC_sometimes, FAF, FCVC, NCP
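
The grouping above can be written as a plain lookup table, which also makes it easy to total the scores per category. The sketch below is illustrative only: `CATEGORY_BY_FEATURE` and `total_by_category` are hypothetical names that do not appear in the notebook, and the scores are the hand-entered values used in the chart cell.

```python
# Hypothetical lookup mirroring the category list above.
CATEGORY_BY_FEATURE = {
    "Age": "Demographics",
    "Gender": "Demographics",
    "Weight": "Body Measurements",
    "Height": "Body Measurements",
    "CH2O": "Lifestyle Factors",
    "TUE": "Lifestyle Factors",
    "CALC_sometimes": "Lifestyle Factors",
    "FAF": "Lifestyle Factors",
    "FCVC": "Lifestyle Factors",
    "NCP": "Lifestyle Factors",
}


def total_by_category(scores, lookup=CATEGORY_BY_FEATURE, default="Other"):
    """Sum importance scores per category; unknown features go to `default`."""
    totals = {}
    for feature, score in scores.items():
        category = lookup.get(feature, default)
        totals[category] = totals.get(category, 0.0) + score
    return totals


# Hand-entered scores copied from the chart cell of this notebook.
chart_scores = {
    "Weight": 0.25, "Age": 0.15, "FCVC": 0.12, "Height": 0.10, "Gender": 0.08,
    "NCP": 0.07, "CH2O": 0.06, "TUE": 0.05, "FAF": 0.04, "CALC_sometimes": 0.03,
}

category_totals = total_by_category(chart_scores)
for category, total in category_totals.items():
    print(f"{category}: {total:.2f}")
```

With these values, the lifestyle features together total about 0.37, slightly more than body measurements (about 0.35), even though no single lifestyle feature outranks Weight.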

## Interpretation
Knowing which features matter most helps identify the key factors influencing the predicted obesity level and supports targeted interventions or recommendations. Hover over a bar in the chart to see its exact value and description.

---
**Created using Plotly for interactive visualization.**


[Interactive Feature Importance Chart](../images/interactive_feature_importance_by_category.html)


In [14]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio


# Load dataset
# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"G:\DSI\Project\Team3_Estimation_of_Obesity_Levels\data\Cleaned_ObesityDataSet_raw_and_data_sinthetic.csv")

df.sample(5)

Unnamed: 0,Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,CAEC,SMOKE,CH2O,SCC,FAF,TUE,CALC,MTRANS,NObeyesdad
1487,Male,18.88061,1.80416,104.40682,yes,yes,2.0,3.0,Sometimes,no,3.0,no,2.2405,0.0,no,Public_Transportation,Obesity_Type_I
1188,Male,39.759575,1.792507,101.780099,yes,yes,2.33361,2.113575,Sometimes,no,2.504136,no,2.998981,1.0,Sometimes,Automobile,Obesity_Type_I
836,Male,28.825223,1.765874,82.045045,yes,yes,1.064162,3.98955,Sometimes,no,2.028426,no,0.81517,0.894678,Sometimes,Public_Transportation,Overweight_Level_I
244,Female,20.0,1.65,75.0,yes,yes,3.0,1.0,Sometimes,no,2.0,no,1.0,1.0,no,Public_Transportation,Overweight_Level_II
1664,Male,24.149036,1.824901,120.805715,yes,yes,2.225149,3.0,Sometimes,no,2.357978,no,1.943743,0.682128,Sometimes,Public_Transportation,Obesity_Type_II


In [15]:
# Define target variable
target = 'NObeyesdad'

# Define feature columns
num_features = ['Age', 'Height', 'Weight', 'FCVC', 'NCP', 'CH2O', 'FAF', 'TUE']
cat_features = ['Gender', 'family_history_with_overweight', 'FAVC', 'CAEC', 
                'SMOKE', 'SCC', 'CALC', 'MTRANS']

# Define X and y
X = df.drop(columns=[target])  # Features
y = df[target]  # Target

# Preprocessing for numerical data
num_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())  # Normalize features
])

# Preprocessing for categorical data
cat_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))  # Encode categories
])

# Combine both transformations
preprocessor = ColumnTransformer(transformers=[
    ('num', num_transformer, num_features),
    ('cat', cat_transformer, cat_features)
])

In [16]:
# Define the model pipeline
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Train the model
model.fit(X, y)

# Get feature importances
importances = model.named_steps['classifier'].feature_importances_
# Look up the fitted one-hot encoder by name rather than by position
onehot = model.named_steps['preprocessor'].named_transformers_['cat'].named_steps['onehot']
feature_names = num_features + list(onehot.get_feature_names_out(cat_features))

# Create DataFrame for feature importance
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': importances})
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Display top features
print(feature_importance_df.head(10))

         Feature  Importance
2         Weight    0.285148
0            Age    0.092262
3           FCVC    0.086520
1         Height    0.082262
4            NCP    0.053154
5           CH2O    0.045313
7            TUE    0.044781
6            FAF    0.044349
8  Gender_Female    0.034467
9    Gender_Male    0.031502
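
The output above lists `Gender_Female` and `Gender_Male` separately because one-hot encoding splits each categorical column, whereas the chart shows a single `Gender` bar. One possible way to reconcile the two is to sum the importances of the encoded columns back into their parent feature. The sketch below assumes that approach; `collapse_one_hot` is a hypothetical helper, not part of the notebook or of scikit-learn, and it works on plain lists so that `feature_names` and `importances` from the cell above could be passed in directly.

```python
from collections import defaultdict


def collapse_one_hot(feature_names, importances, cat_features):
    """Sum importances of one-hot columns back into their parent feature."""
    # Try longer parent names first so a short name that happens to be a
    # prefix of a longer one cannot claim the wrong columns.
    parents = sorted(cat_features, key=len, reverse=True)
    totals = defaultdict(float)
    for name, score in zip(feature_names, importances):
        # Encoded columns are named "<parent>_<level>"; anything else is kept as-is.
        parent = next((p for p in parents if name.startswith(p + "_")), name)
        totals[parent] += score
    # Return features ordered from most to least important.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# A few rows copied from the printed output above.
names = ["Weight", "Age", "Gender_Female", "Gender_Male"]
scores = [0.285148, 0.092262, 0.034467, 0.031502]

collapsed = collapse_one_hot(names, scores, cat_features=["Gender"])
for feature, score in collapsed.items():
    print(f"{feature}: {score:.4f}")
```

Summing is a simple convention rather than the only option; for these rows it gives `Gender` a combined importance of roughly 0.066.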


In [17]:

# Hand-entered, illustrative data for the chart, with descriptions
# (these values are not the model output printed above)
data = {
    'Feature': ['Weight', 'Age', 'FCVC', 'Height', 'Gender', 'NCP', 'CH2O', 'TUE', 'FAF', 'CALC_sometimes'],
    'Importance': [0.25, 0.15, 0.12, 0.10, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03],
    'Category': ['Body Measurements', 'Demographics', 'Lifestyle Factors', 'Body Measurements', 'Demographics',
                 'Lifestyle Factors', 'Lifestyle Factors', 'Lifestyle Factors', 'Lifestyle Factors', 'Lifestyle Factors'],
    'Description': [
        'Body weight, an indicator of health and body composition.',  # Weight
        'Age of the individual.',  # Age
        'Frequency of vegetable consumption (Eating Habit).',  # FCVC
        'Height, a measure of body stature.',  # Height
        'Biological gender of the individual.',  # Gender
        'Number of main meals per day (Eating Habit).',  # NCP
        'Daily water intake, essential for hydration and metabolism.',  # CH2O
        'Time spent using technology (e.g., screens, devices).',  # TUE
        'Frequency of physical activity or exercise.',  # FAF
        'Alcohol consumption frequency (e.g., sometimes, never).',  # CALC_sometimes
    ]
}

# Create DataFrame
feature_importance_df = pd.DataFrame(data)

# Create Interactive Bar Chart
fig = px.bar(
    feature_importance_df, 
    x='Importance', 
    y='Feature', 
    color='Category', 
    orientation='h', 
    title='Interactive Feature Importance by Category',
    labels={'Importance': 'Feature Importance Score', 'Feature': 'Features'},
    category_orders={'Category': ['Demographics', 'Body Measurements', 'Lifestyle Factors']},
    text_auto=True,  # Show values on bars
    hover_data={'Feature': True, 'Importance': True, 'Description': True}  # Add custom hover text
)

# Sort bars by importance so the largest appears at the top
fig.update_layout(yaxis={'categoryorder': 'total ascending'})

# Show the figure
fig.show()


In [19]:
fig.write_html("feature_importance_by_category.html")
