# Bellabeat Case Study â€” Phase 2.5 (Optional)

### Notebook 04: Transform Data (Feature Engineering)

In this notebook, we will:
- Load the cleaned dataset (`bellabeat_clean.csv`)
- Add extra features for easier analysis:
  - Weekday / Weekend
  - Sleep hours
  - Total active minutes
  - Sedentary hours
  - Sleep efficiency
  - BMI categories
- Save the enhanced dataset as `bellabeat_analysis_ready.csv`

This step is optional but makes analysis richer for SQL, Power BI, and EDA.

In [None]:
import pandas as pd
import os

# Paths
PROCESSED_PATH = r"D:/Projects/Bellabeat/Data/Processed/"
INPUT_FILE = os.path.join(PROCESSED_PATH, "bellabeat_clean.csv")
OUTPUT_FILE = os.path.join(PROCESSED_PATH, "bellabeat_analysis_ready.csv")

# Load dataset
df = pd.read_csv(INPUT_FILE)
print("Clean dataset loaded!")
print("Shape:", df.shape)
df.head()

âœ… Clean dataset loaded!
Shape: (940, 24)


Unnamed: 0,id,activity_date,total_steps,total_distance,tracker_distance,logged_activities_distance,very_active_distance,moderately_active_distance,light_active_distance,sedentary_active_distance,...,calories,total_sleep_records,total_minutes_asleep,total_time_in_bed,weight_kg,weight_pounds,fat,b_m_i,is_manual_report,log_id
0,1503960366,2016-04-12,13162,8.5,8.5,0.0,1.88,0.55,6.06,0.0,...,1985,1.0,327.0,346.0,,,,,,
1,1503960366,2016-04-13,10735,6.97,6.97,0.0,1.57,0.69,4.71,0.0,...,1797,2.0,384.0,407.0,,,,,,
2,1503960366,2016-04-14,10460,6.74,6.74,0.0,2.44,0.4,3.91,0.0,...,1776,,,,,,,,,
3,1503960366,2016-04-15,9762,6.28,6.28,0.0,2.14,1.26,2.83,0.0,...,1745,1.0,412.0,442.0,,,,,,
4,1503960366,2016-04-16,12669,8.16,8.16,0.0,2.71,0.41,5.04,0.0,...,1863,2.0,340.0,367.0,,,,,,


## Step 1: Add Weekday & Weekend Features

In [2]:
# Convert to datetime
df["activity_date"] = pd.to_datetime(df["activity_date"])

# Extract weekday
df["weekday"] = df["activity_date"].dt.day_name()
df["is_weekend"] = df["activity_date"].dt.weekday >= 5

## Step 2: Add Sleep Hours

In [3]:
if "total_minutes_asleep" in df.columns:
    df["sleep_hours"] = df["total_minutes_asleep"] / 60

## Step 3: Add Total Active Minutes & Sedentary Hours

In [4]:
active_cols = ["very_active_minutes", "fairly_active_minutes", "lightly_active_minutes"]

# Total active minutes
df["total_active_minutes"] = df[active_cols].sum(axis=1, skipna=True)

# Sedentary hours
if "sedentary_minutes" in df.columns:
    df["sedentary_hours"] = df["sedentary_minutes"] / 60

## Step 4: Add Sleep Efficiency

In [5]:
if "total_time_in_bed" in df.columns and "total_minutes_asleep" in df.columns:
    df["sleep_efficiency"] = df["total_minutes_asleep"] / df["total_time_in_bed"]

## Step 5: Add BMI Categories

In [6]:
if "bmi" in df.columns:
    def bmi_category(bmi):
        if pd.isna(bmi): return None
        if bmi < 18.5: return "Underweight"
        elif 18.5 <= bmi < 24.9: return "Normal"
        elif 25 <= bmi < 29.9: return "Overweight"
        else: return "Obese"
    df["bmi_category"] = df["bmi"].apply(bmi_category)

## Step 6: Save Analysis-Ready Dataset

In [8]:
df.to_csv(OUTPUT_FILE, index=False)
print(f"Analysis-ready dataset saved at {OUTPUT_FILE}")
print("Shape:", df.shape)
print("New Columns:", [col for col in df.columns if col not in ["id","activity_date"]][:10], "...")


Analysis-ready dataset saved at D:/Projects/Bellabeat/Data/Processed/bellabeat_analysis_ready.csv
Shape: (940, 30)
New Columns: ['total_steps', 'total_distance', 'tracker_distance', 'logged_activities_distance', 'very_active_distance', 'moderately_active_distance', 'light_active_distance', 'sedentary_active_distance', 'very_active_minutes', 'fairly_active_minutes'] ...


## Key Takeaways
- Added **weekday & weekend** features for behavioral analysis
- Converted **minutes â†’ hours** for sleep & sedentary activity
- Created **sleep efficiency** to measure sleep quality
- Categorized **BMI** for easier health interpretation

ðŸ‘‰ This dataset (`bellabeat_analysis_ready.csv`) is now ready for **SQL analysis, Python EDA, and Power BI**.