# 02 — Usage Analysis (Bellabeat / Fitbit)

Objective:
- Use cleaned datasets to understand how people use smart devices:
  - Activity levels and sedentary time
  - Sleep behaviour
  - When (time-of-day) activity happens
- Produce insights that can translate into product and marketing recommendations for Bellabeat.

Inputs (from Notebook 01):
- `data_cleaned/daily_activity_clean.csv`
- `data_cleaned/sleep_day_clean.csv`
- `data_cleaned/hourly_steps_clean.csv`
- `data_cleaned/hourly_intensities_clean.csv`


In [1]:
from pathlib import Path
import pandas as pd
import numpy as np


In [2]:
data_dir = Path("../data_cleaned")

daily = pd.read_csv(data_dir / "daily_activity_clean.csv", parse_dates=["activity_date"])
sleep = pd.read_csv(data_dir / "sleep_day_clean.csv", parse_dates=["sleep_day"])
h_steps = pd.read_csv(data_dir / "hourly_steps_clean.csv", parse_dates=["activity_hour"])
h_int = pd.read_csv(data_dir / "hourly_intensities_clean.csv", parse_dates=["activity_hour"])

daily.shape, sleep.shape, h_steps.shape, h_int.shape


((940, 15), (410, 5), (22099, 3), (22099, 4))

In [3]:
def overview(df, name):
    print(f"\n{name}")
    print("shape:", df.shape)
    print("users:", df["id"].nunique() if "id" in df.columns else "n/a")
    display(df.head(3))

overview(daily, "daily")
overview(sleep, "sleep")
overview(h_steps, "hourly_steps")
overview(h_int, "hourly_intensities")



daily
shape: (940, 15)
users: 33


index,id,activity_date,totalsteps,totaldistance,trackerdistance,loggedactivitiesdistance,veryactivedistance,moderatelyactivedistance,lightactivedistance,sedentaryactivedistance,veryactiveminutes,fairlyactiveminutes,lightlyactiveminutes,sedentaryminutes,calories
0,1503960366,2016-04-12,13162,8.5,8.5,0.0,1.88,0.55,6.06,0.0,25,13,328,728,1985
1,1503960366,2016-04-13,10735,6.97,6.97,0.0,1.57,0.69,4.71,0.0,21,19,217,776,1797
2,1503960366,2016-04-14,10460,6.74,6.74,0.0,2.44,0.4,3.91,0.0,30,11,181,1218,1776



sleep
shape: (410, 5)
users: 24


index,id,sleep_day,totalsleeprecords,totalminutesasleep,totaltimeinbed
0,1503960366,2016-04-12,1,327,346
1,1503960366,2016-04-13,2,384,407
2,1503960366,2016-04-15,1,412,442



hourly_steps
shape: (22099, 3)
users: 33


index,id,activity_hour,steptotal
0,1503960366,2016-04-12 00:00:00,373
1,1503960366,2016-04-12 01:00:00,160
2,1503960366,2016-04-12 02:00:00,151



hourly_intensities
shape: (22099, 4)
users: 33


index,id,activity_hour,totalintensity,averageintensity
0,1503960366,2016-04-12 00:00:00,20,0.333333
1,1503960366,2016-04-12 01:00:00,8,0.133333
2,1503960366,2016-04-12 02:00:00,7,0.116667


In [4]:
daily_metrics = ["totalsteps", "calories", "sedentaryminutes", "veryactiveminutes", "fairlyactiveminutes", "lightlyactiveminutes"]
daily[daily_metrics].describe().T


metric,count,mean,std,min,25%,50%,75%,max
totalsteps,940.0,7637.910638,5087.150742,0.0,3789.75,7405.5,10727.0,36019.0
calories,940.0,2303.609574,718.166862,0.0,1828.5,2134.0,2793.25,4900.0
sedentaryminutes,940.0,991.210638,301.267437,0.0,729.75,1057.5,1229.5,1440.0
veryactiveminutes,940.0,21.164894,32.844803,0.0,0.0,4.0,32.0,210.0
fairlyactiveminutes,940.0,13.564894,19.987404,0.0,0.0,6.0,19.0,143.0
lightlyactiveminutes,940.0,192.812766,109.1747,0.0,127.0,199.0,264.0,518.0
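The minimums of 0 steps and 0 calories, together with a sedentary maximum of 1,440 minutes (a full 24 hours), suggest that some rows are days when the device was not worn. The sketch below shows one way to flag such days and compare the mean with and without them. It runs on a small made-up frame, not the real data: `toy_daily` and `likely_non_wear` are hypothetical names, and treating a zero-step day as non-wear is an assumption rather than a rule taken from this notebook.

```python
import pandas as pd

# Hypothetical toy rows that mimic a few columns of `daily`; values are illustrative only.
toy_daily = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "totalsteps": [13162, 0, 7405, 0],
        "sedentaryminutes": [728, 1440, 1057, 1440],
        "calories": [1985, 0, 2134, 1500],
    }
)

# Assumption: a day with zero recorded steps is treated as a likely non-wear day.
toy_daily["likely_non_wear"] = toy_daily["totalsteps"].eq(0)

# Compare the mean step count with and without the flagged days.
n_flagged = int(toy_daily["likely_non_wear"].sum())
steps_all = toy_daily["totalsteps"].mean()
steps_worn = toy_daily.loc[~toy_daily["likely_non_wear"], "totalsteps"].mean()

print(n_flagged, steps_all, steps_worn)
```

If many days in the real data are flagged, it may be worth reporting the summary statistics both ways, since non-wear days pull step averages down and sedentary averages up.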


In [5]:
user_daily = (
    daily.groupby("id", as_index=False)
    .agg(
        days_tracked=("activity_date", "nunique"),
        avg_steps=("totalsteps", "mean"),
        avg_calories=("calories", "mean"),
        avg_sedentary_min=("sedentaryminutes", "mean"),
        avg_very_active_min=("veryactiveminutes", "mean"),
        avg_fairly_active_min=("fairlyactiveminutes", "mean"),
        avg_lightly_active_min=("lightlyactiveminutes", "mean"),
    )
)

user_daily.sort_values("days_tracked", ascending=False).head()


index,id,days_tracked,avg_steps,avg_calories,avg_sedentary_min,avg_very_active_min,avg_fairly_active_min,avg_lightly_active_min
0,1503960366,31,12116.741935,1816.419355,848.16129,38.709677,19.16129,219.935484
1,1624580081,31,5743.903226,1483.354839,1257.741935,8.677419,5.806452,153.483871
3,1844505072,31,2580.064516,1573.483871,1206.612903,0.129032,1.290323,115.451613
4,1927972279,31,916.129032,2172.806452,1317.419355,1.322581,0.774194,38.580645
5,2022484408,31,11370.645161,2509.967742,1112.580645,36.290323,19.354839,257.451613


In [6]:
def steps_segment(x):
    if x < 5000:
        return "Sedentary (<5k)"
    elif x < 7500:
        return "Low active (5k–7.5k)"
    elif x < 10000:
        return "Somewhat active (7.5k–10k)"
    elif x < 12500:
        return "Active (10k–12.5k)"
    else:
        return "Highly active (12.5k+)"

user_daily["steps_segment"] = user_daily["avg_steps"].apply(steps_segment)

segment_summary = (
    user_daily.groupby("steps_segment", as_index=False)
    .agg(
        users=("id", "nunique"),
        avg_steps=("avg_steps", "mean"),
        avg_sedentary_min=("avg_sedentary_min", "mean"),
        avg_very_active_min=("avg_very_active_min", "mean"),
        avg_calories=("avg_calories", "mean"),
    )
    .sort_values("users", ascending=False)
)

segment_summary


index,steps_segment,users,avg_steps,avg_sedentary_min,avg_very_active_min,avg_calories
2,Low active (5k–7.5k),9,6566.796623,1051.810981,7.289477,2131.874803
4,Somewhat active (7.5k–10k),9,8681.548746,810.675986,30.881123,2464.38857
3,Sedentary (<5k),8,2936.031894,1173.55577,3.416053,2013.785149
0,Active (10k–12.5k),5,11321.862465,912.059768,29.61995,2295.584946
1,Highly active (12.5k+),2,15401.66129,1130.435484,75.612903,3183.032258
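Because the segment labels are plain strings, the summary above sorts by user count rather than by activity level. A possible alternative, sketched here on made-up per-user averages, is `pd.cut` with ordered labels so that segments always appear from least to most active. Only the bin edges and labels come from `steps_segment`; `toy_users`, `edges`, and `labels` are hypothetical names.

```python
import pandas as pd

# Hypothetical per-user averages; only the bin edges and labels mirror steps_segment.
toy_users = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "avg_steps": [2936.0, 6566.0, 7500.0, 8681.0, 11321.0, 15401.0],
    }
)

edges = [-float("inf"), 5000, 7500, 10000, 12500, float("inf")]
labels = [
    "Sedentary (<5k)",
    "Low active (5k–7.5k)",
    "Somewhat active (7.5k–10k)",
    "Active (10k–12.5k)",
    "Highly active (12.5k+)",
]

# right=False makes each bin [lower, upper), matching the `<` comparisons in steps_segment.
toy_users["steps_segment"] = pd.cut(
    toy_users["avg_steps"], bins=edges, labels=labels, right=False
)

# pd.cut returns an ordered categorical, so sort_index follows the activity order.
counts = toy_users["steps_segment"].value_counts().sort_index()
print(counts)
```

A user averaging exactly 7,500 steps lands in the "Somewhat active" bin, the same result the `if`/`elif` chain gives.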


In [7]:
# Standardise column names to lowercase (e.g. TotalMinutesAsleep -> totalminutesasleep)
sleep.columns = sleep.columns.str.lower()

sleep.describe(include="number").T


metric,count,mean,std,min,25%,50%,75%,max
id,410.0,4994963000.0,2060863000.0,1503960000.0,3977334000.0,4702922000.0,6962181000.0,8792010000.0
totalsleeprecords,410.0,1.119512,0.3466356,1.0,1.0,1.0,1.0,3.0
totalminutesasleep,410.0,419.1732,118.6359,58.0,361.0,432.5,490.0,796.0
totaltimeinbed,410.0,458.4829,127.4551,61.0,403.75,463.0,526.0,961.0


In [8]:
# Per-user sleep summary: nights logged, mean minutes asleep, and mean time in bed.
# The columns are referenced directly so that a missing column raises a KeyError
# instead of silently filling an "avg_" column with row counts.
sleep_user = (
    sleep.groupby("id", as_index=False)
    .agg(
        sleep_days=("sleep_day", "nunique"),
        avg_minutes_asleep=("totalminutesasleep", "mean"),
        avg_time_in_bed=("totaltimeinbed", "mean"),
    )
)

sleep_user.sort_values("sleep_days", ascending=False).head()


index,id,sleep_days,avg_minutes_asleep,avg_time_in_bed
18,6962181067,31,448.0,466.129032
22,8378563200,31,445.129032,485.935484
14,5553957443,31,463.483871,505.870968
7,3977333714,28,293.642857,461.142857
11,4445114986,28,385.178571,416.821429
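The gap between time in bed and minutes asleep differs a lot between users in the output above (user 3977333714 averages about 294 minutes asleep out of about 461 in bed). One way to make that comparable is a sleep-efficiency ratio. The sketch below uses three rows copied from that output, rounded to two decimals. `sleep_efficiency` and `awake_in_bed_min` are hypothetical column names, and dividing the two averages is only an approximation of averaging the nightly ratios.

```python
import pandas as pd

# Three rows copied (rounded) from the sleep_user output shown above.
toy_sleep_user = pd.DataFrame(
    {
        "id": [6962181067, 3977333714, 4445114986],
        "avg_minutes_asleep": [448.0, 293.64, 385.18],
        "avg_time_in_bed": [466.13, 461.14, 416.82],
    }
)

# Share of time in bed that was spent asleep (ratio of the two per-user averages).
toy_sleep_user["sleep_efficiency"] = (
    toy_sleep_user["avg_minutes_asleep"] / toy_sleep_user["avg_time_in_bed"]
)

# Average minutes per night spent in bed but awake.
toy_sleep_user["awake_in_bed_min"] = (
    toy_sleep_user["avg_time_in_bed"] - toy_sleep_user["avg_minutes_asleep"]
)

# User with the lowest efficiency among these rows.
lowest = toy_sleep_user.sort_values("sleep_efficiency").iloc[0]
print(toy_sleep_user.round(3))
```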


In [9]:
# Normalise both date columns to midnight so the merge keys match on calendar date
# (sleep_day may carry a time component)
sleep_for_merge = sleep.copy()
sleep_for_merge["activity_date"] = sleep_for_merge["sleep_day"].dt.normalize()

daily_for_merge = daily.copy()
daily_for_merge["activity_date"] = daily_for_merge["activity_date"].dt.normalize()

day_level = daily_for_merge.merge(
    sleep_for_merge[["id", "activity_date"] + [c for c in ["totalminutesasleep", "totaltimeinbed"] if c in sleep_for_merge.columns]],
    on=["id", "activity_date"],
    how="inner"
)

day_level.shape, day_level.head()


((410, 17),
            id activity_date  totalsteps  totaldistance  trackerdistance  \
 0  1503960366    2016-04-12       13162           8.50             8.50   
 1  1503960366    2016-04-13       10735           6.97             6.97   
 2  1503960366    2016-04-15        9762           6.28             6.28   
 3  1503960366    2016-04-16       12669           8.16             8.16   
 4  1503960366    2016-04-17        9705           6.48             6.48   
 
    loggedactivitiesdistance  veryactivedistance  moderatelyactivedistance  \
 0                       0.0                1.88                      0.55   
 1                       0.0                1.57                      0.69   
 2                       0.0                2.14                      1.26   
 3                       0.0                2.71                      0.41   
 4                       0.0                3.19                      0.78   
 
    lightactivedistance  sedentaryactivedistance  veryactive
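An inner merge silently drops activity days without sleep data and silently duplicates rows if either side repeats an `(id, activity_date)` key. A hedged sketch of how both could be checked follows, using pandas' `validate` and `indicator` merge options on made-up frames. `toy_activity`, `toy_sleep`, and `checked` are hypothetical names, and the values are illustrative.

```python
import pandas as pd

# Hypothetical toy frames with the same key columns as the notebook's merge.
toy_activity = pd.DataFrame(
    {
        "id": [1, 1, 2],
        "activity_date": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12"]),
        "totalsteps": [13162, 10735, 5000],
    }
)
toy_sleep = pd.DataFrame(
    {
        "id": [1, 2],
        "activity_date": pd.to_datetime(["2016-04-12", "2016-04-12"]),
        "totalminutesasleep": [327, 400],
    }
)

# validate="one_to_one" raises MergeError if a key is duplicated on either side;
# indicator=True adds a `_merge` column showing where each row was found.
checked = toy_activity.merge(
    toy_sleep,
    on=["id", "activity_date"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

match_counts = checked["_merge"].value_counts()
n_both = int(match_counts["both"])
n_left_only = int(match_counts["left_only"])
print(n_both, n_left_only)
```

Rows marked `left_only` are activity days with no sleep record. Counting them shows how much of the activity data the inner join leaves out.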

In [10]:
corr_cols = ["totalsteps", "sedentaryminutes", "veryactiveminutes"]
for c in ["totalminutesasleep", "totaltimeinbed"]:
    if c in day_level.columns:
        corr_cols.append(c)

day_level[corr_cols].corr(numeric_only=True)


variable,totalsteps,sedentaryminutes,veryactiveminutes,totalminutesasleep,totaltimeinbed
totalsteps,1.0,-0.130036,0.543694,-0.190344,-0.166232
sedentaryminutes,-0.130036,1.0,-0.016484,-0.601073,-0.62028
veryactiveminutes,0.543694,-0.016484,1.0,-0.088127,-0.109623
totalminutesasleep,-0.190344,-0.601073,-0.088127,1.0,0.930422
totaltimeinbed,-0.166232,-0.62028,-0.109623,0.930422,1.0
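The matrix above pools every user-day, so it mixes differences between users with day-to-day changes within a user. A minimal sketch of separating the two is to subtract each user's own mean before correlating. The data below is made up so that the pooled and within-user results point in opposite directions; it says nothing about what the real data would show. `toy_days`, `pooled_r`, and `within_r` are hypothetical names.

```python
import pandas as pd

# Made-up data: user 2 walks more and sleeps less than user 1 overall,
# but within each user, higher-step days come with slightly more sleep.
toy_days = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2],
        "totalsteps": [1000, 2000, 3000, 9000, 10000, 11000],
        "totalminutesasleep": [500, 510, 520, 300, 310, 320],
    }
)
cols = ["totalsteps", "totalminutesasleep"]

# Pooled correlation across all user-days (what DataFrame.corr computes above).
pooled_r = toy_days["totalsteps"].corr(toy_days["totalminutesasleep"])

# Within-user correlation: centre each column on the user's own mean first.
centred = toy_days[cols] - toy_days.groupby("id")[cols].transform("mean")
within_r = centred["totalsteps"].corr(centred["totalminutesasleep"])

print(round(pooled_r, 3), round(within_r, 3))
```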


In [11]:
h_steps["hour"] = h_steps["activity_hour"].dt.hour
# Mean steps per clock hour, pooled across all users and days
hourly_steps_profile = (
    h_steps.groupby("hour", as_index=False)
    .agg(avg_steps=("steptotal", "mean"))
)

hourly_steps_profile.head()


index,hour,avg_steps
0,0,42.188437
1,1,23.102894
2,2,17.110397
3,3,6.426581
4,4,12.699571


In [12]:
# Detect the steps column by name instead of hard-coding it
possible_step_cols = [c for c in h_steps.columns if "step" in c]
possible_step_cols


['steptotal']

In [13]:
steps_col = possible_step_cols[0]
steps_col


'steptotal'

In [14]:
hourly_steps_profile = h_steps.groupby("hour", as_index=False).agg(avg_steps=(steps_col, "mean"))
hourly_steps_profile.sort_values("hour").head(24)


index,hour,avg_steps
0,0,42.188437
1,1,23.102894
2,2,17.110397
3,3,6.426581
4,4,12.699571
5,5,43.869099
6,6,178.508056
7,7,306.049409
8,8,427.544576
9,9,433.301826
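To read the profile without scanning the table, the busiest hours and the steepest hour-over-hour rise can be pulled out directly. The sketch below uses only the ten rows displayed above (hours 0 to 9), so its results describe that window and not the full day. `profile`, `top_hours`, and `biggest_jump_hour` are hypothetical names.

```python
import pandas as pd

# The ten rows shown above (hours 0-9); later hours are not included here.
profile = pd.DataFrame(
    {
        "hour": range(10),
        "avg_steps": [
            42.188437, 23.102894, 17.110397, 6.426581, 12.699571,
            43.869099, 178.508056, 306.049409, 427.544576, 433.301826,
        ],
    }
)

# Three busiest hours within the displayed window.
top_hours = profile.nlargest(3, "avg_steps")["hour"].tolist()

# Hour with the largest increase over the previous hour.
profile["change"] = profile["avg_steps"].diff()
biggest_jump_hour = int(profile.loc[profile["change"].idxmax(), "hour"])

print(top_hours, biggest_jump_hour)
```

Applied to the full 24-row `hourly_steps_profile`, the same two calls would give the peak hours for the whole day.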


## Key Findings and Actionable Insights

### 1. Most users are not highly active
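A slim majority of users (17 of 33) average fewer than **7,500 steps per day**, placing them in the sedentary (<5k) or low-active (5k–7.5k) segments. Across all tracked days, the mean is about 7,638 steps and the median about 7,406.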

**Action:**  
Position Bellabeat as a *habit-building wellness companion* for everyday users rather than a performance-focused fitness tracker. Emphasise achievable progress and consistency over aggressive fitness goals.

---

### 2. High sedentary time persists across all activity levels
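Sedentary time averages about **991 minutes (roughly 16.5 hours) per day** across all records. Every step segment averages more than 800 sedentary minutes per day, including the two highly active users (about 1,130 minutes).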

**Action:**  
Introduce or strengthen features that encourage **breaking up sedentary time**, such as gentle movement reminders, stretching prompts, or posture cues, rather than focusing solely on step accumulation.
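To put the sedentary figures on a common scale, the segment averages can be expressed as a share of the 1,440-minute day. The sketch below copies the `avg_sedentary_min` values from the `segment_summary` output earlier in this notebook; `segment_sedentary` and `sedentary_share` are hypothetical names.

```python
import pandas as pd

# Segment averages copied from the segment_summary output earlier in this notebook.
segment_sedentary = pd.Series(
    {
        "Sedentary (<5k)": 1173.55577,
        "Low active (5k–7.5k)": 1051.810981,
        "Somewhat active (7.5k–10k)": 810.675986,
        "Active (10k–12.5k)": 912.059768,
        "Highly active (12.5k+)": 1130.435484,
    },
    name="avg_sedentary_min",
)

MINUTES_PER_DAY = 24 * 60

# Fraction of a full day recorded as sedentary, per segment.
sedentary_share = (segment_sedentary / MINUTES_PER_DAY).round(3)
lowest_segment = sedentary_share.idxmin()

print(sedentary_share)
```

Even the segment with the lowest value spends more than half of the day sedentary by this measure.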

---

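### 3. Highly active users are a small segment
Only 7 of 33 users average **10,000 or more steps per day** (5 in the 10k–12.5k segment and 2 above 12.5k). The two users above 12.5k stand out for very active minutes (about 76 per day) and calories (about 3,183 per day). The 10k–12.5k segment is close to the 7.5k–10k segment on both measures, so the difference is not uniform across the group. This analysis does not measure engagement directly.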

**Action:**  
Offer **tiered experiences** within the product—core wellness features for most users, with optional advanced insights for highly active users to maintain engagement without alienating beginners.

---

### 4. Physical activity is concentrated during daytime hours
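Hourly activity data shows clear peaks during **late morning and afternoon**, with a noticeable decline in the evening. Overnight movement is minimal (fewer than 50 average steps per hour from midnight to 05:00), and activity rises steeply from 06:00, reaching about 430 steps per hour by 08:00–09:00.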

**Action:**  
Schedule **evening-focused nudges** that align with natural behaviour patterns, such as short walks, light yoga, or relaxation routines rather than step-based goals late in the day.

---

### 5. Sleep tracking adoption is lower than activity tracking
Only **24 of the 33 users** have any sleep records, and sleep is logged on 410 user-days compared with 940 user-days of activity.

**Action:**  
Reduce friction around sleep tracking by improving education on its benefits, surfacing sleep insights earlier in the app, and using low-effort reminders to increase adoption.

---

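### 6. Sleep is linked to sedentary time more than to step count
On the 410 matched user-days, total steps have a **weak negative correlation** with minutes asleep (r ≈ -0.19), and very active minutes show almost none (r ≈ -0.09). The strongest relationship is between sedentary minutes and minutes asleep (r ≈ -0.60): days with more sedentary time have less recorded sleep. Part of this may be mechanical, since minutes recorded as sleep are not also counted as sedentary. These are pooled correlations across users and days, so they describe association, not causation.

**Action:**  
Frame product messaging around **holistic wellness**, but avoid claiming that more steps lead to better sleep, which this data does not support. Emphasise reducing prolonged sedentary time alongside sleep routines, and present activity and sleep as connected parts of the same daily pattern.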

---

### 7. User behaviour varies more by consistency than intensity
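Users differ in how consistently they engage with the device, not just in how active they are on tracked days. Sleep logging shows this most clearly: the most consistent users logged sleep on all 31 days, while 9 of the 33 users logged none.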

**Action:**  
Segment users by **consistency versus intensity**, offering streak-based motivation for consistent users and re-engagement flows for those with intermittent usage patterns.
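A consistency-versus-intensity segmentation could be sketched as a two-way grid built from the per-user table. The example below is illustrative only: `toy_user_daily` is made-up data, the 80% tracking threshold is an assumption, and the 7,500-step cut-off is borrowed from the step segments defined earlier.

```python
import pandas as pd

# Hypothetical per-user rows in the shape of user_daily; values are illustrative.
toy_user_daily = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "days_tracked": [31, 28, 12, 4],
        "avg_steps": [12116.7, 5743.9, 9000.0, 3000.0],
    }
)

OBSERVATION_DAYS = 31  # longest tracking span seen in the notebook output

# Share of the observation window in which the user has data.
toy_user_daily["tracking_rate"] = toy_user_daily["days_tracked"] / OBSERVATION_DAYS

# Assumption: users tracked on at least 80% of days count as consistent.
toy_user_daily["consistency"] = (
    toy_user_daily["tracking_rate"].ge(0.8).map({True: "consistent", False: "intermittent"})
)

# Intensity split at 7,500 average steps, the boundary used by the step segments.
toy_user_daily["intensity"] = (
    toy_user_daily["avg_steps"].ge(7500).map({True: "higher steps", False: "lower steps"})
)

# Users per consistency x intensity cell.
grid = pd.crosstab(toy_user_daily["consistency"], toy_user_daily["intensity"])
print(grid)
```

Each cell of the grid maps to a different kind of message, for example streak-based motivation for consistent users and re-engagement flows for intermittent ones.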

---

### 8. User behaviour aligns with Bellabeat’s lifestyle wellness brand
The dataset reflects **everyday lifestyle activity**, not structured athletic training.

**Action:**  
Reinforce Bellabeat’s positioning around balance, sustainability, and long-term wellbeing. Avoid overly competitive or fitness-centric messaging and focus on accessible, lifestyle-oriented health improvements.


In [15]:
facts = {}

facts["users_daily_activity"] = int(daily["id"].nunique())
facts["users_sleep"] = int(sleep["id"].nunique())
facts["users_hourly_steps"] = int(h_steps["id"].nunique())

facts["avg_steps_overall"] = float(daily["totalsteps"].mean())
facts["median_steps_overall"] = float(daily["totalsteps"].median())
facts["avg_sedentary_minutes"] = float(daily["sedentaryminutes"].mean())

if "totalminutesasleep" in sleep.columns:
    facts["avg_minutes_asleep"] = float(sleep["totalminutesasleep"].mean())

facts


{'users_daily_activity': 33,
 'users_sleep': 24,
 'users_hourly_steps': 33,
 'avg_steps_overall': 7637.9106382978725,
 'median_steps_overall': 7405.5,
 'avg_sedentary_minutes': 991.2106382978724,
 'avg_minutes_asleep': 419.17317073170733}
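The raw facts are easier to quote once converted to shares and hours. The sketch below derives those figures from the dictionary printed above, with values copied from that output; `derived` and its keys are hypothetical names.

```python
# Values copied from the `facts` output above.
facts = {
    "users_daily_activity": 33,
    "users_sleep": 24,
    "users_hourly_steps": 33,
    "avg_steps_overall": 7637.9106382978725,
    "median_steps_overall": 7405.5,
    "avg_sedentary_minutes": 991.2106382978724,
    "avg_minutes_asleep": 419.17317073170733,
}

derived = {
    # Share of activity-tracking users who also have at least one sleep record.
    "sleep_user_share": facts["users_sleep"] / facts["users_daily_activity"],
    # Minutes converted to hours for easier reading.
    "avg_sedentary_hours": facts["avg_sedentary_minutes"] / 60,
    "avg_sleep_hours": facts["avg_minutes_asleep"] / 60,
}

for key, value in derived.items():
    print(f"{key}: {value:.2f}")
```

These conversions give a sleep-tracking share of about 73%, about 16.5 sedentary hours per day, and just under 7 hours of sleep per logged night.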