# Week08 - [Le Quoc Anh]: Data Exploration for Inactivity

## Overview:
In this notebook, I analyze the activity data of three selected users to determine whether they avoid inactivity for at least 10 hours a day. The analysis uses daily, hourly, and minute-level step counts to calculate inactive hours and describe each user's activity patterns.

The dataset contains step counts collected from Fitbit users, provided at daily, hourly, and minute-level granularity.

## Initial Assumptions and Predictions:

### Assumptions:
1. Each user will have varying levels of activity based on their lifestyle and usage of Fitbit.
2. Inactivity is defined as any hour with fewer than 100 steps, and users should aim to stay active for at least 10 hours per day.
3. Minute-level data will reveal bursts of activity, while hourly data will help identify extended periods of inactivity.

### Predictions:
- At least one of the users will likely exceed 10 hours of inactivity per day.
- Users will exhibit varying step patterns, with some showing consistent activity and others displaying sporadic bursts.

## Daily Step Count Analysis

This section calculates the daily step statistics for each user. For each person, I will calculate:
1. The total number of days for which we have data.
2. The average step count per day.
3. The maximum and minimum step count.
4. An additional observation on the consistency of their daily activity levels.

## Step 1: Data Loading and Filtering

We begin by loading the daily, hourly, and minute-level step data, then select three users and extract their records from each table.

In [42]:
import pandas as pd
daily_steps = pd.read_csv('/content/dailySteps_merged.csv')
hourly_steps = pd.read_csv('/content/hourlySteps_merged.csv')
minute_steps = pd.read_csv('/content/minuteStepsWide_merged.csv')

print(daily_steps.head())
print(hourly_steps.head())
print(minute_steps.head())


           Id ActivityDay  StepTotal
0  1503960366   4/12/2016      13162
1  1503960366   4/13/2016      10735
2  1503960366   4/14/2016      10460
3  1503960366   4/15/2016       9762
4  1503960366   4/16/2016      12669
           Id           ActivityHour  StepTotal
0  1503960366  4/12/2016 12:00:00 AM        373
1  1503960366   4/12/2016 1:00:00 AM        160
2  1503960366   4/12/2016 2:00:00 AM        151
3  1503960366   4/12/2016 3:00:00 AM          0
4  1503960366   4/12/2016 4:00:00 AM          0
           Id           ActivityHour  Steps00  Steps01  Steps02  Steps03  \
0  1503960366  4/13/2016 12:00:00 AM        4       16        0        0   
1  1503960366   4/13/2016 1:00:00 AM        0        0        0        0   
2  1503960366   4/13/2016 2:00:00 AM        0        0        0        0   
3  1503960366   4/13/2016 3:00:00 AM        0        0        0        0   
4  1503960366   4/13/2016 4:00:00 AM        0        0        0        0   

   Steps04  Steps05  Steps06  Ste

In [43]:

import random

# Fix the random seed so the same three users are selected on every run
random.seed(42)

unique_ids = daily_steps['Id'].unique().tolist()

selected_ids = random.sample(unique_ids, 3)
print(selected_ids)

# Define the inactivity threshold (e.g., fewer than 100 steps per hour is considered inactive)
inactivity_threshold = 100

[2320127002, 1624580081, 4558609924]


In [44]:
user_analysis = {}

# Loop through each of the selected users for detailed analysis
for user_id in selected_ids:

    # Filter the daily, hourly, and minute-level data for the specific user
    user_daily_data = daily_steps[daily_steps['Id'] == user_id]
    user_hourly_data = hourly_steps[hourly_steps['Id'] == user_id]
    user_minute_data = minute_steps[minute_steps['Id'] == user_id]

    # These per-user frames are rebuilt inside each analysis loop below
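
Before comparing users, it can help to confirm that each selected user has hourly records for every day that appears in the daily table. The following is a minimal sketch on made-up data; the frame names `daily_cov_demo` and `hourly_cov_demo` and the `coverage` table are hypothetical and not part of the notebook.

```python
import pandas as pd

# Made-up rows for two hypothetical users (not taken from the dataset)
daily_cov_demo = pd.DataFrame({"Id": [1, 1, 2], "StepTotal": [5000, 7000, 3000]})
hourly_cov_demo = pd.DataFrame({"Id": [1] * 48 + [2] * 20, "StepTotal": [0] * 68})

selected_ids_demo = [1, 2]

# Count how many daily and hourly rows each user has
coverage = pd.DataFrame({
    "days_recorded": daily_cov_demo.groupby("Id").size(),
    "hours_recorded": hourly_cov_demo.groupby("Id").size(),
}).reindex(selected_ids_demo)

# A fully covered user has 24 hourly rows for every recorded day
coverage["hours_expected"] = coverage["days_recorded"] * 24
coverage["complete"] = coverage["hours_recorded"] >= coverage["hours_expected"]

print(coverage)
```

A user flagged as incomplete would have an understated inactive-hour total, so per-day comparisons for that user should be read with care.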

## Step 2: Daily Step Count Analysis

In this step, we calculate the total number of days of data for the user, along with the average, maximum, and minimum number of steps per day.

In [51]:
# Loop through each user for daily step count analysis
for user_id in selected_ids:
    user_daily_data = daily_steps[daily_steps['Id'] == user_id]

    # Calculate daily statistics
    total_days = len(user_daily_data)  # Total number of days of data
    daily_avg = user_daily_data['StepTotal'].mean()  # Average steps per day
    daily_max = user_daily_data['StepTotal'].max()  # Maximum steps on a day
    daily_min = user_daily_data['StepTotal'].min()  # Minimum steps on a day

    # Store results in the dictionary
    if user_id not in user_analysis:
        user_analysis[user_id] = {}

    user_analysis[user_id]['total_days'] = total_days
    user_analysis[user_id]['daily_avg'] = daily_avg
    user_analysis[user_id]['daily_max'] = daily_max
    user_analysis[user_id]['daily_min'] = daily_min
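
The daily statistics above cover the count, mean, maximum, and minimum, but not the consistency observation listed in the plan. One possible way to quantify consistency is the coefficient of variation (standard deviation divided by mean). This is a sketch on made-up data, and the choice of metric is an assumption rather than something the notebook prescribes.

```python
import pandas as pd

# Made-up daily totals for two hypothetical users (not taken from the dataset)
daily_steps_demo = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2, 2, 2],
    "StepTotal": [9000, 10000, 11000, 10000, 1000, 20000, 3000, 16000],
})

# Mean and sample standard deviation of daily steps for each user
consistency = daily_steps_demo.groupby("Id")["StepTotal"].agg(["mean", "std"])

# Coefficient of variation: spread relative to the mean (lower = more consistent)
consistency["cv"] = consistency["std"] / consistency["mean"]

print(consistency)
```

Both hypothetical users average the same number of steps, yet the second has a much larger coefficient of variation, which is the kind of difference a mean alone would hide.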

## Step 3: Minute-Level Step Count Analysis

Now we analyze minute-level data, focusing on the number of active (non-zero) minutes, missing data, and the average, maximum, and minimum steps taken in a minute. We also make an additional observation about burst activity based on the maximum steps in a minute.

In [52]:
# Loop through each user for minute-level step count analysis
for user_id in selected_ids:
    user_minute_data = minute_steps[minute_steps['Id'] == user_id]
    user_minute_steps = user_minute_data.iloc[:, 2:]  # Step columns are from index 2 onwards

    total_non_zero_minutes = (user_minute_steps > 0).sum().sum()  # Total non-zero minutes
    missing_data = user_minute_steps.isnull().sum().sum()  # Missing data
    avg_steps_per_minute = user_minute_steps.mean().mean()  # Mean of per-column means (equals the overall mean when no values are missing)
    max_steps_per_minute = user_minute_steps.max().max()  # Maximum steps in a minute
    min_steps_per_minute = user_minute_steps.min().min()  # Minimum steps in a minute

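    # Additional observation: flag burst activity when the busiest minute exceeds 150 steps
    # (this reflects peak intensity only, not how often bursts occur)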
    active_periods_observation = "User shows frequent bursts of activity." \
        if max_steps_per_minute > 150 else "User's activity is more spread out throughout the day."

    # Store minute-level data in the user dictionary
    user_analysis[user_id]['non_zero_minutes'] = total_non_zero_minutes
    user_analysis[user_id]['missing_data'] = missing_data
    user_analysis[user_id]['avg_steps_per_minute'] = avg_steps_per_minute
    user_analysis[user_id]['max_steps_per_minute'] = max_steps_per_minute
    user_analysis[user_id]['min_steps_per_minute'] = min_steps_per_minute
    user_analysis[user_id]['active_periods_observation'] = active_periods_observation
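
The burst observation above relies on a single maximum value. To see how often bursts occur, one option is to count the minutes above the same cutoff and compare them with all active minutes. This is a minimal sketch on made-up wide-format rows; `minute_demo`, `burst_threshold`, and `burst_share` are hypothetical names, and only four minute columns are shown for brevity.

```python
import pandas as pd

# Made-up wide-format rows: one row per hour, one column per minute
minute_demo = pd.DataFrame({
    "Id": [1, 1, 1],
    "ActivityHour": [
        "4/13/2016 8:00:00 AM",
        "4/13/2016 9:00:00 AM",
        "4/13/2016 10:00:00 AM",
    ],
    "Steps00": [0, 160, 0],
    "Steps01": [12, 155, 0],
    "Steps02": [0, 30, 0],
    "Steps03": [170, 0, 5],
})

burst_threshold = 150  # same cutoff as the cell above; the value is a judgment call

# Select only the minute columns by name instead of by position
step_cols = minute_demo.filter(regex=r"^Steps\d+$")

burst_minutes = int((step_cols > burst_threshold).sum().sum())  # minutes above the cutoff
active_minutes = int((step_cols > 0).sum().sum())               # minutes with any steps

# Share of active minutes that count as bursts (0.0 if there were no active minutes)
burst_share = burst_minutes / active_minutes if active_minutes else 0.0

print(burst_minutes, active_minutes, burst_share)
```

Selecting columns by name also guards against a change in column order, which positional slicing with `iloc` would not catch.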

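## Step 4: Inactive Hours Calculation

Next, we count the hours in which each user recorded fewer steps than the inactivity threshold of 100.

In [56]: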
# Loop through each user for inactive hours calculation
for user_id in selected_ids:
    user_hourly_data = hourly_steps[hourly_steps['Id'] == user_id]

    # Count inactive hours over the whole recording period (hours where steps < inactivity threshold)
    inactive_hours = (user_hourly_data['StepTotal'] < inactivity_threshold).sum()

    # Store the inactive hours in the user dictionary
    user_analysis[user_id]['inactive_hours'] = inactive_hours
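
The count above is a total over the whole recording period, while the driving question is framed per day. A per-day breakdown could look like the sketch below, which parses `ActivityHour`, groups by calendar date, and counts the days that exceed 10 inactive hours. The data is made up, and the timestamp format string is an assumption based on the sample rows printed earlier.

```python
import pandas as pd

# Made-up hourly rows for one hypothetical user (not taken from the dataset)
hourly_demo = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 1],
    "ActivityHour": [
        "4/12/2016 8:00:00 AM", "4/12/2016 9:00:00 AM", "4/12/2016 10:00:00 AM",
        "4/13/2016 8:00:00 AM", "4/13/2016 9:00:00 AM", "4/13/2016 10:00:00 AM",
    ],
    "StepTotal": [50, 400, 0, 900, 120, 99],
})

inactivity_threshold = 100  # fewer than 100 steps in an hour counts as inactive
daily_limit = 10            # inactive hours per day used as the cutoff

# Parse the timestamps (assumed format: month/day/year, 12-hour clock with AM/PM)
timestamps = pd.to_datetime(hourly_demo["ActivityHour"], format="%m/%d/%Y %I:%M:%S %p")
hourly_demo["date"] = timestamps.dt.date
hourly_demo["inactive"] = hourly_demo["StepTotal"] < inactivity_threshold

# Inactive hours per user per calendar day
inactive_per_day = hourly_demo.groupby(["Id", "date"])["inactive"].sum()

# Number of days on which each user exceeded the daily limit
days_over_limit = (inactive_per_day > daily_limit).groupby(level="Id").sum()

print(inactive_per_day)
print(days_over_limit)
```

With real data, `inactive_per_day` would show whether a user exceeds the limit on most days or only on a few, which a single total cannot reveal.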

In [58]:
# Convert the results into a DataFrame for easier viewing and interpretation
user_analysis_df = pd.DataFrame(user_analysis).T

user_analysis_df

| Id | total_days | daily_avg | daily_max | daily_min | non_zero_minutes | missing_data | avg_steps_per_minute | max_steps_per_minute | min_steps_per_minute | active_periods_observation | inactive_hours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2320127002 | 31 | 4716.870968 | 10725 | 772 | 6079 | 0 | 3.161614 | 123 | 0 | User's activity is more spread out throughout ... | 394 |
| 1624580081 | 31 | 5743.903226 | 36019 | 1510 | 3679 | 0 | 3.975217 | 184 | 0 | User shows frequent bursts of activity. | 461 |
| 4558609924 | 31 | 7685.129032 | 13743 | 3428 | 9193 | 0 | 5.503231 | 207 | 0 | User shows frequent bursts of activity. | 336 |



## Final Statement

### What I Learned:
Through this analysis, I found that users exhibit different patterns of activity. While some users have frequent bursts of activity, they still fail to avoid long periods of inactivity, which is concerning for overall health.

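The minute-level data provided useful insights into the distribution of steps, revealing both sedentary periods and high-activity bursts. However, despite these bursts, none of the three users stayed under 10 hours of inactivity per day on average, which was the key driving question. Their totals of 394, 461, and 336 inactive hours work out to roughly 12.7, 14.9, and 10.8 inactive hours per day, assuming the hourly data spans the same 31 days as the daily data.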

This analysis highlights the importance of not just focusing on total daily steps but also on how evenly activity is spread throughout the day to reduce sedentary behavior.