# Week08-Gabriel-StepCountAnalysis

**Driving Problem**: Do they avoid inactivity in at least 10 hours a day?

## Overview
This notebook analyzes step count data for three individuals, with IDs `6962181067`, `7007744171`, and `7086361926`, over the recorded period. The aim is to extract meaningful insights from their daily and minute-level step counts, including statistics such as the average, maximum, and minimum steps. We also check for missing data and record additional observations for each person. This analysis contributes to our group's broader question about step count trends and patterns.

## Assumptions and Predictions
- The dataset contains accurate step count data for the three individuals.
- We expect that daily step count varies across the individuals, with higher counts on more active days.
- There may be missing data due to tracking issues or non-wear days.
- We predict that the average steps per minute will be lower than expected for days with missing data.


# 1. Loading the Step Count Data

- Start date: 16/09/2024
- End date: 16/09/2024

### Actions:
1. Defined the paths to the three datasets: `dailySteps_merged.csv`, `hourlySteps_merged.csv`, and `minuteStepsWide_merged.csv`.
2. Loaded the daily, hourly, and minute step count datasets into pandas DataFrames using `pd.read_csv()`.
3. Printed the first five rows of each dataset with `head()` to verify that the data loaded correctly.

```python
import pandas as pd

# Define the paths to the CSV files
daily_steps_path = 'dailySteps_merged.csv'
hourly_steps_path = 'hourlySteps_merged.csv'
minute_steps_path = 'minuteStepsWide_merged.csv'

# Load the datasets using pandas
daily_steps_df = pd.read_csv(daily_steps_path)
hourly_steps_df = pd.read_csv(hourly_steps_path)
minute_steps_df = pd.read_csv(minute_steps_path)

# Display the first few rows of each dataset to check if they are loaded correctly
print("Daily Steps Data:")
print(daily_steps_df.head())

print("\nHourly Steps Data:")
print(hourly_steps_df.head())

print("\nMinute Steps Data:")
print(minute_steps_df.head())
```

```
Daily Steps Data:
           Id ActivityDay  StepTotal
0  1503960366   4/12/2016      13162
1  1503960366   4/13/2016      10735
2  1503960366   4/14/2016      10460
3  1503960366   4/15/2016       9762
4  1503960366   4/16/2016      12669

Hourly Steps Data:
           Id           ActivityHour  StepTotal
0  1503960366  4/12/2016 12:00:00 AM        373
1  1503960366   4/12/2016 1:00:00 AM        160
2  1503960366   4/12/2016 2:00:00 AM        151
3  1503960366   4/12/2016 3:00:00 AM          0
4  1503960366   4/12/2016 4:00:00 AM          0

Minute Steps Data:
           Id           ActivityHour  Steps00  Steps01  Steps02  Steps03  \
0  1503960366  4/13/2016 12:00:00 AM        4       16        0        0   
1  1503960366   4/13/2016 1:00:00 AM        0        0        0        0   
2  1503960366   4/13/2016 2:00:00 AM        0        0        0        0   
3  1503960366   4/13/2016 3:00:00 AM        0        0        0        0   
4  1503960366   4/13/2016 4:00:00 AM        0
```
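The `head()` check confirms that the files were read, but the timestamps are still plain strings such as `4/12/2016 12:00:00 AM`. A minimal sketch of a stricter check follows, assuming the format `%m/%d/%Y %I:%M:%S %p` inferred from the printed rows and assuming the wide minute file holds the 60 columns `Steps00` to `Steps59`. It uses a few hypothetical toy rows instead of the real files, and `has_all_minute_columns` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy rows mirroring the printed hourlySteps_merged.csv sample
hourly = pd.DataFrame({
    'Id': [1503960366, 1503960366, 1503960366],
    'ActivityHour': ['4/12/2016 12:00:00 AM', '4/12/2016 1:00:00 AM', '4/12/2016 2:00:00 AM'],
    'StepTotal': [373, 160, 151],
})

# Parse the 12-hour clock strings (format assumed from the sample rows)
hourly['ActivityHour'] = pd.to_datetime(hourly['ActivityHour'], format='%m/%d/%Y %I:%M:%S %p')

# Derive the calendar date and hour of day for later per-day grouping
hourly['date'] = hourly['ActivityHour'].dt.date
hourly['hour'] = hourly['ActivityHour'].dt.hour


def has_all_minute_columns(df):
    """Hypothetical helper: True if df has every column Steps00 ... Steps59 (assumed layout)."""
    expected = [f'Steps{minute:02d}' for minute in range(60)]
    return all(col in df.columns for col in expected)


# Toy wide row with the assumed 60 minute columns
toy_minute = pd.DataFrame([[0] * 60], columns=[f'Steps{m:02d}' for m in range(60)])

print(hourly)
print(has_all_minute_columns(toy_minute))  # True
print(has_all_minute_columns(hourly))      # False
```

In the notebook, the same `pd.to_datetime` call would apply to the `ActivityHour` column of `hourly_steps_df` and `minute_steps_df`, provided every row follows the assumed format.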

# 2. Selecting Users to Analyze

- Start date: 16/09/2024
- End date: 16/09/2024

### Actions:
1. Extracted the unique user IDs from the `daily_steps_df` DataFrame.
2. Selected the 25th, 26th, and 27th users in order of appearance (zero-based positions 24 to 26, via `unique_ids[24:27]`).
3. Filtered the `daily_steps_df`, `hourly_steps_df`, and `minute_steps_df` DataFrames to include only the selected users.



```python
# Extract unique user IDs from the daily steps DataFrame (in order of first appearance)
unique_ids = daily_steps_df['Id'].unique()

# Select the 25th, 26th and 27th users (zero-based positions 24, 25 and 26)
selected_ids = unique_ids[24:27]

# Filter the data for these users in the daily, hourly, and minute datasets
selected_daily_steps = daily_steps_df[daily_steps_df['Id'].isin(selected_ids)]
selected_hourly_steps = hourly_steps_df[hourly_steps_df['Id'].isin(selected_ids)]
selected_minute_steps = minute_steps_df[minute_steps_df['Id'].isin(selected_ids)]

# Display the filtered data for validation
print("Selected Daily Steps Data:")
print(selected_daily_steps.head())

print("\nSelected Hourly Steps Data:")
print(selected_hourly_steps.head())

print("\nSelected Minute Steps Data:")
print(selected_minute_steps.head())
```

```
Selected Daily Steps Data:
             Id ActivityDay  StepTotal
680  6962181067   4/12/2016      10199
681  6962181067   4/13/2016       5652
682  6962181067   4/14/2016       1551
683  6962181067   4/15/2016       5563
684  6962181067   4/16/2016      13217

Selected Hourly Steps Data:
               Id           ActivityHour  StepTotal
16007  6962181067  4/12/2016 12:00:00 AM         32
16008  6962181067   4/12/2016 1:00:00 AM          0
16009  6962181067   4/12/2016 2:00:00 AM          0
16010  6962181067   4/12/2016 3:00:00 AM          0
16011  6962181067   4/12/2016 4:00:00 AM          0

Selected Minute Steps Data:
               Id           ActivityHour  Steps00  Steps01  Steps02  Steps03  \
15688  6962181067  4/13/2016 12:00:00 AM        0        0        0        0   
15689  6962181067   4/13/2016 1:00:00 AM        0        0        0        0   
15690  6962181067   4/13/2016 2:00:00 AM        0        0        0        0   
15691  6962181067   4/13/2016 3:00:00 AM        0
```
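Because `unique()` returns IDs in order of first appearance and Python slices are zero-based and end-exclusive, `unique_ids[24:27]` picks positions 24, 25 and 26, which are the 25th to 27th users. The toy sketch below (hypothetical IDs `111` to `555`) shows the same behaviour on a smaller slice and compares it with listing the IDs explicitly, which would keep the selection stable if the row order of the CSV ever changed.

```python
import pandas as pd

# Hypothetical toy daily table; unique() reports IDs in order of first appearance
daily = pd.DataFrame({
    'Id': [111, 111, 222, 333, 333, 444, 555],
    'StepTotal': [100, 200, 300, 400, 500, 600, 700],
})
unique_ids = daily['Id'].unique()      # [111, 222, 333, 444, 555]

# Zero-based, end-exclusive slice: positions 1, 2, 3 are the 2nd, 3rd and 4th users
by_position = unique_ids[1:4]

# Listing the IDs explicitly selects the same users regardless of row order
by_id = [222, 333, 444]

filtered = daily[daily['Id'].isin(by_id)]
print(list(by_position))
print(filtered)
```

For this notebook, the explicit form would be `selected_ids = [6962181067, 7007744171, 7086361926]`, the three IDs the analysis reports.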

# 3. Daily Step Count Analysis

- Start date: 16/09/2024
- End date: 16/09/2024

### Actions:
1. Grouped the `selected_daily_steps` DataFrame by user `Id` to focus on individual step counts.
2. Calculated the number of days of data for each user.
3. Computed the average, maximum, and minimum step counts for each user.
4. Calculated the standard deviation of daily step counts to observe any variability in the data.
5. Displayed the computed statistics for validation.


```python
# Group the daily steps data by user Id and perform statistical calculations
daily_stats = selected_daily_steps.groupby('Id').agg(
    num_days=('ActivityDay', 'nunique'),
    avg_steps=('StepTotal', 'mean'),
    max_steps=('StepTotal', 'max'),
    min_steps=('StepTotal', 'min'),
    std_steps=('StepTotal', 'std')
)

# Display the computed statistics for each user
print("Daily Step Count Statistics:")
daily_stats
```

```
Daily Step Count Statistics:
```

| Id | num_days | avg_steps | max_steps | min_steps | std_steps |
|---|---|---|---|---|---|
| 6962181067 | 31 | 9794.806452 | 20031 | 1551 | 3941.753387 |
| 7007744171 | 26 | 11323.423077 | 20067 | 0 | 5306.477598 |
| 7086361926 | 31 | 9371.774194 | 14560 | 0 | 3857.093316 |
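The `agg()` call uses pandas named aggregation: each keyword becomes an output column built from an `(input column, function)` pair, and `'std'` is the sample standard deviation (`ddof=1`). A toy sketch with hypothetical data follows. The extra `zero_step_days` column is a hypothetical addition, included because two users show a minimum of 0 steps, which may point to the non-wear days anticipated in the assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for selected_daily_steps
daily = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'ActivityDay': ['4/12/2016', '4/13/2016', '4/14/2016', '4/12/2016', '4/13/2016'],
    'StepTotal': [1000, 3000, 5000, 0, 8000],
})

stats = daily.groupby('Id').agg(
    num_days=('ActivityDay', 'nunique'),
    avg_steps=('StepTotal', 'mean'),
    std_steps=('StepTotal', 'std'),  # sample standard deviation (ddof=1)
    # Hypothetical extra column: days with zero recorded steps (possible non-wear)
    zero_step_days=('StepTotal', lambda s: int((s == 0).sum())),
)
print(stats)
```

For user 1 the sample standard deviation is 2000 (squared deviations sum to 8,000,000, divided by n - 1 = 2, then square-rooted); a population standard deviation would be smaller.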


# 4. Minute-Level Step Count Analysis

- Start date: 16/09/2024
- End date: 16/09/2024

### Actions:
1. Grouped the `selected_minute_steps` DataFrame by user `Id`.
2. Calculated the total number of non-zero minute entries (where steps > 0) for each user.
3. Counted missing (NaN) values in the minute-level records for each user.
4. Computed the average, maximum, minimum, and standard deviation of steps per minute for each user.
5. Displayed the computed statistics for validation.
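The missing-data count relies on `isnull()`, which only finds empty cells in rows that exist; an hour with no row at all is not counted. A minimal sketch of a complementary check follows: it counts the recorded hours per user per calendar day, since a complete day should have 24 hourly rows. The toy table is hypothetical, keeps only one minute column, and assumes `ActivityHour` has already been parsed to datetimes.

```python
import pandas as pd

# Hypothetical toy minute table (ActivityHour already parsed to datetimes).
# User 1 has all 24 hours on 13 April 2016; user 2 has only the first 22.
day_start = pd.Timestamp('2016-04-13')
minute_wide = pd.DataFrame({
    'Id': [1] * 24 + [2] * 22,
    'ActivityHour': list(day_start + pd.to_timedelta(list(range(24)), unit='h'))
    + list(day_start + pd.to_timedelta(list(range(22)), unit='h')),
    'Steps00': 0,  # minute columns abbreviated; the real file is assumed to have Steps00 to Steps59
})

# isnull() reports nothing, because the two missing hours have no rows at all
print(minute_wide.isnull().sum().sum())  # 0

# Count recorded hours per user per calendar day; a complete day has 24
hours_per_day = (
    minute_wide
    .assign(date=minute_wide['ActivityHour'].dt.date)
    .groupby(['Id', 'date'])
    .size()
    .rename('hours_recorded')
)
incomplete_days = hours_per_day[hours_per_day < 24]
print(incomplete_days)
```

On real data, the first and last days of the tracking period may be partial for reasons unrelated to wear, so incomplete days are a prompt for inspection rather than proof of missing data.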



```python
# Per-minute step columns (Steps00 to Steps59), selected by name rather than by
# position so the result does not depend on whether pandas passes the 'Id'
# grouping column to the function in apply()
step_cols = [col for col in selected_minute_steps.columns if col.startswith('Steps')]

# Perform minute-level analysis, including standard deviation
minute_stats = selected_minute_steps.groupby('Id')[step_cols].apply(
    lambda x: pd.Series({
        'non_zero_minutes': (x > 0).sum().sum(),  # Minutes with at least one step
        'missing_data': x.isnull().sum().sum(),  # Missing (NaN) minute values
        'avg_steps_minute': x.mean().mean(),  # Average steps per minute
        'max_steps_minute': x.max().max(),  # Max steps in a minute
        'min_steps_minute': x.min().min(),  # Min steps in a minute
        'std_steps_minute': x.stack().std()  # Standard deviation of steps per minute
    })
)

# Display the minute-level statistics for each user
print("Minute-Level Step Count Statistics with Standard Deviation:")
minute_stats
```

```
Minute-Level Step Count Statistics with Standard Deviation:
```

| Id | non_zero_minutes | missing_data | avg_steps_minute | max_steps_minute | min_steps_minute | std_steps_minute |
|---|---|---|---|---|---|---|
| 6962181067 | 8709.0 | 0.0 | 6.804625 | 155.0 | 0.0 | 19.449823 |
| 7007744171 | 7990.0 | 0.0 | 8.094656 | 121.0 | 0.0 | 20.577051 |
| 7086361926 | 6155.0 | 0.0 | 6.703056 | 175.0 | 0.0 | 22.773794 |
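The minute file is in wide format, with one row per user-hour and one column per minute, and the notebook computes its statistics directly on that layout. An alternative, sketched below under the assumption that all minute columns start with `Steps`, is to reshape to long format with `DataFrame.melt`, so every statistic comes from a single `steps` column. The toy table is hypothetical and keeps only three minute columns.

```python
import pandas as pd

# Hypothetical toy wide table: two user-hours, three minute columns
wide = pd.DataFrame({
    'Id': [1, 1],
    'ActivityHour': ['4/13/2016 12:00:00 AM', '4/13/2016 1:00:00 AM'],
    'Steps00': [4, 0],
    'Steps01': [16, 0],
    'Steps02': [0, 10],
})
step_cols = [col for col in wide.columns if col.startswith('Steps')]

# Reshape to long format: one row per (user, hour, minute)
long = wide.melt(id_vars=['Id', 'ActivityHour'], value_vars=step_cols,
                 var_name='minute', value_name='steps')
long['minute'] = long['minute'].str.replace('Steps', '', regex=False).astype(int)

# Every per-user statistic now comes from the single 'steps' column
per_user = long.groupby('Id')['steps'].agg(['mean', 'max', 'min', 'std'])
per_user['non_zero_minutes'] = (long['steps'] > 0).groupby(long['Id']).sum()
print(per_user)
```

Because every minute becomes one row, `mean` here is the plain average over all user-minutes. It matches a mean of per-column means only when no minute values are missing, which holds for the three users above (`missing_data` is 0).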


## Final Statement

Today's exploration of the step count data for users 6962181067, 7007744171, and 7086361926 established a baseline understanding of their activity patterns. By analyzing both daily and minute-level step data, we identified key statistics such as the number of days with available data, average steps per day, and variability in minute-level activity.

This preliminary analysis does not yet answer the driving problem ("Do they avoid inactivity in at least 10 hours a day?"), but it sets up the next stage of the investigation. The daily step patterns and the counts of non-zero minutes will guide our next step: determining how many hours per day each user is active.
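One possible way to take that next step is sketched below. It assumes that an hour counts as active when its `StepTotal` in the hourly data is above a threshold (here 0, a hypothetical choice) and reads "avoiding inactivity in at least 10 hours a day" as at least 10 active hours per calendar day. The toy data and the constants `ACTIVE_THRESHOLD` and `MIN_ACTIVE_HOURS` are illustrative, not results from the notebook.

```python
import pandas as pd

ACTIVE_THRESHOLD = 0    # assumption: any step within an hour makes that hour "active"
MIN_ACTIVE_HOURS = 10   # the 10-hour target from the driving problem

# Hypothetical toy hourly data for one user on one day (ActivityHour already parsed)
day_start = pd.Timestamp('2016-04-12')
hourly = pd.DataFrame({
    'Id': [1] * 24,
    'ActivityHour': day_start + pd.to_timedelta(list(range(24)), unit='h'),
    'StepTotal': [0] * 8 + [250] * 12 + [0] * 4,  # steps from 08:00 to 19:59 only
})

# Count active hours per user per calendar day
active_hours = (
    hourly
    .assign(date=hourly['ActivityHour'].dt.date,
            active=hourly['StepTotal'] > ACTIVE_THRESHOLD)
    .groupby(['Id', 'date'])['active']
    .sum()
    .rename('active_hours')
    .reset_index()
)

# Flag the days that reach the target
active_hours['meets_target'] = active_hours['active_hours'] >= MIN_ACTIVE_HOURS
print(active_hours)
```

A threshold of 0 counts an hour with a single step as active; raising `ACTIVE_THRESHOLD` gives a stricter definition of activity.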

In conclusion, this exploration provides the groundwork for a more targeted analysis of user activity that moves us toward answering the driving problem.
