# Checkpoint 1 – Jeff Sui
# Personal Data Exploration (Individual Checkpoint 1)


**Group ID:** Group 3  
**Driving Problem:** *Do they achieve 15 minutes of intense activity across different times of day (morning, afternoon, evening)?*  

This notebook performs exploratory data analysis (EDA) for the three assigned participants:
- **1503960366**
- **1624580081**
- **1644430081**


---

## Overview of the notebook
This notebook:
1. Loads daily and minute-level step data from the provided CSV files.
2. Performs data-quality checks **aligned with manual scrutiny** of the files (completeness, gaps, non-wear zeros).
3. Presents participant-level summaries using functions (to avoid repetitive code).
4. Connects findings back to the **driving problem** about achieving 15 consecutive minutes of intense activity in morning/afternoon/evening.



## Assumptions & Predictions

### Assumptions
- **Intensity proxy:** “Intense activity” is defined as **15 or more consecutive minutes with non-zero steps** within a single time bin.
- **Time bins (inclusive):**
  - **Morning:** 06:00:00–11:59:59  
  - **Afternoon:** 12:00:00–17:59:59  
  - **Evening:** 18:00:00–23:59:59
- **Minute data format:** `minuteStepsWide_merged.csv` has an `Id` column and many timestamp-named columns (wide). We will **melt** to long format (`Id`, `dt`, `Steps`).
- **Timestamps/timezone:** Timestamps are treated as local clock time for the analysis window. If the raw data is in UTC, convert it to local time (Sydney) **before** binning.
- **Missing values:** Treated as zero for the “consecutive non-zero steps” logic (i.e., missing breaks a streak).
- **Participants:** We focus only on the three assigned IDs: `1503960366, 1624580081, 1644430081`.
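
The assumptions above can be sketched in code as follows. This is a minimal illustration, not the notebook's final implementation: the helper names `assign_bin` and `longest_streak`, the toy frame, and the column names `Date`, `Bin`, `LongestStreak` and `Achieved` are all hypothetical, and the wide layout (one timestamp-named column per minute) follows the assumption stated in the list rather than a verified file schema.

```python
import pandas as pd

def assign_bin(hour):
    """Map an hour of day (0-23) to the time bins defined in the assumptions."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 24:
        return "Evening"
    return None  # 00:00-05:59 falls outside all three bins

def longest_streak(steps):
    """Longest run of consecutive non-zero minutes.

    Assumes one row per minute, already sorted by time.
    """
    active = steps.fillna(0).gt(0)        # missing counts as zero, so it breaks a streak
    run_id = (~active).cumsum()           # increments at every inactive minute
    runs = active.groupby(run_id).sum()   # number of active minutes in each run
    return int(runs.max()) if len(runs) else 0

# Toy wide frame with synthetic values: 16 active minutes, then 4 inactive ones
minutes = pd.date_range("2016-04-12 07:00", periods=20, freq="min")
wide = pd.DataFrame(
    [[1] + [10] * 16 + [0] * 4],
    columns=["Id"] + [str(m) for m in minutes],
)

# Melt to long format (Id, dt, Steps), then label each minute with a date and bin
long_df = wide.melt(id_vars="Id", var_name="dt", value_name="Steps")
long_df["dt"] = pd.to_datetime(long_df["dt"])
long_df = long_df.sort_values(["Id", "dt"])
long_df["Date"] = long_df["dt"].dt.date
long_df["Bin"] = long_df["dt"].dt.hour.map(assign_bin)

# Longest streak per participant, day and bin; flag the 15-minute threshold
streaks = (
    long_df.dropna(subset=["Bin"])
    .groupby(["Id", "Date", "Bin"])["Steps"]
    .apply(longest_streak)
    .reset_index(name="LongestStreak")
)
streaks["Achieved"] = streaks["LongestStreak"] >= 15
```

Because streaks are computed within each bin, a run that straddles a bin boundary (for example 11:50 to 12:10) is split in two and may not count in either bin; that follows from the "within a time bin" wording above.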

### Predictions
- Participants will **most often** achieve 15-minute streaks in the **afternoon**, with **morning** second, and **evening** least frequent.  
- Streak frequency will vary by person and by day; weekends may show **slightly higher** afternoon streaks.
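
One way these predictions could be checked is to tabulate, for each time bin, the share of days on which a streak was achieved, split by weekday and weekend. The sketch below assumes a per-day, per-bin table with a boolean `Achieved` column already exists; the frame `outcomes` and its values are synthetic placeholders, not results from the dataset.

```python
import pandas as pd

# Synthetic per-day, per-bin outcomes (placeholder values only)
outcomes = pd.DataFrame({
    "Date": pd.to_datetime(["2016-04-15"] * 3 + ["2016-04-16"] * 3),
    "Bin": ["Morning", "Afternoon", "Evening"] * 2,
    "Achieved": [False, True, False, True, True, False],
})

# dayofweek: Monday=0 ... Saturday=5, Sunday=6
outcomes["Weekend"] = outcomes["Date"].dt.dayofweek >= 5

# Share of day-bin cells with a streak, overall and by weekend flag
rate_by_bin = outcomes.groupby("Bin")["Achieved"].mean()
rate_by_bin_weekend = (
    outcomes.groupby(["Weekend", "Bin"])["Achieved"].mean().unstack("Bin")
)
```

Comparing the rows of `rate_by_bin_weekend` would show whether the afternoon rate is higher on weekends, which is what the second prediction expects.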


### Data Loading (Daily + Minute) and Filtering to Assigned IDs
We load the two provided CSVs and filter rows to the three assigned participants to keep the scope aligned with the brief.


In [231]:
import pandas as pd

# Load data
daily_steps = pd.read_csv("Data/dailySteps_merged.csv")
minute_steps = pd.read_csv("Data/minuteStepsWide_merged.csv")

# Initialise the assigned participant IDs
participants = [1503960366, 1624580081, 1644430081]
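
The cell above loads the full files and defines the ID list; the filtering itself happens later inside each summary function. If an explicit up-front filter is preferred, it might look like the sketch below. The toy frame stands in for `daily_steps`, the rows are synthetic, `9999999999` is a made-up unassigned Id, and the names `daily_assigned` and `missing_ids` are hypothetical.

```python
import pandas as pd

participants = [1503960366, 1624580081, 1644430081]

# Toy stand-in for daily_steps (synthetic rows)
daily_steps = pd.DataFrame({
    "Id": [1503960366, 1624580081, 9999999999],
    "ActivityDay": ["4/12/2016", "4/12/2016", "4/12/2016"],
    "StepTotal": [100, 200, 300],
})

# Keep only rows belonging to the assigned participants
daily_assigned = daily_steps[daily_steps["Id"].isin(participants)].copy()

# Report any assigned Id that is absent from the file
missing_ids = sorted(set(participants) - set(daily_assigned["Id"]))
```

Checking `missing_ids` early guards against silently summarising an empty frame when an Id is mistyped or absent from the file.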


### Define function to calculate daily metrics:
average step count per day\
maximum step count\
minimum step count

In [232]:

# Define function to calculate the required daily metrics
def summarize_daily(person_id):
    person = daily_steps[daily_steps["Id"] == person_id]
    num_days = person["ActivityDay"].nunique()
    avg_steps = round(person["StepTotal"].mean())
    max_steps = person["StepTotal"].max()
    min_steps = person["StepTotal"].min()
    return {
        "Days": num_days,
        "Avg Steps/Day": avg_steps,  # already rounded to the nearest whole step above
        "Max Steps/Day": max_steps,
        "Min Steps/Day": min_steps,
    }


### Define function to calculate minute-level metrics:
number of non-zero minutes\
missing data\
average steps per minute\
maximum and minimum steps

In [233]:

# Define function to calculate the required minute-level metrics
def summarize_minutes(person_id):
    person = minute_steps[minute_steps["Id"] == person_id]
    steps = person.iloc[:,2:].values.flatten()  # all minute columns
    non_zero = (steps > 0).sum()
    missing = pd.isna(steps).sum()
    avg_steps = steps.mean()
    max_steps = steps.max()
    min_steps = steps.min()
    return {
        "Non-zero Minutes": int(non_zero),
        "Missing Data": int(missing),
        "Avg Steps/Min": round(avg_steps,2),
        "Max Steps/Min": max_steps,
        "Min Steps/Min": min_steps,
    }


### Display daily summary for each participant:

In [234]:

# Run summaries for each participant
for pid in participants:
    print(f"\nParticipant {pid}")
    print("Daily Summary:\n", summarize_daily(pid))



Participant 1503960366
Daily Summary:
 {'Days': 31, 'Avg Steps/Day': 12117, 'Max Steps/Day': 18134, 'Min Steps/Day': 0}

Participant 1624580081
Daily Summary:
 {'Days': 31, 'Avg Steps/Day': 5744, 'Max Steps/Day': 36019, 'Min Steps/Day': 1510}

Participant 1644430081
Daily Summary:
 {'Days': 30, 'Avg Steps/Day': 7283, 'Max Steps/Day': 18213, 'Min Steps/Day': 1223}


## Daily Step Summary

| Participant ID | Days | Avg Steps/Day | Max Steps/Day | Min Steps/Day |
|----------------|------|---------------|---------------|---------------|
| **1503960366** | 31   | 12,117        | 18,134        | 0             |
| **1624580081** | 31   | 5,744         | 36,019        | 1,510         |
| **1644430081** | 30   | 7,283         | 18,213        | 1,223         |


## Observations for daily step count:
**Participant 1503960366**

With a high daily average (12,117 steps) and a minimum of 0, this participant appears consistently active on most days but has at least one zero-step day, which may reflect non-wear rather than true inactivity.

They are the most likely among the three to reach 15-minute intense streaks, especially on their higher-step days.

**Participant 1624580081**

Shows the widest variability (daily steps range from 1,510 to 36,019).

The occasional very high activity days suggest that this participant might reach 15-minute streaks on those days, but their low overall average (5,744 steps) indicates they may struggle with consistency in hitting the target across all time bins.

**Participant 1644430081**

Displays moderate consistency with an average of 7,283 steps/day and a narrower range than Participant 1624580081.

Likely to achieve 15-minute streaks more regularly than 1624580081 but less consistently than 1503960366, perhaps with steady but not extreme activity patterns.
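
The comparisons above rest on averages and ranges. A sketch of how zero-step days and day-to-day spread could be quantified per participant is shown below. It uses synthetic totals for two hypothetical participants "A" and "B"; the helper `spread_summary` and its output labels are illustrative names only, and excluding zero-step days from the spread is an assumption that treats them as possible non-wear.

```python
import pandas as pd

# Synthetic daily totals for two hypothetical participants
toy = pd.DataFrame({
    "Id": ["A"] * 4 + ["B"] * 4,
    "StepTotal": [12000, 0, 13000, 11000, 2000, 30000, 3000, 5000],
})

def spread_summary(day_totals):
    """Count zero-step days and measure spread on the remaining days."""
    worn = day_totals[day_totals > 0]  # drop zero-step days (possible non-wear)
    return pd.Series({
        "ZeroDays": int((day_totals == 0).sum()),
        "MeanWornDays": worn.mean(),
        "CV": worn.std() / worn.mean(),  # coefficient of variation
    })

# One row per participant
rows = {pid: spread_summary(g) for pid, g in toy.groupby("Id")["StepTotal"]}
summary = pd.DataFrame(rows).T
```

A higher `CV` indicates less consistent daily totals, which gives a scale-free way to compare participants whose averages differ.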

### Display minute summary for each participant:

In [235]:

# Run summaries for each participant
for pid in participants:
    print(f"\nParticipant {pid}")
    print("Minute Summary:\n", summarize_minutes(pid))



Participant 1503960366
Minute Summary:
 {'Non-zero Minutes': 8311, 'Missing Data': 0, 'Avg Steps/Min': 8.56, 'Max Steps/Min': 165, 'Min Steps/Min': 0}

Participant 1624580081
Minute Summary:
 {'Non-zero Minutes': 3679, 'Missing Data': 0, 'Avg Steps/Min': 3.98, 'Max Steps/Min': 184, 'Min Steps/Min': 0}

Participant 1644430081
Minute Summary:
 {'Non-zero Minutes': 5978, 'Missing Data': 0, 'Avg Steps/Min': 5.05, 'Max Steps/Min': 134, 'Min Steps/Min': 0}


## Minute Step Summary

| Participant ID | Non-zero Minutes | Missing Data | Avg Steps/Min | Max Steps/Min | Min Steps/Min |
|----------------|------------------|--------------|---------------|---------------|---------------|
| **1503960366** | 8,311            | 0            | 8.56          | 165           | 0             |
| **1624580081** | 3,679            | 0            | 3.98          | 184           | 0             |
| **1644430081** | 5,978            | 0            | 5.05          | 134           | 0             |


## Observations for minute-level step count:
**Participant 1503960366**

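With the highest average (8.56 steps per minute), the most non-zero minutes (8,311), and a peak of 165 steps per minute, this participant is the most likely of the three to achieve 15-minute streaks regularly. These totals are not split by time of day, so the predicted afternoon peak is not yet confirmed.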

**Participant 1624580081**

Despite occasional very high bursts (up to 184 steps per minute), their low average suggests that 15-minute streaks are achieved only sporadically and not consistently across all time bins.

**Participant 1644430081**

With moderate average steps per minute and sustained non-zero activity, this participant can achieve 15-minute streaks at times, but less reliably than 1503960366.

# Final Statement

From this data exploration, I learnt that the three participants show **different patterns of activity intensity and consistency**.  
- **1503960366** has the highest daily average and the most active minutes, making them the most likely to consistently achieve 15-minute streaks of intense activity; the afternoon remains the predicted peak, although the summaries here are not broken down by time bin.  
- **1624580081** shows very high variability with occasional bursts but low regularity, meaning 15-minute streaks are only achieved sporadically.  
- **1644430081** has steady but moderate activity, suggesting they sometimes reach the 15-minute threshold but not as reliably as 1503960366.  

In relation to the **Driving Problem** (*Do they achieve 15 minutes of intense activity across different times of day (morning, afternoon, evening)?*), the summary statistics suggest that while all three participants are capable of meeting the target, only 1503960366 is likely to do so consistently. The others may need to adjust their routines, such as scheduling activity in morning or evening bins, to reliably meet the 15-minute goal.
