# Assignment 2 - Individual Checkpoint 1
**Name:** Haihui Duan  
**Group ID:** CC07-Group-3  
**Driving Problem:** Do they achieve 15 minutes of intense activity across different times of day (morning, afternoon, evening)?  

### Overview
This notebook explores the step count data for the three assigned participants (IDs: 2873212765, 3372868164, 3977333714).  
The goal is to conduct an initial exploratory data analysis (EDA), both daily and minute-level, and to reflect on what we learn in relation to the driving problem.


### Initial Assumptions
I expected that the three participants would have similar activity levels, around 8,000 steps per day, 
with some variation at the minute level. I also assumed that one of them might show evidence of 
sustained high-intensity activity.


In [17]:
import pandas as pd


daily = pd.read_csv("Data/dailySteps_merged.csv")
hourly = pd.read_csv("Data/hourlySteps_merged.csv")
minute = pd.read_csv("Data/minuteStepsWide_merged.csv")


ids = [2873212765, 3372868164, 3977333714]


daily_sub = daily[daily["Id"].isin(ids)]
minute_sub = minute[minute["Id"].isin(ids)]

daily_sub.head()


Unnamed: 0,Id,ActivityDay,StepTotal
265,2873212765,4/12/2016,8796
266,2873212765,4/13/2016,7618
267,2873212765,4/14/2016,7910
268,2873212765,4/15/2016,8482
269,2873212765,4/16/2016,9685


### Daily Step Analysis
Here I calculate the number of days of data, the average steps per day, and the maximum and minimum step counts for each of the three participants.


In [18]:
def daily_summary(df, pid):
    person = df[df["Id"] == pid]
    return {
        "Id": pid,
        "Days": person["ActivityDay"].nunique(),
        "Avg Steps": round(person["StepTotal"].mean(), 2),
        "Max Steps": person["StepTotal"].max(),
        "Min Steps": person["StepTotal"].min()
    }

daily_results = [daily_summary(daily_sub, pid) for pid in ids]
pd.DataFrame(daily_results)


Unnamed: 0,Id,Days,Avg Steps,Max Steps,Min Steps
0,2873212765,31,7555.77,9685,2524
1,3372868164,20,6861.65,9715,3077
2,3977333714,30,10984.57,16520,746


### Minute Step Analysis
Here I analyse the step count at the minute level. For each participant, I calculate the number of non-zero minutes, missing values, the average steps per minute, and the maximum and minimum step counts.


In [19]:
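def minute_summary(df, pid):
    # Keep only the rows for this participant.
    person = df[df["Id"] == pid]

    # Flatten every remaining column into one long Series of per-minute values.
    steps = pd.Series(person.drop(columns=["Id"]).values.flatten())

    # Coerce to numeric. Anything non-numeric (including cells from any
    # non-step column still in the frame) becomes NaN and is counted as "Missing".
    steps = pd.to_numeric(steps, errors="coerce")

    return {
        "Id": pid,
        "Non-zero minutes": int((steps > 0).sum()),
        "Missing": int(pd.isna(steps).sum()),
        "Avg per minute": round(steps.mean(), 2),  # mean() skips NaN values
        "Max per minute": int(steps.max()),
        "Min per minute": int(steps.min())
    }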

minute_results = [minute_summary(minute_sub, pid) for pid in ids]
pd.DataFrame(minute_results)




Unnamed: 0,Id,Non-zero minutes,Missing,Avg per minute,Max per minute,Min per minute
0,2873212765,7052,725,5.21,164,0
1,3372868164,4517,448,4.92,164,0
2,3977333714,8033,725,7.97,190,0
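
One thing worth checking about the `Missing` column: `minute_summary` drops only `Id` before coercing every remaining cell to numeric, so cells from any non-numeric column still in the frame would also be counted as missing. The sketch below illustrates the effect on a small made-up frame, not the real data. It assumes the wide file has a timestamp column named `ActivityHour` and per-minute columns named `Steps00` through `Steps59`; neither name appears in the output above, so both are assumptions, and `count_missing` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the wide minute file; the column names are assumptions.
toy = pd.DataFrame({
    "Id": [1, 1],
    "ActivityHour": ["4/13/2016 12:00:00 AM", "4/13/2016 1:00:00 AM"],
    "Steps00": [0, 12],
    "Steps01": [5, np.nan],  # one genuinely missing minute
})

def count_missing(df, step_cols_only):
    """Count NaN cells after numeric coercion, optionally using step columns only."""
    if step_cols_only:
        values = df.filter(regex=r"^Steps\d+$")
    else:
        values = df.drop(columns=["Id"])
    flat = pd.Series(values.to_numpy().ravel())
    flat = pd.to_numeric(flat, errors="coerce")
    return int(flat.isna().sum())

missing_all = count_missing(toy, step_cols_only=False)   # timestamps become NaN too
missing_steps = count_missing(toy, step_cols_only=True)  # only the real gap
print(missing_all, missing_steps)
```

If the real file has this layout, restricting the flatten to the step columns would make `Missing` reflect only genuinely absent minute values.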


### Findings

**Daily Step Analysis**
- Participant 2873212765: 31 days of data, average 7556 steps/day. Max 9685, min 2524 (a range of 7161 steps). This suggests a fairly steady pattern of moderate activity.  
- Participant 3372868164: 20 days of data, average 6862 steps/day. Max 9715, min 3077 (a range of 6638 steps). Their activity is moderate, with the narrowest range of the three, although fewer days were recorded.  
- Participant 3977333714: 30 days of data, average 10,985 steps/day. Max 16,520, min 746 (a range of 15,774 steps). This participant shows the highest daily activity, but also the largest fluctuations.  
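
The variation between participants can be checked roughly from the daily summary table alone by computing each participant's range (maximum minus minimum). This is only a coarse proxy: a standard deviation over the daily `StepTotal` values would be a better measure, but it is not part of the output shown, so this sketch sticks to the reported figures.

```python
import pandas as pd

# Values copied from the daily summary table above.
summary = pd.DataFrame({
    "Id": [2873212765, 3372868164, 3977333714],
    "Max Steps": [9685, 9715, 16520],
    "Min Steps": [2524, 3077, 746],
})

# Range = max - min, a rough indicator of day-to-day fluctuation.
summary["Range"] = summary["Max Steps"] - summary["Min Steps"]
print(summary[["Id", "Range"]])
```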

**Minute Step Analysis**
- Participant 2873212765: 7052 non-zero minutes, average 5.2 steps/minute, max 164. Missing 725. Suggests moderate activity with some missing records.  
- Participant 3372868164: 4517 non-zero minutes, average 4.9 steps/minute, max 164. Missing 448. Generally lower activity intensity compared to the others.  
- Participant 3977333714: 8033 non-zero minutes, average 8.0 steps/minute, max 190. Missing 725. This participant has the highest average and peak minute-level step counts, which may indicate more sustained or intense exercise.  

---

### Reflection

The daily and minute-level analyses reveal differences in overall activity patterns between the three participants.  
- Participant 3977333714 stands out with both the highest daily totals and higher intensity at the minute level, suggesting regular or more intense activity.  
- Participants 2873212765 and 3372868164 are more moderate, with consistent but lower step counts.  

These insights are relevant to our **driving problem ("Do they achieve 15 minutes of intense activity across different times of day?")** because they highlight variation in both the **total amount** and **intensity** of steps. However, averages and single-minute peaks cannot show whether any participant reaches 15 intense minutes within a given part of the day. Further analysis that breaks the data down by hour, with an explicit definition of "intense", will be needed to identify whether these activities cluster in the morning, afternoon, or evening.
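
As a possible starting point for that next step, the sketch below reshapes wide minute data into long form, flags "intense" minutes, and counts them per participant, day, and time-of-day window. Everything specific here is an assumption made for illustration: the column names (`ActivityHour`, `Steps00` through `Steps59`), the threshold of 100 steps per minute, and the window boundaries (morning 05:00-11:59, afternoon 12:00-17:59, evening 18:00-23:59). It runs on a tiny made-up frame rather than the real data, and it counts total intense minutes per window, not consecutive ones.

```python
import pandas as pd

INTENSE_THRESHOLD = 100  # steps per minute; an assumed cut-off
TARGET_MINUTES = 15      # from the driving problem

# Toy wide-format data: one row per participant-hour, one column per minute.
step_cols = [f"Steps{m:02d}" for m in range(60)]
toy = pd.DataFrame(0, index=range(3), columns=step_cols)
toy.insert(0, "ActivityHour", [
    "4/13/2016 7:00:00 AM",
    "4/13/2016 1:00:00 PM",
    "4/13/2016 7:00:00 PM",
])
toy.insert(0, "Id", 1)
toy.loc[0, step_cols[:16]] = 120  # 16 intense minutes in the morning hour
toy.loc[2, step_cols[:5]] = 110   # 5 intense minutes in the evening hour

# Wide -> long: one row per participant-minute.
long_df = toy.melt(
    id_vars=["Id", "ActivityHour"],
    value_vars=step_cols,
    var_name="Minute",
    value_name="Steps",
)
long_df["ActivityHour"] = pd.to_datetime(
    long_df["ActivityHour"], format="%m/%d/%Y %I:%M:%S %p"
)

# Assign each hour to a time-of-day window; hours 0-4 fall outside all windows.
long_df["Period"] = pd.cut(
    long_df["ActivityHour"].dt.hour,
    bins=[5, 12, 18, 24],
    right=False,
    labels=["morning", "afternoon", "evening"],
)
long_df["Date"] = long_df["ActivityHour"].dt.date
long_df["Intense"] = long_df["Steps"] >= INTENSE_THRESHOLD

# Count intense minutes per participant, day, and window.
counts = (
    long_df.groupby(["Id", "Date", "Period"], observed=False)["Intense"]
    .sum()
    .rename("IntenseMinutes")
    .reset_index()
)
counts["MeetsTarget"] = counts["IntenseMinutes"] >= TARGET_MINUTES
print(counts)
```

Requiring 15 consecutive intense minutes would need a run-length check on top of this, and the threshold itself should be agreed within the group before drawing conclusions.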
