# Assignment 2 — Product Notebook (GROUP 3)  
**Unit:** DATA3406 / DATA2002  
**Date:** 2025-10-23  
**Repository:** _[Link to GitHub repo]_  
**Team:** _[List names & roles: Manager, Tracker, etc.]_

---

## Driving Problem
> _Do they achieve 15 minutes of intense activity across different times of day (morning, afternoon, evening)?_

**Why it matters (1 sentence):** _[Brief statement]_

**Notebook Purpose (overview):** This group Product Notebook presents the full pipeline — from raw Fitbit data to cleaned datasets, descriptive analytics, analysis aligned to the driving problem, and a concise answer supported by figures, tables, and algebra.

> **Note:** Ethical analysis, stakeholder analysis, and data characterisation are documented in the repository README and Wiki. This notebook references those materials where relevant.

## Reproducibility & How to Run

- Place the supplied CSV files under the `data/` directory:  
  - `data/dailySteps_merged.csv`  
  - `data/hourlySteps_merged.csv`  
  - `data/minuteStepsWide_merged.csv`  
- Python 3.11+ recommended.  
- Run all cells in order (top to bottom).  
- Dependencies: `pandas`, `numpy`, `matplotlib`.  
- This notebook uses **matplotlib only** for plotting.

## Environment Setup & Imports

**Assumptions & Predictions**  
- _Assumptions:_ We will use standard scientific Python libraries and configure matplotlib defaults for a consistent look.  
- _Predictions:_ All imports succeed, and the file check reports each of the three CSV paths as present.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# Imports & basic configuration
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Matplotlib default sizing for visual consistency across the notebook
plt.rcParams['figure.figsize'] = (8, 4.5)
plt.rcParams['axes.titlesize'] = 12
plt.rcParams['axes.labelsize'] = 11
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10

# Project constants
DATA_DIR = "data"  # per project convention
DAILY_PATH = os.path.join(DATA_DIR, "dailySteps_merged.csv")
HOURLY_PATH = os.path.join(DATA_DIR, "hourlySteps_merged.csv")
MINUTE_WIDE_PATH = os.path.join(DATA_DIR, "minuteStepsWide_merged.csv")

# Utility: report whether each expected data file is present
for p in [DAILY_PATH, HOURLY_PATH, MINUTE_WIDE_PATH]:
    print(f"{p}: {'found' if os.path.exists(p) else 'MISSING'}")

**Observation & Interpretation**  
- _What changed / what do the results show?_ Imports succeed and plotting defaults are set. Data file presence is reported.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]

## Utility Functions

**Assumptions & Predictions**  
- _Assumptions:_ Wrapping repeated routines as functions improves clarity and avoids duplicate code.  
- _Predictions:_ The helpers defined here will be reused unchanged by the loading, cleaning, and transformation cells.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
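from typing import Dict, List

def load_csvs(daily_path: str, hourly_path: str, minute_wide_path: str) -> Dict[str, pd.DataFrame]:
    """Read the three step CSVs into a dict keyed by granularity."""
    return {
        "daily": pd.read_csv(daily_path),
        "hourly": pd.read_csv(hourly_path),
        "minute_wide": pd.read_csv(minute_wide_path),
    }

def ensure_datetime(df: pd.DataFrame, cols: List[str]) -> pd.DataFrame:
    """Return a copy of df with the listed columns parsed as datetimes.

    Values that cannot be parsed become NaT (errors='coerce').
    """
    df = df.copy()  # avoid mutating the caller's DataFrame
    for c in cols:
        if c in df.columns:
            df[c] = pd.to_datetime(df[c], errors='coerce')
    return df

def label_time_of_day(ts: pd.Timestamp) -> str:
    """Map a timestamp to morning (05-11h), afternoon (12-16h), evening (17-22h), or night."""
    if pd.isna(ts):
        return "unknown"  # NaT has no usable hour; do not silently count it as night
    h = ts.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 23:
        return "evening"
    return "night"  # retained for completeness, may be excluded later depending on design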

**Observation & Interpretation**  
- _What changed / what do the results show?_ Reusable helpers defined for loading data, datetime conversion, and time-of-day labeling.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
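
The time-of-day rule is easiest to trust once it has been checked at the bin boundaries. The sketch below repeats the same hour cut-offs as the helper in this section and applies them to a handful of hypothetical timestamps (the dates are made up for illustration and are not taken from the data).

```python
import pandas as pd

def label_time_of_day(ts: pd.Timestamp) -> str:
    """Same hour boundaries as the notebook helper: 5-12, 12-17, 17-23, else night."""
    h = ts.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 23:
        return "evening"
    return "night"

# Hypothetical timestamps chosen to sit on either side of each boundary
samples = pd.to_datetime([
    "2016-04-12 04:59", "2016-04-12 05:00", "2016-04-12 11:59",
    "2016-04-12 12:00", "2016-04-12 16:59", "2016-04-12 17:00",
    "2016-04-12 22:59", "2016-04-12 23:00",
])
labels = [label_time_of_day(ts) for ts in samples]

for ts, label in zip(samples, labels):
    print(ts.strftime("%H:%M"), "->", label)
```

Each boundary hour belongs to the bin it opens, so 12:00 is afternoon and 17:00 is evening. If the group agrees different cut-offs, only the three comparisons need to change.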

## Manual Scrutiny of Data (Reference)

Before running code, manually open the CSVs in a text editor/Excel to scan for separators, header rows, NA patterns, and oddities.  
Document notes in the Wiki/README as per the spec. This cell serves as a reminder and cross-reference.
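
Manual inspection can be complemented by printing the first few raw lines of a file before any parsing happens. The sketch below is one possible helper; `peek_raw` is a hypothetical name, and the demonstration runs on a throwaway file with made-up contents rather than the real data.

```python
import os
import tempfile
from itertools import islice

def peek_raw(path: str, n: int = 5) -> list:
    """Return the first n raw lines of a text file without parsing them."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]

# Demonstration on a temporary file with invented contents (not the Fitbit data)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.write("Id,Day,Steps\n1,2016-04-12,100\n2,2016-04-12,\n")
    raw_lines = peek_raw(demo_path, n=2)

print(raw_lines)
```

In the notebook, the same call could be pointed at each path under `data/` to confirm the separator and header row before `pd.read_csv` is run.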

## Load Data

**Assumptions & Predictions**  
- _Assumptions:_ Files exist under `data/` and standard CSV parsing will succeed.  
- _Predictions:_ Each file loads into a non-empty DataFrame, and `head(3)` shows the column names needed for cleaning.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
dfs = load_csvs(DAILY_PATH, HOURLY_PATH, MINUTE_WIDE_PATH)

for name, df in dfs.items():
    print(name, df.shape)
    display(df.head(3))

**Observation & Interpretation**  
- _What changed / what do the results show?_ DataFrames loaded with expected shapes; quick head checks clarify schema.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]

## Clean & Harmonise Data

**Assumptions & Predictions**  
- _Assumptions:_ We will parse datetime columns and standardise column names as needed.  
- _Predictions:_ After cleaning, the date/time columns have a datetime dtype and every table exposes the participant identifier as `Id`.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
daily = dfs["daily"].copy()
hourly = dfs["hourly"].copy()
minute_wide = dfs["minute_wide"].copy()

# Example: normalise datetime columns (placeholders; adjust to actual column names)
daily = ensure_datetime(daily, [c for c in daily.columns if "date" in c.lower() or "time" in c.lower()])
hourly = ensure_datetime(hourly, [c for c in hourly.columns if "date" in c.lower() or "time" in c.lower()])
minute_wide = ensure_datetime(minute_wide, [c for c in minute_wide.columns if "date" in c.lower() or "time" in c.lower()])

# Example: standardise id column naming (adjust to actual column names in dataset)
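for df in [daily, hourly, minute_wide]:
    # Build the mapping first, then rename once (avoids renaming while iterating over columns)
    id_map = {c: "Id" for c in df.columns if c.lower() in {"id", "memberid", "participantid"}}
    df.rename(columns=id_map, inplace=True)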

print("Columns after harmonisation:")
print("daily:", list(daily.columns)[:8], "...")
print("hourly:", list(hourly.columns)[:8], "...")
print("minute_wide:", list(minute_wide.columns)[:8], "...")

**Observation & Interpretation**  
- _What changed / what do the results show?_ Datetime parsing applied; IDs harmonised; columns inspected for downstream alignment.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
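
Because the cleaning step parses with `errors='coerce'`, unparseable values turn into `NaT` without raising an error. A quick audit of how many values were lost can be sketched as follows; the column name `ActivityTime` and the rows are invented for illustration.

```python
import pandas as pd

# Made-up example rows; "ActivityTime" is a hypothetical column name
raw = pd.DataFrame({
    "Id": [1, 1, 2],
    "ActivityTime": ["2016-04-12 08:00:00", "not a date", "2016-04-13 09:30:00"],
})

parsed = pd.to_datetime(raw["ActivityTime"], errors="coerce")

# Values that were present in the raw column but became NaT after parsing
failed_mask = parsed.isna() & raw["ActivityTime"].notna()
n_failed = int(failed_mask.sum())
bad_rows = raw.loc[failed_mask]

print(f"dtype after parsing: {parsed.dtype}")
print(f"values lost to coercion: {n_failed}")
print(bad_rows)
```

A non-zero count is worth recording in the Observation notes, since rows with `NaT` cannot be assigned to a time-of-day bin.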

## Define Intensity & Adherence Rules

**Assumptions & Predictions**  
- _Assumptions:_ We adopt a literature-informed steps-per-minute threshold for classifying a minute as 'intense' and define morning/afternoon/evening bins.  
- _Predictions:_ Conclusions may depend on the chosen threshold, so it is defined once here and revisited in the sensitivity checks.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# Placeholders; replace with group-agreed definitions and citations in README/Wiki
INTENSE_STEPS_PER_MIN = 100  # EXAMPLE PLACEHOLDER
INTENSE_WINDOW_MINUTES = 15

TIME_OF_DAY_BINS = ["morning", "afternoon", "evening"]  # we may exclude 'night' for our driving question
print("Intensity definition and time-of-day bins set.")

**Observation & Interpretation**  
- _What changed / what do the results show?_ Rules are explicit and centralised for transparency and later revision.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]

## Transform: Compute Intense Activity by Time-of-Day

**Assumptions & Predictions**  
- _Assumptions:_ Minute-level data provide sufficient granularity to compute continuous intense windows; hourly totals alone cannot show whether intense minutes were consecutive.  
- _Predictions:_ Qualifying 15-minute windows will be identifiable per participant and assignable to a time-of-day bin via their timestamps.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# Placeholder sketch: reshape minute_wide to long, compute rolling windows, label time-of-day
# NOTE: Replace with actual column names and logic based on minute_wide schema.
long_minutes = minute_wide.copy()  # TODO: melt to long format if in wide

# Example function placeholders
def is_intense(steps_per_min: float) -> bool:
    return steps_per_min >= INTENSE_STEPS_PER_MIN

# TODO: compute continuous 15-minute windows meeting is_intense, then map to time-of-day via timestamp
print("Placeholder transformation complete (implement real logic).")

**Observation & Interpretation**  
- _What changed / what do the results show?_ We will verify that windows are correctly identified and binned to morning/afternoon/evening.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
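
One possible way to fill in the placeholder is sketched below on synthetic data. It assumes the wide table has one row per participant-hour with columns `Id`, `ActivityHour`, and `Steps00` to `Steps59`; those names are assumptions and should be checked against the real schema. It also assumes a window is assigned to the bin in which it starts, which is a design choice the group would need to confirm.

```python
import numpy as np
import pandas as pd

INTENSE_STEPS_PER_MIN = 100   # placeholder threshold, matching the rules cell
INTENSE_WINDOW_MINUTES = 15

def label_time_of_day(ts: pd.Timestamp) -> str:
    h = ts.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 23:
        return "evening"
    return "night"

# Synthetic wide table: one row per participant-hour, one column per minute
step_cols = [f"Steps{m:02d}" for m in range(60)]
hour_a = np.zeros(60, dtype=int)
hour_a[10:30] = 120          # 20 consecutive intense minutes (08:10 to 08:29)
hour_b = np.zeros(60, dtype=int)
hour_b[::2] = 120            # 30 intense minutes, but never two in a row
wide = pd.DataFrame([hour_a, hour_b], columns=step_cols)
wide.insert(0, "ActivityHour", pd.to_datetime(["2016-04-12 08:00", "2016-04-12 18:00"]))
wide.insert(0, "Id", [1, 1])

# Wide -> long: one row per participant-minute with a full timestamp
long = wide.melt(id_vars=["Id", "ActivityHour"], value_vars=step_cols,
                 var_name="minute_col", value_name="steps")
long["minute"] = long["minute_col"].str[-2:].astype(int)
long["timestamp"] = long["ActivityHour"] + pd.to_timedelta(long["minute"], unit="min")
long = long.sort_values(["Id", "timestamp"]).reset_index(drop=True)
long["intense"] = long["steps"] >= INTENSE_STEPS_PER_MIN

# A new run starts when the intense flag flips or the minutes are not consecutive
gap = long.groupby("Id")["timestamp"].diff().ne(pd.Timedelta(minutes=1))
flip = long["intense"].astype(int).diff().ne(0)
long["run_id"] = (gap | flip).cumsum()

# Summarise intense runs and keep those lasting at least the window length
runs = (long[long["intense"]]
        .groupby(["Id", "run_id"])
        .agg(start=("timestamp", "min"), length=("timestamp", "size"))
        .reset_index())
qualifying = runs[runs["length"] >= INTENSE_WINDOW_MINUTES].copy()
qualifying["time_of_day"] = qualifying["start"].apply(label_time_of_day)

print(qualifying[["Id", "start", "length", "time_of_day"]])
```

The second synthetic hour has more intense minutes in total than the first but yields no qualifying window, which illustrates why a continuous-window definition and a total-minutes definition can give different answers.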

## Descriptive Analytics

**Assumptions & Predictions**  
- _Assumptions:_ Basic summaries will inform data quality and prevalence of intense windows across participants/time.  
- _Predictions:_ Participant counts should agree across the daily, hourly, and minute tables; any mismatch points to a coverage gap worth investigating.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# Example: participant counts per table (days per participant and step distributions can be added here)
summary = {
    "n_participants_daily": daily["Id"].nunique() if "Id" in daily else None,
    "n_participants_hourly": hourly["Id"].nunique() if "Id" in hourly else None,
    "n_participants_minute": minute_wide["Id"].nunique() if "Id" in minute_wide else None,
}
summary

**Observation & Interpretation**  
- _What changed / what do the results show?_ Initial counts help confirm expectations and data coverage.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
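
Beyond participant counts, coverage per participant is a useful early check. The sketch below computes days recorded, the span of dates, and missing days on a small invented table; the column names `ActivityDay` and `StepTotal` are assumptions about the daily file.

```python
import pandas as pd

# Invented rows for illustration; column names are assumed, not confirmed
daily_demo = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "ActivityDay": pd.to_datetime(
        ["2016-04-12", "2016-04-13", "2016-04-14", "2016-04-12", "2016-04-14"]
    ),
    "StepTotal": [8000, 12000, 500, 3000, 9500],
})

coverage = daily_demo.groupby("Id").agg(
    n_days=("ActivityDay", "nunique"),
    first_day=("ActivityDay", "min"),
    last_day=("ActivityDay", "max"),
    median_steps=("StepTotal", "median"),
)

# Days between first and last record (inclusive) versus days actually present
coverage["span_days"] = (coverage["last_day"] - coverage["first_day"]).dt.days + 1
coverage["missing_days"] = coverage["span_days"] - coverage["n_days"]

print(coverage)
```

Participants with many missing days may need to be flagged or excluded before proportions are compared across time-of-day bins.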

## Visualisation: Examples (Matplotlib Only)

**Assumptions & Predictions**  
- _Assumptions:_ All figures will use a shared style for coherence across the notebook.  
- _Predictions:_ The `rcParams` defaults set during setup will give every figure the same size and font sizes without per-plot styling.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# Example placeholder plot (replace with meaningful plots aligned to the driving problem)
# `plt` is already imported in the setup cell

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 0])
ax.set_title("Placeholder Plot — Replace with analysis-driven visual")
ax.set_xlabel("X")
ax.set_ylabel("Y")
plt.show()

**Observation & Interpretation**  
- _What changed / what do the results show?_ Figures will have clear titles, axis labels, and legend (if needed) with concise textual guidance nearby.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
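
A figure aligned to the driving problem might be a simple bar chart of the proportion of participants achieving the target in each bin. The sketch below uses made-up proportions purely to show the layout; they are not results.

```python
import matplotlib.pyplot as plt

bins = ["morning", "afternoon", "evening"]
props = [0.40, 0.55, 0.25]  # made-up values for layout only, NOT computed results

fig, ax = plt.subplots()
bars = ax.bar(bins, props)

# Fix the y-axis to the full 0-1 range so bars are comparable across figures
ax.set_ylim(0, 1)
ax.set_xlabel("Time of day")
ax.set_ylabel("Proportion of participants")
ax.set_title("Participants with a qualifying intense window (illustrative data)")

# Print each value above its bar so the figure can be read without the table
for rect, p in zip(bars, props):
    ax.annotate(f"{p:.2f}",
                (rect.get_x() + rect.get_width() / 2, p),
                ha="center", va="bottom")
```

In a notebook with inline plotting, the figure renders when the cell finishes; `plt.show()` can be added if a different backend is in use.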

## Synthesis: Answer the Driving Problem

**Assumptions & Predictions**  
- _Assumptions:_ We combine evidence from the transformations and visuals to reach a concise, defensible answer.  
- _Predictions:_ The proportion of participants achieving the 15-minute target will differ between morning, afternoon, and evening.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# TODO: assemble metrics that directly support the answer, e.g. fraction of participants achieving >=15 min intense in morning/afternoon/evening
result = {
    "morning_achievers": None,
    "afternoon_achievers": None,
    "evening_achievers": None,
    "notes": "Fill with computed proportions / counts"
}
result

**Observation & Interpretation**  
- _What changed / what do the results show?_ Interpret quantities in clear language; connect to health relevance and practical implications.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
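
The proportions for the result dictionary could be assembled from a table of qualifying windows along the following lines. This is a sketch on invented data, and it assumes "achieving" means having at least one qualifying window in that bin at any point in the study; a per-day definition would need a different grouping.

```python
import pandas as pd

# Invented inputs: all analysed participants, and one row per qualifying window
participants = [1, 2, 3, 4]
windows = pd.DataFrame({
    "Id": [1, 1, 2, 3, 3],
    "time_of_day": ["morning", "evening", "morning", "afternoon", "afternoon"],
})

bins = ["morning", "afternoon", "evening"]

# Count distinct participants per bin; bins with no windows are kept with a count of 0
achievers = (windows.groupby("time_of_day")["Id"]
             .nunique()
             .reindex(bins, fill_value=0))

# Denominator is every analysed participant, including those with no windows at all
proportion = achievers / len(participants)

result_demo = {f"{b}_achievers": float(proportion[b]) for b in bins}
print(result_demo)
```

Using the full participant list as the denominator matters: dividing by the number of participants who appear in `windows` would overstate every proportion.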

## Reporting: Tables, Algebra, Significant Figures

**Assumptions & Predictions**  
- _Assumptions:_ Results will be presented with appropriate significant figures and supporting algebra where necessary.  
- _Predictions:_ Proportions reported to a consistent number of significant figures will match the values readable from the figures.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# TODO: create a small summary table for the report section
report_table = pd.DataFrame([
    {"time_of_day": "morning", "prop_achieving_15min": None},
    {"time_of_day": "afternoon", "prop_achieving_15min": None},
    {"time_of_day": "evening", "prop_achieving_15min": None},
])
report_table

**Observation & Interpretation**  
- _What changed / what do the results show?_ Tables concisely support the narrative and match what the reader should 'see' in the figures.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
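
If the reported quantity is the proportion of participants with at least one qualifying window in bin $b$, it can be written as $\hat{p}_b = n_b / N$, where $n_b$ is the number of such participants and $N$ is the number of participants analysed (this definition is an assumption until the group fixes its own). For presenting values to a fixed number of significant figures, a small helper such as the hypothetical `round_sig` below may be enough.

```python
from math import floor, log10

def round_sig(x: float, sig: int = 2) -> float:
    """Round x to `sig` significant figures (0 is returned unchanged)."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep
    decimals = sig - 1 - floor(log10(abs(x)))
    return round(x, decimals)

# Illustrative values only
examples = [0.123456, 1234.5, 0.05]
rounded = [round_sig(v, 2) for v in examples]
print(rounded)
```

Rounding should happen only at the reporting stage; intermediate calculations are better kept at full precision.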

## Limitations & Sensitivity Checks

**Assumptions & Predictions**  
- _Assumptions:_ We will test sensitivity to thresholds (e.g., steps/min) and window definitions.  
- _Predictions:_ Lowering the threshold classifies more minutes as intense, so the proportion of achievers should not decrease as the threshold drops.

**Authorship & GAI use**  
- Author(s): _[Full Name(s)]_  
- Pair Programming: _[Name(s) if applicable]_  
- GAI/External Help: _[Describe prompts, tools, and assistance used; include date]_

In [None]:
# TODO: sketch sensitivity analysis hooks (e.g., vary INTENSE_STEPS_PER_MIN; vary window lengths)
# Example:
for th in [90, 100, 110]:
    pass  # compute and compare key outcomes
print("Sensitivity hooks ready (implement).")

**Observation & Interpretation**  
- _What changed / what do the results show?_ Document how conclusions change (or not) under reasonable parameter variations.  
- _Implications for driving problem:_ [Explain how this step moves us toward answering the question.]
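
The threshold loop could be filled in along these lines. The sketch below sweeps the three example thresholds over a single invented minute-level series and reports the longest run of consecutive intense minutes; `longest_run` is a hypothetical helper, and the step values are made up.

```python
import numpy as np

INTENSE_WINDOW_MINUTES = 15

def longest_run(flags) -> int:
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for f in flags:
        current = current + 1 if f else 0
        best = max(best, current)
    return best

# Invented steps-per-minute series for one participant (45 minutes)
steps = np.array([95] * 10 + [105] * 16 + [80] * 5 + [112] * 14)

results = {}
for th in [90, 100, 110]:
    run = longest_run(steps >= th)
    results[th] = {"longest_run": run, "achieves": run >= INTENSE_WINDOW_MINUTES}

for th, r in results.items():
    print(th, r)
```

In this invented series the participant meets the 15-minute target at thresholds of 90 and 100 but not at 110, which is the kind of threshold dependence the Observation notes should record.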

## Final Commentary & Next Steps

- **Final Answer (draft placeholder):** _[Concise sentence answering the driving problem]._  
- **Implications:** _[Briefly relate to health guidelines/stakeholders]._  
- **Reproducibility:** _[Mention data path `data/`, versions, and deterministic steps]._  
- **Hand-off:** _[Notes for Week 12 think-aloud and Week 13 demo]._