# Seasonal Patterns in Daily Step Counts from iPhone Health Data

Do I tend to walk more during warmer months compared to colder months, and how large are those seasonal differences in my overall activity?

## Data Collection

For this project, I am using my own step-count data exported from the Apple Health app on my iPhone. In the Health app, I opened my profile and chose **Export All Health Data**, which generated a compressed `export.zip` file. That archive contains a `health.xml` file holding all of my recorded health metrics in XML format.

In this project I focus specifically on records of type `HKQuantityTypeIdentifierStepCount`, which represent step counts over specific time intervals.


## Data Structure

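The raw data is stored in an XML file with a root `<HealthData>` element. Each measurement is a `<Record>` child element whose details are stored as XML attributes.

The `<Record>` attributes that matter for my analysis are: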

- `type`: the kind of record. I filter for `HKQuantityTypeIdentifierStepCount`.
- `value`: the number of steps in that time interval (in `count` units).
- `startDate` and `endDate`: timestamps for when those steps occurred.
- `creationDate`: when the record was written.
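To make the record layout concrete, here is a minimal sketch that parses a small hand-written XML string shaped like the export. The sample string, the heart-rate record, the timestamp formatting, and the helper name `extract_step_records` are illustrative assumptions, not content copied from my actual file. Only the step-count values mirror the first row of my data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the export's layout (not copied from the real file).
SAMPLE_XML = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          creationDate="2025-02-15 16:12:46 -0500"
          startDate="2025-02-15 16:02:38 -0500"
          endDate="2025-02-15 16:04:15 -0500" value="133"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
          creationDate="2025-02-15 16:12:46 -0500"
          startDate="2025-02-15 16:02:38 -0500"
          endDate="2025-02-15 16:04:15 -0500" value="72"/>
</HealthData>"""

STEP_TYPE = "HKQuantityTypeIdentifierStepCount"


def extract_step_records(xml_text):
    """Return one dict per step-count <Record>, skipping every other record type."""
    root = ET.fromstring(xml_text)
    return [
        {
            "startDate": rec.get("startDate"),
            "endDate": rec.get("endDate"),
            "value": int(rec.get("value")),
        }
        for rec in root.findall("Record")
        if rec.get("type") == STEP_TYPE
    ]


sample_records = extract_step_records(SAMPLE_XML)
print(sample_records)
```

Only the first record survives the filter, because the second one has a different `type`.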

In [3]:
import xml.etree.ElementTree as ET
import pandas as pd

# Parse the XML file
tree = ET.parse("/Users/chris/Desktop/health.xml")
root = tree.getroot()

records = []
for record in root.findall("Record"):
    if record.get("type") == "HKQuantityTypeIdentifierStepCount":
        records.append({
            "type": record.get("type"),
            "unit": record.get("unit"),
            "creationDate": record.get("creationDate"),
            "startDate": record.get("startDate"),
            "endDate": record.get("endDate"),
            "value": int(record.get("value"))
        })

steps_df = pd.DataFrame(records)

# Convert dates to datetime
steps_df["startDate"] = pd.to_datetime(steps_df["startDate"])
steps_df["endDate"] = pd.to_datetime(steps_df["endDate"])
steps_df["creationDate"] = pd.to_datetime(steps_df["creationDate"])

# Add a calendar date column
steps_df["date"] = steps_df["startDate"].dt.date

steps_df.head()

| | type | unit | creationDate | startDate | endDate | value | date |
|---|---|---|---|---|---|---|---|
| 0 | HKQuantityTypeIdentifierStepCount | count | 2025-02-15 16:12:46-05:00 | 2025-02-15 16:02:38-05:00 | 2025-02-15 16:04:15-05:00 | 133 | 2025-02-15 |
| 1 | HKQuantityTypeIdentifierStepCount | count | 2025-02-15 16:25:47-05:00 | 2025-02-15 16:14:44-05:00 | 2025-02-15 16:21:12-05:00 | 151 | 2025-02-15 |
| 2 | HKQuantityTypeIdentifierStepCount | count | 2025-02-15 16:38:48-05:00 | 2025-02-15 16:25:39-05:00 | 2025-02-15 16:25:49-05:00 | 4 | 2025-02-15 |
| 3 | HKQuantityTypeIdentifierStepCount | count | 2025-02-15 16:50:30-05:00 | 2025-02-15 16:39:27-05:00 | 2025-02-15 16:41:19-05:00 | 216 | 2025-02-15 |
| 4 | HKQuantityTypeIdentifierStepCount | count | 2025-02-15 17:17:56-05:00 | 2025-02-15 17:07:36-05:00 | 2025-02-15 17:15:55-05:00 | 87 | 2025-02-15 |


In [4]:
print("Number of step records:", len(steps_df))
print("Date range:", steps_df["startDate"].min(), "→", steps_df["startDate"].max())

daily_steps = (
    steps_df
    .groupby("date", as_index=False)["value"]
    .sum()
    .rename(columns={"value": "total_steps"})
)

daily_steps.head(50)

Number of step records: 14263
Date range: 2025-02-15 16:02:38-05:00 → 2025-11-16 16:13:03-05:00


| | date | total_steps |
|---|---|---|
| 0 | 2025-02-15 | 853 |
| 1 | 2025-02-16 | 7878 |
| 2 | 2025-02-17 | 16250 |
| 3 | 2025-02-18 | 29036 |
| 4 | 2025-02-19 | 15114 |
| 5 | 2025-02-20 | 26121 |
| 6 | 2025-02-21 | 14209 |
| 7 | 2025-02-22 | 20115 |
| 8 | 2025-02-23 | 8464 |
| 9 | 2025-02-24 | 21114 |


## Data Limitations

For this checkpoint, the table above previews only the first 50 daily totals, which cover less than two months. It is meant only to show how I am organizing the data. Later, I will group the data by season rather than by individual date. Please let me know if you need me to work that out now, as I am not sure exactly what this checkpoint is meant to cover.

I have run into a limitation: the step data in my health file begins on February 15, 2025 and ends on November 16, 2025 (the day I exported it).

The Health app export apparently did not include my full step history, which goes back more than five years on my iPhone. I will therefore need either to find another way to export the data or to change the scope of my project. Third-party export tools exist and may solve the problem, so I plan to explore them this week. If that does not work, I may need to narrow my question (with professor approval, of course).
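One way to check how complete any export is, whichever tool produces it, is to count the days with data in each month and list the calendar days that are missing entirely. The sketch below assumes a daily table with a `date` column like my `daily_steps`. The helper name `coverage_report` and the toy values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def coverage_report(daily, date_col="date"):
    """Return (missing calendar days, number of recorded days per month)."""
    dates = pd.to_datetime(daily[date_col])
    # Every calendar day between the first and last recorded date.
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    missing = full_range.difference(pd.DatetimeIndex(dates))
    days_per_month = dates.dt.to_period("M").value_counts().sort_index()
    return missing, days_per_month


# Toy data (made up for illustration), with deliberate gaps.
toy_daily = pd.DataFrame(
    {
        "date": ["2025-02-15", "2025-02-16", "2025-02-18", "2025-03-01"],
        "total_steps": [100, 200, 300, 400],
    }
)

missing_days, days_per_month = coverage_report(toy_daily)
print(len(missing_days), "calendar days have no data")
print(days_per_month)
```

Running the same check on the real `daily_steps` table would show whether the gap is only at the start of the history or whether individual days are also missing inside the covered range.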

## Analysis Plan

The goal of this project is to determine whether my daily step counts show meaningful seasonal patterns. Because the iPhone Health export records steps at irregular time intervals, my analysis begins by summing these entries into daily totals. This gives a consistent unit of measurement (steps per day) that can be compared across the entire date range.

To examine seasonal differences, I will create additional variables such as `month` and `season`. This allows grouping data into seasons (Winter, Spring, Summer, Fall) or into individual months. Visualizing average steps by month or season will help reveal whether physical activity is higher in certain parts of the year.
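As a sketch of how the `month` and `season` variables could be derived, the code below maps each calendar month to a season. It assumes meteorological seasons in the Northern Hemisphere (December through February is Winter, and so on), which is my choice of definition rather than anything dictated by the data. The names `SEASON_BY_MONTH` and `add_month_and_season` and the toy dates are hypothetical.

```python
import pandas as pd

# Assumption: meteorological seasons, Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def add_month_and_season(daily):
    """Return a copy of the daily table with `month` and `season` columns added."""
    out = daily.copy()
    dates = pd.to_datetime(out["date"])
    out["month"] = dates.dt.month
    out["season"] = out["month"].map(SEASON_BY_MONTH)
    return out


# Toy dates (made up for illustration), one per season.
toy_daily = pd.DataFrame(
    {
        "date": ["2025-02-15", "2025-04-10", "2025-07-04", "2025-10-31"],
        "total_steps": [100, 200, 300, 400],
    }
)

labeled = add_month_and_season(toy_daily)
print(labeled)
```

Under this definition, data running from February 15 to November 16 would put only the second half of February into Winter, so any Winter average would rest on about two weeks of data.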

My analysis will include several key components:

1. **Daily Aggregation**  
   I have already summed the step counts for each calendar day. This produces a dataset of daily totals that replaces the irregular intervals found in the raw XML.

2. **Monthly and Seasonal Summaries**  
   I plan to compute average daily steps for each month and each season. These summaries will help determine whether patterns align with weather or lifestyle differences across the year.

3. **Visualizations**  
   I will generate line plots of daily totals over time, bar charts or boxplots for month-to-month comparisons, and plots of seasonal averages.

4. **Starting Attempts at Analysis**  
   - Parsed the `health.xml` file into a structured DataFrame  
   - Converted timestamps into usable datetime formats  
   - Aggregated steps by day  
   - Displayed a preview of the daily steps table  

   My next steps will be grouping by month and season and generating the first visualizations to explore whether seasonal variation is present.
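The grouping step could look roughly like the sketch below, which computes the mean and median of daily steps along with the number of days behind each figure. It assumes the daily table already has a grouping column such as `month` or `season`. The helper name `summarize_by` and the toy numbers are hypothetical.

```python
import pandas as pd


def summarize_by(daily, key):
    """Summarize daily step totals within each group of `key` (e.g. month or season)."""
    return (
        daily.groupby(key)["total_steps"]
        .agg(mean_steps="mean", median_steps="median", n_days="count")
        .reset_index()
    )


# Toy data (made up for illustration): two days in month 2, one day in month 3.
toy_daily = pd.DataFrame(
    {
        "month": [2, 2, 3],
        "total_steps": [1000, 3000, 5000],
    }
)

summary = summarize_by(toy_daily, "month")
print(summary)
```

Keeping `n_days` next to each average makes it easier to see when a month or season is only partly covered, so that a thin group is not read as a real seasonal difference.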

In [5]:
daily_steps

| | date | total_steps |
|---|---|---|
| 0 | 2025-02-15 | 853 |
| 1 | 2025-02-16 | 7878 |
| 2 | 2025-02-17 | 16250 |
| 3 | 2025-02-18 | 29036 |
| 4 | 2025-02-19 | 15114 |
| ... | ... | ... |
| 270 | 2025-11-12 | 10999 |
| 271 | 2025-11-13 | 6408 |
| 272 | 2025-11-14 | 10124 |
| 273 | 2025-11-15 | 11429 |
