# EDA

This notebook opens the XML files in an Apple Health export ZIP and runs some initial Exploratory Data Analysis (EDA) on them.

In [None]:
import os

In [None]:
APPLE_HEALTH_DIR = os.path.join(os.path.dirname(os.getcwd()), "apple_health")

print(
    f"Using {APPLE_HEALTH_DIR} as Apple Health directory (directory exists? {os.path.isdir(APPLE_HEALTH_DIR)})"
)
print("Contents of Apple Health directory:")
for root, _, files in os.walk(APPLE_HEALTH_DIR):
    for name in files:
        print(os.path.join(root, name))

## Overview of files

In my export, I see the following files:

* `export.xml`: Large file containing all data as [Apple HealthKit](https://developer.apple.com/documentation/healthkit) records
* `export_cda.xml`: Large file containing all data as [Clinical Document Architecture](https://en.wikipedia.org/wiki/Clinical_Document_Architecture) records
* `workout-routes/route_YYYY-MM-DD_hh.mm{a,p}m.gpx`: [GPS Exchange Format](https://en.wikipedia.org/wiki/GPS_Exchange_Format) files of workouts
* `clinical-records/{Observation, DiagnosticReport, and more}-{UUID string}.json`: JSON files from clinical visits
* `electrocardiograms/ecg_YYYY-MM-DD{_1,2,etc.}.csv`: CSV files describing ECGs taken on Apple Watch

Some of these may be missing from your export, and yours may contain file types that mine doesn't.
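To check which of these file groups your own export has, a quick tally by subdirectory and extension can help. This is a minimal sketch; `count_files_by_kind` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import os
from collections import Counter


def count_files_by_kind(export_dir):
    """Count files per (top-level subdirectory, extension) pair under export_dir."""
    counts = Counter()
    for root, _, files in os.walk(export_dir):
        # Files sitting directly in export_dir are grouped under "."
        rel = os.path.relpath(root, export_dir)
        top_level = rel.split(os.sep)[0]
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            counts[(top_level, ext)] += 1
    return counts
```

Calling `count_files_by_kind(APPLE_HEALTH_DIR).most_common()` should list the largest groups first, which makes it easy to spot folders that are missing or unexpected.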

### `export.xml`

Let's start with this file and see what information it contains:

In [None]:
import xml.etree.ElementTree as ET

EXPORT = os.path.join(APPLE_HEALTH_DIR, "export.xml")
if not os.path.isfile(EXPORT):
    raise FileNotFoundError(f"Did not find export.xml in {APPLE_HEALTH_DIR}")

print(f"Parsing {EXPORT} -- this may take a while")
export_tree = ET.parse(EXPORT)
print("Done parsing")

export_root = export_tree.getroot()
if export_root.tag != "HealthData":
    raise ValueError(f"Unable to find HealthData tag in {EXPORT}")
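`ET.parse` builds the whole tree in memory, which is why this step can be slow on a large export. If memory becomes a problem, one alternative is to stream the file with `ET.iterparse` and discard each top-level element once it has been counted. The sketch below assumes you only need per-tag counts; `count_tags_streaming` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags_streaming(xml_path):
    """Count (tag, type) pairs among the root's direct children while streaming."""
    counts = Counter()
    depth = 0
    root = None
    for event, elem in ET.iterparse(xml_path, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element seen is the document root
            depth += 1
        else:
            depth -= 1
            if depth == 1:
                # elem is a direct child of the root and is now fully parsed
                counts[(elem.tag, elem.attrib.get("type"))] += 1
                root.clear()  # drop already-processed children to free memory
    return counts
```

The trade-off is that nothing stays around for later cells, so this suits one-off summaries better than interactive exploration.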

#### Exploring `HealthData` contents

There are many records here under the `HealthData` root corresponding to [HealthKit identifiers](https://developer.apple.com/documentation/healthkit?language=objc). Let's dig in and explore which ones are present!

In [None]:
from collections import defaultdict

tag_type_to_count = defaultdict(int)
for elem in export_root:
    tag_type = f"{elem.tag} tag, type {elem.attrib.get('type')}"
    tag_type_to_count[tag_type] += 1

print(f"{len(tag_type_to_count)} tag types found in HealthData:")
for tag_type in sorted(tag_type_to_count.keys(), key=lambda k: -tag_type_to_count[k]):
    print(f"- {tag_type}: {tag_type_to_count[tag_type]} tags")

##### Record tags: HKQuantityTypeIdentifier and HKCategoryTypeIdentifier

Most records have a `type` that starts with one of these two prefixes. We can visualize them in several ways at various timescales (weeks, days, hours, minutes, etc.):

* Scatter plots
* Line plots
* Box-and-whisker plots
* ...and many more

Let's leave specifics for individual notebooks.

There is also one more Record type, HKDataTypeSleepDurationGoal, but it has only two entries and isn't particularly interesting.
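Before any plotting, the Record tags need to be turned into (time, value) data. Here is a rough sketch that sums one quantity type per day. It assumes each Record carries `type`, `startDate` and `value` attributes and that dates look like `2023-01-31 08:15:00 -0700`; that matches my export, but check yours. `daily_totals` and `DATE_FORMAT` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "2023-01-31 08:15:00 -0700"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def daily_totals(root, record_type):
    """Sum the numeric values of one Record type per calendar day of startDate."""
    totals = defaultdict(float)
    # findall() only looks at direct children, so nested records are skipped
    for record in root.findall("Record"):
        if record.get("type") != record_type:
            continue
        try:
            value = float(record.get("value"))
        except (TypeError, ValueError):
            continue  # e.g. category records with non-numeric values
        day = datetime.strptime(record.get("startDate"), DATE_FORMAT).date()
        totals[day] += value
    return dict(totals)
```

Summing only makes sense for cumulative quantities such as step counts. For something like heart rate you would want a mean or a distribution instead, and if several devices record the same quantity the totals may double count.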

##### ActivitySummary tags

These are daily tags showing the goals set for [Apple Watch rings](https://www.apple.com/watch/close-your-rings/) (Move, Exercise and Stand) and the actual values achieved each day. They're probably not particularly interesting either, unless compared to other quantities over time, like body weight.
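If you do want to compare ring progress against other quantities, each daily tag can be reduced to an achieved/goal ratio per ring. This sketch assumes the attribute names I see in my export (`dateComponents`, `activeEnergyBurned`, `activeEnergyBurnedGoal`, `appleExerciseTime`, `appleExerciseTimeGoal`, `appleStandHours`, `appleStandHoursGoal`); they may differ in yours. `RINGS` and `ring_progress` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed (actual, goal) attribute pairs for each ring
RINGS = {
    "Move": ("activeEnergyBurned", "activeEnergyBurnedGoal"),
    "Exercise": ("appleExerciseTime", "appleExerciseTimeGoal"),
    "Stand": ("appleStandHours", "appleStandHoursGoal"),
}


def ring_progress(root):
    """Return one dict per ActivitySummary with the achieved/goal ratio per ring."""
    rows = []
    for summary in root.findall("ActivitySummary"):
        row = {"date": summary.get("dateComponents")}
        for ring, (actual_attr, goal_attr) in RINGS.items():
            try:
                actual = float(summary.get(actual_attr))
                goal = float(summary.get(goal_attr))
            except (TypeError, ValueError):
                row[ring] = None  # attribute missing or not numeric
                continue
            # A goal of zero means no usable target for that day
            row[ring] = actual / goal if goal > 0 else None
        rows.append(row)
    return rows
```

A ratio of 1.0 or more means the ring was closed that day.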

##### Workout tags

These have detailed information on each workout performed, including source (e.g., third-party apps, Apple Watch, etc.), calories burned, distance, and links out to workout routes present elsewhere in the `apple_health/` export.
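As a starting point for the Workout tags, the sketch below groups workouts by activity type and collects the route files they link to. It assumes each Workout has `workoutActivityType` and `duration` attributes and that routes show up as nested `FileReference` elements with a `path` attribute. That's what I see in my export, but the layout may vary between iOS versions. `summarize_workouts` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def summarize_workouts(root):
    """Group Workout tags by activity type with counts, total duration and route paths."""
    summary = defaultdict(lambda: {"count": 0, "duration": 0.0, "routes": []})
    for workout in root.findall("Workout"):
        entry = summary[workout.get("workoutActivityType")]
        entry["count"] += 1
        try:
            # The unit of duration is not checked here
            entry["duration"] += float(workout.get("duration"))
        except (TypeError, ValueError):
            pass  # duration missing or not numeric
        # Route links are assumed to be nested FileReference elements
        for ref in workout.iter("FileReference"):
            path = ref.get("path")
            if path:
                entry["routes"].append(path)
    return dict(summary)
```

The collected paths can then be matched against the files under `workout-routes/`.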

##### ClinicalRecord tags

These are records from clinical visits, including Observations, DiagnosticReports, and more. I don't personally have a ton of these and they are pretty coarse-grained, so I probably won't look much deeper into them.

##### Audiogram tags

I only have two of these, both from AirPods Pro hearing tests. Also not terribly interesting to look at.

##### ExportDate tag and Me tag

The day the Apple Health export was made and metadata on myself... pretty sure I already know these lol, so I'm definitely not following up on them.

#### Conclusion

The `export.xml` file has a lot of interesting data in it, mainly quantities recorded by different biometric sensors, activity summaries from Apple Watch, and workout metadata, alongside some less interesting or relevant data. **I'll take a look at the interesting parts in depth in other notebooks.**