# Exercise 14: Polar Training Data

This notebook covers the following for the Polar training data:

1. Conceptual data model  
2. Logical data model  
3. Physical schema  
4. Basic workflow (DAG)  
5. Data quality considerations

## Conceptual Data Model (Exercise 14A)

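User → Training Session → Time-series Samples (each arrow is a one-to-many relationship)

- **User**: demographics and physiological metrics  
- **Training Session**: summary of a single exercise, linked to one user  
- **Samples**: time-series measurements recorded during a session (altitude, heart rate, speed)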

## Logical Data Model (Exercise 14A)

### Tables and fields

**users**
- user_id
- sex
- birthday
- height_cm
- weight_kg
- vo2_max
- max_hr
- resting_hr

**training_sessions**
- session_id
- user_id
- start_time
- stop_time
- duration_sec
- distance_m
- calories
- sport
- avg_hr
- max_hr

**samples_altitude**
- id
- session_id
- timestamp
- altitude
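
The tables above can also be written down as typed records, which is one way to make the keys and relationships explicit before writing any SQL. This is only a sketch: the field names come from the lists above, but every Python type, the choice of which fields are optional, and the class names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class User:
    user_id: int                       # primary key
    sex: Optional[str] = None
    birthday: Optional[date] = None
    height_cm: Optional[float] = None
    weight_kg: Optional[float] = None
    vo2_max: Optional[float] = None
    max_hr: Optional[int] = None
    resting_hr: Optional[int] = None


@dataclass
class TrainingSession:
    session_id: int                    # primary key
    user_id: int                       # foreign key -> User.user_id
    start_time: datetime
    stop_time: datetime
    duration_sec: Optional[float] = None
    distance_m: Optional[float] = None
    calories: Optional[int] = None
    sport: Optional[str] = None
    avg_hr: Optional[int] = None
    max_hr: Optional[int] = None


@dataclass
class AltitudeSample:
    id: int                            # primary key
    session_id: int                    # foreign key -> TrainingSession.session_id
    timestamp: datetime
    altitude: float


# Made-up example values, for illustration only.
user = User(user_id=1, sex="F", height_cm=170.0)
session = TrainingSession(
    session_id=10,
    user_id=user.user_id,
    start_time=datetime(2024, 1, 1, 10, 0),
    stop_time=datetime(2024, 1, 1, 10, 30),
)
sample = AltitudeSample(id=1, session_id=session.session_id,
                        timestamp=session.start_time, altitude=12.0)
```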

---

## Physical Data Model (SQLite schema, Exercise 14A)

```sql
CREATE TABLE users (...);
CREATE TABLE training_sessions (...);
CREATE TABLE samples_altitude (...);
```
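
As a rough illustration of what the elided column lists might contain, the sketch below creates the three tables with Python's built-in `sqlite3` module. The column names follow the logical model; the column types, the `NOT NULL` and `UNIQUE` constraints, storing times as ISO 8601 text, and the helper name `create_schema` are all assumptions, not part of the original schema.

```python
import sqlite3

# Column names follow the logical model; types and constraints are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id     INTEGER PRIMARY KEY,
    sex         TEXT,
    birthday    TEXT,               -- ISO 8601 date (assumed storage format)
    height_cm   REAL,
    weight_kg   REAL,
    vo2_max     REAL,
    max_hr      INTEGER,
    resting_hr  INTEGER
);

CREATE TABLE IF NOT EXISTS training_sessions (
    session_id   INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    start_time   TEXT NOT NULL,     -- ISO 8601 timestamp (assumed)
    stop_time    TEXT NOT NULL,
    duration_sec REAL,
    distance_m   REAL,
    calories     INTEGER,
    sport        TEXT,
    avg_hr       INTEGER,
    max_hr       INTEGER
);

CREATE TABLE IF NOT EXISTS samples_altitude (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES training_sessions(session_id),
    timestamp  TEXT NOT NULL,
    altitude   REAL,
    UNIQUE (session_id, timestamp)  -- no duplicate timestamps per session
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables on an open connection (hypothetical helper)."""
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
create_schema(conn)
```

With foreign keys switched on, inserting a session whose `user_id` has no matching row in `users` raises `sqlite3.IntegrityError`, so the database itself enforces part of the referential integrity requirement.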

## DAG Workflow Plan (Exercise 14B)

A simple workflow for processing the Polar JSON data could be:

1. Extract raw JSON files  
2. Validate required fields  
3. Transform the records into the `users`, `training_sessions`, and sample tables  
4. Load tables into a database

### Simplified DAG structure
extract -> validate -> transform -> load
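
The chain above can be sketched without any orchestration framework by declaring the dependencies and letting the standard library's `graphlib.TopologicalSorter` (Python 3.9+) work out the run order. The task bodies below are placeholders with made-up data, and the names `DAG`, `TASKS`, and `run_pipeline` are hypothetical. Passing each result to the next task only works because this DAG is a single linear chain.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
DAG = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
}


def extract(_):
    # Placeholder for reading raw JSON files; these records are made up.
    return [
        {"startTime": "2024-01-01T10:00:00", "duration": "PT30M",
         "samples": {"altitude": [12.0, 12.5]}},
        {"startTime": "2024-01-02T09:00:00"},  # incomplete record
    ]


def validate(records):
    # Keep only records that contain every required field.
    required = {"startTime", "duration", "samples"}
    return [r for r in records if required <= r.keys()]


def transform(records):
    # Placeholder: reshape the validated records into table rows.
    return {"training_sessions": [{"start_time": r["startTime"]} for r in records]}


def load(tables):
    # Placeholder for database inserts; reports the row count per table.
    return {name: len(rows) for name, rows in tables.items()}


TASKS = {"extract": extract, "validate": validate,
         "transform": transform, "load": load}


def run_pipeline():
    """Run the tasks in dependency order, feeding each result to the next task."""
    order = list(TopologicalSorter(DAG).static_order())
    result = None
    for name in order:
        result = TASKS[name](result)
    return order, result


order, result = run_pipeline()
```

A scheduler such as Airflow would express the same dependencies in its own syntax; the point here is only that the ordering is derived from declared dependencies rather than hard-coded.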

## Data Quality Measures (Exercise 14C)

To ensure consistent data, the following checks should be applied:

- **Schema checks:** required fields such as `exercises`, `startTime`, `duration`, and `samples` must exist.
- **Type/range checks:** heart rate must fall in a plausible range (e.g., 30–250 bpm), distance must be ≥ 0, and altitude must be numeric.
- **Completeness:** every session should contain at least one sample.
- **Referential integrity:** every session must reference an existing user.
- **Time consistency:** `start_time` must be earlier than `stop_time`, and sample timestamps must be strictly increasing within a session.
- **Duplicates:** reject duplicate session IDs and duplicate sample timestamps within a session.
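
Several of these checks can be expressed as a small function that returns a list of problems for one transformed session. This is a minimal sketch under stated assumptions: sessions are plain dicts keyed by the logical-model field names, timestamps are ISO 8601 strings in a single consistent format, and the function names `check_session` and `find_duplicate_session_ids` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def check_session(session, known_user_ids, hr_range=(30, 250)):
    """Return a list of problems found in one session dict (empty if clean)."""
    problems = []

    # Schema check: stop early if a required field is missing.
    for field in ("session_id", "user_id", "start_time", "stop_time"):
        if session.get(field) is None:
            problems.append(f"missing field: {field}")
    if problems:
        return problems

    # Referential integrity: the session must belong to a known user.
    if session["user_id"] not in known_user_ids:
        problems.append("unknown user_id")

    # Time consistency: start must come before stop.
    start = datetime.fromisoformat(session["start_time"])
    stop = datetime.fromisoformat(session["stop_time"])
    if not start < stop:
        problems.append("start_time is not before stop_time")

    # Range checks on heart rate and distance.
    low, high = hr_range
    for key in ("avg_hr", "max_hr"):
        hr = session.get(key)
        if hr is not None and not low <= hr <= high:
            problems.append(f"{key} out of range")
    distance = session.get("distance_m")
    if distance is not None and distance < 0:
        problems.append("distance_m is negative")

    # Completeness: at least one sample.
    samples = session.get("samples", [])
    if not samples:
        problems.append("session has no samples")

    # Strictly increasing timestamps (this also rules out duplicates).
    # Same-format ISO 8601 strings compare in chronological order.
    stamps = [s["timestamp"] for s in samples]
    if any(a >= b for a, b in zip(stamps, stamps[1:])):
        problems.append("sample timestamps are not strictly increasing")

    # Type check: altitude must be numeric.
    if any(not isinstance(s.get("altitude"), (int, float)) for s in samples):
        problems.append("non-numeric altitude")

    return problems


def find_duplicate_session_ids(sessions):
    """Return the session IDs that occur more than once."""
    counts = Counter(s["session_id"] for s in sessions)
    return sorted(sid for sid, n in counts.items() if n > 1)
```

Returning a list of problems instead of raising on the first failure makes it possible to report everything wrong with a session in one pass, which tends to be more useful when inspecting a batch of exported files.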